feihu.hf committed
Commit aaab942 • 1 Parent(s): 515fcfd
update README & LICENSE
README.md CHANGED

@@ -9,6 +9,7 @@ base_model: Qwen/Qwen2.5-72B-Instruct
 tags:
 - chat
 ---
+
 # Qwen2.5-72B-Instruct-GGUF
 
 ## Introduction
@@ -29,6 +30,7 @@ Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we rele
 - Number of Layers: 80
 - Number of Attention Heads (GQA): 64 for Q and 8 for KV
 - Context Length: Full 32,768 tokens and generation 8192 tokens
+- Note: Currently, only vLLM supports YARN for length extrapolating. If you want to process sequences up to 131,072 tokens, please refer to non-GGUF models.
 - Quantization: q2_K, q3_K_M, q4_0, q4_K_M, q5_0, q5_K_M, q6_K, q8_0
 
 For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).
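
The note added in the second hunk points long-context users away from these GGUF files, since the llama.cpp family of runtimes does not apply YaRN scaling to reach 131,072 tokens. As a minimal sketch of the alternative it refers to, assuming a recent vLLM release (the rope_scaling key names have changed across vLLM versions) and enough GPUs for a 72B model, loading the non-GGUF checkpoint with YaRN enabled could look like this:

```python
# Minimal sketch: YaRN length extrapolation on the non-GGUF checkpoint via vLLM.
# The rope_scaling keys are assumptions to verify against your vLLM version
# (older releases use "type" where newer ones use "rope_type").
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct",
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,  # 4 x 32,768 native context = 131,072 tokens
        "original_max_position_embeddings": 32768,
    },
    max_model_len=131072,
    tensor_parallel_size=8,  # a 72B model must be sharded across GPUs
)

outputs = llm.generate(
    ["Summarize the following report: ..."],
    SamplingParams(max_tokens=512),
)
print(outputs[0].outputs[0].text)
```

Note that static YaRN applies the same scaling factor regardless of input length, which can hurt quality on short texts, so the Qwen documentation advises enabling it only when long inputs are actually expected.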
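
The quantization bullet in the same hunk lists the precision variants published in this repo. As a hypothetical sketch of fetching just one variant rather than the whole repo (the *.gguf filename pattern is an assumption; check the repo's file list), using huggingface_hub:

```python
# Hypothetical sketch: download a single quantization variant of the GGUF repo.
# The filename glob is an assumption; verify it against the repo's Files tab.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen2.5-72B-Instruct-GGUF",
    allow_patterns=["*q4_k_m*.gguf"],  # pick one of the listed quant levels
    local_dir="./qwen2.5-72b-instruct-gguf",
)
```

Lower-bit variants such as q2_K trade accuracy for memory, while q4_K_M is a common middle ground between size and quality.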