feihu.hf committed
Commit aaab942 • 1 Parent(s): 515fcfd
update README & LICENSE
README.md CHANGED

@@ -9,6 +9,7 @@ base_model: Qwen/Qwen2.5-72B-Instruct
 tags:
 - chat
 ---
+
 # Qwen2.5-72B-Instruct-GGUF
 
 ## Introduction
@@ -29,6 +30,7 @@ Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we rele
 - Number of Layers: 80
 - Number of Attention Heads (GQA): 64 for Q and 8 for KV
 - Context Length: Full 32,768 tokens and generation 8192 tokens
+- Note: Currently, only vLLM supports YARN for length extrapolating. If you want to process sequences up to 131,072 tokens, please refer to non-GGUF models.
 - Quantization: q2_K, q3_K_M, q4_0, q4_K_M, q5_0, q5_K_M, q6_K, q8_0
 
 For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).
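
The note added in the second hunk points long-context users away from these GGUF files, since the llama.cpp family of runtimes does not apply YaRN scaling to reach 131,072 tokens. As a minimal sketch of the alternative it refers to, assuming a recent vLLM release (the rope_scaling key names have changed across vLLM versions) and enough GPUs for a 72B model, loading the non-GGUF checkpoint with YaRN enabled could look like this:

```python
# Minimal sketch: YaRN length extrapolation on the non-GGUF checkpoint via vLLM.
# The rope_scaling keys are assumptions to verify against your vLLM version
# (older releases use "type" where newer ones use "rope_type").
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct",
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,  # 4 x 32,768 native context = 131,072 tokens
        "original_max_position_embeddings": 32768,
    },
    max_model_len=131072,
    tensor_parallel_size=8,  # a 72B model must be sharded across GPUs
)

outputs = llm.generate(
    ["Summarize the following report: ..."],
    SamplingParams(max_tokens=512),
)
print(outputs[0].outputs[0].text)
```

Note that static YaRN applies the same scaling factor regardless of input length, which can hurt quality on short texts, so the Qwen documentation advises enabling it only when long inputs are actually expected.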
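
The quantization bullet in the same hunk lists the precision variants published in this repo. As a hypothetical sketch of fetching just one variant rather than the whole repo (the *.gguf filename pattern is an assumption; check the repo's file list), using huggingface_hub:

```python
# Hypothetical sketch: download a single quantization variant of the GGUF repo.
# The filename glob is an assumption; verify it against the repo's Files tab.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen2.5-72B-Instruct-GGUF",
    allow_patterns=["*q4_k_m*.gguf"],  # pick one of the listed quant levels
    local_dir="./qwen2.5-72b-instruct-gguf",
)
```

Lower-bit variants such as q2_K trade accuracy for memory, while q4_K_M is a common middle ground between size and quality.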