Update README.md
quantized_by: bartowski
---

# <b>Heads up:</b> currently CUDA offloading is broken unless you enable flash attention

## Llamacpp imatrix Quantizations of Qwen2-72B-Instruct

Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> commit <a href="https://github.com/ggerganov/llama.cpp/commit/ee459f40f65810a810151b24eba5b8bd174ceffe">ee459f40f65810a810151b24eba5b8bd174ceffe</a> for quantization.

Original model: https://huggingface.co/Qwen/Qwen2-72B-Instruct
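A minimal sketch of running the quantized model with flash attention enabled, per the heads-up above. The model filename and prompt are placeholders; the binary is named `llama-cli` in recent llama.cpp builds (`main` in older ones), and `-ngl`/`-fa` are the standard llama.cpp flags for GPU layer offloading and flash attention:

```shell
# Hypothetical local path; substitute the quant file you downloaded.
# -ngl 99 offloads all layers to the GPU; -fa enables flash attention,
# which (per the note above) is required for CUDA offloading to work here.
./llama-cli -m ./Qwen2-72B-Instruct-Q4_K_M.gguf -ngl 99 -fa -p "Hello"
```

Without `-fa`, drop `-ngl` (or set it to 0) and run on CPU until the CUDA offloading issue is resolved.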