Update README.md
quantized_by: bartowski
---

# <b>Heads up:</b> currently CUDA offloading is broken unless you enable flash attention

## Llamacpp imatrix Quantizations of Qwen2-72B-Instruct

Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> commit <a href="https://github.com/ggerganov/llama.cpp/commit/ee459f40f65810a810151b24eba5b8bd174ceffe">ee459f40f65810a810151b24eba5b8bd174ceffe</a> for quantization.

Original model: https://huggingface.co/Qwen/Qwen2-72B-Instruct
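A minimal sketch of running the quantized model with flash attention enabled, per the heads-up above. The model filename and prompt are placeholders; the binary is named `llama-cli` in recent llama.cpp builds (`main` in older ones), and `-ngl`/`-fa` are the standard llama.cpp flags for GPU layer offloading and flash attention:

```shell
# Hypothetical local path; substitute the quant file you downloaded.
# -ngl 99 offloads all layers to the GPU; -fa enables flash attention,
# which (per the note above) is required for CUDA offloading to work here.
./llama-cli -m ./Qwen2-72B-Instruct-Q4_K_M.gguf -ngl 99 -fa -p "Hello"
```

Without `-fa`, drop `-ngl` (or set it to 0) and run on CPU until the CUDA offloading issue is resolved.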