Update README.md
Also, using the latest llama.cpp and a new technique called speculative decoding, we can speed up larger models.

## Windows speculative decoding command sample (ROCm compiled version)
```
set HSA_OVERRIDE_GFX_VERSION=gfx1103 && .\llama-server.exe ^
-m .\gemma-2-27B-it-Q4_K_M-fp16.gguf ^
-md .\gemma-2-2b-it-IQ3_XXS.gguf ^
-ngl 10 -ngld 10 -e --temp 0 -c 4096 ^
--draft-max 16 --draft-min 5
```
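As a rough sketch of what speculative decoding does (a toy model for illustration only; `TARGET_SEQ`, `draft_next`, and `verify` are invented names, not llama.cpp internals): the small draft model (`-md`, here gemma-2-2b) cheaply proposes up to `--draft-max` tokens, and the large model verifies the whole proposal in a single pass, keeping the longest correct prefix, so several tokens can come out of one expensive call.

```python
# Toy sketch of speculative decoding -- illustrative only, not llama.cpp's code.
TARGET_SEQ = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]

def draft_next(pos, n):
    # Hypothetical cheap draft model: knows the sequence, but garbles
    # every 5th token (a stand-in for the small gguf draft model).
    out = []
    for i in range(pos, pos + n):
        tok = TARGET_SEQ[i] if i < len(TARGET_SEQ) else 0
        if i % 5 == 4:
            tok = 0  # draft mistake
        out.append(tok)
    return out

def verify(pos, candidates):
    # Expensive large model: checks all drafted tokens in one pass,
    # keeps the longest correct prefix, and emits its own token at
    # the first mismatch.
    accepted = []
    for i, tok in enumerate(candidates):
        truth = TARGET_SEQ[pos + i]
        if tok == truth:
            accepted.append(tok)
        else:
            accepted.append(truth)  # the large model's correction
            break
    return accepted

def generate(n_tokens, draft_len):
    out = []
    target_calls = 0
    while len(out) < n_tokens:
        guess = draft_next(len(out), draft_len)  # cheap proposals
        out += verify(len(out), guess)           # one expensive pass
        target_calls += 1
    return out[:n_tokens], target_calls

tokens, calls = generate(10, 4)  # 10 correct tokens in only 4 large-model calls
```

When the draft model guesses well (similar models, like the gemma-2 pair above), most proposals are accepted and the large model runs far fewer forward passes, which is where the speedup comes from.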
私のテストプロンプトの実行時間: 1576.67秒
My test prompt execution time: 1576.67 seconds

## Windows normal command sample
```
.\llama-server.exe ^
-m ..\gemma\gemma-2-27B-it-Q4_K_M-fp16.gguf ^
-e --temp 0 -c 4096
```

私のテストプロンプトの実行時間: 4591.58秒
My test prompt execution time: 4591.58 seconds
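Comparing the two reported timings, the speculative decoding run finishes the same test prompt roughly 2.9 times faster; a quick check of that ratio:

```python
# Speedup from the two timings reported above (same test prompt).
normal_s = 4591.58       # normal run
speculative_s = 1576.67  # speculative decoding run
speedup = normal_s / speculative_s
print(f"speedup: {speedup:.2f}x")  # speedup: 2.91x
```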
CUDAのサンプルについては[dahara1/Qwen2.5-0.5B-Instruct-gguf-japanese-imatrix-128K](https://huggingface.co/dahara1/Qwen2.5-0.5B-Instruct-gguf-japanese-imatrix-128K)をみてください
See [dahara1/Qwen2.5-0.5B-Instruct-gguf-japanese-imatrix-128K](https://huggingface.co/dahara1/Qwen2.5-0.5B-Instruct-gguf-japanese-imatrix-128K) for CUDA examples.
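Once llama-server is running with either command above, it serves an HTTP API, including an OpenAI-compatible chat endpoint. A minimal client sketch, assuming llama-server's default host and port `127.0.0.1:8080` (adjust if you changed them); the helper name `build_request` is mine, not part of llama.cpp:

```python
import json
import urllib.request

def build_request(prompt: str,
                  url: str = "http://127.0.0.1:8080/v1/chat/completions"):
    # Build a POST request for llama-server's OpenAI-compatible chat endpoint.
    # The URL assumes the default host/port; match it to how you started
    # the server.
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # matches --temp 0 in the commands above
    }).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})

# Actually sending requires a running server, so it is left commented out:
# with urllib.request.urlopen(build_request("Hello")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```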