dahara1 committed
Commit cd868e4
1 Parent(s): db44f0e

Update README.md

Files changed (1):
  1. README.md +9 -7

README.md CHANGED
@@ -12,9 +12,9 @@ This is a quantized gguf version of gemma-2-2b-it using an importance matrix (iM
 Also, using the latest llama.cpp and a new technique called speculative decoding, we can speed up larger models.
 
 
-windows command sample
+## windows speculative decoding command sample (ROCm compiled version)
 ```
-.\llama-server.exe ^
+set HSA_OVERRIDE_GFX_VERSION=gfx1103 && .\llama-server.exe ^
 -m .\gemma-2-27B-it-Q4_K_M-fp16.gguf ^
 -md .\gemma-2-2b-it-IQ3_XXS.gguf ^
 -ngl 10 -ngld 10 -e --temp 0 -c 4096 ^
@@ -24,15 +24,17 @@ windows command sample
 私のテストプロンプトの実行時間: 1576.67秒
 My test prompt execution time: 1576.67 seconds
 
-windows command sample
+## windows normal command sample
 ```
 .\llama-server.exe ^
--m .\gemma-2-27B-it-Q4_K_M-fp16.gguf ^
--md .\gemma-2-2b-it-IQ3_XXS.gguf ^
--ngl 10 -ngld 10 -e --temp 0 -c 4096 ^
---draft-max 16 --draft-min 5
+-m ..\gemma\gemma-2-27B-it-Q4_K_M-fp16.gguf ^
+-e --temp 0 -c 4096
 ```
 
+私のテストプロンプトの実行時間: 4591.58秒
+My test prompt execution time: 4591.58 seconds
+
+
 CUDAのサンプルについては[dahara1/Qwen2.5-0.5B-Instruct-gguf-japanese-imatrix-128K](https://huggingface.co/dahara1/Qwen2.5-0.5B-Instruct-gguf-japanese-imatrix-128K)をみてください
 See [dahara1/Qwen2.5-0.5B-Instruct-gguf-japanese-imatrix-128K](https://huggingface.co/dahara1/Qwen2.5-0.5B-Instruct-gguf-japanese-imatrix-128K) for CUDA examples.
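The two execution times quoted in the diff (1576.67 s with speculative decoding, 4591.58 s without) imply roughly a threefold speedup on that test prompt; a minimal Python sketch of the arithmetic, using only the numbers reported above:

```python
# Execution times for the same test prompt, as reported in the README above.
normal_s = 4591.58       # gemma-2-27B-it-Q4_K_M alone
speculative_s = 1576.67  # gemma-2-27B-it-Q4_K_M with gemma-2-2b-it-IQ3_XXS as draft model

# Speedup factor gained from speculative decoding on this prompt.
speedup = normal_s / speculative_s
print(f"speculative decoding speedup: {speedup:.2f}x")  # prints: speculative decoding speedup: 2.91x
```

Actual gains depend on how often the small draft model's tokens are accepted by the large model, so this single-prompt ratio is indicative rather than general.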