Update README.md
Also, using the latest llama.cpp and a new technique called speculative decoding, we can speed up larger models.

## Windows speculative decoding command sample (ROCm compiled version)
```
set HSA_OVERRIDE_GFX_VERSION=gfx1103 && .\llama-server.exe ^
-m .\gemma-2-27B-it-Q4_K_M-fp16.gguf ^
-md .\gemma-2-2b-it-IQ3_XXS.gguf ^
-ngl 10 -ngld 10 -e --temp 0 -c 4096 ^
--draft-max 16 --draft-min 5
```
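As a rough sketch of what speculative decoding does (a toy model for illustration only; `TARGET_SEQ`, `draft_next`, and `verify` are invented names, not llama.cpp internals): the small draft model (`-md`, here gemma-2-2b) cheaply proposes up to `--draft-max` tokens, and the large model verifies the whole proposal in a single pass, keeping the longest correct prefix, so several tokens can come out of one expensive call.

```python
# Toy sketch of speculative decoding -- illustrative only, not llama.cpp's code.
TARGET_SEQ = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]

def draft_next(pos, n):
    # Hypothetical cheap draft model: knows the sequence, but garbles
    # every 5th token (a stand-in for the small gguf draft model).
    out = []
    for i in range(pos, pos + n):
        tok = TARGET_SEQ[i] if i < len(TARGET_SEQ) else 0
        if i % 5 == 4:
            tok = 0  # draft mistake
        out.append(tok)
    return out

def verify(pos, candidates):
    # Expensive large model: checks all drafted tokens in one pass,
    # keeps the longest correct prefix, and emits its own token at
    # the first mismatch.
    accepted = []
    for i, tok in enumerate(candidates):
        truth = TARGET_SEQ[pos + i]
        if tok == truth:
            accepted.append(tok)
        else:
            accepted.append(truth)  # the large model's correction
            break
    return accepted

def generate(n_tokens, draft_len):
    out = []
    target_calls = 0
    while len(out) < n_tokens:
        guess = draft_next(len(out), draft_len)  # cheap proposals
        out += verify(len(out), guess)           # one expensive pass
        target_calls += 1
    return out[:n_tokens], target_calls

tokens, calls = generate(10, 4)  # 10 correct tokens in only 4 large-model calls
```

When the draft model guesses well (similar models, like the gemma-2 pair above), most proposals are accepted and the large model runs far fewer forward passes, which is where the speedup comes from.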
私のテストプロンプトの実行時間: 1576.67秒
My test prompt execution time: 1576.67 seconds

## Windows normal command sample
```
.\llama-server.exe ^
-m ..\gemma\gemma-2-27B-it-Q4_K_M-fp16.gguf ^
-e --temp 0 -c 4096
```

私のテストプロンプトの実行時間: 4591.58秒
My test prompt execution time: 4591.58 seconds
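Comparing the two reported timings, the speculative decoding run finishes the same test prompt roughly 2.9 times faster; a quick check of that ratio:

```python
# Speedup from the two timings reported above (same test prompt).
normal_s = 4591.58       # normal run
speculative_s = 1576.67  # speculative decoding run
speedup = normal_s / speculative_s
print(f"speedup: {speedup:.2f}x")  # speedup: 2.91x
```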
CUDAのサンプルについては[dahara1/Qwen2.5-0.5B-Instruct-gguf-japanese-imatrix-128K](https://huggingface.co/dahara1/Qwen2.5-0.5B-Instruct-gguf-japanese-imatrix-128K)をみてください
See [dahara1/Qwen2.5-0.5B-Instruct-gguf-japanese-imatrix-128K](https://huggingface.co/dahara1/Qwen2.5-0.5B-Instruct-gguf-japanese-imatrix-128K) for CUDA examples.
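Once llama-server is running with either command above, it serves an HTTP API, including an OpenAI-compatible chat endpoint. A minimal client sketch, assuming llama-server's default host and port `127.0.0.1:8080` (adjust if you changed them); the helper name `build_request` is mine, not part of llama.cpp:

```python
import json
import urllib.request

def build_request(prompt: str,
                  url: str = "http://127.0.0.1:8080/v1/chat/completions"):
    # Build a POST request for llama-server's OpenAI-compatible chat endpoint.
    # The URL assumes the default host/port; match it to how you started
    # the server.
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # matches --temp 0 in the commands above
    }).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})

# Actually sending requires a running server, so it is left commented out:
# with urllib.request.urlopen(build_request("Hello")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```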