starble-dev committed on
Commit 7310274
1 Parent(s): 5332a93

Update README.md

Files changed (1): README.md (+24 −1)
This model is the original Mistral-Nemo-Instruct-2407 converted to GGUF and quantized using **llama.cpp**.

**How to Use:**
As of July 19, 2024, mainline llama.cpp does not support Mistral-Nemo-Instruct-2407. However, you can use it by building iamlemec's **mistral-nemo** branch from source ([llama.cpp GitHub repository](https://github.com/iamlemec/llama.cpp/tree/mistral-nemo)):

```
git clone -b mistral-nemo https://github.com/iamlemec/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build --config Release
```

If you're using a CUDA-compatible GPU, configure with `cmake -B build -DGGML_CUDA=ON` instead.

If the build takes too long, add `-j 4` to the build step (`cmake --build build --config Release -j 4`) to build with 4 threads. Adjust the number to match the amount of physical cores on your CPU.
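
To pick a sensible `-j` value, you can query the machine rather than guess. As a sketch for Linux: `nproc` (GNU coreutils) prints the number of available logical CPUs; note that with hyper-threading enabled, the physical core count may be half that number.

```shell
# Print the number of logical CPUs available to this process (Linux/coreutils).
# Use this (or half of it, if hyper-threading is enabled) as the -j value.
nproc
```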

Use:
```
llama-server.exe -m .\models\Mistral-Nemo-12B-Instruct-2407-Q8_0.gguf -b 512 -ub 512 -c 4096 -ngl 100
```

- `-b`: batch size
- `-ub`: physical (micro) batch size
- `-c`: context size
- `-ngl`: number of layers to offload to the GPU

Change the model path to where the model is actually stored. If you need more clarification on the parameters, check out the [llama.cpp Server Docs](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md).
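
Once the server is running, you can sanity-check it over its HTTP API. The `/completion` endpoint and the `prompt`/`n_predict` fields are part of the llama.cpp server API; the address below assumes the server's default of `127.0.0.1:8080` (adjust if you passed `--host`/`--port`).

```shell
# Send a test completion request to a running llama-server instance
# on the assumed default address 127.0.0.1:8080.
curl http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, my name is", "n_predict": 32}'
```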

**License:**
Apache 2.0