starble-dev committed on
Commit 7310274
1 Parent(s): 5332a93

Update README.md

Files changed (1): README.md (+24 −1)
This model is the original Mistral-Nemo-Instruct-2407 converted to GGUF and quantized using **llama.cpp**.

**How to Use:**
As of July 19, 2024, mainline llama.cpp does not support Mistral-Nemo-Instruct-2407. However, you can use it by building iamlemec's **mistral-nemo** branch from source ([llama.cpp GitHub repository](https://github.com/iamlemec/llama.cpp/tree/mistral-nemo)):

```
git clone -b mistral-nemo https://github.com/iamlemec/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build --config Release
```

If you're using a CUDA-compatible GPU, configure with `cmake -B build -DGGML_CUDA=ON` instead.

If the build takes too long, add `-j 4` to the build step (`cmake --build build --config Release -j 4`) to build with 4 threads. Adjust the number to match the amount of physical cores on your CPU.
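
To pick a sensible `-j` value, you can query the machine rather than guess. As a sketch for Linux: `nproc` (GNU coreutils) prints the number of available logical CPUs; note that with hyper-threading enabled, the physical core count may be half that number.

```shell
# Print the number of logical CPUs available to this process (Linux/coreutils).
# Use this (or half of it, if hyper-threading is enabled) as the -j value.
nproc
```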

Use:
```
llama-server.exe -m .\models\Mistral-Nemo-12B-Instruct-2407-Q8_0.gguf -b 512 -ub 512 -c 4096 -ngl 100
```

- `-b`: batch size
- `-ub`: physical (micro) batch size
- `-c`: context size
- `-ngl`: number of layers to offload to the GPU

Change the model path to where the model is actually stored. If you need more clarification on the parameters, check out the [llama.cpp Server Docs](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md).
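
Once the server is running, you can sanity-check it over its HTTP API. The `/completion` endpoint and the `prompt`/`n_predict` fields are part of the llama.cpp server API; the address below assumes the server's default of `127.0.0.1:8080` (adjust if you passed `--host`/`--port`).

```shell
# Send a test completion request to a running llama-server instance
# on the assumed default address 127.0.0.1:8080.
curl http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, my name is", "n_predict": 32}'
```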

**License:**
Apache 2.0