Update README.md
README.md
@@ -50,10 +50,11 @@ AMD64.

 ## About Quantization Formats

-Your choice of quantization format depends on
+Your choice of quantization format depends on three things:

 1. Will it fit in RAM or VRAM?
 2. Is your use case reading (e.g. summarization) or writing (e.g. chatbot)?
+3. llamafiles bigger than 4.30 GB are hard to run on Windows (see [gotchas](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas))

 Good quants for writing (eval speed) are Q5\_K\_M, and Q4\_0. Text
 generation is bounded by memory speed, so smaller quants help.
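
The size threshold added in this change could be checked before downloading or shipping a file. The following is a minimal sketch, not part of the README or of llamafile itself: the names `WINDOWS_LIMIT_BYTES`, `exceeds_windows_limit`, and `check_llamafile` are hypothetical, and reading "4.30 GB" as 4.30 × 10^9 bytes is an assumption.

```python
import os

# Threshold from the README text; treating "GB" as 10**9 bytes is an assumption.
WINDOWS_LIMIT_BYTES = 4_300_000_000


def exceeds_windows_limit(size_bytes: int, limit_bytes: int = WINDOWS_LIMIT_BYTES) -> bool:
    """Return True if a file of this size is larger than the limit."""
    return size_bytes > limit_bytes


def check_llamafile(path: str) -> bool:
    """Return True if the file at `path` is larger than the assumed limit."""
    return exceeds_windows_limit(os.path.getsize(path))
```

A caller might use `check_llamafile("model.llamafile")` (a hypothetical path) to decide whether to pick a smaller quant for Windows users.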