Mozilla/Meta-Llama-3-8B-Instruct-llamafile

Text Generation


jartine committed on Apr 19

Commit dfd4efd • 1 Parent(s): 2979e80

Update README.md

Files changed (1)

README.md +2 -1


@@ -50,10 +50,11 @@ AMD64.
 ## About Quantization Formats
-Your choice of quantization format depends on two things:
+Your choice of quantization format depends on three things:
 1. Will it fit in RAM or VRAM?
 2. Is your use case reading (e.g. summarization) or writing (e.g. chatbot)?
+3. llamafiles bigger than 4.30 GB are hard to run on Windows (see [gotchas](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas))
 Good quants for writing (eval speed) are Q5\_K\_M, and Q4\_0. Text
 generation is bounded by memory speed, so smaller quants help.
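
The new third criterion can be checked before downloading or shipping a file. The following is a minimal sketch, not part of the README or of llamafile itself. It assumes the "4.30 GB" figure means decimal gigabytes (4,300,000,000 bytes) and that a file exactly at the threshold is acceptable; the names `WINDOWS_LIMIT_BYTES`, `fits_windows_limit`, and `check_llamafile` are hypothetical.

```python
import os

# Assumption: "4.30 GB" from the README is read as decimal gigabytes.
WINDOWS_LIMIT_BYTES = 4_300_000_000


def fits_windows_limit(size_bytes: int, limit_bytes: int = WINDOWS_LIMIT_BYTES) -> bool:
    """Return True if a file of size_bytes is at or under the threshold."""
    return size_bytes <= limit_bytes


def check_llamafile(path: str) -> bool:
    """Look up the on-disk size of a llamafile and compare it to the threshold."""
    return fits_windows_limit(os.path.getsize(path))
```

For example, `check_llamafile("model.llamafile")` (a placeholder path) returns `False` for a file above the threshold, which suggests picking a smaller quant or consulting the linked gotchas section before running it on Windows.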