Update README.md
README.md

```diff
@@ -79,19 +79,14 @@ Note: BF16 is currently only supported on CPU.
 
 ## Hardware Choices (LLaMA3 70B Specific)
 
-If you want to run Q5\_K\_M or Q8\_0 the best choice is probably Mac
-Studio. An Apple M2 Ultra w/ 24-core CPU, 60-core GPU, 128GB RAM (costs
-$8000 with the monitor) runs Meta-Llama-3-70B-Instruct.Q4\_0.llamafile
-at 14 tok/sec (prompt eval is 82 tok/sec) thanks to the Metal GPU.
+Don't bother if you're using a Macbook M1 with 32GB of RAM. The Q2\_K
+weights might work slowly if you run in CPU mode (pass `-ngl 0`) but
+you're not going to have a good experience.
+
+Mac Studio is recommended. An Apple M2 Ultra w/ 24-core CPU, 60-core
+GPU, 128GB RAM (costs $8000 with the monitor) runs
+Meta-Llama-3-70B-Instruct.Q4\_0.llamafile at 14 tok/sec (prompt eval is
+82 tok/sec) thanks to the Metal GPU.
 
 Just want to try it? You can go on vast.ai and rent a system with 4x RTX
 4090's for a few bucks an hour. That'll run these 70b llamafiles. Be
```
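As a minimal sketch of the CPU-only invocation the new text describes: `-ngl 0` offloads zero layers to the GPU, so inference runs entirely on CPU. The filename matches the one in the diff; the prompt is just an example.

```shell
# Assumes the llamafile has already been downloaded into the current
# directory. Mark it executable, then run it with all layers on CPU:
chmod +x Meta-Llama-3-70B-Instruct.Q4_0.llamafile
./Meta-Llama-3-70B-Instruct.Q4_0.llamafile -ngl 0 -p 'Why is the sky blue?'
```

Omitting `-ngl` (or passing a large value like `-ngl 999`) lets the runtime offload layers to Metal or CUDA instead, which is where the Mac Studio and 4x RTX 4090 numbers above come from.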