Update README.md
README.md

Note: BF16 is currently only supported on CPU.

## Hardware Choices

Any Macbook with 32GB should be able to run
Meta-Llama-3-70B-Instruct.Q2_K.llamafile. It's smart enough to solve
math riddles, but at this level of quantization you should expect
hallucinations.

If you want to run Q4_0 you'll probably be able to squeeze it on a
$3,999.00 Macbook Pro M3 Max w/ 48GB of RAM.

If you want to run Q5_K_M or Q8_0 the best choice is probably Mac
Studio. An Apple M2 Ultra w/ 24-core CPU, 60-core GPU, 128GB RAM (costs
$8000 with the monitor) runs Meta-Llama-3-70B-Instruct.Q4_0.llamafile
at 14 tok/sec (prompt eval is 82 tok/sec) thanks to the Metal GPU.
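
The rule of thumb behind these RAM tiers: the weights have to fit in
memory, and the file size is roughly the parameter count times the
quant's bits per weight. Here's a back-of-envelope sketch; the
bits-per-weight numbers are approximate llama.cpp figures, and real
GGUF files come out somewhat larger since some tensors stay at higher
precision, plus you need headroom for the KV cache:

```python
# Rough weight-file sizes for Llama 3 70B at each quantization level.
# Bits-per-weight values are approximations, not exact GGUF sizes.
PARAMS = 70.6e9  # Llama 3 70B parameter count

bits_per_weight = {
    "Q2_K":   2.6,
    "Q4_0":   4.5,
    "Q5_K_M": 5.5,
    "Q8_0":   8.5,
    "F16":    16.0,
}

for quant, bpw in bits_per_weight.items():
    print(f"{quant:6} ~{PARAMS * bpw / 8 / 1e9:5.1f} GB")
```

That works out to roughly 23GB for Q2_K, 40GB for Q4_0, 48GB for
Q5_K_M, and 75GB for Q8_0, which is why each step up the quality
ladder pushes you into the next hardware tier.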

Just want to try it? You can go on vast.ai and rent a system with 4x RTX
4090's for a few bucks an hour. That'll run these 70b llamafiles, since
four 24GB cards give you 96GB of VRAM. Or you could build your own, but
the graphics cards alone will cost $10k+.

AMD Threadripper Pro 7995WX ($10k) does a good job too at 5.9 tok/sec
eval with Q4_0 (49 tok/sec prompt). With F16 weights the prompt eval
goes 65 tok/sec.
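
Those speeds follow from memory bandwidth: generating a token means
streaming essentially all of the weights through the processor once, so
tok/sec is capped at roughly bandwidth divided by weight-file size.
Here's a sketch of that ceiling, taking the published bandwidth specs
for each machine and ~40GB as an approximate size for the Q4_0 weights:

```python
# Upper bound on decode speed: tok/sec <= memory bandwidth / weights size.
# Bandwidth figures are published specs; real throughput lands below this.
WEIGHTS_GB = 40.0  # approx. Meta-Llama-3-70B-Instruct.Q4_0

bandwidth_gbps = {
    "Apple M2 Ultra": 800.0,           # unified memory
    "Threadripper Pro 7995WX": 332.8,  # 8-channel DDR5-5200
}

for system, bw in bandwidth_gbps.items():
    print(f"{system}: at most {bw / WEIGHTS_GB:.1f} tok/sec")
```

That predicts ceilings of 20 tok/sec for the M2 Ultra and 8.3 tok/sec
for the Threadripper; the measured 14 and 5.9 tok/sec are both around
70% of the ceiling, which is about what you'd expect in practice.
Prompt eval is compute-bound rather than bandwidth-bound, which is why
those numbers scale differently.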

---