Update README.md
README.md

```diff
@@ -79,19 +79,14 @@ Note: BF16 is currently only supported on CPU.
 
 ## Hardware Choices (LLaMA3 70B Specific)
 
-If you want to run Q5\_K\_M or Q8\_0 the best choice is probably Mac
-Studio. An Apple M2 Ultra w/ 24-core CPU, 60-core GPU, 128GB RAM (costs
-$8000 with the monitor) runs Meta-Llama-3-70B-Instruct.Q4\_0.llamafile
-at 14 tok/sec (prompt eval is 82 tok/sec) thanks to the Metal GPU.
+Don't bother if you're using a Macbook M1 with 32GB of RAM. The Q2\_K
+weights might work slowly if you run in CPU mode (pass `-ngl 0`) but
+you're not going to have a good experience.
+
+Mac Studio is recommended. An Apple M2 Ultra w/ 24-core CPU, 60-core
+GPU, 128GB RAM (costs $8000 with the monitor) runs
+Meta-Llama-3-70B-Instruct.Q4\_0.llamafile at 14 tok/sec (prompt eval is
+82 tok/sec) thanks to the Metal GPU.
 
 Just want to try it? You can go on vast.ai and rent a system with 4x RTX
 4090's for a few bucks an hour. That'll run these 70b llamafiles. Be
```
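As a minimal sketch of the CPU-only invocation the new text describes: `-ngl 0` offloads zero layers to the GPU, so inference runs entirely on CPU. The filename matches the one in the diff; the prompt is just an example.

```shell
# Assumes the llamafile has already been downloaded into the current
# directory. Mark it executable, then run it with all layers on CPU:
chmod +x Meta-Llama-3-70B-Instruct.Q4_0.llamafile
./Meta-Llama-3-70B-Instruct.Q4_0.llamafile -ngl 0 -p 'Why is the sky blue?'
```

Omitting `-ngl` (or passing a large value like `-ngl 999`) lets the runtime offload layers to Metal or CUDA instead, which is where the Mac Studio and 4x RTX 4090 numbers above come from.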