Update README.md
This is the **Llama 3.1 405B Instruct** model converted to the [Distributed Llama](https://github.com/b4rtaz/distributed-llama) format. The model is quantized to the Q40 format. Due to Hugging Face limitations, the model is split into 56 parts; before use, you need to combine the parts into a single file.
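Combining the parts is a plain byte-level concatenation, which `cat` handles directly. The file names below are placeholders, not the actual part names in this repository; substitute the names you downloaded:

```shell
# Concatenate all split parts (in order) into a single model file.
# File names are hypothetical examples, not the real part names.
cat dllama_model_q40.m.part* > dllama_model_q40.m
```

Shell glob expansion sorts the matched part names lexicographically, so zero-padded part suffixes concatenate in the correct order.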
To run this model, you need approximately 240 GB of RAM on a single device, or approximately 240 GB of RAM distributed across 2, 4, 8, or 16 devices connected in a cluster (more information on how to do this can be found [here](https://github.com/b4rtaz/distributed-llama)).
## 🚀 How to Run?