Update README.md
This is the **Llama 3.1 405B Instruct** model converted to the [Distributed Llama](https://github.com/b4rtaz/distributed-llama) format. The model is quantized to the Q40 format. Due to Hugging Face limitations, the model is split into 56 parts; before use, you need to combine the parts into a single file.
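Combining the parts is a plain byte-level concatenation, which `cat` handles directly. The file names below are placeholders, not the actual part names in this repository; substitute the names you downloaded:

```shell
# Concatenate all split parts (in order) into a single model file.
# File names are hypothetical examples, not the real part names.
cat dllama_model_q40.m.part* > dllama_model_q40.m
```

Shell glob expansion sorts the matched part names lexicographically, so zero-padded part suffixes concatenate in the correct order.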
To run this model, you need approximately 240 GB of RAM on a single device, or approximately 240 GB of RAM distributed across 2, 4, 8, or 16 devices connected in a cluster (more information on how to do this can be found [here](https://github.com/b4rtaz/distributed-llama)).
## 🚀 How to Run?