b4rtaz committed on
Commit
5672a7d
β€’
1 Parent(s): 9b010dd

Update README.md

Files changed (1)
README.md +31 -3
README.md CHANGED
@@ -1,3 +1,31 @@
- ---
- license: llama3.1
- ---
+ ---
+ license: llama3.1
+ tags:
+ - distributed-inference
+ - text-generation
+ ---
+
+ This is the **Llama 3.1 405B Instruct** model converted to the [Distributed Llama](https://github.com/b4rtaz/distributed-llama) format. The model is quantized to Q40. Due to Hugging Face file-size limits, the model is split into 56 parts, which you need to combine before use.
+
+ To run this model, you need approximately 240 GB of RAM on a single device, or the same total amount distributed across 2, 4, 8, or 16 devices connected in a cluster (more information on how to set this up is available [here](https://github.com/b4rtaz/distributed-llama)).
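A cluster setup can be sketched roughly as follows. This is a hedged sketch, not part of this model card: the `worker` mode and the `--port` and `--workers` flags are assumptions based on the Distributed Llama repository, and the IP address and port are placeholders — consult that repository's README for the actual invocation.

```shell
# Hedged sketch of a 2-device cluster (IP and port are placeholders).
# On the worker device (assumed `worker` mode and `--port` flag):
./dllama worker --port 9998 --nthreads 64

# On the root device, point at the worker (assumed `--workers` flag):
./dllama chat --model dllama_model_llama31_405b_q40.m \
  --tokenizer dllama_tokenizer_llama_3_1.t \
  --buffer-float-type q80 --max-seq-len 2048 \
  --nthreads 64 --workers 10.0.0.2:9998
```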
+
+ ## 🚀 How to Run?
+
+ 1. ⏬ Download the model. You have two options:
+    * Download this repository and combine all parts together using the `cat` command.
+    * Download the model using the `launch.py` script from the Distributed Llama repository: `python launch.py llama3_1_405b_instruct_q40`
+ 2. ⏬ Download the [Distributed Llama](https://github.com/b4rtaz/distributed-llama) repository.
+ 3. 🔨 Build Distributed Llama:
+ ```
+ make dllama
+ ```
+ 4. 🚀 Run Distributed Llama:
+ ```
+ ./dllama chat --model dllama_model_llama31_405b_q40.m --tokenizer dllama_tokenizer_llama_3_1.t --buffer-float-type q80 --max-seq-len 2048 --nthreads 64
+ ```
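For the first download option, combining the parts can be sketched as below. The part file names are an assumption (check the actual names in this repository); the key point is that parts must be concatenated in numeric order, which `ls -v` (natural sort) guarantees even past part 9.

```shell
# Combine the split parts in numeric order (file names are assumptions;
# check the actual names in this repository).
# `ls -v` sorts naturally, so part10 follows part9 rather than part1.
cat $(ls -v dllama_model_llama31_405b_q40.m.part*) > dllama_model_llama31_405b_q40.m
```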
+
+ ## 🎩 License
+
+ You need to accept the Llama 3.1 license before downloading this model.
+
+ [Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE)