SweatyCrayfish committed on
Commit
c33e797
1 Parent(s): 92cc1dc

Create README.md

# 4-bit Quantized Llama 3 Model

## Description
This repository hosts a 4-bit quantized version of the Llama 3 8B model. Optimized for reduced memory usage and faster inference, it is suitable for deployment in environments where computational resources are limited.

## Model Details
- **Model Type**: Transformer-based language model.
- **Quantization**: 4-bit precision.
- **Advantages**:
  - **Memory Efficiency**: Significantly reduces memory usage, allowing deployment on devices with limited RAM (see the footprint check below).
  - **Inference Speed**: Can speed up inference, depending on how well the hardware handles low-bit computation.
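
A quick way to verify the memory savings is `get_memory_footprint()`, which reports the size of the loaded parameters. A minimal sketch, assuming `bitsandbytes` and `accelerate` are installed (the loading call itself is covered under "How to Use" below):
```python
from transformers import AutoModelForCausalLM

model_4bit = AutoModelForCausalLM.from_pretrained(
    "SweatyCrayfish/llama-3-8b-quantized", device_map="auto", load_in_4bit=True
)
# get_memory_footprint() returns the total size of the model's
# parameters and buffers in bytes.
print(f"Footprint: {model_4bit.get_memory_footprint() / 1024**3:.2f} GiB")
```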

## How to Use
To utilize this model efficiently, follow the steps below.

### Loading the Quantized Model
Load the model with `load_in_4bit=True` so the weights are loaded in 4-bit precision (this requires the `bitsandbytes` and `accelerate` packages):
```python
from transformers import AutoModelForCausalLM

# Requires bitsandbytes (4-bit kernels) and accelerate (device_map support).
model_4bit = AutoModelForCausalLM.from_pretrained("SweatyCrayfish/llama-3-8b-quantized", device_map="auto", load_in_4bit=True)
```
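Once loaded, the model behaves like any other causal LM in `transformers`. A minimal generation sketch continuing from the snippet above, assuming this repository also ships the matching tokenizer files (if not, load the tokenizer from the base Llama 3 checkpoint instead):
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("SweatyCrayfish/llama-3-8b-quantized")

# Tokenize a prompt, move it to the model's device, and generate.
inputs = tokenizer("The key advantage of 4-bit quantization is", return_tensors="pt").to(model_4bit.device)
outputs = model_4bit.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```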
### Adjusting Precision of Components
Components that are not quantized (such as the layer norms) are converted to `torch.float16` by default. You can keep them in a different dtype via `torch_dtype`:
```python
import torch
from transformers import AutoModelForCausalLM

model_4bit = AutoModelForCausalLM.from_pretrained("SweatyCrayfish/llama-3-8b-quantized", load_in_4bit=True, torch_dtype=torch.float32)
# Llama keeps its decoder layers at model.layers and the final RMSNorm at
# model.norm (there is no .decoder submodule as in OPT-style models):
print(model_4bit.model.norm.weight.dtype)  # torch.float32
```
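Note that newer `transformers` releases prefer an explicit `BitsAndBytesConfig` over the bare `load_in_4bit` flag. A sketch of the equivalent setup; the NF4 and double-quantization settings shown are common bitsandbytes options, not something this repository prescribes:
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for matmuls at inference time
)

model_4bit = AutoModelForCausalLM.from_pretrained(
    "SweatyCrayfish/llama-3-8b-quantized",
    device_map="auto",
    quantization_config=quant_config,
)
```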
## Citation
Original repository and citation:
```bibtex
@article{llama3modelcard,
  title={Llama 3 Model Card},
  author={AI@Meta},
  year={2024},
  url={https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```