# 4-bit Quantized Llama 3 Model

## Description

This repository hosts a 4-bit quantized version of the Llama 3 8B model. Optimized for reduced memory usage and faster inference, it is suited to deployment in environments where computational resources are limited.

## Model Details

- **Model Type**: Transformer-based language model.
- **Quantization**: 4-bit precision.
- **Advantages**:
  - **Memory Efficiency**: Significantly reduces memory usage, allowing deployment on devices with limited RAM (see the sketch after this list).
  - **Inference Speed**: Can speed up inference, depending on how well the hardware handles low-bit computation.
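
As a rough check on the memory claim, you can print the model's parameter footprint after loading. This is a minimal sketch, assuming `bitsandbytes` and `accelerate` are installed; the loading call itself is explained in the next section.

```python
from transformers import AutoModelForCausalLM

model_4bit = AutoModelForCausalLM.from_pretrained(
    "SweatyCrayfish/llama-3-8b-quantized", device_map="auto", load_in_4bit=True
)

# get_memory_footprint() reports the memory taken by the parameters, in bytes.
print(f"Memory footprint: {model_4bit.get_memory_footprint() / 1e9:.2f} GB")
```

For scale: 8B parameters at 4 bits is roughly 4 GB of weights (plus overhead), versus about 16 GB in float16.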

## How to Use

To use this model efficiently, follow the steps below.

### Loading the Quantized Model

Load the model with `load_in_4bit=True` so the weights are loaded in 4-bit precision:
```python
from transformers import AutoModelForCausalLM

# load_in_4bit=True requires the bitsandbytes package; device_map="auto"
# spreads the layers across the available GPU(s) and CPU.
model_4bit = AutoModelForCausalLM.from_pretrained("SweatyCrayfish/llama-3-8b-quantized", device_map="auto", load_in_4bit=True)
```
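
Once loaded, the model behaves like any other causal language model. A minimal generation sketch, assuming the tokenizer files are published in this same repository:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("SweatyCrayfish/llama-3-8b-quantized")

# Move the inputs to the device holding the model's first parameters.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model_4bit.device)
outputs = model_4bit.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```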

### Adjusting Precision of Components

Components that are not quantized (such as the layer norms) are converted to `torch.float16` by default. You can override this with the `torch_dtype` argument:
```python
import torch
from transformers import AutoModelForCausalLM

model_4bit = AutoModelForCausalLM.from_pretrained("SweatyCrayfish/llama-3-8b-quantized", load_in_4bit=True, torch_dtype=torch.float32)

# In the Llama architecture the final norm sits at model.norm
# (there is no .decoder.layers[...].final_layer_norm path as in OPT).
print(model_4bit.model.norm.weight.dtype)  # torch.float32
```
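
Note that recent transformers releases deprecate passing `load_in_4bit` directly to `from_pretrained` in favor of an explicit `BitsAndBytesConfig`, which also exposes the runtime compute dtype and quantization scheme. A sketch of the equivalent call; the option values shown are illustrative, not settings verified against this checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Explicit 4-bit quantization config (equivalent to load_in_4bit=True).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls at inference
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat weight format
)

model_4bit = AutoModelForCausalLM.from_pretrained(
    "SweatyCrayfish/llama-3-8b-quantized",
    device_map="auto",
    quantization_config=quant_config,
)
```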

## Citation

Original repository and citation:

```bibtex
@article{llama3modelcard,
  title  = {Llama 3 Model Card},
  author = {AI@Meta},
  year   = {2024},
  url    = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```