---
license: llama3.2
language:
- en
base_model:
- meta-llama/Llama-3.2-3B-Instruct
pipeline_tag: text-generation
tags:
- meta
- SLM
- conversational
- Quantized
---
# SandLogic Technology - Quantized meta-llama/Llama-3.2-3B-Instruct

## Model Description

We have quantized the meta-llama/Llama-3.2-3B-Instruct model into three variants:

1. Q5_K_M
2. Q4_K_M
3. IQ4_XS

These quantized models reduce file size and memory use relative to the original weights while aiming to keep output quality close to that of the original model.
Discover our full range of quantized language models in our [SandLogic Lexicon](https://github.com/sandlogic/SandLogic-Lexicon) GitHub repository.
To learn more about our company and services, check out our website at [SandLogic](https://www.sandlogic.com).

## Original Model Information

- **Name**: [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
- **Developer**: Meta
- **Model Type**: Multilingual large language model (LLM)
- **Architecture**: Auto-regressive language model with optimized transformer architecture
- **Parameters**: 3 billion
- **Training Approach**: Supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF)
- **Data Freshness**: Pretraining data cutoff of December 2023

## Model Capabilities

Llama-3.2-3B-Instruct is optimized for multilingual dialogue use cases, including:

- Agentic retrieval
- Summarization tasks
- Assistant-like chat applications
- Knowledge retrieval
- Query and prompt rewriting

## Intended Use

1. Commercial and research applications in multiple languages
2. Mobile AI-powered writing assistants
3. Natural language generation tasks (with further adaptation)


## Training Data

- Pretrained on up to 9 trillion tokens from publicly available sources
- Incorporates knowledge distillation from larger Llama 3.1 models
- Fine-tuned with human-generated and synthetic data for safety

## Safety Considerations

- Implements safety mitigations as in Llama 3
- Emphasis on appropriate refusals and tone in responses
- Includes safeguards against borderline and adversarial prompts

## Quantized Variants

1. **Q5_K_M**: 5-bit K-quant, medium (`_M`) variant
2. **Q4_K_M**: 4-bit K-quant, medium (`_M`) variant
3. **IQ4_XS**: 4-bit i-quant, extra-small (`_XS`) variant

These quantized models aim to reduce model size and improve inference speed while maintaining performance as close to the original model as possible.
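
As a rough illustration of the size reduction, the sketch below estimates a lower bound on weight storage from the nominal bit width of each variant. The 3 billion parameter count comes from this card; the helper name `estimate_size_gb` is hypothetical. Real GGUF files are somewhat larger, because these formats also store per-block scales and keep some tensors at higher precision.

```python
def estimate_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Lower-bound size in gigabytes (10^9 bytes) for the weights alone."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 3e9  # "3 billion" from the model card; the exact count differs slightly

# Nominal bit widths only (assumption); effective bits per weight are higher.
nominal_bits = {"FP16": 16, "Q5_K_M": 5, "Q4_K_M": 4, "IQ4_XS": 4}

for name, bits in nominal_bits.items():
    print(f"{name}: at least {estimate_size_gb(N_PARAMS, bits):.2f} GB")
```

Treat these figures as a floor for planning disk and memory budgets, not as the actual file sizes.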

## Usage

```bash
pip install llama-cpp-python 
```
Please refer to the llama-cpp-python [documentation](https://llama-cpp-python.readthedocs.io/en/latest/) to install with GPU support.

### Basic Chat Completion
Here's an example demonstrating how to use the high-level API for a basic chat completion:

```python
from llama_cpp import Llama

# Load the quantized model; adjust model_path to where you saved the GGUF file.
llm = Llama(
    model_path="./models/7B/Llama-3.2-3B-Instruct-Q5_K_M.gguf",
    verbose=False,
    # n_gpu_layers=-1,  # Uncomment to use GPU acceleration
    # n_ctx=2048,  # Uncomment to increase the context window
)

# Send a system prompt and a user message through the chat API.
output = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are a pirate chatbot who always responds in pirate speak!",
        },
        {"role": "user", "content": "Who are you?"},
    ]
)

# The reply text is in the first choice of the response.
print(output["choices"][0]["message"]["content"])
```
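
For interactive use, `create_chat_completion` also accepts `stream=True`, in which case it yields chunks instead of returning a single response. The following is a minimal sketch, assuming the chunks follow the OpenAI-style layout in which each chunk carries an optional `content` string under `choices[0]["delta"]`. The helper names `collect_stream` and `stream_reply` are hypothetical.

```python
from typing import Any, Dict, Iterable


def collect_stream(chunks: Iterable[Dict[str, Any]]) -> str:
    """Print streamed text as it arrives and return the full reply."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        text = delta.get("content")
        if text:  # some chunks carry only a role or an empty delta
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)


def stream_reply(model_path: str, prompt: str) -> str:
    """Load a GGUF model and stream one chat reply (needs llama-cpp-python)."""
    from llama_cpp import Llama  # imported lazily so collect_stream has no dependency

    llm = Llama(model_path=model_path, verbose=False)
    chunks = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_stream(chunks)
```

Calling `stream_reply("<path to your .gguf file>", "Who are you?")` would print the reply incrementally and return it as one string.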

## Download
You can download `Llama` models in `gguf` format directly from Hugging Face using the `from_pretrained` method. This feature requires the `huggingface-hub` package.

To install it, run: `pip install huggingface-hub`

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="SandLogicTechnologies/Llama-3.2-3B-Instruct-GGUF",
    filename="*Llama-3.2-3B-Instruct-Q5_K_M.gguf",
    verbose=False
)
```
By default, `from_pretrained` will download the model to the Hugging Face cache directory. You can manage installed model files using the `huggingface-cli` tool.
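
The `filename` argument is a glob pattern, not necessarily an exact name. The sketch below shows how such a pattern selects a file from a repository listing, using Python's `fnmatch`. The listing and the helper `select_files` are hypothetical, and the `Q4_K_M` and `IQ4_XS` file names are assumed to follow the same naming scheme as the `Q5_K_M` file used in this card.

```python
from fnmatch import fnmatch
from typing import List

# Hypothetical repository listing; only the Q5_K_M name appears in this card.
repo_files = [
    "README.md",
    "Llama-3.2-3B-Instruct-Q5_K_M.gguf",
    "Llama-3.2-3B-Instruct-Q4_K_M.gguf",
    "Llama-3.2-3B-Instruct-IQ4_XS.gguf",
]


def select_files(pattern: str, files: List[str]) -> List[str]:
    """Return the files whose names match the glob pattern."""
    return [name for name in files if fnmatch(name, pattern)]


print(select_files("*Llama-3.2-3B-Instruct-Q5_K_M.gguf", repo_files))
print(select_files("*Q4_K_M.gguf", repo_files))
```

A pattern that matches exactly one file is the safest choice, so prefer including the full variant suffix.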




## Acknowledgements

We thank Meta for developing the original Llama-3.2-3B-Instruct model.
Special thanks to [Georgi Gerganov](https://github.com/ggerganov) and the entire [llama.cpp](https://github.com/ggerganov/llama.cpp/) development team for their outstanding contributions.

## Contact

For any inquiries or support, please contact us at support@sandlogic.com or visit our [Website](https://www.sandlogic.com/).