---
license: llama3.2
language:
- en
base_model:
- meta-llama/Llama-3.2-3B-Instruct
pipeline_tag: text-generation
tags:
- meta
- SLM
- conversational
- Quantized
---
# SandLogic Technology - Quantized meta-llama/Llama-3.2-3B-Instruct

## Model Description

We have quantized the meta-llama/Llama-3.2-3B-Instruct model into three variants:

1. Q5_KM
2. Q4_KM
3. IQ4_XS

These quantized models offer improved efficiency while maintaining performance.
Discover our full range of quantized language models in the [SandLogic Lexicon](https://github.com/sandlogic/SandLogic-Lexicon) GitHub repository.
To learn more about our company and services, visit our website at [SandLogic](https://www.sandlogic.com).

## Original Model Information

- **Name**: [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
- **Developer**: Meta
- **Model Type**: Multilingual large language model (LLM)
- **Architecture**: Auto-regressive language model with an optimized transformer architecture
- **Parameters**: 3 billion
- **Training Approach**: Supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF)
- **Data Freshness**: Pretraining data cutoff of December 2023

## Model Capabilities

Llama-3.2-3B-Instruct is optimized for multilingual dialogue use cases, including:

- Agentic retrieval
- Summarization tasks
- Assistant-like chat applications
- Knowledge retrieval
- Query and prompt rewriting

## Intended Use

1. Commercial and research applications in multiple languages
2. Mobile AI-powered writing assistants
3. Natural language generation tasks (with further adaptation)

## Training Data

- Pretrained on up to 9 trillion tokens from publicly available sources
- Incorporates knowledge distillation from larger Llama 3.1 models
- Fine-tuned with human-generated and synthetic data for safety

## Safety Considerations

- Implements safety mitigations as in Llama 3
- Emphasis on appropriate refusals and tone in responses
- Includes safeguards against borderline and adversarial prompts

## Quantized Variants

1. **Q5_KM**: 5-bit quantization using the K_M (K-quant medium) method
2. **Q4_KM**: 4-bit quantization using the K_M method
3. **IQ4_XS**: 4-bit quantization using the IQ4_XS method

These quantized models aim to reduce model size and improve inference speed while keeping performance as close to the original model as possible; a sketch of how such GGUF variants are typically produced is shown below.
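
For reference, GGUF quantizations like these are typically produced with llama.cpp's `llama-quantize` tool. The following is an illustrative sketch only, not our exact pipeline: it assumes a local llama.cpp build and a hypothetical f16 GGUF export of the base model (the binary was named `quantize` in older builds):

```bash
# Illustrative sketch: paths are hypothetical, and an f16 GGUF export
# of the base model is assumed to exist already.
./llama-quantize ./models/Llama-3.2-3B-Instruct-f16.gguf \
                 ./models/Llama-3.2-3B-Instruct-Q5_K_M.gguf Q5_K_M

# The other variants use the same tool with a different type argument:
# Q4_K_M for 4-bit K-quant medium, IQ4_XS for the 4-bit IQ4_XS scheme.
```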

## Usage

```bash
pip install llama-cpp-python
```

Please refer to the llama-cpp-python [documentation](https://llama-cpp-python.readthedocs.io/en/latest/) to install with GPU support.
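For example, the documentation describes enabling a GPU backend at build time via `CMAKE_ARGS`. A minimal sketch for a CUDA build, assuming a CUDA toolchain is installed (older releases used `-DLLAMA_CUBLAS=on` instead):

```bash
# Rebuild llama-cpp-python with the CUDA backend enabled (per its docs).
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```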

### Basic Chat Completion
Here's an example demonstrating how to use the high-level API for a basic chat completion:

```python
from llama_cpp import Llama

# Load the quantized model from a local path.
llm = Llama(
    model_path="./models/7B/Llama-3.2-3B-Instruct-Q5_K_M.gguf",
    verbose=False,
    # n_gpu_layers=-1,  # Uncomment to use GPU acceleration
    # n_ctx=2048,       # Uncomment to increase the context window
)

output = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are a pirate chatbot who always responds in pirate speak!",
        },
        {"role": "user", "content": "Who are you?"},
    ]
)

print(output["choices"][0]["message"]["content"])
```
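
The same API also supports streaming. A minimal sketch, reusing the `llm` object created above: passing `stream=True` makes `create_chat_completion` return an iterator of OpenAI-style chunks whose `delta` fields carry the incremental text.

```python
# Stream the reply token by token instead of waiting for the full response.
stream = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Who are you?"}],
    stream=True,
)

for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    # The first and last chunks may carry role/finish info instead of text.
    if "content" in delta:
        print(delta["content"], end="", flush=True)
print()
```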

## Download
You can download `Llama` models in `gguf` format directly from Hugging Face using the `from_pretrained` method. This feature requires the `huggingface-hub` package.

To install it, run: `pip install huggingface-hub`

```python
from llama_cpp import Llama

# Fetch the GGUF file from the Hugging Face Hub and load it.
llm = Llama.from_pretrained(
    repo_id="SandLogicTechnologies/Llama-3.2-3B-Instruct-GGUF",
    filename="*Llama-3.2-3B-Instruct-Q5_K_M.gguf",
    verbose=False
)
```

By default, `from_pretrained` will download the model to the Hugging Face cache directory. You can manage installed model files using the `huggingface-cli` tool.
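
For example, the `huggingface-cli` tool that ships with `huggingface-hub` can list and prune cached models:

```bash
# Inspect what is currently stored in the Hugging Face cache.
huggingface-cli scan-cache

# Interactively select and delete cached revisions to free disk space.
huggingface-cli delete-cache
```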

## Acknowledgements

We thank Meta for developing the original Llama-3.2-3B-Instruct model.
Special thanks to [Georgi Gerganov](https://github.com/ggerganov) and the entire [llama.cpp](https://github.com/ggerganov/llama.cpp/) development team for their outstanding contributions.

## Contact

For any inquiries or support, please contact us at support@sandlogic.com or visit our [Website](https://www.sandlogic.com/).