prithivMLmods committed d8d492f (verified) · Parent: 186f66d

Update README.md
---
license: creativeml-openrail-m
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
library_name: transformers
pipeline_tag: text-generation
tags:
- '1.0e-5'
- chain_of_thought
- ollama
base_model:
- meta-llama/Llama-3.2-3B-Instruct
---
# **Llama-Thinker-3B-Preview GGUF**

Llama-Thinker-3B-Preview-GGUF is a pretrained and instruction-tuned generative model designed for multilingual applications. It is trained on synthetic datasets built around long chains of thought, enabling it to perform complex reasoning tasks effectively.

Model Architecture: Based on Llama 3.2, Llama-Thinker-3B-Preview is an autoregressive language model that uses an optimized transformer architecture. The tuned versions undergo supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

# **Use with transformers**

Starting with `transformers >= 4.43.0`, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.

Make sure your installation is up to date via `pip install --upgrade transformers`.

```python
import torch
from transformers import pipeline

model_id = "prithivMLmods/Llama-Thinker-3B-Preview"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```
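
For more control than the `pipeline` abstraction offers, the same conversation can be run through the Auto classes with `generate()` directly. A minimal sketch, assuming the same checkpoint; the `chat` helper name is ours, not part of the model's API:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def chat(messages, model_id="prithivMLmods/Llama-Thinker-3B-Preview", max_new_tokens=256):
    """Run one chat turn with the Auto classes instead of the pipeline."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # The chat template converts the message list into the model's prompt format.
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Calling `chat([...])` with the message list from the example above then returns the decoded reply as a string.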

Note: You can also find detailed recipes for using the model locally, with `torch.compile()`, assisted generation, quantized inference, and more at [`huggingface-llama-recipes`](https://github.com/huggingface/huggingface-llama-recipes).

# **Use with `llama`**

Please follow the instructions in the [repository](https://github.com/meta-llama/llama).

To download the original checkpoints, see the example command below leveraging `huggingface-cli`:

```bash
huggingface-cli download prithivMLmods/Llama-Thinker-3B-Preview --include "original/*" --local-dir Llama-Thinker-3B-Preview
```
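
The same download can be scripted from Python with `huggingface_hub`. A minimal sketch mirroring the CLI call above; the function name is ours:

```python
def download_original_checkpoints(local_dir="Llama-Thinker-3B-Preview"):
    """Python equivalent of the huggingface-cli command above."""
    from huggingface_hub import snapshot_download  # pip install huggingface_hub
    # allow_patterns mirrors --include "original/*": only those files are fetched.
    return snapshot_download(
        repo_id="prithivMLmods/Llama-Thinker-3B-Preview",
        allow_patterns=["original/*"],
        local_dir=local_dir,
    )
```

The function returns the local directory path once the matching files have been downloaded.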

---

# **How to Run Llama-Thinker-3B-Preview on Ollama Locally**

This guide demonstrates how to run the **Llama-Thinker-3B-Preview-GGUF** model locally using Ollama. The model is instruction-tuned for multilingual tasks and complex reasoning, making it versatile across a wide range of use cases. By the end, you'll be equipped to run this and other open-source models with ease.

---

## Example 1: How to Run the Llama-Thinker-3B-Preview Model

The **Llama-Thinker-3B** model is a pretrained and instruction-tuned LLM designed for complex reasoning tasks across multiple languages. In this guide, we'll interact with it locally using Ollama, with support for quantized models.

### Step 1: Download the Model

First, download and run the **Llama-Thinker-3B-Preview-GGUF** model using the following command:

```bash
ollama run llama-thinker-3b-preview.gguf
```
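
If you already have the GGUF file on disk, Ollama can also register it from a local path via a `Modelfile`. A minimal sketch; the filename below is an assumption, so point `FROM` at wherever your download actually lives:

```plaintext
# Modelfile
FROM ./llama-thinker-3b-preview.gguf
```

Create and run the model with `ollama create llama-thinker-3b-preview -f Modelfile` followed by `ollama run llama-thinker-3b-preview`.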

### Step 2: Model Initialization and Download

Once the command is executed, Ollama will initialize and download the necessary model files. You should see output similar to this:

```plaintext
pulling manifest
pulling a12cd3456efg... 100% ▕████████████████████████████████████████████████████████████████████████████████████████▏ 3.2 GB
pulling 9f87ghijklmn... 100% ▕████████████████████████████████████████████████████████████████████████████████████████▏ 6.5 KB
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> Send a message (/? for help)
```

### Step 3: Interact with the Model

Once the model is fully loaded, you can interact with it by sending prompts. For example, let's ask:

```plaintext
>>> How can you assist me today?
```

A sample response might look like this (actual output may differ):

```plaintext
I am Llama-Thinker-3B, an advanced AI language model designed to assist with complex reasoning, multilingual tasks, and general-purpose queries. Here are a few things I can help you with:

1. Answering complex questions in multiple languages.
2. Assisting with creative writing, content generation, and problem-solving.
3. Providing detailed summaries and explanations.
4. Translating text across different languages.
5. Generating ideas for personal or professional use.
6. Offering insights on technical topics.

Feel free to ask me anything you'd like assistance with!
```
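
Beyond the interactive prompt, the model can also be queried programmatically through Ollama's local REST API, which listens on `localhost:11434` by default. A minimal sketch, assuming the model was registered under the name `llama-thinker-3b-preview`; the helper name is ours:

```python
import json
import urllib.request

def build_generate_request(prompt, model="llama-thinker-3b-preview",
                           host="http://localhost:11434"):
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("How can you assist me today?")

# Sending the request requires a running Ollama server:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

With the server running, the commented-out lines at the end send the prompt and print the model's reply from the `response` field.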

### Step 4: Exit the Program

To exit the program, simply type:

```plaintext
/bye
```

---

## Example 2: Using Multi-Modal Models (Future Use)

In the future, Ollama may support multi-modal models where you can input both text and images for advanced interactions. This section will be updated as new capabilities become available.

---

## Notes on Using Quantized Models

Quantized models like **llama-thinker-3b-preview.gguf** are optimized for efficient performance on local systems with limited resources. Here are some key points to ensure smooth operation:

1. **VRAM/CPU Requirements**: Ensure your system has adequate VRAM or CPU resources to handle model inference.
2. **Model Format**: Use the `.gguf` model format for compatibility with Ollama.
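
As a rough rule of thumb (our approximation, not an official requirement), the memory needed to load a GGUF model is close to the weights at their quantized precision plus some fixed overhead for the runtime and KV cache:

```python
def approx_ram_gb(params_billion, bits_per_weight, overhead_gb=1.0):
    """Rough memory estimate: quantized weight size plus a fixed overhead."""
    weight_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits is ~1 GB
    return weight_gb + overhead_gb

# A 3B-parameter model at common quantization levels:
for bits in (4, 8, 16):
    print(f"{bits}-bit: ~{approx_ram_gb(3, bits):.1f} GB")
# 4-bit: ~2.5 GB, 8-bit: ~4.0 GB, 16-bit: ~7.0 GB
```

Lower-bit quantizations trade some output quality for a smaller footprint, which is why 4-bit variants are popular on laptops.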

---

# **Conclusion**

Running the **Llama-Thinker-3B-Preview** model locally using Ollama provides a powerful way to leverage open-source LLMs for complex reasoning and multilingual tasks. By following this guide, you can explore other models and expand your use cases as new models become available.

---