File size: 10,634 Bytes
09f6781 0d7eef8 09f6781 baa031f 09f6781 baa031f 09f6781 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 |
---
license: creativeml-openrail-m
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
pipeline_tag: text-generation
tags:
- triangulum_1b
- sft
- chain_of_thought
- ollama
- text-generation-inference
- llama_for_causal_lm
- reasoning
- CoT
library_name: transformers
metrics:
- code_eval
- accuracy
- competition_math
- character
---
![Triangulum-5b.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/By0OJ1lMvP5ZvVvfEGvz5.png)
<pre align="center">
__ .__ .__
_/ |_ _______ |__|_____ ____ ____ __ __ | | __ __ _____
\ __\\_ __ \| |\__ \ / \ / ___\ | | \| | | | \ / \
| | | | \/| | / __ \_| | \/ /_/ >| | /| |__| | /| Y Y \
|__| |__| |__|(____ /|___| /\___ / |____/ |____/|____/ |__|_| /
\/ \//_____/ \/
</pre>
# **Triangulum 1B: Multilingual Large Language Models (LLMs)**
Triangulum 1B is a collection of pretrained and instruction-tuned generative models, designed for multilingual applications. These models are trained using synthetic datasets based on long chains of thought, enabling them to perform complex reasoning tasks effectively.
# **Key Features & Model Architecture**
- **Foundation Model**: Built upon LLaMA's autoregressive language model, leveraging an optimized transformer architecture for enhanced performance.
- **Instruction Tuning**: Includes supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align model outputs with human preferences for helpfulness and safety.
- **Multilingual Support**: Designed to handle multiple languages, ensuring broad applicability across diverse linguistic contexts.
---
- Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
# **Training Approach**
1. **Synthetic Datasets**: Utilizes long chain-of-thought synthetic data to enhance reasoning capabilities.
2. **Supervised Fine-Tuning (SFT)**: Aligns the model to specific tasks through curated datasets.
3. **Reinforcement Learning with Human Feedback (RLHF)**: Ensures the model adheres to human values and safety guidelines through iterative training processes.
# **How to use with transformers**
Starting with `transformers >= 4.43.0` onward, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.
Make sure to update your transformers installation via `pip install --upgrade transformers`.
```python
import torch
from transformers import pipeline
model_id = "prithivMLmods/Triangulum-1B"
pipe = pipeline(
"text-generation",
model=model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
)
messages = [
{"role": "system", "content": "You are the kind and tri-intelligent assistant helping people to understand complex concepts."},
{"role": "user", "content": "Who are you?"},
]
outputs = pipe(
messages,
max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```
# **Demo Inference LlamaForCausalLM**
```python
import torch
from transformers import AutoTokenizer, LlamaForCausalLM
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('prithivMLmods/Triangulum-1B', trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained(
"prithivMLmods/Triangulum-1B",
torch_dtype=torch.float16,
device_map="auto",
load_in_8bit=False,
load_in_4bit=True,
use_flash_attention_2=True
)
# Define a list of system and user prompts
prompts = [
"""<|im_start|>system
You are the kind and tri-intelligent assistant helping people to understand complex concepts.<|im_end|>
<|im_start|>user
Can you explain the concept of eigenvalues and eigenvectors in a simple way?<|im_end|>
<|im_start|>assistant"""
]
# Generate responses for each prompt
for chat in prompts:
print(f"Prompt:\n{chat}\n")
input_ids = tokenizer(chat, return_tensors="pt").input_ids.to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=750, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True, clean_up_tokenization_space=True)
print(f"Response:\n{response}\n{'-'*80}\n")
```
# **Key Adjustments**
1. **System Prompts:** Each prompt defines a different role or persona for the AI to adopt.
2. **User Prompts:** These specify the context or task for the assistant, ranging from teaching to storytelling or career advice.
3. **Looping Through Prompts:** Each prompt is processed in a loop to showcase the model's versatility.
You can expand the list of prompts to explore a variety of scenarios and responses.
# **Use Cases for T5B**
- Multilingual content generation
- Question answering and dialogue systems
- Text summarization and analysis
- Translation and localization tasks
# **Technical Details**
Triangulum 1B employs a state-of-the-art autoregressive architecture inspired by LLaMA. The optimized transformer framework ensures both efficiency and scalability, making it suitable for a variety of use cases.
# **How to Run Triangulum 5B on Ollama Locally**
```markdown
# How to Run Ollama Locally
This guide demonstrates the power of using open-source LLMs locally, showcasing examples with different open-source models for various use cases. By the end, you'll be equipped to run any future open-source LLM models with ease.
---
## Example 1: How to Run the Triangulum-1B Model
The **Triangulum-10B** model is an open-source LLM known for its capabilities across text-based tasks. We'll interact with it similarly to ChatGPT, but run it locally with support for quants.
### Step 1: Download the Model
First, download the **Triangulum-1B-F16.gguf** model using the following command:
```bash
ollama run triangulum-1b-f16.gguf
```
### Step 2: Model Initialization and Download
Upon running the command, Ollama will initialize and download the model files. You should see output similar to the following:
```plaintext
pulling manifest
pulling 8934d96d3f08... 100% ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ 3.8 GB
pulling 8c17c2ebb0ea... 100% ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ 7.0 KB
pulling 7c23fb36d801... 100% ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ 4.8 KB
pulling 2e0493f67d0c... 100% ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ 59 B
pulling fa304d675061... 100% ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ 91 B
pulling 42ba7f8a01dd... 100% ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ 557 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> Send a message (/? for help)
```
### Step 3: Interact with the Model
Once the model is ready, you can send a prompt. For example, let's ask:
```plaintext
>>> What can you do for me?
```
If you asked the same question, you should receive a response like this:
```plaintext
As a responsible AI language model, I am here to assist you with any questions or tasks you may have. Here are some examples of things I can help with:
1. Answering questions: I can provide information on a wide range of topics, from science and technology to history and culture.
2. Generating ideas: I can help you brainstorm ideas for creative projects, or provide suggestions for solving problems.
3. Writing assistance: I can help you with writing tasks such as proofreading, editing, and suggesting alternative words or phrases.
4. Translation: I can translate text from one language to another.
5. Summarizing content: I can summarize long pieces of text, such as articles or documents, into shorter, more digestible versions.
6. Creativity: I can help you generate creative ideas for stories, poems, or other forms of writing.
7. Language learning: I can assist you in learning a new language by providing grammar explanations, vocabulary lists, and practice exercises.
8. Chatting: I'm here to chat with you and provide a response to any question or topic you'd like to discuss.
Please let me know if there is anything specific you would like me to help you with.
```
### Step 4: Exit the Program
To exit the program, simply type:
```plaintext
/exit
```
## Example 2: Running Multi-Modal Models (Future Use)
Ollama supports running multi-modal models where you can send images and ask questions based on them. This section will be updated as more models become available.
## Notes on Using Quantized Models
Quantized models like **triangulum-1b-f16.gguf** are optimized for performance on resource-constrained hardware, making it accessible for local inference.
1. Ensure your system has sufficient VRAM or CPU resources.
2. Use the `.gguf` model format for compatibility with Ollama.
# **Conclusion**
Running the **Triangulum-5B** model with Ollama provides a robust way to leverage open-source LLMs locally for diverse use cases. By following these steps, you can explore the capabilities of other open-source models in the future. |