---
license: creativeml-openrail-m
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
pipeline_tag: text-generation
tags:
- triangulum_10b
- sft
- chain_of_thought
- ollama
- text-generation-inference
- llama_for_causal_lm
library_name: transformers
---
# Triangulum 10B: Multilingual Large Language Models (LLMs)
Triangulum 10B is a collection of pretrained and instruction-tuned generative models, designed for multilingual applications. These models are trained using synthetic datasets based on long chains of thought, enabling them to perform complex reasoning tasks effectively.
## Key Features
- Foundation Model: Built upon LLaMA's autoregressive language model, leveraging an optimized transformer architecture for enhanced performance.
- Instruction Tuning: Includes supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align model outputs with human preferences for helpfulness and safety.
- Multilingual Support: Designed to handle multiple languages, ensuring broad applicability across diverse linguistic contexts.
## Training Approach
- Synthetic Datasets: Utilizes long chain-of-thought synthetic data to enhance reasoning capabilities.
- Supervised Fine-Tuning (SFT): Aligns the model to specific tasks through curated datasets (a data-formatting sketch follows this list).
- Reinforcement Learning with Human Feedback (RLHF): Ensures the model adheres to human values and safety guidelines through iterative training.
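To illustrate the data-formatting side of SFT, the sketch below renders a hypothetical chain-of-thought example with the tokenizer's chat template, assuming the checkpoint ships one. The conversation content is invented for illustration and is not the actual Triangulum training data.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Triangulum-10B")

# A hypothetical long chain-of-thought training example (illustrative only).
example = [
    {"role": "user", "content": "What is 17 * 23?"},
    {"role": "assistant", "content": "Let's reason step by step: 17 * 23 = 17 * 20 + 17 * 3 = 340 + 51 = 391."},
]

# Render the conversation into the single string a supervised fine-tuning step would train on.
text = tokenizer.apply_chat_template(example, tokenize=False)
print(text)
```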
## How to use with transformers
Starting with `transformers >= 4.43.0`, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function. Make sure to update your installation via `pip install --upgrade transformers`.
```python
import torch
from transformers import pipeline

model_id = "prithivMLmods/Triangulum-10B"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are the kind and tri-intelligent assistant helping people to understand complex concepts."},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```
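With chat-style input, the pipeline returns the running conversation under `generated_text`; the final list element is the newly generated assistant message (a dict with `role` and `content` keys), which is what the `print` call above displays.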
## Demo Inference with LlamaForCausalLM
```python
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Triangulum-10B", trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained(
    "prithivMLmods/Triangulum-10B",
    torch_dtype=torch.float16,
    device_map="auto",
    load_in_4bit=True,  # deprecated shorthand; see the BitsAndBytesConfig sketch below
    attn_implementation="flash_attention_2",  # requires the flash-attn package
)

# Define a list of system and user prompts (ChatML format)
prompts = [
    """<|im_start|>system
You are the kind and tri-intelligent assistant helping people to understand complex concepts.<|im_end|>
<|im_start|>user
Can you explain the concept of eigenvalues and eigenvectors in a simple way?<|im_end|>
<|im_start|>assistant"""
]

# Generate a response for each prompt
for chat in prompts:
    print(f"Prompt:\n{chat}\n")
    input_ids = tokenizer(chat, return_tensors="pt").input_ids.to("cuda")
    generated_ids = model.generate(
        input_ids,
        max_new_tokens=750,
        temperature=0.8,
        repetition_penalty=1.1,
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id,
    )
    response = tokenizer.decode(
        generated_ids[0][input_ids.shape[-1]:],
        skip_special_tokens=True,
        clean_up_tokenization_spaces=True,
    )
    print(f"Response:\n{response}\n{'-'*80}\n")
```
### Key Adjustments
- System Prompts: Each prompt defines the role or persona the assistant adopts.
- User Prompts: These specify the task or question for the assistant, from explaining a concept to storytelling or career advice.
- Looping Through Prompts: Each prompt in the list is processed in turn, showcasing the model's versatility.
You can expand the list of prompts to explore a variety of scenarios and responses.
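For example, a second ChatML-formatted prompt could be appended to `prompts` before the loop runs; the persona and question here are purely illustrative:

```python
# Illustrative extra prompt in the same ChatML format (append before the loop).
prompts.append("""<|im_start|>system
You are a patient career-advice assistant.<|im_end|>
<|im_start|>user
How should a junior developer prepare for a first technical interview?<|im_end|>
<|im_start|>assistant""")
```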
## Use Cases
- Multilingual content generation (see the example after this list)
- Question answering and dialogue systems
- Text summarization and analysis
- Translation and localization tasks
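For instance, multilingual generation reuses the same pipeline API as above; the German prompt below is purely illustrative (German is among the listed supported languages):

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="prithivMLmods/Triangulum-10B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# German prompt ("Explain in two sentences what an eigenvalue is."); illustrative only.
messages = [{"role": "user", "content": "Erkläre in zwei Sätzen, was ein Eigenwert ist."}]
print(pipe(messages, max_new_tokens=128)[0]["generated_text"][-1])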
## Technical Details
Triangulum 10B employs a state-of-the-art autoregressive architecture inspired by LLaMA. The optimized transformer framework ensures both efficiency and scalability, making it suitable for a variety of use cases.
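For reference, the concrete architecture hyperparameters (layer count, hidden size, attention heads) can be read straight from the checkpoint's config; a minimal sketch:

```python
from transformers import AutoConfig

# Read the architecture hyperparameters shipped with the checkpoint.
config = AutoConfig.from_pretrained("prithivMLmods/Triangulum-10B")
print(config.model_type, config.num_hidden_layers, config.hidden_size, config.num_attention_heads)
```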