Llamazing-3.1-8B-Instruct

Overview

Llamazing-3.1-8B-Instruct combines several specialized models to balance reasoning, creativity, and conversational capability, aiming for well-rounded performance across a range of applications.

Usage

The following Python code demonstrates how to use Llamazing-3.1-8B-Instruct with the Divine Intellect sampling preset. Note that text-generation-webui presets express epsilon_cutoff and eta_cutoff in units of 1e-4, while transformers expects raw probabilities in (0, 1), so those two values are rescaled below:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

class LlamazingAssistant:
    def __init__(self, model_name="sethuiyer/Llamazing-3.1-8B-Instruct", device="cuda"):
        self.device = device
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        if self.tokenizer.pad_token_id is None:
            # Fall back to a fixed pad token id; 128009 is the Llama 3.1
            # <|eot_id|> end-of-turn token
            self.tokenizer.pad_token_id = 11
            self.tokenizer.eos_token_id = 128009

        # The published weights are BF16, so load them in that dtype
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name, torch_dtype=torch.bfloat16
        ).to(self.device)
        self.model.generation_config.pad_token_id = self.tokenizer.pad_token_id
        self.model.generation_config.eos_token_id = 128009
        self.sys_message = ''  # optional system prompt, empty by default
        # Divine Intellect preset parameters. text-generation-webui expresses
        # epsilon_cutoff and eta_cutoff in units of 1e-4, while transformers
        # expects raw probabilities in (0, 1), so the values are rescaled here.
        self.temperature = 1.31
        self.top_p = 0.14
        self.epsilon_cutoff = 1.49e-4
        self.eta_cutoff = 10.42e-4
        self.repetition_penalty = 1.17
        self.top_k = 49

    def format_prompt(self, question):
        messages = [
            {"role": "system", "content": self.sys_message},
            {"role": "user", "content": question}
        ]
        prompt = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        return prompt

    def recursive_reflection(self, initial_response, question, max_new_tokens=512):
        reflection_prompt = f'''
        Initial Response: {initial_response}

        Reflect on the above response. Identify any inaccuracies, weaknesses, or areas for improvement.
        If the response is strong, justify why it is valid. Otherwise, provide a revised and improved version.

        User Question: {question}
        '''
        # Route the reflection prompt through the chat template so the
        # instruction-tuned model sees a properly formatted turn
        prompt = self.format_prompt(reflection_prompt)
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.device)
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                temperature=self.temperature,
                top_p=self.top_p,
                repetition_penalty=self.repetition_penalty,
                top_k=self.top_k,
                eta_cutoff=self.eta_cutoff,
                epsilon_cutoff=self.epsilon_cutoff,
                do_sample=True,
                use_cache=True
            )
        # Decode only the newly generated tokens, not the echoed prompt
        new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
        refined_answer = self.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
        return refined_answer

    def generate_response(self, question, max_new_tokens=512, enable_reflection=True):
        # Generate the initial response
        prompt = self.format_prompt(question)
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.device)
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs, 
                max_new_tokens=max_new_tokens, 
                temperature=self.temperature,
                top_p=self.top_p,
                repetition_penalty=self.repetition_penalty,
                top_k=self.top_k,
                eta_cutoff=self.eta_cutoff,
                epsilon_cutoff=self.epsilon_cutoff,
                do_sample=True,
                use_cache=True
            )
        # Decode only the newly generated tokens, not the echoed prompt
        new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
        initial_response = self.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
        # Perform recursive self-reflection if enabled
        if enable_reflection:
            return self.recursive_reflection(initial_response, question, max_new_tokens)
        else:
            return initial_response
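
The class can then be used as follows; the example question is purely illustrative:

assistant = LlamazingAssistant()

# Direct single-pass answer
answer = assistant.generate_response(
    "Explain beam search in two sentences.", enable_reflection=False
)
print(answer)

# Answer refined by one round of self-reflection (the default behavior)
refined = assistant.generate_response("Explain beam search in two sentences.")
print(refined)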

Key Features

  1. Multi-Model Integration: Combines the expertise of several specialized models to excel in reasoning, creativity, and conversational tasks.
  2. Balanced Density and Weighting: The merge is weighted so that no single source model dominates the final output, yielding coherent and well-rounded responses.
  3. Optimized Generation Parameters: Tuned to perform best with the Divine Intellect sampling preset (see the sketch after this list).
  4. Ease of Use: The wrapper class above handles setup, prompt formatting, and generation in a few lines of code.
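
If you would rather keep the preset decoupled from the wrapper class, the same parameters can be bundled into a transformers GenerationConfig and passed to any generate() call. This is a minimal sketch, assuming the rescaled cutoff values discussed in the Usage section:

from transformers import GenerationConfig

# Divine Intellect preset as a reusable GenerationConfig
divine_intellect = GenerationConfig(
    do_sample=True,
    temperature=1.31,
    top_p=0.14,
    top_k=49,
    repetition_penalty=1.17,
    epsilon_cutoff=1.49e-4,  # webui value 1.49, in units of 1e-4
    eta_cutoff=10.42e-4,     # webui value 10.42, in units of 1e-4
    pad_token_id=11,
    eos_token_id=128009,
)

# outputs = model.generate(**inputs, generation_config=divine_intellect, max_new_tokens=512)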

License

Released under the Llama 3.1 Community License.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|------:|
| Avg.                | 19.78 |
| IFEval (0-shot)     | 33.42 |
| BBH (3-shot)        | 32.51 |
| MATH Lvl 5 (4-shot) |  5.82 |
| GPQA (0-shot)       |  6.38 |
| MuSR (0-shot)       | 11.82 |
| MMLU-PRO (5-shot)   | 28.75 |