---
library_name: transformers
tags:
- narration
- Truthful
base_model:
- sethuiyer/Llamaverse-3.1-8B-Instruct
model-index:
- name: Llamazing-3.1-8B-Instruct
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 33.42
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llamazing-3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 32.51
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llamazing-3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 5.82
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llamazing-3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.38
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llamazing-3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 11.82
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llamazing-3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 28.75
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llamazing-3.1-8B-Instruct
      name: Open LLM Leaderboard
---

# Llamazing-3.1-8B-Instruct

![img](./image.webp)

### Overview

Llamazing-3.1-8B-Instruct balances reasoning, creativity, and conversational ability, aiming for strong all-round performance as a general-purpose assistant.
### Usage

The following Python code demonstrates how to use Llamazing-3.1-8B-Instruct with the Divine Intellect preset (a minimal usage example follows the License section below):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


class LlamazingAssistant:
    def __init__(self, model_name="sethuiyer/Llamazing-3.1-8B-Instruct", device="cuda"):
        self.device = device
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        if self.tokenizer.pad_token_id is None:
            self.tokenizer.pad_token_id = 11  # fallback pad token
        self.tokenizer.eos_token_id = 128009  # Llama 3.1 <|eot_id|>
        # bf16 halves memory relative to fp32; drop torch_dtype if your GPU lacks bf16 support.
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name, torch_dtype=torch.bfloat16
        ).to(self.device)
        self.model.generation_config.pad_token_id = self.tokenizer.pad_token_id
        self.model.generation_config.eos_token_id = 128009
        self.sys_message = ''' '''
        # Divine Intellect preset parameters. The epsilon/eta cutoffs are quoted
        # in units of 1e-4; transformers expects raw probabilities in (0, 1),
        # so they are scaled here to avoid a ValueError at generation time.
        self.temperature = 1.31
        self.top_p = 0.14
        self.epsilon_cutoff = 1.49e-4
        self.eta_cutoff = 10.42e-4
        self.repetition_penalty = 1.17
        self.top_k = 49

    def format_prompt(self, question):
        messages = [
            {"role": "system", "content": self.sys_message},
            {"role": "user", "content": question},
        ]
        return self.tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )

    def _generate(self, prompt, max_new_tokens):
        # Shared sampling call using the Divine Intellect preset.
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.device)
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                temperature=self.temperature,
                top_p=self.top_p,
                repetition_penalty=self.repetition_penalty,
                top_k=self.top_k,
                eta_cutoff=self.eta_cutoff,
                epsilon_cutoff=self.epsilon_cutoff,
                do_sample=True,
                use_cache=True,
            )
        # Decode only the newly generated tokens, not the echoed prompt.
        new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
        return self.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

    def recursive_reflection(self, initial_response, question, max_new_tokens=512):
        reflection_prompt = f'''
Initial Response:
{initial_response}

Reflect on the above response. Identify any inaccuracies, weaknesses, or areas for improvement.
If the response is strong, justify why it is valid. Otherwise, provide a revised and improved version.

User Question:
{question}
'''
        # Route the reflection through the chat template too, since the model
        # is instruction-tuned and expects templated input.
        return self._generate(self.format_prompt(reflection_prompt), max_new_tokens)

    def generate_response(self, question, max_new_tokens=512, enable_reflection=True):
        # Generate the initial response.
        initial_response = self._generate(self.format_prompt(question), max_new_tokens)
        # Perform recursive self-reflection if enabled.
        if enable_reflection:
            return self.recursive_reflection(initial_response, question, max_new_tokens)
        return initial_response
```

### Key Features

1. **Multi-Model Integration**: Combines the expertise of several specialized models to excel at reasoning, creativity, and conversation.
2. **Balanced Density and Weighting**: Ensures that no single model dominates the final output, leading to coherent and well-rounded responses.
3. **Optimized Generation Parameters**: Pre-tuned for strong performance with the Divine Intellect preset.
4. **Ease of Use**: Simple setup and inference out of the box.

### License

Released under the Llama 3.1 Community License.
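### Example

A minimal usage sketch building on the class above; the question string is illustrative, and `enable_reflection=False` returns the first draft without the self-critique pass:

```python
# Minimal usage sketch; the question below is only an example.
assistant = LlamazingAssistant()

answer = assistant.generate_response(
    "Explain the difference between fission and fusion in three sentences.",
    max_new_tokens=256,
    enable_reflection=True,  # set to False for a faster, single-pass answer
)
print(answer)
```

Note that reflection roughly doubles latency, since it runs a second full generation pass over the initial answer.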
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/sethuiyer__Llamazing-3.1-8B-Instruct-details).

| Metric              | Value |
|---------------------|------:|
| Avg.                | 19.78 |
| IFEval (0-Shot)     | 33.42 |
| BBH (3-Shot)        | 32.51 |
| MATH Lvl 5 (4-Shot) |  5.82 |
| GPQA (0-shot)       |  6.38 |
| MuSR (0-shot)       | 11.82 |
| MMLU-PRO (5-shot)   | 28.75 |
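To browse the raw per-task result files offline, one option is a snapshot download via `huggingface_hub` (a sketch, assuming you only need the files on disk rather than a parsed dataset):

```python
from huggingface_hub import snapshot_download

# Download the detailed per-task result files for offline inspection.
local_dir = snapshot_download(
    repo_id="open-llm-leaderboard/sethuiyer__Llamazing-3.1-8B-Instruct-details",
    repo_type="dataset",
)
print(f"Details downloaded to: {local_dir}")
```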