---
library_name: transformers
tags:
  - narration
  - Truthful
base_model:
  - sethuiyer/Llamaverse-3.1-8B-Instruct
model-index:
  - name: Llamazing-3.1-8B-Instruct
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 33.42
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llamazing-3.1-8B-Instruct
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 32.51
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llamazing-3.1-8B-Instruct
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 5.82
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llamazing-3.1-8B-Instruct
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 6.38
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llamazing-3.1-8B-Instruct
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 11.82
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llamazing-3.1-8B-Instruct
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 28.75
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llamazing-3.1-8B-Instruct
          name: Open LLM Leaderboard
---

# Llamazing-3.1-8B-Instruct


## Overview

Llamazing-3.1-8B-Instruct is built on sethuiyer/Llamaverse-3.1-8B-Instruct and blends several specialized models to balance reasoning, creativity, and conversational ability across a range of applications.

## Usage

The following Python code demonstrates how to use Llamazing-3.1-8B-Instruct with the Divine Intellect sampling preset, along with an optional self-reflection pass that asks the model to critique and revise its first answer:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

class LlamazingAssistant:
    def __init__(self, model_name="sethuiyer/Llamazing-3.1-8B-Instruct", device="cuda"):
        self.device = device
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        if self.tokenizer.pad_token_id is None:
            self.tokenizer.pad_token_id = 11  # fallback pad token
            self.tokenizer.eos_token_id = 128009  # <|eot_id|> for Llama 3.1
        
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            torch_dtype=torch.bfloat16,  # bf16 keeps the 8B model within a single-GPU memory budget
        ).to(self.device)
        self.model.generation_config.pad_token_id = self.tokenizer.pad_token_id
        self.model.generation_config.eos_token_id = 128009
        self.sys_message = ""  # optionally set a system prompt here
        # Divine Intellect preset parameters
        self.temperature = 1.31
        self.top_p = 0.14
        # The preset quotes the cutoffs in units of 1e-4; transformers expects
        # epsilon_cutoff and eta_cutoff to be strictly between 0 and 1.
        self.epsilon_cutoff = 1.49e-4
        self.eta_cutoff = 10.42e-4
        self.repetition_penalty = 1.17
        self.top_k = 49

    def format_prompt(self, question):
        messages = [
            {"role": "system", "content": self.sys_message},
            {"role": "user", "content": question}
        ]
        prompt = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        return prompt

    def recursive_reflection(self, initial_response, question, max_new_tokens=512):
        reflection_prompt = f'''
        Initial Response: {initial_response}

        Reflect on the above response. Identify any inaccuracies, weaknesses, or areas for improvement. 
        If the response is strong, justify why it is valid. Otherwise, provide a revised and improved version.
        
        User Question: {question}
        '''
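        # Note: the reflection prompt is sent as plain text rather than through
        # the chat template, and reuses the same Divine Intellect sampling settings.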
        inputs = self.tokenizer(reflection_prompt, return_tensors="pt").to(self.device)
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs, 
                max_new_tokens=max_new_tokens, 
                temperature=self.temperature,
                top_p=self.top_p,
                repetition_penalty=self.repetition_penalty,
                top_k=self.top_k,
                eta_cutoff=self.eta_cutoff,
                epsilon_cutoff=self.epsilon_cutoff,
                do_sample=True,
                use_cache=True
            )
        # Decode only the newly generated tokens so the prompt is not echoed back.
        new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
        refined_answer = self.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
        return refined_answer

    def generate_response(self, question, max_new_tokens=512, enable_reflection=True):
        # Generate the initial response
        prompt = self.format_prompt(question)
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.device)
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs, 
                max_new_tokens=max_new_tokens, 
                temperature=self.temperature,
                top_p=self.top_p,
                repetition_penalty=self.repetition_penalty,
                top_k=self.top_k,
                eta_cutoff=self.eta_cutoff,
                epsilon_cutoff=self.epsilon_cutoff,
                do_sample=True,
                use_cache=True
            )
        # Decode only the newly generated tokens so the prompt is not echoed back.
        new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
        initial_response = self.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
        # Perform recursive self-reflection if enabled
        if enable_reflection:
            return self.recursive_reflection(initial_response, question, max_new_tokens)
        else:
            return initial_response
```
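
A minimal usage sketch (assuming a CUDA device is available; pass `device="cpu"` otherwise):

```python
assistant = LlamazingAssistant()

# One-shot answer with the Divine Intellect preset
answer = assistant.generate_response("Why is the sky blue?", enable_reflection=False)
print(answer)

# Same question with the extra self-reflection pass
print(assistant.generate_response("Why is the sky blue?"))
```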

## Key Features

  1. Multi-Model Integration: Combines several specialized models so the merge covers reasoning, creativity, and conversational use cases.
  2. Balanced Density and Weighting: No single constituent model dominates the final output, which keeps responses coherent and well-rounded.
  3. Optimized Generation Parameters: Ships pre-tuned with the Divine Intellect preset (see the sketch after this list).
  4. Ease of Use: The wrapper class above handles prompt formatting, generation, and optional self-reflection.
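
For use outside the wrapper class, the preset can be packaged as a reusable `GenerationConfig`. This is a sketch under the same 1e-4 scaling assumption for the cutoffs, not a config shipped with the model:

```python
from transformers import GenerationConfig

# Divine Intellect preset as a standalone generation config
# (epsilon/eta cutoffs scaled down from the preset's 1e-4 units).
divine_intellect = GenerationConfig(
    do_sample=True,
    temperature=1.31,
    top_p=0.14,
    top_k=49,
    repetition_penalty=1.17,
    epsilon_cutoff=1.49e-4,
    eta_cutoff=10.42e-4,
    max_new_tokens=512,
)

# outputs = model.generate(**inputs, generation_config=divine_intellect)
```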

## License

Released under the Llama 3.1 Community License.

## Open LLM Leaderboard Evaluation Results

Detailed results can be found [here](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llamazing-3.1-8B-Instruct).

| Metric              | Value |
|---------------------|------:|
| Avg.                | 19.78 |
| IFEval (0-Shot)     | 33.42 |
| BBH (3-Shot)        | 32.51 |
| MATH Lvl 5 (4-Shot) |  5.82 |
| GPQA (0-shot)       |  6.38 |
| MuSR (0-shot)       | 11.82 |
| MMLU-PRO (5-shot)   | 28.75 |
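
The reported average is the unweighted mean of the six task scores, which a quick check confirms:

```python
scores = [33.42, 32.51, 5.82, 6.38, 11.82, 28.75]
print(round(sum(scores) / len(scores), 2))  # 19.78
```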