base_model: unsloth/Llama-3.2-3B-Instruct-bnb-4bit
datasets:
  - microsoft/orca-agentinstruct-1M-v1
pipeline_tag: text-generation
library_name: transformers
license: llama3.2
tags:
  - unsloth
  - transformers
model-index:
  - name: analytical_reasoning_r16a32_unsloth-Llama-3.2-3B-Instruct-bnb-4bit
    results:
      - task:
          type: text-generation
        dataset:
          type: lm-evaluation-harness
          name: bbh
        metrics:
          - name: acc_norm
            type: acc_norm
            value: 0.4168
            verified: false
      - task:
          type: text-generation
        dataset:
          type: lm-evaluation-harness
          name: gpqa
        metrics:
          - name: acc_norm
            type: acc_norm
            value: 0.2691
            verified: false
      - task:
          type: text-generation
        dataset:
          type: lm-evaluation-harness
          name: math
        metrics:
          - name: exact_match
            type: exact_match
            value: 0.0867
            verified: false
      - task:
          type: text-generation
        dataset:
          type: lm-evaluation-harness
          name: mmlu
        metrics:
          - name: acc_norm
            type: acc_norm
            value: 0.2822
            verified: false
      - task:
          type: text-generation
        dataset:
          type: lm-evaluation-harness
          name: musr
        metrics:
          - name: acc_norm
            type: acc_norm
            value: 0.3648
            verified: false
      - task:
          type: text-generation
        dataset:
          type: lm-evaluation-harness
          name: hellaswag
        metrics:
          - name: acc
            type: acc
            value: 0.5141
            verified: false
          - name: acc_norm
            type: acc_norm
            value: 0.6793
            verified: false


eval

The fine-tuned model (DevQuasar/analytical_reasoning_r16a32_unsloth-Llama-3.2-3B-Instruct-bnb-4bit) has gained performance over the base model (unsloth/Llama-3.2-3B-Instruct-bnb-4bit) on the following tasks.

| Test | Base Model | Fine-Tuned Model | Performance Gain |
|---|---|---|---|
| leaderboard_bbh_logical_deduction_seven_objects | 0.2520 | 0.4360 | 0.1840 |
| leaderboard_bbh_logical_deduction_five_objects | 0.3560 | 0.4560 | 0.1000 |
| leaderboard_musr_team_allocation | 0.2200 | 0.3200 | 0.1000 |
| leaderboard_bbh_disambiguation_qa | 0.3040 | 0.3760 | 0.0720 |
| leaderboard_gpqa_diamond | 0.2222 | 0.2727 | 0.0505 |
| leaderboard_bbh_movie_recommendation | 0.5960 | 0.6360 | 0.0400 |
| leaderboard_bbh_formal_fallacies | 0.5080 | 0.5400 | 0.0320 |
| leaderboard_bbh_tracking_shuffled_objects_three_objects | 0.3160 | 0.3440 | 0.0280 |
| leaderboard_bbh_causal_judgement | 0.5455 | 0.5668 | 0.0214 |
| leaderboard_bbh_web_of_lies | 0.4960 | 0.5160 | 0.0200 |
| leaderboard_math_geometry_hard | 0.0455 | 0.0606 | 0.0152 |
| leaderboard_math_num_theory_hard | 0.0519 | 0.0649 | 0.0130 |
| leaderboard_musr_murder_mysteries | 0.5280 | 0.5400 | 0.0120 |
| leaderboard_gpqa_extended | 0.2711 | 0.2802 | 0.0092 |
| leaderboard_bbh_sports_understanding | 0.5960 | 0.6040 | 0.0080 |
| leaderboard_math_intermediate_algebra_hard | 0.0107 | 0.0143 | 0.0036 |
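
The Performance Gain column appears to be the absolute difference between the fine-tuned and base scores. A minimal Python sketch of that arithmetic follows, using three rows copied from the table above; the `performance_gain` helper is a hypothetical name introduced here for illustration and is not part of any evaluation tooling. The published gains were presumably computed from unrounded scores, so recomputing from the rounded values shown can differ in the last digit for some rows.

```python
# Scores copied from the table above: (base model, fine-tuned model).
scores = {
    "leaderboard_bbh_logical_deduction_seven_objects": (0.2520, 0.4360),
    "leaderboard_musr_team_allocation": (0.2200, 0.3200),
    "leaderboard_gpqa_diamond": (0.2222, 0.2727),
}


def performance_gain(base: float, fine_tuned: float) -> float:
    """Absolute gain (fine-tuned minus base), rounded to 4 decimals.

    Hypothetical helper; assumes the gain is a plain difference of scores.
    """
    return round(fine_tuned - base, 4)


gains = {task: performance_gain(base, ft) for task, (base, ft) in scores.items()}

# Print tasks from largest to smallest gain, matching the table's ordering.
for task, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{task}: +{gain:.4f}")
```

These are absolute differences in the metric, not relative improvements; for example, the seven-object logical deduction gain of 0.1840 is 18.4 percentage points over a base score of 0.2520.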

Framework versions

  • unsloth 2024.11.5
  • trl 0.12.0

Training HW

  • V100