---
base_model: unsloth/Llama-3.2-3B-Instruct-bnb-4bit
datasets:
- microsoft/orca-agentinstruct-1M-v1
pipeline_tag: text-generation
library_name: transformers
license: llama3.2
tags:
- unsloth
- transformers
model-index:
- name: analytical_reasoning_r16a32_unsloth-Llama-3.2-3B-Instruct-bnb-4bit
  results:
  - task:
      type: text-generation
    dataset:
      type: lm-evaluation-harness
      name: hellaswag
    metrics:
    - name: acc
      type: acc
      value: 0.5141
      verified: false
    - name: acc_norm
      type: acc_norm
      value: 0.6793
      verified: false
---
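## Usage

A minimal inference sketch with `transformers` (the card's declared `library_name` and `pipeline_tag`). The repo id below is a placeholder; substitute the repository where this fine-tune is hosted. Llama 3.2 Instruct expects its chat template, so the prompt is built with `apply_chat_template`.

```python
# Minimal inference sketch; the repo id below is a placeholder for wherever
# this fine-tune is hosted.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/analytical_reasoning_r16a32_unsloth-Llama-3.2-3B-Instruct-bnb-4bit"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # fp16 keeps it friendly to the V100 class of GPU used for training
    device_map="auto",
)

# Llama 3.2 Instruct expects its chat template rather than raw prompt strings.
messages = [
    {"role": "user",
     "content": "If all bloops are razzies and all razzies are lazzies, are all bloops lazzies?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```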
## Evaluation results
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| hellaswag | 1 | none | 0 | acc | ↑ | 0.5141 | ± | 0.0050 |
| | | none | 0 | acc_norm | ↑ | 0.6793 | ± | 0.0047 |
| leaderboard_bbh | N/A | | | | | | | |
| - leaderboard_bbh_boolean_expressions | 1 | none | 3 | acc_norm | ↑ | 0.6040 | ± | 0.0310 |
| - leaderboard_bbh_causal_judgement | 1 | none | 3 | acc_norm | ↑ | 0.5668 | ± | 0.0363 |
| - leaderboard_bbh_date_understanding | 1 | none | 3 | acc_norm | ↑ | 0.4880 | ± | 0.0317 |
| - leaderboard_bbh_disambiguation_qa | 1 | none | 3 | acc_norm | ↑ | 0.3760 | ± | 0.0307 |
| - leaderboard_bbh_formal_fallacies | 1 | none | 3 | acc_norm | ↑ | 0.5400 | ± | 0.0316 |
| - leaderboard_bbh_geometric_shapes | 1 | none | 3 | acc_norm | ↑ | 0.2200 | ± | 0.0263 |
| - leaderboard_bbh_hyperbaton | 1 | none | 3 | acc_norm | ↑ | 0.5640 | ± | 0.0314 |
| - leaderboard_bbh_logical_deduction_five_objects | 1 | none | 3 | acc_norm | ↑ | 0.4560 | ± | 0.0316 |
| - leaderboard_bbh_logical_deduction_seven_objects | 1 | none | 3 | acc_norm | ↑ | 0.4360 | ± | 0.0314 |
| - leaderboard_bbh_logical_deduction_three_objects | 1 | none | 3 | acc_norm | ↑ | 0.4880 | ± | 0.0317 |
| - leaderboard_bbh_movie_recommendation | 1 | none | 3 | acc_norm | ↑ | 0.6360 | ± | 0.0305 |
| - leaderboard_bbh_navigate | 1 | none | 3 | acc_norm | ↑ | 0.6200 | ± | 0.0308 |
| - leaderboard_bbh_object_counting | 1 | none | 3 | acc_norm | ↑ | 0.4120 | ± | 0.0312 |
| - leaderboard_bbh_penguins_in_a_table | 1 | none | 3 | acc_norm | ↑ | 0.3219 | ± | 0.0388 |
| - leaderboard_bbh_reasoning_about_colored_objects | 1 | none | 3 | acc_norm | ↑ | 0.3440 | ± | 0.0301 |
| - leaderboard_bbh_ruin_names | 1 | none | 3 | acc_norm | ↑ | 0.3240 | ± | 0.0297 |
| - leaderboard_bbh_salient_translation_error_detection | 1 | none | 3 | acc_norm | ↑ | 0.3120 | ± | 0.0294 |
| - leaderboard_bbh_snarks | 1 | none | 3 | acc_norm | ↑ | 0.4494 | ± | 0.0374 |
| - leaderboard_bbh_sports_understanding | 1 | none | 3 | acc_norm | ↑ | 0.6040 | ± | 0.0310 |
| - leaderboard_bbh_temporal_sequences | 1 | none | 3 | acc_norm | ↑ | 0.1000 | ± | 0.0190 |
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1 | none | 3 | acc_norm | ↑ | 0.1600 | ± | 0.0232 |
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects | 1 | none | 3 | acc_norm | ↑ | 0.1200 | ± | 0.0206 |
| - leaderboard_bbh_tracking_shuffled_objects_three_objects | 1 | none | 3 | acc_norm | ↑ | 0.3440 | ± | 0.0301 |
| - leaderboard_bbh_web_of_lies | 1 | none | 3 | acc_norm | ↑ | 0.5160 | ± | 0.0317 |
| leaderboard_gpqa | N/A | | | | | | | |
| - leaderboard_gpqa_diamond | 1 | none | 0 | acc_norm | ↑ | 0.2727 | ± | 0.0317 |
| - leaderboard_gpqa_extended | 1 | none | 0 | acc_norm | ↑ | 0.2802 | ± | 0.0192 |
| - leaderboard_gpqa_main | 1 | none | 0 | acc_norm | ↑ | 0.2545 | ± | 0.0206 |
| leaderboard_ifeval | 3 | none | 0 | inst_level_loose_acc | ↑ | 0.5252 | ± | N/A |
| | | none | 0 | inst_level_strict_acc | ↑ | 0.4748 | ± | N/A |
| | | none | 0 | prompt_level_loose_acc | ↑ | 0.3919 | ± | 0.0210 |
| | | none | 0 | prompt_level_strict_acc | ↑ | 0.3420 | ± | 0.0204 |
| leaderboard_math_hard | N/A | | | | | | | |
| - leaderboard_math_algebra_hard | 2 | none | 4 | exact_match | ↑ | 0.2150 | ± | 0.0235 |
| - leaderboard_math_counting_and_prob_hard | 2 | none | 4 | exact_match | ↑ | 0.0244 | ± | 0.0140 |
| - leaderboard_math_geometry_hard | 2 | none | 4 | exact_match | ↑ | 0.0606 | ± | 0.0208 |
| - leaderboard_math_intermediate_algebra_hard | 2 | none | 4 | exact_match | ↑ | 0.0143 | ± | 0.0071 |
| - leaderboard_math_num_theory_hard | 2 | none | 4 | exact_match | ↑ | 0.0649 | ± | 0.0199 |
| - leaderboard_math_prealgebra_hard | 2 | none | 4 | exact_match | ↑ | 0.1762 | ± | 0.0275 |
| - leaderboard_math_precalculus_hard | 2 | none | 4 | exact_match | ↑ | 0.0519 | ± | 0.0192 |
| leaderboard_mmlu_pro | 0.1 | none | 5 | acc | ↑ | 0.2822 | ± | 0.0041 |
| leaderboard_musr | N/A | | | | | | | |
| - leaderboard_musr_murder_mysteries | 1 | none | 0 | acc_norm | ↑ | 0.5400 | ± | 0.0316 |
| - leaderboard_musr_object_placements | 1 | none | 0 | acc_norm | ↑ | 0.2344 | ± | 0.0265 |
| - leaderboard_musr_team_allocation | 1 | none | 0 | acc_norm | ↑ | 0.3200 | ± | 0.0296 |
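The table follows the layout of lm-evaluation-harness output. The exact harness version and invocation used for these numbers are not recorded in this card, so the snippet below is only a sketch of a typical reproduction run, assuming lm-evaluation-harness ≥ 0.4 (which provides the `leaderboard` task group) and a placeholder repo id.

```python
# Reproduction sketch (assumes lm-evaluation-harness >= 0.4 and a placeholder
# repo id; the exact settings behind the numbers above are not recorded here).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=your-username/analytical_reasoning_r16a32_unsloth-Llama-3.2-3B-Instruct-bnb-4bit,"
        "dtype=float16"
    ),
    # "leaderboard" expands to the bbh/gpqa/ifeval/math_hard/mmlu_pro/musr groups above,
    # using the group-default n-shot values (3/0/0/4/5/0 respectively).
    tasks=["hellaswag", "leaderboard"],
    batch_size=8,
)
print(results["results"]["hellaswag"])
```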
## Framework versions
- unsloth 2024.11.5
- trl 0.12.0
## Training hardware
- V100
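The sketch below shows how a LoRA fine-tune like this one can be set up with Unsloth and TRL, matching the versions listed above. Rank 16 and alpha 32 are inferred from the `r16a32` in the model name; the dataset split (`analytical_reasoning`), target modules, sequence length, message formatting, and trainer hyperparameters are illustrative assumptions rather than the card's recorded recipe.

```python
# Illustrative sketch only: LoRA r=16 / alpha=32 come from the model name; every
# other value (split, target modules, seq length, trainer args) is an assumption.
import json

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    max_seq_length=2048,       # assumption
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                      # "r16" in the model name
    lora_alpha=32,             # "a32" in the model name
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumption
)

# orca-agentinstruct splits are named by skill; "analytical_reasoning" is inferred
# from the model name. Each row is assumed to store its conversation as a JSON
# string under "messages".
dataset = load_dataset("microsoft/orca-agentinstruct-1M-v1", split="analytical_reasoning")

def to_text(example):
    messages = json.loads(example["messages"])
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="outputs",
        dataset_text_field="text",
        max_seq_length=2048,
        per_device_train_batch_size=2,    # assumption
        gradient_accumulation_steps=8,    # assumption
        num_train_epochs=1,               # assumption
        fp16=True,                        # V100 has no bf16 support
    ),
)
trainer.train()
```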