---
base_model: unsloth/Llama-3.2-3B-Instruct-bnb-4bit
datasets:
  - microsoft/orca-agentinstruct-1M-v1
pipeline_tag: text-generation
library_name: transformers
license: llama3.2
tags:
  - unsloth
  - transformers
model-index:
  - name: analytical_reasoning_r16a32_unsloth-Llama-3.2-3B-Instruct-bnb-4bit
    results:
      - task:
          type: text-generation
        dataset:
          type: lm-evaluation-harness
          name: hellaswag
        metrics:
          - name: acc
            type: acc
            value: 0.5141
            verified: false
          - name: acc_norm
            type: acc_norm
            value: 0.6793
            verified: false
---

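This model is a fine-tune of `unsloth/Llama-3.2-3B-Instruct-bnb-4bit` on `microsoft/orca-agentinstruct-1M-v1`. The snippet below is a minimal, assumed usage sketch with `transformers`; the `model_id` value is a placeholder derived from the model name, not a verified Hub repo id, so substitute the actual path of this repository.

```python
# Minimal text-generation sketch (assumed usage, not an official example).
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: replace with the actual Hub repo id of this model.
model_id = "analytical_reasoning_r16a32_unsloth-Llama-3.2-3B-Instruct-bnb-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```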

## eval

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| hellaswag | 1 | none | 0 | acc | 0.5141 | ± 0.0050 |
| | | none | 0 | acc_norm | 0.6793 | ± 0.0047 |
| leaderboard_bbh | N/A | | | | | |
| - leaderboard_bbh_boolean_expressions | 1 | none | 3 | acc_norm | 0.6040 | ± 0.0310 |
| - leaderboard_bbh_causal_judgement | 1 | none | 3 | acc_norm | 0.5668 | ± 0.0363 |
| - leaderboard_bbh_date_understanding | 1 | none | 3 | acc_norm | 0.4880 | ± 0.0317 |
| - leaderboard_bbh_disambiguation_qa | 1 | none | 3 | acc_norm | 0.3760 | ± 0.0307 |
| - leaderboard_bbh_formal_fallacies | 1 | none | 3 | acc_norm | 0.5400 | ± 0.0316 |
| - leaderboard_bbh_geometric_shapes | 1 | none | 3 | acc_norm | 0.2200 | ± 0.0263 |
| - leaderboard_bbh_hyperbaton | 1 | none | 3 | acc_norm | 0.5640 | ± 0.0314 |
| - leaderboard_bbh_logical_deduction_five_objects | 1 | none | 3 | acc_norm | 0.4560 | ± 0.0316 |
| - leaderboard_bbh_logical_deduction_seven_objects | 1 | none | 3 | acc_norm | 0.4360 | ± 0.0314 |
| - leaderboard_bbh_logical_deduction_three_objects | 1 | none | 3 | acc_norm | 0.4880 | ± 0.0317 |
| - leaderboard_bbh_movie_recommendation | 1 | none | 3 | acc_norm | 0.6360 | ± 0.0305 |
| - leaderboard_bbh_navigate | 1 | none | 3 | acc_norm | 0.6200 | ± 0.0308 |
| - leaderboard_bbh_object_counting | 1 | none | 3 | acc_norm | 0.4120 | ± 0.0312 |
| - leaderboard_bbh_penguins_in_a_table | 1 | none | 3 | acc_norm | 0.3219 | ± 0.0388 |
| - leaderboard_bbh_reasoning_about_colored_objects | 1 | none | 3 | acc_norm | 0.3440 | ± 0.0301 |
| - leaderboard_bbh_ruin_names | 1 | none | 3 | acc_norm | 0.3240 | ± 0.0297 |
| - leaderboard_bbh_salient_translation_error_detection | 1 | none | 3 | acc_norm | 0.3120 | ± 0.0294 |
| - leaderboard_bbh_snarks | 1 | none | 3 | acc_norm | 0.4494 | ± 0.0374 |
| - leaderboard_bbh_sports_understanding | 1 | none | 3 | acc_norm | 0.6040 | ± 0.0310 |
| - leaderboard_bbh_temporal_sequences | 1 | none | 3 | acc_norm | 0.1000 | ± 0.0190 |
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1 | none | 3 | acc_norm | 0.1600 | ± 0.0232 |
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects | 1 | none | 3 | acc_norm | 0.1200 | ± 0.0206 |
| - leaderboard_bbh_tracking_shuffled_objects_three_objects | 1 | none | 3 | acc_norm | 0.3440 | ± 0.0301 |
| - leaderboard_bbh_web_of_lies | 1 | none | 3 | acc_norm | 0.5160 | ± 0.0317 |
| leaderboard_gpqa | N/A | | | | | |
| - leaderboard_gpqa_diamond | 1 | none | 0 | acc_norm | 0.2727 | ± 0.0317 |
| - leaderboard_gpqa_extended | 1 | none | 0 | acc_norm | 0.2802 | ± 0.0192 |
| - leaderboard_gpqa_main | 1 | none | 0 | acc_norm | 0.2545 | ± 0.0206 |
| leaderboard_ifeval | 3 | none | 0 | inst_level_loose_acc | 0.5252 | ± N/A |
| | | none | 0 | inst_level_strict_acc | 0.4748 | ± N/A |
| | | none | 0 | prompt_level_loose_acc | 0.3919 | ± 0.0210 |
| | | none | 0 | prompt_level_strict_acc | 0.3420 | ± 0.0204 |
| leaderboard_math_hard | N/A | | | | | |
| - leaderboard_math_algebra_hard | 2 | none | 4 | exact_match | 0.2150 | ± 0.0235 |
| - leaderboard_math_counting_and_prob_hard | 2 | none | 4 | exact_match | 0.0244 | ± 0.0140 |
| - leaderboard_math_geometry_hard | 2 | none | 4 | exact_match | 0.0606 | ± 0.0208 |
| - leaderboard_math_intermediate_algebra_hard | 2 | none | 4 | exact_match | 0.0143 | ± 0.0071 |
| - leaderboard_math_num_theory_hard | 2 | none | 4 | exact_match | 0.0649 | ± 0.0199 |
| - leaderboard_math_prealgebra_hard | 2 | none | 4 | exact_match | 0.1762 | ± 0.0275 |
| - leaderboard_math_precalculus_hard | 2 | none | 4 | exact_match | 0.0519 | ± 0.0192 |
| leaderboard_mmlu_pro | 0.1 | none | 5 | acc | 0.2822 | ± 0.0041 |
| leaderboard_musr | N/A | | | | | |
| - leaderboard_musr_murder_mysteries | 1 | none | 0 | acc_norm | 0.5400 | ± 0.0316 |
| - leaderboard_musr_object_placements | 1 | none | 0 | acc_norm | 0.2344 | ± 0.0265 |
| - leaderboard_musr_team_allocation | 1 | none | 0 | acc_norm | 0.3200 | ± 0.0296 |
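
The table follows lm-evaluation-harness output conventions (task, version, filter, n-shot, metric, value, stderr). The sketch below shows how an evaluation of this shape is typically run with the harness's Python API; the model path, task list, and batch size are assumptions, not the recorded settings (the varying n-shot values above are the tasks' defaults, so no `num_fewshot` is passed here).

```python
# Hedged reproduction sketch using lm-evaluation-harness (v0.4+ Python API).
# The model path, task list, and batch size below are assumptions.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=<hub-repo-id-of-this-model>",  # placeholder repo id
    tasks=[
        "hellaswag",
        "leaderboard_bbh",
        "leaderboard_gpqa",
        "leaderboard_ifeval",
        "leaderboard_math_hard",
        "leaderboard_mmlu_pro",
        "leaderboard_musr",
    ],
    batch_size=8,
)
print(results["results"])
```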

## Framework versions

- unsloth 2024.11.5
- trl 0.12.0

## Training HW

- V100
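
Given the base model, the dataset, and the unsloth/trl versions above, and reading the `r16a32` in the model name as LoRA rank 16 / alpha 32, a training run would look roughly like the sketch below. Everything except the base model, the dataset id, and the listed libraries is an assumption (sequence length, target modules, split name, field layout, batch size, epochs), so treat it as an illustration rather than the actual training script.

```python
# Illustrative QLoRA fine-tuning sketch with unsloth + trl (not the author's actual script).
# Assumptions: LoRA r=16 / alpha=32 (read from "r16a32" in the model name), the split name,
# "messages" stored as a JSON string of chat turns, and all hyperparameters.
import json

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    max_seq_length=2048,  # assumed
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,           # inferred from "r16a32"
    lora_alpha=32,  # inferred from "r16a32"
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # common default choice, assumed
)

# Split name and field layout assumed; render each chat into a single training string.
dataset = load_dataset("microsoft/orca-agentinstruct-1M-v1", split="analytical_reasoning")

def to_text(example):
    turns = example["messages"]
    if isinstance(turns, str):
        turns = json.loads(turns)
    return {"text": tokenizer.apply_chat_template(turns, tokenize=False)}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        max_seq_length=2048,              # assumed
        per_device_train_batch_size=2,    # assumed; sized for a single V100
        gradient_accumulation_steps=8,    # assumed
        num_train_epochs=1,               # assumed
        output_dir="outputs",
    ),
)
trainer.train()
```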