csabakecskemeti committed on
Commit
3b9fbdb
1 Parent(s): dc97aaf

Update README.md

Files changed (1)
  1. README.md +95 -2
README.md CHANGED
@@ -8,11 +8,103 @@ license: llama3.2
8
  tags:
9
  - unsloth
10
  - transformers
11
  ---
12
 
13
-
14
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e6d37e02dee9bcb9d9fa18/X4WG8AnMFqJuWkRvA0CrW.png)
15
 
16
 
17
  ### Framework versions
18
 
@@ -20,4 +112,5 @@ tags:
20
  - trl 0.12.0
21
 
22
  ### Training HW
23
- - V100
 
 
8
  tags:
9
  - unsloth
10
  - transformers
11
+ model-index:
12
+ - name: analytical_reasoning_r16a32_unsloth-Llama-3.2-3B-Instruct-bnb-4bit
13
+ results:
14
+ - task:
15
+ type: text-generation
16
+ dataset:
17
+ type: lm-evaluation-harness
18
+ name: bbh
19
+ metrics:
20
+ - name: acc_norm
21
+ type: acc_norm
22
+ value: 0.4168
23
+ verified: false
24
+ - task:
25
+ type: text-generation
26
+ dataset:
27
+ type: lm-evaluation-harness
28
+ name: gpqa
29
+ metrics:
30
+ - name: acc_norm
31
+ type: acc_norm
32
+ value: 0.2691
33
+ verified: false
34
+ - task:
35
+ type: text-generation
36
+ dataset:
37
+ type: lm-evaluation-harness
38
+ name: math
39
+ metrics:
40
+ - name: exact_match
41
+ type: exact_match
42
+ value: 0.0867
43
+ verified: false
44
+ - task:
45
+ type: text-generation
46
+ dataset:
47
+ type: lm-evaluation-harness
48
+ name: mmlu
49
+ metrics:
50
+ - name: acc_norm
51
+ type: acc_norm
52
+ value: 0.2822
53
+ verified: false
54
+ - task:
55
+ type: text-generation
56
+ dataset:
57
+ type: lm-evaluation-harness
58
+ name: musr
59
+ metrics:
60
+ - name: acc_norm
61
+ type: acc_norm
62
+ value: 0.3648
63
+ verified: false
64
+ - task:
65
+ type: text-generation
66
+ dataset:
67
+ type: lm-evaluation-harness
68
+ name: hellaswag
69
+ metrics:
70
+ - name: acc
71
+ type: acc
72
+ value: 0.5141
73
+ verified: false
74
+ - name: acc_norm
75
+ type: acc_norm
76
+ value: 0.6793
77
+ verified: false
78
+
79
  ---
80
 
 
81
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e6d37e02dee9bcb9d9fa18/X4WG8AnMFqJuWkRvA0CrW.png)
82
 
83
+ ### Eval
84
+
85
+ The fine-tuned model (DevQuasar/analytical_reasoning_r16a32_unsloth-Llama-3.2-3B-Instruct-bnb-4bit)
86
+ has gained performance over the base model (unsloth/Llama-3.2-3B-Instruct-bnb-4bit)
87
+ on the following tasks.
88
+
89
+ | Test | Base Model | Fine-Tuned Model | Performance Gain |
90
+ |---|---|---|---|
91
+ | leaderboard_bbh_logical_deduction_seven_objects | 0.2520 | 0.4360 | 0.1840 |
92
+ | leaderboard_bbh_logical_deduction_five_objects | 0.3560 | 0.4560 | 0.1000 |
93
+ | leaderboard_musr_team_allocation | 0.2200 | 0.3200 | 0.1000 |
94
+ | leaderboard_bbh_disambiguation_qa | 0.3040 | 0.3760 | 0.0720 |
95
+ | leaderboard_gpqa_diamond | 0.2222 | 0.2727 | 0.0505 |
96
+ | leaderboard_bbh_movie_recommendation | 0.5960 | 0.6360 | 0.0400 |
97
+ | leaderboard_bbh_formal_fallacies | 0.5080 | 0.5400 | 0.0320 |
98
+ | leaderboard_bbh_tracking_shuffled_objects_three_objects | 0.3160 | 0.3440 | 0.0280 |
99
+ | leaderboard_bbh_causal_judgement | 0.5455 | 0.5668 | 0.0214 |
100
+ | leaderboard_bbh_web_of_lies | 0.4960 | 0.5160 | 0.0200 |
101
+ | leaderboard_math_geometry_hard | 0.0455 | 0.0606 | 0.0152 |
102
+ | leaderboard_math_num_theory_hard | 0.0519 | 0.0649 | 0.0130 |
103
+ | leaderboard_musr_murder_mysteries | 0.5280 | 0.5400 | 0.0120 |
104
+ | leaderboard_gpqa_extended | 0.2711 | 0.2802 | 0.0092 |
105
+ | leaderboard_bbh_sports_understanding | 0.5960 | 0.6040 | 0.0080 |
106
+ | leaderboard_math_intermediate_algebra_hard | 0.0107 | 0.0143 | 0.0036 |
107
+
108
 
109
  ### Framework versions
110
 
 
112
  - trl 0.12.0
113
 
114
  ### Training HW
115
+ - V100
116
+
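
Note on the Eval table: the Performance Gain column appears to be the absolute difference between the fine-tuned and base scores. The sketch below recomputes it for three rows copied from the table. The `scores` dictionary and the `performance_gain` helper are illustrative names, not part of the model card or of any evaluation tool. Some published gains (for example 0.0214 for `leaderboard_bbh_causal_judgement`) differ in the last digit from a subtraction of the rounded scores, presumably because they were computed from unrounded values; that is an assumption.

```python
# Scores copied from three rows of the Eval table: (base, fine-tuned).
# The dictionary and helper names are hypothetical, for illustration only.
scores = {
    "leaderboard_bbh_logical_deduction_seven_objects": (0.2520, 0.4360),
    "leaderboard_musr_team_allocation": (0.2200, 0.3200),
    "leaderboard_gpqa_diamond": (0.2222, 0.2727),
}


def performance_gain(base: float, tuned: float) -> float:
    """Absolute gain: fine-tuned score minus base score, rounded to 4 places."""
    return round(tuned - base, 4)


# Recompute the gain for every task listed above.
gains = {task: performance_gain(b, t) for task, (b, t) in scores.items()}

# Print tasks from largest to smallest gain, matching the table's ordering.
for task, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{task}: {gain:+.4f}")
```

Rounding to four decimal places matches the precision used in the table and avoids floating-point noise in the printed differences.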