Update README.md
README.md CHANGED
@@ -11,26 +11,28 @@ tags:
 - sft
 - math
 library_name: transformers
-
 model-index:
-
-
-
-
-
-
-
-
-
-
-
+- name: Qwen2.5-1.5B-Instruct-QwQ
+  results:
+  - task:
+      type: text-generation
+    dataset:
+      name: GSM8k
+      type: gsm8k
+    metrics:
+    - name: pass@4
+      type: pass@4
+      value: 89.6
+      verified: false
 ---
 # Qwen2.5-1.5B-Instruct-QwQ
 
 ## Introduction
 
-Qwen2.5-1.5B-Instruct-QwQ is a fine-tuned model based on Qwen2.5-1.5B-Instruct. It was fine-tuned on roughly 20k samples from QwQ-32B-Preview. Compared to Qwen2.5-1.5B-Instruct, this fine-tuned model seems more performant in mathematics contexts and general reasoning. Also it shows some capabilities of self-correction, altough it seems a bit limited
+Qwen2.5-1.5B-Instruct-QwQ is a fine-tuned model based on Qwen2.5-1.5B-Instruct. It was fine-tuned on roughly 20k samples from QwQ-32B-Preview. Compared to Qwen2.5-1.5B-Instruct, this fine-tuned model seems more performant in mathematics and general reasoning. It also shows some capability for self-correction, although this seems somewhat limited (bigger models seem to learn self-correction better; e.g., the 3B and 7B versions show much stronger self-correction abilities in my experiments).
+
+For data generation, math problems from the train sets of the GSM8k and MATH datasets were used.
+
 
 **This repo contains the instruction-tuned 1.5B Qwen2.5 model fine-tuned on QwQ reasoning chains**, which has the following features:
 - Type: Causal Language Models
@@ -82,4 +84,6 @@ generated_ids = [
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 ```
 
-
+For GSM8k performance comparison: the base instruct model scores 73.2% on the test set.
+
+Disclaimer: GSM8k scores currently cover only the first 27% of the test set; they will be updated once evaluation on the full dataset is complete.