micaebe committed
Commit 54f7bba
1 Parent(s): f6325e0

Update README.md

Files changed (1)
  1. README.md +19 -15
README.md CHANGED
@@ -11,26 +11,28 @@ tags:
 - sft
 - math
 library_name: transformers
-
 model-index:
-- name: Qwen2.5-1.5B-Instruct-QwQ
-  results:
-  - task:
-      type: text-generation
-    dataset:
-      name: GSM8k
-      type: gsm8k
-    metrics:
-    - name: pass@4
-      type: pass@4
-      value: 85.15
-      verified: false
+- name: Qwen2.5-1.5B-Instruct-QwQ
+  results:
+  - task:
+      type: text-generation
+    dataset:
+      name: GSM8k
+      type: gsm8k
+    metrics:
+    - name: pass@4
+      type: pass@4
+      value: 89.6
+      verified: false
 ---
 # Qwen2.5-1.5B-Instruct-QwQ
 
 ## Introduction
 
-Qwen2.5-1.5B-Instruct-QwQ is a fine-tuned model based on Qwen2.5-1.5B-Instruct. It was fine-tuned on roughly 20k samples from QwQ-32B-Preview. Compared to Qwen2.5-1.5B-Instruct, this fine-tuned model seems more performant in mathematics contexts and general reasoning. Also it shows some capabilities of self-correction, altough it seems a bit limited because of the size (bigger models seem to learn self-correction more easily, e.g. the 3B & 7B version show much better self-correction abilities).
+Qwen2.5-1.5B-Instruct-QwQ is a fine-tuned model based on Qwen2.5-1.5B-Instruct. It was fine-tuned on roughly 20k samples from QwQ-32B-Preview. Compared to Qwen2.5-1.5B-Instruct, the fine-tuned model appears stronger in mathematical contexts and general reasoning. It also shows some capability for self-correction, although this seems limited (larger models appear to learn self-correction better, e.g. the 3B and 7B versions show much stronger self-correction abilities in my experiments).
+
+For data generation, math problems from the train sets of the GSM8k and MATH datasets were used.
+
 
 **This repo contains the instruction-tuned 1.5B Qwen2.5 model fine-tuned on QwQ reasoning chains**, which has the following features:
 - Type: Causal Language Models
@@ -82,4 +84,6 @@ generated_ids = [
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 ```
 
-Disclaimer: GSM scores are currently only fro the first 20% of the dataset. Will run the tests on all samples and adjust the score.
+For comparison on GSM8k, the base instruct model (Qwen2.5-1.5B-Instruct) scores 73.2% on the test set.
+
+Disclaimer: the GSM8k scores currently cover only the first 27% of the test set; they will be updated once the full test set has been evaluated.
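
The model-index metadata and the comparison note above report pass@4 on GSM8k. As a rough reference (not code from this repo), here is a minimal sketch of how a pass@k score of this kind is typically computed, assuming k independent samples per problem and hypothetical `generate` / `is_correct` helpers standing in for the actual sampling and answer checking:

```python
from typing import Callable, Sequence


def pass_at_k(
    problems: Sequence[dict],
    generate: Callable[[str, int], list[str]],
    is_correct: Callable[[str, str], bool],
    k: int = 4,
) -> float:
    """Fraction of problems solved by at least one of k sampled completions.

    `generate(question, k)` and `is_correct(answer, reference)` are hypothetical
    helpers; the actual prompting and answer extraction are not described here.
    """
    solved = 0
    for problem in problems:
        samples = generate(problem["question"], k)  # k independent completions
        if any(is_correct(s, problem["answer"]) for s in samples):
            solved += 1
    return 100.0 * solved / len(problems)  # reported as a percentage
```

With k = 4 this matches the pass@4 convention used in the metadata; restricting `problems` to a prefix of the test set gives the partial scores mentioned in the disclaimer.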
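The second hunk only shows the tail of the README's quickstart snippet (`generated_ids`, `tokenizer.batch_decode`). For orientation, a self-contained sketch of the standard Transformers chat-generation pattern that fragment comes from; the repo id `micaebe/Qwen2.5-1.5B-Instruct-QwQ` is inferred from the model name and may differ from the actual one:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, inferred from the model name; adjust if the actual id differs.
model_name = "micaebe/Qwen2.5-1.5B-Instruct-QwQ"

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [{"role": "user", "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=1024)
# Strip the prompt tokens so only the newly generated answer is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

A relatively generous `max_new_tokens` budget is used in the sketch, since the model is tuned on long QwQ-style reasoning chains.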