Update README.md
README.md CHANGED
@@ -11,26 +11,28 @@ tags:
 - sft
 - math
 library_name: transformers
-
 model-index:
-
-
-
-
-
-
-
-
-
-
-
+- name: Qwen2.5-1.5B-Instruct-QwQ
+  results:
+  - task:
+      type: text-generation
+    dataset:
+      name: GSM8k
+      type: gsm8k
+    metrics:
+    - name: pass@4
+      type: pass@4
+      value: 89.6
+      verified: false
 ---
 # Qwen2.5-1.5B-Instruct-QwQ
 
 ## Introduction
 
-Qwen2.5-1.5B-Instruct-QwQ is a fine-tuned model based on Qwen2.5-1.5B-Instruct. It was fine-tuned on roughly 20k samples from QwQ-32B-Preview. Compared to Qwen2.5-1.5B-Instruct, this fine-tuned model seems more performant in mathematics contexts and general reasoning. Also it shows some capabilities of self-correction, altough it seems a bit limited
+Qwen2.5-1.5B-Instruct-QwQ is a fine-tuned model based on Qwen2.5-1.5B-Instruct. It was fine-tuned on roughly 20k samples from QwQ-32B-Preview. Compared to Qwen2.5-1.5B-Instruct, this fine-tuned model seems more performant in mathematics and general reasoning. It also shows some capability for self-correction, although this seems somewhat limited (bigger models seem to learn self-correction better; e.g., the 3B and 7B versions show much stronger self-correction abilities in my experiments).
+
+For data generation, math problems from the train sets of the GSM8k and MATH datasets were used.
+
 
 **This repo contains the instruction-tuned 1.5B Qwen2.5 model fine-tuned on QwQ reasoning chains**, which has the following features:
 - Type: Causal Language Models
@@ -82,4 +84,6 @@ generated_ids = [
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 ```
 
-
+For GSM8k performance comparison: the base instruct model scores 73.2% on the test set.
+
+Disclaimer: GSM8k scores currently cover only the first 27% of the test set; they will be updated once evaluation on the full dataset is complete.