metadata

license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE
language:
  - en
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
  - chat
  - trl
  - sft
  - math
library_name: transformers
model-index:
  - name: Qwen2.5-1.5B-Instruct-QwQ
    results:
      - task:
          type: text-generation
        dataset:
          name: GSM8k
          type: gsm8k
        metrics:
          - name: pass@4
            type: pass@4
            value: 89.6
            verified: false

Qwen2.5-1.5B-Instruct-QwQ

Introduction

Qwen2.5-1.5B-Instruct-QwQ is a fine-tuned version of Qwen2.5-1.5B-Instruct. It was fine-tuned on roughly 20k samples from QwQ-32B-Preview. Compared to Qwen2.5-1.5B-Instruct, the fine-tuned model appears to perform better on mathematics and general reasoning. It also shows some capacity for self-correction, although this seems limited at this size: larger models appear to learn self-correction better, and the 3B and 7B versions show much stronger self-correction in my experiments.

For data generation, math problems from the train sets of the GSM8k and MATH datasets were used.

This repo contains the instruction-tuned 1.5B Qwen2.5 model fine-tuned on QwQ reasoning chains, which has the following features:

Type: Causal Language Models
Training Stage: Pretraining & Post-training
Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings
Number of Parameters: 1.54B
Number of Parameters (Non-Embedding): 1.31B
Number of Layers: 28
Number of Attention Heads (GQA): 12 for Q and 2 for KV
Context Length: 32,768 tokens in total, with up to 8,192 tokens of generation
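
As a rough sanity check on the two parameter counts above, the gap between them should correspond to the embedding matrix, which is counted only once because the word embeddings are tied. The sketch below assumes a vocabulary size of 151,936 and a hidden size of 1,536. Neither value is stated in this card; both are assumptions taken from the published Qwen2.5-1.5B configuration.

```python
# Assumed values (not stated in this card), taken from the Qwen2.5-1.5B config.
VOCAB_SIZE = 151_936   # assumption
HIDDEN_SIZE = 1_536    # assumption

# Values reported in the feature list above.
TOTAL_PARAMS_B = 1.54
NON_EMBEDDING_PARAMS_B = 1.31


def embedding_params_billions(vocab_size: int, hidden_size: int) -> float:
    """Size of a single (tied) embedding matrix, in billions of parameters."""
    return vocab_size * hidden_size / 1e9


embedding_b = embedding_params_billions(VOCAB_SIZE, HIDDEN_SIZE)
reported_gap_b = TOTAL_PARAMS_B - NON_EMBEDDING_PARAMS_B

print(f"embedding matrix: {embedding_b:.2f}B")
print(f"reported gap:     {reported_gap_b:.2f}B")
```

Under these assumptions, both numbers round to the same value, which is consistent with the input and output embeddings sharing one matrix.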

Quickstart

The following code snippet uses apply_chat_template to show how to load the tokenizer and model and how to generate content.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "micaebe/Qwen2.5-1.5B-Instruct-QwQ"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

Here is an example for the question "Is 9.9 or 9.10 larger?", decoded greedily. The model gets the answer right, but the reasoning is somewhat odd: it starts out strong, then begins to go astray when it tries to approach the problem from another angle.

<|im_start|>system
You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.<|im_end|>
<|im_start|>user
Is 9.9 or 9.10 larger?<|im_end|>
<|im_start|>assistant
So I have this problem here: "Is 9.9 or 9.10 larger?" Okay, let's see what it's asking.

First off, both numbers are decimals with the same whole number part, which is 9 in this case. So that means they're both nine units long, right?

Now, when comparing two decimal numbers where the whole parts are equal, we just need to compare the digits after the decimal point. That makes sense because if the whole parts are the same, then the difference must be in the tenths place.

Let me write down the numbers again:

- 9.9

- 9.10

Okay, so the whole part is 9 for both, but the tenths places differ. In 9.9, the tenth is 9, and in 9.10, the tenth is 1.

Since 9 is greater than 1, it follows that 9.9 is greater than 9.10.

Wait a minute, does that make sense? Let's think about it another way.

Imagine you have 9 apples, and someone gives you an additional 0.9 of an apple. Now, do you have more apples or fewer apples compared to having 9.10 apples?

Well, 0.9 is less than 1, so adding it to 9 would give you 9.9, which is still less than 9.10.

But wait, maybe there's something else to consider. What if I were to add these numbers together?

Let's try adding them up:

9.9 + 9.10 = ?

Hmm, that might help clarify things.

Adding 9.9 and 9.10:

First, align the decimal points:

  9.90
+ 9.10
------

Now, add the hundredths first:

0 + 0 = 0

Then, tens:

9 + 1 = 10

Bring down the zero from the hundredths place:

10.0

Finally, ones:

9 + 9 = 18

So, 9.9 + 9.10 = 18.00

That seems straightforward. But earlier, I thought that since the whole parts are the same, the comparison should be based on the tenths place only. But now, adding them shows that 9.9 is indeed smaller than 9.10.

I'm getting confused here. Maybe I need to rethink my approach.

Alternatively, perhaps I can look at the numbers as fractions to get a better understanding.

Both 9.9 and 9.10 can be written as fractions:

9.9 = 99/10

9.10 = 91/10

Now, comparing 99/10 and 91/10.

Clearly, 99 is greater than 91, so 99/10 > 91/10.

Therefore, 9.9 > 9.10.

But wait, that doesn't seem right either. Earlier, when I added them, 9.9 + 9.10 = 18.00, which suggests that 9.9 is less than 9.10.

Maybe I'm missing something fundamental here.

Let me think differently.

Consider the numbers as lengths on a number line.

The number line has negative numbers to the left and positive numbers to the right.

If I plot -9.9 and -9.10 on the number line, which one is farther to the left?

Well, -9.9 is further to the left than -9.10, because -9.9 is closer to zero than -9.10.

Similarly, if I plot 9.9 and 9.10 on the number line, which one is farther to the right?

Again, 9.9 is farther to the right than 9.10, because 9.9 is closer to ten than 9.10.

This reinforces that 9.9 is greater than 9.10.

Wait, but earlier, when I was trying to add them, I got 18.00, which is greater than any other possible sum involving 9.9 and 9.10.

So, perhaps the confusion lies in how I interpreted the addition.

Let me re-examine the addition:

9.9 + 9.10 = 18.00

Yes, that's correct.

And since 18.00 is greater than any other possible sum involving 9.9 and 9.10, it confirms that 9.9 is greater than 9.10.

But wait, maybe I should consider that 9.9 is actually 9.90, not 9.9.

Because sometimes people write decimals without trailing zeros, like 9.9 instead of 9.90.

In that case, 9.90 is exactly the same as 9.9, so the comparison remains the same.

Therefore, 9.90 is greater than 9.10.

But according to standard notation, 9.9 is typically used, even though 9.90 is mathematically equivalent.

So, sticking with 9.9, which is 9.90, it's clear that 9.9 is larger than 9.10.

Hence, the answer is 9.9 is larger.

**Final Answer**

\[ \boxed{9.9} \]<|im_end|>
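
The sample above ends with its final answer wrapped in \boxed{...}. A minimal sketch for pulling that answer out of a decoded response follows. The helper name extract_boxed_answer is hypothetical, and the regex assumes a simple, non-nested \boxed{...} expression, so answers containing nested braces (for example fractions) would need a more careful parser.

```python
import re
from typing import Optional


def extract_boxed_answer(response: str) -> Optional[str]:
    """Return the content of the last non-nested \\boxed{...} in the text, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    return matches[-1].strip() if matches else None


# Tail of the sample response shown above.
sample = r"**Final Answer** \[ \boxed{9.9} \]"
print(extract_boxed_answer(sample))
```

Taking the last match rather than the first is a design choice: a reasoning chain may box intermediate values before it settles on a final answer.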

For comparison, the base instruct model (Qwen2.5-1.5B-Instruct) scores 73.2% on the GSM8k test set.

Disclaimer: the GSM8k scores are currently computed on only the first 27% of the test set. They will be updated once the model has been evaluated on the full test set.
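
The metadata for this model reports the GSM8k result as pass@4. This card does not say how that number was computed. One common choice is the unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them are correct. The sketch below illustrates that estimator under this assumption; it is not necessarily the procedure behind the reported score, and the function names and toy data are hypothetical.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if k > n:
        raise ValueError("k must not exceed the number of samples n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)


# Toy data (made up for illustration): 3 problems, 4 samples each.
toy_results = [(4, 4), (4, 1), (4, 0)]
print(mean_pass_at_k(toy_results, k=4))
```

With n equal to k, as in the toy data, the estimator reduces to checking whether at least one of the k samples is correct.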