Qwen2.5-1.5B-Instruct-QwQ

Introduction

Qwen2.5-1.5B-Instruct-QwQ is a fine-tuned model based on Qwen2.5-1.5B-Instruct. It was fine-tuned on roughly 20k samples generated by QwQ-32B-Preview. Compared to Qwen2.5-1.5B-Instruct, this fine-tuned model seems to perform better on mathematics and general reasoning. It also shows some capacity for self-correction, although this seems limited: in my experiments, larger models learn self-correction better, and the 3B and 7B versions show much stronger self-correction abilities.

For data generation, math problems from the train sets of the GSM8k and MATH datasets were used.

This repo contains the instruction-tuned 1.5B Qwen2.5 model fine-tuned on QwQ reasoning chains, which has the following features:

  • Type: Causal Language Model
  • Training Stage: Pretraining & Post-training
  • Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings
  • Number of Parameters: 1.54B
  • Number of Parameters (Non-Embedding): 1.31B
  • Number of Layers: 28
  • Number of Attention Heads (GQA): 12 for Q and 2 for KV
  • Context Length: 32,768 tokens (full context), with up to 8,192 generated tokens
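The figures in this list can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below assumes a hidden size of 1536 and a vocabulary of 151,936 tokens; both values come from the public Qwen2.5-1.5B config and are not stated in this card. It shows that, because the word embeddings are tied (stored once), subtracting a single embedding matrix from 1.54B lands at roughly the 1.31B non-embedding count, and that GQA with 2 KV heads cuts the KV cache for a full-length context by 6x compared with a hypothetical variant giving each of the 12 query heads its own KV head.

```python
# Back-of-the-envelope checks for the architecture list above.
# ASSUMPTIONS (not stated in this card): hidden size 1536 and vocab size
# 151,936, taken from the public Qwen2.5-1.5B config; BF16 = 2 bytes/value.
HIDDEN_SIZE = 1536        # assumed
VOCAB_SIZE = 151_936      # assumed
NUM_LAYERS = 28
NUM_Q_HEADS = 12
NUM_KV_HEADS = 2
CONTEXT_LEN = 32_768
BYTES_PER_VALUE = 2       # BF16

head_dim = HIDDEN_SIZE // NUM_Q_HEADS                # 128
q_heads_per_kv_head = NUM_Q_HEADS // NUM_KV_HEADS    # 6 query heads share each KV head

# Tied word embeddings: one vocab x hidden matrix serves as both the input
# embedding and the output projection, so it is counted only once.
embedding_params = VOCAB_SIZE * HIDDEN_SIZE


def kv_cache_bytes(seq_len: int, kv_heads: int) -> int:
    """Bytes needed to cache keys and values for one sequence across all layers."""
    # Factor 2: one cached tensor for keys and one for values.
    return 2 * NUM_LAYERS * kv_heads * head_dim * seq_len * BYTES_PER_VALUE


gqa = kv_cache_bytes(CONTEXT_LEN, NUM_KV_HEADS)
mha = kv_cache_bytes(CONTEXT_LEN, NUM_Q_HEADS)  # hypothetical: one KV head per query head

if __name__ == "__main__":
    print(f"head_dim={head_dim}, query heads per KV head={q_heads_per_kv_head}")
    print(f"embedding params ~ {embedding_params / 1e9:.2f}B")
    print(f"non-embedding ~ {(1.54e9 - embedding_params) / 1e9:.2f}B")
    print(f"KV cache @ {CONTEXT_LEN} tokens: GQA {gqa / 2**30:.3f} GiB, MHA {mha / 2**30:.3f} GiB")
```

Under these assumptions, one full 32,768-token sequence needs about 0.875 GiB of BF16 KV cache with GQA, versus about 5.25 GiB without it.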

Quickstart

The following code snippet shows how to load the tokenizer and model, and how to generate content using apply_chat_template.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "micaebe/Qwen2.5-1.5B-Instruct-QwQ"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
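With tokenize=False and add_generation_prompt=True, apply_chat_template returns the conversation as ChatML text: each turn is wrapped in <|im_start|>role ... <|im_end|>, and an open assistant turn is appended for the model to complete. The same layout is visible in the transcript that follows. Purely as an illustration, a hand-written renderer for the simple system + user case might look like this; render_chatml is a hypothetical helper, not part of transformers, and in practice you should rely on apply_chat_template, since the real template also covers cases such as tools.

```python
# Minimal sketch of the ChatML layout produced by Qwen2.5-style chat templates.
# ASSUMPTION: plain-text system/user/assistant messages only; the real
# template applied by tokenizer.apply_chat_template handles more cases.
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render chat messages as ChatML text, mimicking apply_chat_template(tokenize=False)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so generation continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    messages = [
        {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."},
        {"role": "user", "content": "Is 9.9 or 9.10 larger?"},
    ]
    print(render_chatml(messages))
```

To reproduce greedy decoding with the snippet above, you can pass do_sample=False to model.generate. Reasoning chains like the one below can easily exceed 512 new tokens, so a larger max_new_tokens (the card lists a generation length of 8,192 tokens) may be needed.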

Here is an example response to the question "Is 9.9 or 9.10 larger?", generated with greedy decoding. The model gets the answer right, but parts of the reasoning are odd. The opening argument is strong, but when the model tries to approach the problem from other angles, it sometimes drifts and produces incorrect reasoning steps:

<|im_start|>system
You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.<|im_end|>
<|im_start|>user
Is 9.9 or 9.10 larger?<|im_end|>
<|im_start|>assistant
So I have this problem here: "Is 9.9 or 9.10 larger?" Okay, let's see what it's asking.

First off, both numbers are decimals with the same whole number part, which is 9 in this case. So that means they're both nine units long, right?

Now, when comparing two decimal numbers where the whole parts are equal, we just need to compare the digits after the decimal point. That makes sense because if the whole parts are the same, then the difference must be in the tenths place.

Let me write down the numbers again:

- 9.9

- 9.10

Okay, so the whole part is 9 for both, but the tenths places differ. In 9.9, the tenth is 9, and in 9.10, the tenth is 1.

Since 9 is greater than 1, it follows that 9.9 is greater than 9.10.

Wait a minute, does that make sense? Let's think about it another way.

Imagine you have 9 apples, and someone gives you an additional 0.9 of an apple. Now, do you have more apples or fewer apples compared to having 9.10 apples?

Well, 0.9 is less than 1, so adding it to 9 would give you 9.9, which is still less than 9.10.

But wait, maybe there's something else to consider. What if I were to add these numbers together?

Let's try adding them up:

9.9 + 9.10 = ?

Hmm, that might help clarify things.

Adding 9.9 and 9.10:

First, align the decimal points:

  9.90
+ 9.10
------

Now, add the hundredths first:

0 + 0 = 0

Then, tens:

9 + 1 = 10

Bring down the zero from the hundredths place:

10.0

Finally, ones:

9 + 9 = 18

So, 9.9 + 9.10 = 18.00

That seems straightforward. But earlier, I thought that since the whole parts are the same, the comparison should be based on the tenths place only. But now, adding them shows that 9.9 is indeed smaller than 9.10.

I'm getting confused here. Maybe I need to rethink my approach.

Alternatively, perhaps I can look at the numbers as fractions to get a better understanding.

Both 9.9 and 9.10 can be written as fractions:

9.9 = 99/10

9.10 = 91/10

Now, comparing 99/10 and 91/10.

Clearly, 99 is greater than 91, so 99/10 > 91/10.

Therefore, 9.9 > 9.10.

But wait, that doesn't seem right either. Earlier, when I added them, 9.9 + 9.10 = 18.00, which suggests that 9.9 is less than 9.10.

Maybe I'm missing something fundamental here.

Let me think differently.

Consider the numbers as lengths on a number line.

The number line has negative numbers to the left and positive numbers to the right.

If I plot -9.9 and -9.10 on the number line, which one is farther to the left?

Well, -9.9 is further to the left than -9.10, because -9.9 is closer to zero than -9.10.

Similarly, if I plot 9.9 and 9.10 on the number line, which one is farther to the right?

Again, 9.9 is farther to the right than 9.10, because 9.9 is closer to ten than 9.10.

This reinforces that 9.9 is greater than 9.10.

Wait, but earlier, when I was trying to add them, I got 18.00, which is greater than any other possible sum involving 9.9 and 9.10.

So, perhaps the confusion lies in how I interpreted the addition.

Let me re-examine the addition:

9.9 + 9.10 = 18.00

Yes, that's correct.

And since 18.00 is greater than any other possible sum involving 9.9 and 9.10, it confirms that 9.9 is greater than 9.10.

But wait, maybe I should consider that 9.9 is actually 9.90, not 9.9.

Because sometimes people write decimals without trailing zeros, like 9.9 instead of 9.90.

In that case, 9.90 is exactly the same as 9.9, so the comparison remains the same.

Therefore, 9.90 is greater than 9.10.

But according to standard notation, 9.9 is typically used, even though 9.90 is mathematically equivalent.

So, sticking with 9.9, which is 9.90, it's clear that 9.9 is larger than 9.10.

Hence, the answer is 9.9 is larger.

**Final Answer**

\[ \boxed{9.9} \]<|im_end|>
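As noted before the transcript, the final answer is right but some intermediate steps are not. For example, 9.9 + 9.10 is 19.00, not 18.00, and the apples argument wrongly concludes that 9.9 is less than 9.10. A quick exact check with Python's standard decimal module, which avoids binary floating-point rounding:

```python
from decimal import Decimal

# Decimal keeps exact base-10 values, so 9.10 is stored as written.
a = Decimal("9.9")
b = Decimal("9.10")

larger = max(a, b)    # 9.9, since 9.9 = 9.90 > 9.10
total = a + b         # 19.00 (the transcript claims 18.00)
difference = a - b    # 0.80

if __name__ == "__main__":
    print(f"larger: {larger}, sum: {total}, difference: {difference}")
```

Note that the sum says nothing about which addend is larger; the comparison follows from 9.90 > 9.10 alone.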

For a GSM8k performance comparison: the base instruct model (Qwen2.5-1.5B-Instruct) scores 73.2% on the test set.

Disclaimer: GSM8k scores are currently computed only on the first 27% of the test set. They will be updated once the model has been evaluated on the full test set.
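The card does not describe how GSM8k answers were graded. As an assumption-laden sketch only: one common approach takes the last \boxed{...} in the model output (the format the model uses in the transcript above) and compares it numerically with the number after "####" that ends each GSM8K reference solution. All helper names below are hypothetical, and real evaluation harnesses normalize answers more carefully.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import List, Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j].strip()
    return None  # unbalanced braces: treat as no answer


def gsm8k_gold(solution: str) -> str:
    """GSM8K reference solutions end with '#### <final answer>'."""
    return solution.split("####")[-1].strip()


def to_number(s: str) -> Optional[Decimal]:
    """Parse a numeric answer after dropping commas, '$' and whitespace."""
    cleaned = re.sub(r"[,$\s]", "", s)
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None  # not a plain number (e.g. a LaTeX expression)


def is_correct(model_output: str, reference_solution: str) -> bool:
    pred = extract_boxed(model_output)
    if pred is None:
        return False
    p, g = to_number(pred), to_number(gsm8k_gold(reference_solution))
    return p is not None and g is not None and p == g


def accuracy(outputs: List[str], references: List[str]) -> float:
    if len(outputs) != len(references):
        raise ValueError("outputs and references must have the same length")
    if not outputs:
        return 0.0
    return sum(is_correct(o, r) for o, r in zip(outputs, references)) / len(outputs)
```

A partial evaluation like the one described above would apply accuracy to a prefix of the test set, for example the first int(0.27 * n) of n examples.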

Model weights: Safetensors, BF16, 1.54B params