---
license: apache-2.0
datasets:
- Gen-Verse/ReasonFlux-V2-Reasoner-DPO
language:
- en
- zh
base_model:
- Qwen/Qwen3-1.7B
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation-inference
- code
- trl
- DPO
---

# **ReasonFlux-Qwen3-dpo**
> **ReasonFlux-Qwen3-dpo** is a fine-tuned version of **Qwen3-1.7B**, trained on the [**Gen-Verse/ReasonFlux-V2-Reasoner-DPO**](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-V2-Reasoner-DPO) dataset.
> It adopts a **template-augmented reasoning paradigm**, internalizing structured **thought templates** through **iterative hierarchical reinforcement learning** and **direct preference optimization (DPO)**.
> This design enables the model to reason more transparently, consistently, and adaptively across multi-domain scientific and mathematical tasks.

> [!NOTE]
> GGUF: [https://huggingface.co/prithivMLmods/ReasonFlux-Qwen3-dpo-GGUF](https://huggingface.co/prithivMLmods/ReasonFlux-Qwen3-dpo-GGUF)

---
## **Key Features**
1. **Template-Augmented Reasoning**
   Incorporates structured **reasoning templates** that guide step-by-step thinking, with the aim of improving coherence and reducing hallucinations.
2. **DPO Fine-Tuning with Hierarchical Reinforcement**
   Combines **direct preference optimization** with **iterative reinforcement learning** to internalize high-quality reasoning behaviors.
3. **Scientific & Mathematical Expertise**
   Targets symbolic derivations, step-by-step proofs, and multi-domain STEM reasoning (physics, chemistry, biology, mathematics).
4. **Code Understanding & Generation**
   Provides detailed coding explanations, debugging support, and optimization hints across multiple programming languages.
5. **Structured Output Mastery**
   Produces **LaTeX**, **Markdown**, **JSON**, **CSV**, and **YAML** output for integration into research and technical workflows.
6. **Efficient Deployment**
   At 1.7B parameters, the model is compact enough for **mid-range GPUs**, **research clusters**, and **edge AI environments**.
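
The DPO component can be summarized by its standard per-example loss. The sketch below is a minimal plain-Python illustration of the usual sigmoid-form DPO objective, not this model's training code: the card publishes neither training scripts nor hyperparameters, so `dpo_loss`, `softplus`, and `beta=0.1` are illustrative assumptions. Inputs are the summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the policy being trained and under a frozen reference model. Minimizing the loss raises the policy's preference for the chosen response relative to the reference, without training a separate reward model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not a reported hyperparameter
) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin))."""
    # How much more (in log space) the policy favors each response than the reference does
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) is identical to softplus(-m)
    return softplus(-margin)


# Policy identical to the reference: margin is 0, so the loss equals log(2)
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy shifts probability toward the chosen response: loss drops below log(2)
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```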
---
## **Quickstart with Transformers**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/ReasonFlux-Qwen3-dpo"

# Load the model and tokenizer; "auto" picks the checkpoint dtype and the available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain how reinforcement learning differs from supervised learning with real-world examples."
messages = [
    {"role": "system", "content": "You are a reasoning tutor skilled in science, math, and coding."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and append the assistant-turn prefix
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated continuation is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
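Qwen3-based checkpoints can emit their reasoning inside `<think>...</think>` tags before the final answer. Assuming this fine-tune keeps that convention and that the tags survive `skip_special_tokens=True` decoding (both are assumptions, not stated in this card), a small helper like the hypothetical `split_thinking` below separates the trace from the answer in the `response` string produced by the quickstart. Long traces can be cut off by `max_new_tokens=512` before `</think>` appears; in that case the helper returns an empty answer, which signals that the limit should be raised.

```python
def split_thinking(
    response: str, open_tag: str = "<think>", close_tag: str = "</think>"
) -> tuple[str, str]:
    """Split a decoded response into (reasoning, answer).

    Hypothetical helper: assumes reasoning is wrapped in <think>...</think>.
    """
    head, sep, tail = response.rpartition(close_tag)
    if not sep:
        stripped = response.lstrip()
        # Opening tag but no closing tag: generation stopped mid-reasoning
        if stripped.startswith(open_tag):
            return stripped[len(open_tag):].strip(), ""
        # No tags at all: treat the whole response as the answer
        return "", response.strip()
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


reasoning, answer = split_thinking(
    "<think>\nRL learns from rewards; SL learns from labels.\n</think>\n\n"
    "Reinforcement learning optimizes long-term reward through trial and error."
)
print(answer)
```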
---
## **Intended Use**
* Advanced reasoning tutor for mathematics, coding, and scientific research
* Research assistant capable of structured problem-solving with template-guided reasoning
* Technical documentation and structured data generation
* STEM-focused chatbot or API for research and education workflows
* Deployment in environments requiring transparent reasoning with efficient compute use
## **Limitations**
* Not optimized for casual or creative writing
* Context limitations may restrict multi-document or full codebase comprehension
* Specializes in structured reasoning, so general chit-chat may underperform
* Optimized for **clarity of reasoning** rather than **natural conversational tone**