	---
	license: apache-2.0
	datasets:
	- Gen-Verse/ReasonFlux-V2-Reasoner-DPO
	language:
	- en
	- zh
	base_model:
	- Qwen/Qwen3-1.7B
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- text-generation-inference
	- code
	- trl
	- DPO
	---

	![1.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/JdFSTIRr6eR0sp9hJ7xAV.png)

	# ReasonFlux-Qwen3-dpo

	> ReasonFlux-Qwen3-dpo is a fine-tuned version of Qwen3-1.7B, trained on the [Gen-Verse/ReasonFlux-V2-Reasoner-DPO](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-V2-Reasoner-DPO) dataset.
	> It adopts a template-augmented reasoning paradigm, internalizing structured thought templates through iterative hierarchical reinforcement learning and direct preference optimization (DPO).
	> This design is intended to make the model's reasoning more transparent, consistent, and adaptable across scientific and mathematical tasks in multiple domains.

	> [!note]
	> GGUF: [https://huggingface.co/prithivMLmods/ReasonFlux-Qwen3-dpo-GGUF](https://huggingface.co/prithivMLmods/ReasonFlux-Qwen3-dpo-GGUF)

	---

	## Key Features

	1. Template-Augmented Reasoning
	Incorporates structured reasoning templates that guide step-by-step thinking, improving coherence and reducing hallucinations.

	2. DPO Fine-Tuning with Hierarchical Reinforcement
	Leverages direct preference optimization along with iterative reinforcement learning, internalizing high-quality reasoning behaviors.

	3. Scientific & Mathematical Expertise
	Targets symbolic derivations, step-by-step proofs, and STEM reasoning across physics, chemistry, biology, and mathematics.

	4. Code Understanding & Generation
	Provides detailed coding explanations, debugging support, and optimization hints across multiple programming languages.

	5. Structured Output Mastery
	Produces output in LaTeX, Markdown, JSON, CSV, and YAML for integration into research and technical workflows.

	6. Efficient Deployment
	Built on a 1.7B-parameter base model, so it is small enough for mid-range GPUs, research clusters, and edge AI environments.
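
To make the DPO objective named in feature 2 concrete, here is a minimal sketch of the standard per-pair DPO loss in plain Python. It is not this model's training code: the function name `dpo_loss`, the argument names, and `beta=0.1` are illustrative assumptions, and the card does not state the hyperparameters actually used. The inputs are summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard per-pair DPO loss (illustrative sketch, hypothetical names).

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    # How much more likely each response is under the policy than the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# When the policy matches the reference, the margin is 0 and the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Raising the chosen response and lowering the rejected one reduces the loss.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(baseline, improved)
```

The loss falls as the policy raises the chosen response's likelihood relative to the reference faster than the rejected one's, which is how preference pairs steer the model toward the preferred reasoning traces.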

	---

	## Quickstart with Transformers

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "prithivMLmods/ReasonFlux-Qwen3-dpo"

	# Load the weights in their stored dtype and place them on the available device(s).
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto",
	)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	prompt = "Explain how reinforcement learning differs from supervised learning with real-world examples."

	messages = [
	    {"role": "system", "content": "You are a reasoning tutor skilled in science, math, and coding."},
	    {"role": "user", "content": prompt},
	]

	# Render the conversation with the model's chat template and append the
	# assistant prefix so that generation starts a new assistant turn.
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)

	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=512,
	)
	# Drop the prompt tokens so that only the newly generated tokens are decoded.
	generated_ids = [
	    output_ids[len(input_ids):]
	    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	print(response)
	```
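
The base Qwen3 models wrap their intermediate reasoning in `<think>...</think>` before the final answer. Assuming this fine-tune keeps that convention (the card does not confirm it), the decoded `response` from the quickstart can be split into reasoning and answer with a small helper. The name `split_thinking` is hypothetical, and this is a sketch rather than an official parsing routine.

```python
def split_thinking(text, close_tag="</think>"):
    """Split decoded output into (reasoning, answer).

    Assumes Qwen3-style output in which reasoning is wrapped in
    <think>...</think>. If no closing tag is found, the whole text
    is returned as the answer.
    """
    head, sep, tail = text.rpartition(close_tag)
    if not sep:
        # No closing tag: either thinking is disabled or generation was cut off.
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


sample = "<think>\n2 + 2 = 4\n</think>\n\nThe answer is 4."
reasoning, answer = split_thinking(sample)
print(answer)
```

With `max_new_tokens=512`, a long reasoning trace may be truncated before the closing tag appears; in that case the helper returns the full text as the answer, so a larger token budget may be needed for harder prompts.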

	---

	## Intended Use

	* Advanced reasoning tutor for mathematics, coding, and scientific research
	* Research assistant capable of structured problem-solving with template-guided reasoning
	* Technical documentation and structured data generation
	* STEM-focused chatbot or API for research and education workflows
	* Deployment in environments requiring transparent reasoning with efficient compute use
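
For the structured data generation use case, downstream code usually needs to pull a machine-readable object out of a free-form reply. The sketch below uses only Python's standard `json` module to find the first valid JSON object in a response string. The helper name `extract_json` is hypothetical, and the approach assumes the model was prompted to emit a JSON object somewhere in its answer.

```python
import json


def extract_json(response):
    """Return the first JSON object embedded in a response, or None."""
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any text that follows it.
            obj, _end = decoder.raw_decode(response, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one.
            start = response.find("{", start + 1)
    return None


reply = 'Use the set {a} as input. Result: {"x": 1, "y": [2, 3]} as requested.'
print(extract_json(reply))
```

Returning `None` instead of raising lets the caller decide whether to retry the prompt or fall back to plain text when the reply contains no parseable object.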

	## Limitations

	* Not optimized for casual or creative writing
	* The limited context window may restrict multi-document or full-codebase comprehension
	* Specializes in structured reasoning; performance on general chit-chat may be weaker
	* Optimized for clarity of reasoning rather than natural conversational tone