Instructions for using AIPlans/Qwen3-0.6B-DPO_NOTLORA with libraries and local apps.

Libraries

How to use AIPlans/Qwen3-0.6B-DPO_NOTLORA with Transformers:

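```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AIPlans/Qwen3-0.6B-DPO_NOTLORA")
messages = [
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(messages)
# For chat-style input, "generated_text" holds the conversation with the new assistant turn appended
print(outputs[0]["generated_text"][-1]["content"])
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AIPlans/Qwen3-0.6B-DPO_NOTLORA")
model = AutoModelForCausalLM.from_pretrained("AIPlans/Qwen3-0.6B-DPO_NOTLORA")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize in one step
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Qwen3 models may begin a reply with a <think> block, so 40 new tokens can be too few
outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```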

Local Apps

vLLM

How to use AIPlans/Qwen3-0.6B-DPO_NOTLORA with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AIPlans/Qwen3-0.6B-DPO_NOTLORA"
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AIPlans/Qwen3-0.6B-DPO_NOTLORA",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
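The same chat request can be sent from Python without third-party packages. This is a minimal sketch using only the standard library's `urllib.request`; the helper names `build_chat_request` and `chat` are hypothetical, and it assumes an OpenAI-compatible server is already running (port 8000 for `vllm serve` by default, port 30000 for the SGLang command in the next section) and that the reply follows the usual `choices[0].message.content` layout.

```python
import json
import urllib.request

MODEL_ID = "AIPlans/Qwen3-0.6B-DPO_NOTLORA"


def build_chat_request(base_url: str, prompt: str, model: str = MODEL_ID) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request to a running server and return the assistant's reply text."""
    request = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers place the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

For example, with the vLLM server running, `chat("http://localhost:8000", "What is the capital of France?")` returns the assistant's answer as a string. Building the request separately from sending it keeps the payload easy to inspect or reuse against either server.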

Use Docker

```bash
docker model run hf.co/AIPlans/Qwen3-0.6B-DPO_NOTLORA
```

SGLang

How to use AIPlans/Qwen3-0.6B-DPO_NOTLORA with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AIPlans/Qwen3-0.6B-DPO_NOTLORA" \
    --host 0.0.0.0 \
    --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AIPlans/Qwen3-0.6B-DPO_NOTLORA",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AIPlans/Qwen3-0.6B-DPO_NOTLORA" \
        --host 0.0.0.0 \
        --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AIPlans/Qwen3-0.6B-DPO_NOTLORA",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```


Qwen3-0.6B-DPO


This model is a fine-tuned variant of Qwen/Qwen3-0.6B, trained with Direct Preference Optimization (DPO) on a preference-pair version of the nvidia/HelpSteer2 dataset as part of the AIPlans Model Diffing Project.

Model Details

Model Description

This model is a 0.6B-parameter language model based on Qwen3-0.6B and fine-tuned with DPO.
The goal of the fine-tuning was to improve helpfulness and harmlessness, as reflected in the HelpSteer2 preference data, while enabling controlled model-diffing experiments within the AIPlans research workflow.

Special attention was paid to training efficiency, including gradient checkpointing and other memory-saving strategies.

Developed by: AIPlans
Funded by: AIPlans
Shared by: AIPlans

Model type: Causal decoder-only Transformer (LLM)
Languages: English
License: MIT
Fine-tuned from: Qwen/Qwen3-0.6B
Training Method: Direct Preference Optimization (DPO)
Intended Use: Research on model diffing, preference fine-tuning, and evaluation of behavior changes in lightweight LLMs.
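The repository name (`_NOTLORA`) suggests a full fine-tune rather than a LoRA adapter, so every weight may differ from the base model. One simple starting point for weight-space model diffing is the relative L2 change per parameter tensor, ‖θ_DPO − θ_base‖ / ‖θ_base‖. The sketch below assumes both checkpoints have already been flattened into dictionaries mapping parameter names to lists of floats; the function `relative_weight_change` is hypothetical and is not the method used in the AIPlans repository.

```python
import math
from typing import Dict, List


def relative_weight_change(
    base: Dict[str, List[float]], tuned: Dict[str, List[float]]
) -> Dict[str, float]:
    """Return ||tuned - base|| / ||base|| for every parameter name present in both models."""
    changes: Dict[str, float] = {}
    for name, base_values in base.items():
        if name not in tuned:
            continue  # skip parameters missing from the tuned checkpoint
        tuned_values = tuned[name]
        if len(tuned_values) != len(base_values):
            raise ValueError(f"shape mismatch for {name}")
        diff_norm = math.sqrt(sum((t - b) ** 2 for t, b in zip(tuned_values, base_values)))
        base_norm = math.sqrt(sum(b * b for b in base_values))
        if base_norm > 0:
            changes[name] = diff_norm / base_norm
        else:
            # An all-zero base tensor (e.g. a zero bias): report inf only if it actually moved
            changes[name] = float("inf") if diff_norm > 0 else 0.0
    return changes


# Toy example with made-up values, largest change first
base = {"layer.weight": [3.0, 4.0], "layer.bias": [0.0, 0.0]}
tuned = {"layer.weight": [3.0, 4.5], "layer.bias": [0.0, 0.0]}
changes = relative_weight_change(base, tuned)
for name, change in sorted(changes.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {change:.3f}")
```

With PyTorch state dicts, the same per-tensor quantity is `((t - b).norm() / b.norm()).item()`; sorting by it shows which layers the DPO update moved most.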

Model Sources

Repository: https://github.com/AI-Plans/Model-Diffing/tree/main/DPOTrainer
DPO Paper: https://arxiv.org/abs/2305.18290
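The DPO objective from the paper linked above scores each preference pair by how much more the policy π_θ raises the chosen response y_w than the rejected response y_l, relative to a frozen reference model π_ref: loss = −log σ(β · [(log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x))]). This minimal sketch computes that per-pair loss from summed sequence log-probabilities; `beta=0.1` is an illustrative value, not necessarily the setting used to train this model.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one preference pair, given summed log-probs of each full response."""
    # Log-ratio of policy to reference for the chosen and rejected responses
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the policy favors the chosen response more than the reference does
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))
```

When the policy and reference assign identical log-probabilities, the loss is log 2 ≈ 0.693; it drops below that as the policy shifts probability toward the chosen response. In training, these log-probabilities come from forward passes of the model being tuned and the frozen reference, and the loss is averaged over a batch.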

Training Details

Training Data

The training data is taken from Jennny/helpsteer2-helpfulness-preference. Thanks to Jennny.
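HelpSteer2 rates each response with per-attribute scores such as `helpfulness`, and prompts typically come with two responses. One plausible way to turn such ratings into DPO pairs is to take the higher-helpfulness response as `chosen` and the lower one as `rejected`, dropping ties. This sketch is hypothetical: it does not reproduce how Jennny/helpsteer2-helpfulness-preference was actually built, and the `prompt`/`chosen`/`rejected` output layout is assumed from common DPO training setups.

```python
from collections import defaultdict
from typing import Dict, List


def to_preference_pairs(rows: List[Dict]) -> List[Dict[str, str]]:
    """Group rated responses by prompt and emit (chosen, rejected) pairs by helpfulness."""
    by_prompt: Dict[str, List[Dict]] = defaultdict(list)
    for row in rows:
        by_prompt[row["prompt"]].append(row)

    pairs = []
    for prompt, responses in by_prompt.items():
        if len(responses) < 2:
            continue  # nothing to compare against
        ranked = sorted(responses, key=lambda r: r["helpfulness"], reverse=True)
        best, worst = ranked[0], ranked[-1]
        if best["helpfulness"] == worst["helpfulness"]:
            continue  # a tie carries no preference signal
        pairs.append({"prompt": prompt, "chosen": best["response"], "rejected": worst["response"]})
    return pairs
```

Dropping ties and single-response prompts keeps only pairs with an actual preference signal, at the cost of discarding some data.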

Evaluation

Below is a comparison between the base Qwen3-0.6B model and our DPO-trained version (trained using HelpSteer2 preference data).

Evaluation Results

The model was evaluated with EleutherAI's lm-evaluation-harness on several reasoning and truthfulness benchmarks: ARC-Challenge, ARC-Easy, HellaSwag, TruthfulQA (MC2), and Winogrande.

📊 Benchmark Comparison


| Task | Metric | Base Model | DPO Model | Change |
|---|---|---|---|---|
| ARC-Challenge | acc | 0.3148 | 0.3208 | +0.0060 |
| ARC-Challenge | acc_norm | 0.3447 | 0.3430 | −0.0017 |
| ARC-Easy | acc | 0.6044 | 0.6069 | +0.0025 |
| ARC-Easy | acc_norm | 0.5589 | 0.5610 | +0.0021 |
| HellaSwag | acc | 0.3751 | 0.3782 | +0.0031 |
| HellaSwag | acc_norm | 0.4738 | 0.4799 | +0.0061 |
| TruthfulQA MC2 | acc | 0.4275 | 0.4335 | +0.0060 |
| Winogrande | acc | 0.5604 | 0.5627 | +0.0023 |
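The Change column is the DPO score minus the base score. The short check below recomputes it from the benchmark values and reports the mean change across the eight rows; it uses only the numbers already listed and adds no new measurements.

```python
# (task, metric, base score, DPO score, reported change) copied from the benchmark comparison
ROWS = [
    ("ARC-Challenge", "acc", 0.3148, 0.3208, 0.0060),
    ("ARC-Challenge", "acc_norm", 0.3447, 0.3430, -0.0017),
    ("ARC-Easy", "acc", 0.6044, 0.6069, 0.0025),
    ("ARC-Easy", "acc_norm", 0.5589, 0.5610, 0.0021),
    ("HellaSwag", "acc", 0.3751, 0.3782, 0.0031),
    ("HellaSwag", "acc_norm", 0.4738, 0.4799, 0.0061),
    ("TruthfulQA MC2", "acc", 0.4275, 0.4335, 0.0060),
    ("Winogrande", "acc", 0.5604, 0.5627, 0.0023),
]

deltas = []
for task, metric, base, dpo, reported in ROWS:
    delta = round(dpo - base, 4)
    # Each recomputed difference should match the reported Change value
    assert abs(delta - reported) < 1e-9, (task, metric, delta, reported)
    deltas.append(delta)
    print(f"{task:<15} {metric:<9} {delta:+.4f}")

mean_change = sum(deltas) / len(deltas)
print(f"mean change: {mean_change:+.4f}")
```

Seven of the eight rows improve, with a mean change of about +0.0033 (roughly a third of a percentage point).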