Instructions to use AIPlans/Qwen3-0.6B-DPO_NOTLORA with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use AIPlans/Qwen3-0.6B-DPO_NOTLORA with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AIPlans/Qwen3-0.6B-DPO_NOTLORA")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AIPlans/Qwen3-0.6B-DPO_NOTLORA")
model = AutoModelForCausalLM.from_pretrained("AIPlans/Qwen3-0.6B-DPO_NOTLORA")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use AIPlans/Qwen3-0.6B-DPO_NOTLORA with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "AIPlans/Qwen3-0.6B-DPO_NOTLORA"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AIPlans/Qwen3-0.6B-DPO_NOTLORA",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
```bash
docker model run hf.co/AIPlans/Qwen3-0.6B-DPO_NOTLORA
```
- SGLang
How to use AIPlans/Qwen3-0.6B-DPO_NOTLORA with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "AIPlans/Qwen3-0.6B-DPO_NOTLORA" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AIPlans/Qwen3-0.6B-DPO_NOTLORA",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "AIPlans/Qwen3-0.6B-DPO_NOTLORA" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AIPlans/Qwen3-0.6B-DPO_NOTLORA",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use AIPlans/Qwen3-0.6B-DPO_NOTLORA with Docker Model Runner:
```bash
docker model run hf.co/AIPlans/Qwen3-0.6B-DPO_NOTLORA
```
Qwen3-0.6B-DPO
Model Card for AIPlans/Qwen3-0.6B-DPO_NOTLORA
This model is a fine-tuned variant of Qwen/Qwen3-0.6B, trained using Direct Preference Optimization (DPO) on a preference-form version of the nvidia/HelpSteer2 dataset as part of the AIPlans Model Diffing Project.
Model Details
Model Description
This model is a 0.6B parameter language model based on Qwen3-0.6B and fine-tuned using DPO for preference optimization.
The goal of the fine-tuning was to improve helpfulness and harmlessness as measured by the HelpSteer2 preference dataset, while enabling controlled model diffing experiments within the AIPlans research workflow.
Special attention was paid to training efficiency, including gradient checkpointing and other memory-saving strategies.
Developed by: AIPlans
Funded by: AIPlans
Shared by: AIPlans
Model type: Causal decoder-only Transformer (LLM)
Languages: English
License: MIT
Fine-tuned from: Qwen/Qwen3-0.6B
Training Method: Direct Preference Optimization (DPO)
Intended Use: Research on model diffing, preference fine-tuning, evaluation of lightweight LLM behavior changes.
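For readers unfamiliar with the training method, the DPO objective for a single preference pair can be sketched in a few lines. This is a minimal illustration of the loss from the DPO paper, not the project's training code: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for the example, and the card does not state which `beta` was used.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an illustrative default, not a value taken from the card.
    """
    # How much more the policy favors each response than the reference does
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin)
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is zero and the loss is log(2)
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

The loss falls below log(2) once the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model, and rises above it in the opposite case.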
Model Sources
- Repository: https://github.com/AI-Plans/Model-Diffing/tree/main/DPOTrainer
- DPO Paper: https://arxiv.org/abs/2305.18290
Training Details
Training Data
The training data comes from Jennny/helpsteer2-helpfulness-preference, a preference-form version of nvidia/HelpSteer2. Thanks to Jennny for preparing it.
Evaluation
Below is a comparison between the base Qwen3-0.6B model and our DPO-trained version (trained using HelpSteer2 preference data).
Evaluation Results
The model was evaluated using lm-eval-harness on multiple reasoning and truthfulness benchmarks.
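A run of this kind could be reproduced roughly as follows. The sketch only builds the `lm_eval` command line for each model. The task identifiers are the harness names that appear to correspond to the benchmarks reported here, and the batch size is an assumption, since the card does not state the exact evaluation settings (few-shot counts, batch size, harness version). The helper `build_lm_eval_command` is hypothetical.

```python
import shlex

# Harness task names assumed to match the benchmarks listed in this card
TASKS = ("arc_challenge", "arc_easy", "hellaswag", "truthfulqa_mc2", "winogrande")


def build_lm_eval_command(model_id, tasks=TASKS, batch_size=8):
    """Return the lm_eval CLI invocation as an argument list.

    batch_size=8 is an illustrative choice, not a setting from the card.
    """
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(tasks),
        "--batch_size", str(batch_size),
    ]


# Print one command per model; run them in a shell with lm-eval installed
for model_id in ("Qwen/Qwen3-0.6B", "AIPlans/Qwen3-0.6B-DPO_NOTLORA"):
    print(shlex.join(build_lm_eval_command(model_id)))
```

Running both models with identical settings is what makes the base-versus-DPO comparison meaningful.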
Benchmark Comparison
| Task | Metric | Base Model | DPO Model | Change |
|---|---|---|---|---|
| ARC-Challenge | acc | 0.3148 | 0.3208 | +0.0060 |
| ARC-Challenge | acc_norm | 0.3447 | 0.3430 | −0.0017 |
| ARC-Easy | acc | 0.6044 | 0.6069 | +0.0025 |
| ARC-Easy | acc_norm | 0.5589 | 0.5610 | +0.0021 |
| HellaSwag | acc | 0.3751 | 0.3782 | +0.0031 |
| HellaSwag | acc_norm | 0.4738 | 0.4799 | +0.0061 |
| TruthfulQA MC2 | acc | 0.4275 | 0.4335 | +0.0060 |
| Winogrande | acc | 0.5604 | 0.5627 | +0.0023 |
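The Change column can be recomputed directly from the two score columns. The snippet below is a small sanity check using only the numbers in the table. The variable and function names are illustrative.

```python
# (task, metric, base score, DPO score), copied from the table above
RESULTS = [
    ("ARC-Challenge", "acc", 0.3148, 0.3208),
    ("ARC-Challenge", "acc_norm", 0.3447, 0.3430),
    ("ARC-Easy", "acc", 0.6044, 0.6069),
    ("ARC-Easy", "acc_norm", 0.5589, 0.5610),
    ("HellaSwag", "acc", 0.3751, 0.3782),
    ("HellaSwag", "acc_norm", 0.4738, 0.4799),
    ("TruthfulQA MC2", "acc", 0.4275, 0.4335),
    ("Winogrande", "acc", 0.5604, 0.5627),
]


def compute_changes(results):
    """Return (task, metric, dpo - base) rounded to four decimals."""
    return [(task, metric, round(dpo - base, 4))
            for task, metric, base, dpo in results]


changes = compute_changes(RESULTS)
mean_change = round(sum(delta for _, _, delta in changes) / len(changes), 4)

for task, metric, delta in changes:
    print(f"{task:15s} {metric:9s} {delta:+.4f}")
print(f"mean change: {mean_change:+.4f}")
```

Every change is below one percentage point, and the card does not report standard errors, so these figures alone do not show whether the differences are statistically significant.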
Model Card Authors
Jithesh Pavan D Souza – AIPlans Research Intern
Model Card Contact
Jithesh – jithesh1602@gmail.com