dpo-qlora-Qwen1.5-0.5B-Chat-xtuner is a DPO model fine-tuned from Qwen/Qwen1.5-0.5B-Chat. It was trained with direct preference optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset.
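
For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not taken from this model's training code: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, and the actual hyperparameters used for this model are not stated here. The inputs are the summed log-probabilities of the chosen and rejected responses under the policy being trained and under the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    All names and the default beta are hypothetical, for illustration only.
    """
    # How much more (in log space) the policy favors each response
    # compared with the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin between chosen and rejected responses.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)

# When the policy raises the chosen response and lowers the rejected one,
# the loss falls below that baseline.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
```

The loss shrinks as the policy assigns relatively more probability to the chosen response than the reference model does, which is the behavior that preference pairs such as those in ultrafeedback_binarized are meant to encourage.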

Limitations of dpo-qlora-Qwen1.5-0.5B-Chat-xtuner

  • May Generate Inaccurate Code and Facts: The model may produce incorrect code snippets and factual statements. Users should treat its outputs as suggestions or starting points, not as definitive or verified solutions.

  • Unreliable Responses to Instructions: The model has not undergone instruction fine-tuning. As a result, it may struggle with, or fail to follow, intricate or nuanced user instructions.
