--- base_model: Qwen/Qwen2-0.5B-Instruct datasets: dataset_name library_name: transformers model_name: online-dpo-qwen2-3 tags: - trl - online-dpo - generated_from_trainer licence: license --- # Model Card for online-dpo-qwen2-3 This model is a fine-tuned version of [Qwen/Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) on the https://huggingface.co/datasets/trl-lib/ultrafeedback-prompt dataset.