🔥 Community and Data Quality Matter More for Alignment
A recipe to replicate SPIN (Self-Play Fine-Tuning) with 30x less data (rough sketch below):
🗣️ 50K samples vs. 1.8K prompts curated by the 350+ amazing DIBT contributors
⚗️ Distillation of Mistral Large instead of OpenAI
🙌 Open data & code with ⚗️ distilabel
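To make the recipe concrete, here is a minimal sketch of what "iteration 0" could look like: keep only the highest-rated community prompts from the DIBT dataset, distil "real" completions from Mistral Large, and pair them with the current model's own generations for SPIN. This is not the published code (which is built with distilabel); the rating column name, the threshold, and the Mistral client usage are assumptions.

```python
# Sketch of SPIN iteration 0 on DIBT prompts, assuming the >=1.0 mistralai client.
import os

from datasets import load_dataset
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# ~10K community-rated prompts; keep only the top-rated ones (~1.8K).
prompts = load_dataset("DIBT/10k_prompts_ranked", split="train")
prompts = prompts.filter(lambda row: row["avg_rating"] >= 4)  # assumed column/threshold


def distil_completion(prompt: str) -> str:
    """Teacher side: Mistral Large provides the 'real' (chosen) completion."""
    response = client.chat.complete(
        model="mistral-large-latest",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def student_completion(prompt: str) -> str:
    """Placeholder: the model being fine-tuned generates the 'synthetic' (rejected) side."""
    raise NotImplementedError("plug in your current SFT checkpoint here")


# SPIN-style pairs: the fine-tuned model learns to prefer the real completions
# over its own generations; the student side is regenerated at each iteration.
spin_pairs = [
    {
        "prompt": row["prompt"],
        "real": distil_completion(row["prompt"]),
        "generated": student_completion(row["prompt"]),
    }
    for row in prompts
]
```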