— Long-context post-training 🧶 —
Resources for post-training LLMs with long-context samples
zai-org/LongAlign-10k Viewer • Updated Feb 22, 2024 • 9.89k • 5.33k • 79
HuggingFaceTB/smoltalk2 Viewer • Updated Jul 11 • 8.61M • 30.4k • 103
zai-org/LongReward-10k Viewer • Updated Oct 29, 2024 • 30k • 532 • 6
Tongyi-Zhiwen/DocQA-RL-1.6K Viewer • Updated May 23 • 3.6k • 245 • 35

Mistral 7B + UltraChat + Arithmo checkpoints
A collection of Mistral 7B fine-tunes on UltraChat and Arithmo to boost the math capabilities of chat models. See https://x.com/_lewtun/status/1715652
lewtun/mistral-7b-sft-ultrachat-arithmo-full Text Generation • Updated Oct 21, 2023 • 10 • 1
lewtun/mistral-7b-sft-ultrachat-arithmo-50 Text Generation • Updated Oct 21, 2023 • 8 • 1
lewtun/mistral-7b-sft-ultrachat-arithmo-25 Text Generation • Updated Oct 21, 2023 • 6
stingning/ultrachat Viewer • Updated Feb 22, 2024 • 774k • 1.94k • 454

Gemma RLAIF
lewtun/gemma-7b-sft-full-ultrachat-v0 Text Generation • 9B • Updated Feb 29, 2024 • 6 • 1
lewtun/gemma-7b-sft-full-dolly-v3 Text Generation • 9B • Updated Feb 29, 2024 • 6
lewtun/gemma-7b-sft-full-deita-10k-v0 Text Generation • 9B • Updated Feb 29, 2024 • 5
lewtun/gemma-7b-dpo-full-ultrafeedback-v0 Text Generation • Updated Feb 29, 2024 • 6

Awesome RLHF
A curated collection of datasets, models, Spaces, and papers on Reinforcement Learning from Human Feedback (RLHF).
MT Bench 📊 Compare model answers to questions • Running • 197
garage-bAInd/Open-Platypus Viewer • Updated Jan 24, 2024 • 24.9k • 5.28k • 403
meta-llama/Llama-2-7b-chat-hf Text Generation • 7B • Updated Apr 17, 2024 • 1.05M • 4.59k
meta-llama/Llama-2-70b-chat-hf Text Generation • 69B • Updated Apr 17, 2024 • 77.6k • 2.2k