JudgeBench: A Benchmark for Evaluating LLM-based Judges Paper • 2410.12784 • Published 21 days ago • 41
Article: Experimenting with different training objectives for an AI evaluator By kaikaidai • 6 days ago • 2
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs Paper • 2410.18451 • Published 14 days ago • 13
Article: LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!) By wolfram • Apr 24 • 58
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19 • 133
Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering Paper • 2408.07888 • Published Aug 15 • 10