Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training Paper • 2401.05566 • Published Jan 10, 2024 • 25
JudgeLM: Fine-tuned Large Language Models are Scalable Judges Paper • 2310.17631 • Published Oct 26, 2023 • 32
Instruction Tuning for Large Language Models: A Survey Paper • 2308.10792 • Published Aug 21, 2023 • 1
An Empirical Study of LLM-as-a-Judge for LLM Evaluation: Fine-tuned Judge Models are Task-specific Classifiers Paper • 2403.02839 • Published Mar 5, 2024 • 1
Holistic Safety and Responsibility Evaluations of Advanced AI Models Paper • 2404.14068 • Published Apr 22, 2024
A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications Paper • 2310.17750 • Published Oct 26, 2023 • 9