Reading List - a jinnovation Collection

Reading List

updated May 30

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Paper • 2401.05566 • Published Jan 10 • 25

Note: TLDR: Anthropic deliberately trains deceptive LLMs to test whether the deception survives safety training.
On the Societal Impact of Open Foundation Models

Paper • 2403.07918 • Published Feb 27 • 16
JudgeLM: Fine-tuned Large Language Models are Scalable Judges

Paper • 2310.17631 • Published Oct 26, 2023 • 32
Instruction Tuning for Large Language Models: A Survey

Paper • 2308.10792 • Published Aug 21, 2023 • 1
An Empirical Study of LLM-as-a-Judge for LLM Evaluation: Fine-tuned Judge Models are Task-specific Classifiers

Paper • 2403.02839 • Published Mar 5 • 1
Holistic Safety and Responsibility Evaluations of Advanced AI Models

Paper • 2404.14068 • Published Apr 22
A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications

Paper • 2310.17750 • Published Oct 26, 2023 • 9