DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails Paper • 2502.05163 • Published 5 days ago • 18
BEEAR Collection These models are used for re-implementation of our paper: "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction" • 8 items • Updated Jun 28, 2024 • 1
Introducing v0.5 of the AI Safety Benchmark from MLCommons Paper • 2404.12241 • Published Apr 18, 2024 • 11
RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content Paper • 2403.13031 • Published Mar 19, 2024 • 1