DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails (Paper) • 2502.05163 • Published 5 days ago • 18
BEEAR (Collection) • These models are used to re-implement our paper, "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction" • 8 items • Updated Jun 28, 2024 • 1