A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection Paper • 2411.12946 • Published Nov 20, 2024 • 20
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17, 2024 • 75
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique Paper • 2408.10701 • Published Aug 20, 2024 • 12
WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models Paper • 2408.03837 • Published Aug 7, 2024 • 18