Running in CIRCLE? A Simple Benchmark for LLM Code Interpreter Security Paper • 2507.19399 • Published Jul 25 • 1
LionGuard 2: Building Lightweight, Data-Efficient & Localised Multilingual Content Moderators Paper • 2507.15339 • Published Jul 21
Toxicity-Aware Few-Shot Prompting for Low-Resource Singlish Translation Paper • 2507.11966 • Published Jul 16
Measuring What Matters: A Framework for Evaluating Safety Risks in Real-World LLM Applications Paper • 2507.09820 • Published Jul 13
RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages Paper • 2507.05980 • Published Jul 8 • 1
MinorBench: A hand-built benchmark for content-based risks for children Paper • 2503.10242 • Published Mar 13 • 5
A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection Paper • 2411.12946 • Published Nov 20, 2024 • 22