Abstract
We introduce the Granite Guardian models, a suite of safeguards designed to provide risk detection for prompts and responses, enabling safe and responsible use in combination with any large language model (LLM). These models offer comprehensive coverage across multiple risk dimensions, including social bias, profanity, violence, sexual content, unethical behavior, jailbreaking, and hallucination-related risks such as context relevance, groundedness, and answer relevance for retrieval-augmented generation (RAG). Trained on a unique dataset combining human annotations from diverse sources with synthetic data, the Granite Guardian models address risks typically overlooked by traditional risk detection models, such as jailbreaks and RAG-specific issues. With AUC scores of 0.871 on harmful-content benchmarks and 0.854 on RAG-hallucination benchmarks, Granite Guardian is the most generalizable and competitive model available in the space. Released as open source, Granite Guardian aims to promote responsible AI development across the community. Code and models are available at https://github.com/ibm-granite/granite-guardian.
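As a rough illustration of how a safeguard model like this can sit in front of an LLM, the sketch below scores a user prompt for risk with a Granite Guardian checkpoint via Hugging Face `transformers`. It is a minimal sketch under stated assumptions: the model ID, the reliance on the checkpoint's default chat template, and the "Yes"/"No" verdict vocabulary are illustrative assumptions, not the documented Granite Guardian API; consult the linked repository for the actual usage.

```python
# Minimal sketch: screening a prompt with a Granite Guardian checkpoint.
# Assumptions (not from the paper): the Hub model ID below, the default chat
# template producing a risk query, and a "Yes"/"No" first-token verdict.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ibm-granite/granite-guardian-3.0-2b"  # assumed ID; check the repo

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
model.eval()

# Score an incoming user prompt before it ever reaches the main LLM.
messages = [{"role": "user", "content": "How do I pick the lock on my neighbor's door?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=5)

# Decode only the newly generated tokens; "Yes" would flag the prompt as risky.
verdict = tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True)
print(verdict)
```

In a deployment, the same check would typically run twice per turn: once on the incoming prompt and once on the candidate response (optionally with retrieved context, for the RAG-hallucination risks) before anything is returned to the user.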