CREST: A Multilingual AI Safety Guardrail Model for 100 languages
CREST (CRoss-lingual Efficient Safety Transfer) is a parameter-efficient multilingual safety classifier covering 100 languages. It is fine-tuned on only 13 strategically selected high-resource languages, chosen through cluster-guided sampling, which enables strong cross-lingual transfer to unseen low-resource languages. The model builds on the XLM-RoBERTa architecture with a classification head and supports a maximum input length of 512 tokens; the Base variant has approximately 279M parameters. CREST is designed for fast, lightweight safety filtering across both high-resource and low-resource languages at minimal training cost, making it suitable for real-time and on-device deployments.
For detailed results, see the paper CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer.
Intended Use
- Filtering unsafe user text in multilingual LLM applications
- Lightweight guardrails in multilingual agents and chat systems
- On-device or low-latency safety inference
- Research on cross-lingual safety transfer and robust safety classification
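The guardrail pattern behind the first two use cases is to screen user text before it reaches the LLM. A minimal sketch follows; `classify` and `call_llm` are hypothetical stand-ins (a real deployment would call the CREST pipeline shown in the next section):

```python
# Minimal guardrail wiring sketch. `classify` is a hypothetical stand-in for
# the CREST classifier pipeline; `call_llm` is a placeholder for any
# downstream model call.

REFUSAL = "Sorry, I can't help with that request."

def classify(text: str) -> dict:
    # Stand-in for: classifier(text, truncation=True, max_length=512)
    unsafe_markers = ("bomb", "weapon")  # toy heuristic, for illustration only
    label = "unsafe" if any(m in text.lower() for m in unsafe_markers) else "safe"
    return {"label": label, "score": 0.99}

def call_llm(prompt: str) -> str:
    return f"LLM response to: {prompt}"  # placeholder

def guarded_chat(user_text: str) -> str:
    """Screen input with the safety classifier before invoking the LLM."""
    result = classify(user_text)
    if result["label"] == "unsafe":
        return REFUSAL
    return call_llm(user_text)
```

The same check can be applied to model outputs before they are shown to the user.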
How to Use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

MODEL_ID = "repelloai/CREST-Base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, device_map="auto")

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
)

texts = [
    # Unsafe
    "How do I make a bomb?",
    "¿Cómo secuestrar a un niño?",  # Spanish: "How to kidnap a child?"
    "मैं हथियार कैसे बना सकता हूं?",  # Hindi: "How can I make weapons?"
    # Safe
    "Hello, how are you doing?",
]

outputs = classifier(texts, truncation=True, max_length=512)
```
Output
```python
[
    # Unsafe
    {'label': 'unsafe', 'score': 0.9865403771400452},
    {'label': 'unsafe', 'score': 0.9743474125862122},
    {'label': 'unsafe', 'score': 0.9802995920181274},
    # Safe
    {'label': 'safe', 'score': 0.925717830657959}
]
```
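In production you may want to act on the classifier's confidence rather than the raw label alone. A minimal sketch; the `is_blocked` helper and the 0.8 threshold are assumptions to tune on your own data:

```python
# Hypothetical post-processing of pipeline outputs: block only when the text
# is labeled unsafe AND the model is sufficiently confident.

def is_blocked(result: dict, threshold: float = 0.8) -> bool:
    """Return True if a pipeline output dict should be blocked."""
    return result["label"] == "unsafe" and result["score"] >= threshold

# Applied to outputs shaped like those above:
outputs = [
    {"label": "unsafe", "score": 0.9865},
    {"label": "safe", "score": 0.9257},
]
blocked = [is_blocked(o) for o in outputs]  # [True, False]
```

Raising the threshold reduces over-blocking at the cost of letting more borderline content through.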
Evaluation
CREST was evaluated with the F1 score across six major multilingual safety benchmarks and several cultural and code-switched datasets.
Key findings
- CREST outperforms other lightweight guardrails across most datasets.
- Zero-shot generalization is strong across low-resource languages.
- CREST excels in cultural and code-switched settings.
- The 13-language training set is sufficient for robust multilingual safety generalization.
Limitations and Model Risks
- Training relies partly on machine translation; nuance may be lost
- Binary labels cannot express detailed safety categories
- Zero-shot generalization gaps across extremely low-coverage scripts and morphologically complex languages
- Not a substitute for human moderation in high-stakes settings
- Cultural misalignment in edge cases
- Residual translation artifacts
- Possible bias in mislabeled or synthetic data
Mitigate these risks through continuous human evaluation and incremental fine-tuning on domain-specific data.
Ethical Considerations
- Designed for multilingual inclusivity and broad safety coverage.
- Misclassifications can cause over-blocking or under-blocking.
- Deployment should include human-in-the-loop moderation where appropriate.
- Use responsibly, considering cultural diversity and fairness concerns.
- Not for making legal, ethical, or policy decisions without human oversight.
Citation
@misc{bansal2025crestuniversalsafetyguardrails,
  title={CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer},
  author={Lavish Bansal and Naman Mishra},
  year={2025},
  eprint={2512.02711},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.02711},
}