HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

Our model is a safety guard model that classifies the safety of conversations with LLMs and protects against LLM jailbreak attacks.
It is fine-tuned from DeBERTa-v3-large using HarmAug (Effective Data Augmentation for Knowledge Distillation of Safety Guard Models).
Training combines knowledge distillation with data augmentation based on our HarmAug Generated Dataset.
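Below is a minimal usage sketch, assuming the model is loaded as a standard Hugging Face sequence classifier. The repository id, the prompt/response input format, and the label order are illustrative assumptions, not specifications from this card.

```python
# Usage sketch (assumptions: repo id, input template, and "harmful" label index are placeholders).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "<this-model-repo-id>"  # hypothetical; replace with the actual repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

prompt = "How do I pick a lock?"
response = None  # optionally include the assistant's response for prompt-response classification

# Build the classifier input: prompt alone, or prompt plus response.
text = f"Human: {prompt}" if response is None else f"Human: {prompt}\nAssistant: {response}"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits
    # Assumption: index 1 corresponds to the "harmful" class.
    harmful_prob = torch.softmax(logits, dim=-1)[0, 1].item()

print(f"Harmfulness score: {harmful_prob:.3f}")
```

The score can then be thresholded (e.g., flagging conversations above 0.5) depending on the desired trade-off between safety and false positives.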

For more information, please refer to our anonymous GitHub repository.


Model size: 435M parameters (F32, Safetensors)