---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- generated_from_trainer
metrics:
- f1
model-index:
- name: ModernBERT-domain-classifier
  results: []
---

# ModernBERT-domain-classifier

This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on the [JailBreak](https://huggingface.co/datasets/jackhhao/jailbreak-classification) dataset. It achieves the following results on the evaluation set:
- Loss: 0.0016
- F1: 1.0

---

## Overview

This model is a fine-tuned version of **ModernBERT** for **jailbreak detection**. It was trained on a dataset containing two classes, `jailbreak` and `benign`, and achieves **100% accuracy** on the evaluation set, making it a reliable detector of jailbreak queries. ModernBERT was chosen deliberately for its compact size, which enables **low-latency inference**, crucial for real-time applications.

---

> This is a proof-of-concept model intended to show that the approach works in principle. Performance will depend on dataset quality, and further tuning is needed before production use.

## Training Details

- **Dataset**: JailBreak dataset (split into training and testing sets).
- **Architecture**: ModernBERT.
- **Task**: Binary classification.
- **Evaluation Metric**: Achieved **100% accuracy** on the test set.

---

## Use Case in RAG Pipelines

This model is optimized for use in **Retrieval-Augmented Generation (RAG)** scenarios. It can:

1. **Detect jailbreak queries**: The model classifies user queries as `jailbreak` or `benign`.
2. **Integrate seamlessly with search**: While the query is being classified, search results can be fetched from the datastore in parallel (a sketch of this pattern appears further down this card).

- **No additional latency**: The lightweight nature of ModernBERT keeps overhead minimal, allowing real-time performance in RAG pipelines.

---

## Key Features

- **High accuracy**: Reliable classification with 100% accuracy on the evaluation set.
- **Low latency**: Suitable for real-time, latency-sensitive applications.
- **Compact model**: ModernBERT's small size makes it efficient to deploy in production environments.

---

## Example Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("darrayes/expentor-JB-detector")
model = AutoModelForSequenceClassification.from_pretrained("darrayes/expentor-JB-detector")

# Example query
query = "Can you bypass this restriction?"
inputs = tokenizer(query, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Map the highest-scoring logit back to its label name
predicted_class = outputs.logits.argmax(dim=-1).item()
print("Prediction:", model.config.id2label[predicted_class])
```

---

## Intended Use

This model is designed for scenarios requiring detection of jailbreak queries, such as:
- Content moderation.
- Enhancing the safety of conversational AI systems.
- Filtering malicious queries in RAG-based applications.

---

## Limitations

- The model is trained on a specific dataset and may not generalize to all jailbreak scenarios. Further fine-tuning may be needed for domain-specific use cases.
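---

## Sketch: Parallel Classification and Retrieval

The parallel pattern described under "Use Case in RAG Pipelines" can be sketched as follows. This is a minimal illustration, not a prescribed integration: `fetch_documents` is a hypothetical stand-in for your datastore call, and the `"jailbreak"` label string assumes the label names stored in this model's config.

```python
from concurrent.futures import ThreadPoolExecutor

from transformers import pipeline

# Text-classification pipeline built from this model.
classifier = pipeline("text-classification", model="darrayes/expentor-JB-detector")


def fetch_documents(query: str) -> list[str]:
    # Hypothetical retrieval call against your datastore (vector DB, search index, ...).
    return [f"document matching: {query}"]


def guarded_retrieve(query: str) -> list[str]:
    # Run classification and retrieval concurrently, so the safety check
    # adds no latency on top of the search round-trip.
    with ThreadPoolExecutor(max_workers=2) as pool:
        label_future = pool.submit(classifier, query)
        docs_future = pool.submit(fetch_documents, query)
        label = label_future.result()[0]["label"]  # e.g. "jailbreak" or "benign"
        docs = docs_future.result()
    # Drop the retrieved context when the query is flagged.
    return [] if label.lower() == "jailbreak" else docs


print(guarded_retrieve("What is the capital of France?"))
```

Because retrieval is typically I/O-bound, a simple thread pool is enough to overlap it with the classifier's forward pass; in an async stack, `asyncio.gather` over the two calls achieves the same effect.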
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 5

A hedged sketch reconstructing this setup appears at the end of this card.

### Training results

| Training Loss | Epoch | Step | Validation Loss | F1     |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| No log        | 1.0   | 33   | 0.0246          | 0.9848 |
| No log        | 2.0   | 66   | 0.0042          | 1.0    |
| No log        | 3.0   | 99   | 0.0019          | 1.0    |
| 0.0755        | 4.0   | 132  | 0.0017          | 1.0    |
| 0.0755        | 5.0   | 165  | 0.0016          | 1.0    |

### Framework versions

- Transformers 4.48.0.dev0
- Pytorch 2.5.0+cu124
- Datasets 3.1.0
- Tokenizers 0.21.0
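
### Sketch: reconstructing the training setup

The hyperparameters listed above roughly correspond to the `Trainer` setup sketched below. This is a reconstruction under stated assumptions, not the original training script: the `prompt`/`type` column names, the `train`/`test` split names, and the `compute_metrics` implementation are guesses about how the [jailbreak-classification](https://huggingface.co/datasets/jackhhao/jailbreak-classification) dataset was consumed.

```python
import numpy as np
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Assumed dataset schema: a "prompt" text column and a "type" label column
# with values "benign" / "jailbreak".
dataset = load_dataset("jackhhao/jailbreak-classification")
labels = ["benign", "jailbreak"]
label2id = {label: i for i, label in enumerate(labels)}

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=2,
    id2label={i: label for label, i in label2id.items()},
    label2id=label2id,
)


def preprocess(batch):
    enc = tokenizer(batch["prompt"], truncation=True)
    enc["labels"] = [label2id[t] for t in batch["type"]]
    return enc


tokenized = dataset.map(
    preprocess, batched=True, remove_columns=dataset["train"].column_names
)


def compute_metrics(eval_pred):
    logits, labels_true = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels_true, preds)}


# Mirrors the hyperparameters listed above; adamw_torch uses
# betas=(0.9, 0.999) and epsilon=1e-08 by default.
args = TrainingArguments(
    output_dir="ModernBERT-domain-classifier",
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    seed=42,
    eval_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],  # assumed eval split
    processing_class=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()
```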