---
license: apache-2.0
language:
- it
- en
pipeline_tag: text-classification
---
# 🛡️ Sentinella: Lightweight Content Safety Guardian
## 🎯 Model Overview
Sentinella is a compact yet powerful content safety classifier designed specifically for Italian language moderation. This model serves as your efficient first line of defense against harmful content.
### 📊 Key Metrics
- **Size**: 32M parameters
- **Accuracy**: 93% on the test set
- **Max Input Length**: 8,192 tokens
- **Training Data**: More than 100,000 examples, balanced between harmful and safe
## 🔧 Technical Specifications
### Base Architecture
- **Base Model**: jinaai/jina-embeddings-v2-small-en
- **Model Adaptation**:
  - Custom two-layer classifier head
  - Dropout rate of 0.1 for regularization
  - CLS token pooling for the sequence representation
  - Cross-entropy loss for binary classification
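The head described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not Sentinella's actual code: the layer order (linear, ReLU, dropout, linear), the reduced dimension, and all weights are hypothetical. In the real model the CLS vector comes from jina-embeddings-v2-small-en (512-dimensional hidden states); the example uses 4 dimensions so the arithmetic is easy to follow.

```python
import random
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]  # row-major: out_features x in_features


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Affine map y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def dropout(x: Vector, p: float, training: bool, rng: random.Random) -> Vector:
    """Inverted dropout: during training, zero each unit with probability p and
    rescale survivors by 1/(1-p); at inference time it is the identity."""
    if not training or p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def classifier_head(token_states: List[Vector], params: Dict[str, list],
                    p: float = 0.1, training: bool = False,
                    rng: Optional[random.Random] = None) -> Vector:
    """Hypothetical two-layer head: CLS pooling -> reduce -> ReLU -> dropout -> 2 logits."""
    cls = token_states[0]  # CLS pooling: hidden state of the first token
    h = relu(linear(cls, params["w1"], params["b1"]))  # dimensionality reduction + ReLU
    h = dropout(h, p, training, rng or random.Random(0))
    return linear(h, params["w2"], params["b2"])  # logits for [NEGATIVE, POSITIVE]


# Toy weights (hypothetical): reduce 4 -> 2 dims, then map to 2 classes.
params = {
    "w1": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    "b1": [0.0, 0.0],
    "w2": [[1.0, -1.0], [-1.0, 1.0]],
    "b2": [0.0, 0.0],
}
tokens = [[0.5, 2.0, 9.0, 9.0], [7.0, 7.0, 7.0, 7.0]]  # CLS state comes first
print(classifier_head(tokens, params))  # [-1.5, 1.5]
```

Only the first token's state reaches the head, which is what CLS pooling means; the other token states are ignored once the encoder has run.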
### Classification Details
- **Output Labels**:
  - NEGATIVE (0): Harmful content
  - POSITIVE (1): Safe content
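Turning the two logits into one of these labels is usually done with a softmax followed by an argmax. The sketch below assumes the logit order matches the label ids above (index 0 = NEGATIVE/harmful, index 1 = POSITIVE/safe); `ID2LABEL` and `decode` are illustrative names, not part of a published API.

```python
import math
from typing import Dict, List, Tuple

# Label ids as listed in this card (0 = harmful, 1 = safe).
ID2LABEL: Dict[int, str] = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: List[float]) -> Tuple[str, float, bool]:
    """Return (label, probability of that label, is_harmful)."""
    probs = softmax(logits)
    # max() returns the first maximum, so an exact tie resolves to NEGATIVE (harmful)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[label_id], probs[label_id], label_id == 0


print(decode([-1.5, 1.5]))  # POSITIVE with probability of about 0.953
```

Resolving ties toward NEGATIVE errs on the side of blocking, which suits a first-pass filter; a tuned threshold on P(harmful) is a common alternative to a plain argmax.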
## 💫 Key Features
- **Lightweight**: At just 32M parameters, Sentinella is designed for efficiency
- **Long Context**: Handles inputs of up to 8,192 tokens
- **High Performance**: 93% accuracy on the test set
- **Optimized Architecture**: Custom classification head that reduces dimensionality before the output layer
## 🚀 Use Cases
- Content moderation for Italian text
- Safe content filtering
- Automated content screening
- Real-time text analysis
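For screening inputs longer than the 8,192-token limit, one simple strategy (an assumption, not something this card specifies) is to split the token sequence into windows, score each window, and block the whole input if any window looks harmful. `harmful_prob` stands in for a hypothetical wrapper that runs Sentinella on one window and returns P(NEGATIVE); the scorer used below is a fake for illustration only.

```python
from typing import Callable, List, Sequence

MAX_TOKENS = 8192  # maximum input length stated in this card


def chunk_tokens(token_ids: Sequence[int], max_len: int = MAX_TOKENS,
                 overlap: int = 0) -> List[Sequence[int]]:
    """Split token ids into windows of at most max_len, sharing `overlap` tokens."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    if len(token_ids) <= max_len:
        return [token_ids]
    step = max_len - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the input
    return chunks


def is_allowed(token_ids: Sequence[int],
               harmful_prob: Callable[[Sequence[int]], float],
               threshold: float = 0.5, overlap: int = 0) -> bool:
    """Allow the input only if every window scores below the harm threshold."""
    return all(harmful_prob(chunk) < threshold
               for chunk in chunk_tokens(token_ids, overlap=overlap))


def fake_scorer(chunk: Sequence[int]) -> float:
    """Stand-in for the model: pretend token id 666 marks harmful text."""
    return 0.9 if 666 in chunk else 0.1


print(is_allowed([1] * 20000, fake_scorer))         # True
print(is_allowed(list(range(20000)), fake_scorer))  # False
```

Overlapping windows lower the chance that harmful content split across a window boundary goes unnoticed, at the cost of extra forward passes.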
## 🎓 Training Details
- **Training Dataset**: More than 100,000 examples
  - Balanced distribution of safe and harmful content
  - Focused on Italian-language text
- **Training Strategy**:
  - Fine-tuned embedding representation
  - Intermediate layer for dimensionality reduction
  - ReLU activation for non-linearity
  - Dropout (rate 0.1) for regularization
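For reference, the standard cross-entropy objective for a two-class output is shown below, written for a batch of $N$ examples with labels $y_i \in \{0, 1\}$ (0 = harmful, 1 = safe) and head logits $z_{i,0}, z_{i,1}$. This is the textbook formulation; details such as class weighting or label smoothing are not stated in this card.

$$
\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{z_{i,y_i}}}{e^{z_{i,0}} + e^{z_{i,1}}}
$$

For example, a safe example ($y = 1$) with logits $z = (0, \ln 3)$ gets $p(\text{safe}) = \tfrac{3}{1 + 3} = 0.75$ and contributes a loss of $-\ln 0.75 = \ln \tfrac{4}{3} \approx 0.288$.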
## 📈 Performance Considerations
- Designed for real-time classification
- Low memory footprint (32M parameters)
- Fast inference
- Suitable for both CPU and GPU deployment
## 📝 Citation
If you use Sentinella in your research or application, please cite this work as:
```bibtex
@misc{sentinella,
  title={Sentinella: Lightweight Italian Content Safety Classifier},
  author={Montebovi, Michele},
  year={2024},
  note={32M parameter content safety model}
}
```