
πŸ›‘οΈ Sentinella: Lightweight Content Safety Guardian

🎯 Model Overview

Sentinella is a compact content safety classifier built for moderating Italian-language text. It is designed as an efficient first line of defense against harmful content.

📊 Key Metrics

  • Size: 32M parameters
  • Accuracy: 93% on test set
  • Max Input Length: 8,192 tokens
  • Training Data: more than 100,000 balanced examples (harmful/safe)

🔧 Technical Specifications

Base Architecture

  • Base Model: jinaai/jina-embeddings-v2-small-en
  • Model Adaptation:
    • Custom two-layer classification head on top of the encoder
    • Dropout rate of 0.1 for regularization
    • CLS token pooling for the sequence representation
    • Trained with cross-entropy loss for binary (harmful/safe) classification
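
The head described above can be sketched as a plain-Python forward pass. This is a minimal illustration under stated assumptions, not the released implementation: the dimensions are toy values (the real head would take the encoder's hidden size as input, and the intermediate width is not documented), the Linear, ReLU, Dropout, Linear ordering is assumed, and the weights are random. `ClassifierHead` and the helper functions are hypothetical names.

```python
import random

# Toy dimensions for illustration only (assumptions, not the real config).
HIDDEN_SIZE = 8
INTERMEDIATE_SIZE = 4   # hypothetical width of the dimensionality-reduction layer
NUM_LABELS = 2          # 0 = NEGATIVE (harmful), 1 = POSITIVE (safe)
DROPOUT_P = 0.1         # dropout rate stated in the model card


def linear(x, weight, bias):
    """Compute weight @ x + bias, with weight given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def dropout(x, p, training, rng):
    """Inverted dropout while training; identity at inference."""
    if not training:
        return list(x)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in x]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ClassifierHead:
    """Hypothetical two-layer head: Linear -> ReLU -> Dropout -> Linear."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w1 = random_matrix(INTERMEDIATE_SIZE, HIDDEN_SIZE, rng)
        self.b1 = [0.0] * INTERMEDIATE_SIZE
        self.w2 = random_matrix(NUM_LABELS, INTERMEDIATE_SIZE, rng)
        self.b2 = [0.0] * NUM_LABELS
        self.rng = rng

    def forward(self, token_embeddings, training=False):
        # CLS pooling: the first token's vector represents the whole sequence.
        cls_vector = token_embeddings[0]
        h = relu(linear(cls_vector, self.w1, self.b1))
        h = dropout(h, DROPOUT_P, training, self.rng)
        return linear(h, self.w2, self.b2)  # raw logits, one per label


rng = random.Random(42)
# Fake encoder output: 5 tokens, each a HIDDEN_SIZE-dim vector.
sequence = [[rng.gauss(0, 1) for _ in range(HIDDEN_SIZE)] for _ in range(5)]
logits = ClassifierHead().forward(sequence)
print(len(logits))  # 2
```

At inference time (`training=False`) dropout is a no-op, so the head's output depends only on the first token's vector; in a real encoder the other tokens still matter because they shape that vector through self-attention.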

Classification Details

  • Output Labels:
    • NEGATIVE (0): Harmful content
    • POSITIVE (1): Safe content
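
A minimal sketch of turning the head's output into one of these labels, assuming the model emits two raw logits ordered by label id (index 0 = NEGATIVE, index 1 = POSITIVE). `interpret`, `ID2LABEL` and the other names are hypothetical helpers, not part of a published API.

```python
import math

# Label mapping as documented in the model card.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}
LABEL_MEANING = {"NEGATIVE": "harmful", "POSITIVE": "safe"}


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret(logits):
    """Map two-class logits to (label, meaning, probability of that label)."""
    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    label = ID2LABEL[label_id]
    return label, LABEL_MEANING[label], probs[label_id]


print(interpret([2.0, -1.0]))  # label 0 (NEGATIVE, harmful) has the higher logit
```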

💫 Key Features

  • Lightweight: At just 32M parameters, Sentinella is designed for efficiency
  • Long Context: Handles up to 8k tokens of input text
  • High Performance: 93% accuracy in content safety classification
  • Optimized Architecture: Custom classification head with dimensionality reduction for improved efficiency
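
Texts longer than the 8,192-token limit have to be truncated or split before classification. The card does not say how long inputs should be handled, so the sketch below makes two assumptions: two positions are reserved for special tokens ([CLS] and [SEP]), and a text is flagged if any chunk is classified harmful. `chunk_token_ids`, `is_harmful` and the `classify_chunk` callback are hypothetical.

```python
MAX_TOKENS = 8192  # maximum input length stated in the model card


def chunk_token_ids(token_ids, max_tokens=MAX_TOKENS, reserved=2, stride=0):
    """Split token ids into windows that fit the model's input limit.

    `reserved` leaves room for special tokens (assumed [CLS] and [SEP]);
    `stride` is an optional overlap between consecutive windows.
    """
    window = max_tokens - reserved
    if window <= 0 or stride >= window:
        raise ValueError("window must be positive and larger than stride")
    step = window - stride
    chunks = []
    for start in range(0, max(len(token_ids), 1), step):
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break
    return chunks


def is_harmful(token_ids, classify_chunk):
    """Hypothetical aggregation: flag the text if any chunk gets label 0 (harmful)."""
    return any(classify_chunk(chunk) == 0 for chunk in chunk_token_ids(token_ids))


ids = list(range(20_000))
print([len(c) for c in chunk_token_ids(ids)])  # [8190, 8190, 3620]
```

The "any chunk" rule favors recall on harmful content; a stricter deployment might instead average chunk probabilities, which trades some recall for fewer false positives.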

🚀 Use Cases

  • Content moderation for Italian text
  • Safe content filtering
  • Automated content screening
  • Real-time text analysis
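
For the moderation use cases above, one possible pattern is to route each text by its harmful-class probability. `screen`, `Decision`, the scorer signature and both thresholds are illustrative assumptions (the card does not recommend thresholds), and `fake_score` is a keyword stub standing in for the real model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical scorer signature: returns [p_harmful, p_safe] for one text,
# following the card's label order (0 = NEGATIVE/harmful, 1 = POSITIVE/safe).
Scorer = Callable[[str], Sequence[float]]


@dataclass
class Decision:
    text: str
    action: str  # "allow", "review" or "block"
    p_harmful: float


def screen(texts: List[str], score: Scorer,
           block_at: float = 0.9, review_at: float = 0.5) -> List[Decision]:
    """Route each text by its harmful-class probability (illustrative thresholds)."""
    decisions = []
    for text in texts:
        p_harmful = score(text)[0]
        if p_harmful >= block_at:
            action = "block"
        elif p_harmful >= review_at:
            action = "review"
        else:
            action = "allow"
        decisions.append(Decision(text, action, p_harmful))
    return decisions


def fake_score(text: str) -> List[float]:
    """Keyword stub standing in for the real model."""
    p = 0.95 if "insulto" in text else 0.1
    return [p, 1.0 - p]


for d in screen(["Ciao, come stai?", "Questo è un insulto"], fake_score):
    print(d.action, d.text)  # "allow Ciao, come stai?", then "block Questo è un insulto"
```

The middle "review" band sends uncertain cases to a human moderator instead of forcing a binary decision.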

🎓 Training Details

  • Training Dataset: more than 100,000 examples
    • Balanced distribution of safe and harmful content
    • Focused on Italian language text
  • Training Strategy:
    • Fine-tuned embedding representation
    • Intermediate layer dimensionality reduction
    • ReLU activation for non-linearity
    • Dropout (rate 0.1) for regularization
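
The training objective can be illustrated with the two-class cross-entropy computed from raw logits. This sketch covers the loss only; the card does not specify the optimizer, learning rate or batch size, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Stable log-softmax: subtract the log-sum-exp of the logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def cross_entropy(batch_logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean cross-entropy over a batch; labels use 0 = harmful, 1 = safe."""
    if len(batch_logits) != len(labels) or not labels:
        raise ValueError("need a non-empty batch with one label per example")
    total = 0.0
    for logits, y in zip(batch_logits, labels):
        total -= log_softmax(logits)[y]
    return total / len(labels)


# A head that outputs equal logits scores ln(2) on any batch.
print(round(cross_entropy([[0.0, 0.0], [0.0, 0.0]], [0, 1]), 4))  # 0.6931
```

With equal logits the loss is ln 2 (about 0.693) regardless of the label, which is also the loss of the best constant predictor on a balanced dataset, so it is a useful reference point when monitoring training.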

📈 Performance Considerations

  • Optimized for real-time classification
  • Low memory footprint: about 131 MB of weights (32.8M F32 parameters at 4 bytes each)
  • Fast inference
  • Suitable for both CPU and GPU deployment

πŸ“ Citation

If you use Sentinella in your research or application, please cite this work as:

@misc{sentinella,
  title={Sentinella: Lightweight Italian Content Safety Classifier},
  author={Michele Montebovi},
  year={2024},
  note={32M parameter content safety model}
}

Model Files

  • Format: Safetensors
  • Model size: 32.8M params
  • Tensor type: F32