---
license: apache-2.0
language:
- it
- en
pipeline_tag: text-generation
datasets:
- DeepMount00/o1-ITA-REASONING
- DeepMount00/GPT-4o-ITA-INSTRUCT
- DeepMount00/Sonnet-3.5-ITA-INSTRUCT
- DeepMount00/open-perfectblend-ita
- HuggingFaceTB/cosmopedia
- DeepMount00/pretraining_multi
---
# Alireo-400M 🤗 🇮🇹

A Lightweight Italian Language Model
## Model Description 📝
Alireo-400M is a lightweight yet capable Italian language model with 400M parameters, designed to deliver efficient natural language processing while keeping a much smaller footprint than larger models.
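A minimal inference sketch with Hugging Face `transformers` might look like the following. Note that the repository id and the sampling settings below are assumptions for illustration; substitute the published checkpoint path.

```python
MODEL_ID = "DeepMount00/Alireo-400M"  # hypothetical repo id; replace with the actual checkpoint


def generate(prompt: str, max_new_tokens: int = 100) -> str:
    """Load the model and generate a continuation for an Italian prompt."""
    # Imported lazily so the module can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,      # illustrative sampling settings, not tuned values
        temperature=0.7,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("L'Italia è famosa per"))
```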
## Key Features ✨

- **Architecture**: Transformer-based language model 🏗️
- **Parameters**: 400M 📊
- **Context Window**: 8K tokens 🪟
- **Training Data**: curated Italian text corpus (books, articles, web content) 📚
- **Model Size**: ~800MB 💾
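The stated size is consistent with a quick back-of-the-envelope check: 400M parameters stored at 16-bit precision (two bytes per parameter, assuming fp16/bf16 weights) come to roughly 800MB.

```python
# Sanity-check the ~800MB figure from the feature list above.
PARAMS = 400_000_000
BYTES_PER_PARAM = 2  # fp16/bf16 weights

size_mb = PARAMS * BYTES_PER_PARAM / 1e6
print(f"{size_mb:.0f} MB")  # → 800 MB
```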
## Performance 📈

Despite its compact size, Alireo-400M performs well:

- **Benchmarks**: outperforms Qwen 0.5B across multiple benchmarks 🏆
- **Language Understanding**: maintains high accuracy on Italian language understanding tasks 🎯
- **Speed**: fast inference thanks to its compact architecture ⚡
## Limitations ⚠️

- Smaller context window than larger models
- May struggle with highly specialized technical content
- Performance may vary across Italian dialects
- Not suitable for multilingual tasks
## Hardware Requirements 💻

- **Minimum RAM**: 2GB
- **Recommended RAM**: 4GB
- **GPU**: optional, but recommended for faster inference
- **Disk Space**: ~1GB (including model and dependencies)
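A minimal pre-flight check against the requirements above might look like this sketch (the 1GB threshold is taken from the list; a RAM check is omitted to keep the example stdlib-only):

```python
import shutil

MIN_DISK_BYTES = 1 * 1024**3  # ~1GB for model + dependencies, per the list above


def enough_disk(path: str = ".") -> bool:
    """Return True if the filesystem holding `path` has at least ~1GB free."""
    return shutil.disk_usage(path).free >= MIN_DISK_BYTES


print("disk ok:", enough_disk())
```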
## Citation 📝

```bibtex
@software{alireo2024,
  author = {Michele Montebovi},
  title = {Alireo-400M: A Lightweight Italian Language Model},
  year = {2024},
}
```