Tiny Audio
A speech recognition model that combines a Whisper encoder with a SmolLM3 decoder.
Usage
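from transformers import AutoModel
# trust_remote_code=True runs the custom model code shipped in the repository
model = AutoModel.from_pretrained("mazesmazes/tiny-audio", trust_remote_code=True)
# Transcribe a local audio file
transcription = model.transcribe("audio.wav")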
Architecture
- Encoder: Whisper-small (frozen)
- Projector: RMSNorm → Linear projection → AvgPool (2x downsampling) → RMSNorm
- Decoder: SmolLM3 with LoRA
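The projector stages listed above can be illustrated with a minimal pure-Python sketch. It is not the model's actual implementation. The function names, toy dimensions, and random weights are hypothetical, the learned gains and bias are omitted for brevity, and the pooling is assumed to average adjacent frames along time and drop a trailing odd frame.

```python
import math
import random

def rms_norm(frame, eps=1e-6):
    """Scale one feature vector to unit root-mean-square (gain omitted)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame) + eps)
    return [x / rms for x in frame]

def linear(frame, weight):
    """Project one frame with a (d_out x d_in) weight matrix; bias omitted."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weight]

def avg_pool_2x(frames):
    """Average adjacent frame pairs along time; a trailing odd frame is dropped."""
    return [
        [(a + b) / 2 for a, b in zip(frames[i], frames[i + 1])]
        for i in range(0, len(frames) - 1, 2)
    ]

def project(frames, weight):
    """RMSNorm -> Linear -> AvgPool (2x) -> RMSNorm over a list of frames."""
    projected = [linear(rms_norm(f), weight) for f in frames]
    return [rms_norm(f) for f in avg_pool_2x(projected)]

# Toy sizes for illustration only; the real encoder and decoder widths are larger.
random.seed(0)
d_enc, d_dec, n_frames = 8, 16, 6
weight = [[random.gauss(0, 0.1) for _ in range(d_enc)] for _ in range(d_dec)]
encoder_out = [[random.gauss(0, 1) for _ in range(d_enc)] for _ in range(n_frames)]

audio_tokens = project(encoder_out, weight)
print(len(audio_tokens), len(audio_tokens[0]))  # 3 16
```

The pooling step halves the number of audio frames the decoder has to attend to, while the linear layer maps encoder features to the decoder's embedding width.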
Training
Datasets: LibriSpeech, GigaSpeech, Common Voice, LoquaciousSet
- BF16 mixed precision
- Streaming datasets
- Frozen encoder, LoRA fine-tuning on decoder
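To give a sense of why LoRA keeps fine-tuning cheap, the sketch below counts trainable parameters for a single square weight matrix. LoRA leaves the original weight frozen and trains two low-rank factors, A (rank x d_in) and B (d_out x rank). The layer width and rank used here are illustrative assumptions; this card does not state the actual LoRA hyperparameters.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)

d = 2048   # assumed layer width, for illustration only
rank = 8   # hypothetical LoRA rank; not stated in this card

full = d * d                              # parameters in the frozen weight matrix
lora = lora_trainable_params(d, d, rank)  # parameters actually trained
print(full, lora)  # 4194304 32768
```

Under these assumptions the adapter trains under 1% of the parameters in the layer it modifies.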
Base model
HuggingFaceTB/SmolLM3-3B-Base