
Whisper Large V2 Portuguese πŸ‡§πŸ‡·πŸ‡΅πŸ‡Ή

Welcome to whisper large-v2 for Portuguese transcription 👋🏻

Transcribe Portuguese audio to text with high accuracy.

  • Loss: 0.282
  • WER: 5.590

This model is a fine-tuned version of openai/whisper-large-v2 on the mozilla-foundation/common_voice_11 dataset. If you want a lighter model, you may be interested in jlondonobo/whisper-medium-pt. It achieves faster inference with almost no difference in WER.
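As a minimal sketch of how to run inference with this checkpoint using the 🤗 transformers ASR pipeline (the audio filename is a placeholder; forcing the decoder prompt to Portuguese transcription follows the standard Whisper fine-tuning recipe):

```python
# Sketch: transcribe a Portuguese audio file with this model.
# "audio.mp3" is a placeholder path, not a file shipped with the model.
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="jlondonobo/whisper-large-v2-pt",
)

# Force Portuguese transcription (rather than translation to English).
transcriber.model.config.forced_decoder_ids = (
    transcriber.tokenizer.get_decoder_prompt_ids(language="pt", task="transcribe")
)

result = transcriber("audio.mp3")
print(result["text"])
```

For long recordings, the pipeline also accepts a `chunk_length_s` argument to split the audio into overlapping windows.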

Comparable models

Reported WER is based on the evaluation subset of Common Voice.
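For reference, WER (word error rate) is the word-level edit distance between the model transcript and the reference, divided by the number of reference words. A dependency-free sketch (the example sentences are illustrative, not drawn from Common Voice):

```python
# Minimal WER implementation: word-level Levenshtein distance
# divided by the reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of five -> WER of 0.2 (20%).
print(wer("o gato dorme no sofá", "o gato dorme no sofa"))  # 0.2
```

A reported WER of 5.590 therefore means roughly 5.6 word errors per 100 reference words.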

Training hyperparameters

We used the following hyperparameters for training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 5000
  • mixed_precision_training: Native AMP
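The hyperparameters above map directly onto transformers' `Seq2SeqTrainingArguments`; a hedged configuration sketch (the `output_dir` is a placeholder, and the exact argument set used for the original run is not published):

```python
# Sketch of the training configuration above expressed as
# Seq2SeqTrainingArguments. output_dir is a placeholder.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v2-pt",
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 32
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=5000,
    fp16=True,  # native AMP mixed-precision training (requires a GPU)
)
```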

Training results

| Training Loss | Epoch | Step | Validation Loss | WER   |
|---------------|-------|------|-----------------|-------|
| 0.0828        | 1.09  | 1000 | 0.1868          | 6.778 |
| 0.0241        | 3.07  | 2000 | 0.2057          | 6.109 |
| 0.0084        | 5.06  | 3000 | 0.2367          | 6.029 |
| 0.0015        | 7.04  | 4000 | 0.2469          | 5.709 |
| 0.0009        | 9.02  | 5000 | 0.2821          | 5.590 🤗 |

Framework versions

  • Transformers 4.26.0.dev0
  • Pytorch 1.13.0+cu117
  • Datasets 2.7.1.dev0
  • Tokenizers 0.13.2