# dgx2_w2v2_large_distill_noisy_teacher_mozilla_epochs_50_batch_16
This model is a fine-tuned version of rohitp1/kkkh_w2v2_large_finetune_teacher_babble_noise_mozilla_50_epochs_batch_16 on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 21652.1836
- Wer: 0.2592
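
Since this is a wav2vec 2.0 CTC checkpoint for speech recognition, a minimal loading and transcription sketch is shown below. It assumes the checkpoint is published under the `rohitp1` namespace with the standard processor files bundled; adjust the model ID and audio path to your setup.

```python
# Minimal inference sketch (assumed model ID and audio file; the processor
# files are assumed to be bundled with the checkpoint).
import torch
import torchaudio
from transformers import AutoModelForCTC, AutoProcessor

model_id = "rohitp1/dgx2_w2v2_large_distill_noisy_teacher_mozilla_epochs_50_batch_16"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)
model.eval()

# Load a mono clip and resample to the 16 kHz rate expected by wav2vec 2.0.
waveform, sample_rate = torchaudio.load("sample.wav")
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(waveform.squeeze().numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: take the most likely token per frame, then collapse.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```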
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the sketch after this list for how they map onto `TrainingArguments`):
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 256
- total_train_batch_size: 4096
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 100
- mixed_precision_training: Native AMP
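
As a rough guide, these values map onto `transformers.TrainingArguments` as sketched below. The `output_dir` is an assumption, and the distillation-specific training loop (noisy teacher) is not shown; only the values listed above are taken from this card.

```python
# Sketch of the listed hyperparameters expressed as TrainingArguments.
# output_dir is assumed; the distillation Trainer itself is not reproduced here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="dgx2_w2v2_large_distill_noisy_teacher_mozilla_epochs_50_batch_16",
    learning_rate=5e-05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=256,   # 16 * 256 = 4096 effective train batch size
    lr_scheduler_type="linear",
    warmup_ratio=0.2,
    num_train_epochs=100,
    fp16=True,                         # "Native AMP" mixed-precision training
)
```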
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 74637.3574    | 7.31  | 250  | 4331.1958       | 0.2791 |
| 75858.376     | 14.63 | 500  | 7166.9727       | 0.2759 |
| 76494.272     | 21.94 | 750  | 9417.4209       | 0.2713 |
| 76375.128     | 29.26 | 1000 | 13408.2549      | 0.2680 |
| 74149.512     | 36.57 | 1250 | 14529.0449      | 0.2657 |
| 73472.352     | 43.89 | 1500 | 14684.6582      | 0.2643 |
| 72301.832     | 51.2  | 1750 | 15828.4707      | 0.2634 |
| 71340.256     | 58.51 | 2000 | 17094.2773      | 0.2614 |
| 71890.376     | 65.83 | 2250 | 17973.5566      | 0.2604 |
| 71789.656     | 73.14 | 2500 | 19330.4316      | 0.2599 |
| 71579.512     | 80.46 | 2750 | 19927.2129      | 0.2599 |
| 71862.48      | 87.77 | 3000 | 21301.7754      | 0.2592 |
| 71131.112     | 95.09 | 3250 | 21652.1836      | 0.2592 |
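
The reported metric is word error rate (WER). A minimal sketch of computing it with the `evaluate` library is shown below; the strings are placeholders, not outputs of this model.

```python
# Minimal WER computation sketch using the evaluate library.
# The predictions/references below are placeholders, not data from this run.
import evaluate

wer_metric = evaluate.load("wer")
predictions = ["the quick brown fox", "hello word"]
references = ["the quick brown fox", "hello world"]
print(wer_metric.compute(predictions=predictions, references=references))
```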
### Framework versions
- Transformers 4.29.2
- Pytorch 2.0.0+cu117
- Datasets 2.8.0
- Tokenizers 0.13.2