|
---
license: mit
language:
- kbd
datasets:
- anzorq/kbd_speech
- anzorq/sixuxar_yijiri_mak7
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---
|
# Circassian (Kabardian) ASR Model |
|
|
|
This is a fine-tuned Automatic Speech Recognition (ASR) model for Kabardian (`kbd`), based on `facebook/w2v-bert-2.0`.
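A minimal inference sketch using the 🤗 Transformers `pipeline` API. The Hub model id in the usage comment is a placeholder, not this repository's actual id — substitute the real one:

```python
def transcribe(audio_path: str, model_id: str) -> str:
    """Transcribe an audio file with a fine-tuned w2v-bert-2.0 ASR checkpoint."""
    # Imported inside the function so the sketch can be read without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]


# Usage (replace the placeholder id with this model's Hub repository):
# text = transcribe("sample.wav", "your-username/kbd-asr")
# print(text)
```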
|
|
|
The model was trained on a combination of the `anzorq/kbd_speech` (filtered on `country=russia`) and `anzorq/sixuxar_yijiri_mak7` datasets. |
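The `country=russia` filter amounts to a row-level predicate, which with the `datasets` library would be passed to `Dataset.filter`. A sketch of the predicate, demonstrated on toy rows since the real dataset requires a download (the `text` field here is illustrative):

```python
def keep_russia(example: dict) -> bool:
    # Predicate for filtering anzorq/kbd_speech; with the `datasets` library:
    # load_dataset("anzorq/kbd_speech").filter(keep_russia)
    return example.get("country") == "russia"


# Demonstration on toy rows mimicking the dataset's `country` column:
rows = [
    {"country": "russia", "text": "..."},
    {"country": "turkey", "text": "..."},
    {"country": "russia", "text": "..."},
]
kept = [r for r in rows if keep_russia(r)]
print(len(kept))  # 2
```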
|
|
|
## Model Details |
|
|
|
- **Base Model**: facebook/w2v-bert-2.0 |
|
- **Language**: Kabardian |
|
- **Task**: Automatic Speech Recognition (ASR) |
|
- **Datasets**: anzorq/kbd_speech, anzorq/sixuxar_yijiri_mak7 |
|
- **Training Steps**: 5000 |
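The batch settings below and the 5000 training steps imply the following effective batch size and total examples consumed (assuming a single GPU, since the card does not state the device count):

```python
per_device_train_batch_size = 8
gradient_accumulation_steps = 2
training_steps = 5000

# One optimizer step consumes batch_size * accumulation_steps examples per device.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
examples_seen = training_steps * effective_batch_size

print(effective_batch_size)  # 16
print(examples_seen)         # 80000
```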
|
|
|
## Training |
|
|
|
The model was fine-tuned using the following training arguments: |
|
|
|
```python
from transformers import TrainingArguments

TrainingArguments(
    output_dir='output',
    group_by_length=True,            # batch samples of similar length to reduce padding
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,   # effective batch size of 16 per device
    evaluation_strategy="steps",
    num_train_epochs=10,
    gradient_checkpointing=True,     # trade compute for lower memory use
    fp16=True,
    save_steps=1000,
    eval_steps=500,
    logging_steps=300,
    learning_rate=5e-5,
    warmup_steps=500,
    save_total_limit=2,              # keep only the two most recent checkpoints
    push_to_hub=True,
    report_to="wandb"
)
```
|
|
|
## Performance |
|
|
|
The model's performance during training: |
|
|
|
| Step | Training Loss | Validation Loss | WER      |
|------|---------------|-----------------|----------|
| 500  | 2.859600      | inf             | 0.870362 |
| 1000 | 0.355500      | inf             | 0.703617 |
| 1500 | 0.247100      | inf             | 0.549942 |
| 2000 | 0.196700      | inf             | 0.471762 |
| 2500 | 0.181500      | inf             | 0.361494 |
| 3000 | 0.152200      | inf             | 0.314119 |
| 3500 | 0.135700      | inf             | 0.275146 |
| 4000 | 0.113400      | inf             | 0.252625 |
| 4500 | 0.102900      | inf             | 0.277013 |
| 5000 | 0.078500      | inf             | 0.250175 |
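The WER column is the standard word error rate: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A self-contained sketch of the computation (the training run itself presumably used an evaluation library such as `evaluate` or `jiwer`):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j] (rolling row).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (free if words match)
            ))
        prev = cur
    return prev[-1] / len(ref)


print(wer("a b c", "a x c"))  # 0.3333333333333333 (one substitution over three words)
```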