---
license: mit
language:
- kbd
datasets:
- anzorq/kbd_speech
- anzorq/sixuxar_yijiri_mak7
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---
# Circassian (Kabardian) ASR Model
This is a fine-tuned Automatic Speech Recognition (ASR) model for Kabardian (`kbd`), based on `facebook/w2v-bert-2.0`.
The model was trained on a combination of the `anzorq/kbd_speech` dataset (filtered to `country=russia`) and the `anzorq/sixuxar_yijiri_mak7` dataset; a rough sketch of how such a combination could be assembled follows.
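As an illustration only, the two training sets could be combined with the `datasets` library roughly as follows. The split name, the `country` column semantics, and the assumption that both datasets share a compatible schema are not taken from the actual training script:

```python
from datasets import load_dataset, concatenate_datasets

# Split name and filter column are assumptions based on the description above.
kbd_speech = load_dataset("anzorq/kbd_speech", split="train")
kbd_speech = kbd_speech.filter(lambda example: example["country"] == "russia")

mak7 = load_dataset("anzorq/sixuxar_yijiri_mak7", split="train")

# Assumes both datasets expose compatible columns (e.g. audio and transcription).
train_dataset = concatenate_datasets([kbd_speech, mak7])
```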
## Model Details
- **Base Model**: facebook/w2v-bert-2.0
- **Language**: Kabardian
- **Task**: Automatic Speech Recognition (ASR)
- **Datasets**: anzorq/kbd_speech, anzorq/sixuxar_yijiri_mak7
- **Training Steps**: 5000
## Training
The model was fine-tuned using the following training arguments:
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",
    group_by_length=True,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    evaluation_strategy="steps",
    num_train_epochs=10,
    gradient_checkpointing=True,
    fp16=True,
    save_steps=1000,
    eval_steps=500,
    logging_steps=300,
    learning_rate=5e-5,
    warmup_steps=500,
    save_total_limit=2,
    push_to_hub=True,
    report_to="wandb",
)
```
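For reference, WER in this kind of CTC fine-tuning setup is typically computed by greedy decoding of the logits and comparing against the decoded labels. Below is a minimal sketch of such a `compute_metrics` callback using the `evaluate` library; the `processor` object and the use of `-100` as the label padding value are assumptions following the common `transformers` ASR recipe, not taken from the actual training script:

```python
import numpy as np
import evaluate

wer_metric = evaluate.load("wer")

def compute_metrics(pred):
    # Greedy CTC decoding: take the argmax over the vocabulary at each frame.
    pred_ids = np.argmax(pred.predictions, axis=-1)

    # Replace -100 (positions ignored by the loss) with the pad token id
    # so the labels can be decoded back to text.
    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id

    pred_str = processor.batch_decode(pred_ids)
    label_str = processor.batch_decode(pred.label_ids, group_tokens=False)

    return {"wer": wer_metric.compute(predictions=pred_str, references=label_str)}
```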
## Performance
The model's performance during training:
| Step | Training Loss | Validation Loss | WER      |
|------|---------------|-----------------|----------|
| 500  | 2.859600      | inf             | 0.870362 |
| 1000 | 0.355500      | inf             | 0.703617 |
| 1500 | 0.247100      | inf             | 0.549942 |
| 2000 | 0.196700      | inf             | 0.471762 |
| 2500 | 0.181500      | inf             | 0.361494 |
| 3000 | 0.152200      | inf             | 0.314119 |
| 3500 | 0.135700      | inf             | 0.275146 |
| 4000 | 0.113400      | inf             | 0.252625 |
| 4500 | 0.102900      | inf             | 0.277013 |
| 5000 | 0.078500      | inf             | 0.250175 |
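## Usage
A minimal inference sketch using the `transformers` ASR pipeline. The model id below is a placeholder, since this card does not state the repository id; replace it with the actual Hub id. W2v-BERT 2.0 models expect 16 kHz mono audio.

```python
import torch
from transformers import pipeline

# Placeholder model id; replace with this model's actual Hub repository id.
asr = pipeline(
    "automatic-speech-recognition",
    model="<username>/<this-model>",
    device=0 if torch.cuda.is_available() else -1,
)

# File inputs are decoded and resampled by the pipeline;
# raw arrays should already be 16 kHz mono.
result = asr("audio_sample.wav")
print(result["text"])
```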