---
license: mit
language:
- kbd
datasets:
- anzorq/kbd_speech
- anzorq/sixuxar_yijiri_mak7
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---
# Circassian (Kabardian) ASR Model

This is a fine-tuned Automatic Speech Recognition (ASR) model for Kabardian (`kbd`), based on `facebook/w2v-bert-2.0`.

The model was trained on a combination of the `anzorq/kbd_speech` dataset (filtered to `country=russia`) and the `anzorq/sixuxar_yijiri_mak7` dataset.
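
A minimal sketch of how this training mix could be assembled with the `datasets` library; the split names and column compatibility of the two corpora are assumptions, only the `country=russia` filter is taken from the description above:

```python
from datasets import load_dataset, concatenate_datasets

# Load both corpora (a "train" split is assumed here).
kbd_speech = load_dataset("anzorq/kbd_speech", split="train")
audiobook = load_dataset("anzorq/sixuxar_yijiri_mak7", split="train")

# Keep only the subset recorded in Russia, as described above.
kbd_speech = kbd_speech.filter(lambda ex: ex["country"] == "russia")

# Merge the two corpora into a single training set
# (assumes their column schemas are compatible).
train_data = concatenate_datasets([kbd_speech, audiobook])
```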

## Model Details

- **Base Model**: facebook/w2v-bert-2.0
- **Language**: Kabardian
- **Task**: Automatic Speech Recognition (ASR)
- **Datasets**: anzorq/kbd_speech, anzorq/sixuxar_yijiri_mak7
- **Training Steps**: 5000
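
As a quick usage sketch, the checkpoint can be loaded with the `transformers` ASR pipeline; the repository id below is a placeholder, not the actual Hub id of this model:

```python
from transformers import pipeline

# Placeholder repository id; replace with this model's actual Hub id.
asr = pipeline("automatic-speech-recognition", model="username/w2v-bert-2.0-kbd")

# Transcribe a local audio file (w2v-BERT 2.0 expects 16 kHz mono audio).
print(asr("sample.wav")["text"])
```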

## Training

The model was fine-tuned using the following training arguments:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='output',
    group_by_length=True,            # batch samples of similar length together
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,   # effective batch size of 16
    evaluation_strategy="steps",
    num_train_epochs=10,
    gradient_checkpointing=True,     # trade extra compute for lower memory use
    fp16=True,                       # mixed-precision training
    save_steps=1000,
    eval_steps=500,
    logging_steps=300,
    learning_rate=5e-5,
    warmup_steps=500,
    save_total_limit=2,              # keep only the two most recent checkpoints
    push_to_hub=True,
    report_to="wandb"
)
```
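
These arguments would then be passed to a `Trainer` roughly as in the sketch below, which follows the standard w2v-BERT 2.0 CTC fine-tuning recipe; the `processor`, `data_collator`, `compute_metrics`, and dataset variables are assumed to be prepared beforehand and are not taken from the actual training script:

```python
from transformers import Trainer, Wav2Vec2BertForCTC

# Assumes `processor` (tokenizer + feature extractor), `data_collator`,
# `compute_metrics`, and the prepared train/eval splits already exist.
model = Wav2Vec2BertForCTC.from_pretrained(
    "facebook/w2v-bert-2.0",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=processor.feature_extractor,
)
trainer.train()
```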

## Performance

Training loss, validation loss, and word error rate (WER, reported as a fraction) at each evaluation checkpoint during fine-tuning:

| Step | Training Loss | Validation Loss | WER     |
|------|---------------|-----------------|---------|
| 500  | 2.859600      | inf             | 0.870362|
| 1000 | 0.355500      | inf             | 0.703617|
| 1500 | 0.247100      | inf             | 0.549942|
| 2000 | 0.196700      | inf             | 0.471762|
| 2500 | 0.181500      | inf             | 0.361494|
| 3000 | 0.152200      | inf             | 0.314119|
| 3500 | 0.135700      | inf             | 0.275146|
| 4000 | 0.113400      | inf             | 0.252625|
| 4500 | 0.102900      | inf             | 0.277013|
| 5000 | 0.078500      | inf             | 0.250175|
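
For reference, the WER values can be reproduced with the `evaluate` library; the snippet below is an illustrative sketch, not the exact `compute_metrics` function used during training:

```python
import evaluate

wer_metric = evaluate.load("wer")

# Compare model transcriptions against reference transcripts.
predictions = ["example hypothesis transcription"]
references = ["example reference transcription"]
print(wer_metric.compute(predictions=predictions, references=references))
```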