---
license: mit
language:
- kbd
datasets:
- anzorq/kbd_speech
- anzorq/sixuxar_yijiri_mak7
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---
Circassian (Kabardian) ASR Model
This is a fine-tuned model for Automatic Speech Recognition (ASR) in Kabardian (language code kbd), based on the facebook/w2v-bert-2.0 model.
The model was trained on a combination of the anzorq/kbd_speech (filtered on country=russia) and anzorq/sixuxar_yijiri_mak7 datasets.
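The country filter mentioned above can be illustrated with a small predicate. This is only a sketch: the column name `country` and the value `russia` come from the description, while the sample records and the `is_from_russia` helper are hypothetical and do not reflect the real dataset schema.

```python
def is_from_russia(example):
    """Keep only records whose `country` field equals "russia"."""
    return example.get("country") == "russia"


# Hypothetical in-memory records standing in for rows of the speech dataset.
samples = [
    {"id": 0, "country": "russia"},
    {"id": 1, "country": "turkey"},
    {"id": 2, "country": "russia"},
    {"id": 3},  # a record with no country field is dropped
]

kept = [row for row in samples if is_from_russia(row)]
kept_ids = [row["id"] for row in kept]
print(kept_ids)
```

With the Hugging Face `datasets` library, the same predicate could be passed to `Dataset.filter`, assuming the dataset really exposes a `country` column.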
Model Details
- Base Model: facebook/w2v-bert-2.0
- Language: Kabardian
- Task: Automatic Speech Recognition (ASR)
- Datasets: anzorq/kbd_speech, anzorq/sixuxar_yijiri_mak7
- Training Steps: 5000
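A minimal inference sketch follows, under several assumptions: that the checkpoint carries a CTC head loadable with `Wav2Vec2BertForCTC`, that its repository includes a matching `Wav2Vec2BertProcessor`, and that the input is 16 kHz mono audio. The `model_id` argument is a placeholder, since the repository name is not stated here.

```python
SAMPLING_RATE = 16_000  # assumed input rate: 16 kHz mono audio


def transcribe(waveform, model_id):
    """Greedy CTC decoding of one 1-D float waveform.

    `model_id` is a placeholder for the Hub repository of this checkpoint.
    """
    # Imported lazily so the heavy dependencies are only needed at call time.
    import torch
    from transformers import Wav2Vec2BertForCTC, Wav2Vec2BertProcessor

    processor = Wav2Vec2BertProcessor.from_pretrained(model_id)
    model = Wav2Vec2BertForCTC.from_pretrained(model_id).eval()

    # Convert raw audio into the model's input features.
    inputs = processor(waveform, sampling_rate=SAMPLING_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, frames, vocab)

    # Pick the most likely token per frame; the processor collapses CTC repeats.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Audio recorded at another sampling rate would need to be resampled to 16 kHz before calling `transcribe`.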
Training
The model was fine-tuned using the following training arguments:
TrainingArguments(
    output_dir='output',
    group_by_length=True,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    evaluation_strategy="steps",
    num_train_epochs=10,
    gradient_checkpointing=True,
    fp16=True,
    save_steps=1000,
    eval_steps=500,
    logging_steps=300,
    learning_rate=5e-5,
    warmup_steps=500,
    save_total_limit=2,
    push_to_hub=True,
    report_to="wandb"
)
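To make the batch size and learning-rate settings concrete, the sketch below computes the effective batch size and the learning rate at a given step. It assumes the default linear scheduler of `TrainingArguments` (no `lr_scheduler_type` is set above), a total of 5000 optimizer steps as listed under Model Details, and a single device unless stated otherwise. The helper names are illustrative only.

```python
PER_DEVICE_BATCH = 8   # per_device_train_batch_size
GRAD_ACCUM = 2         # gradient_accumulation_steps
PEAK_LR = 5e-5         # learning_rate
WARMUP_STEPS = 500     # warmup_steps
TOTAL_STEPS = 5000     # assumption: taken from "Training Steps: 5000"


def effective_batch_size(num_devices=1):
    """Samples consumed per optimizer step (device count is an assumption)."""
    return PER_DEVICE_BATCH * GRAD_ACCUM * num_devices


def linear_schedule_lr(step):
    """Linear warmup to PEAK_LR, then linear decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        factor = step / max(1, WARMUP_STEPS)
    else:
        factor = max(0.0, (TOTAL_STEPS - step) / max(1, TOTAL_STEPS - WARMUP_STEPS))
    return PEAK_LR * factor


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 250, 500, 2750, 5000):
        print(step, linear_schedule_lr(step))
```

Under these assumptions the learning rate peaks at step 500 and falls back to zero by step 5000.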
Performance
Metrics logged at each evaluation, every 500 steps (WER is reported as a fraction, not a percentage):
| Step | Training Loss | Validation Loss | WER |
|---|---|---|---|
| 500 | 2.859600 | inf | 0.870362 |
| 1000 | 0.355500 | inf | 0.703617 |
| 1500 | 0.247100 | inf | 0.549942 |
| 2000 | 0.196700 | inf | 0.471762 |
| 2500 | 0.181500 | inf | 0.361494 |
| 3000 | 0.152200 | inf | 0.314119 |
| 3500 | 0.135700 | inf | 0.275146 |
| 4000 | 0.113400 | inf | 0.252625 |
| 4500 | 0.102900 | inf | 0.277013 |
| 5000 | 0.078500 | inf | 0.250175 |
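For readers unfamiliar with the WER column, the sketch below shows how word error rate is defined for a single reference and hypothesis pair: the word-level edit distance divided by the number of reference words. It is an illustrative pure-Python version, not necessarily the implementation used to produce the numbers above, and the example strings are placeholders.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder example: one substitution and one deletion over four words.
example_wer = word_error_rate("a b c d", "a x c")
print(example_wer)
```

On this scale, the final value of 0.250175 corresponds to roughly one word error for every four reference words.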