metadata

license: mit
language:
  - kbd
datasets:
  - anzorq/kbd_speech
  - anzorq/sixuxar_yijiri_mak7
metrics:
  - wer
pipeline_tag: automatic-speech-recognition

Circassian (Kabardian) ASR Model

This is a fine-tuned model for automatic speech recognition (ASR) in Kabardian (language code kbd), based on the facebook/w2v-bert-2.0 model.

The model was trained on a combination of the anzorq/kbd_speech (filtered on country=russia) and anzorq/sixuxar_yijiri_mak7 datasets.

Model Details

Base Model: facebook/w2v-bert-2.0
Language: Kabardian
Task: Automatic Speech Recognition (ASR)
Datasets: anzorq/kbd_speech, anzorq/sixuxar_yijiri_mak7
Training Steps: 5000

Training

The model was fine-tuned using the following training arguments:

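from transformers import TrainingArguments

training_args = TrainingArguments(
   output_dir='output',
   group_by_length=True,  # batch samples of similar length to reduce padding
   per_device_train_batch_size=8,
   gradient_accumulation_steps=2,  # effective batch size: 8 * 2 = 16 per device
   evaluation_strategy="steps",
   num_train_epochs=10,
   gradient_checkpointing=True,  # trade extra compute for lower memory use
   fp16=True,  # mixed-precision training
   save_steps=1000,
   eval_steps=500,
   logging_steps=300,
   learning_rate=5e-5,
   warmup_steps=500,
   save_total_limit=2,  # keep at most two checkpoints on disk
   push_to_hub=True,
   report_to="wandb"
)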
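
To make the interaction of these arguments concrete, the sketch below computes the effective batch size and the learning rate at a given optimizer step. It rests on two assumptions that the configuration itself does not state: that the schedule is the Trainer's default linear one (linear warmup, then linear decay to zero), and that the 5000 training steps reported above are the total number of optimizer steps. The helper names `effective_batch_size` and `lr_at_step` are illustrative only.

```python
# Values copied from the training arguments above.
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 2
PEAK_LR = 5e-5
WARMUP_STEPS = 500
# Assumption: 5000 (from "Training Steps") is the total number of optimizer steps.
TOTAL_STEPS = 5000


def effective_batch_size(per_device=PER_DEVICE_BATCH, accum=GRAD_ACCUM_STEPS, num_devices=1):
    """Number of samples that contribute to one optimizer update."""
    return per_device * accum * num_devices


def lr_at_step(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Assumed schedule: linear warmup to `peak`, then linear decay to zero."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 250, 500, 2750, 5000):
        print(step, lr_at_step(step))
```

Under these assumptions the learning rate peaks at step 500 and is back to half its peak value at step 2750, the midpoint of the decay phase.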

Performance

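Training loss, validation loss, and word error rate (WER), evaluated every 500 steps. WER fell from 0.870 at step 500 to 0.250 at step 5000; validation loss was reported as inf at every evaluation.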

Step	Training Loss	Validation Loss	WER
500	2.859600	inf	0.870362
1000	0.355500	inf	0.703617
1500	0.247100	inf	0.549942
2000	0.196700	inf	0.471762
2500	0.181500	inf	0.361494
3000	0.152200	inf	0.314119
3500	0.135700	inf	0.275146
4000	0.113400	inf	0.252625
4500	0.102900	inf	0.277013
5000	0.078500	inf	0.250175
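
The WER column is word error rate: the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal pure-Python illustration of that definition. It is not the evaluation code used for this model, and the function name `word_error_rate` and the placeholder sentences are made up for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (d) over 4 reference words.
    print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Read this way, the final WER of 0.250175 corresponds to roughly one word-level error for every four reference words.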