---
license: mit
language:
  - kbd
datasets:
  - anzorq/kbd_speech
  - anzorq/sixuxar_yijiri_mak7
metrics:
  - wer
pipeline_tag: automatic-speech-recognition
---

# Circassian (Kabardian) ASR Model

This is a fine-tuned model for Automatic Speech Recognition (ASR) in Kabardian (`kbd`), based on the `facebook/w2v-bert-2.0` model.

The model was trained on a combination of the `anzorq/kbd_speech` dataset (filtered to `country=russia`) and the `anzorq/sixuxar_yijiri_mak7` dataset.
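
For reference, a minimal sketch of that filtering step with the 🤗 `datasets` library (the `country` column and the `train` split are assumptions based on the description above):

```python
from datasets import load_dataset

# Keep only recordings where country == "russia", as described above.
# The column name and split are assumed from the dataset description.
kbd_speech = load_dataset("anzorq/kbd_speech", split="train")
kbd_speech = kbd_speech.filter(lambda example: example["country"] == "russia")
```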

## Model Details

- **Base Model:** facebook/w2v-bert-2.0
- **Language:** Kabardian
- **Task:** Automatic Speech Recognition (ASR)
- **Datasets:** anzorq/kbd_speech, anzorq/sixuxar_yijiri_mak7
- **Training Steps:** 5000
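
A minimal inference sketch with the `transformers` ASR pipeline (the model id `anzorq/w2v-bert-2.0-kbd` is inferred from this repository name; adjust if it differs):

```python
from transformers import pipeline

# Model id inferred from this repository; adjust if needed.
asr = pipeline("automatic-speech-recognition", model="anzorq/w2v-bert-2.0-kbd")

# Transcribe a local audio file; 16 kHz mono input works best
# for wav2vec-style models.
result = asr("sample.wav")
print(result["text"])
```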

## Training

The model was fine-tuned using the following training arguments:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",
    group_by_length=True,            # batch samples of similar length to reduce padding
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,   # effective batch size of 16 per device
    evaluation_strategy="steps",
    num_train_epochs=10,
    gradient_checkpointing=True,     # trade compute for memory
    fp16=True,
    save_steps=1000,
    eval_steps=500,
    logging_steps=300,
    learning_rate=5e-5,
    warmup_steps=500,
    save_total_limit=2,              # keep only the two most recent checkpoints
    push_to_hub=True,
    report_to="wandb",
)
```
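
These arguments plug into a standard `Trainer` CTC fine-tuning loop, sketched below. The processor repo, datasets, and data collator are assumptions; the exact preprocessing code is not part of this card.

```python
from transformers import Trainer, Wav2Vec2BertForCTC, Wav2Vec2BertProcessor

def build_trainer(training_args, train_dataset, eval_dataset, data_collator):
    """Wire the TrainingArguments above into a CTC fine-tuning run.

    Dataset preprocessing and the padding collator are assumed to be
    provided by the caller; they are not shown on this card.
    """
    # Processor repo is an assumption; a CTC head needs a tokenizer whose
    # vocabulary was built from the training transcripts.
    processor = Wav2Vec2BertProcessor.from_pretrained("anzorq/w2v-bert-2.0-kbd")

    model = Wav2Vec2BertForCTC.from_pretrained(
        "facebook/w2v-bert-2.0",
        ctc_loss_reduction="mean",
        pad_token_id=processor.tokenizer.pad_token_id,
        vocab_size=len(processor.tokenizer),
    )

    return Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        data_collator=data_collator,
        tokenizer=processor.feature_extractor,
    )
```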

## Performance

The model's performance during training:

| Step | Training Loss | Validation Loss | WER      |
|-----:|--------------:|----------------:|---------:|
|  500 |      2.859600 |             inf | 0.870362 |
| 1000 |      0.355500 |             inf | 0.703617 |
| 1500 |      0.247100 |             inf | 0.549942 |
| 2000 |      0.196700 |             inf | 0.471762 |
| 2500 |      0.181500 |             inf | 0.361494 |
| 3000 |      0.152200 |             inf | 0.314119 |
| 3500 |      0.135700 |             inf | 0.275146 |
| 4000 |      0.113400 |             inf | 0.252625 |
| 4500 |      0.102900 |             inf | 0.277013 |
| 5000 |      0.078500 |             inf | 0.250175 |
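
WER values are fractions, so the final checkpoint reaches roughly a 25% word error rate. For reference, WER can be computed with the `evaluate` library (the strings below are placeholders, not model output):

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder strings; in practice, predictions come from the model and
# references are the ground-truth transcripts.
predictions = ["predicted transcript"]
references = ["reference transcript"]
print(wer_metric.compute(predictions=predictions, references=references))
```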