Whisper Medium ATC full

This model is a fine-tuned version of openai/whisper-medium, trained on Czech and English air traffic communication recordings from the Czech airport LKKU.

It was created as part of a bachelor's thesis at the Faculty of Information Technology, Brno University of Technology.

Model description

Usage

import torch
from transformers import pipeline

# Path to the recording to be transcribed (any audio format supported by ffmpeg)
audio = "path/to/audio.xx"
# Use a GPU if one is available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the fine-tuned model; chunk_length_s=30 splits longer recordings into 30-second chunks
transcribe = pipeline(task="automatic-speech-recognition", model="BUT-FIT/whisper-ATC-czech-full", chunk_length_s=30, device=device)
# Force Czech transcription instead of relying on automatic language detection
transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(task="transcribe", language="czech")
print('Transcription:', transcribe(audio)["text"])
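
Depending on the installed transformers version, setting forced_decoder_ids may be deprecated; a common alternative is to pass the language and task per call via generate_kwargs. This is only a sketch assuming a recent transformers release, not part of the original card:

# Alternative: pass language/task at call time instead of via forced_decoder_ids
result = transcribe(audio, generate_kwargs={"language": "czech", "task": "transcribe"})
print('Transcription:', result["text"])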

Dataset

The training dataset consisted of approximately 5 hours of air traffic communication recordings. The recordings were in Czech and English (roughly 80:20), with occasional Slovak.

Output format

The model was trained to transcribe every recording word by word. The transcription format of a recording is as follows:

Recording: Oscar Kilo Alpha Bravo Charlie dráha dva nula střední pro přistání volná vítr nula jedna nula stupňů pět uzlů

Transcription: Oscar Kilo Alpha Bravo Charlie dráha dva nula střední pro přistání volná vítr nula jedna nula stupňů pět uzlů

Note: See also model BUT-FIT/whisper-ATC-czech-short, which abbreviates callsigns and numbers.

Results

The model achieved an overall WER of 14.7 % on unseen Czech and English LKKU recordings. On a test set containing Czech air traffic recordings from other airports (LKPR and LKTB), it reached a WER of 19.6 %.

The WER on callsigns in the LKKU recordings was evaluated at 6.2 %, while on the LKPR and LKTB dataset the model reached 3.6 %.
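
For reference, WER compares predicted transcripts against reference transcripts and can be computed, for example, with the Hugging Face evaluate library. The snippet below is only an illustrative sketch, not the evaluation script used in the thesis, and the strings are placeholders:

import evaluate

wer_metric = evaluate.load("wer")
references = ["oscar kilo alpha bravo charlie draha dva nula stredni pro pristani volna"]  # ground-truth transcript (placeholder)
predictions = ["oscar kilo alpha bravo charlie draha dva nula pro pristani volna"]         # model output (placeholder)
print("WER:", wer_metric.compute(references=references, predictions=predictions))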

Training hyperparameters

  • learning_rate: 3e-5
  • per_device_train_batch_size: 2
  • gradient_accumulation_steps: 8
  • warmup_ratio: 0.12
  • fp16: True
  • gradient_checkpointing: True
  • evaluation_strategy: "epoch"
  • save_strategy: "epoch"
  • load_best_model_at_end: True
  • metric_for_best_model: "wer"
  • num_train_epochs: 45
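
The listed values correspond to Hugging Face Seq2SeqTrainingArguments. The sketch below shows one way they could be assembled; output_dir and greater_is_better are illustrative assumptions, not values taken from the card:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-atc-czech-full",  # assumed output path, not from the card
    learning_rate=3e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    warmup_ratio=0.12,
    fp16=True,
    gradient_checkpointing=True,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,  # assumed: lower WER is better
    num_train_epochs=45,
)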

Contact

For further information, don't hesitate to contact Veronika Nevarilova (xnevar00@stud.fit.vutbr.cz) or Igor Szoke (szoke@fit.vutbr.cz).
