---
license: mit
datasets:
- jacktol/atc-dataset
language:
- en
metrics:
- wer
base_model:
- openai/whisper-medium.en
pipeline_tag: automatic-speech-recognition
tags:
- aviation
- atc
- aircraft
- communication
model-index:
- name: Whisper Medium EN Fine-Tuned for ATC (Faster-Whisper)
  results:
  - task:
      type: automatic-speech-recognition
    dataset:
      name: ATC Dataset
      type: jacktol/atc-dataset
    metrics:
    - name: Word Error Rate (WER)
      type: wer
      value: 15.08
    source:
      name: ATC Transcription Evaluation
      url: https://huggingface.co/jacktol/whisper-medium.en-fine-tuned-for-ATC-faster-whisper
---

# Whisper Medium EN Fine-Tuned for Air Traffic Control (ATC) - Faster-Whisper Optimized

## Model Overview

This model is a fine-tuned version of OpenAI's Whisper Medium EN model, trained specifically on **Air Traffic Control (ATC)** communication datasets. The fine-tuning process significantly improves transcription accuracy on domain-specific aviation communications, reducing the **Word Error Rate (WER) by 84%** relative to the original pretrained model. The model is particularly effective at handling the accent variations and ambiguous phrasing often encountered in ATC communications.

This model has been **converted to an optimized `.bin` format**, making it compatible with **Faster-Whisper** for faster and more efficient inference (see the usage sketch below).

- **Base Model**: OpenAI Whisper Medium EN
- **Fine-tuned Model WER**: 15.08%
- **Pretrained Model WER**: 94.59%
- **Relative Improvement**: 84.06% ((94.59 - 15.08) / 94.59)
- **Optimized Format**: Compatible with Faster-Whisper

You can access the fine-tuned model on Hugging Face:
- [Whisper Medium EN Fine-Tuned for ATC](https://huggingface.co/jacktol/whisper-medium.en-fine-tuned-for-ATC)
- [Whisper Medium EN Fine-Tuned for ATC (Faster-Whisper)](https://huggingface.co/jacktol/whisper-medium.en-fine-tuned-for-ATC-faster-whisper)
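
To run inference locally, a minimal sketch with the [`faster-whisper`](https://github.com/SYSTRAN/faster-whisper) library follows (the audio file name is illustrative; passing the Hub repository ID assumes `faster-whisper` will download the converted weights from the Hub, which it supports for CTranslate2-format models):

```python
# Minimal inference sketch using faster-whisper (pip install faster-whisper).
from faster_whisper import WhisperModel

# The repo ID points at the CTranslate2-converted weights on the Hub.
model = WhisperModel(
    "jacktol/whisper-medium.en-fine-tuned-for-ATC-faster-whisper",
    device="cuda",           # or "cpu"
    compute_type="float16",  # e.g. "int8" for CPU inference
)

# Transcribe a local ATC recording (file name is illustrative).
segments, info = model.transcribe("atc_sample.wav", beam_size=5)

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```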

## Model Description

Whisper Medium EN fine-tuned for ATC is optimized to handle the short, distinct transmissions exchanged between pilots and air traffic controllers. It is fine-tuned on data from the **[ATC Dataset](https://huggingface.co/datasets/jacktol/atc-dataset)**, a combined and cleaned dataset sourced from the following:

- **[ATCO2 corpus](https://huggingface.co/datasets/Jzuluaga/atco2_corpus_1h)** (1-hour test subset)
- **[UWB-ATCC corpus](https://huggingface.co/datasets/Jzuluaga/uwb_atcc)**

The **ATC Dataset** merges these two sources, filtering and refining the data to improve transcription accuracy on domain-specific ATC communications. The model has additionally been **converted to a `.bin` format for compatibility with Faster-Whisper**, ensuring faster and more efficient processing; a conversion sketch follows.
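
The card does not include the exact conversion command; a sketch of one way to produce the Faster-Whisper-compatible `model.bin` with CTranslate2's Transformers converter (the output directory name and quantization choice are assumptions):

```python
# Conversion sketch (pip install ctranslate2 transformers).
import ctranslate2

# Convert the fine-tuned Transformers checkpoint to CTranslate2 format.
converter = ctranslate2.converters.TransformersConverter(
    "jacktol/whisper-medium.en-fine-tuned-for-ATC",
    # copy_files=["tokenizer.json", "preprocessor_config.json"],  # often needed for Whisper
)
converter.convert(
    "whisper-medium.en-atc-ct2",  # output directory (assumed name)
    quantization="float16",       # optional weight quantization
)
```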

## Intended Use

The fine-tuned Whisper model is designed for:

- **Transcribing aviation communication**: Producing accurate transcriptions of ATC communications, including accented speech and variations in English phrasing.
- **Air Traffic Control systems**: Assisting with real-time transcription of pilot-controller conversations to improve situational awareness.
- **Research and training**: Useful for researchers, developers, or aviation professionals studying ATC communication or building new tools for aviation safety (a WER-scoring sketch appears at the end of this section).

You can test the model online using the [ATC Transcription Assistant](https://huggingface.co/spaces/jacktol/ATC-Transcription-Assistant), which lets you upload audio files and generate transcriptions.
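
For research and evaluation work, transcription quality is typically scored with WER. A minimal sketch using the `jiwer` library (the card does not prescribe a tool, and the strings below are invented for illustration):

```python
# WER scoring sketch (pip install jiwer). Strings are illustrative only.
import jiwer

reference = "speedbird two three four climb flight level three five zero"
hypothesis = "speedbird two three four climb level three five zero"  # "flight" dropped

# WER = (substitutions + deletions + insertions) / reference word count
print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")  # 10.00%: 1 deletion / 10 words
```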

## Training Procedure

- **Hardware**: Two A100 GPUs (80 GB each)
- **Epochs**: 10
- **Learning Rate**: 1e-5
- **Batch Size**: 32 (effective, via gradient accumulation)
- **Augmentation**: Dynamic data augmentation (Gaussian noise, pitch shifting, etc.) applied during training; see the sketch below
- **Evaluation Metric**: Word Error Rate (WER)
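
The exact training script is not reproduced here; as a rough sketch, the listed settings might map onto Hugging Face `Seq2SeqTrainingArguments` as follows, with an `audiomentations` pipeline standing in for the dynamic augmentation (the per-GPU batch split and all unlisted values are assumptions):

```python
# Training-configuration sketch (pip install transformers audiomentations).
# Only the epochs, learning rate, and effective batch size come from this card;
# everything else is an assumption.
from audiomentations import AddGaussianNoise, Compose, PitchShift
from transformers import Seq2SeqTrainingArguments

# Dynamic waveform augmentation, applied per example during training, e.g.
# augmented = augment(samples=waveform, sample_rate=16000)
augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
])

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-medium.en-atc",  # assumed
    num_train_epochs=10,
    learning_rate=1e-5,
    per_device_train_batch_size=8,   # assumed split: 8 x 2 GPUs x 2 steps = 32 effective
    gradient_accumulation_steps=2,
    fp16=True,                       # assumed for A100 training
)
```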

## Limitations

While the fine-tuned model performs well on ATC-specific communications, it may not generalize as effectively to other speech domains. As with most speech-to-text models, transcription accuracy can also degrade on very poor-quality audio or heavily accented speech not represented in the training data.

## References

- **Blog Post**: [Fine-Tuning Whisper for ATC: 84% Improvement in Transcription Accuracy](https://jacktol.net/posts/fine-tuning_whisper_for_atc/)
- **GitHub Repository**: [Fine-Tuning Whisper on ATC Data](https://github.com/jack-tol/fine-tuning-whisper-on-atc-data/tree/main)