Update README.md
README.md
CHANGED
@@ -63,13 +63,14 @@ The fine-tuned Whisper model is designed for:
 
 You can test the model online using the [ATC Transcription Assistant](https://huggingface.co/spaces/jacktol/ATC-Transcription-Assistant), which lets you upload audio files and generate transcriptions.
 
-##
+## Model Description
+
+Whisper Medium EN fine-tuned for ATC is optimized to handle short, distinct transmissions between pilots and air traffic controllers. It is fine-tuned using data from the **[ATC Dataset](https://huggingface.co/datasets/jacktol/atc-dataset)**, a combined and cleaned dataset sourced from the following:
 
-
-- **
-- **UWB-ATCC**: A manually transcribed ATC corpus containing thousands of hours of recordings, focusing on air traffic communications.
+- **[ATCO2 corpus](https://huggingface.co/datasets/Jzuluaga/atco2_corpus_1h)** (1-hour test subset)
+- **[UWB-ATCC corpus](https://huggingface.co/datasets/Jzuluaga/uwb_atcc)**
 
-
+The **ATC Dataset** merges these two original sources, filtering and refining the data to enhance transcription accuracy for domain-specific ATC communications.
 
 ## Training Procedure
 