---
pretty_name: "WhisperKit ASR Evaluation Results"
viewer: false
library_name: whisperkit
tags:
- whisper
- whisperkit
- coreml
- asr
- quantized
---
# WhisperKit Transcription Quality



## Dataset: `librispeech`
Short-form Audio (<30s/clip) - 5 hours of English audiobook clips

|                                                                                                                               | WER (↓)                                                                                                                               |   QoI (↑) |   File Size (MB) | Code Commit                                                    |
|:------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------|----------:|-----------------:|:---------------------------------------------------------------|
| large-v2 (WhisperOpenAIAPI)                                                                                                   | [2.35](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperOpenAIAPI/openai_whisper-large-v2/librispeech)              |     100   |             3100 | N/A                                                            |
| [large-v2](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2)                                       | [2.77](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2/librispeech)                    |      96.6 |             3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
| [large-v2_949MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2_949MB)                           | [2.4](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_949MB/librispeech)               |      94.6 |              949 | [Link](https://github.com/argmaxinc/WhisperKit/commit/eca4a2e) |
| [large-v2_turbo](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2_turbo)                           | [2.76](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_turbo/librispeech)              |      96.6 |             3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
| [large-v2_turbo_955MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2_turbo_955MB)               | [2.41](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_turbo_955MB/librispeech)        |      94.6 |              955 | [Link](https://github.com/argmaxinc/WhisperKit/commit/cf75348) |
| [large-v3](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3)                                       | [2.04](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3/librispeech)                    |      95.2 |             3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
| [large-v3_turbo](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3_turbo)                           | [2.03](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_turbo/librispeech)              |      95.4 |             3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
| [large-v3_turbo_954MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3_turbo_954MB)               | [2.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_turbo_954MB/librispeech)        |      93.9 |              954 | [Link](https://github.com/argmaxinc/WhisperKit/commit/cf75348) |
| [distil-large-v3](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/distil-whisper_distil-large-v3)                         | [2.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/distil-whisper_distil-large-v3/librispeech)             |      89.7 |             1510 | [Link](https://github.com/argmaxinc/WhisperKit/commit/cf75348) |
| [distil-large-v3_594MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/distil-whisper_distil-large-v3_594MB)             | [2.96](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/distil-whisper_distil-large-v3_594MB/librispeech)       |      85.4 |              594 | [Link](https://github.com/argmaxinc/WhisperKit/commit/508240f) |
| [distil-large-v3_turbo](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/distil-whisper_distil-large-v3_turbo)             | [2.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/distil-whisper_distil-large-v3_turbo/librispeech)       |      89.7 |             1510 | [Link](https://github.com/argmaxinc/WhisperKit/commit/508240f) |
| [distil-large-v3_turbo_600MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/distil-whisper_distil-large-v3_turbo_600MB) | [2.78](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/distil-whisper_distil-large-v3_turbo_600MB/librispeech) |      86.2 |              600 | [Link](https://github.com/argmaxinc/WhisperKit/commit/ae1cf96) |
| [small.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-small.en)                                       | [3.12](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-small.en/librispeech)                    |      85.8 |              483 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
| [small](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-small)                                             | [3.45](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-small/librispeech)                       |      83   |              483 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
| [base.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base.en)                                         | [3.98](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/librispeech)                     |      75.3 |              145 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
| [base](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base)                                               | [4.97](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base/librispeech)                        |      67.2 |              145 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
| [tiny.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny.en)                                         | [5.61](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/librispeech)                     |      63.9 |               66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
| [tiny](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny)                                               | [7.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny/librispeech)                        |      52.5 |               66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
| [large-v3-v20240930](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3-v20240930)                   | [1.94](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/librispeech)          |      93.9 |             1640 | [Link](https://github.com/argmaxinc/WhisperKit/commit/c2f1b57) |
| [large-v3-v20240930_626MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3-v20240930_626MB)       | [1.95](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/librispeech)    |      93.8 |              626 | [Link](https://github.com/argmaxinc/WhisperKit/commit/3cd3ef1) |
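
The `WER (↓)` column reports word error rate in percent, where lower is better. As a rough illustration of the metric, the sketch below computes WER for a single reference/hypothesis pair as the word-level edit distance divided by the number of reference words. It assumes plain whitespace tokenization and no text normalization. The function name `word_error_rate` is hypothetical, and this is not the evaluation pipeline that produced the numbers in these tables.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (classic Levenshtein row update).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("a b c d", "a x c d"))  # 25.0
```

Because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference.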

## Dataset: `earnings22`
Long-form Audio (>1hr/clip) - 120 hours of earnings call recordings in English with various accents

|                                                                                                       | WER (↓)                                                                                                                   |   QoI (↑) |   File Size (MB) | Code Commit                                                    |
|:------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------|----------:|-----------------:|:---------------------------------------------------------------|
| large-v2 (WhisperOpenAIAPI)                                                                           | [16.27](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperOpenAIAPI/openai_whisper-large-v2/earnings22)  |     100   |             3100 | N/A                                                            |
| [large-v3](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3)               | [15.17](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3/earnings22)        |      58.5 |             3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
| [distil-large-v3](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/distil-whisper_distil-large-v3) | [15.28](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/distil-whisper_distil-large-v3/earnings22) |      46.3 |             1510 | [Link](https://github.com/argmaxinc/WhisperKit/commit/508240f) |
| [base.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base.en)                 | [23.49](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/earnings22)         |       6.5 |              145 | [Link](https://github.com/argmaxinc/WhisperKit/commit/dda6571) |
| [tiny.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny.en)                 | [28.64](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/earnings22)         |       5.7 |               66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/dda6571) |

## Dataset: `common_voice_17_0-argmax_subset-400`
Short-form Audio (<30s/clip) - Max 400 samples per language from Common Voice 17.0 Test Set

|                                                                                                                                                   | es                                                                                                                                                                                | ro                                                                                                                                                                                 | th                                                                                                                                                                                 | nl                                                                                                                                                                                 | id                                                                                                                                                                                 | sv                                                                                                                                                                                 | de                                                                                                                                                                                 | pl                                                                                                                                                                                 | fi                                                                                                                                                                                 | it                                                                                                                                                                                 | cs                                       
                                                                                                                                          | en                                                                                                                                                                                 | vi                                                                                                                                                                                 | el                                                                                                                                                                                 | hu                                                                                                                                                                                 | ru                                                                                                                                                                                 | gl                                                                                                                                                                                 | fr                                                                                                                                                                                 | pt                                                                                                                                                                                 | da                                                                                                                                                                                 |   File Size (MB) | Code Commit                                                    |
|:--------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------:|:---------------------------------------------------------------|
| [large-v3](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3) | [4.93](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/es) | [5.39](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/ro) | [6.11](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/th) | [7.03](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/nl) | [9.47](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/id) | [9.81](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/sv) | [9.89](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/de) | [10.13](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/pl) | [10.32](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/fi) | [11.11](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/it) | [12.04](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/cs) | [12.21](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/en) | [12.32](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/vi) | [12.35](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/el) | [12.44](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/hu) | [13.0](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/ru) | [13.06](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/gl) | [13.67](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/fr) | [13.75](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/pt) | [13.89](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/da) | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/e3e21d4) |
| [large-v2](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2) | [6.93](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/es) | [7.86](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/ro) | [8.76](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/th) | [8.93](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/nl) | [12.2](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/id) | [12.16](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/sv) | [11.7](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/de) | [12.51](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/pl) | [13.13](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/fi) | [14.34](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/it) | [17.14](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/cs) | [12.7](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/en) | [17.69](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/vi) | [15.04](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/el) | [16.72](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/hu) | [15.11](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/ru) | [16.27](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/gl) | [16.21](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/fr) | [15.23](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/pt) | [16.72](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/da) | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/e3e21d4) |
| [large-v3-v20240930](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3-v20240930) | [6.1](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/es) | [11.41](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/ro) | [23.3](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/th) | [8.91](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/nl) | [11.11](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/id) | [12.97](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/sv) | [12.26](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/de) | [12.12](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/pl) | [15.42](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/fi) | [12.83](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/it) | [12.85](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/cs) | [12.13](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/en) | [16.92](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/vi) | [17.73](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/el) | [15.3](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/hu) | [13.28](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/ru) | [15.0](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/gl) | [15.51](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/fr) | [14.93](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/pt) | [17.63](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/da) | 1640 | [Link](https://github.com/argmaxinc/WhisperKit/commit/e3e21d4) |
| [large-v3-v20240930_626MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3-v20240930_626MB) | [5.97](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/es) | [12.24](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/ro) | [23.09](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/th) | [9.05](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/nl) | [12.66](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/id) | [12.72](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/sv) | [13.21](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/de) | [13.11](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/pl) | [15.17](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/fi) | [13.16](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/it) | [14.49](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/cs) | [13.03](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/en) | [17.36](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/vi) | [18.71](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/el) | [17.05](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/hu) | [14.37](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/ru) | [15.48](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/gl) | [15.68](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/fr) | [14.85](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/pt) | [18.94](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/common_voice_17_0-argmax_subset-400/forced/da) | 626 | [Link](https://github.com/argmaxinc/WhisperKit/commit/e3e21d4) |
| [large-v3-v20240930_547MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3-v20240930_547MB)                           | [7.84](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/es) | [18.26](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/ro) | [39.58](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/th) | [14.18](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/nl) | [17.25](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/id) | [19.25](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/sv) | [17.62](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/de) | [19.6](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/pl)  | [20.31](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/fi) | [18.77](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/it) | 
[23.73](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/cs) | [16.12](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/en) | [25.97](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/vi) | [26.23](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/el) | [27.11](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/hu) | [18.63](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/ru) | [20.54](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/gl) | [22.0](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/fr)  | [18.91](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/pt) | [25.3](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_547MB/common_voice_17_0-argmax_subset-400/forced/da)  |              547 | [Link](https://github.com/argmaxinc/WhisperKit/commit/e3e21d4) |
| [small](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-small)                                                                 | [11.94](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/es)                   | [26.99](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/ro)                    | [21.52](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/th)                    | [19.94](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/nl)                    | [23.81](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/id)                    | [23.97](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/sv)                    | [23.87](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/de)                    | [23.74](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/pl)                    | [30.07](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/fi)                    | [25.02](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/it)                    | 
[37.7](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/cs)                     | [17.35](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/en)                    | [25.43](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/vi)                    | [31.49](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/el)                    | [44.66](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/hu)                    | [26.09](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/ru)                    | [30.45](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/gl)                    | [27.11](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/fr)                    | [35.7](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/pt)                     | [37.18](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-small/common_voice_17_0-argmax_subset-400/forced/da)                    |              483 | [Link](https://github.com/argmaxinc/WhisperKit/commit/e3e21d4) |
| [base](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base)                                                                   | [24.55](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/es)                    | [54.19](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/ro)                     | [32.91](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/th)                     | [37.01](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/nl)                     | [43.04](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/id)                     | [45.53](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/sv)                     | [38.09](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/de)                     | [43.44](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/pl)                     | [56.32](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/fi)                     | [46.45](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/it)                     | 
[67.24](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/cs)                     | [25.11](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/en)                     | [40.15](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/vi)                     | [55.22](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/el)                     | [71.07](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/hu)                     | [44.21](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/ru)                     | [47.63](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/gl)                     | [45.09](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/fr)                     | [48.98](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/pt)                     | [61.96](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-base/common_voice_17_0-argmax_subset-400/forced/da)                     |              145 | [Link](https://github.com/argmaxinc/WhisperKit/commit/e3e21d4) |
| [tiny](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny)                                                                   | [34.67](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/es)                    | [66.78](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/ro)                     | [41.88](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/th)                     | [54.03](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/nl)                     | [54.31](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/id)                     | [64.66](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/sv)                     | [49.11](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/de)                     | [56.38](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/pl)                     | [72.46](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/fi)                     | [60.13](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/it)                     | 
[81.53](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/cs)                     | [33.47](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/en)                     | [50.47](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/vi)                     | [66.21](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/el)                     | [85.67](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/hu)                     | [59.73](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/ru)                     | [54.05](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/gl)                     | [59.49](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/fr)                     | [65.65](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/pt)                     | [79.84](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-tiny/common_voice_17_0-argmax_subset-400/forced/da)                     |               66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/e3e21d4) |


### Explanation

We believe that rigorously measuring the quality of inference is necessary for developers and
enterprises to make informed decisions when opting to use optimized or compressed variants of
any machine learning model in production. To contextualize `WhisperKit`, we take the following Whisper
implementations and benchmark them using a consistent evaluation harness:

Server-side:
- `WhisperOpenAIAPI`: [OpenAI's Whisper API](https://platform.openai.com/docs/guides/speech-to-text)

($0.36 per hour of audio as of 02/29/24, 25MB file size limit per request)

On-device:
- `WhisperKit`: Argmax's implementation [[Eval Harness]](https://github.com/argmaxinc/whisperkittools/blob/main/whisperkit/pipelines.py#L100) [[Repo]](https://github.com/argmaxinc/WhisperKit)
- `whisper.cpp`: A C++ implementation from ggerganov [[Eval Harness]](https://github.com/argmaxinc/whisperkittools/blob/main/whisperkit/pipelines.py#L212) [[Repo]](https://github.com/ggerganov/whisper.cpp)
- `WhisperMLX`: A Python implementation from Apple MLX [[Eval Harness]](https://github.com/argmaxinc/whisperkittools/blob/main/whisperkit/pipelines.py#L338) [[Repo]](https://github.com/ml-explore/mlx-examples/blob/main/whisper/whisper/transcribe.py)

(All on-device implementations are available for free under the MIT license as of 03/19/2024)

`WhisperOpenAIAPI` sets the reference, and we assume that it uses the equivalent of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)
in float16 precision along with additional undisclosed optimizations from OpenAI. In all measurements, we care primarily about per-example no-regressions (quantified as `qoi` below),
which is a stricter metric than the dataset-average [Word Error Rate (WER)](https://en.wikipedia.org/wiki/Word_error_rate). A 100% `qoi` preserves perfect backwards-compatibility on the test distribution and avoids "perceived regressions", the phenomenon
where known per-example behavior changes after a code/model update and causes divergence in downstream code or breaks the user experience itself (even if dataset averages stay flat
across updates). Pseudocode for `qoi`:

```python
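# One flag per example: True when the optimized model is at least as accurate as the reference
no_regressions = []
for example in dataset:
    no_regression = wer(optimized_model(example)) <= wer(reference_model(example))
    no_regressions.append(no_regression)
# Percentage of examples without a regression
qoi = (sum(no_regressions) / len(no_regressions)) * 100.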
```

Note that the ordering of models with respect to `WER` does not necessarily match the ordering with respect to `QoI`. This is because the reference model is assigned
a QoI of 100% by definition. Any per-example regression by another implementation is penalized, while per-example improvements are not rewarded. `QoI` (higher is better) matters
where the production behavior is established by the reference results and the goal is to not regress when switching to an optimized or compressed model. On the other hand,
`WER` (lower is better) matters when there is no established production behavior and one is picking the best quality versus model size trade-off point.
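To make the difference between the two metrics concrete, here is a minimal runnable sketch of the `qoi` pseudocode. It is not the whisperkittools implementation: `word_errors`, `wer`, `average_wer`, and the toy transcripts are hypothetical, WER is computed as a plain word-level edit distance without any text normalization, and the dataset WER is taken as the mean of per-example WERs.

```python
def word_errors(truth, hypothesis):
    """Word-level Levenshtein distance between two transcripts."""
    ref, hyp = truth.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(truth, hypothesis):
    """Per-example word error rate (no text normalization, an assumption)."""
    return word_errors(truth, hypothesis) / max(len(truth.split()), 1)


def average_wer(truths, outputs):
    """Mean of per-example WERs, in percent."""
    return 100.0 * sum(wer(t, o) for t, o in zip(truths, outputs)) / len(truths)


def qoi(truths, reference_outputs, optimized_outputs):
    """Percentage of examples where the optimized model does not regress."""
    flags = [
        wer(t, o) <= wer(t, r)
        for t, r, o in zip(truths, reference_outputs, optimized_outputs)
    ]
    return 100.0 * sum(flags) / len(flags)


# Toy data (hypothetical): ground truth, reference outputs, optimized outputs.
truths = ["the cat sat on the mat", "hello world",
          "good morning to you", "speech recognition works"]
reference = ["the cat sat on a mat", "hello world",
             "good morning to you", "speech recognition work"]
optimized = ["the cat sat on the mat", "hello word",
             "good morning to you", "speech recognition works"]

print(average_wer(truths, reference), average_wer(truths, optimized))
print(qoi(truths, reference, reference), qoi(truths, reference, optimized))
```

In this toy set the optimized model fixes two of the reference model's errors and introduces one new error, so both models end up with the same average WER while the optimized model's `qoi` drops below 100%: the two improvements earn no credit, and the single regression is penalized.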

We anticipate that developers who use Whisper (or similar models) in production have their own Quality Assurance test sets, and [whisperkittools](https://github.com/argmaxinc/whisperkittools) offers
the tooling necessary to run the same measurements on such custom test sets. Please see [Model Evaluation on Custom Dataset](https://github.com/argmaxinc/whisperkittools) for details.

### Why are there so many Whisper versions?
WhisperKit is an SDK for building speech-to-text features in apps across a wide range of Apple devices. We are working towards abstracting away the model versioning from the developer so WhisperKit
"just works" by deploying the highest-quality model version that a particular device can execute. In the interim, we leave the choice to the developer by providing quality and size trade-offs.
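As an illustration of how a developer might act on this trade-off today, the sketch below picks the lowest-error variant that fits a given size budget. It does not describe how WhisperKit selects models: `pick_variant` and the `VARIANTS` list are hypothetical, and the numbers are copied from the `en` column and the file size column of the `common_voice_17_0-argmax_subset-400` table above, assuming that column reports WER.

```python
# (variant name, English WER in %, file size in MB), copied from the table above.
VARIANTS = [
    ("large-v3-v20240930", 12.13, 1640),
    ("large-v3-v20240930_626MB", 13.03, 626),
    ("large-v3-v20240930_547MB", 16.12, 547),
    ("small", 17.35, 483),
    ("base", 25.11, 145),
    ("tiny", 33.47, 66),
]


def pick_variant(size_budget_mb, variants=VARIANTS):
    """Return the lowest-WER variant whose file size fits the budget, or None."""
    candidates = [v for v in variants if v[2] <= size_budget_mb]
    if not candidates:
        return None
    return min(candidates, key=lambda v: v[1])


for budget in (2000, 700, 500, 50):
    print(budget, pick_variant(budget))
```

A real deployment would also need to account for memory and latency on the target device, which file size alone does not capture.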


### Datasets
- [librispeech](https://huggingface.co/datasets/argmaxinc/librispeech): ~5 hours of short English audio clips, tests short-form transcription quality
- [earnings22](https://huggingface.co/datasets/argmaxinc/earnings22): ~120 hours of English audio clips from earnings calls with various accents, tests long-form transcription quality
- [common_voice_17_0-argmax_subset-400](https://huggingface.co/datasets/argmaxinc/common_voice_17_0-argmax_subset-400): Up to 400 samples per language of multilingual audio clips with corresponding text, tests transcription quality across diverse languages and accents

### Reproducing Results
Benchmark results on this page were automatically generated by [whisperkittools](https://github.com/argmaxinc/whisperkittools) using our cluster of Apple Silicon Macs as self-hosted runners on
GitHub Actions. We periodically recompute these benchmarks as part of our CI pipeline. Due to [security concerns](https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#hardening-for-self-hosted-runners),
we are unable to open up the cluster to the public. However, any Apple Silicon Mac (even with 8GB RAM) can be used to
run identical [evaluation jobs](#evaluation) locally. For reference, our M2 Ultra devices complete a `librispeech` + `openai/whisper-large-v3`
evaluation in under 1 hour regardless of the Whisper implementation. The oldest Apple Silicon Macs should take less than 1 day to complete the same evaluation.



### Glossary

- `_turbo`: Indicates the presence of additional optimizations (not compression) to unlock streaming transcription
as described in our [Blog Post](https://www.takeargmax.com/blog/whisperkit).

- `_*MB`: Indicates the presence of model compression. Instead of cluttering the filename with details like
`_AudioEncoder-5.8bits_TextDecoder-6.1bits_QLoRA-rank=16`, we choose to summarize the compression spec as the
resulting total file size since this is what matters to developers in production.
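The two suffixes can be read mechanically from a variant name. The sketch below is a hypothetical helper, not part of WhisperKit or whisperkittools, and it assumes that a `_*MB` suffix, when present, comes last and that `_turbo` directly precedes it or ends the name. The `example-model` names are made up for illustration.

```python
import re

# Matches a trailing size suffix such as "_626MB".
_SIZE_SUFFIX = re.compile(r"_(\d+)MB$")


def parse_variant(name):
    """Return (is_turbo, size_mb) for a variant name; size_mb is None if absent."""
    size_match = _SIZE_SUFFIX.search(name)
    size_mb = int(size_match.group(1)) if size_match else None
    # Strip the size suffix before checking for the turbo marker.
    stem = name[: size_match.start()] if size_match else name
    return stem.endswith("_turbo"), size_mb


print(parse_variant("openai_whisper-large-v3-v20240930_626MB"))
print(parse_variant("openai_whisper-small"))
print(parse_variant("example-model_turbo_600MB"))  # made-up name
```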