Fixes to documentation
TRAINING.md (+3 -1)
@@ -20,7 +20,7 @@ The model improves in the WER evaluation metric when it is evaluated against the Common Voice

 **2. Model degrades according to human evaluation**

-When doing human
+When doing human evaluation, the results for the fine-tuned Catalan model were disappointing. The fine-tuned models clearly perform worse than the original OpenAI models, as reported by all of the users (half a dozen) who tested them.

 Our hypothesis is that the evaluation on Common Voice gives better results because the model is overfitted and has lost generalization capabilities.
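For reference, WER counts word-level substitutions, deletions and insertions against a ground-truth transcript, which is why a model can score well on in-domain Common Voice sentences while degrading on everyday speech. Below is a minimal sketch of the metric, assuming the `jiwer` Python package; the sentences are invented examples, not data from this evaluation.

```python
# Minimal sketch of the WER metric discussed above, using the jiwer package.
# The sentences are invented examples, not data from this evaluation.
from jiwer import wer

reference = "bon dia a tothom"   # ground-truth transcript (4 words)
hypothesis = "bon dia tothom"    # model output with one word dropped

# WER = (substitutions + deletions + insertions) / words in the reference
print(wer(reference, hypothesis))  # 1 deletion / 4 words = 0.25
```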
@@ -50,6 +50,8 @@ Summary as of March 2023:

 **b**. The HuggingFace Whisper implementation performs poorly. This can be really misleading when doing evaluations, since HuggingFace is the stack used for fine-tuning.

+**c**. We have only been able to use the models reliably with the Whisper.cpp and CTranslate2 inference clients.
+
 In our experiments

 | Whisper Client | WER |
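To make the client comparison concrete, here is a rough sketch of scoring the same audio clip with two of the clients named above: the HuggingFace `transformers` pipeline and the CTranslate2-based `faster-whisper` package (Whisper.cpp is a C++ CLI tool, so it is omitted here). The model identifiers, file names and reference transcript are placeholders, not the actual checkpoints or evaluation data.

```python
# Rough sketch: transcribe the same clip with two clients and WER-score both.
# Model names, file paths and the reference transcript are placeholders.
from faster_whisper import WhisperModel          # CTranslate2-based client
from transformers import pipeline                # HuggingFace client
from jiwer import wer

reference = "transcripcio de referencia"         # placeholder ground truth

# CTranslate2 / faster-whisper (expects a model converted to CTranslate2 format)
ct2_model = WhisperModel("path/to/whisper-ct2")
segments, _info = ct2_model.transcribe("sample.wav", language="ca")
ct2_text = " ".join(segment.text.strip() for segment in segments)

# HuggingFace transformers pipeline
hf_asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
hf_text = hf_asr("sample.wav")["text"]

print("CTranslate2 WER:", wer(reference, ct2_text))
print("HuggingFace WER:", wer(reference, hf_text))
```

Running both clients over the same evaluation set and averaging the per-clip WER is one way to fill in a table like the one above.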