Commit 1204837
Parent(s): 3643ef4
Allow single quotes "'" and hyphens "-"
Remove single quotes `'` (id 6) and hyphens `-` (id 12) from `suppress_tokens`. These tokens should **not** be suppressed during generation. They are accepted as valid generated tokens in the official Whisper repo:
https://github.com/openai/whisper/blob/eff383b27b783e280c089475852ba83f20f64998/whisper/tokenizer.py#L258
Check that we're removing the right tokens:
```python
from transformers import WhisperTokenizer
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-tiny.en")
print(tokenizer.decode(6))
print(tokenizer.decode(12))
```
**Print Output:**
```
'
-
```
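For context on why the two ids matter: suppressing a token id amounts to forcing its score to negative infinity before the next token is picked, so that token can never be generated. The sketch below illustrates this with a toy score list covering ids 0 to 12. The helper names `suppress` and `greedy_pick` and the score values are hypothetical, and this is a simplified illustration rather than the library's actual implementation.

```python
import math

def suppress(scores, suppressed_ids):
    """Return a copy of scores with every suppressed id set to -inf."""
    blocked = set(suppressed_ids)
    return [-math.inf if i in blocked else s for i, s in enumerate(scores)]

def greedy_pick(scores):
    """Return the index of the highest score (one greedy decoding step)."""
    return max(range(len(scores)), key=scores.__getitem__)

# Toy scores for ids 0..12 (made-up values); id 6, the single quote,
# is the model's preferred next token here.
scores = [0.1] * 13
scores[6] = 5.0
scores[7] = 2.0

old_list = [1, 2, 6, 7, 8, 9, 10, 12]  # suppress list before this commit (excerpt)
new_list = [1, 2, 7, 8, 9, 10]         # suppress list after this commit (excerpt)

print(greedy_pick(suppress(scores, old_list)))  # 0: the quote is blocked
print(greedy_pick(suppress(scores, new_list)))  # 6: the quote can be generated
```

With the old list, a contraction such as "don't" cannot be produced even when the quote is the top-scoring token; with the new list it can.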
config.json CHANGED (+0 -2)
@@ -50,12 +50,10 @@
   "suppress_tokens": [
     1,
     2,
-    6,
     7,
     8,
     9,
     10,
-    12,
     14,
     25,
     26,
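The same edit could be applied programmatically to a local copy of the config. This is a minimal sketch, assuming the config has already been parsed into a dict; `allow_tokens` is a hypothetical helper, and the list shown is only the excerpt visible in the hunk above, not the full `suppress_tokens` list.

```python
import json

def allow_tokens(config, token_ids):
    """Return a copy of config with token_ids removed from suppress_tokens."""
    allowed = set(token_ids)
    updated = dict(config)
    updated["suppress_tokens"] = [
        t for t in config.get("suppress_tokens", []) if t not in allowed
    ]
    return updated

# Excerpt of the list from the diff hunk (the real list is longer).
before = json.loads('{"suppress_tokens": [1, 2, 6, 7, 8, 9, 10, 12, 14, 25, 26]}')
after = allow_tokens(before, [6, 12])

print(json.dumps(after))
# {"suppress_tokens": [1, 2, 7, 8, 9, 10, 14, 25, 26]}
```

The helper returns a new dict rather than mutating its input, so the original config stays available for comparison.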