Commit 66a5149 • Parent(s): a356db9
Allow single quotes "'" and hyphens "-"
Remove single quotes `'` (id 6) and hyphens `-` (id 12) from `suppress_tokens`. These tokens should **not** be suppressed during generation. They are accepted as valid generated tokens in the official Whisper repo:
https://github.com/openai/whisper/blob/eff383b27b783e280c089475852ba83f20f64998/whisper/tokenizer.py#L258
Check that we're removing the right tokens:
```python
from transformers import WhisperTokenizer
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-small.en")
print(tokenizer.decode(6))
print(tokenizer.decode(12))
```
**Print Output:**
```
'
-
```
config.json CHANGED (+0 -2)

    @@ -42,12 +42,10 @@
       "suppress_tokens": [
         1,
         2,
    -    6,
         7,
         8,
         9,
         10,
    -    12,
         14,
         25,
         26,
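For anyone applying the same change to another checkpoint, the edit can be sketched in a few lines of Python. This is a minimal illustration rather than the method used for this commit: the `config` dict is a hypothetical excerpt containing only the ids visible in the diff, and `allow_tokens` is a helper name introduced here for the example.

```python
import json

# Hypothetical excerpt of config.json, limited to the ids visible in the diff.
config = {"suppress_tokens": [1, 2, 6, 7, 8, 9, 10, 12, 14, 25, 26]}

# Token ids to stop suppressing: 6 decodes to "'" and 12 decodes to "-".
ALLOWED_IDS = {6, 12}


def allow_tokens(cfg, allowed_ids):
    """Return a copy of cfg with allowed_ids removed from suppress_tokens."""
    updated = dict(cfg)
    updated["suppress_tokens"] = [
        token_id for token_id in cfg["suppress_tokens"] if token_id not in allowed_ids
    ]
    return updated


new_config = allow_tokens(config, ALLOWED_IDS)
print(json.dumps(new_config["suppress_tokens"]))
```

With a real checkpoint, the dict would presumably be loaded from and written back to `config.json` with `json.load` and `json.dump`. The list comprehension preserves the order of the remaining ids, so the resulting diff stays limited to the two removed lines.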