qmeeus committed on
Commit 10c4eba
1 Parent(s): 4fd0dbf

Update README.md

Files changed (1)
  1. README.md +24 -4
README.md CHANGED
@@ -11,9 +11,6 @@ model-index:
   results: []
 ---

-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
 # whisper-small-nl

 This model is a fine-tuned version of [qmeeus/whisper-small-nl](https://huggingface.co/qmeeus/whisper-small-nl) on the None dataset.
@@ -27,7 +24,30 @@ More information needed

 ## Intended uses & limitations

-More information needed
+Transcribe files in Dutch:
+
+```python
+import soundfile as sf
+from transformers import pipeline
+
+whisper_asr = pipeline("automatic-speech-recognition", model="qmeeus/whisper-small-nl", device=0)
+whisper_asr.model.config.forced_decoder_ids = whisper_asr.tokenizer.get_decoder_prompt_ids(
+    task="transcribe", language="nl"
+)
+
+waveform, sr = sf.read(filename)
+
+def iter_chunks(waveform, sampling_rate=16_000, chunk_length=30.):
+    assert sampling_rate == 16_000
+    n_frames = math.floor(sampling_rate * chunk_length)
+    for start in range(0, len(waveform), n_frames):
+        end = min(len(waveform), start + n_frames)
+        yield waveform[start:end]
+
+for sentence in whisper_asr(iter_chunks(waveform, sr)):
+    print(sentence["text"])
+
+```

 ## Training and evaluation data

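As committed, the README snippet calls `math.floor` without importing `math`, and `filename` is never defined, so it will not run when pasted as-is. Below is a minimal sketch of the `iter_chunks` helper that runs on its own. It keeps the README's name, its defaults (16 kHz, 30-second non-overlapping windows), and its slicing logic. The type hints and the `ValueError` that replaces the bare `assert` are additions in this sketch, not part of the commit. The sketch assumes a mono waveform (a 1-D sequence), which is what `soundfile.read` returns for single-channel files. The pipeline setup and the `filename` placeholder are left for the reader.

```python
import math
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")  # sample type, e.g. float


def iter_chunks(
    waveform: Sequence[T],
    sampling_rate: int = 16_000,
    chunk_length: float = 30.0,
) -> Iterator[Sequence[T]]:
    """Yield consecutive, non-overlapping slices of at most `chunk_length` seconds.

    Accepts anything that supports len() and slicing, e.g. a list or a
    1-D NumPy array as returned by soundfile.read for a mono file.
    """
    # Whisper's feature extractor works at 16 kHz; reject other rates explicitly
    # instead of relying on `assert`, which is skipped under `python -O`.
    if sampling_rate != 16_000:
        raise ValueError(f"expected 16 kHz audio, got {sampling_rate} Hz")
    n_frames = math.floor(sampling_rate * chunk_length)  # samples per chunk
    for start in range(0, len(waveform), n_frames):
        end = min(len(waveform), start + n_frames)  # final chunk may be shorter
        yield waveform[start:end]


# Usage: 70 s of silence at 16 kHz splits into chunks of 30 s, 30 s and 10 s.
samples = [0.0] * (70 * 16_000)
print([len(chunk) / 16_000 for chunk in iter_chunks(samples)])  # [30.0, 30.0, 10.0]
```

Hard 30-second cuts can split a word across two chunks. The ASR pipeline's own `chunk_length_s` argument, which chunks with overlapping strides, may be a better fit when that matters.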