Update README.md
README.md CHANGED
@@ -58,27 +58,28 @@ When using this model, make sure that your speech input is sampled at 16kHz.

To transcribe audio files the model can be used as a standalone acoustic model as follows:

```python
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from datasets import load_dataset
import torch

# load model and tokenizer
processor = Wav2Vec2Processor.from_pretrained("bond005/wav2vec2-large-ru-golos")
model = Wav2Vec2ForCTC.from_pretrained("bond005/wav2vec2-large-ru-golos")

# load test part of Golos dataset and read first soundfile
ds = load_dataset("bond005/sberdevices_golos_10h_crowd", split="test")

# tokenize
processed = processor(ds[0]["audio"]["array"], return_tensors="pt", padding="longest")  # Batch size 1

# retrieve logits
logits = model(processed.input_values, attention_mask=processed.attention_mask).logits

# take argmax and decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```
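
The hunk context above notes that input speech must be sampled at 16 kHz; the Golos test split used in the example is already distributed at that rate, so the raw array can be passed directly. For audio stored at another rate, the following is a minimal sketch (not part of the model card) of how one could resample before calling the processor, assuming torchaudio is available; the file name `speech.wav` is only a placeholder:

```python
import torch
import torchaudio

# load an audio file at an arbitrary sampling rate (placeholder path)
waveform, rate = torchaudio.load("speech.wav")  # shape: (channels, samples)

# resample to the 16 kHz expected by the model, if necessary
if rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, orig_freq=rate, new_freq=16_000)

# mix down to mono and pass the 1-D array to the processor,
# stating the sampling rate explicitly
speech = waveform.mean(dim=0)
processed = processor(speech.numpy(), sampling_rate=16_000,
                      return_tensors="pt", padding="longest")
```

From here, `processed` can be fed to the model exactly as in the example above.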

## Citation

If you want to cite this model you can use this: