Commit efbaccd (1 parent: 3cafa40), committed by viktor-enzell
Update README.md

README.md CHANGED
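The metadata part of this change moves the `datasets` list below `license` and rewrites the dataset names with underscores in place of spaces, so `NST Swedish ASR Database` becomes `NST_Swedish_ASR_Database`. The commit does not state the reason. A minimal sketch of that renaming, where `to_dataset_id` is a hypothetical helper name and not something the repository provides:

```python
def to_dataset_id(name: str) -> str:
    """Join the whitespace-separated words of a dataset name with underscores."""
    return "_".join(name.split())


# Dataset names as they appear on the removed lines of the diff.
old_names = [
    "common_voice",
    "NST Swedish ASR Database",
    "P4",
    "The Swedish Culturomics Gigaword Corpus",
]

# Names in the form used on the added lines.
new_names = [to_dataset_id(name) for name in old_names]
```

Names that contain no spaces, such as `common_voice` and `P4`, pass through unchanged.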
````diff
@@ -1,10 +1,5 @@
 ---
 language: sv
-datasets:
-- common_voice
-- NST Swedish ASR Database
-- P4
-- The Swedish Culturomics Gigaword Corpus
 metrics:
 - wer
 tags:
@@ -14,6 +9,11 @@ tags:
 - hf-asr-leaderboard
 - sv
 license: cc0-1.0
+datasets:
+- common_voice
+- NST_Swedish_ASR_Database
+- P4
+- The_Swedish_Culturomics_Gigaword_Corpus
 model-index:
 - name: Wav2vec 2.0 large VoxRex Swedish (C) with 4-gram
   results:
@@ -37,7 +37,22 @@ Training of the acoustic model is the work of KBLab. See [VoxRex-C](https://hugg
 VoxRex-C is extended with a 4-gram language model estimated from a subset extracted from [The Swedish Culturomics Gigaword Corpus](https://spraakbanken.gu.se/resurser/gigaword) from Språkbanken. The subset contains 40M words from the social media genre between 2010 and 2015.
 
 ## How to use
-
+#### Simple usage example with pipeline
+```python
+import torch
+from transformers import pipeline
+
+# Load the model. Using GPU if available
+model_name = 'viktor-enzell/wav2vec2-large-voxrex-swedish-4gram'
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+pipe = pipeline(model=model_name).to(device)
+
+# Run inference on an audio file
+output = pipe('path/to/audio.mp3')['text']
+```
+
+#### More verbose usage example with audio pre-processing
+Example of transcribing 1% of the Common Voice test split. The model expects 16kHz audio, so audio with another sampling rate is resampled to 16kHz.
 
 ```python
 from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM
@@ -45,7 +60,7 @@ from datasets import load_dataset
 import torch
 import torchaudio.functional as F
 
-# Import model and processor
+# Import model and processor. Using GPU if available
 model_name = 'viktor-enzell/wav2vec2-large-voxrex-swedish-4gram'
 device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
 model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device);
````
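A note on the pipeline example added in this commit: it calls `.to(device)` on the object returned by `pipeline(...)`. As far as I know, pipeline objects do not provide a `.to()` method, so that line may raise an `AttributeError`. `pipeline()` does accept a `device` argument, which is the safer route. The sketch below passes an integer device index (`0` for the first GPU, `-1` for CPU) at construction time. The helper names `pick_device_index` and `build_asr_pipeline` are hypothetical, and the task name is spelled out here instead of being inferred from the model as in the README.

```python
def pick_device_index(cuda_available: bool) -> int:
    """Return the integer device index used by transformers pipelines:
    0 selects the first GPU, -1 selects the CPU."""
    return 0 if cuda_available else -1


def build_asr_pipeline(model_name: str):
    """Create a speech-recognition pipeline placed on GPU when one is available."""
    # Imported lazily so the device helper above can be used without these packages.
    import torch
    from transformers import pipeline

    device = pick_device_index(torch.cuda.is_available())
    # The device is chosen when the pipeline is built, so no .to() call is needed.
    return pipeline("automatic-speech-recognition", model=model_name, device=device)
```

Usage would then mirror the README: `pipe = build_asr_pipeline('viktor-enzell/wav2vec2-large-voxrex-swedish-4gram')` followed by `output = pipe('path/to/audio.mp3')['text']`.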