pyf98 committed
Commit b5e81eb · verified · 1 Parent(s): ab3cca2

Update README.md

Files changed (1): README.md (+4 -3)
README.md CHANGED
@@ -12,14 +12,15 @@ license: cc-by-4.0
 
 ## OWSM: Open Whisper-style Speech Model
 
-[OWSM](https://arxiv.org/abs/2309.13876) is an Open Whisper-style Speech Model from [CMU WAVLab](https://www.wavlab.org/). It reproduces Whisper-style training using publicly available data and an open-source toolkit [ESPnet](https://github.com/espnet/espnet).
+OWSM aims to develop fully open speech foundation models using publicly available data and open-source toolkits, including [ESPnet](https://github.com/espnet/espnet).
 
-Our demo is available [here](https://huggingface.co/spaces/pyf98/OWSM_v3_demo). The [project page](https://www.wavlab.org/activities/2024/owsm/) contains various resources.
+Inference examples can be found on our [project page](https://www.wavlab.org/activities/2024/owsm/).
+Our demo is available [here](https://huggingface.co/spaces/pyf98/OWSM_v3_demo).
 
 [OWSM v3.1](https://arxiv.org/abs/2401.16658) is an improved version of OWSM v3. It significantly outperforms OWSM v3 in almost all evaluation benchmarks.
 We do not include any new training data. Instead, we utilize a state-of-the-art speech encoder, [E-Branchformer](https://arxiv.org/abs/2210.00077).
 
-This is a base size model which has 101M parameters and is trained on 180k hours of public speech data.
+This is a base-sized model with 101M parameters and is trained on 180k hours of public speech data.
 Specifically, it supports the following speech-to-text tasks:
 - Speech recognition
 - Any-to-any-language speech translation
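Since the updated README points readers elsewhere for inference examples, a minimal sketch of short-form decoding with ESPnet's `Speech2Text` interface may help. The model ID, the `lang_sym`/`task_sym` tokens, and the audio-format expectations below are assumptions; consult the project page for the exact usage of this checkpoint.

```python
def transcribe(wav_path: str, lang: str = "<eng>", task: str = "<asr>") -> str:
    """Hedged sketch: run OWSM short-form inference on one utterance.

    Assumptions (not confirmed by this README): the Hugging Face model ID,
    the token symbols, and that input audio is 16 kHz mono, up to ~30 s.
    Requires: pip install espnet espnet_model_zoo soundfile
    """
    import soundfile as sf
    from espnet2.bin.s2t_inference import Speech2Text

    model = Speech2Text.from_pretrained(
        "espnet/owsm_v3.1_ebf_base",  # assumed model ID for this checkpoint
        lang_sym=lang,                # source-language token, e.g. <eng>
        task_sym=task,                # <asr>, or a target-language token
                                      # such as <st_deu> for speech translation
    )
    speech, _rate = sf.read(wav_path)
    results = model(speech)
    return results[0][0]              # text of the best hypothesis
```

For translation instead of recognition, the same call would swap the task token, e.g. `transcribe("audio.wav", lang="<eng>", task="<st_deu>")` under the assumptions above.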