---
library_name: transformers
language: ja
license: apache-2.0
datasets: reazon-research/reazonspeech
pipeline_tag: feature-extraction
tags:
- wav2vec2
- speech
---

# `reazon-research/japanese-wav2vec2-large`

This is a Japanese wav2vec 2.0 Large model pre-trained on the [ReazonSpeech v2.0 corpus](https://huggingface.co/datasets/reazon-research/reazonspeech).

We also release a CTC model derived from this model: [`reazon-research/japanese-wav2vec2-large-rs35kh`](https://huggingface.co/reazon-research/japanese-wav2vec2-large-rs35kh).

## Usage

```python
import librosa
import torch
from transformers import AutoFeatureExtractor, AutoModel

feature_extractor = AutoFeatureExtractor.from_pretrained("reazon-research/japanese-wav2vec2-large")
model = AutoModel.from_pretrained("reazon-research/japanese-wav2vec2-large")

# Placeholder path: replace with your own audio file.
audio_file = "path/to/audio.wav"

# Load the audio as a mono waveform resampled to 16 kHz.
audio, sr = librosa.load(audio_file, sr=16_000)
inputs = feature_extractor(
    audio,
    return_tensors="pt",
    sampling_rate=sr,
)
with torch.inference_mode():
    outputs = model(**inputs)

# Frame-level features with shape (batch, frames, hidden_size).
features = outputs.last_hidden_state
```
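
The model returns one feature vector per frame, not per audio sample, so it can help to know how many frames a clip will produce. The sketch below estimates that count. It assumes the standard wav2vec 2.0 feature encoder layout (kernel sizes `10, 3, 3, 3, 3, 2, 2` and strides `5, 2, 2, 2, 2, 2, 2`, which are the `Wav2Vec2Config` defaults in `transformers`); this card does not state the encoder configuration, so check `model.config.conv_kernel` and `model.config.conv_stride` before relying on it. The function name `num_output_frames` is hypothetical.

```python
# Assumed encoder layout (Wav2Vec2Config defaults), not confirmed by this card.
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(num_samples, kernels=CONV_KERNEL, strides=CONV_STRIDE):
    """Estimate how many frames the convolutional encoder emits for a waveform."""
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


# One second of 16 kHz audio under the assumed layout.
print(num_output_frames(16_000))  # 49
```

Under these assumptions each frame covers roughly 20 ms of audio, so the frame dimension of the model output grows by about 50 per second of input.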

## Citation

```bibtex
@misc{reazon-research-japanese-wav2vec2-large,
  title = {japanese-wav2vec2-large},
  author = {Sasaki, Yuta},
  url = {https://huggingface.co/reazon-research/japanese-wav2vec2-large},
  year = {2024}
}
```

## License

[Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/)