Pre-trained checkpoints for speech representation in Japanese
The models in this repository were pre-trained for speech representation via self-supervised learning (SSL), using the fairseq toolkit.
wav2vec2_base_csj.pt
- fairseq checkpoint of a wav2vec 2.0 model with the Base architecture, pre-trained on 16 kHz-sampled speech data from the Corpus of Spontaneous Japanese (CSJ)

wav2vec2_base_csj_hf
- version of wav2vec2_base_csj.pt converted to be compatible with the Hugging Face interface by using this tool

hubert_base_csj.pt
- fairseq checkpoint of a HuBERT model with the Base architecture, pre-trained on 16 kHz-sampled speech data from the Corpus of Spontaneous Japanese (CSJ)

hubert_base_csj_hf
- version of hubert_base_csj.pt converted to be compatible with the Hugging Face interface by using this tool
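The checkpoints can be used either from fairseq directly or, after conversion, through the Hugging Face transformers library. The following is a minimal sketch of feature extraction with the fairseq wav2vec 2.0 checkpoint; it assumes fairseq and PyTorch are installed and that wav2vec2_base_csj.pt sits in the current directory. The path and the dummy waveform are placeholders, not part of this repository.

```python
# Minimal sketch: feature extraction with the fairseq wav2vec 2.0 checkpoint.
# Assumes fairseq and PyTorch are installed; adjust the path as needed.
import torch
import fairseq

models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(
    ["wav2vec2_base_csj.pt"]
)
model = models[0]
model.eval()

# One second of 16 kHz audio as a stand-in for real speech.
waveform = torch.zeros(1, 16000)

with torch.no_grad():
    # features_only=True returns the contextual representations ("x")
    # rather than the quantization outputs used during pre-training.
    features = model(waveform, features_only=True, mask=False)["x"]

print(features.shape)  # (batch, frames, 768) for the Base architecture
```

For the converted checkpoints, the standard transformers loading interface applies. Below is a sketch assuming wav2vec2_base_csj_hf is a local directory produced by the conversion tool; for the HuBERT model, substitute HubertModel and hubert_base_csj_hf.

```python
# Minimal sketch: loading the Hugging Face conversion with transformers.
# "wav2vec2_base_csj_hf" is assumed to be the local directory produced by
# the conversion tool; for HuBERT, use HubertModel and hubert_base_csj_hf.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

model = Wav2Vec2Model.from_pretrained("wav2vec2_base_csj_hf")
model.eval()

# The models expect 16 kHz mono input; default extractor settings are
# assumed here, not taken from this repository.
feature_extractor = Wav2Vec2FeatureExtractor(sampling_rate=16000)

waveform = [0.0] * 16000  # one second of silence as a stand-in
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch, frames, hidden_size)
```

Either route yields frame-level speech representations (one frame per roughly 20 ms of audio) that can be fed to downstream tasks.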
If you find these models helpful, please consider citing the following paper:
@INPROCEEDINGS{ashihara_icassp23,
  author={Takanori Ashihara and Takafumi Moriya and Kohei Matsuura and Tomohiro Tanaka},
  title={Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models},
  booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2023}
}