FAcodec trained on 50k hours of speech data, with greater timbre diversity and better reconstruction of speakers from podcasts, videos, games, or animations.
This is a separate decoder, designed and trained on top of the pretrained encoder specifically for the voice conversion task.
It is capable of zero-shot voice conversion and streaming voice conversion, and has outstanding timbre generalization ability.
See the main repository for usage examples.