License
Hi @steveheh,
Congrats on the launch! Thanks for releasing this. Might it be possible to switch the license to a more permissive one?
Thanks!
This is the best we could do for this model; however, we are working on new models with more permissive licenses.
Hi,
Thanks for the response! Is it the training data that restricts usage?
Thanks!
As far as I can tell, all of the training data is freely available. Most of it was used to train earlier CC-BY NeMo models.
Did anyone try to replicate the training yet?
@nithinraok is there a timeline for the new models?
It turns out that some training data is not freely available.
"The Canary-1B model is trained on a total of 85k hrs of speech data. It consists of 31k hrs of public data, 20k hrs collected by Suno, and 34k hrs of in-house data."
In this commit, the hour counts for the public data even changed: https://huggingface.co/nvidia/canary-1b/commit/e2ec44628860649c9fee47ea2f591c4ebb542c02
So it is actually not possible to replicate the training, as far as I can tell.