Licensing #4
Opened by tonylek
What licenses cover the data the model was trained on? Is it permissively licensed code?
Yes, we only use permissively licensed code data for training all our models (both base and instruct).
Is this only The Stack, or were additional repos used? Is the list available somewhere?
We crawled additional GitHub data and then filtered it by license (only permissively licensed code is used).
Will the list of repos be published?
Any update on this?
Hi, we are not planning to release the list of repos.
mayank-mishra changed the discussion status to closed