Is the 14-programming-language dataset uploaded to Hugging Face? Is there any other way to download the data?
#201
by
MukeshSharma
- opened
I am looking for the programming language dataset that was used to fine-tune the model. Where can I get it?
They are a long way behind "code-davinci-003", which is now surpassed by "gpt-3.5-turbo" with better results at a third of the price, but these are the best models I found:
I would just search for "GitHub" in the Datasets section to find data for fine-tuning. For example, "codeparrot" has a few good ones.
Filter down to the language you want to fine-tune on for better results:
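In case it helps, here is a rough sketch of what that filtering could look like in Python. The dataset name "codeparrot/github-code" and its "language" column are my assumptions based on the codeparrot datasets, so check the dataset card first. The helper names are made up for illustration, and depending on your `datasets` version the load call may need extra arguments.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def filter_by_language(records: Iterable[dict], language: str,
                       field: str = "language") -> Iterator[dict]:
    """Yield only records whose `field` equals `language` (case-insensitive)."""
    wanted = language.lower()
    for record in records:
        if str(record.get(field, "")).lower() == wanted:
            yield record


def stream_github_code(language: str, limit: int = 5) -> List[dict]:
    """Hypothetical helper: pull a few matching records from the Hub.

    Needs `pip install datasets` and network access. It is not called
    below, so the rest of this snippet runs offline.
    """
    from datasets import load_dataset  # lazy import: only needed here

    # Assumption: dataset id and column name; verify on the dataset card.
    stream = load_dataset("codeparrot/github-code", split="train", streaming=True)
    return list(islice(filter_by_language(stream, language), limit))


# Offline demo on a tiny in-memory sample with the same record shape.
sample = [
    {"language": "Python", "code": "print('hi')"},
    {"language": "Java", "code": "class A {}"},
    {"language": "python", "code": "x = 1"},
]
python_only = list(filter_by_language(sample, "Python"))
print(len(python_only), "of", len(sample), "records kept")
```

Streaming avoids downloading the whole dataset before you know which languages you want to keep.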
I'm looking to do about the same. Best of luck!