nvidia/OpenMathInstruct-2
A collection containing math datasets.
Note: A math dataset containing 600k unique questions and around 14M question-answer pairs, generated using Llama 3.1 405B Instruct and questions from the GSM8K and MATH datasets. The accompanying paper is detailed and includes multiple ablation studies on how the size, quality, and diversity of the dataset affect fine-tuning. https://huggingface.co/papers/2410.01560
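The figures in the note imply roughly 23 answers per unique question on average (14M pairs over 600k questions). A minimal sketch of how one might check that ratio on a sample is shown below. It assumes each record stores its question under a `problem` field and that the dataset exposes a `train` split; both are assumptions to verify against the dataset card. The helper names `summarize` and `stream_sample` are hypothetical, and the toy records are made-up stand-ins, not rows from the dataset.

```python
from collections import Counter
from itertools import islice


def summarize(records, question_key="problem"):
    """Count total pairs and unique questions in an iterable of dict records."""
    # question_key="problem" is an assumed field name; check the dataset card.
    per_question = Counter(rec[question_key] for rec in records)
    n_pairs = sum(per_question.values())
    n_unique = len(per_question)
    avg = n_pairs / n_unique if n_unique else 0.0
    return {
        "pairs": n_pairs,
        "unique_questions": n_unique,
        "avg_answers_per_question": avg,
    }


def stream_sample(n=1000):
    """Stream the first n records from the Hub (needs the `datasets` package and network)."""
    # Imported lazily so the rest of the example runs offline.
    from datasets import load_dataset

    # The split name "train" is an assumption, not confirmed by the note.
    ds = load_dataset("nvidia/OpenMathInstruct-2", split="train", streaming=True)
    return list(islice(ds, n))


# Toy records standing in for real rows (illustrative only).
toy = [
    {"problem": "What is 2 + 3?", "generated_solution": "2 + 3 = 5"},
    {"problem": "What is 2 + 3?", "generated_solution": "Adding gives 5"},
    {"problem": "What is 7 * 6?", "generated_solution": "7 * 6 = 42"},
]
stats = summarize(toy)
print(stats)
```

To try it on real data, `summarize(stream_sample(1000))` could be used instead of the toy list. A small streamed sample will not necessarily reproduce the full-dataset ratio, since rows for the same question may not be adjacent.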