Update README.md
README.md CHANGED
@@ -96,11 +96,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 # Model Card for NuminaMath 72B TIR
 
-NuminaMath is a series of language models that are trained
-
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6200d0a443eb0913fa2df7cc/NyhBs_gzg40iwL995DO9L.png)
-
-This model is a fine-tuned version of [deepseek-ai/deepseek-math-7b-base](https://huggingface.co/deepseek-ai/deepseek-math-7b-base) with two stages of supervised fine-tuning:
+NuminaMath is a series of language models that are trained with two stages of supervised fine-tuning to solve math problems using chain of thought (CoT) and tool-integrated reasoning (TIR):
 
 * **Stage 1:** fine-tune the base model on a large, diverse dataset of natural language math problems and solutions, where each solution is templated with Chain of Thought (CoT) to facilitate reasoning.
 * **Stage 2:** fine-tune the model from Stage 1 on a synthetic dataset of tool-integrated reasoning, where each math problem is decomposed into a sequence of rationales, Python programs, and their outputs.
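The Stage 2 format described in the diff can be sketched as follows: each reasoning step pairs a natural-language rationale with a Python program, and the program's printed output is fed back into the solution. This is a minimal illustrative harness, not NuminaMath's actual training or inference code; the problem, rationale, and program here are made up for the example.

```python
import contextlib
import io


def run_tir_step(rationale: str, program: str) -> str:
    """Execute the Python program from one tool-integrated reasoning
    step and capture its printed output, as a solver harness might."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})  # run the generated code in a fresh namespace
    return buffer.getvalue().strip()


# One illustrative decomposition of a math problem into a
# (rationale, Python program, output) triple, mirroring Stage 2.
rationale = "Sum the integers from 1 to 100 using the closed form n*(n+1)/2."
program = "n = 100\nprint(n * (n + 1) // 2)"
output = run_tir_step(rationale, program)
print(output)  # → 5050
```

At inference time, a TIR model interleaves such steps: it writes a rationale, emits a program, a sandbox executes it, and the output is appended to the context before the model continues reasoning.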