yuexiang96 committed on
Commit
e37ef2e
1 Parent(s): 1b40dcc

Create README.md

  1. README.md +73 -0
README.md ADDED
@@ -0,0 +1,73 @@
1
+ ---
2
+ license: mit
3
+ datasets:
4
+ - TIGER-Lab/MathInstruct
5
+ language:
6
+ - en
7
+ ---
8
+ # 🦣 MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
9
+
10
+ Project Page: [https://tiger-ai-lab.github.io/MAmmoTH/](https://tiger-ai-lab.github.io/MAmmoTH/)
11
+
12
+ Paper: [https://arxiv.org/pdf/2309.05653.pdf](https://arxiv.org/pdf/2309.05653.pdf)
13
+
14
+ Code: [https://github.com/TIGER-AI-Lab/MAmmoTH](https://github.com/TIGER-AI-Lab/MAmmoTH)
15
+
16
+
17
+ ## Introduction
18
+ We introduce 🦣 MAmmoTH, a series of open-source large language models (LLMs) specifically tailored for general math problem-solving. The MAmmoTH models are trained on 🤗 [MathInstruct Dataset](https://huggingface.co/datasets/TIGER-Lab/MathInstruct), a meticulously curated instruction tuning dataset that is lightweight yet generalizable. MathInstruct is compiled from 13 math rationale datasets, six of which are newly curated by this work. It uniquely focuses on the hybrid use of chain-of-thought (CoT) and program-of-thought (PoT) rationales, and ensures extensive coverage of diverse mathematical fields.
19
+
20
+ | | **Base Model: Llama-2** | **Base Model: Code Llama** |
21
+ |-----|---------------------------------------------------------------|--------------------------------------------------------------------------|
22
+ | 7B | 🦣 [MAmmoTH-7B](https://huggingface.co/TIGER-Lab/MAmmoTH-7B) | 🦣 [MAmmoTH-Coder-7B](https://huggingface.co/TIGER-Lab/MAmmoTH-Coder-7B) |
23
+ | 13B | 🦣 [MAmmoTH-13B](https://huggingface.co/TIGER-Lab/MAmmoTH-13B) | 🦣 [MAmmoTH-Coder-13B](https://huggingface.co/TIGER-Lab/MAmmoTH-Coder-13B)|
24
+ | 34B | - | 🦣 [MAmmoTH-Coder-34B](https://huggingface.co/TIGER-Lab/MAmmoTH-Coder-34B)|
25
+ | 70B | 🦣 [MAmmoTH-70B](https://huggingface.co/TIGER-Lab/MAmmoTH-70B) | - |
26
+
27
+
28
+
29
+ ## Training Data
30
+ The models are trained on the 🤗 [MathInstruct Dataset](https://huggingface.co/datasets/TIGER-Lab/MathInstruct), which is compiled from 13 different math rationale datasets. Check out the dataset card for more details.
31
+
32
+
33
+ ## Training Procedure
34
+ The models are fine-tuned with the MathInstruct dataset using the original Llama-2 and Code Llama models as base models. The training procedure varies for different models based on their sizes. Check out our paper for more details.
35
+
36
+ ## Evaluation
37
+ The models are evaluated using open-ended and multiple-choice math problems from several datasets. Here are the results:
38
+
39
+
40
+ | Model | Size | Base | GSM8K | MATH | AQuA | NumGLUE | IID Avg | SVAMP | Mathematics | SimulEq | SAT-Math | MMLU-Math | OOD Avg |
41
+ |-------------------|-------|---------------|-----------|-------|-------|-----------|---------------|-----------|---------------|-----------|-----------|---------------|---------------|
42
+ | | | | | | | | | | | | | | |
43
+ | MAmmoTH | 7B | Llama-2 | 51.7 | 31.2 | 42.9 | 53.1 | 44.7 | 66.7 | 44.8 | 42 | 36.4 | 38.6 | 45.7 |
44
+ | MAmmoTH-Coder | 7B | Code-Llama | 58.8 | 35.2 | 43 | 57.1 | 48.5 | 71.1 | 53.9 | 44.6 | 40 | 40.5 | 50.2 |
45
+ | MAmmoTH | 13B | Llama-2 | 61.7 | 36 | 44.8 | 59.6 | 50.5 | 72.4 | 48.7 | 40.5 | 42.7 | 45.3 | 49.9 |
46
+ | MAmmoTH-Coder | 13B | Code-Llama | 64.3 | 38.6 | 46.1 | 54.2 | 50.8 | 73.2 | 60 | 44.1 | 40.9 | 45.2 | 52.6 |
47
+ | MAmmoTH-Coder | 34B | Code-Llama | 72.3 | 46.8 | 50.8 | 59.6 | 57.3 | 84 | 64.7 | 50.6 | 51.8 | 50.2 | 60.3 |
48
+ | MAmmoTH | 70B | Llama-2 | 76.7 | 44.2 | 61.4 | 64.3 | 61.7 | 81.7 | 55.3 | 45.3 | 58.6 | 52.3 | 58.6 |
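The table does not say how the two average columns are computed. Assuming "IID Avg" is the unweighted mean of the four in-domain columns (GSM8K, MATH, AQuA, NumGLUE) and "OOD Avg" is the unweighted mean of the five out-of-domain columns, the short sketch below recomputes both for the MAmmoTH-7B row. The helper name `mean_score` is made up for this illustration and is not part of the project.

```python
# Scores copied from the MAmmoTH 7B (Llama-2) row of the table above.
in_domain = {"GSM8K": 51.7, "MATH": 31.2, "AQuA": 42.9, "NumGLUE": 53.1}
out_of_domain = {
    "SVAMP": 66.7,
    "Mathematics": 44.8,
    "SimulEq": 42.0,
    "SAT-Math": 36.4,
    "MMLU-Math": 38.6,
}


def mean_score(scores):
    """Unweighted mean over benchmarks (assumed averaging scheme)."""
    return sum(scores.values()) / len(scores)


iid_avg = mean_score(in_domain)
ood_avg = mean_score(out_of_domain)

# One decimal place, matching the precision used in the table.
print(f"IID Avg: {iid_avg:.1f}")
print(f"OOD Avg: {ood_avg:.1f}")
```

For this row the recomputed values agree with the reported 44.7 and 45.7 after rounding to one decimal place.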
49
+
50
+
51
+
52
+ ## Usage
53
+ You can use the models through Hugging Face's Transformers library. Use the `pipeline` function to create a text-generation pipeline with the model of your choice, then feed in a math problem to get the solution.
54
+ Check our GitHub repo for more advanced use: [https://github.com/TIGER-AI-Lab/MAmmoTH](https://github.com/TIGER-AI-Lab/MAmmoTH)
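As a minimal sketch of the pipeline usage described above: the model ID comes from the table in the Introduction, while the Alpaca-style prompt template, the helper names `build_prompt` and `solve`, and the generation settings are assumptions made for illustration. Check the GitHub repo for the prompt format the authors actually recommend.

```python
# Assumed Alpaca-style instruction template; verify against the project repo.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(problem):
    """Wrap a math problem in the (assumed) instruction template."""
    return PROMPT_TEMPLATE.format(instruction=problem.strip())


def solve(problem, model_id="TIGER-Lab/MAmmoTH-7B", max_new_tokens=256):
    """Generate a solution with a Transformers text-generation pipeline.

    Downloads the model weights on first use, so transformers is imported
    lazily and nothing is loaded until this function is called.
    """
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(
        build_prompt(problem),
        max_new_tokens=max_new_tokens,  # illustrative limit, not a tuned value
        do_sample=False,                # greedy decoding for repeatable output
        return_full_text=False,         # return only the newly generated text
    )
    return outputs[0]["generated_text"]


# Example call (requires the model weights and enough memory to load them):
# print(solve("Janet has 3 apples and buys 5 more. How many does she have?"))
```

Any other model ID from the Introduction table can be passed as `model_id`; the larger checkpoints need correspondingly more memory.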
55
+
56
+ ## Intended Uses
57
+ These models are trained for research purposes. They are designed to solve general math problems. They can be used in educational software, tutoring systems, or any application where a solution to a math problem is needed. The models can generate both a chain of thought (CoT) rationale and a program of thought (PoT) rationale, providing a comprehensive solution to a given math problem.
58
+
59
+ ## Limitations
60
+ We've tried our best to build math generalist models. However, we acknowledge that the models' performance may vary with the complexity and specifics of the math problem, and that not all mathematical fields are covered comprehensively.
61
+
62
+
63
+ ## Citation
64
+ If you use the models, data, or code from this project, please cite the original paper:
65
+
66
+ ```
67
+ @article{yue2023mammoth,
68
+ title={MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning},
69
+ author={Xiang Yue and Xingwei Qu and Ge Zhang and Yao Fu and Wenhao Huang and Huan Sun and Yu Su and Wenhu Chen},
70
+ journal={arXiv preprint arXiv:2309.05653},
71
+ year={2023}
72
+ }
73
+ ```