tongyx361 committed
Commit 2ed695f
Parent: b742195

Add news of comparability to NuminaMath-7B

Files changed (1):
  1. README.md +3 -0
README.md CHANGED
@@ -90,6 +90,9 @@ model-index:
 
 🐦 [Thread@X(Twitter)](https://x.com/tongyx361/status/1811413243350454455) | 🐶 [Chinese Blog@Zhihu](https://zhuanlan.zhihu.com/p/708371895) | 📊 [Leaderboard@PapersWithCode](https://paperswithcode.com/paper/dart-math-difficulty-aware-rejection-tuning#results) | 📑 [BibTeX](https://github.com/hkust-nlp/dart-math?tab=readme-ov-file#citation)
 
+> [!IMPORTANT]
+> 🔥 Excited to find **[our `DART-Math-DSMath-7B` (Prop2Diff)](https://huggingface.co/hkust-nlp/dart-math-dsmath-7b-prop2diff) [comparable](https://github.com/project-numina/aimo-progress-prize/blob/main/report/numina_dataset.pdf) to the AIMO winner [NuminaMath-7B](https://huggingface.co/AI-MO/NuminaMath-7B-CoT)**, but based solely on the [MATH](https://huggingface.co/datasets/hkust-nlp/dart-math-pool-math-query-info) & [GSM8K](https://huggingface.co/datasets/hkust-nlp/dart-math-pool-gsm8k-query-info) prompt sets! Join the discussion under this [X thread](https://x.com/tongyx361/status/1815112376649134172)!
+
 ## Models: `DART-Math`
 
 `DART-Math` models achieve performance **superior or competitive to previous SOTAs** on 2 in-domain and 4 challenging out-of-domain mathematical reasoning benchmarks, despite using **much smaller datasets** and **no proprietary model like GPT-4**.