ReasonEval-7B Model Card
Model Description
ReasonEval-7B is a 7B-parameter decoder-only language model fine-tuned from WizardMath-7B-V1.1. Given a mathematical problem and a solution, ReasonEval-7B assesses the problem-solving process in a step-by-step format from the following perspectives:
- Validity: The step contains no mistakes in calculation and logic.
- Redundancy: The step lacks utility in solving the problem but is still valid.
With ReasonEval, you can:
- quantify the quality of reasoning steps without relying on human annotators or closed-source models.
- find potentially invalid or redundant steps in a solution, even one that reaches the correct final answer (see the sketch below).
- select high-quality training data for downstream tasks (e.g., fine-tuning).
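For instance, once per-step scores are available, flagging suspect steps is a matter of thresholding. A minimal sketch in Python, assuming each step receives a validity score and a redundancy score in [0, 1]; the field names and threshold values below are illustrative assumptions, not the repository's API:

```python
# Hypothetical per-step scores produced by ReasonEval-7B.
steps = [
    {"text": "Let x = 3.",         "validity": 0.98, "redundancy": 0.10},
    {"text": "Then 2x = 6.",       "validity": 0.97, "redundancy": 0.05},
    {"text": "Restating: 2x = 6.", "validity": 0.95, "redundancy": 0.85},
    {"text": "So x + 2x = 8.",     "validity": 0.20, "redundancy": 0.10},
]

# Illustrative thresholds; tune them for your data.
VALIDITY_MIN = 0.5
REDUNDANCY_MAX = 0.5

for i, step in enumerate(steps, start=1):
    if step["validity"] < VALIDITY_MIN:
        print(f"Step {i} looks invalid: {step['text']}")
    elif step["redundancy"] > REDUNDANCY_MAX:
        print(f"Step {i} looks redundant: {step['text']}")
```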
Model Details
- Model type: ReasonEval-7B's architecture is identical to WizardMath-7B-V1.1, except that the head for next-token prediction is replaced with a classification head that outputs the probability of each class of reasoning steps.
- Language(s): English
- Paper: Evaluating Mathematical Reasoning Beyond Accuracy
- GitHub: https://github.com/GAIR-NLP/ReasonEval
- Finetuned from model: https://huggingface.co/WizardLM/WizardMath-7B-V1.1
- Fine-tuning Data: PRM800K
For detailed instructions on how to use the ReasonEval-7B model, visit our GitHub repository at https://github.com/GAIR-NLP/ReasonEval.
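As a rough illustration of the evaluation workflow, here is a minimal sketch using Hugging Face Transformers. The exact prompt template, model classes, and output post-processing are defined in the GitHub repository; the Hub ID, input layout, and output handling below are assumptions, not the repository's documented usage.

```python
# Minimal sketch, assuming the checkpoint ships its scoring head via
# trust_remote_code. Consult https://github.com/GAIR-NLP/ReasonEval for
# the actual prompt template and post-processing.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "GAIR/ReasonEval-7B"  # assumed Hub ID; check the repo
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

question = "If 2x = 6, what is x + 2x?"
steps = ["Step 1: x = 3.", "Step 2: x + 2x = 3x = 9."]

# Hypothetical input layout: the problem followed by the numbered steps.
text = question + "\n" + "\n".join(steps)
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# The classification head yields per-step class probabilities, from which
# the validity and redundancy scores are derived (see the paper and repo).
print(outputs)
```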
How to Cite
@article{xia2024evaluating,
title={Evaluating Mathematical Reasoning Beyond Accuracy},
author={Xia, Shijie and Li, Xuefeng and Liu, Yixin and Wu, Tongshuang and Liu, Pengfei},
journal={arXiv preprint arXiv:2404.05692},
year={2024},
}