Update README.md
README.md
@@ -24,6 +24,10 @@ In addition to the mathematical Outcome Reward Model (ORM) Qwen2.5-Math-RM-72B,
 ![](http://qianwen-res.oss-accelerate-overseas.aliyuncs.com/Qwen2.5/Qwen2.5-Math-PRM/Qwen2.5-Math-PRM.png)
 
 
+## Model Details
+
+For more details, please refer to our [paper](https://arxiv.org/pdf/2501.07301).
+
 
 ## Requirements
 * `transformers>=4.40.0` for Qwen2.5-Math models. The latest version is recommended.
@@ -122,10 +126,12 @@ print(step_reward) # [[0.9921875, 0.0047607421875, 0.32421875, 0.8203125]]
 If you find our work helpful, feel free to give us a citation.
 
 ```
-@article{
-title={
-author={
-
-
+@article{prmlessons,
+title={The Lessons of Developing Process Reward Models in Mathematical Reasoning},
+author={
+Zhenru Zhang and Chujie Zheng and Yangzhen Wu and Beichen Zhang and Runji Lin and Bowen Yu and Dayiheng Liu and Jingren Zhou and Junyang Lin
+},
+journal={arXiv preprint arXiv:2501.07301},
+year={2025}
 }
 ```