Update README.md
README.md
@@ -48,7 +48,7 @@ For requirements on GPU memory and the respective throughput, see similar result
 > **Qwen2.5-Math-PRM-72B** is a process reward model typically used for offering feedback on the quality of reasoning and intermediate steps rather than generation.

 ### Prerequisites
-- Step Separation: We recommend using double line breaks ("\n\n") to separate individual steps within the solution.
+- Step Separation: We recommend using double line breaks ("\n\n") to separate individual steps within the solution if using responses from Qwen2.5-Math-Instruct.
 - Reward Computation: After each step, we insert a special token "`<extra_0>`". For reward calculation, we extract the probability score of this token being classified as positive, resulting in a reward value between 0 and 1.

 ### 🤗 Hugging Face Transformers
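
The sketch below illustrates the two prerequisites changed/shown in this hunk: splitting a solution into steps on "\n\n" and reading a per-step reward off each "`<extra_0>`" position. It is not the README's own example (the `### 🤗 Hugging Face Transformers` section is only shown as context here); it assumes the PRM loads via `AutoModel` with `trust_remote_code=True`, that the tokenizer ships a chat template, and that the model returns per-token logits over two classes with the positive-class probability serving as the step reward.

```python
# Hypothetical sketch of the reward computation described above; the loading
# details (AutoModel + trust_remote_code, two-class per-token logits) are
# assumptions, not taken from this diff.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "Qwen/Qwen2.5-Math-PRM-72B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()

question = "What is 2 + 3 * 4?"
# Steps are separated by "\n\n" (per the Prerequisites); "<extra_0>" is then
# appended after each step so a reward can be read off at that position.
raw_response = "First compute 3 * 4 = 12.\n\nThen 2 + 12 = 14. The answer is 14."
steps = raw_response.split("\n\n")
response = "<extra_0>".join(steps) + "<extra_0>"

messages = [
    {"role": "user", "content": question},
    {"role": "assistant", "content": response},
]
text = tokenizer.apply_chat_template(messages, tokenize=False)
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)

with torch.no_grad():
    outputs = model(input_ids=input_ids)
logits = outputs[0]  # assumed shape: (batch, seq_len, 2)

# Reward per step = probability that each "<extra_0>" token is classified positive.
sep_id = tokenizer.encode("<extra_0>", add_special_tokens=False)[0]
mask = input_ids == sep_id
step_rewards = F.softmax(logits, dim=-1)[..., 1][mask]  # values in [0, 1]
print(step_rewards.tolist())  # one score per reasoning step
```

Because the positive-class probability is read independently at every "`<extra_0>`" position, each step gets its own score in [0, 1], which is what makes the model usable for process-level feedback rather than a single outcome reward.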