Zhenru committed on
Commit c1f1949 (verified) · 1 Parent(s): 0062179

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -48,7 +48,7 @@ For requirements on GPU memory and the respective throughput, see similar result
  > **Qwen2.5-Math-PRM-72B** is a process reward model typically used for offering feedback on the quality of reasoning and intermediate steps rather than generation.
 
  ### Prerequisites
- - Step Separation: We recommend using double line breaks ("\n\n") to separate individual steps within the solution.
+ - Step Separation: We recommend using double line breaks ("\n\n") to separate individual steps within the solution if using responses from Qwen2.5-Math-Instruct.
  - Reward Computation: After each step, we insert a special token "`<extra_0>`". For reward calculation, we extract the probability score of this token being classified as positive, resulting in a reward value between 0 and 1.
 
  ### 🤗 Hugging Face Transformers
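
The two Prerequisites bullets in the hunk above can be sketched in plain Python. This is only an illustration and is not part of the commit: the helper names `split_steps`, `mark_steps`, and `positive_probability` are hypothetical, and the sketch assumes the model emits two-class logits at each `<extra_0>` position with the positive class at index 1. No model is loaded here.

```python
import math

STEP_SEPARATOR = "\n\n"     # double line break recommended in the README
STEP_TOKEN = "<extra_0>"    # special token inserted after each step


def split_steps(solution: str) -> list[str]:
    """Split a solution into steps on double line breaks, dropping empty pieces."""
    return [part.strip() for part in solution.split(STEP_SEPARATOR) if part.strip()]


def mark_steps(steps: list[str]) -> str:
    """Append the step token after every step so each step gets its own reward slot."""
    return "".join(step + STEP_TOKEN for step in steps)


def positive_probability(logits: tuple[float, float]) -> float:
    """Two-class softmax; returns the probability of the positive class.

    Assumption: logits are ordered (negative, positive).
    """
    negative, positive = logits
    shift = max(negative, positive)  # subtract the max for numerical stability
    exp_negative = math.exp(negative - shift)
    exp_positive = math.exp(positive - shift)
    return exp_positive / (exp_negative + exp_positive)


solution = "Add 2 and 3 to get 5.\n\nMultiply 5 by 4 to get 20."
steps = split_steps(solution)
marked = mark_steps(steps)

# Made-up logits, one pair per step token, purely for illustration.
example_logits = [(-2.0, 3.0), (0.0, 0.0)]
rewards = [positive_probability(pair) for pair in example_logits]
```

Each entry of `rewards` lies strictly between 0 and 1, matching the reward range stated in the README; with a real model, the logits would be read at the positions of the `<extra_0>` tokens in `marked`.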