---
license: apache-2.0
language:
- en
pipeline_tag: text-classification
---

# Qwen2-Math-RM-72B

## Introduction

Qwen2-Math-RM-72B is specifically designed to guide the Qwen2-Math model throughout the training process by offering more granular feedback on the quality of reasoning and intermediate steps, ultimately facilitating more robust model improvements.

Key Highlights:

- Model Training Guide:
  - Training Data Enhancement: Employs a data-selection process that combines reward-model scoring with Rejection Sampling to incrementally improve the quality of responses.
  - Reinforcement Learning Training: Integrates seamlessly into reinforcement learning training and provides an effective reward signal, further improving model performance.

- Inference Boosting:
  - Best of N: By combining response sampling with a Best-of-N strategy, we select the response with the top score judged by the reward model, yielding better results at the cost of more inference time (a sketch follows this list). For example, Qwen2-Math-1.5B-Instruct obtains 79.9 on MATH in the RM@8 setting, even surpassing the 75.1 that Qwen2-Math-7B-Instruct achieves with greedy decoding.
  - Comparison with majority voting (Maj@N): RM@N scores are substantially better than Maj@N scores across almost all benchmarks and models.
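
To make the Maj@N / RM@N distinction concrete, here is a minimal, purely illustrative sketch; `score_fn` is a hypothetical stand-in for a real reward-model scoring call (see the `transformers` snippet in Quick Start below), not part of the released code.

```python
from collections import Counter
from typing import Callable, List

def maj_at_n(final_answers: List[str]) -> str:
    """Maj@N: majority voting over the final answers of N samples."""
    return Counter(final_answers).most_common(1)[0][0]

def rm_at_n(responses: List[str], score_fn: Callable[[str], float]) -> str:
    """RM@N: keep the full response that the reward model scores highest."""
    return max(responses, key=score_fn)

# Toy usage with a dummy scorer; replace `len` with a real reward-model call.
samples = ["... so the answer is 4", "... so the answer is 5", "... so the answer is 4"]
print(maj_at_n(samples))               # most frequent final answer
print(rm_at_n(samples, score_fn=len))  # highest-scoring full response
```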

## Model Details

For more details, please refer to our [blog post](https://qwenlm.github.io/blog/qwen2-math/) and [GitHub repo](https://github.com/QwenLM/Qwen2-Math).

## Requirements

* `transformers>=4.40.0` for Qwen2-Math models. The latest version is recommended.

> [!Warning]
> <div align="center">
> <b>
> 🚨 This is a must because `transformers` has integrated Qwen2 code since `4.37.0`.
> </b>
> </div>

For requirements on GPU memory and the corresponding throughput, see the similar results for Qwen2 [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html).

## Quick Start

> [!Important]
>
> **Qwen2-Math-RM-72B** is a reward model: it provides feedback on the quality of reasoning and intermediate steps, and is typically used for Rejection Sampling, reinforcement learning training, and RM@N evaluation.

### 🤗 Hugging Face Transformers

Here is a code snippet showing how to use Qwen2-Math-RM-72B with `transformers`:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "Qwen/Qwen2-Math-RM-72B"
device = "auto"  # the device to load the model onto

model = AutoModel.from_pretrained(
    model_name,
    device_map=device,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# A (question, solution) pair to score: the user turn carries the problem,
# the assistant turn carries the candidate solution to be judged.
chat = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "Suppose that a certain software product has a mean time between failures of 10,000 hours and has a mean time to repair of 20 hours. If the product is used by 100 customers, what is its availability?\nAnswer Choices: (A) 80% (B) 90% (C) 98% (D) 99.80%\nPlease reason step by step, and put your final answer within \\boxed{}."},
    {"role": "assistant", "content": "To find the availability of the software product, we'll use the formula:\n\n\\[ \\text{availability} = \\frac{\\text{Mean Time Between Failures (MTBF)}}{\\text{Mean Time Between Failures (MTBF) + Mean Time To Repair (MTTR)}} \\]\n\nGiven:\n- MTBF = 10,000 hours\n- MTTR = 20 hours\n\nLet's plug these values into the formula:\n\n\\[ \\text{availability} = \\frac{10,000}{10,000 + 20} = \\frac{10,000}{10,020} \\]\n\nTo simplify this fraction, we can divide both the numerator and the denominator by 10,000:\n\n\\[ \\text{availability} = \\frac{10,000 \\div 10,000}{10,020 \\div 10,000} = \\frac{1}{1.002} \\]\n\nTo express this as a percentage, we can calculate the decimal value of the fraction and then multiply by 100:\n\n\\[ \\text{availability} \\approx 0.998002 \\times 100 = 99.80\\% \\]\n\nTherefore, the availability of the software product is approximately 99.80%.\n\nThe correct answer is \\boxed{D}"}
]  # expected reward score for this pair: 1.7656

conversation_str = tokenizer.apply_chat_template(
    chat,
    tokenize=False,
    add_generation_prompt=False
)

input_ids = tokenizer.encode(
    conversation_str,
    return_tensors="pt",
    add_special_tokens=False
).to(model.device)

# The forward pass returns a scalar reward for the whole conversation.
outputs = model(input_ids=input_ids)
print(outputs[0])
```
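
To use the reward model for Best-of-N selection, the snippet above extends naturally: score each sampled response to the same question and keep the highest-scoring one. The sketch below is a minimal illustration that assumes the `model`, `tokenizer`, and `torch` objects from the previous snippet are already in scope; `score` is a hypothetical helper introduced here for illustration, not an API of the model card.

```python
# Minimal RM@N sketch (assumes `model`, `tokenizer`, and `torch` from above,
# and that the first model output is a single-element reward tensor).
def score(question: str, response: str) -> float:
    chat = [
        {"role": "system", "content": ""},
        {"role": "user", "content": question},
        {"role": "assistant", "content": response},
    ]
    conversation_str = tokenizer.apply_chat_template(
        chat, tokenize=False, add_generation_prompt=False
    )
    input_ids = tokenizer.encode(
        conversation_str, return_tensors="pt", add_special_tokens=False
    ).to(model.device)
    with torch.no_grad():
        return model(input_ids=input_ids)[0].item()

# `responses` would be N samples from a policy model for the same question.
responses = ["candidate solution 1", "candidate solution 2"]  # placeholders
best = max(responses, key=lambda r: score("your question here", r))
```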

### 🤖 ModelScope

We strongly advise users, especially those in mainland China, to use ModelScope. `snapshot_download` can help you resolve issues with downloading checkpoints.
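
A minimal sketch of fetching the checkpoint with ModelScope's `snapshot_download` follows; the model ID is assumed to mirror the Hugging Face one, so verify it on ModelScope before use.

```python
from modelscope import snapshot_download

# Downloads the checkpoint and returns the local directory path; pass that
# path to AutoModel.from_pretrained / AutoTokenizer.from_pretrained.
model_dir = snapshot_download("Qwen/Qwen2-Math-RM-72B")
print(model_dir)
```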

## Citation

If you find our work helpful, feel free to give us a citation.

```
@article{yang2024qwen2,
  title={Qwen2 technical report},
  author={Yang, An and Yang, Baosong and Hui, Binyuan and Zheng, Bo and Yu, Bowen and Zhou, Chang and Li, Chengpeng and Li, Chengyuan and Liu, Dayiheng and Huang, Fei and others},
  journal={arXiv preprint arXiv:2407.10671},
  year={2024}
}
```