Zhenru committed on
Commit ac9471c
1 Parent(s): c3226a8

Create README.md

Files changed (1): README.md (+106, -0)
---
license: apache-2.0
language:
- en
pipeline_tag: text-classification
---

# Qwen2-Math-RM-72B

## Introduction

Qwen2-Math-RM-72B is specifically designed to guide the Qwen2-Math model throughout training by offering more granular feedback on the quality of reasoning and intermediate steps, ultimately facilitating more robust model improvements.

Key Highlights:

- Model Training Guide:
  - Training Data Enhancement: employs a data selection process via reward-model scoring combined with rejection sampling to incrementally enhance the quality of responses.
  - Reinforcement Learning Training: integrates seamlessly into reinforcement learning training and provides an effective reward signal, further improving model performance.

- Inference Boosting:
  - Best of N: by sampling multiple responses and applying a Best-of-N strategy, we choose the top-scoring response as judged by the reward model, trading extra inference time for better results (see the sketch after this list). For example, Qwen2-Math-1.5B-Instruct reaches 79.9 on MATH in the RM@8 setting, even surpassing the 75.1 that Qwen2-Math-7B-Instruct obtains with greedy decoding.
  - Comparison with majority voting (Maj@N): RM@N scores are substantially better than Maj@N scores across almost all benchmarks and models.
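
To make RM@N concrete, here is a minimal, self-contained sketch of Best-of-N selection versus majority voting. The `reward_fn` and `extract_answer` callables are hypothetical stand-ins, not part of this repository: in practice `reward_fn` would run the reward model on the full conversation (see the Quick Start below).

```python
from collections import Counter
from typing import Callable, List

def best_of_n(candidates: List[str], reward_fn: Callable[[str], float]) -> str:
    """RM@N: keep the response with the highest reward-model score."""
    return max(candidates, key=reward_fn)

def majority_vote(candidates: List[str], extract_answer: Callable[[str], str]) -> str:
    """Maj@N: keep a response whose final answer is the most common one."""
    answers = [extract_answer(c) for c in candidates]
    top_answer, _ = Counter(answers).most_common(1)[0]
    return next(c for c in candidates if extract_answer(c) == top_answer)

# Toy usage with dummy helpers (hypothetical, for illustration only).
samples = [
    "The sum is \\boxed{4}",
    "So the answer is \\boxed{5}",
    "Hence we get \\boxed{4}",
]
dummy_reward = lambda s: float(len(s))                        # stand-in scorer
dummy_answer = lambda s: s.split("\\boxed{")[-1].rstrip("}")  # stand-in parser
print(best_of_n(samples, dummy_reward))      # response with the top score
print(majority_vote(samples, dummy_answer))  # response with the modal answer
```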

## Model Details

For more details, please refer to our [blog post](https://qwenlm.github.io/blog/qwen2-math/) and [GitHub repo](https://github.com/QwenLM/Qwen2-Math).

## Requirements

* `transformers>=4.40.0` for Qwen2-Math models; the latest version is recommended.

> [!Warning]
> <div align="center">
> <b>
> 🚨 This is required because `transformers` has integrated Qwen2 code since `4.37.0`.
> </b>
> </div>

For requirements on GPU memory and the corresponding throughput, see the similar results for Qwen2 [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html).

## Quick Start

> [!Important]
>
> **Qwen2-Math-RM-72B** is a reward model: it is typically used to provide feedback on the quality of reasoning and intermediate steps, serving in rejection sampling, reinforcement learning training, and RM@N evaluation.

### 🤗 Hugging Face Transformers

Here is a code snippet showing how to use Qwen2-Math-RM-72B with `transformers`:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "Qwen/Qwen2-Math-RM-72B"
device = "auto"  # the device to load the model onto

# The reward model ships custom modeling code, so `trust_remote_code` is needed.
model = AutoModel.from_pretrained(
    model_name,
    device_map=device,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# A math question (user turn) paired with a candidate solution (assistant
# turn); the reward model scores the quality of the assistant's reasoning.
chat = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "Suppose that a certain software product has a mean time between failures of 10,000 hours and has a mean time to repair of 20 hours. If the product is used by 100 customers, what is its availability?\nAnswer Choices: (A) 80% (B) 90% (C) 98% (D) 99.80%\nPlease reason step by step, and put your final answer within \\boxed{}."},
    {"role": "assistant", "content": "To find the availability of the software product, we'll use the formula:\n\n\\[ \\text{ availability} = \\frac{\\text{Mean Time Between Failures (MTBF)}}{\\text{Mean Time Between Failures (MTBF) + Mean Time To Repair (MTTR)}} \\]\n\nGiven:\n- MTBF = 10,000 hours\n- MTTR = 20 hours\n\nLet's plug these values into the formula:\n\n\\[ \\text{availability} = \\frac{10,000}{10,000 + 20} = \\frac{10,000}{10,020} \\]\n\nTo simplify this fraction, we can divide both the numerator and the denominator by 10,000:\n\n\\[ \\text{availability} = \\frac{10,000 \\div 10,000}{10,020 \\div 10,000} = \\frac{1}{1.002} \\]\n\nTo express this as a percentage, we can calculate the decimal value of the fraction and then multiply by 100:\n\n\\[ \\text{availability} \\approx 0.998002 \\times 100 = 99.80\\% \\]\n\nTherefore, the availability of the software product is approximately 99.80%.\n\nThe correct answer is \\boxed{D}"}
]  # expected reward score for this conversation: ~1.7656

# Render the conversation with the chat template; no generation prompt is
# added because we are scoring a finished response, not generating a new one.
conversation_str = tokenizer.apply_chat_template(
    chat,
    tokenize=False,
    add_generation_prompt=False
)

input_ids = tokenizer.encode(
    conversation_str,
    return_tensors="pt",
    add_special_tokens=False
).to(model.device)

# A single forward pass yields the reward for the whole conversation.
outputs = model(input_ids=input_ids)
print(outputs[0])
```
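
The printed tensor holds the reward score for the conversation; for the example above it should be close to the 1.7656 noted in the comment. Assuming the custom head returns a single scalar per sequence (an assumption, not a documented guarantee), it can be read out as a plain float:

```python
score = outputs[0].item()  # assumes a one-element tensor; raises otherwise
print(f"reward: {score:.4f}")
```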

### 🤖 ModelScope

We strongly advise users, especially those in mainland China, to use ModelScope; its `snapshot_download` helper can resolve issues with downloading checkpoints, as in the sketch below.
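
A minimal download sketch, assuming the `modelscope` Python package is installed (`pip install modelscope`) and that the mirror hosts the same repository id:

```python
from modelscope import snapshot_download

# Fetch the checkpoint to a local directory, then point
# `from_pretrained` at this path instead of the hub id.
model_dir = snapshot_download("Qwen/Qwen2-Math-RM-72B")
print(model_dir)
```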

## Citation

If you find our work helpful, feel free to give us a citation.

```
@article{yang2024qwen2,
  title={Qwen2 technical report},
  author={Yang, An and Yang, Baosong and Hui, Binyuan and Zheng, Bo and Yu, Bowen and Zhou, Chang and Li, Chengpeng and Li, Chengyuan and Liu, Dayiheng and Huang, Fei and others},
  journal={arXiv preprint arXiv:2407.10671},
  year={2024}
}
```