MinghaoYang committed on
Commit
2ab0c2d
1 Parent(s): 1a081a4

Update README.md

Files changed (1): README.md (+154 −154)
---
library_name: transformers
base_model: meta-llama/Llama-3.1-70B-Instruct
datasets:
- infly/INF-ORM-Preference-Magnitude-80K
pipeline_tag: text-classification
---

# INF Outcome Reward Model

## Introduction

[**INF-ORM-Llama3.1-70B**](https://huggingface.co/infly/INF-ORM-Llama3.1-70B) is an outcome reward model built on the [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) architecture with a custom scoring head, trained on the dataset [INF-ORM-Preference-Magnitude-80K](https://huggingface.co/datasets/infly/INF-ORM-Preference-Magnitude-80K).

**Note: Training details are coming soon!**
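
For a quick look at the preference data mentioned above, the sketch below loads it with the `datasets` library. This is only an illustration: the `train` split name is an assumption, and the column layout is printed rather than assumed, so consult the dataset card for the authoritative schema.

```python
# A minimal sketch for inspecting INF-ORM-Preference-Magnitude-80K.
# Assumption: the dataset exposes a "train" split.
from datasets import load_dataset

ds = load_dataset("infly/INF-ORM-Preference-Magnitude-80K", split="train")
print(ds)      # number of rows and column names
print(ds[0])   # one preference example
```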

## RewardBench Leaderboard

We evaluate our model locally on [RewardBench](https://huggingface.co/spaces/allenai/reward-bench) using the [official test script](https://github.com/allenai/reward-bench). As of December 2024, INF-ORM-Llama3.1-70B ranks first on the RewardBench leaderboard.

| Rank | Model                                    | Model Type        | Score | Chat | Chat Hard | Safety | Reasoning |
| :---: | ---------------------------------------- | ----------------- | :---: | :---: | :-------: | :----: | :-------: |
| 1    | **infly/INF-ORM-Llama3.1-70B**           | Custom Classifier | 95.2  | 96.9 | 91.0      | 93.8   | 99.1      |
| 2    | Skywork/Skywork-Reward-Gemma-2-27B-v0.2  | Seq. Classifier   | 94.3  | 96.1 | 89.9      | 93.0   | 98.1      |
| 3    | nvidia/Llama-3.1-Nemotron-70B-Reward     | Custom Classifier | 94.1  | 97.5 | 85.7      | 95.1   | 98.1      |
| 4    | Skywork/Skywork-Reward-Gemma-2-27B       | Seq. Classifier   | 93.8  | 95.8 | 91.4      | 91.9   | 96.1      |
| 5    | SF-Foundation/TextEval-Llama3.1-70B      | Generative        | 93.5  | 94.1 | 90.1      | 93.2   | 96.4      |
| 6    | meta-metrics/MetaMetrics-RM-v1.0         | Custom Classifier | 93.4  | 98.3 | 86.4      | 90.8   | 98.2      |
| 7    | Skywork/Skywork-Critic-Llama-3.1-70B     | Generative        | 93.3  | 96.6 | 87.9      | 93.1   | 95.5      |
| 8    | Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 | Seq. Classifier   | 93.1  | 94.7 | 88.4      | 92.7   | 96.7      |
| 9    | nicolinho/QRM-Llama3.1-8B                | Seq. Classifier   | 93.1  | 94.4 | 89.7      | 92.3   | 95.8      |
| 10   | LxzGordon/URM-LLaMa-3.1-8B               | Seq. Classifier   | 92.9  | 95.5 | 88.2      | 91.1   | 97.0      |

## Demo Code

Below is an example of using INF-ORM-Llama3.1-70B to obtain reward scores for two responses to the same math question. The first response reasons correctly (dividing 18 oranges among 5 people), so it should receive the higher score.

```python
from typing import List, Optional, Union

import torch
import torch.nn as nn
from transformers import LlamaPreTrainedModel, LlamaModel, PreTrainedTokenizerFast
from transformers.modeling_outputs import SequenceClassifierOutputWithPast


class INFORMForSequenceClassification(LlamaPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.model = LlamaModel(config)
        # Reward head: a two-layer MLP on top of the final hidden states.
        self.score = nn.Sequential(
            nn.Linear(config.hidden_size, config.hidden_size),
            nn.ReLU(),
            nn.Linear(config.hidden_size, self.num_labels)
        )
        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        past_key_values: Optional[List[torch.FloatTensor]] = None,
        inputs_embeds: Optional[torch.FloatTensor] = None,
        labels: Optional[torch.LongTensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ):
        transformer_outputs = self.model(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            past_key_values=past_key_values,
            inputs_embeds=inputs_embeds,
        )
        hidden_states = transformer_outputs[0]
        logits = self.score(hidden_states)

        if input_ids is not None:
            batch_size = input_ids.shape[0]
        else:
            batch_size = inputs_embeds.shape[0]

        if self.config.pad_token_id is None and batch_size != 1:
            raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
        if self.config.pad_token_id is None:
            sequence_lengths = -1
        else:
            if input_ids is not None:
                # if no pad token found, use modulo instead of reverse indexing for ONNX compatibility
                sequence_lengths = torch.eq(input_ids, self.config.pad_token_id).int().argmax(-1) - 1
                sequence_lengths = sequence_lengths % input_ids.shape[-1]
                sequence_lengths = sequence_lengths.to(logits.device)
            else:
                sequence_lengths = -1

        # The reward is the score at the last non-padding token of each sequence.
        pooled_logits = logits[torch.arange(batch_size, device=logits.device), sequence_lengths]

        loss = None
        return SequenceClassifierOutputWithPast(
            loss=loss,
            logits=pooled_logits,
            past_key_values=transformer_outputs.past_key_values,
            hidden_states=transformer_outputs.hidden_states,
            attentions=transformer_outputs.attentions,
        )


# Load model
model_name = "infly/INF-ORM-Llama3.1-70B"
orm = INFORMForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",
    num_labels=1,
)

# Load tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained(model_name)

conv1 = [ { "content": "Tom has 20 oranges. He gave 5 oranges to his friend Lisa, then he bought 3 more oranges. Finally, he divided all the oranges equally among himself and his 4 friends. How many oranges does each person get?", "role": "user" }, { "content": "Tom has 20 oranges. He gave 5 oranges to Lisa. 20 - 5 = 15. Tom now has 15 oranges. Tom bought 3 more oranges. 15 + 3 = 18. Tom now has 18 oranges. Tom divides the 18 oranges equally among himself and his 4 friends (a total of 5 people). 18 ÷ 5 = 3.6 oranges. Each person gets 3.6 oranges.", "role": "assistant" } ]
conv2 = [ { "content": "Tom has 20 oranges. He gave 5 oranges to his friend Lisa, then he bought 3 more oranges. Finally, he divided all the oranges equally among himself and his 4 friends. How many oranges does each person get?", "role": "user" }, { "content": "Tom has 20 oranges. He gave 5 oranges to his friend Lisa. 20 - 5 = 15. Tom now has 15 oranges. Tom bought 3 more oranges. 15 + 3 = 18. Tom now has 18 oranges. Tom divides the 18 oranges equally among his 4 friends (a total of 4 people). 18 ÷ 4 = 4.5 oranges. Each person gets 4.5 oranges.", "role": "assistant" } ]

conv1_tokenized = tokenizer.apply_chat_template(conv1, tokenize=True, return_tensors="pt").to("cuda")
conv2_tokenized = tokenizer.apply_chat_template(conv2, tokenize=True, return_tensors="pt").to("cuda")

# Inference
with torch.no_grad():
    score1 = orm(conv1_tokenized).logits[0][0].item()
    score2 = orm(conv2_tokenized).logits[0][0].item()
    print(f"Score for response 1: {score1}")
    print(f"Score for response 2: {score2}")

# Output:
# Score for response 1: 4.96875
# Score for response 2: 2.890625
```
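
To score several conversations in one forward pass, the following is a minimal sketch of batched inference, reusing `orm` and `tokenizer` from above. It assumes the checkpoint defines a dedicated pad token (`config.pad_token_id`) that does not occur inside rendered conversations, and that padding is applied on the right, since the model's `forward` locates the last non-padding position via the pad token; verify batched scores against the single-sequence path before relying on them.

```python
# A minimal sketch of batched scoring (assumptions noted in the text above).
convs = [conv1, conv2]

# Render each conversation with the chat template, then tokenize as one right-padded batch.
# add_special_tokens=False because the chat template already inserts the special tokens.
texts = [tokenizer.apply_chat_template(c, tokenize=False) for c in convs]
batch = tokenizer(texts, padding=True, add_special_tokens=False, return_tensors="pt").to("cuda")

with torch.no_grad():
    scores = orm(batch["input_ids"], attention_mask=batch["attention_mask"]).logits.squeeze(-1)

for i, s in enumerate(scores.tolist(), start=1):
    print(f"Score for response {i}: {s}")
```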

## Declaration and License Agreement

### Declaration

### License Agreement

## Contact

If you have any questions, please feel free to reach us at <23210720070@m.fudan.edu.cn>.

## Citation