Update README.md
README.md
@@ -1,3 +1,79 @@
---
license: mit
---

A critic model trained on [UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback). Please refer to the [paper](https://arxiv.org/abs/2310.01377) and the [GitHub repository](https://github.com/thunlp/UltraFeedback) for more details.

# Use Case
```python
import torch
from transformers import pipeline, LlamaTokenizer, LlamaForCausalLM

# Load the critic model and wrap it in a text-generation pipeline.
tokenizer = LlamaTokenizer.from_pretrained("openbmb/UltraCM-13b")
model = LlamaForCausalLM.from_pretrained("openbmb/UltraCM-13b", device_map="auto")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Prompt template; {instruction} and {completion} are filled in per example.
ultracm_instruction_template = """Given my answer to an instruction, your role is to provide specific and constructive feedback for me. You should find the best way for me to learn from your feedback and improve my performance.

You should consider multiple aspects of my answer, including helpfulness, truthfulness, honesty, and to what extent the answer follows instructions.
---

### Instruction
{instruction}

### Answer
{completion}
---

Please act as a teacher and provide specific and constructive feedback. Besides describing the weaknesses of the answer, you should also provide specific suggestions to guide me toward understanding how to improve. Please note, however, that your suggestions should help me better complete the instructions, but you should not introduce new requirements that are not mentioned in the instructions. Your feedback should focus on enhancing my ability to think critically and respond accurately. However, never explicitly provide the reference answer, nor do polite phrases be required. Only respond with concise feedback in chat style. Finally, score the overall quality of the answer from 1 to 10, where 1 is the worst and 10 is the best.

*Format*
### Feedback
Overall Score: [1-10]
[Your feedback]

---

### Feedback
Overall Score:
"""

# Two example (instruction, completion) pairs to critique.
dataset = [
    {"instruction":"What if the Beatles had never formed as a band?",
     "completion":"The Beatles are widely credited with starting the British Invasion, a wave of rock and roll bands from the United Kingdom that became popular in America in the 1960s. If the Beatles had never formed, this musical movement may have never happened, and the world may have been exposed to a different kind of music. So, without the Beatles\u2019 fame and success, other bands wouldn\u2019t have been able to break into the American music scene and create a wider range of sounds. We could have ended up with a less interesting mix of songs playing on the radio."},
    {"source":"fair_eval","instruction":"Given that f(x) = 5x^3 - 2x + 3, find the value of f(2).","completion":"The problem is asking us to find the value of the function f(x) = 5x^3 - 2x + 3 at the point x = 2. \n\nStep 1: Substitute x with 2 in the function\nWe replace every x in the function with 2. This gives us:\nf(2) = 5(2)^3 - 2(2) + 3\n\nStep 2: Simplify the expression\nNext, we simplify the expression by performing the operations in order from left to right.\n\nFirst, calculate the cube of 2, which is 8. Substitute this back into the expression:\nf(2) = 5(8) - 4 + 3\n\nThen, multiply 5 by 8 which gives us 40:\nf(2) = 40 - 4 + 3\n\nFinally, subtract 4 from 40 which gives us 36, and then add 3 to that:\nf(2) = 36 + 3\n\nStep 3: Final calculation\nNow, add 36 and 3 together:\nf(2) = 39\n\nSo, the value of the function f(x) = 5x^3 - 2x + 3 at the point x = 2 is 39."}
]


def generate_feedback(example):
    # The prompt layout (including the "User: " prefix on the system prompt
    # and the "</s>" separators) is kept exactly as in the original example.
    system_prompt = "User: A one-turn chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, very detailed, and polite answers to the user's questions.</s>"
    conv = [system_prompt]
    conv.append("User: " + ultracm_instruction_template.format(
        instruction=example["instruction"],
        completion=example["completion"],
    ) + "</s>")
    conv.append("Assistant: ")
    prompt = "\n".join(conv)

    # Sample one critique; return_full_text=False drops the prompt from the output.
    with torch.no_grad():
        response = generator(prompt, num_return_sequences=1, return_full_text=False, handle_long_generation="hole", temperature=1.0, top_p=1.0, max_new_tokens=1024, repetition_penalty=1.2, do_sample=True)
    response = response[0]["generated_text"].strip()

    print(response)
    return response


for example in dataset:
    generate_feedback(example)
```
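Because the prompt ends with `Overall Score:`, the generated text is expected to begin with the numeric score, followed by the written feedback. The sketch below shows one way to split such a response into a score and a feedback string. `parse_feedback` is a hypothetical helper and not part of the UltraCM release, and the sample response is made up for illustration. Real outputs may not follow this layout, in which case the helper returns `None` for the score.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper (not part of the UltraCM release).
# Assumes the response starts with the score, since the prompt ends with
# "Overall Score:".
_SCORE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)")


def parse_feedback(response: str) -> Tuple[Optional[float], str]:
    """Split a critique into (score, feedback); score is None if not parseable."""
    match = _SCORE_RE.match(response)
    if match is None:
        return None, response.strip()
    score = float(match.group(1))
    # The template asks for a score from 1 to 10; treat anything else as invalid.
    if not 1 <= score <= 10:
        return None, response.strip()
    feedback = response[match.end():].strip()
    return score, feedback


# Made-up response text, used only to demonstrate the helper.
sample = "7\nYour answer is clear, but it overstates the Beatles' role."
score, feedback = parse_feedback(sample)
print(score)
print(feedback)
```

Returning `None` rather than raising keeps a batch run going when a single critique is malformed; the caller can filter or re-sample those cases.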

# Citation
```
@misc{cui2023ultrafeedback,
      title={UltraFeedback: Boosting Language Models with High-quality Feedback},
      author={Ganqu Cui and Lifan Yuan and Ning Ding and Guanming Yao and Wei Zhu and Yuan Ni and Guotong Xie and Zhiyuan Liu and Maosong Sun},
      year={2023},
      eprint={2310.01377},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```