---
license: apache-2.0
datasets:
- argilla/dolly-curated-comparison-falcon-7b-instruct
language:
- en
metrics:
- accuracy
library_name: transformers
tags:
- rlhf
- reward-model
---

# roberta-base-reward-model-falcon-dolly: An experimental reward model built with Dolly curated and Falcon 

This is an experimental Reward Model trained with TRL using comparison data from the Dolly v2 dataset and generations from Falcon-7b-instruct.

For testing purposes, we have followed the **assumption that human-written responses (written by Databricks employees) are preferred to those generated by Falcon**. This might not always be the case, but you can set up a comparison data collection with [Argilla](https://docs.argilla.io/en/latest/guides/llms/conceptual_guides/conceptual_guides.html) to gather real feedback about preferred responses.

To use this model for scoring:

```python
import torch  # needed for torch.no_grad() in get_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("argilla/roberta-base-reward-model-falcon-dolly")

model = AutoModelForSequenceClassification.from_pretrained("argilla/roberta-base-reward-model-falcon-dolly")

def get_score(model, tokenizer, prompt, response):
    # Tokenize the input sequences
    inputs = tokenizer.encode_plus(prompt, response, truncation=True, padding="max_length", max_length=512, return_tensors="pt")

    # Perform forward pass
    with torch.no_grad():
        outputs = model(**inputs)

    # Extract the logits
    logits = outputs.logits

    return logits.item()

# Example usage
prompt = "What is Depreciation"
example_less_pref_response = "What is Depreciation – 10 Important Facts to Know? When a business buys a new asset, the purchase price of that asset is depreciated over time to reflect its usage and eventual obsolescence. Depreciation expense can be a tax deductible expense and is usually a non-cash expense reported on a company’s income statement and balance sheet. The amount of depreciation expense a company reports each year is the difference between the original purchase price of the asset and what the current value of that asset might be. Here are 10 important facts to know about depreciation: 1. Depreciation is a non-cash expense. It is an expense that is reported in a business’s income statement and balance sheet and not a cash flow expense. 2. Depreciation is an accounting standard and it is required to be disclosed in a business’s financial statements. 3. The amount of depreciation is usually a tax expense and not a cash expense reported on a company’s income statement"
example_preferred_response = "Depreciation is the drop in value of an asset due to wear and tear, age and obsolescence (going out of date) as recorded in an organization's financial records."

score = get_score(model, tokenizer, prompt, example_less_pref_response)
print(score)
# >> -3.915163993835449

score = get_score(model, tokenizer, prompt, example_preferred_response)
print(score)
# >> 7.460323333740234
```
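
One way to use such scores is to rank several candidate responses to the same prompt and keep the highest-scoring one. The sketch below is only an illustration: `rank_responses` and `toy_score` are hypothetical names that are not part of this model or of `transformers`, and `toy_score` is a stand-in scorer so the snippet runs without downloading anything.

```python
from typing import Callable, List, Tuple


def rank_responses(
    score_fn: Callable[[str, str], float],
    prompt: str,
    responses: List[str],
) -> List[Tuple[str, float]]:
    """Score every candidate response and return (response, score) pairs, best first."""
    scored = [(response, score_fn(prompt, response)) for response in responses]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(prompt: str, response: str) -> float:
    """Hypothetical stand-in scorer: shorter responses get higher scores."""
    return -float(len(response))


ranked = rank_responses(
    toy_score,
    "What is Depreciation",
    ["a long rambling answer", "short answer"],
)
best_response, best_score = ranked[0]
print(best_response, best_score)
```

To rank with the reward model instead of the stand-in, a wrapper such as `lambda p, r: get_score(model, tokenizer, p, r)` could be passed as `score_fn`.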

## Scored examples

Below are the scores for some examples (remember that the RM has been trained to prefer the original response):


| Prompt | Original Response | Response-1 | Score Original | Score Response-1 |
|:---|:---|:---|---:|---:|
| What is Depreciation | Depreciation is the drop in value of an asset due to wear and tear, age and obsolescence (going out of date) as recorded in an organization's financial records. | What is Depreciation – 10 Important Facts to Know?<br>When a business buys a new asset, the purchase price of that asset is depreciated over time to reflect its usage and eventual obsolescence. Depreciation expense can be a tax deductible expense and is usually a non-cash expense reported on a company’s income statement and balance sheet. The amount of depreciation expense a company reports each year is the difference between the original purchase price of the asset and what the current value of that asset might be. Here are 10 important facts to know about depreciation:<br>1. Depreciation is a non-cash expense. It is an expense that is reported in a business’s income statement and balance sheet and not a cash flow expense.<br>2. Depreciation is an accounting standard and it is required to be disclosed in a business’s financial statements.<br>3. The amount of depreciation is usually a tax expense and not a cash expense reported on a company’s income statement | 1.9764 | -9.01173 |
| What do you know about the city of Aberdeen in Scotland? | Aberdeen is a city located in the North East of Scotland. As of 2020 it had a population of 200,000 people and is the third largest city in Scotland, behind Edinburgh (435,000) and Glasgow (610,000). Aberdeen has been referred to as "The Silver City" due to the high number of buildings in the city constructed of Granite from the local quarry, which may sparkle on the rare occasion that the sunlight hits the buildings. Since the 1960's the city's economy has largely revolved around Oil & Gas due to its proximity to the North Sea Oil discovery.<br>Aberdeen has two universities; the University of Aberdeen and Robert Gordon University. Notable people who have been associated with Aberdeen include singer Annie Lennox, golfer Paul Lawrie and architect Archibald "Archie" Simpson. | Aberdeen, the city in which I've lived for the past 4 years, is the 3rd largest city in Scotland with a population of around 700,000. It has a long history, with evidence of human occupation dating back around 5000 years. The city is home to the famous "Rock" of Aberdeen (which is in fact the city itself) as well as several universities and is one of the biggest financial centers in the country. | 7.03729 | -6.4728 |
| Describe thunderstorm season in the United States and Canada.<br>Summer is traditionally associated with hot or warm weather. In Mediterranean climates, it is also associated with dry weather, while in other places (particularly in Eastern Asia because of the monsoon) it is associated with rainy weather. The wet season is the main period of vegetation growth within the savanna climate regime. Where the wet season is associated with a seasonal shift in the prevailing winds, it is known as a monsoon.<br>In the northern Atlantic Ocean, a distinct tropical cyclone season occurs from 1 June to 30 November. The statistical peak of the Atlantic hurricane season is 10 September. The Northeast Pacific Ocean has a broader period of activity, but in a similar timeframe to the Atlantic. The Northwest Pacific sees tropical cyclones year-round, with a minimum in February and March and a peak in early September. In the North Indian basin, storms are most common from April to December, with peaks in May and November. In the Southern Hemisphere, the tropical cyclone season runs from the start of November until the end of April with peaks in mid-February to early March.<br>Thunderstorm season in the United States and Canada runs in the spring through summer but sometimes can run as late as October or even November in the fall. These storms can produce hail, strong winds and tornadoes, usually during the afternoon and evening. | Thunderstorm season in the United States and Canada runs in the spring through summer but sometimes can run as late as October or even November in the fall. These storms can produce hail, strong winds and tornadoes, usually during the afternoon and evening. | Describe thunderstorm season in the United States and Canada.<br>Thunderstorm season typically occurs in the summer months in the United States and Canada. It is characterized by the convective development of large, powerful thunderstorms. These storms often bring heavy rainfall, strong winds, hail, and lightning. Due to their intensity, thunderstorms can sometimes lead to severe weather and property damage. However, they are often a natural and necessary part of the ecosystem, helping to maintain the region's water supply and providing essential nutrients to plants through the release of precipitation. | -0.197739 | -9.12217 |
| When did Peloton IPO?<br>On September 26, 2019, Peloton became a public company via an initial public offering, raising $1.16 billion and valuing the company at $8.1 billion. After reaching a market valuation of $50 billion in January 2021, the company's valuation was back to around $8 billion by April 2022. | Peloton became a public company via an initial public offering on September 26, 2019. | When did Peloton IPO?<br>Peloton launched its initial public offering on November 12, 2019. | -5.36242 | -8.97019 |
| What is the best way to answer an interview question? | The first recommended step is to ask clarifying questions to ensure you understand the question correctly and gather the requirements needed to answer effectively. If it is a mathematical or computer science question, go ahead and work your way through the problem while you verbalize your thought process along the way. If it is a situation question, you should use the STAR method. 1. Situation - set the scene and provide context, 2. Task - describe what was needed to be done and why, 3. Action - explain what action you took, and 4. Result - describe the result of your actions and what you learned from the experience | The best way to answer an interview question may depend on the specific question being asked and the job you are applying for. However, some tips include staying calm, maintaining eye contact, being honest and confident about your abilities, and using specific examples or anecdotes to answer questions. It's also important to be aware of your nonverbal communication and how that may impact your answer. | -7.57853 | -8.82935 |
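
The raw scores are easier to interpret as pairwise comparisons. The sketch below uses the five score pairs from the table above to compute (a) the fraction of pairs in which the original response outscores Response-1 and (b) a preference probability obtained by passing the score difference through a sigmoid. The sigmoid step assumes the model was trained with a pairwise logistic (Bradley-Terry style) loss, which is common for reward models but is not stated in this card; `preference_probability` is a hypothetical helper name.

```python
import math

# (score_original, score_response_1) pairs copied from the table above
score_pairs = [
    (1.9764, -9.01173),
    (7.03729, -6.4728),
    (-0.197739, -9.12217),
    (-5.36242, -8.97019),
    (-7.57853, -8.82935),
]


def preference_probability(score_a: float, score_b: float) -> float:
    """Sigmoid of the score difference: assumed P(a is preferred over b)."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


# Fraction of pairs where the original (human-written) response scores higher
accuracy = sum(a > b for a, b in score_pairs) / len(score_pairs)

# Assumed preference probability for each pair
probs = [preference_probability(a, b) for a, b in score_pairs]

print(f"pairwise accuracy: {accuracy:.2f}")
for (a, b), p in zip(score_pairs, probs):
    print(f"margin={a - b:8.4f}  P(original preferred)={p:.4f}")
```

Note that the margin matters more than the absolute score: in the last row both responses score low, and the small margin corresponds to a much weaker preference than in the first rows.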