natolambert committed · Commit b988a04
Parent(s): f4dca79
Update src/md.py
src/md.py CHANGED

@@ -20,22 +20,13 @@ Once all subsets weighted averages are achieved, the final RewardBench score is
 We include multiple types of reward models in this evaluation:
 1. **Sequence Classifiers** (Seq. Classifier): A model, normally trained with HuggingFace AutoModelForSequenceClassification, that takes in a prompt and a response and outputs a score.
 2. **Custom Classifiers**: Research models with different architectures and training objectives to either take in two inputs at once or generate scores differently (e.g. PairRM and Stanford SteamSHP).
-3. **DPO**: Models trained with Direct Preference Optimization (DPO), with modifiers such as `-ref-free` or `-norm` changing how scores are computed.
+3. **DPO**: Models trained with Direct Preference Optimization (DPO), with modifiers such as `-ref-free` or `-norm` changing how scores are computed. *Note*: This also includes other models trained with implicit rewards, such as those trained with [KTO](https://arxiv.org/abs/2402.01306).
 4. **Random**: Random choice baseline.
 5. **Generative**: Prompting fine-tuned models to choose between two answers, similar to MT Bench and AlpacaEval.

 All models are evaluated in fp16 except for Starling-7B, which is evaluated in fp32.
 Others, such as **Generative Judge**, are coming soon.

-### Model Types
-
-Currently, we evaluate the following model types:
-1. **Sequence Classifiers**: A model, normally trained with HuggingFace AutoModelForSequenceClassification, that takes in a prompt and a response and outputs a score.
-2. **Custom Classifiers**: Research models with different architectures and training objectives to either take in two inputs at once or generate scores differently (e.g. PairRM and Stanford SteamSHP).
-3. **DPO**: Models trained with Direct Preference Optimization (DPO) with a reference model being either the base or supervised fine-tuning checkpoint.
-
-Support of DPO models without a reference model is coming soon.
-
 ### Subset Details

 The total number of prompts is 2985, filtered from 5123.
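For context on the "Seq. Classifier" category described in the diff above, here is a minimal sketch of how such a reward model scores a (prompt, response) pair with AutoModelForSequenceClassification. The model name is an illustrative assumption, and RewardBench's own harness additionally handles chat templates, batching, and the fp16/fp32 placement mentioned above.

```python
# Minimal sketch (not RewardBench's harness) of a Sequence Classifier reward model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "OpenAssistant/reward-model-deberta-v3-large-v2"  # assumed example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

def score(prompt: str, response: str) -> float:
    # The classifier head emits a single scalar logit, used directly as the reward.
    inputs = tokenizer(prompt, response, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).logits[0].item()

prompt = "What is the capital of France?"
chosen = "The capital of France is Paris."
rejected = "I think it might be Lyon."

# A preference pair counts as correct when the chosen response outscores the rejected one.
print(score(prompt, chosen) > score(prompt, rejected))
```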
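Similarly, a rough sketch of the implicit reward behind the "DPO" category, assuming the reward of a response is the summed log-probability ratio between the policy and its reference model (the `-ref-free` modifier simply drops the reference term). The model names are hypothetical placeholders and chat templating is omitted; this is not RewardBench's exact implementation.

```python
# Sketch of a DPO-style implicit reward: beta * (log pi(y|x) - log pi_ref(y|x)).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

policy_name = "my-org/dpo-tuned-model"  # hypothetical
ref_name = "my-org/sft-base-model"      # hypothetical

tokenizer = AutoTokenizer.from_pretrained(policy_name)
policy = AutoModelForCausalLM.from_pretrained(policy_name).eval()
reference = AutoModelForCausalLM.from_pretrained(ref_name).eval()

def sum_logprob(model, prompt_ids, response_ids):
    """Sum of log p(response token | prompt, previous response tokens)."""
    input_ids = torch.cat([prompt_ids, response_ids], dim=-1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position t predict token t+1; keep only the response region.
    logprobs = logits.log_softmax(-1)[:, prompt_ids.shape[-1] - 1 : -1, :]
    return logprobs.gather(-1, response_ids.unsqueeze(-1)).sum().item()

def dpo_reward(prompt, response, ref_free=False, beta=1.0):
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    response_ids = tokenizer(response, add_special_tokens=False, return_tensors="pt").input_ids
    reward = sum_logprob(policy, prompt_ids, response_ids)
    if not ref_free:  # the `-ref-free` variant skips the reference model entirely
        reward -= sum_logprob(reference, prompt_ids, response_ids)
    return beta * reward
```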