datasets:
  - tatsu-lab/alpaca_farm

Pythia 1.4B model after supervised fine-tuning (SFT) on the 'sft' split of the AlpacaFarm dataset.

Policy model from the paper 'Reward Model Ensembles Mitigate Overoptimization'.
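
A minimal sketch of how a checkpoint like this might be loaded and queried with the `transformers` library. Several things here are assumptions rather than facts stated in this card: `MODEL_ID` is a hypothetical placeholder that must be replaced with this repository's actual Hub ID, the Alpaca-style prompt template is a guess at the SFT format, and the helper names `build_prompt` and `generate` are illustrative only.

```python
# Hypothetical placeholder: replace with this repository's actual Hub ID.
MODEL_ID = "<namespace>/<model-name>"

# Assumed Alpaca-style template; this card does not confirm the exact
# prompt format used during SFT.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(instruction: str) -> str:
    """Wrap a bare instruction in the assumed prompt template."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


def generate(instruction: str, model_id: str = MODEL_ID,
             max_new_tokens: int = 128) -> str:
    """Greedy-decode a response from the SFT policy (illustrative helper)."""
    # Imported lazily so build_prompt works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            # Reuse EOS for padding in case the tokenizer defines no pad token.
            pad_token_id=tokenizer.eos_token_id,
        )
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Calling `generate("Name three primary colors.")` after setting `MODEL_ID` would download the weights and return the decoded continuation. Greedy decoding is used here only to keep the sketch deterministic; sampling settings are a separate choice.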