pythia-helpful-1epoch

lomahony's Collections

updated Mar 12

Pythia-2.8b models, supervised fine-tuned and DPO fine-tuned on the helpful subset of the Anthropic-hh-rlhf dataset for 1 epoch.
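
For readers unfamiliar with DPO (Direct Preference Optimization), the per-example loss can be sketched as below. This is a minimal illustration of the standard DPO objective, not the training code used for these models. The function name `dpo_loss` and the value `beta=0.1` are assumptions chosen for the example; the collection does not state its hyperparameters. Inputs are summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model (typically the supervised fine-tuned checkpoint).

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    beta=0.1 is an illustrative default (an assumption, not taken
    from the collection).
    """
    # How much more the policy prefers each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(z)) equals softplus(-z); computed in a numerically stable form.
    x = -z
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: the loss is log(2).
baseline = dpo_loss(-12.0, -15.0, -12.0, -15.0)

# Policy raises the chosen response's log-probability: the loss drops.
improved = dpo_loss(-10.0, -15.0, -12.0, -15.0)
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference model does, and rises when it favors the rejected one.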