loyal-piano-m7-cdpo / README.md
metadata
license: cc-by-nc-4.0
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
language:
  - en

Trained for one epoch on ultrafeedback_binarized using cDPO (conservative DPO). Full evaluation is pending.
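
For readers unfamiliar with cDPO, the following is a minimal sketch of the per-example loss as it is commonly formulated: the standard DPO sigmoid loss with label smoothing, which assumes a small fraction of preference labels are flipped. It is not the training code used for this model. The function name `cdpo_loss` and the default values `beta=0.1` and `label_smoothing=0.1` are illustrative assumptions; the hyperparameters actually used are not stated here.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,             # assumed value, not from the model card
    label_smoothing: float = 0.1,  # assumed value, not from the model card
) -> float:
    """Conservative DPO loss for a single (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each completion under
    the policy being trained and under the frozen reference model.
    """
    # How much more the policy prefers the chosen completion over the
    # rejected one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    logits = beta * margin
    # With label_smoothing = 0 this reduces to the standard DPO loss.
    return (
        -(1.0 - label_smoothing) * log_sigmoid(logits)
        - label_smoothing * log_sigmoid(-logits)
    )


if __name__ == "__main__":
    # Made-up log-probabilities, purely for illustration.
    print(cdpo_loss(-5.0, -9.0, -6.0, -8.0))
```

Setting `label_smoothing=0.0` recovers plain DPO, which makes it easy to compare the two objectives on the same inputs.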

Some initial benchmark results:

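| Task          | Version | Metric   | Value  |   | Stderr |
|---------------|--------:|----------|-------:|---|-------:|
| hellaswag     |       0 | acc      | 0.6621 | ± | 0.0047 |
|               |         | acc_norm | 0.8525 | ± | 0.0035 |
| arc_challenge |       0 | acc      | 0.6348 | ± | 0.0141 |
|               |         | acc_norm | 0.6698 | ± | 0.0137 |
| winogrande    |       0 | acc      | 0.7861 | ± | 0.0115 |
| gsm8k         |       0 | acc      | 0.5694 | ± | 0.0136 |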
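
As a rough aid for reading the standard errors, the sketch below turns a few of the reported values into approximate 95% confidence intervals. It assumes a normal approximation (value ± 1.96 × stderr); the helper name `confidence_interval` is illustrative and not part of any evaluation tool.

```python
# (metric, value, stderr) copied from the benchmark results above.
results = [
    ("hellaswag acc_norm", 0.8525, 0.0035),
    ("arc_challenge acc_norm", 0.6698, 0.0137),
    ("winogrande acc", 0.7861, 0.0115),
    ("gsm8k acc", 0.5694, 0.0136),
]


def confidence_interval(value: float, stderr: float, z: float = 1.96):
    """Return (low, high) under a normal approximation."""
    half_width = z * stderr
    return value - half_width, value + half_width


for name, value, stderr in results:
    low, high = confidence_interval(value, stderr)
    print(f"{name}: {value:.4f} (95% CI {low:.4f} to {high:.4f})")
```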