ORPO v DPO v SFT + Training Loss Curves; argilla/dpo-mix-7k - a G-reen Collection

G-reen 's Collections

ORPO v DPO v SFT + Training Loss Curves; argilla/dpo-mix-7k

ORPO v DPO v SFT + Training Loss Curves; argilla/dpo-mix-7k

updated Jul 2

Several trained models to compare the differences between each method. Each model has a complete description of hyperparams with wandb reports.

unsloth/mistral-7b-v0.2-bnb-4bit

Text Generation • Updated Sep 11 • 314 • 13

Note All training runs were done on this model (4 bit qlora). Go unsloth!
argilla/dpo-mix-7k

Viewer • Updated Jul 16 • 7.5k • 305 • 155

Note Used this entire dataset for training. For SFT, the rejected part of the dataset was ignored.
G-reen/EXPERIMENT-DPO-m7b2-1-merged

Text Generation • Updated Apr 15 • 66
Note The image shows a comparison between all the completed DPO runs.
G-reen/EXPERIMENT-DPO-m7b2-2-merged

Text Generation • Updated Apr 15 • 68

Note Probably the best loss curve at lr=5e-5.
G-reen/EXPERIMENT-DPO-m7b2-3-merged

Text Generation • Updated Apr 15 • 67

Note Failed to train, definitely do not use
G-reen/EXPERIMENT-DPO-m7b2-4-merged

Text Generation • Updated Apr 5 • 65
G-reen/EXPERIMENT-SFT-m7b2-1-merged

Text Generation • Updated Apr 15 • 5
Note The image shows a comparison between all the completed SFT runs.
G-reen/EXPERIMENT-SFT-m7b2-2-merged

Text Generation • Updated Apr 15 • 70

Note Probably the best loss curve at lr=5e-5.
G-reen/EXPERIMENT-SFT-m7b2-3-merged

Text Generation • Updated Apr 15 • 2
G-reen/EXPERIMENT-ORPO-m7b2-1-merged

Text Generation • Updated Apr 16 • 9
G-reen/EXPERIMENT-ORPO-m7b2-2-merged

Text Generation • Updated Apr 19 • 5