tinyllama-1.1b-sum-dpo-full_LR5e-8_BS64_2epochs_old

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

Loss: 0.6891
Rewards/chosen: -0.0201
Rewards/rejected: -0.0288
Rewards/accuracies: 0.5911
Rewards/margins: 0.0087
Logps/rejected: -66.0638
Logps/chosen: -60.7225
Logits/rejected: -3.0949
Logits/chosen: -3.1006
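
The reward figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes the standard sigmoid DPO objective, in which the per-pair loss is -log(sigmoid(margin)) and the margin is the chosen reward minus the rejected reward. The card does not state which loss variant was used, so that is an assumption, and the helper name `dpo_sigmoid_loss` is illustrative only.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -0.0201
rewards_rejected = -0.0288
reported_margin = 0.0087
reported_loss = 0.6891


def dpo_sigmoid_loss(margin: float) -> float:
    """Per-pair loss -log(sigmoid(margin)); assumes the sigmoid DPO objective."""
    return math.log1p(math.exp(-margin))


# The margin is the gap between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected

# A policy identical to its reference has zero margin, giving a loss of ln 2.
loss_at_zero_margin = dpo_sigmoid_loss(0.0)

# Loss evaluated at the mean margin. The reported loss is a mean over pairs,
# so the two are expected to agree only approximately.
loss_at_mean_margin = dpo_sigmoid_loss(margin)

print(f"margin: {margin:.4f} (reported {reported_margin})")
print(f"loss at zero margin: {loss_at_zero_margin:.4f}")
print(f"loss at mean margin: {loss_at_mean_margin:.4f} (reported {reported_loss})")
```

Under this assumption, the reported loss of 0.6891 sits only slightly below ln 2 (about 0.6931), which suggests the policy moved a small distance from its starting point during training.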

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-08
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 8
total_train_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 2
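
To make the scheduler settings concrete, here is a minimal sketch of a cosine schedule with linear warmup using the values listed above. The total step count of 2902 is an assumption estimated from the training log (step 2900 is reached at epoch 1.9986 of 2), and the function `lr_at` is a hypothetical helper. The training framework may round the warmup length differently.

```python
import math

# Values from the hyperparameter list above.
PEAK_LR = 5e-8
WARMUP_RATIO = 0.1

# Assumption: about 2902 optimizer steps in total, estimated from the
# training log (step 2900 is reached at epoch 1.9986 of 2).
TOTAL_STEPS = 2902


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the learning rate at a few points along the run.
for step in (0, 100, 291, 1451, 2902):
    print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```

With these numbers the learning rate climbs for roughly the first 10% of steps, peaks at 5e-08, and then decays toward zero by the end of the second epoch.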

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6931	0.0689	100	0.6931	0.0001	0.0001	0.5023	0.0000	-63.1703	-58.7007	-3.1577	-3.1633
0.6931	0.1378	200	0.6932	0.0001	0.0002	0.4875	-0.0001	-63.1621	-58.7010	-3.1575	-3.1632
0.6929	0.2068	300	0.6931	0.0004	0.0003	0.5149	0.0001	-63.1505	-58.6712	-3.1569	-3.1625
0.6927	0.2757	400	0.6930	0.0007	0.0005	0.5258	0.0003	-63.1350	-58.6397	-3.1555	-3.1611
0.6920	0.3446	500	0.6929	0.0012	0.0007	0.5246	0.0005	-63.1102	-58.5951	-3.1536	-3.1592
0.6915	0.4135	600	0.6927	0.0016	0.0007	0.5504	0.0009	-63.1105	-58.5481	-3.1508	-3.1564
0.6912	0.4824	700	0.6924	0.0019	0.0004	0.5671	0.0015	-63.1424	-58.5229	-3.1481	-3.1538
0.6900	0.5513	800	0.6922	0.0019	-0.0000	0.5760	0.0019	-63.1839	-58.5249	-3.1444	-3.1500
0.6893	0.6203	900	0.6919	0.0017	-0.0008	0.5709	0.0025	-63.2630	-58.5425	-3.1403	-3.1459
0.6892	0.6892	1000	0.6917	0.0011	-0.0020	0.5725	0.0030	-63.3758	-58.6063	-3.1361	-3.1418
0.6892	0.7581	1100	0.6914	0.0002	-0.0034	0.5809	0.0036	-63.5250	-58.6939	-3.1313	-3.1369
0.6885	0.8270	1200	0.6911	-0.0007	-0.0050	0.5755	0.0043	-63.6802	-58.7853	-3.1282	-3.1338
0.6877	0.8959	1300	0.6908	-0.0024	-0.0073	0.5781	0.0048	-63.9072	-58.9567	-3.1223	-3.1280
0.6874	0.9649	1400	0.6907	-0.0040	-0.0092	0.5771	0.0053	-64.1026	-59.1085	-3.1205	-3.1262
0.6871	1.0338	1500	0.6904	-0.0055	-0.0113	0.5825	0.0058	-64.3106	-59.2603	-3.1153	-3.1210
0.6863	1.1027	1600	0.6902	-0.0075	-0.0138	0.5888	0.0063	-64.5576	-59.4592	-3.1122	-3.1179
0.6854	1.1716	1700	0.6900	-0.0096	-0.0163	0.5867	0.0067	-64.8090	-59.6681	-3.1086	-3.1143
0.6855	1.2405	1800	0.6898	-0.0120	-0.0192	0.5827	0.0072	-65.0974	-59.9114	-3.1070	-3.1126
0.6824	1.3094	1900	0.6897	-0.0139	-0.0213	0.5825	0.0074	-65.3089	-60.1001	-3.1034	-3.1091
0.6851	1.3784	2000	0.6895	-0.0155	-0.0234	0.5906	0.0079	-65.5166	-60.2616	-3.1014	-3.1071
0.6834	1.4473	2100	0.6895	-0.0167	-0.0247	0.5862	0.0080	-65.6501	-60.3842	-3.0998	-3.1055
0.6828	1.5162	2200	0.6894	-0.0179	-0.0261	0.5874	0.0082	-65.7914	-60.5049	-3.0984	-3.1041
0.6833	1.5851	2300	0.6892	-0.0188	-0.0273	0.5901	0.0085	-65.9073	-60.5933	-3.0973	-3.1030
0.6835	1.6540	2400	0.6892	-0.0193	-0.0279	0.5862	0.0086	-65.9739	-60.6469	-3.0961	-3.1018
0.6826	1.7229	2500	0.6892	-0.0197	-0.0283	0.5850	0.0086	-66.0099	-60.6819	-3.0956	-3.1013
0.6825	1.7919	2600	0.6891	-0.0198	-0.0285	0.5890	0.0088	-66.0344	-60.6882	-3.0949	-3.1007
0.6823	1.8608	2700	0.6891	-0.0200	-0.0287	0.5890	0.0087	-66.0526	-60.7165	-3.0949	-3.1006
0.6816	1.9297	2800	0.6891	-0.0201	-0.0289	0.5841	0.0088	-66.0728	-60.7263	-3.0951	-3.1008
0.6836	1.9986	2900	0.6891	-0.0201	-0.0288	0.5911	0.0087	-66.0638	-60.7225	-3.0949	-3.1006
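
The log above also allows a couple of quick sanity checks. The sketch below copies four rows (first eight columns only), estimates the number of optimizer steps per epoch from the Epoch and Step columns, and confirms that Rewards/margins matches Rewards/chosen minus Rewards/rejected up to rounding. The column names in `COLUMNS` are shorthand chosen for this example, and the pairs-per-epoch figure is only an estimate that assumes every step consumed a full batch of 64.

```python
# Four rows copied from the table above, trimmed to the first eight columns:
# Training Loss, Epoch, Step, Validation Loss, Rewards/chosen,
# Rewards/rejected, Rewards/accuracies, Rewards/margins.
LOG = """\
0.6931 0.0689 100 0.6931 0.0001 0.0001 0.5023 0.0000
0.6892 0.6892 1000 0.6917 0.0011 -0.0020 0.5725 0.0030
0.6851 1.3784 2000 0.6895 -0.0155 -0.0234 0.5906 0.0079
0.6836 1.9986 2900 0.6891 -0.0201 -0.0288 0.5911 0.0087
"""

# Shorthand column names used only in this example.
COLUMNS = ["train_loss", "epoch", "step", "val_loss",
           "rewards_chosen", "rewards_rejected", "rewards_acc", "rewards_margin"]

rows = [
    dict(zip(COLUMNS, map(float, line.split())))
    for line in LOG.splitlines()
    if line.strip()
]

# Estimate steps per epoch from the last row, where rounding error is smallest.
steps_per_epoch = round(rows[-1]["step"] / rows[-1]["epoch"])

# total_train_batch_size from the hyperparameters; assumes full batches.
TOTAL_BATCH = 64
pairs_per_epoch = steps_per_epoch * TOTAL_BATCH

# Largest disagreement between the logged margin and chosen - rejected.
max_gap = max(
    abs((r["rewards_chosen"] - r["rewards_rejected"]) - r["rewards_margin"])
    for r in rows
)

print(f"steps per epoch (estimate): {steps_per_epoch}")
print(f"preference pairs per epoch (estimate): {pairs_per_epoch}")
print(f"largest margin discrepancy: {max_gap:.4f}")
```

Any discrepancy in the margin column should stay within the rounding of the four-decimal log values.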

Framework versions

Transformers 4.41.2
Pytorch 2.1.2
Datasets 2.20.0
Tokenizers 0.19.1
