# tinyllama-1.1b-sum-dpo-full_LR5e-8_BS64_4epochs_old
This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:
- Loss: 0.6803
- Rewards/chosen: -0.1265
- Rewards/rejected: -0.1560
- Rewards/accuracies: 0.6036
- Rewards/margins: 0.0295
- Logps/rejected: -78.7771
- Logps/chosen: -71.3634
- Logits/rejected: -2.9512
- Logits/chosen: -2.9570
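The reward metrics above follow from the DPO objective: each reward is the β-scaled log-probability ratio between the policy and the reference (SFT) model, and the margin is chosen minus rejected. A minimal pure-Python sketch of that arithmetic (the β value is an assumption for illustration; it is not listed in the hyperparameters below):

```python
import math

def dpo_loss_and_rewards(pi_logp_chosen, pi_logp_rejected,
                         ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair, plus the per-example rewards.

    Rewards are beta-scaled log-ratios between policy and reference;
    the loss is -log(sigmoid(reward_chosen - reward_rejected)).
    """
    reward_chosen = beta * (pi_logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (pi_logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
    return loss, reward_chosen, reward_rejected, margin

# At initialization the policy equals the reference, so both rewards are 0
# and the loss is ln 2 ~= 0.6931 -- matching the first rows of the training table.
loss0, *_ = dpo_loss_and_rewards(-58.7, -63.2, -58.7, -63.2)
```

This also explains why a positive Rewards/margins with negative Rewards/chosen is coherent: both completions drift below the reference, but the rejected one drifts further.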
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-08
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
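The total train batch size of 64 is the per-device batch size times the gradient-accumulation steps times the data-parallel world size (8 × 8 × 1 here; the world size is inferred from the product, not stated in the logs), and the cosine schedule warms up linearly over the first 10% of optimizer steps. A pure-Python sketch of both pieces of arithmetic (step counts are illustrative):

```python
import math

# Effective batch size: per-device batch x gradient-accumulation steps x world size.
per_device_batch = 8
grad_accum_steps = 8
world_size = 1  # assumption implied by 8 * 8 = 64; the card only reports the total
total_batch = per_device_batch * grad_accum_steps * world_size  # 64

def cosine_lr_with_warmup(step, total_steps, peak_lr=5e-8, warmup_ratio=0.1):
    """Linear warmup for the first warmup_ratio of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With roughly 5800 optimizer steps over 4 epochs (per the final row of the training table), the learning rate peaks at 5e-8 around step 580 and decays back toward zero, which is consistent with the near-flat validation loss over the last epoch.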
## Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
0.6931 | 0.0689 | 100 | 0.6932 | -0.0001 | 0.0001 | 0.4793 | -0.0001 | -63.1744 | -58.7172 | -3.1574 | -3.1630 |
0.6932 | 0.1378 | 200 | 0.6931 | 0.0001 | 0.0001 | 0.4956 | 0.0000 | -63.1716 | -58.7029 | -3.1576 | -3.1633 |
0.693 | 0.2068 | 300 | 0.6932 | 0.0001 | 0.0002 | 0.4724 | -0.0001 | -63.1577 | -58.7002 | -3.1575 | -3.1632 |
0.693 | 0.2757 | 400 | 0.6931 | 0.0003 | 0.0003 | 0.5007 | 0.0000 | -63.1547 | -58.6827 | -3.1569 | -3.1625 |
0.6927 | 0.3446 | 500 | 0.6931 | 0.0006 | 0.0004 | 0.5128 | 0.0002 | -63.1359 | -58.6518 | -3.1563 | -3.1619 |
0.6922 | 0.4135 | 600 | 0.6930 | 0.0009 | 0.0005 | 0.5358 | 0.0004 | -63.1295 | -58.6249 | -3.1544 | -3.1600 |
0.692 | 0.4824 | 700 | 0.6928 | 0.0015 | 0.0008 | 0.5516 | 0.0007 | -63.0973 | -58.5609 | -3.1522 | -3.1578 |
0.6911 | 0.5513 | 800 | 0.6926 | 0.0018 | 0.0006 | 0.5634 | 0.0012 | -63.1172 | -58.5317 | -3.1497 | -3.1553 |
0.6903 | 0.6203 | 900 | 0.6923 | 0.0019 | 0.0002 | 0.5641 | 0.0017 | -63.1634 | -58.5242 | -3.1456 | -3.1513 |
0.6899 | 0.6892 | 1000 | 0.6920 | 0.0016 | -0.0008 | 0.5676 | 0.0024 | -63.2556 | -58.5502 | -3.1411 | -3.1467 |
0.6898 | 0.7581 | 1100 | 0.6916 | 0.0011 | -0.0021 | 0.5802 | 0.0032 | -63.3925 | -58.6040 | -3.1359 | -3.1415 |
0.689 | 0.8270 | 1200 | 0.6913 | 0.0000 | -0.0038 | 0.5753 | 0.0038 | -63.5565 | -58.7099 | -3.1316 | -3.1371 |
0.6881 | 0.8959 | 1300 | 0.6910 | -0.0015 | -0.0061 | 0.5804 | 0.0046 | -63.7902 | -58.8624 | -3.1268 | -3.1325 |
0.6874 | 0.9649 | 1400 | 0.6907 | -0.0037 | -0.0088 | 0.5825 | 0.0051 | -64.0628 | -59.0799 | -3.1213 | -3.1269 |
0.6867 | 1.0338 | 1500 | 0.6903 | -0.0063 | -0.0124 | 0.5843 | 0.0061 | -64.4169 | -59.3381 | -3.1142 | -3.1198 |
0.6857 | 1.1027 | 1600 | 0.6899 | -0.0097 | -0.0166 | 0.5876 | 0.0069 | -64.8429 | -59.6860 | -3.1081 | -3.1137 |
0.6843 | 1.1716 | 1700 | 0.6895 | -0.0148 | -0.0227 | 0.5804 | 0.0078 | -65.4468 | -60.1953 | -3.1013 | -3.1070 |
0.6842 | 1.2405 | 1800 | 0.6890 | -0.0219 | -0.0309 | 0.5871 | 0.0089 | -66.2668 | -60.9047 | -3.0944 | -3.1001 |
0.6802 | 1.3094 | 1900 | 0.6886 | -0.0263 | -0.0362 | 0.5920 | 0.0098 | -66.7954 | -61.3438 | -3.0883 | -3.0940 |
0.6824 | 1.3784 | 2000 | 0.6881 | -0.0324 | -0.0436 | 0.5939 | 0.0112 | -67.5355 | -61.9519 | -3.0814 | -3.0871 |
0.6799 | 1.4473 | 2100 | 0.6875 | -0.0387 | -0.0510 | 0.5992 | 0.0123 | -68.2835 | -62.5824 | -3.0754 | -3.0811 |
0.6793 | 1.5162 | 2200 | 0.6872 | -0.0420 | -0.0551 | 0.5913 | 0.0131 | -68.6940 | -62.9161 | -3.0698 | -3.0755 |
0.6797 | 1.5851 | 2300 | 0.6868 | -0.0485 | -0.0626 | 0.5918 | 0.0141 | -69.4427 | -63.5627 | -3.0623 | -3.0680 |
0.6792 | 1.6540 | 2400 | 0.6863 | -0.0512 | -0.0663 | 0.5939 | 0.0151 | -69.8102 | -63.8365 | -3.0547 | -3.0604 |
0.6775 | 1.7229 | 2500 | 0.6860 | -0.0552 | -0.0710 | 0.5946 | 0.0158 | -70.2800 | -64.2325 | -3.0488 | -3.0546 |
0.6768 | 1.7919 | 2600 | 0.6856 | -0.0598 | -0.0766 | 0.5936 | 0.0169 | -70.8443 | -64.6883 | -3.0412 | -3.0469 |
0.675 | 1.8608 | 2700 | 0.6851 | -0.0654 | -0.0832 | 0.5948 | 0.0178 | -71.4996 | -65.2471 | -3.0345 | -3.0402 |
0.6736 | 1.9297 | 2800 | 0.6847 | -0.0707 | -0.0896 | 0.5983 | 0.0189 | -72.1448 | -65.7864 | -3.0286 | -3.0344 |
0.6773 | 1.9986 | 2900 | 0.6844 | -0.0746 | -0.0943 | 0.6020 | 0.0196 | -72.6052 | -66.1758 | -3.0225 | -3.0283 |
0.6724 | 2.0675 | 3000 | 0.6841 | -0.0793 | -0.0997 | 0.6029 | 0.0204 | -73.1465 | -66.6415 | -3.0158 | -3.0216 |
0.674 | 2.1365 | 3100 | 0.6837 | -0.0824 | -0.1036 | 0.6029 | 0.0212 | -73.5381 | -66.9540 | -3.0112 | -3.0169 |
0.6764 | 2.2054 | 3200 | 0.6834 | -0.0857 | -0.1076 | 0.6066 | 0.0219 | -73.9390 | -67.2856 | -3.0047 | -3.0105 |
0.6749 | 2.2743 | 3300 | 0.6831 | -0.0887 | -0.1113 | 0.6069 | 0.0226 | -74.3103 | -67.5846 | -2.9991 | -3.0049 |
0.6746 | 2.3432 | 3400 | 0.6828 | -0.0921 | -0.1154 | 0.6055 | 0.0233 | -74.7230 | -67.9247 | -2.9944 | -3.0002 |
0.6718 | 2.4121 | 3500 | 0.6824 | -0.0962 | -0.1204 | 0.6069 | 0.0242 | -75.2213 | -68.3350 | -2.9890 | -2.9948 |
0.672 | 2.4810 | 3600 | 0.6822 | -0.1013 | -0.1261 | 0.6048 | 0.0248 | -75.7936 | -68.8439 | -2.9844 | -2.9902 |
0.6733 | 2.5500 | 3700 | 0.6820 | -0.1048 | -0.1302 | 0.6032 | 0.0254 | -76.1958 | -69.1902 | -2.9800 | -2.9858 |
0.6715 | 2.6189 | 3800 | 0.6817 | -0.1077 | -0.1336 | 0.6046 | 0.0260 | -76.5409 | -69.4776 | -2.9765 | -2.9823 |
0.6709 | 2.6878 | 3900 | 0.6816 | -0.1102 | -0.1366 | 0.6020 | 0.0264 | -76.8374 | -69.7330 | -2.9729 | -2.9787 |
0.6696 | 2.7567 | 4000 | 0.6814 | -0.1132 | -0.1400 | 0.6032 | 0.0268 | -77.1831 | -70.0346 | -2.9698 | -2.9756 |
0.6687 | 2.8256 | 4100 | 0.6812 | -0.1154 | -0.1427 | 0.6048 | 0.0273 | -77.4501 | -70.2526 | -2.9670 | -2.9729 |
0.6692 | 2.8946 | 4200 | 0.6810 | -0.1166 | -0.1443 | 0.6073 | 0.0277 | -77.6081 | -70.3715 | -2.9649 | -2.9708 |
0.6742 | 2.9635 | 4300 | 0.6809 | -0.1184 | -0.1463 | 0.6027 | 0.0279 | -77.8100 | -70.5513 | -2.9629 | -2.9687 |
0.6652 | 3.0324 | 4400 | 0.6808 | -0.1191 | -0.1473 | 0.6090 | 0.0282 | -77.9141 | -70.6218 | -2.9606 | -2.9664 |
0.6659 | 3.1013 | 4500 | 0.6807 | -0.1206 | -0.1490 | 0.6046 | 0.0284 | -78.0785 | -70.7742 | -2.9587 | -2.9645 |
0.666 | 3.1702 | 4600 | 0.6805 | -0.1225 | -0.1512 | 0.6062 | 0.0288 | -78.3027 | -70.9582 | -2.9569 | -2.9628 |
0.6644 | 3.2391 | 4700 | 0.6805 | -0.1237 | -0.1527 | 0.6059 | 0.0290 | -78.4454 | -71.0785 | -2.9557 | -2.9615 |
0.6685 | 3.3081 | 4800 | 0.6804 | -0.1246 | -0.1536 | 0.6053 | 0.0291 | -78.5441 | -71.1674 | -2.9547 | -2.9605 |
0.6651 | 3.3770 | 4900 | 0.6803 | -0.1250 | -0.1542 | 0.6039 | 0.0293 | -78.6030 | -71.2072 | -2.9539 | -2.9598 |
0.6689 | 3.4459 | 5000 | 0.6803 | -0.1254 | -0.1547 | 0.6062 | 0.0293 | -78.6476 | -71.2503 | -2.9530 | -2.9588 |
0.6653 | 3.5148 | 5100 | 0.6802 | -0.1256 | -0.1552 | 0.6050 | 0.0296 | -78.6955 | -71.2721 | -2.9525 | -2.9583 |
0.6664 | 3.5837 | 5200 | 0.6803 | -0.1261 | -0.1556 | 0.6046 | 0.0295 | -78.7380 | -71.3226 | -2.9519 | -2.9577 |
0.6687 | 3.6527 | 5300 | 0.6803 | -0.1265 | -0.1559 | 0.6064 | 0.0294 | -78.7701 | -71.3572 | -2.9516 | -2.9574 |
0.6641 | 3.7216 | 5400 | 0.6803 | -0.1266 | -0.1560 | 0.6059 | 0.0294 | -78.7822 | -71.3690 | -2.9514 | -2.9573 |
0.6637 | 3.7905 | 5500 | 0.6803 | -0.1265 | -0.1559 | 0.6053 | 0.0295 | -78.7736 | -71.3579 | -2.9516 | -2.9575 |
0.6694 | 3.8594 | 5600 | 0.6802 | -0.1265 | -0.1561 | 0.6036 | 0.0296 | -78.7869 | -71.3611 | -2.9515 | -2.9574 |
0.6684 | 3.9283 | 5700 | 0.6803 | -0.1266 | -0.1560 | 0.6071 | 0.0294 | -78.7792 | -71.3707 | -2.9512 | -2.9571 |
0.6668 | 3.9972 | 5800 | 0.6803 | -0.1265 | -0.1560 | 0.6036 | 0.0295 | -78.7771 | -71.3634 | -2.9512 | -2.9570 |
## Framework versions
- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1