---
base_model: TII-Frontier-Team/falcon3-3b-instruct
datasets:
  - TII-Frontier-Team/Reasoning_DPO
library_name: peft
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: zephyr-7b-dpo-qlora
    results: []
---

# zephyr-7b-dpo-qlora

This model is a fine-tuned version of TII-Frontier-Team/PEFT-falcon3b-it-gsm8k on the TII-Frontier-Team/Reasoning_DPO dataset. It achieves the following results on the evaluation set:

- Loss: 0.0286
- Rewards/chosen: -4.7078
- Rewards/rejected: -10.6652
- Rewards/accuracies: 0.9254
- Rewards/margins: 5.9575
- Logps/rejected: -1102.4209
- Logps/chosen: -503.5470
- Logits/rejected: 1.9412
- Logits/chosen: 2.1408
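
Because this repository holds a PEFT (QLoRA) adapter rather than full model weights, inference loads the base checkpoint from the `base_model` field above and then attaches the adapter. The snippet below is a minimal sketch, not an official usage example; the adapter id `RedaAlami/zephyr-7b-dpo-qlora` is an assumption inferred from this repo's name and may need adjusting.

```python
# Minimal inference sketch (repo ids are assumptions; adjust to your setup).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "TII-Frontier-Team/falcon3-3b-instruct"   # from the metadata above
adapter_id = "RedaAlami/zephyr-7b-dpo-qlora"        # assumed id of this adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the DPO-tuned adapter

messages = [{"role": "user", "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```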

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows this list):

- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
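
The hyperparameters above map directly onto TRL's `DPOConfig` (which extends Transformers `TrainingArguments`). The sketch below is an illustrative reconstruction, not the exact training script: the LoRA settings, DPO `beta`, and dataset split/column names are assumptions, and the 8-GPU layout listed above would be launched via `accelerate` rather than a single process.

```python
# Illustrative DPO/QLoRA setup mirroring the hyperparameters above.
# LoRA ranks, beta, and dataset columns are assumptions, not taken from this card.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

base_id = "TII-Frontier-Team/falcon3-3b-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
    device_map="auto",
)

peft_config = LoraConfig(  # assumed QLoRA adapter settings
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
)

args = DPOConfig(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # 4 x 4 x 8 GPUs = 128 effective train batch
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
    beta=0.1,                        # assumed DPO temperature
)

dataset = load_dataset("TII-Frontier-Team/Reasoning_DPO")  # assumed prompt/chosen/rejected columns

trainer = DPOTrainer(
    model=model,
    ref_model=None,                  # with a PEFT adapter, the frozen base acts as the reference
    args=args,
    train_dataset=dataset["train"],
    processing_class=tokenizer,      # `tokenizer=` in older TRL releases
    peft_config=peft_config,
)
trainer.train()
```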

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6914 | 0.0315 | 100 | 0.6912 | 0.0006 | -0.0036 | 0.6340 | 0.0042 | -36.2582 | -32.7125 | -1.6841 | -1.6367 |
| 0.6743 | 0.0629 | 200 | 0.6753 | -0.0009 | -0.0462 | 0.6321 | 0.0454 | -40.5232 | -32.8573 | -1.5154 | -1.4649 |
| 0.6112 | 0.0944 | 300 | 0.5905 | -0.5010 | -0.8365 | 0.6631 | 0.3356 | -119.5518 | -82.8670 | -0.5166 | -0.4325 |
| 0.4477 | 0.1258 | 400 | 0.4026 | -1.9267 | -3.0850 | 0.7201 | 1.1583 | -344.3972 | -225.4428 | -0.5023 | -0.3494 |
| 0.3583 | 0.1573 | 500 | 0.3063 | -2.4869 | -4.1367 | 0.7646 | 1.6498 | -449.5698 | -281.4605 | 0.3124 | 0.4717 |
| 0.3041 | 0.1887 | 600 | 0.2405 | -2.9070 | -4.9732 | 0.7918 | 2.0662 | -533.2189 | -323.4665 | 0.9644 | 1.1113 |
| 0.2487 | 0.2202 | 700 | 0.1964 | -3.4123 | -5.8172 | 0.8209 | 2.4050 | -617.6231 | -373.9985 | 1.1343 | 1.2933 |
| 0.218 | 0.2517 | 800 | 0.1547 | -3.6771 | -6.6251 | 0.8336 | 2.9480 | -698.4094 | -400.4795 | 1.5710 | 1.7290 |
| 0.1858 | 0.2831 | 900 | 0.1394 | -3.5484 | -6.6808 | 0.8485 | 3.1324 | -703.9799 | -387.6123 | 1.6988 | 1.8631 |
| 0.173 | 0.3146 | 1000 | 0.1176 | -3.4824 | -6.7705 | 0.8649 | 3.2881 | -712.9531 | -381.0118 | 1.8190 | 1.9776 |
| 0.1494 | 0.3460 | 1100 | 0.0979 | -3.7942 | -7.4529 | 0.8713 | 3.6587 | -781.1857 | -412.1861 | 1.8179 | 1.9865 |
| 0.149 | 0.3775 | 1200 | 0.0817 | -4.1856 | -8.2504 | 0.8843 | 4.0648 | -860.9355 | -451.3316 | 1.8715 | 2.0581 |
| 0.1143 | 0.4089 | 1300 | 0.0702 | -4.2444 | -8.6154 | 0.8884 | 4.3710 | -897.4431 | -457.2141 | 1.7765 | 1.9770 |
| 0.1204 | 0.4404 | 1400 | 0.0642 | -4.1442 | -8.6112 | 0.8966 | 4.4670 | -897.0154 | -447.1863 | 2.1996 | 2.3734 |
| 0.1013 | 0.4718 | 1500 | 0.0580 | -4.5031 | -9.1159 | 0.8951 | 4.6128 | -947.4904 | -483.0838 | 1.9514 | 2.1364 |
| 0.1011 | 0.5033 | 1600 | 0.0567 | -4.0373 | -8.5779 | 0.9067 | 4.5406 | -893.6846 | -436.5011 | 1.9239 | 2.1103 |
| 0.0853 | 0.5348 | 1700 | 0.0482 | -4.3119 | -9.2927 | 0.9067 | 4.9808 | -965.1708 | -463.9637 | 2.0648 | 2.2336 |
| 0.0897 | 0.5662 | 1800 | 0.0449 | -4.3018 | -9.4275 | 0.9101 | 5.1257 | -978.6490 | -462.9552 | 1.9037 | 2.0822 |
| 0.0717 | 0.5977 | 1900 | 0.0402 | -4.4391 | -9.8395 | 0.9112 | 5.4004 | -1019.8445 | -476.6779 | 2.0003 | 2.1749 |
| 0.0487 | 0.6291 | 2000 | 0.0368 | -5.4728 | -11.3180 | 0.9078 | 5.8452 | -1167.6968 | -580.0486 | 1.9355 | 2.1422 |
| 0.0683 | 0.6606 | 2100 | 0.0356 | -4.6736 | -10.2835 | 0.9190 | 5.6099 | -1064.2465 | -500.1268 | 2.0206 | 2.2058 |
| 0.0514 | 0.6920 | 2200 | 0.0341 | -4.6025 | -10.2228 | 0.9209 | 5.6203 | -1058.1812 | -493.0187 | 1.9362 | 2.1272 |
| 0.0623 | 0.7235 | 2300 | 0.0326 | -4.9398 | -10.7061 | 0.9213 | 5.7663 | -1106.5096 | -526.7491 | 1.8240 | 2.0327 |
| 0.0693 | 0.7550 | 2400 | 0.0313 | -4.8024 | -10.6310 | 0.9231 | 5.8286 | -1098.9999 | -513.0095 | 1.8580 | 2.0583 |
| 0.0543 | 0.7864 | 2500 | 0.0303 | -4.8132 | -10.7352 | 0.9228 | 5.9221 | -1109.4199 | -514.0873 | 1.9534 | 2.1471 |
| 0.0555 | 0.8179 | 2600 | 0.0301 | -4.7251 | -10.5626 | 0.9261 | 5.8375 | -1092.1620 | -505.2810 | 1.9398 | 2.1357 |
| 0.0646 | 0.8493 | 2700 | 0.0294 | -4.6930 | -10.6307 | 0.9261 | 5.9377 | -1098.9694 | -502.0694 | 2.0003 | 2.1947 |
| 0.0546 | 0.8808 | 2800 | 0.0287 | -4.8085 | -10.8169 | 0.9250 | 6.0084 | -1117.5887 | -513.6258 | 1.9596 | 2.1607 |
| 0.0702 | 0.9122 | 2900 | 0.0288 | -4.6970 | -10.6904 | 0.9243 | 5.9934 | -1104.9371 | -502.4718 | 1.9696 | 2.1647 |
| 0.0623 | 0.9437 | 3000 | 0.0286 | -4.7098 | -10.6743 | 0.9269 | 5.9645 | -1103.3302 | -503.7507 | 1.9440 | 2.1437 |
| 0.0593 | 0.9751 | 3100 | 0.0287 | -4.6985 | -10.6531 | 0.9276 | 5.9547 | -1101.2122 | -502.6163 | 1.9469 | 2.1464 |
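
For reading the table: under DPO, the "rewards" columns are the implicit rewards beta * (log pi_theta(y|x) - log pi_ref(y|x)) averaged over chosen and rejected completions, the margin is their difference, and the accuracy is the fraction of pairs where the chosen reward exceeds the rejected one. The sketch below shows that bookkeeping given per-sequence log-probabilities; the beta value is an assumption, not taken from this card.

```python
# How the rewards/* columns relate to per-sequence log-probabilities (sketch; beta assumed).
import torch

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected
    accuracy = (rewards_chosen > rewards_rejected).float().mean()
    # Sigmoid DPO loss over the log-ratio difference between chosen and rejected.
    loss = -torch.nn.functional.logsigmoid(beta * (
        (policy_chosen_logps - policy_rejected_logps)
        - (ref_chosen_logps - ref_rejected_logps)
    )).mean()
    return {
        "rewards/chosen": rewards_chosen.mean().item(),
        "rewards/rejected": rewards_rejected.mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/accuracies": accuracy.item(),
        "loss": loss.item(),
    }
```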

### Framework versions

- PEFT 0.13.0
- Transformers 4.45.1
- Pytorch 2.4.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.0