phi-ft-1000000-fp-newsplit

This model is a fine-tuned version of microsoft/Phi-3-mini-4k-instruct on the generator dataset. It achieves the following results on the evaluation set:

Loss: 1.7754

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 0
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.2
num_epochs: 1

Training results

Training Loss	Epoch	Step	Validation Loss
3.1002	0.0114	100	3.0505
2.1929	0.0229	200	2.0493
1.6369	0.0343	300	1.6432
1.4618	0.0458	400	1.5580
1.317	0.0572	500	1.5410
1.1329	0.0687	600	1.6269
0.9505	0.0801	700	1.7387
0.8334	0.0916	800	1.7443
0.7692	0.1030	900	1.7634
0.6983	0.1145	1000	1.7546
0.6859	0.1259	1100	1.7593
0.6671	0.1374	1200	1.7647
0.6285	0.1488	1300	1.7951
0.6121	0.1603	1400	1.7816
0.5923	0.1717	1500	1.8132
0.5908	0.1832	1600	1.7664
0.5662	0.1946	1700	1.8307
0.5637	0.2060	1800	1.7864
0.5475	0.2175	1900	1.7988
0.5421	0.2289	2000	1.7876
0.529	0.2404	2100	1.7661
0.5202	0.2518	2200	1.7709
0.5287	0.2633	2300	1.7681
0.514	0.2747	2400	1.7765
0.5026	0.2862	2500	1.7931
0.5038	0.2976	2600	1.7808
0.5052	0.3091	2700	1.7689
0.4918	0.3205	2800	1.7862
0.4817	0.3320	2900	1.7916
0.4806	0.3434	3000	1.7796
0.4849	0.3549	3100	1.7654
0.4784	0.3663	3200	1.7576
0.4712	0.3777	3300	1.7746
0.4715	0.3892	3400	1.7568
0.4608	0.4006	3500	1.7424
0.4629	0.4121	3600	1.7561
0.4591	0.4235	3700	1.7498
0.4652	0.4350	3800	1.7366
0.461	0.4464	3900	1.7394
0.4469	0.4579	4000	1.7397
0.4521	0.4693	4100	1.7555
0.4498	0.4808	4200	1.7652
0.4541	0.4922	4300	1.7583
0.4594	0.5037	4400	1.7605
0.4514	0.5151	4500	1.7686
0.4395	0.5266	4600	1.7714
0.4384	0.5380	4700	1.7889
0.4392	0.5495	4800	1.7709
0.4495	0.5609	4900	1.7554
0.4375	0.5723	5000	1.7532
0.4441	0.5838	5100	1.7770
0.4458	0.5952	5200	1.7528
0.4343	0.6067	5300	1.7646
0.433	0.6181	5400	1.7689
0.4371	0.6296	5500	1.7738
0.4376	0.6410	5600	1.7633
0.4366	0.6525	5700	1.7810
0.43	0.6639	5800	1.7685
0.4345	0.6754	5900	1.7761
0.4379	0.6868	6000	1.7782
0.4294	0.6983	6100	1.7737
0.4441	0.7097	6200	1.7646
0.4396	0.7212	6300	1.7779
0.4307	0.7326	6400	1.7766
0.4331	0.7440	6500	1.7733
0.4326	0.7555	6600	1.7796
0.4286	0.7669	6700	1.7803
0.4294	0.7784	6800	1.7787
0.4294	0.7898	6900	1.7795
0.4364	0.8013	7000	1.7765
0.4414	0.8127	7100	1.7783
0.4336	0.8242	7200	1.7746
0.4324	0.8356	7300	1.7728
0.4414	0.8471	7400	1.7765
0.4288	0.8585	7500	1.7792
0.4359	0.8700	7600	1.7776
0.4242	0.8814	7700	1.7762
0.4413	0.8929	7800	1.7751
0.4402	0.9043	7900	1.7754
0.4452	0.9158	8000	1.7750
0.4346	0.9272	8100	1.7755
0.4396	0.9386	8200	1.7751
0.44	0.9501	8300	1.7752
0.4333	0.9615	8400	1.7753
0.4348	0.9730	8500	1.7754
0.4331	0.9844	8600	1.7752
0.4326	0.9959	8700	1.7754

Framework versions

PEFT 0.10.0
Transformers 4.40.0
Pytorch 2.3.0+cu121
Datasets 2.16.0
Tokenizers 0.19.1

KaranChand
/

phi-ft-1000000-fp-newsplit

phi-ft-1000000-fp-newsplit

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Model tree for KaranChand/phi-ft-1000000-fp-newsplit

Evaluation results