
long_t5_6

This model is a fine-tuned version of google/long-t5-tglobal-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0450
  • Rouge1: 0.5157
  • Rouge2: 0.3356
  • Rougel: 0.4671
  • Rougelsum: 0.4673
  • Gen Len: 31.344

Model description

More information needed

Intended uses & limitations

More information needed
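
The task is not documented here, but the base checkpoint is a long-input text-to-text model and the ROUGE/Gen Len numbers above suggest abstractive summarization. Under that assumption, a minimal usage sketch (the prompt format, input preprocessing, and generation settings are guesses, not documented):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed summarization-style usage; the actual task, preprocessing,
# and prompt format for this checkpoint are not documented.
model_id = "zera09/long_t5_6"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

document = "..."  # a long input document
inputs = tokenizer(document, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```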

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a code reconstruction follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 50
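
As a hedged reconstruction, these settings map onto Seq2SeqTrainingArguments as shown below; output_dir and predict_with_generate are assumptions, and the listed Adam betas and epsilon are the transformers defaults, so they need no explicit arguments:

```python
from transformers import Seq2SeqTrainingArguments

# Reconstruction of the hyperparameters listed above; output_dir and
# predict_with_generate are assumptions, not documented.
training_args = Seq2SeqTrainingArguments(
    output_dir="long_t5_6",        # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=50,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-8 is the default optimizer.
    predict_with_generate=True,    # assumed, given the per-epoch ROUGE metrics
)
```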

Training results

| Training Loss | Epoch | Step  | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|---------------|-------|-------|-----------------|--------|--------|--------|-----------|---------|
| No log        | 1.0   | 250   | 1.6173          | 0.4644 | 0.29   | 0.4269 | 0.4268    | 25.406  |
| 2.1255        | 2.0   | 500   | 1.5596          | 0.4748 | 0.2986 | 0.4353 | 0.4354    | 26.834  |
| 2.1255        | 3.0   | 750   | 1.5241          | 0.4819 | 0.3074 | 0.4424 | 0.4423    | 25.6985 |
| 1.7318        | 4.0   | 1000  | 1.5178          | 0.4925 | 0.3161 | 0.4521 | 0.4521    | 26.513  |
| 1.7318        | 5.0   | 1250  | 1.5178          | 0.4975 | 0.3184 | 0.4555 | 0.4555    | 27.042  |
| 1.5463        | 6.0   | 1500  | 1.5168          | 0.5014 | 0.3255 | 0.4614 | 0.4618    | 25.815  |
| 1.5463        | 7.0   | 1750  | 1.5066          | 0.5054 | 0.3306 | 0.4653 | 0.4654    | 25.8755 |
| 1.4053        | 8.0   | 2000  | 1.5184          | 0.508  | 0.3311 | 0.4673 | 0.4673    | 26.246  |
| 1.4053        | 9.0   | 2250  | 1.5372          | 0.5095 | 0.3331 | 0.4669 | 0.4667    | 27.511  |
| 1.289         | 10.0  | 2500  | 1.5446          | 0.5078 | 0.3328 | 0.4662 | 0.4664    | 27.14   |
| 1.289         | 11.0  | 2750  | 1.5500          | 0.5111 | 0.3329 | 0.4687 | 0.4687    | 27.444  |
| 1.191         | 12.0  | 3000  | 1.5660          | 0.5141 | 0.3345 | 0.4704 | 0.4703    | 27.397  |
| 1.191         | 13.0  | 3250  | 1.5731          | 0.5168 | 0.3389 | 0.4735 | 0.4736    | 27.4535 |
| 1.107         | 14.0  | 3500  | 1.5926          | 0.5158 | 0.3357 | 0.4709 | 0.4708    | 28.82   |
| 1.107         | 15.0  | 3750  | 1.6107          | 0.5158 | 0.3406 | 0.473  | 0.4734    | 28.3135 |
| 1.036         | 16.0  | 4000  | 1.6205          | 0.5187 | 0.3411 | 0.4742 | 0.4744    | 28.9715 |
| 1.036         | 17.0  | 4250  | 1.6467          | 0.5142 | 0.3378 | 0.4701 | 0.4702    | 28.81   |
| 0.9655        | 18.0  | 4500  | 1.6670          | 0.5192 | 0.3426 | 0.4748 | 0.4751    | 28.266  |
| 0.9655        | 19.0  | 4750  | 1.6715          | 0.5154 | 0.3373 | 0.4695 | 0.4694    | 29.8395 |
| 0.9055        | 20.0  | 5000  | 1.6824          | 0.5156 | 0.3388 | 0.4715 | 0.4721    | 28.653  |
| 0.9055        | 21.0  | 5250  | 1.7156          | 0.5164 | 0.3384 | 0.4708 | 0.4712    | 30.2485 |
| 0.8519        | 22.0  | 5500  | 1.7239          | 0.5164 | 0.3404 | 0.4733 | 0.4735    | 28.5295 |
| 0.8519        | 23.0  | 5750  | 1.7292          | 0.5169 | 0.3374 | 0.4716 | 0.4718    | 29.1895 |
| 0.8069        | 24.0  | 6000  | 1.7591          | 0.5168 | 0.3369 | 0.4703 | 0.4707    | 29.9035 |
| 0.8069        | 25.0  | 6250  | 1.7733          | 0.5146 | 0.3355 | 0.4689 | 0.4692    | 29.533  |
| 0.764         | 26.0  | 6500  | 1.7963          | 0.5172 | 0.3388 | 0.4716 | 0.4721    | 30.0075 |
| 0.764         | 27.0  | 6750  | 1.8136          | 0.5173 | 0.3385 | 0.471  | 0.4714    | 29.672  |
| 0.7256        | 28.0  | 7000  | 1.8317          | 0.5153 | 0.3361 | 0.4698 | 0.4702    | 30.5335 |
| 0.7256        | 29.0  | 7250  | 1.8478          | 0.5136 | 0.336  | 0.4686 | 0.469     | 30.654  |
| 0.6901        | 30.0  | 7500  | 1.8709          | 0.5169 | 0.338  | 0.472  | 0.4724    | 29.7215 |
| 0.6901        | 31.0  | 7750  | 1.8733          | 0.5153 | 0.3364 | 0.4694 | 0.4698    | 30.3385 |
| 0.6617        | 32.0  | 8000  | 1.8882          | 0.5137 | 0.3369 | 0.4692 | 0.4692    | 29.8545 |
| 0.6617        | 33.0  | 8250  | 1.9176          | 0.5144 | 0.3354 | 0.4689 | 0.4692    | 30.489  |
| 0.6331        | 34.0  | 8500  | 1.9219          | 0.517  | 0.3391 | 0.472  | 0.4723    | 30.3225 |
| 0.6331        | 35.0  | 8750  | 1.9272          | 0.5146 | 0.3367 | 0.469  | 0.4695    | 30.647  |
| 0.6106        | 36.0  | 9000  | 1.9468          | 0.512  | 0.3329 | 0.4658 | 0.466     | 31.4695 |
| 0.6106        | 37.0  | 9250  | 1.9650          | 0.5143 | 0.3345 | 0.4682 | 0.4685    | 31.2565 |
| 0.5914        | 38.0  | 9500  | 1.9666          | 0.5163 | 0.3367 | 0.4705 | 0.4708    | 30.9375 |
| 0.5914        | 39.0  | 9750  | 1.9788          | 0.5134 | 0.3351 | 0.468  | 0.4683    | 30.297  |
| 0.5722        | 40.0  | 10000 | 1.9985          | 0.5118 | 0.3331 | 0.4659 | 0.4662    | 31.1015 |
| 0.5722        | 41.0  | 10250 | 2.0013          | 0.5137 | 0.3341 | 0.4671 | 0.4676    | 30.8835 |
| 0.5571        | 42.0  | 10500 | 2.0087          | 0.513  | 0.333  | 0.4666 | 0.467     | 31.094  |
| 0.5571        | 43.0  | 10750 | 2.0196          | 0.5155 | 0.3361 | 0.4682 | 0.4684    | 31.0515 |
| 0.5466        | 44.0  | 11000 | 2.0221          | 0.5143 | 0.3349 | 0.4674 | 0.4678    | 31.1495 |
| 0.5466        | 45.0  | 11250 | 2.0275          | 0.5146 | 0.3353 | 0.4672 | 0.4676    | 31.1845 |
| 0.5355        | 46.0  | 11500 | 2.0311          | 0.5134 | 0.3344 | 0.4662 | 0.4665    | 30.9715 |
| 0.5355        | 47.0  | 11750 | 2.0410          | 0.5141 | 0.3345 | 0.4657 | 0.466     | 31.6285 |
| 0.5305        | 48.0  | 12000 | 2.0415          | 0.5154 | 0.3359 | 0.467  | 0.4672    | 31.3345 |
| 0.5305        | 49.0  | 12250 | 2.0424          | 0.5157 | 0.3358 | 0.4677 | 0.4678    | 31.033  |
| 0.5256        | 50.0  | 12500 | 2.0450          | 0.5157 | 0.3356 | 0.4671 | 0.4673    | 31.344  |
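
The per-epoch ROUGE and Gen Len values above were presumably produced by a compute_metrics hook passed to a Seq2SeqTrainer; the run's actual evaluation code is not published, so the following is only a plausible sketch using the evaluate library:

```python
import numpy as np
import evaluate
from transformers import AutoTokenizer

rouge = evaluate.load("rouge")
tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")

# Plausible metric hook for a Seq2SeqTrainer with predict_with_generate=True;
# this is an assumption, not the documented training code.
def compute_metrics(eval_preds):
    preds, labels = eval_preds
    # Label padding positions are -100; restore the pad token before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    scores = rouge.compute(predictions=decoded_preds, references=decoded_labels)
    # "Gen Len": mean generated length in tokens, excluding padding.
    gen_lens = [np.count_nonzero(p != tokenizer.pad_token_id) for p in preds]
    scores["gen_len"] = float(np.mean(gen_lens))
    return scores
```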

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.2.1
  • Datasets 3.0.1
  • Tokenizers 0.20.1