
long_t5_3

This model is a fine-tuned version of google/long-t5-tglobal-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1612
  • Rouge1: 0.5309
  • Rouge2: 0.3406
  • Rougel: 0.4779
  • Rougelsum: 0.4778
  • Gen Len: 30.6175
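
A minimal inference sketch (the card itself does not include one). The repository id zera09/long_t5_3 is taken from this page; the summarization-style usage is an assumption, since the training dataset is listed as unknown:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "zera09/long_t5_3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Replace with a long input document."  # placeholder input
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# The average generation length on the eval set was ~30 tokens,
# so a modest generation budget is sufficient.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```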

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 30
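
For reference, a sketch of how these values map onto Seq2SeqTrainingArguments; the output_dir value and the predict_with_generate flag are assumptions, not taken from this card:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long_t5_3",      # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=30,
    predict_with_generate=True,  # assumed, since ROUGE and Gen Len are reported
    # Adam with betas=(0.9, 0.999) and epsilon=1e-8 matches the default
    # optimizer settings in transformers, so no explicit override is needed.
)
```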

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:-------:|
| 2.0161 | 1.0 | 1000 | 1.5665 | 0.4911 | 0.3059 | 0.4451 | 0.4451 | 25.5255 |
| 1.7658 | 2.0 | 2000 | 1.5150 | 0.5026 | 0.3142 | 0.4559 | 0.4557 | 26.8015 |
| 1.5969 | 3.0 | 3000 | 1.5031 | 0.51 | 0.3238 | 0.4628 | 0.4626 | 26.0075 |
| 1.4638 | 4.0 | 4000 | 1.5048 | 0.5189 | 0.3348 | 0.4724 | 0.4724 | 26.878 |
| 1.3675 | 5.0 | 5000 | 1.5363 | 0.5233 | 0.3369 | 0.4769 | 0.477 | 27.204 |
| 1.249 | 6.0 | 6000 | 1.5550 | 0.5206 | 0.3376 | 0.4762 | 0.4759 | 25.569 |
| 1.1861 | 7.0 | 7000 | 1.5511 | 0.5283 | 0.3444 | 0.4825 | 0.4824 | 26.8355 |
| 1.0985 | 8.0 | 8000 | 1.5838 | 0.5284 | 0.342 | 0.4792 | 0.4792 | 28.631 |
| 1.0178 | 9.0 | 9000 | 1.6231 | 0.5331 | 0.3451 | 0.4827 | 0.4828 | 28.7125 |
| 0.9649 | 10.0 | 10000 | 1.6392 | 0.5262 | 0.3384 | 0.4762 | 0.4762 | 29.0855 |
| 0.9069 | 11.0 | 11000 | 1.6758 | 0.5307 | 0.3421 | 0.4808 | 0.4804 | 28.9355 |
| 0.8472 | 12.0 | 12000 | 1.7137 | 0.5304 | 0.3458 | 0.481 | 0.4809 | 29.29 |
| 0.8087 | 13.0 | 13000 | 1.7478 | 0.5287 | 0.342 | 0.4789 | 0.4786 | 29.5185 |
| 0.773 | 14.0 | 14000 | 1.7628 | 0.5302 | 0.3436 | 0.4801 | 0.4801 | 29.725 |
| 0.7271 | 15.0 | 15000 | 1.8112 | 0.5293 | 0.3418 | 0.4789 | 0.4786 | 30.188 |
| 0.6919 | 16.0 | 16000 | 1.8520 | 0.5293 | 0.342 | 0.4778 | 0.4778 | 30.4125 |
| 0.665 | 17.0 | 17000 | 1.8738 | 0.5341 | 0.3432 | 0.4821 | 0.482 | 29.534 |
| 0.6242 | 18.0 | 18000 | 1.9228 | 0.5314 | 0.3439 | 0.4793 | 0.4792 | 29.2675 |
| 0.6024 | 19.0 | 19000 | 1.9288 | 0.535 | 0.347 | 0.4824 | 0.4823 | 29.852 |
| 0.5791 | 20.0 | 20000 | 1.9614 | 0.531 | 0.3417 | 0.4793 | 0.4791 | 29.754 |
| 0.5445 | 21.0 | 21000 | 2.0021 | 0.5302 | 0.3411 | 0.4784 | 0.4783 | 31.0095 |
| 0.5355 | 22.0 | 22000 | 2.0283 | 0.5318 | 0.3432 | 0.4792 | 0.4794 | 30.2985 |
| 0.5172 | 23.0 | 23000 | 2.0588 | 0.5296 | 0.3413 | 0.4775 | 0.4774 | 30.463 |
| 0.4968 | 24.0 | 24000 | 2.0907 | 0.5311 | 0.3423 | 0.4781 | 0.478 | 31.0295 |
| 0.4821 | 25.0 | 25000 | 2.0964 | 0.5318 | 0.3428 | 0.4792 | 0.4793 | 30.8365 |
| 0.4727 | 26.0 | 26000 | 2.1195 | 0.5317 | 0.3424 | 0.4789 | 0.4788 | 30.391 |
| 0.458 | 27.0 | 27000 | 2.1357 | 0.5301 | 0.3391 | 0.4761 | 0.4761 | 30.9145 |
| 0.4454 | 28.0 | 28000 | 2.1648 | 0.531 | 0.3409 | 0.4774 | 0.4774 | 31.1835 |
| 0.444 | 29.0 | 29000 | 2.1570 | 0.532 | 0.3418 | 0.4792 | 0.4791 | 30.596 |
| 0.4349 | 30.0 | 30000 | 2.1612 | 0.5309 | 0.3406 | 0.4779 | 0.4778 | 30.6175 |
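
The Rouge columns are on a 0-1 scale. A minimal sketch of computing comparable scores with the evaluate library (an assumption about the original setup; the predictions and references below are placeholders):

```python
import evaluate  # also requires the rouge_score package

rouge = evaluate.load("rouge")

# Placeholder data; the actual evaluation dataset is unknown.
predictions = ["a generated summary of the document"]
references = ["the reference summary of the document"]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # rouge1, rouge2, rougeL, rougeLsum
```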

Framework versions

  • Transformers 4.45.1
  • Pytorch 2.2.1
  • Datasets 3.0.1
  • Tokenizers 0.20.0
The weights are published in Safetensors format: 297M parameters stored as F32 tensors.
