Word-selector

This model is a fine-tuned version of google/long-t5-tglobal-base. The training dataset is not documented. After the final training epoch (epoch 30, step 12000), it achieves the following results on the evaluation set:

Loss: 4.5118
Rouge1: 0.3547
Rouge2: 0.0761
Rougel: 0.2663
Rougelsum: 0.2667
Gen Len: 25.195
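
For readers unfamiliar with the metrics above, Rouge1 and Rouge2 are F1 scores over the unigrams and bigrams shared between a generated text and its reference. The following is a minimal sketch of that idea, assuming lowercase whitespace tokenization and no stemming. It is not the code used to produce the numbers in this card, and library implementations tokenize differently, so their scores will not match it exactly. The example strings are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(prediction, reference, n=1):
    """Simplified ROUGE-N F1: clipped n-gram overlap between two strings.

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    pred = ngrams(prediction.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # clipped counts of shared n-grams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical strings, not taken from the model's evaluation set.
prediction = "the cat sat on the mat"
reference = "the cat lay on the mat"
print(round(rouge_n_f1(prediction, reference, n=1), 4))  # 5 of 6 unigrams shared
print(round(rouge_n_f1(prediction, reference, n=2), 4))  # 3 of 5 bigrams shared
```

Rougel and Rougelsum are based on the longest common subsequence rather than fixed-size n-grams and are not covered by this sketch.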

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 16
eval_batch_size: 12
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 30
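
To make the linear scheduler concrete, the sketch below computes the learning rate at a given optimizer step. It assumes zero warmup steps, since the card lists no warmup setting, and takes the total of 12000 steps from the last row of the training results table. It illustrates the schedule shape only and is not the training script.

```python
BASE_LR = 2e-4        # learning_rate from the card
TOTAL_STEPS = 12_000  # last step in the training results table (30 epochs x 400 steps)
WARMUP_STEPS = 0      # assumption: the card lists no warmup setting


def linear_lr(step, base_lr=BASE_LR, total=TOTAL_STEPS, warmup=WARMUP_STEPS):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


for step in (0, 400, 6_000, 12_000):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at 0.0002, reaches half that value at step 6000 (epoch 15), and hits zero at the final step.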

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
No log	1.0	400	3.9221	0.2104	0.0295	0.1684	0.1684	34.3531
4.5051	2.0	800	3.7571	0.285	0.0449	0.2195	0.2197	20.2419
3.9507	3.0	1200	3.6847	0.2976	0.0513	0.2309	0.2311	22.7119
3.6575	4.0	1600	3.6350	0.3137	0.0595	0.2411	0.2411	25.9231
3.4177	5.0	2000	3.6229	0.3311	0.0636	0.2527	0.2527	22.3788
3.4177	6.0	2400	3.6223	0.3359	0.0658	0.254	0.2543	21.2994
3.1741	7.0	2800	3.6313	0.3453	0.0674	0.2617	0.2618	21.9181
3.013	8.0	3200	3.6278	0.3453	0.0689	0.2649	0.2651	22.93
2.8253	9.0	3600	3.6755	0.3511	0.0705	0.2658	0.2662	23.1806
2.6705	10.0	4000	3.7081	0.3509	0.0742	0.2663	0.2664	22.5356
2.6705	11.0	4400	3.7424	0.3528	0.0716	0.264	0.2643	23.3775
2.5081	12.0	4800	3.8135	0.3553	0.0753	0.2686	0.2686	22.985
2.3745	13.0	5200	3.8369	0.3548	0.0753	0.2671	0.2675	23.7719
2.2399	14.0	5600	3.8816	0.3591	0.0762	0.2708	0.2709	23.1612
2.1414	15.0	6000	3.9132	0.361	0.0781	0.2719	0.2721	24.4581
2.1414	16.0	6400	3.9946	0.3579	0.077	0.2715	0.2714	23.2131
2.0099	17.0	6800	4.0376	0.3595	0.0766	0.2701	0.2703	23.6681
1.9252	18.0	7200	4.0829	0.3576	0.0774	0.2691	0.2694	23.79
1.8406	19.0	7600	4.1218	0.3613	0.0776	0.2718	0.272	23.9888
1.7602	20.0	8000	4.1754	0.3588	0.0787	0.2702	0.2704	24.5425
1.7602	21.0	8400	4.2440	0.3602	0.0769	0.2716	0.2717	24.9531
1.6725	22.0	8800	4.2860	0.3581	0.0775	0.2688	0.2691	24.6638
1.6036	23.0	9200	4.3163	0.3582	0.0764	0.2697	0.27	24.5994
1.5572	24.0	9600	4.3655	0.3545	0.0749	0.2655	0.2658	25.145
1.5034	25.0	10000	4.3811	0.3583	0.0781	0.2695	0.2698	25.6856
1.5034	26.0	10400	4.4350	0.3593	0.0788	0.2691	0.2692	25.2394
1.4617	27.0	10800	4.4539	0.357	0.078	0.2686	0.269	25.2906
1.4175	28.0	11200	4.4785	0.3549	0.0757	0.2657	0.2661	25.62
1.3971	29.0	11600	4.5061	0.3567	0.0767	0.2661	0.2665	25.1988
1.3828	30.0	12000	4.5118	0.3547	0.0761	0.2663	0.2667	25.195
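
The table can be scanned programmatically to see where each metric peaks. The sketch below copies the epoch, validation loss, and Rouge1 columns from the table and finds the best epoch for each. It suggests that validation loss is lowest at epoch 6 and rises afterwards while Rouge1 peaks at epoch 19, which is consistent with the reported final-epoch checkpoint not being the best one by either measure. Whether an earlier checkpoint was saved is not stated in the card.

```python
# (epoch, validation loss, Rouge1) copied from the training results table.
RESULTS = [
    (1, 3.9221, 0.2104), (2, 3.7571, 0.285), (3, 3.6847, 0.2976),
    (4, 3.6350, 0.3137), (5, 3.6229, 0.3311), (6, 3.6223, 0.3359),
    (7, 3.6313, 0.3453), (8, 3.6278, 0.3453), (9, 3.6755, 0.3511),
    (10, 3.7081, 0.3509), (11, 3.7424, 0.3528), (12, 3.8135, 0.3553),
    (13, 3.8369, 0.3548), (14, 3.8816, 0.3591), (15, 3.9132, 0.361),
    (16, 3.9946, 0.3579), (17, 4.0376, 0.3595), (18, 4.0829, 0.3576),
    (19, 4.1218, 0.3613), (20, 4.1754, 0.3588), (21, 4.2440, 0.3602),
    (22, 4.2860, 0.3581), (23, 4.3163, 0.3582), (24, 4.3655, 0.3545),
    (25, 4.3811, 0.3583), (26, 4.4350, 0.3593), (27, 4.4539, 0.357),
    (28, 4.4785, 0.3549), (29, 4.5061, 0.3567), (30, 4.5118, 0.3547),
]

best_loss = min(RESULTS, key=lambda row: row[1])    # row with lowest validation loss
best_rouge1 = max(RESULTS, key=lambda row: row[2])  # row with highest Rouge1
final = RESULTS[-1]                                 # row reported at the top of the card

print("lowest validation loss:", best_loss)
print("highest Rouge1:", best_rouge1)
print("final epoch:", final)
```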

Framework versions

Transformers 4.37.2
Pytorch 2.1.1+cu121
Datasets 3.0.1
Tokenizers 0.15.1
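
The card does not document an input format or a recommended generation setup, so the following is only a loading sketch. It assumes the checkpoint loads through the standard Transformers sequence-to-sequence classes, as LongT5 checkpoints generally do. The function name `generate` and the `max_new_tokens` default are illustrative choices, not settings taken from the card.

```python
MODEL_ID = "zera09/Word-selector"  # repository id as shown on the model page


def generate(text, max_new_tokens=32):
    """Load the checkpoint and generate output for one input string.

    Assumptions: the expected input format and the max_new_tokens value
    are illustrative; the card documents neither.
    """
    # Imported here so this module loads even without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("your input text")` downloads the weights from the Hugging Face Hub on first use. For repeated calls, loading the tokenizer and model once outside the function would avoid reloading them each time.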

Repository: zera09/Word-selector
