	---
	license: apache-2.0
	base_model: google/mt5-large
	tags:
	- generated_from_trainer
	metrics:
	- bleu
	model-index:
	- name: cs_mT5-large2_2e-5_50_v0.3
	results: []
	---

	# cs_mT5-large2_2e-5_50_v0.3

	This model is a fine-tuned version of [google/mt5-large](https://huggingface.co/google/mt5-large) on an unknown dataset.
	After the final epoch (epoch 50, step 300), it achieves the following results on the evaluation set:
	- Loss: 10.7179
	- Bleu: 8.2299
	- Gen Len: 19.0

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 16
	- eval_batch_size: 16
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 50
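
The hyperparameters above, combined with the step counts in the results table (6 optimizer steps per epoch), imply a total step count and a learning-rate curve. The following is a minimal sketch of that arithmetic, not the training code itself. It assumes a single device, no gradient accumulation, no warmup, and that the last partial batch is kept; none of these are stated in the card. The function names are illustrative.

```python
# Sketch of the schedule arithmetic implied by the listed hyperparameters.
# Assumptions (not stated in the card): single device, no gradient
# accumulation, zero warmup steps, last partial batch is not dropped.

LEARNING_RATE = 2e-5
TRAIN_BATCH_SIZE = 16
NUM_EPOCHS = 50
STEPS_PER_EPOCH = 6  # read off the results table: step 6 at epoch 1.0


def total_steps(steps_per_epoch: int, num_epochs: int) -> int:
    """Total number of optimizer steps over the whole run."""
    return steps_per_epoch * num_epochs


def train_set_size_bounds(steps_per_epoch: int, batch_size: int) -> tuple[int, int]:
    """Smallest and largest training-set size that yields this many steps per epoch."""
    smallest = (steps_per_epoch - 1) * batch_size + 1
    largest = steps_per_epoch * batch_size
    return smallest, largest


def linear_lr(step: int, base_lr: float, num_training_steps: int,
              num_warmup_steps: int = 0) -> float:
    """Linear warmup followed by linear decay to zero."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    remaining = max(0, num_training_steps - step)
    return base_lr * remaining / max(1, num_training_steps - num_warmup_steps)


if __name__ == "__main__":
    steps = total_steps(STEPS_PER_EPOCH, NUM_EPOCHS)
    print("total steps:", steps)
    print("training-set size bounds:",
          train_set_size_bounds(STEPS_PER_EPOCH, TRAIN_BATCH_SIZE))
    for s in (0, 150, 300):
        print(f"lr at step {s}:", linear_lr(s, LEARNING_RATE, steps))
```

Under these assumptions the run has 300 optimizer steps, which matches the last row of the results table, and the training set would contain between 81 and 96 examples.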

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
	|:-------------:|:-----:|:----:|:---------------:|:------:|:-------:|
	| 22.4209 | 1.0 | 6 | 14.7377 | 5.6697 | 19.0 |
	| 20.671 | 2.0 | 12 | 14.9637 | 6.7619 | 19.0 |
	| 17.7208 | 3.0 | 18 | 14.6564 | 5.3777 | 19.0 |
	| 22.9549 | 4.0 | 24 | 15.1568 | 6.7736 | 19.0 |
	| 16.6185 | 5.0 | 30 | 14.1533 | 7.0263 | 19.0 |
	| 22.1158 | 6.0 | 36 | 15.0667 | 7.1851 | 19.0 |
	| 24.587 | 7.0 | 42 | 15.5166 | 7.6752 | 19.0 |
	| 16.4955 | 8.0 | 48 | 14.5515 | 7.5521 | 19.0 |
	| 21.0521 | 9.0 | 54 | 13.0890 | 7.7939 | 19.0 |
	| 16.1149 | 10.0 | 60 | 11.8305 | 7.7866 | 19.0 |
	| 12.8454 | 11.0 | 66 | 11.8727 | 7.7197 | 19.0 |
	| 18.482 | 12.0 | 72 | 11.6011 | 7.5761 | 19.0 |
	| 18.6175 | 13.0 | 78 | 11.8911 | 7.7925 | 19.0 |
	| 12.6805 | 14.0 | 84 | 11.8462 | 7.3764 | 19.0 |
	| 14.3151 | 15.0 | 90 | 11.4554 | 7.6604 | 19.0 |
	| 17.2287 | 16.0 | 96 | 11.1727 | 8.0204 | 19.0 |
	| 16.3546 | 17.0 | 102 | 10.7514 | 8.0859 | 19.0 |
	| 16.3339 | 18.0 | 108 | 11.1960 | 8.1381 | 19.0 |
	| 16.6065 | 19.0 | 114 | 11.3321 | 8.126 | 19.0 |
	| 14.3851 | 20.0 | 120 | 10.9074 | 6.3032 | 19.0 |
	| 15.8189 | 21.0 | 126 | 10.5179 | 6.3626 | 19.0 |
	| 8.4543 | 22.0 | 132 | 10.6037 | 6.3223 | 19.0 |
	| 18.0304 | 23.0 | 138 | 10.3665 | 6.236 | 19.0 |
	| 13.1475 | 24.0 | 144 | 10.3107 | 7.4434 | 19.0 |
	| 21.3407 | 25.0 | 150 | 10.2976 | 7.4596 | 19.0 |
	| 15.8901 | 26.0 | 156 | 10.4723 | 7.2047 | 19.0 |
	| 13.3029 | 27.0 | 162 | 10.7863 | 7.2047 | 19.0 |
	| 9.6205 | 28.0 | 168 | 11.2429 | 7.2047 | 19.0 |
	| 15.4244 | 29.0 | 174 | 11.5663 | 7.1797 | 19.0 |
	| 10.8496 | 30.0 | 180 | 11.9665 | 7.1839 | 19.0 |
	| 16.4213 | 31.0 | 186 | 12.3102 | 7.1002 | 19.0 |
	| 19.9358 | 32.0 | 192 | 12.3951 | 7.1693 | 19.0 |
	| 13.9974 | 33.0 | 198 | 12.6037 | 7.1693 | 19.0 |
	| 18.1208 | 34.0 | 204 | 12.4725 | 7.0996 | 19.0 |
	| 10.2059 | 35.0 | 210 | 12.1561 | 7.286 | 19.0 |
	| 15.9016 | 36.0 | 216 | 11.9896 | 7.286 | 19.0 |
	| 16.7008 | 37.0 | 222 | 11.4571 | 8.4159 | 19.0 |
	| 14.4533 | 38.0 | 228 | 11.1535 | 8.4159 | 19.0 |
	| 15.1107 | 39.0 | 234 | 11.1553 | 8.4159 | 19.0 |
	| 13.2587 | 40.0 | 240 | 11.0539 | 7.2709 | 19.0 |
	| 14.9836 | 41.0 | 246 | 11.3945 | 7.1574 | 19.0 |
	| 13.083 | 42.0 | 252 | 11.3690 | 7.1948 | 19.0 |
	| 24.9864 | 43.0 | 258 | 11.2586 | 8.2299 | 19.0 |
	| 22.1657 | 44.0 | 264 | 11.1126 | 8.2299 | 19.0 |
	| 15.6887 | 45.0 | 270 | 11.0112 | 8.2299 | 19.0 |
	| 8.581 | 46.0 | 276 | 10.8892 | 8.2299 | 19.0 |
	| 14.0141 | 47.0 | 282 | 10.8514 | 8.2299 | 19.0 |
	| 11.8402 | 48.0 | 288 | 10.8129 | 8.2299 | 19.0 |
	| 14.7845 | 49.0 | 294 | 10.7252 | 8.2299 | 19.0 |
	| 18.8443 | 50.0 | 300 | 10.7179 | 8.2299 | 19.0 |
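
The reported final-epoch numbers are not the best values in the table: the lowest validation loss (10.2976) occurs at epoch 25, and the highest BLEU (8.4159) first occurs at epoch 37. The sketch below shows one way to pick a checkpoint by either criterion. It uses four rows copied from the table; `EvalRow`, `best_by_loss`, and `best_by_bleu` are hypothetical helper names, and nothing in the card says which checkpoint was actually saved.

```python
# Sketch: choosing an epoch by validation loss or by BLEU.
# ROWS holds four rows copied from the results table above; the helper
# names are illustrative and not part of any library.
from typing import NamedTuple


class EvalRow(NamedTuple):
    epoch: int
    val_loss: float
    bleu: float


ROWS = [
    EvalRow(epoch=17, val_loss=10.7514, bleu=8.0859),
    EvalRow(epoch=25, val_loss=10.2976, bleu=7.4596),
    EvalRow(epoch=37, val_loss=11.4571, bleu=8.4159),
    EvalRow(epoch=50, val_loss=10.7179, bleu=8.2299),
]


def best_by_loss(rows: list[EvalRow]) -> EvalRow:
    """Row with the lowest validation loss (earliest row wins ties)."""
    return min(rows, key=lambda r: r.val_loss)


def best_by_bleu(rows: list[EvalRow]) -> EvalRow:
    """Row with the highest BLEU (earliest row wins ties)."""
    return max(rows, key=lambda r: r.bleu)


if __name__ == "__main__":
    print("best by validation loss:", best_by_loss(ROWS))
    print("best by BLEU:", best_by_bleu(ROWS))
```

The two criteria disagree here, so the choice of selection metric matters. With the Transformers `Trainer`, this kind of selection is normally configured through the `load_best_model_at_end`, `metric_for_best_model`, and `greater_is_better` training arguments; whether they were set for this run is not recorded in the card.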


	### Framework versions

	- Transformers 4.38.2
	- Pytorch 2.1.0+cu121
	- Datasets 2.18.0
	- Tokenizers 0.15.2
