my_rugpt3medium_finetune / README.md

C0uchP0tat0

End of training

d2c772a 9 months ago

	---
	base_model: ai-forever/rugpt3medium_based_on_gpt2
	tags:
	- generated_from_trainer
	model-index:
	- name: my_rugpt3medium_finetune
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# my_rugpt3medium_finetune

	This model is a fine-tuned version of [ai-forever/rugpt3medium_based_on_gpt2](https://huggingface.co/ai-forever/rugpt3medium_based_on_gpt2) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.9955
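
For a causal language model, the evaluation loss reported by the Trainer is normally the mean per-token cross-entropy in nats, so it can be turned into a perplexity by exponentiation. The sketch below relies on that assumption; the card itself does not report perplexity.

```python
import math

# Final evaluation loss from this card.
# Assumption: it is the mean per-token cross-entropy, measured in nats.
eval_loss = 0.9955

perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the final checkpoint has a perplexity of roughly 2.7 on the evaluation set.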

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 3
	- total_train_batch_size: 24
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 1000
	- num_epochs: 35
	- mixed_precision_training: Native AMP
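
The relationships among these values can be checked with a little arithmetic. The sketch below assumes a single device (so that 8 x 3 gives the listed total of 24), 54 optimizer steps per epoch (inferred from the results table, where step 1350 falls on epoch 25.0), and a schedule of linear warmup followed by cosine decay to zero, which is how the `cosine` scheduler in Transformers behaves by default. `lr_at` is a hypothetical helper written for illustration, not a library function.

```python
import math

# Values taken from the hyperparameter list above.
learning_rate = 1e-05
train_batch_size = 8
gradient_accumulation_steps = 3
warmup_steps = 1000
num_epochs = 35

# Assumptions (not stated in the card): one device, and 54 optimizer
# steps per epoch, inferred from step 1350 landing on epoch 25.0.
num_devices = 1
steps_per_epoch = 54

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
total_steps = steps_per_epoch * num_epochs


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size:", total_train_batch_size)
print("total optimizer steps:", total_steps)
for step in (0, 500, 1000, 1445, 1890):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions, warmup takes up 1000 of about 1890 optimizer steps, so the learning rate only reaches its peak of 1e-05 around epoch 18.5 and decays for the remaining 16 or so epochs.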

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 3.5373 | 0.46 | 25 | 3.4828 |
	| 3.5265 | 0.93 | 50 | 3.4708 |
	| 3.478 | 1.39 | 75 | 3.4398 |
	| 3.4851 | 1.85 | 100 | 3.3995 |
	| 3.4407 | 2.31 | 125 | 3.3609 |
	| 3.3731 | 2.78 | 150 | 3.3241 |
	| 3.3584 | 3.24 | 175 | 3.2886 |
	| 3.3267 | 3.7 | 200 | 3.2540 |
	| 3.3043 | 4.17 | 225 | 3.2200 |
	| 3.229 | 4.63 | 250 | 3.1853 |
	| 3.2618 | 5.09 | 275 | 3.1508 |
	| 3.1823 | 5.56 | 300 | 3.1164 |
	| 3.172 | 6.02 | 325 | 3.0779 |
	| 3.1354 | 6.48 | 350 | 3.0395 |
	| 3.0899 | 6.94 | 375 | 2.9987 |
	| 3.0741 | 7.41 | 400 | 2.9577 |
	| 3.009 | 7.87 | 425 | 2.9140 |
	| 2.9598 | 8.33 | 450 | 2.8737 |
	| 2.9187 | 8.8 | 475 | 2.8294 |
	| 2.9378 | 9.26 | 500 | 2.7842 |
	| 2.8396 | 9.72 | 525 | 2.7374 |
	| 2.8608 | 10.19 | 550 | 2.6889 |
	| 2.7296 | 10.65 | 575 | 2.6405 |
	| 2.7452 | 11.11 | 600 | 2.5926 |
	| 2.6882 | 11.57 | 625 | 2.5389 |
	| 2.6463 | 12.04 | 650 | 2.4893 |
	| 2.572 | 12.5 | 675 | 2.4356 |
	| 2.5384 | 12.96 | 700 | 2.3788 |
	| 2.5246 | 13.43 | 725 | 2.3296 |
	| 2.4055 | 13.89 | 750 | 2.2747 |
	| 2.3759 | 14.35 | 775 | 2.2155 |
	| 2.3351 | 14.81 | 800 | 2.1606 |
	| 2.286 | 15.28 | 825 | 2.1061 |
	| 2.2694 | 15.74 | 850 | 2.0504 |
	| 2.1745 | 16.2 | 875 | 1.9967 |
	| 2.1053 | 16.67 | 900 | 1.9411 |
	| 2.1184 | 17.13 | 925 | 1.8878 |
	| 2.0107 | 17.59 | 950 | 1.8362 |
	| 2.027 | 18.06 | 975 | 1.7854 |
	| 1.9153 | 18.52 | 1000 | 1.7304 |
	| 1.9267 | 18.98 | 1025 | 1.6854 |
	| 1.8131 | 19.44 | 1050 | 1.6331 |
	| 1.8405 | 19.91 | 1075 | 1.5839 |
	| 1.7294 | 20.37 | 1100 | 1.5370 |
	| 1.7154 | 20.83 | 1125 | 1.4971 |
	| 1.6573 | 21.3 | 1150 | 1.4476 |
	| 1.6391 | 21.76 | 1175 | 1.4130 |
	| 1.5497 | 22.22 | 1200 | 1.3727 |
	| 1.5194 | 22.69 | 1225 | 1.3378 |
	| 1.535 | 23.15 | 1250 | 1.3000 |
	| 1.4514 | 23.61 | 1275 | 1.2714 |
	| 1.4711 | 24.07 | 1300 | 1.2388 |
	| 1.4105 | 24.54 | 1325 | 1.2136 |
	| 1.4202 | 25.0 | 1350 | 1.1890 |
	| 1.3351 | 25.46 | 1375 | 1.1679 |
	| 1.3575 | 25.93 | 1400 | 1.1440 |
	| 1.2882 | 26.39 | 1425 | 1.1202 |
	| 1.3378 | 26.85 | 1450 | 1.1074 |
	| 1.3094 | 27.31 | 1475 | 1.0864 |
	| 1.2793 | 27.78 | 1500 | 1.0743 |
	| 1.2377 | 28.24 | 1525 | 1.0626 |
	| 1.2693 | 28.7 | 1550 | 1.0468 |
	| 1.2157 | 29.17 | 1575 | 1.0368 |
	| 1.2007 | 29.63 | 1600 | 1.0263 |
	| 1.2376 | 30.09 | 1625 | 1.0221 |
	| 1.2216 | 30.56 | 1650 | 1.0136 |
	| 1.1923 | 31.02 | 1675 | 1.0102 |
	| 1.2143 | 31.48 | 1700 | 1.0039 |
	| 1.1764 | 31.94 | 1725 | 1.0014 |
	| 1.1654 | 32.41 | 1750 | 0.9990 |
	| 1.2031 | 32.87 | 1775 | 0.9976 |
	| 1.1952 | 33.33 | 1800 | 0.9965 |
	| 1.1852 | 33.8 | 1825 | 0.9961 |
	| 1.1737 | 34.26 | 1850 | 0.9959 |
	| 1.1609 | 34.72 | 1875 | 0.9955 |


	### Framework versions

	- Transformers 4.35.2
	- PyTorch 2.1.0+cu121
	- Datasets 2.16.0
	- Tokenizers 0.15.0
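
To check whether a local environment matches the versions listed above, something like the following sketch may help. It uses only the standard library. The PyPI distribution names (for example `torch` for PyTorch) are assumptions, since the card gives only display names, and `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card. The distribution names on the left
# are an assumption; the card gives only display names.
expected = {
    "transformers": "4.35.2",
    "torch": "2.1.0+cu121",
    "datasets": "2.16.0",
    "tokenizers": "0.15.0",
}


def compare_versions(expected_versions):
    """Return {package: (expected, installed or None)} for each package."""
    report = {}
    for name, wanted in expected_versions.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(expected).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the listed versions are simply the ones recorded at training time.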
