	---
	base_model: ai-forever/rugpt3medium_based_on_gpt2
	tags:
	- generated_from_trainer
	model-index:
	- name: my_rugpt3medium_finetune
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# my_rugpt3medium_finetune

	This model is a fine-tuned version of [ai-forever/rugpt3medium_based_on_gpt2](https://huggingface.co/ai-forever/rugpt3medium_based_on_gpt2) on an unknown dataset.
	At the last logged evaluation (step 1350, epoch 24.85), it achieves the following result on the evaluation set:
	- Loss: 4.3387
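The loss above can be read as a perplexity. The sketch below assumes the reported value is the mean token-level cross-entropy in nats, which is the usual convention for causal language modeling but is not stated in this card; the function name `perplexity_from_loss` is made up for illustration.

```python
import math


def perplexity_from_loss(mean_cross_entropy: float) -> float:
    """Convert a mean cross-entropy (assumed to be in nats) into perplexity."""
    return math.exp(mean_cross_entropy)


# Final evaluation loss reported in this card.
final_eval_loss = 4.3387
final_perplexity = perplexity_from_loss(final_eval_loss)

if __name__ == "__main__":
    print(f"eval loss {final_eval_loss} -> perplexity {final_perplexity:.1f}")
```

Under that assumption, the final evaluation loss corresponds to a perplexity of roughly 76.6.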

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 3
	- total_train_batch_size: 24
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 1000
	- num_epochs: 25
	- mixed_precision_training: Native AMP
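Two relationships in the list above can be checked with a little arithmetic. The sketch below is a plain-Python approximation, not the Trainer's own code: it assumes a single device (consistent with 8 × 3 = 24), a total of 1350 optimizer steps (the last step logged in this card), and a half-cosine decay after linear warmup. The names `effective_batch_size` and `lr_at_step` are hypothetical.

```python
import math

# Values taken from the hyperparameter list in this card.
BASE_LR = 1e-5
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 3
WARMUP_STEPS = 1000

# Assumptions, not stated in the card.
NUM_DEVICES = 1
TOTAL_STEPS = 1350


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of examples contributing to one optimizer step."""
    return per_device * accum * devices


def lr_at_step(step: int) -> float:
    """Linear warmup to BASE_LR, then half-cosine decay toward zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(
        PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for step in (0, 500, 1000, 1175, 1350):
        print(f"step {step:>4}: lr = {lr_at_step(step):.2e}")
```

If these assumptions hold, the learning rate is still warming up for about the first 1000 of 1350 steps, so it reaches its 1e-05 peak only around epoch 18 and decays over the remaining steps.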

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 10.916 | 0.46 | 25 | 10.6340 |
	| 10.3795 | 0.92 | 50 | 9.9985 |
	| 9.9003 | 1.38 | 75 | 9.7015 |
	| 9.6822 | 1.84 | 100 | 9.5795 |
	| 9.5804 | 2.3 | 125 | 9.5130 |
	| 9.5294 | 2.76 | 150 | 9.4485 |
	| 9.439 | 3.22 | 175 | 9.3772 |
	| 9.3698 | 3.68 | 200 | 9.2804 |
	| 9.2964 | 4.14 | 225 | 9.1746 |
	| 9.1945 | 4.6 | 250 | 9.0623 |
	| 9.0492 | 5.06 | 275 | 8.9352 |
	| 8.9521 | 5.52 | 300 | 8.8157 |
	| 8.8634 | 5.98 | 325 | 8.6838 |
	| 8.7197 | 6.44 | 350 | 8.5445 |
	| 8.6485 | 6.9 | 375 | 8.4181 |
	| 8.522 | 7.36 | 400 | 8.2732 |
	| 8.4227 | 7.82 | 425 | 8.1704 |
	| 8.3083 | 8.28 | 450 | 8.0290 |
	| 8.1897 | 8.74 | 475 | 7.8989 |
	| 8.0876 | 9.2 | 500 | 7.7778 |
	| 7.9824 | 9.66 | 525 | 7.6368 |
	| 7.8762 | 10.12 | 550 | 7.4974 |
	| 7.7408 | 10.58 | 575 | 7.3658 |
	| 7.6855 | 11.04 | 600 | 7.2416 |
	| 7.5163 | 11.5 | 625 | 7.1291 |
	| 7.5079 | 11.96 | 650 | 7.0295 |
	| 7.2873 | 12.42 | 675 | 6.8522 |
	| 7.2856 | 12.88 | 700 | 6.7573 |
	| 7.0868 | 13.34 | 725 | 6.6651 |
	| 7.0886 | 13.8 | 750 | 6.5239 |
	| 6.9283 | 14.26 | 775 | 6.3561 |
	| 6.8257 | 14.72 | 800 | 6.2392 |
	| 6.7328 | 15.18 | 825 | 6.1004 |
	| 6.6153 | 15.64 | 850 | 5.9846 |
	| 6.5824 | 16.1 | 875 | 5.8627 |
	| 6.3905 | 16.56 | 900 | 5.7724 |
	| 6.359 | 17.02 | 925 | 5.6321 |
	| 6.1679 | 17.48 | 950 | 5.5329 |
	| 6.1526 | 17.94 | 975 | 5.4058 |
	| 5.9604 | 18.4 | 1000 | 5.3046 |
	| 5.9669 | 18.87 | 1025 | 5.1939 |
	| 5.6807 | 19.33 | 1050 | 5.0499 |
	| 5.7445 | 19.79 | 1075 | 4.9479 |
	| 5.6578 | 20.25 | 1100 | 4.8343 |
	| 5.4919 | 20.71 | 1125 | 4.7547 |
	| 5.4427 | 21.17 | 1150 | 4.6506 |
	| 5.3212 | 21.63 | 1175 | 4.5628 |
	| 5.2953 | 22.09 | 1200 | 4.4814 |
	| 5.1872 | 22.55 | 1225 | 4.4373 |
	| 5.1285 | 23.01 | 1250 | 4.3966 |
	| 5.047 | 23.47 | 1275 | 4.3611 |
	| 5.0698 | 23.93 | 1300 | 4.3520 |
	| 5.1259 | 24.39 | 1325 | 4.3408 |
	| 4.9851 | 24.85 | 1350 | 4.3387 |
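A few quantities can be estimated from the log above. The sketch below copies a handful of rows from the table and derives the approximate number of optimizer steps per epoch, a rough training-set size, and how quickly the validation loss is still falling at the end. The size estimate assumes the total train batch size of 24 from the hyperparameter list and is only approximate, since the dataset itself is not documented; the variable names are illustrative.

```python
# (step, epoch, training_loss, validation_loss) rows copied from the table.
rows = [
    (25, 0.46, 10.916, 10.6340),
    (1000, 18.4, 5.9604, 5.3046),
    (1300, 23.93, 5.0698, 4.3520),
    (1325, 24.39, 5.1259, 4.3408),
    (1350, 24.85, 4.9851, 4.3387),
]

TOTAL_TRAIN_BATCH_SIZE = 24  # from the hyperparameter list

# Optimizer steps per epoch, estimated from the last logged row.
last_step, last_epoch, _, _ = rows[-1]
steps_per_epoch = last_step / last_epoch

# Rough number of training examples seen per epoch (an estimate only).
approx_train_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

# Drop in validation loss between consecutive rows among the last three.
tail = rows[-3:]
val_loss_drops = [prev[3] - cur[3] for prev, cur in zip(tail, tail[1:])]

# Check whether validation loss is below training loss in every copied row.
val_below_train = all(val < train for _, _, train, val in rows)

if __name__ == "__main__":
    print(f"steps per epoch ~ {steps_per_epoch:.1f}")
    print(f"training examples ~ {approx_train_examples:.0f}")
    print("validation loss drops over the last evaluations:", val_loss_drops)
    print("validation loss below training loss in all rows:", val_below_train)
```

In these rows the validation loss improves by about 0.01 or less per 25 steps near the end, which fits the learning rate decaying toward zero, and it stays below the training loss throughout.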


	### Framework versions

	- Transformers 4.35.2
	- PyTorch 2.1.0+cu121
	- Datasets 2.16.0
	- Tokenizers 0.15.0
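To reproduce the run, it may help to compare the local environment against the versions listed above. This is a minimal sketch using only the standard library; the PyPI distribution names (in particular `torch` for PyTorch) and the helper name `compare_versions` are assumptions, not something this card specifies.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed PyPI distribution name.
PINNED = {
    "transformers": "4.35.2",
    "torch": "2.1.0+cu121",
    "datasets": "2.16.0",
    "tokenizers": "0.15.0",
}


def compare_versions(pinned=PINNED):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for package, expected in pinned.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (expected, installed, installed == expected)
    return report


if __name__ == "__main__":
    for package, (expected, installed, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{package}: expected {expected}, found {installed} ({status})")
```

A mismatch does not necessarily break loading or inference; the check only flags where the environment differs from the one recorded here.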
