	---
	base_model: unsloth/mistral-7b-v0.3
	library_name: peft
	license: apache-2.0
	tags:
	- unsloth
	- generated_from_trainer
	model-index:
	- name: Mistral-7B-v0.3_pct_default
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# Mistral-7B-v0.3_pct_default

	This model is a fine-tuned version of [unsloth/mistral-7b-v0.3](https://huggingface.co/unsloth/mistral-7b-v0.3) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 6.8426
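
The evaluation loss can be easier to interpret as a perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which the card does not state; `perplexity` is a helper name introduced here for illustration.

```python
import math

# Final evaluation loss as reported in this card.
EVAL_LOSS = 6.8426


def perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(loss)


if __name__ == "__main__":
    print(f"eval loss {EVAL_LOSS} -> perplexity {perplexity(EVAL_LOSS):.1f}")
```

Under that assumption the perplexity is in the high hundreds, which is worth keeping in mind when comparing this checkpoint with others.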

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0003
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.02
	- num_epochs: 1
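
As a rough illustration of how these settings fit together, the sketch below recomputes the total batch size and evaluates a linear-warmup, cosine-decay learning rate schedule. Two values are assumptions rather than facts from the card: a single training device, and about 388 optimizer steps in the epoch (estimated from the results table, where step 384 corresponds to epoch 0.9903). The schedule formula is the common warmup-plus-half-cosine form and may differ in detail from what the Trainer actually used.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 3e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.02

# Assumptions, not stated in the card:
NUM_DEVICES = 1    # single-device training
TOTAL_STEPS = 388  # optimizer steps in one epoch, estimated from the results table


def effective_batch_size(per_device: int, accum_steps: int, devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * accum_steps * devices


def lr_at(step: int,
          total_steps: int = TOTAL_STEPS,
          base_lr: float = LEARNING_RATE,
          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for step in (0, 8, 96, 192, 288, 384):
        print(f"step {step:3d}: lr = {lr_at(step):.6f}")
```

With 8 examples per device and 8 accumulation steps on one device, the product matches the reported `total_train_batch_size` of 64.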

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss |
	|:-------------:|:------:|:----:|:---------------:|
	| 2.2574        | 0.0206 | 8    | 8.1127          |
	| 12.0384       | 0.0413 | 16   | 8.6074          |
	| 8.2422        | 0.0619 | 24   | 8.1200          |
	| 7.6855        | 0.0825 | 32   | 7.6217          |
	| 7.676         | 0.1032 | 40   | 7.6368          |
	| 7.636         | 0.1238 | 48   | 7.5536          |
	| 7.5027        | 0.1444 | 56   | 7.4853          |
	| 7.393         | 0.1651 | 64   | 7.3495          |
	| 7.4878        | 0.1857 | 72   | 7.3829          |
	| 7.4503        | 0.2063 | 80   | 7.2955          |
	| 7.4405        | 0.2270 | 88   | 7.2849          |
	| 7.3525        | 0.2476 | 96   | 7.2125          |
	| 7.3442        | 0.2682 | 104  | 7.2516          |
	| 7.292         | 0.2888 | 112  | 7.2813          |
	| 7.2845        | 0.3095 | 120  | 7.2147          |
	| 7.3309        | 0.3301 | 128  | 7.1448          |
	| 7.165         | 0.3507 | 136  | 7.1427          |
	| 7.1362        | 0.3714 | 144  | 7.0595          |
	| 7.1956        | 0.3920 | 152  | 7.2333          |
	| 7.1047        | 0.4126 | 160  | 7.0622          |
	| 7.1466        | 0.4333 | 168  | 7.0642          |
	| 7.0243        | 0.4539 | 176  | 7.0605          |
	| 7.1814        | 0.4745 | 184  | 7.0207          |
	| 7.1579        | 0.4952 | 192  | 7.0191          |
	| 6.9988        | 0.5158 | 200  | 7.0403          |
	| 7.0306        | 0.5364 | 208  | 6.9673          |
	| 7.2037        | 0.5571 | 216  | 6.9458          |
	| 7.0632        | 0.5777 | 224  | 6.8305          |
	| 6.8916        | 0.5983 | 232  | 6.8760          |
	| 6.929         | 0.6190 | 240  | 6.8567          |
	| 6.927         | 0.6396 | 248  | 6.9211          |
	| 7.0534        | 0.6602 | 256  | 6.9313          |
	| 6.8807        | 0.6809 | 264  | 7.0025          |
	| 7.0768        | 0.7015 | 272  | 6.8808          |
	| 7.042         | 0.7221 | 280  | 6.9264          |
	| 7.027         | 0.7427 | 288  | 6.8833          |
	| 6.9575        | 0.7634 | 296  | 6.8925          |
	| 6.9509        | 0.7840 | 304  | 6.8662          |
	| 7.0361        | 0.8046 | 312  | 6.9178          |
	| 7.0065        | 0.8253 | 320  | 6.8844          |
	| 7.0016        | 0.8459 | 328  | 6.8536          |
	| 7.0667        | 0.8665 | 336  | 6.9255          |
	| 6.9046        | 0.8872 | 344  | 6.8849          |
	| 6.8891        | 0.9078 | 352  | 6.8567          |
	| 7.0118        | 0.9284 | 360  | 6.8438          |
	| 6.901         | 0.9491 | 368  | 6.8571          |
	| 7.0057        | 0.9697 | 376  | 6.8454          |
	| 6.9415        | 0.9903 | 384  | 6.8426          |
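
The validation losses in the table can be scanned for the lowest value with a few lines of Python. The list below is transcribed from the Validation Loss column; `EVAL_EVERY` and `best_checkpoint` are names introduced here for illustration, and the 8-step evaluation interval is read off the Step column.

```python
# Validation losses from the table above, in step order (steps 8, 16, ..., 384).
VAL_LOSSES = [
    8.1127, 8.6074, 8.1200, 7.6217, 7.6368, 7.5536, 7.4853, 7.3495,
    7.3829, 7.2955, 7.2849, 7.2125, 7.2516, 7.2813, 7.2147, 7.1448,
    7.1427, 7.0595, 7.2333, 7.0622, 7.0642, 7.0605, 7.0207, 7.0191,
    7.0403, 6.9673, 6.9458, 6.8305, 6.8760, 6.8567, 6.9211, 6.9313,
    7.0025, 6.8808, 6.9264, 6.8833, 6.8925, 6.8662, 6.9178, 6.8844,
    6.8536, 6.9255, 6.8849, 6.8567, 6.8438, 6.8571, 6.8454, 6.8426,
]

EVAL_EVERY = 8  # evaluation interval in optimizer steps, per the Step column


def best_checkpoint(losses, eval_every=EVAL_EVERY):
    """Return (step, loss) for the evaluation with the lowest validation loss."""
    index = min(range(len(losses)), key=losses.__getitem__)
    return (index + 1) * eval_every, losses[index]


if __name__ == "__main__":
    step, loss = best_checkpoint(VAL_LOSSES)
    print(f"lowest validation loss {loss} at step {step}")
    print(f"final validation loss {VAL_LOSSES[-1]} at step {len(VAL_LOSSES) * EVAL_EVERY}")
```

In this table the lowest validation loss (6.8305, step 224) is slightly below the final reported value (6.8426, step 384). Whether an intermediate checkpoint was saved at that step is not stated in the card.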


	### Framework versions

	- PEFT 0.12.0
	- Transformers 4.44.0
	- PyTorch 2.4.0+cu121
	- Datasets 2.20.0
	- Tokenizers 0.19.1
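
To compare a local environment against these versions, a small check along the following lines may help. It is a sketch only: the dictionary keys are assumed to be the PyPI distribution names (in particular, PyTorch as `torch`), and `compare_versions` is a helper introduced here, not part of any of the listed libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; keys are assumed PyPI distribution names.
EXPECTED = {
    "peft": "0.12.0",
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected):
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in compare_versions(EXPECTED).items():
        status = "ok" if ok else "differs"
        print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

An exact string match is deliberately strict; a different CUDA build of PyTorch, for example, will be reported as differing even if the release number is the same.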
