	---
	license: apache-2.0
	base_model: mistralai/Mistral-7B-v0.1
	tags:
	- generated_from_trainer
	datasets:
	- generator
	model-index:
	- name: GEITje-v1-7B
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# GEITje-v1-7B

	This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the generator dataset.
	It achieves the following results on the evaluation set:
	- Loss: 1.3943
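
To put the evaluation loss on a more familiar scale, it can be converted to perplexity. This is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy in nats, as is typical for causal language model training; the card itself does not state this, and the function name `perplexity` is illustrative.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Final evaluation loss reported in this card.
final_eval_loss = 1.3943
print(f"perplexity ~ {perplexity(final_eval_loss):.2f}")
```

Under that assumption, a loss of 1.3943 corresponds to a perplexity of roughly 4.03 on the evaluation set.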

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 2
	- eval_batch_size: 2
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 8
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 128
	- total_eval_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 953
	- training_steps: 9536
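
The totals in the list above follow from the per-device values, and the learning rate at any step can be estimated from the scheduler settings. The sketch below assumes the common linear-warmup-then-cosine-decay shape (the behaviour of `get_cosine_schedule_with_warmup` in Transformers with its default half cycle); the card only says "cosine", so the exact curve is an assumption, and the helper names are illustrative.

```python
import math

# Values copied from the hyperparameter list in this card.
PER_DEVICE_BATCH = 2
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 8
BASE_LR = 2e-5
WARMUP_STEPS = 953
TRAINING_STEPS = 9536


def effective_batch_size(per_device: int, devices: int, accum_steps: int = 1) -> int:
    """Sequences contributing to one optimizer (or evaluation) step."""
    return per_device * devices * accum_steps


def lr_at(step: int, base_lr: float = BASE_LR,
          warmup: int = WARMUP_STEPS, total: int = TRAINING_STEPS) -> float:
    """Learning rate at a given step, assuming linear warmup then cosine decay to 0."""
    if step < warmup:
        # Linear ramp from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("train batch:", effective_batch_size(PER_DEVICE_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS))
    print("eval batch:", effective_batch_size(PER_DEVICE_BATCH, NUM_DEVICES))
    for step in (0, WARMUP_STEPS, TRAINING_STEPS // 2, TRAINING_STEPS):
        print(step, lr_at(step))
```

The batch-size arithmetic reproduces the reported `total_train_batch_size` of 128 (2 × 8 × 8) and `total_eval_batch_size` of 16 (2 × 8). The warmup of 953 steps is about 10% of the 9536 training steps.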

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 1.6995 | 0.02 | 199 | 1.7673 |
	| 1.6949 | 0.04 | 398 | 1.6880 |
	| 1.6377 | 0.06 | 597 | 1.6429 |
	| 1.6011 | 0.08 | 796 | 1.6384 |
	| 1.5196 | 0.1 | 995 | 1.6060 |
	| 1.5158 | 0.13 | 1194 | 1.5832 |
	| 1.5181 | 0.15 | 1393 | 1.5541 |
	| 1.4931 | 0.17 | 1592 | 1.5493 |
	| 1.4972 | 0.19 | 1791 | 1.5407 |
	| 1.5349 | 0.21 | 1990 | 1.5305 |
	| 1.5025 | 0.23 | 2189 | 1.5263 |
	| 1.396 | 0.25 | 2388 | 1.5140 |
	| 1.4353 | 0.27 | 2587 | 1.5104 |
	| 1.4307 | 0.29 | 2786 | 1.5003 |
	| 1.3974 | 0.31 | 2985 | 1.4849 |
	| 1.404 | 0.33 | 3184 | 1.4771 |
	| 1.4299 | 0.35 | 3383 | 1.4825 |
	| 1.4342 | 0.38 | 3582 | 1.4705 |
	| 1.4341 | 0.4 | 3781 | 1.4643 |
	| 1.4535 | 0.42 | 3980 | 1.4580 |
	| 1.4799 | 0.44 | 4179 | 1.4521 |
	| 1.35 | 0.46 | 4378 | 1.4478 |
	| 1.4586 | 0.48 | 4577 | 1.4425 |
	| 1.3685 | 0.5 | 4776 | 1.4368 |
	| 1.4572 | 0.52 | 4975 | 1.4313 |
	| 1.3293 | 0.54 | 5174 | 1.4265 |
	| 1.403 | 0.56 | 5373 | 1.4241 |
	| 1.3057 | 0.58 | 5572 | 1.4188 |
	| 1.244 | 0.61 | 5771 | 1.4178 |
	| 1.3224 | 0.63 | 5970 | 1.4110 |
	| 1.3238 | 0.65 | 6169 | 1.4083 |
	| 1.3262 | 0.67 | 6368 | 1.4050 |
	| 1.3237 | 0.69 | 6567 | 1.4027 |
	| 1.0453 | 0.71 | 6766 | 1.4005 |
	| 1.3136 | 0.73 | 6965 | 1.3992 |
	| 1.3137 | 0.75 | 7164 | 1.3975 |
	| 1.1587 | 0.77 | 7363 | 1.3964 |
	| 1.316 | 0.79 | 7562 | 1.3957 |
	| 1.2738 | 0.81 | 7761 | 1.3951 |
	| 1.308 | 0.83 | 7960 | 1.3949 |
	| 1.4049 | 0.86 | 8159 | 1.3946 |
	| 1.3324 | 0.88 | 8358 | 1.3944 |
	| 1.3446 | 0.9 | 8557 | 1.3944 |
	| 1.2489 | 0.92 | 8756 | 1.3943 |
	| 1.2687 | 0.94 | 8955 | 1.3943 |
	| 1.3293 | 0.96 | 9154 | 1.3943 |
	| 1.3045 | 0.98 | 9353 | 1.3943 |


	### Framework versions

	- Transformers 4.36.0.dev0
	- PyTorch 2.1.1+cu121
	- Datasets 2.15.0
	- Tokenizers 0.15.0
