	---
	base_model: meta-llama/Llama-2-7b-hf
	library_name: peft
	license: llama2
	tags:
	- generated_from_trainer
	model-index:
	- name: Llama-2-7b-hf_alpaca-clean_l0.0002_64
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# Llama-2-7b-hf_alpaca-clean_l0.0002_64

	This model is a PEFT adapter fine-tuned from [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on a dataset the Trainer did not record (the model name suggests a cleaned Alpaca dataset).
	It achieves the following results on the evaluation set:
	- Loss: 1.6868

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0002
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 0
	- gradient_accumulation_steps: 16
	- total_train_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: constant
	- lr_scheduler_warmup_ratio: 0.03
	- training_steps: 10000
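The batch-size entries above are related by simple arithmetic, which the following minimal sketch spells out. The values are copied from the list; `num_devices = 1` is an assumption (the card does not state the device count, but it is the only value consistent with a total of 16), and `effective_batch_size` is a hypothetical helper, not part of any library.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 1              # per-device micro-batch size
gradient_accumulation_steps = 16
training_steps = 10000

# Assumption: a single device, since 1 * 16 * 1 matches total_train_batch_size.
num_devices = 1


def effective_batch_size(per_device: int, accumulation: int, devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer step."""
    return per_device * accumulation * devices


total_train_batch_size = effective_batch_size(
    train_batch_size, gradient_accumulation_steps, num_devices
)

# Upper bound on examples processed over the whole run (with repetition across epochs).
examples_seen = total_train_batch_size * training_steps

print(f"total_train_batch_size = {total_train_batch_size}")
print(f"examples seen over {training_steps} steps = {examples_seen}")
```

One point to keep in mind when reading the scheduler entries: in Transformers the plain `constant` schedule takes no warm-up steps, so the 0.03 warm-up ratio probably had no effect in this run. That is an inference from the library's behavior, not something the card states.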

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss |
	|:-------------:|:------:|:----:|:---------------:|
	| 1.0346 | 0.0003 | 1 | 2.3927 |
	| 1.9904 | 0.0590 | 187 | 1.5667 |
	| 1.3058 | 0.1179 | 374 | 1.5624 |
	| 1.0583 | 0.1769 | 561 | 1.5607 |
	| 1.8361 | 0.2359 | 748 | 1.5273 |
	| 1.7038 | 0.2949 | 935 | 1.5075 |
	| 1.2223 | 0.3538 | 1122 | 1.5029 |
	| 1.0661 | 0.4128 | 1309 | 1.5042 |
	| 1.7942 | 0.4718 | 1496 | 1.4753 |
	| 1.5587 | 0.5307 | 1683 | 1.4771 |
	| 1.1685 | 0.5897 | 1870 | 1.4801 |
	| 1.0578 | 0.6487 | 2057 | 1.4936 |
	| 2.2233 | 0.7077 | 2244 | 1.4694 |
	| 1.6078 | 0.7666 | 2431 | 1.4656 |
	| 1.0611 | 0.8256 | 2618 | 1.4765 |
	| 0.9979 | 0.8846 | 2805 | 1.4737 |
	| 2.0584 | 0.9436 | 2992 | 1.4601 |
	| 0.9627 | 1.0025 | 3179 | 1.4579 |
	| 2.3813 | 1.0615 | 3366 | 1.4914 |
	| 1.2986 | 1.1205 | 3553 | 1.4847 |
	| 1.164 | 1.1794 | 3740 | 1.4857 |
	| 0.9181 | 1.2384 | 3927 | 1.4967 |
	| 1.7152 | 1.2974 | 4114 | 1.4807 |
	| 1.0827 | 1.3564 | 4301 | 1.4828 |
	| 0.9565 | 1.4153 | 4488 | 1.4858 |
	| 1.0499 | 1.4743 | 4675 | 1.4932 |
	| 1.838 | 1.5333 | 4862 | 1.4806 |
	| 1.3075 | 1.5922 | 5049 | 1.4770 |
	| 0.9147 | 1.6512 | 5236 | 1.4877 |
	| 0.9353 | 1.7102 | 5423 | 1.4889 |
	| 1.6525 | 1.7692 | 5610 | 1.4757 |
	| 1.0597 | 1.8281 | 5797 | 1.4812 |
	| 0.9384 | 1.8871 | 5984 | 1.4737 |
	| 2.0972 | 1.9461 | 6171 | 1.4749 |
	| 0.8698 | 2.0050 | 6358 | 1.4729 |
	| 0.8746 | 2.0640 | 6545 | 1.6185 |
	| 1.3664 | 2.1230 | 6732 | 1.5706 |
	| 0.8976 | 2.1820 | 6919 | 1.5368 |
	| 0.9512 | 2.2409 | 7106 | 1.5635 |
	| 0.957 | 2.2999 | 7293 | 1.6088 |
	| 1.1006 | 2.3589 | 7480 | 1.5504 |
	| 1.1033 | 2.4178 | 7667 | 1.5426 |
	| 0.9105 | 2.4768 | 7854 | 1.5907 |
	| 1.0444 | 2.5358 | 8041 | 1.5730 |
	| 1.3787 | 2.5948 | 8228 | 1.5404 |
	| 0.9126 | 2.6537 | 8415 | 1.5434 |
	| 0.8307 | 2.7127 | 8602 | 1.5716 |
	| 1.5571 | 2.7717 | 8789 | 1.5673 |
	| 1.1696 | 2.8307 | 8976 | 1.5473 |
	| 0.9802 | 2.8896 | 9163 | 1.5524 |
	| 0.8512 | 2.9486 | 9350 | 1.5740 |
	| 0.6861 | 3.0076 | 9537 | 1.5948 |
	| 0.8245 | 3.0665 | 9724 | 1.6846 |
	| 1.2366 | 3.1255 | 9911 | 1.6969 |
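To read the table, it can help to locate the evaluation point with the lowest validation loss and to estimate how many optimizer steps make up one epoch. The sketch below does both on a handful of rows copied from the table (not the full log); the variable names are illustrative, and the factor of 16 assumes the total train batch size listed in the hyperparameters.

```python
# (step, epoch, validation_loss) for a few rows copied from the table above.
rows = [
    (1, 0.0003, 2.3927),
    (2992, 0.9436, 1.4601),
    (3179, 1.0025, 1.4579),
    (6358, 2.0050, 1.4729),
    (6545, 2.0640, 1.6185),
    (9537, 3.0076, 1.5948),
    (9911, 3.1255, 1.6969),
]

# Evaluation point with the lowest validation loss among the sampled rows.
best_step, best_epoch, best_loss = min(rows, key=lambda row: row[2])

# Rough steps-per-epoch estimate from the row closest to the end of epoch 1.
steps_per_epoch = 3179 / 1.0025

# Assumption: 16 examples per optimizer step (total_train_batch_size).
approx_examples_per_epoch = steps_per_epoch * 16

print(f"lowest validation loss {best_loss} at step {best_step} (epoch {best_epoch})")
print(f"about {steps_per_epoch:.0f} steps per epoch")
print(f"about {approx_examples_per_epoch:.0f} training examples per epoch")
```

In the full table the minimum also falls at step 3179, just after the first epoch. Validation loss drifts upward afterwards while training loss keeps falling, which is consistent with overfitting in later epochs, so an earlier checkpoint may generalize better than the final one.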


	### Framework versions

	- PEFT 0.12.1.dev0
	- Transformers 4.45.0.dev0
	- PyTorch 2.3.0+cu121
	- Datasets 2.19.0
	- Tokenizers 0.19.1
