	---
	license: other
	base_model: meta-llama/Meta-Llama-3-8B
	tags:
	- llama-factory
	- full
	- generated_from_trainer
	model-index:
	- name: C014_llama3-8b-base_pretrain_20240428_005832
	  results: []
	---


	# C014_llama3-8b-base_pretrain_20240428_005832

	This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B), loaded from a local copy at `/mnt/models-pku/progressalign/shared_storage/downloaded_models/llama3-8b-base`, on the C014_data dataset.
	It achieves the following results on the evaluation set:
	- Loss: 2.2045
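
For readers who prefer perplexity, the reported loss can be converted with a single exponential. This is a minimal sketch that assumes the loss is the mean per-token cross-entropy in nats, which the card does not state explicitly; the variable names are illustrative.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 2.2045

# Assumption: the loss is a mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"eval loss {eval_loss:.4f} -> perplexity {perplexity:.2f}")
```

If the loss were averaged in some other way (for example per sequence), this conversion would not apply.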

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1.5e-05
	- train_batch_size: 8
	- eval_batch_size: 16
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 8
	- total_train_batch_size: 64
	- total_eval_batch_size: 128
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: polynomial
	- lr_scheduler_warmup_steps: 20
	- num_epochs: 4.0
	- mixed_precision_training: Native AMP
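
The learning-rate settings above can be turned into a per-step curve. The sketch below mirrors the linear-warmup, polynomial-decay rule used by the Transformers polynomial scheduler, under several assumptions the card does not confirm: `lr_end=1e-7` and `power=1.0` are that helper's defaults, and `total_steps=264` is inferred from the results table (step 5 corresponds to epoch 0.0758, so about 66 steps per epoch over 4 epochs). The function name `polynomial_lr` is hypothetical.

```python
def polynomial_lr(step, base_lr=1.5e-5, warmup_steps=20,
                  total_steps=264, lr_end=1e-7, power=1.0):
    """Learning rate at a given optimizer step.

    Assumptions (not stated in the card): lr_end and power are library
    defaults, and total_steps is inferred from the results table.
    """
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    if step > total_steps:
        # After training ends, the schedule stays at the floor value.
        return lr_end
    # Polynomial decay from base_lr down to lr_end.
    decay_steps = total_steps - warmup_steps
    remaining = 1.0 - (step - warmup_steps) / decay_steps
    return (base_lr - lr_end) * remaining ** power + lr_end


for s in (0, 10, 20, 130, 264):
    print(s, polynomial_lr(s))
```

With `power=1.0` the decay is linear, so under these assumptions the rate peaks at step 20 and falls steadily to the floor by the final step.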

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss |
	|:-------------:|:------:|:----:|:---------------:|
	| 2.5789 | 0.0152 | 1 | 2.6458 |
	| 2.5672 | 0.0758 | 5 | 2.6280 |
	| 2.5751 | 0.1515 | 10 | 2.5314 |
	| 2.418 | 0.2273 | 15 | 2.4634 |
	| 2.4701 | 0.3030 | 20 | 2.4177 |
	| 2.3904 | 0.3788 | 25 | 2.3785 |
	| 2.3539 | 0.4545 | 30 | 2.3378 |
	| 2.3101 | 0.5303 | 35 | 2.3082 |
	| 2.3254 | 0.6061 | 40 | 2.2816 |
	| 2.2762 | 0.6818 | 45 | 2.2614 |
	| 2.2525 | 0.7576 | 50 | 2.2458 |
	| 2.2777 | 0.8333 | 55 | 2.2321 |
	| 2.2054 | 0.9091 | 60 | 2.2206 |
	| 2.237 | 0.9848 | 65 | 2.2113 |
	| 1.986 | 1.0606 | 70 | 2.2115 |
	| 1.9373 | 1.1364 | 75 | 2.2217 |
	| 1.9228 | 1.2121 | 80 | 2.2132 |
	| 1.9084 | 1.2879 | 85 | 2.2118 |
	| 1.9684 | 1.3636 | 90 | 2.2122 |
	| 1.9126 | 1.4394 | 95 | 2.2094 |
	| 1.9101 | 1.5152 | 100 | 2.2066 |
	| 1.8496 | 1.5909 | 105 | 2.2058 |
	| 1.9154 | 1.6667 | 110 | 2.2057 |
	| 1.9233 | 1.7424 | 115 | 2.2056 |
	| 1.9198 | 1.8182 | 120 | 2.2052 |
	| 1.9229 | 1.8939 | 125 | 2.2048 |
	| 1.8913 | 1.9697 | 130 | 2.2045 |
	| 1.8814 | 2.0455 | 135 | 2.2046 |
	| 1.8813 | 2.1212 | 140 | 2.2051 |
	| 1.8912 | 2.1970 | 145 | 2.2058 |
	| 1.9184 | 2.2727 | 150 | 2.2065 |
	| 1.8662 | 2.3485 | 155 | 2.2071 |
	| 1.8809 | 2.4242 | 160 | 2.2074 |
	| 1.8591 | 2.5 | 165 | 2.2077 |
	| 1.8731 | 2.5758 | 170 | 2.2079 |
	| 1.8948 | 2.6515 | 175 | 2.2082 |
	| 1.8876 | 2.7273 | 180 | 2.2082 |
	| 1.8408 | 2.8030 | 185 | 2.2083 |
	| 1.8931 | 2.8788 | 190 | 2.2082 |
	| 1.8569 | 2.9545 | 195 | 2.2080 |
	| 1.8621 | 3.0303 | 200 | 2.2079 |
	| 1.8863 | 3.1061 | 205 | 2.2078 |
	| 1.9021 | 3.1818 | 210 | 2.2079 |
	| 1.8648 | 3.2576 | 215 | 2.2080 |
	| 1.8443 | 3.3333 | 220 | 2.2081 |
	| 1.8978 | 3.4091 | 225 | 2.2080 |
	| 1.8658 | 3.4848 | 230 | 2.2080 |
	| 1.8706 | 3.5606 | 235 | 2.2079 |
	| 1.8855 | 3.6364 | 240 | 2.2078 |
	| 1.8535 | 3.7121 | 245 | 2.2078 |
	| 1.9062 | 3.7879 | 250 | 2.2079 |
	| 1.8628 | 3.8636 | 255 | 2.2078 |
	| 1.8484 | 3.9394 | 260 | 2.2077 |
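
A few quantities can be read off the table and the hyperparameters with simple arithmetic. The sketch below uses a handful of rows copied from the table (not the full log) to estimate the steps per epoch, locate the lowest validation loss, and measure the final gap between training and validation loss. It assumes each logged step is one optimizer step on a full global batch; the variable names are illustrative.

```python
# (step, epoch, training_loss, validation_loss), copied from the table above.
rows = [
    (1, 0.0152, 2.5789, 2.6458),
    (65, 0.9848, 2.237, 2.2113),
    (130, 1.9697, 1.8913, 2.2045),
    (195, 2.9545, 1.8569, 2.2080),
    (260, 3.9394, 1.8484, 2.2077),
]

per_device_batch = 8
num_devices = 8
total_train_batch_size = 64

# 8 per device x 8 devices already equals 64, so the implied
# gradient-accumulation factor is 1.
grad_accum = total_train_batch_size // (per_device_batch * num_devices)

# Steps per epoch, estimated from the last logged row.
last_step, last_epoch, last_train, last_val = rows[-1]
steps_per_epoch = round(last_step / last_epoch)

# Upper bound on training examples per epoch (the last batch may be partial).
max_examples_per_epoch = steps_per_epoch * total_train_batch_size

# Row with the lowest validation loss among the sampled rows.
best_step, best_epoch, _, best_val = min(rows, key=lambda r: r[3])

# Gap between validation and training loss at the last logged step.
final_gap = last_val - last_train

print(grad_accum, steps_per_epoch, max_examples_per_epoch)
print(best_step, best_val, round(final_gap, 4))
```

The minimum over the full table is also at step 130 (2.2045), which matches the evaluation loss reported at the top of the card. Validation loss then stays nearly flat while training loss remains well below it from the second epoch onward.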


	### Framework versions

	- Transformers 4.40.0
	- PyTorch 2.1.2+cu121
	- Datasets 2.18.0
	- Tokenizers 0.19.1
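
To compare a local environment against the versions listed above, something like the following could be used. It is a sketch only: the helper name `check_versions` is hypothetical, the package names are the usual PyPI distribution names, and local build tags such as `+cu121` are ignored in the comparison.

```python
from importlib import metadata

# Versions listed in this card (PyTorch without its "+cu121" build tag).
EXPECTED = {
    "transformers": "4.40.0",
    "torch": "2.1.2",
    "datasets": "2.18.0",
    "tokenizers": "0.19.1",
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        # Drop any local version tag (e.g. "+cu121") before comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[package] = (installed, matches)
    return report


for name, (installed, ok) in check_versions().items():
    print(f"{name}: installed={installed} matches_card={ok}")
```

A mismatch does not necessarily mean the model will fail to load; it only indicates that the environment differs from the one recorded here.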
