---
license: other
base_model: meta-llama/Meta-Llama-3-8B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: C013_llama3-8b-base_pretrain_20240428_005832
  results: []
---

# C013_llama3-8b-base_pretrain_20240428_005832

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the C013_data dataset.
It achieves the following results on the evaluation set:
- Loss: 1.5943

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1.5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_steps: 20
- num_epochs: 4.0
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.7594        | 0.0149 | 1    | 1.7163          |
| 1.7333        | 0.0746 | 5    | 1.7008          |
| 1.6854        | 0.1493 | 10   | 1.6825          |
| 1.6897        | 0.2239 | 15   | 1.6701          |
| 1.6656        | 0.2985 | 20   | 1.6651          |
| 1.7254        | 0.3731 | 25   | 1.6679          |
| 1.7178        | 0.4478 | 30   | 1.6542          |
| 1.6656        | 0.5224 | 35   | 1.6459          |
| 1.6647        | 0.5970 | 40   | 1.6308          |
| 1.6645        | 0.6716 | 45   | 1.6205          |
| 1.6151        | 0.7463 | 50   | 1.6129          |
| 1.6359        | 0.8209 | 55   | 1.6052          |
| 1.5885        | 0.8955 | 60   | 1.5995          |
| 1.6142        | 0.9701 | 65   | 1.5943          |
| 1.4875        | 1.0448 | 70   | 1.5963          |
| 1.3844        | 1.1194 | 75   | 1.6118          |
| 1.3555        | 1.1940 | 80   | 1.6069          |
| 1.3597        | 1.2687 | 85   | 1.6040          |
| 1.3737        | 1.3433 | 90   | 1.6071          |
| 1.3492        | 1.4179 | 95   | 1.6074          |
| 1.3826        | 1.4925 | 100  | 1.6055          |
| 1.3533        | 1.5672 | 105  | 1.6035          |
| 1.3611        | 1.6418 | 110  | 1.6023          |
| 1.328         | 1.7164 | 115  | 1.6022          |
| 1.3443        | 1.7910 | 120  | 1.6026          |
| 1.3386        | 1.8657 | 125  | 1.6029          |
| 1.3396        | 1.9403 | 130  | 1.6029          |
| 1.3573        | 2.0149 | 135  | 1.6029          |
| 1.3754        | 2.0896 | 140  | 1.6034          |
| 1.3229        | 2.1642 | 145  | 1.6044          |
| 1.3194        | 2.2388 | 150  | 1.6055          |
| 1.3361        | 2.3134 | 155  | 1.6065          |
| 1.3231        | 2.3881 | 160  | 1.6072          |
| 1.32          | 2.4627 | 165  | 1.6076          |
| 1.3406        | 2.5373 | 170  | 1.6078          |
| 1.3184        | 2.6119 | 175  | 1.6079          |
| 1.2745        | 2.6866 | 180  | 1.6080          |
| 1.3024        | 2.7612 | 185  | 1.6079          |
| 1.3243        | 2.8358 | 190  | 1.6079          |
| 1.3239        | 2.9104 | 195  | 1.6080          |
| 1.3349        | 2.9851 | 200  | 1.6081          |
| 1.337         | 3.0597 | 205  | 1.6079          |
| 1.3091        | 3.1343 | 210  | 1.6078          |
| 1.3266        | 3.2090 | 215  | 1.6079          |
| 1.3014        | 3.2836 | 220  | 1.6083          |
| 1.3153        | 3.3582 | 225  | 1.6086          |
| 1.3192        | 3.4328 | 230  | 1.6090          |
| 1.315         | 3.5075 | 235  | 1.6093          |
| 1.3047        | 3.5821 | 240  | 1.6093          |
| 1.3208        | 3.6567 | 245  | 1.6093          |
| 1.362         | 3.7313 | 250  | 1.6093          |
| 1.3255        | 3.8060 | 255  | 1.6091          |
| 1.2941        | 3.8806 | 260  | 1.6089          |
| 1.3254        | 3.9552 | 265  | 1.6086          |

### Framework versions

- Transformers 4.40.0
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1
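
As a convenience, the hyperparameters listed above map onto the Transformers `TrainingArguments` API roughly as shown below. This is a hedged reconstruction, not the original launch configuration: the run was driven by LLaMA-Factory, and the `output_dir` value and the AMP dtype are illustrative assumptions.

```python
# Sketch of the reported training configuration in Transformers terms.
# Per-device batch sizes follow from the reported totals:
# 64 total / 8 GPUs = 8 per device (train), 128 / 8 = 16 per device (eval).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="C013_llama3-8b-base_pretrain_20240428_005832",  # illustrative
    num_train_epochs=4.0,
    learning_rate=1.5e-5,
    lr_scheduler_type="polynomial",
    warmup_steps=20,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,        # Adam betas=(0.9, 0.999) are the Transformers defaults
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,             # "Native AMP"; the original run may have used bf16 instead
)
```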
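
## How to use

A minimal inference sketch with Transformers follows. The model id is assumed to match this card's name; substitute the actual Hub id or local checkpoint path. Since this is a further-pretrained base model rather than an instruction-tuned one, plain text completion is the natural interface.

```python
# Minimal inference sketch; the model id below is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "C013_llama3-8b-base_pretrain_20240428_005832"  # hypothetical id/path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # an 8B model in half precision needs ~16 GB of GPU memory
    device_map="auto",
)

prompt = "The main stages of pretraining a language model are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```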