---
license: other
base_model: meta-llama/Meta-Llama-3-8B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: C014_llama3-8b-base_pretrain_20240428_005832
  results: []
---

# C014_llama3-8b-base_pretrain_20240428_005832

This model is a fine-tuned version of `meta-llama/Meta-Llama-3-8B` (loaded from the local path `/mnt/models-pku/progressalign/shared_storage/downloaded_models/llama3-8b-base`) on the C014_data dataset. It achieves the following results on the evaluation set:

- Loss: 2.2045
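
A minimal loading sketch follows. The model path is a placeholder rather than a published repo id; substitute the actual local path or Hugging Face repo where this checkpoint lives. Loading with `device_map="auto"` additionally assumes `accelerate` is installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: point this at the actual checkpoint location or repo id.
model_id = "path/to/C014_llama3-8b-base_pretrain_20240428_005832"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption; the card does not state an inference dtype
    device_map="auto",           # requires accelerate
)

# Base-model-style text continuation (this is a pretrained, not chat-tuned, checkpoint).
inputs = tokenizer("The history of machine learning begins", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```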

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` reconstruction follows the list):

- learning_rate: 1.5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_steps: 20
- num_epochs: 4.0
- mixed_precision_training: Native AMP
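
For reference, the list above maps onto the Hugging Face `Trainer` API roughly as sketched below. This is a reconstruction, not the original LLaMA-Factory launch config: per-device batch sizes of 8 (train) and 16 (eval) across 8 GPUs yield the listed totals of 64 and 128, gradient accumulation is assumed to be 1, and "Native AMP" is assumed to mean fp16.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="C014_llama3-8b-base_pretrain_20240428_005832",  # assumed
    learning_rate=1.5e-5,
    per_device_train_batch_size=8,   # x 8 GPUs = total train batch size 64
    per_device_eval_batch_size=16,   # x 8 GPUs = total eval batch size 128
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="polynomial",
    warmup_steps=20,
    num_train_epochs=4.0,
    fp16=True,  # "Native AMP"; whether fp16 or bf16 was used is not stated
)
```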

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.5789        | 0.0152 | 1    | 2.6458          |
| 2.5672        | 0.0758 | 5    | 2.6280          |
| 2.5751        | 0.1515 | 10   | 2.5314          |
| 2.418         | 0.2273 | 15   | 2.4634          |
| 2.4701        | 0.3030 | 20   | 2.4177          |
| 2.3904        | 0.3788 | 25   | 2.3785          |
| 2.3539        | 0.4545 | 30   | 2.3378          |
| 2.3101        | 0.5303 | 35   | 2.3082          |
| 2.3254        | 0.6061 | 40   | 2.2816          |
| 2.2762        | 0.6818 | 45   | 2.2614          |
| 2.2525        | 0.7576 | 50   | 2.2458          |
| 2.2777        | 0.8333 | 55   | 2.2321          |
| 2.2054        | 0.9091 | 60   | 2.2206          |
| 2.237         | 0.9848 | 65   | 2.2113          |
| 1.986         | 1.0606 | 70   | 2.2115          |
| 1.9373        | 1.1364 | 75   | 2.2217          |
| 1.9228        | 1.2121 | 80   | 2.2132          |
| 1.9084        | 1.2879 | 85   | 2.2118          |
| 1.9684        | 1.3636 | 90   | 2.2122          |
| 1.9126        | 1.4394 | 95   | 2.2094          |
| 1.9101        | 1.5152 | 100  | 2.2066          |
| 1.8496        | 1.5909 | 105  | 2.2058          |
| 1.9154        | 1.6667 | 110  | 2.2057          |
| 1.9233        | 1.7424 | 115  | 2.2056          |
| 1.9198        | 1.8182 | 120  | 2.2052          |
| 1.9229        | 1.8939 | 125  | 2.2048          |
| 1.8913        | 1.9697 | 130  | 2.2045          |
| 1.8814        | 2.0455 | 135  | 2.2046          |
| 1.8813        | 2.1212 | 140  | 2.2051          |
| 1.8912        | 2.1970 | 145  | 2.2058          |
| 1.9184        | 2.2727 | 150  | 2.2065          |
| 1.8662        | 2.3485 | 155  | 2.2071          |
| 1.8809        | 2.4242 | 160  | 2.2074          |
| 1.8591        | 2.5000 | 165  | 2.2077          |
| 1.8731        | 2.5758 | 170  | 2.2079          |
| 1.8948        | 2.6515 | 175  | 2.2082          |
| 1.8876        | 2.7273 | 180  | 2.2082          |
| 1.8408        | 2.8030 | 185  | 2.2083          |
| 1.8931        | 2.8788 | 190  | 2.2082          |
| 1.8569        | 2.9545 | 195  | 2.2080          |
| 1.8621        | 3.0303 | 200  | 2.2079          |
| 1.8863        | 3.1061 | 205  | 2.2078          |
| 1.9021        | 3.1818 | 210  | 2.2079          |
| 1.8648        | 3.2576 | 215  | 2.2080          |
| 1.8443        | 3.3333 | 220  | 2.2081          |
| 1.8978        | 3.4091 | 225  | 2.2080          |
| 1.8658        | 3.4848 | 230  | 2.2080          |
| 1.8706        | 3.5606 | 235  | 2.2079          |
| 1.8855        | 3.6364 | 240  | 2.2078          |
| 1.8535        | 3.7121 | 245  | 2.2078          |
| 1.9062        | 3.7879 | 250  | 2.2079          |
| 1.8628        | 3.8636 | 255  | 2.2078          |
| 1.8484        | 3.9394 | 260  | 2.2077          |

### Framework versions

- Transformers 4.40.0
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1
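
The version pins above can be sanity-checked in code. A small sketch (the check itself is illustrative; the version strings are taken verbatim from this card):

```python
import importlib.metadata as md

# Versions listed in this model card.
expected = {
    "transformers": "4.40.0",
    "torch": "2.1.2+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.19.1",
}

for package, version in expected.items():
    installed = md.version(package)
    # Ignore local build tags such as "+cu121" when comparing.
    ok = installed.split("+")[0] == version.split("+")[0]
    print(f"{package}: expected {version}, installed {installed} -> {'OK' if ok else 'MISMATCH'}")
```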