---
license: apache-2.0
base_model: EleutherAI/pythia-410m-deduped
tags:
  - alignment-handbook
  - generated_from_trainer
datasets:
  - princeton-nlp/llama3-ultrafeedback
model-index:
  - name: pythia-410m-deduped
    results: []
---

# pythia-410m-deduped

This model is a fine-tuned version of [EleutherAI/pythia-410m-deduped](https://huggingface.co/EleutherAI/pythia-410m-deduped) on the princeton-nlp/llama3-ultrafeedback dataset. It achieves the following results on the evaluation set:

- Loss: 1.7801
- Original Losses: 1.7969
- Weight: 1.0
- Abs Diff: 0.4453
- Rewards/chosen: -4.875
- Rewards/rejected: -5.0625
- Rewards/accuracies: 0.4405
- Rewards/margins: 0.2002
- Logps/rejected: -2.0312
- Logps/chosen: -1.9453
- Logits/rejected: 5.6875
- Logits/chosen: 5.7188
- All Logps 1: -656.8973
- All Logps 1 Values: -656.8973
- All Logps 2: 434.6329
- All Logps 2 Values: 434.6329
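
To try the checkpoint, it can be loaded with the standard `transformers` API. A minimal sketch, assuming the Hub repository id is `RAY2L/pythia-410m-deduped` (inferred from this card; replace it with the actual repository name if it differs):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; swap in the real Hub id if it differs.
model_id = "RAY2L/pythia-410m-deduped"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain what preference optimization does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```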

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

- learning_rate: 1e-06
- train_batch_size: 36
- eval_batch_size: 36
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 2304
- total_eval_batch_size: 288
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
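
The settings above map onto a `transformers.TrainingArguments` configuration roughly as sketched below. This is illustrative only, not the actual alignment-handbook training script; the output directory is a placeholder, and the 8-GPU distributed setup comes from the launcher (e.g. `accelerate` or `torchrun`), not from these arguments.

```python
from transformers import TrainingArguments

# Per-device batch size 36 on 8 GPUs with 8 gradient-accumulation steps gives the
# reported totals: 36 * 8 * 8 = 2304 for training and 36 * 8 = 288 for evaluation.
training_args = TrainingArguments(
    output_dir="pythia-410m-deduped",  # placeholder
    learning_rate=1e-6,
    per_device_train_batch_size=36,
    per_device_eval_batch_size=36,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```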

### Training results

| Training Loss | Epoch | Step | Validation Loss | Original Losses | Weight | Abs Diff | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | All Logps 1 | All Logps 1 Values | All Logps 2 | All Logps 2 Values |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 1.9612 | 0.0385 | 1 | 1.7894 | 1.8125 | 1.0 | 0.4492 | -4.9062 | -5.0938 | 0.4405 | 0.1895 | -2.0312 | -1.9688 | 5.6875 | 5.7188 | -657.8339 | -657.8338 | 434.6329 | 434.6329 |
| 1.9612 | 0.0769 | 2 | 1.7887 | 1.8125 | 1.0 | 0.4531 | -4.9062 | -5.0938 | 0.4444 | 0.1895 | -2.0312 | -1.9609 | 5.6875 | 5.6875 | -657.5561 | -657.5560 | 434.6329 | 434.6329 |
| 1.9612 | 0.1154 | 3 | 1.7887 | 1.8203 | 1.0 | 0.4512 | -4.9375 | -5.125 | 0.4444 | 0.1885 | -2.0469 | -1.9688 | 5.6875 | 5.7188 | -657.2574 | -657.2574 | 434.6329 | 434.6329 |
| 1.9612 | 0.1538 | 4 | 1.7891 | 1.8125 | 1.0 | 0.4512 | -4.9375 | -5.0938 | 0.4365 | 0.1807 | -2.0469 | -1.9688 | 5.6875 | 5.7188 | -657.5514 | -657.5513 | 434.6329 | 434.6329 |
| 1.868 | 0.1923 | 5 | 1.7881 | 1.8125 | 1.0 | 0.4473 | -4.9062 | -5.0938 | 0.4325 | 0.1816 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -656.7651 | -656.7651 | 434.6329 | 434.6329 |
| 1.868 | 0.2308 | 6 | 1.7911 | 1.8203 | 1.0 | 0.4512 | -4.9375 | -5.0938 | 0.4524 | 0.1670 | -2.0469 | -1.9766 | 5.6875 | 5.7188 | -658.1024 | -658.1024 | 434.6329 | 434.6329 |
| 1.868 | 0.2692 | 7 | 1.7870 | 1.8125 | 1.0 | 0.4512 | -4.9062 | -5.0938 | 0.4484 | 0.1846 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.3370 | -657.3370 | 434.6329 | 434.6329 |
| 1.868 | 0.3077 | 8 | 1.7835 | 1.8203 | 1.0 | 0.4473 | -4.9062 | -5.0938 | 0.4405 | 0.1729 | -2.0312 | -1.9688 | 5.6562 | 5.6875 | -657.3589 | -657.3589 | 434.6329 | 434.6329 |
| 1.868 | 0.3462 | 9 | 1.7860 | 1.8125 | 1.0 | 0.4453 | -4.9062 | -5.0938 | 0.4405 | 0.1855 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.4703 | -657.4702 | 434.6329 | 434.6329 |
| 1.886 | 0.3846 | 10 | 1.7897 | 1.8125 | 1.0 | 0.4453 | -4.9062 | -5.0938 | 0.4325 | 0.1855 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.2245 | -657.2244 | 434.6329 | 434.6329 |
| 1.886 | 0.4231 | 11 | 1.7852 | 1.8125 | 1.0 | 0.4473 | -4.9062 | -5.0938 | 0.4484 | 0.1807 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.7448 | -657.7448 | 434.6329 | 434.6329 |
| 1.886 | 0.4615 | 12 | 1.7827 | 1.8203 | 1.0 | 0.4492 | -4.9062 | -5.0938 | 0.4603 | 0.1797 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.9037 | -657.9037 | 434.6329 | 434.6329 |
| 1.886 | 0.5 | 13 | 1.7844 | 1.8203 | 1.0 | 0.4512 | -4.9062 | -5.0625 | 0.4365 | 0.1689 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.7488 | -657.7488 | 434.6329 | 434.6329 |
| 1.886 | 0.5385 | 14 | 1.7828 | 1.8047 | 1.0 | 0.4395 | -4.875 | -5.0625 | 0.4405 | 0.1885 | -2.0312 | -1.9531 | 5.6875 | 5.7188 | -657.5707 | -657.5707 | 434.6329 | 434.6329 |
| 1.8572 | 0.5769 | 15 | 1.7852 | 1.8125 | 1.0 | 0.4453 | -4.9062 | -5.0625 | 0.4365 | 0.1768 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.2753 | -657.2753 | 434.6329 | 434.6329 |
| 1.8572 | 0.6154 | 16 | 1.7798 | 1.8125 | 1.0 | 0.4414 | -4.9062 | -5.0625 | 0.4246 | 0.1709 | -2.0156 | -1.9531 | 5.6875 | 5.7188 | -657.5228 | -657.5228 | 434.6329 | 434.6329 |
| 1.8572 | 0.6538 | 17 | 1.7797 | 1.8047 | 1.0 | 0.4414 | -4.875 | -5.0625 | 0.4484 | 0.1816 | -2.0312 | -1.9531 | 5.6875 | 5.7188 | -657.8073 | -657.8073 | 434.6329 | 434.6329 |
| 1.8572 | 0.6923 | 18 | 1.7830 | 1.8125 | 1.0 | 0.4375 | -4.9062 | -5.0625 | 0.4405 | 0.1631 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.4370 | -657.4370 | 434.6329 | 434.6329 |
| 1.8572 | 0.7308 | 19 | 1.7831 | 1.8047 | 1.0 | 0.4414 | -4.875 | -5.0625 | 0.4524 | 0.1787 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.5411 | -657.5412 | 434.6329 | 434.6329 |
| 1.8374 | 0.7692 | 20 | 1.7812 | 1.8047 | 1.0 | 0.4512 | -4.9062 | -5.0938 | 0.4524 | 0.1973 | -2.0312 | -1.9531 | 5.6875 | 5.7188 | -657.5830 | -657.5831 | 434.6329 | 434.6329 |
| 1.8374 | 0.8077 | 21 | 1.7850 | 1.8125 | 1.0 | 0.4414 | -4.875 | -5.0625 | 0.4444 | 0.1719 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.6910 | -657.6910 | 434.6329 | 434.6329 |
| 1.8374 | 0.8462 | 22 | 1.7851 | 1.8047 | 1.0 | 0.4434 | -4.9062 | -5.0625 | 0.4405 | 0.1836 | -2.0312 | -1.9531 | 5.6875 | 5.7188 | -657.1679 | -657.1679 | 434.6329 | 434.6329 |
| 1.8374 | 0.8846 | 23 | 1.7782 | 1.8047 | 1.0 | 0.4375 | -4.9062 | -5.0625 | 0.4365 | 0.1748 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -658.0194 | -658.0193 | 434.6329 | 434.6329 |
| 1.8374 | 0.9231 | 24 | 1.7800 | 1.8047 | 1.0 | 0.4375 | -4.9062 | -5.0625 | 0.4524 | 0.1709 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.4482 | -657.4482 | 434.6329 | 434.6329 |
| 1.8714 | 0.9615 | 25 | 1.7788 | 1.7969 | 1.0 | 0.4375 | -4.875 | -5.0625 | 0.4325 | 0.1816 | -2.0312 | -1.9531 | 5.6875 | 5.7188 | -657.4512 | -657.4511 | 434.6329 | 434.6329 |
| 1.8714 | 1.0 | 26 | 1.7801 | 1.7969 | 1.0 | 0.4453 | -4.875 | -5.0625 | 0.4405 | 0.2002 | -2.0312 | -1.9453 | 5.6875 | 5.7188 | -656.8973 | -656.8973 | 434.6329 | 434.6329 |

### Framework versions

- Transformers 4.42.3
- Pytorch 2.2.2+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
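
To check that a local environment matches these versions before trying to reproduce the numbers above, here is a minimal sketch (it only prints the installed versions; it does not install or pin anything):

```python
import datasets
import tokenizers
import torch
import transformers

# Versions reported on this card: Transformers 4.42.3, PyTorch 2.2.2+cu121,
# Datasets 2.20.0, Tokenizers 0.19.1.
for name, module in [
    ("transformers", transformers),
    ("torch", torch),
    ("datasets", datasets),
    ("tokenizers", tokenizers),
]:
    print(f"{name}: {module.__version__}")
```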