---
base_model: loubnabnl/smollm2-1.7B-8k-mix7-ep2-v2
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: smollm2-1.7B-8k-mix7-ep2-v2-dpo-ultraf-ep3
    results: []
---


# smollm2-1.7B-8k-mix7-ep2-v2-dpo-ultraf-ep3

This model is a fine-tuned version of loubnabnl/smollm2-1.7B-8k-mix7-ep2-v2 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

- Loss: 0.5878
- Rewards/chosen: 0.0167
- Rewards/rejected: -0.5739
- Rewards/accuracies: 0.6746
- Rewards/margins: 0.5907
- Logps/rejected: -275.4315
- Logps/chosen: -310.2510
- Logits/rejected: -0.3685
- Logits/chosen: -0.3410
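
For readers unfamiliar with these columns, they follow the convention used by TRL's `DPOTrainer` (this summary is background, not something reported in the card): each completion receives an implicit reward equal to the β-scaled log-probability ratio between this policy and the frozen reference model, and the margin and accuracy columns compare the chosen vs. rejected completions of each pair.

```latex
% Hedged summary of the DPO reward metrics above: \beta is the DPO temperature,
% \pi_\theta this model, \pi_{\mathrm{ref}} the frozen base model.
r(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)

\text{rewards/margins}    = \mathbb{E}\big[ r(x, y_{\text{chosen}}) - r(x, y_{\text{rejected}}) \big]

\text{rewards/accuracies} = \Pr\big[ r(x, y_{\text{chosen}}) > r(x, y_{\text{rejected}}) \big]
```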

## Model description

More information needed
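
Until a fuller description is added, the sketch below shows one way to load and query the checkpoint with `transformers`. The repo id is assumed from this card's title, and the use of a chat template assumes the tokenizer ships one; adjust both if they differ.

```python
# Minimal inference sketch. Assumptions (not confirmed by this card):
#  - the repo id below is where this checkpoint is actually hosted
#  - the tokenizer provides a chat template (as SmolLM2 instruct-style models do)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "loubnabnl/smollm2-1.7B-8k-mix7-ep2-v2-dpo-ultraf-ep3"  # assumed
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16).to(device)

messages = [{"role": "user", "content": "Summarize what DPO fine-tuning changes in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```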

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
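
For orientation, here is a minimal single-process sketch of how a TRL DPO run with the hyperparameters above might be configured. It is not the exact alignment-handbook recipe: the DPO `beta`, the output directory, and the dataset preprocessing are assumptions, and the actual run used 8 GPUs through a distributed launcher rather than this single-process form.

```python
# Hedged DPO training sketch mirroring the hyperparameters listed above.
# Assumed/omitted: beta value, chat-template preprocessing of the preference
# pairs, and the 8-GPU distributed launch used for the actual run.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "loubnabnl/smollm2-1.7B-8k-mix7-ep2-v2"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# ultrafeedback_binarized ships preference splits; the alignment-handbook
# recipe additionally applies the chat template and renames columns, which
# is omitted here.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

args = DPOConfig(
    output_dir="smollm2-1.7B-8k-mix7-ep2-v2-dpo-ultraf-ep3",  # assumed
    beta=0.1,                       # assumed; not reported in this card
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train_prefs"],
    eval_dataset=dataset["test_prefs"],
    tokenizer=tokenizer,
)
trainer.train()
```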

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6787        | 0.2094 | 100  | 0.6967          | 0.0159         | -0.0702          | 0.5516             | 0.0861          | -274.4240      | -310.2527    | -0.3377         | -0.3141       |
| 0.645         | 0.4187 | 200  | 0.6491          | -0.0498        | -0.3020          | 0.6032             | 0.2523          | -274.8876      | -310.3840    | -0.3463         | -0.3229       |
| 0.6161        | 0.6281 | 300  | 0.6316          | -0.0637        | -0.4218          | 0.6825             | 0.3581          | -275.1272      | -310.4119    | -0.3552         | -0.3317       |
| 0.5964        | 0.8375 | 400  | 0.6100          | -0.0166        | -0.4381          | 0.6587             | 0.4215          | -275.1597      | -310.3176    | -0.3545         | -0.3291       |
| 0.5394        | 1.0468 | 500  | 0.6066          | -0.0098        | -0.4749          | 0.7103             | 0.4651          | -275.2332      | -310.3040    | -0.3576         | -0.3320       |
| 0.5099        | 1.2562 | 600  | 0.6007          | -0.0192        | -0.5329          | 0.6786             | 0.5137          | -275.3493      | -310.3229    | -0.3635         | -0.3380       |
| 0.5056        | 1.4656 | 700  | 0.5876          | -0.0630        | -0.5941          | 0.6905             | 0.5311          | -275.4717      | -310.4104    | -0.3672         | -0.3407       |
| 0.4936        | 1.6750 | 800  | 0.5994          | -0.0296        | -0.5590          | 0.6746             | 0.5294          | -275.4016      | -310.3437    | -0.3658         | -0.3384       |
| 0.4904        | 1.8843 | 900  | 0.5989          | -0.0581        | -0.6149          | 0.6944             | 0.5568          | -275.5134      | -310.4006    | -0.3705         | -0.3443       |
| 0.4622        | 2.0937 | 1000 | 0.5939          | -0.0662        | -0.6068          | 0.6944             | 0.5405          | -275.4971      | -310.4169    | -0.3724         | -0.3450       |
| 0.4458        | 2.3031 | 1100 | 0.5923          | -0.0536        | -0.6393          | 0.6944             | 0.5857          | -275.5622      | -310.3918    | -0.3728         | -0.3450       |
| 0.4462        | 2.5124 | 1200 | 0.5894          | -0.0486        | -0.6300          | 0.7024             | 0.5814          | -275.5435      | -310.3816    | -0.3710         | -0.3432       |
| 0.4312        | 2.7218 | 1300 | 0.5861          | -0.0751        | -0.6393          | 0.6667             | 0.5642          | -275.5621      | -310.4347    | -0.3724         | -0.3442       |
| 0.4454        | 2.9312 | 1400 | 0.5942          | -0.0056        | -0.5970          | 0.6944             | 0.5914          | -275.4775      | -310.2956    | -0.3681         | -0.3401       |

### Framework versions

- Transformers 4.42.3
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1