
Visualize in Weights & Biases

qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-0.1-1e6

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set (a sketch of the DPO objective behind these metrics follows the list):

  • Loss: 0.4062
  • Logps: -90.0446
  • Logits: -1.4303
  • Objective: 0.4077
  • Dpo Loss: 0.6829
  • Regularize: 0.4077
  • Ranking Simple: 0.5248
  • Ranking Idealized: 0.5888
  • Ranking Idealized Expo: 0.5103
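Several of these metrics (Dpo Loss, the ranking accuracies) come from pairwise preference evaluation. For orientation only, here is a minimal sketch of the standard DPO loss that the Dpo Loss column presumably tracks; the β value is illustrative and not taken from this run, and the L2EXPO objective reported above may combine this term with additional regularization:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss over summed per-sequence log-probs.

    beta is illustrative; the run's actual objective (L2EXPO) may
    add regularization on top of this term.
    """
    margins = beta * ((policy_chosen_logps - ref_chosen_logps)
                      - (policy_rejected_logps - ref_rejected_logps))
    # Maximize the log-sigmoid of the chosen-vs-rejected margin.
    return -F.logsigmoid(margins).mean()
```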

Model description

More information needed

Intended uses & limitations

More information needed
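Pending fuller documentation, the checkpoint should load like any other Qwen2-family causal LM via transformers. A minimal inference sketch (the prompt is a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-0.1-1e6"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder prompt; replace with your own input.
inputs = tokenizer("Hello, world.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```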

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 6
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 288
  • total_eval_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
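The effective batch size follows from the list: 4 per device × 6 GPUs × 12 accumulation steps = 288. As a hypothetical reconstruction (the actual training script is not part of this card), these settings map onto transformers.TrainingArguments roughly as follows:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the run's configuration; the actual
# training script is not included in this card.
training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-0.1-1e6",
    learning_rate=1e-6,
    per_device_train_batch_size=4,   # 4 x 6 GPUs x 12 accumulation = 288
    per_device_eval_batch_size=4,    # 4 x 6 GPUs = 24 total eval batch
    gradient_accumulation_steps=12,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # Adam betas (0.9, 0.999) and epsilon 1e-8 are the Trainer defaults.
)
```

Multi-GPU execution across the 6 devices would typically be handled by an accelerate or torchrun launcher rather than by TrainingArguments itself.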

Training results

| Training Loss | Epoch  | Step | Validation Loss | Logps    | Logits  | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:----------:|:--------------:|:-----------------:|:----------------------:|
| 0.3854        | 0.2834 | 50   | 0.4056          | -91.4065 | -1.4801 | 0.4076    | 0.6886   | 0.4076     | 0.5124         | 0.5888            | 0.5103                 |
| 0.3126        | 0.5668 | 100  | 0.4022          | -91.3166 | -1.4526 | 0.4009    | 0.6817   | 0.4009     | 0.5207         | 0.5888            | 0.5103                 |
| 0.2481        | 0.8503 | 150  | 0.4118          | -93.3285 | -1.4781 | 0.4156    | 0.6853   | 0.4156     | 0.5186         | 0.5888            | 0.5103                 |
| 0.1986        | 1.1337 | 200  | 0.4053          | -90.6332 | -1.4691 | 0.4089    | 0.6828   | 0.4089     | 0.5207         | 0.5888            | 0.5103                 |
| 0.1805        | 1.4171 | 250  | 0.4086          | -90.2497 | -1.4648 | 0.4084    | 0.6831   | 0.4084     | 0.5248         | 0.5888            | 0.5103                 |
| 0.1668        | 1.7005 | 300  | 0.4080          | -89.8657 | -1.4761 | 0.4114    | 0.6842   | 0.4114     | 0.5207         | 0.5888            | 0.5103                 |
| 0.1476        | 1.9839 | 350  | 0.4086          | -89.6008 | -1.4348 | 0.4084    | 0.6835   | 0.4084     | 0.5217         | 0.5888            | 0.5103                 |
| 0.1232        | 2.2674 | 400  | 0.4064          | -89.9367 | -1.4142 | 0.4060    | 0.6825   | 0.4060     | 0.5238         | 0.5888            | 0.5103                 |
| 0.1085        | 2.5508 | 450  | 0.4057          | -90.6112 | -1.4381 | 0.4068    | 0.6829   | 0.4068     | 0.5238         | 0.5888            | 0.5103                 |
| 0.099         | 2.8342 | 500  | 0.4075          | -89.7867 | -1.4538 | 0.4090    | 0.6837   | 0.4090     | 0.5248         | 0.5888            | 0.5103                 |
| 0.0841        | 3.1176 | 550  | 0.4074          | -89.1923 | -1.4288 | 0.4091    | 0.6836   | 0.4091     | 0.5269         | 0.5888            | 0.5103                 |
| 0.0673        | 3.4010 | 600  | 0.4056          | -89.8307 | -1.4326 | 0.4069    | 0.6824   | 0.4069     | 0.5238         | 0.5888            | 0.5103                 |
| 0.0589        | 3.6845 | 650  | 0.4060          | -89.4758 | -1.4302 | 0.4077    | 0.6829   | 0.4077     | 0.5248         | 0.5888            | 0.5103                 |
| 0.0551        | 3.9679 | 700  | 0.4065          | -90.0660 | -1.4301 | 0.4080    | 0.6831   | 0.4080     | 0.5238         | 0.5888            | 0.5103                 |
| 0.042         | 4.2513 | 750  | 0.4064          | -90.0447 | -1.4307 | 0.4078    | 0.6830   | 0.4078     | 0.5248         | 0.5888            | 0.5103                 |
| 0.0411        | 4.5347 | 800  | 0.4062          | -90.1140 | -1.4310 | 0.4078    | 0.6830   | 0.4078     | 0.5238         | 0.5888            | 0.5103                 |
| 0.0355        | 4.8181 | 850  | 0.4062          | -90.0432 | -1.4302 | 0.4077    | 0.6829   | 0.4077     | 0.5248         | 0.5888            | 0.5103                 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1