# Qwen2-7B-Instruct-SPPO-Function-call-v2.4
This model is a fine-tuned version of [slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1](https://huggingface.co/slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1) on the slm-research-vn/dpo-format-function-calling-v4, slm-research-vn/dpo-format-glaive-code-assistant-v3-with-mistral-large-slm-iter4, and argilla/dpo-mix-7k datasets. It achieves the following results on the evaluation set (see the note after the list for how the reward metrics are defined):
- Loss: 0.3152
- Rewards/chosen: 1.9961
- Rewards/rejected: 0.2161
- Rewards/accuracies: 0.8815
- Rewards/margins: 1.7800
- Logps/rejected: -267.1725
- Logps/chosen: -202.5304
- Logits/rejected: -0.6205
- Logits/chosen: -0.6185
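
These metrics appear to follow TRL's standard DPO logging conventions (an assumption; the training script is not part of this card). Under those conventions, the reported rewards are the implicit DPO rewards, i.e. the beta-scaled log-probability ratio between the policy and the frozen reference model:

$$
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right]
$$

`Rewards/margins` is then the mean of the chosen-minus-rejected reward difference over preference pairs, `Rewards/accuracies` is the fraction of pairs where the chosen completion's reward exceeds the rejected one's, and the loss is the DPO objective

$$
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\left( r_\theta(x, y_w) - r_\theta(x, y_l) \right).
$$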
## Model description
More information needed
## Intended uses & limitations
More information needed
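
The card does not yet include usage guidance. As a stopgap, here is a minimal, untested inference sketch; it assumes this repository hosts a PEFT (LoRA) adapter on top of the Qwen2-7B-Instruct base model (the card lists PEFT 0.12.0) and that the tokenizer ships a chat template:

```python
# Minimal inference sketch (untested). Assumes the repo is a PEFT adapter;
# if it is a fully merged model, use AutoModelForCausalLM instead.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

repo_id = "slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.4"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoPeftModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a recent GPU
    device_map="auto",
)

messages = [{"role": "user", "content": "What's the weather in Hanoi today?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```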
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (an illustrative `TrainingArguments` mapping follows the list):
- learning_rate: 1e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
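
For reference, these settings map onto Hugging Face `TrainingArguments` roughly as sketched below. The actual training script (presumably a TRL-style DPO setup, given the reward metrics above) is not included in this card, so `output_dir` and the bf16 flag are assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2-7b-instruct-sppo-fc-v2.4",  # hypothetical output path
    learning_rate=1e-6,
    per_device_train_batch_size=1,   # 8 GPUs x batch 1 x grad-accum 4 = 32 effective
    per_device_eval_batch_size=1,    # 8 GPUs -> total eval batch size 8
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999), epsilon=1e-8
    seed=42,
    bf16=True,                       # assumption; the card does not state the dtype
)
```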
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.6534 | 0.1020 | 100 | 0.6139 | 0.2871 | 0.0961 | 0.7572 | 0.1911 | -269.5727 | -236.7095 | -0.6762 | -0.6844 |
| 0.4902 | 0.2041 | 200 | 0.4530 | 1.4421 | 0.5735 | 0.8064 | 0.8685 | -260.0234 | -213.6108 | -0.6513 | -0.6502 |
| 0.391 | 0.3061 | 300 | 0.3935 | 1.9109 | 0.6931 | 0.8382 | 1.2178 | -257.6317 | -204.2344 | -0.6321 | -0.6298 |
| 0.3497 | 0.4082 | 400 | 0.3633 | 1.9715 | 0.5740 | 0.8468 | 1.3975 | -260.0141 | -203.0221 | -0.6323 | -0.6313 |
| 0.3378 | 0.5102 | 500 | 0.3421 | 2.0346 | 0.4602 | 0.8699 | 1.5744 | -262.2907 | -201.7610 | -0.6197 | -0.6103 |
| 0.2904 | 0.6122 | 600 | 0.3287 | 1.9449 | 0.3083 | 0.8757 | 1.6366 | -265.3278 | -203.5543 | -0.6221 | -0.6159 |
| 0.3053 | 0.7143 | 700 | 0.3207 | 1.9933 | 0.2606 | 0.8902 | 1.7327 | -266.2818 | -202.5857 | -0.6162 | -0.6111 |
| 0.2655 | 0.8163 | 800 | 0.3158 | 1.9845 | 0.2262 | 0.8815 | 1.7583 | -266.9698 | -202.7614 | -0.6127 | -0.6026 |
| 0.2943 | 0.9184 | 900 | 0.3144 | 1.9968 | 0.2178 | 0.8844 | 1.7789 | -267.1377 | -202.5171 | -0.6136 | -0.6052 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.44.0
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1