
ft-smollm-135M-instruct-on-hf-ultrafeedback

This model is a fine-tuned version of HuggingFaceTB/SmolLM-135M-Instruct on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0637
  • Rewards/chosen: -0.1247
  • Rewards/rejected: -0.1259
  • Rewards/accuracies: 0.4730
  • Rewards/margins: 0.0012
  • Logps/rejected: -1.2589
  • Logps/chosen: -1.2469
  • Logits/rejected: 55.4006
  • Logits/chosen: 55.1081
  • Nll Loss: 0.9890
  • Log Odds Ratio: -0.7474
  • Log Odds Chosen: 0.0451
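
The card ships no usage snippet, so here is a minimal inference sketch. It assumes the checkpoint is published as aisuko/ft-smollm-135M-instruct-on-hf-ultrafeedback and that the tokenizer carries the chat template inherited from the SmolLM-135M-Instruct base:

```python
# Minimal inference sketch. The repo id and the presence of a chat template
# are assumptions based on this card, not verified against the repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aisuko/ft-smollm-135M-instruct-on-hf-ultrafeedback"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # weights are stored in FP16

# Format a single-turn conversation with the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain what a preference dataset is."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```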

Model description

A 135M-parameter causal language model fine-tuned from HuggingFaceTB/SmolLM-135M-Instruct with preference optimization on binarized UltraFeedback preference pairs. The weights are distributed as FP16 safetensors.

Intended uses & limitations

More information needed

Training and evaluation data

The model was trained and evaluated on HuggingFaceH4/ultrafeedback_binarized, a binarized preference dataset in which each prompt is paired with a chosen and a rejected response.
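
For reference, the preference pairs can be inspected directly with the datasets library. A minimal sketch, assuming the published layout of HuggingFaceH4/ultrafeedback_binarized (train_prefs/test_prefs splits whose chosen/rejected columns hold message lists):

```python
from datasets import load_dataset

# Split and column names assume the published layout of the dataset;
# "chosen"/"rejected" hold full conversations whose last message is the
# preferred / dispreferred assistant reply.
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

example = ds[0]
print(example["prompt"])                         # the user prompt
print(example["chosen"][-1]["content"][:200])    # preferred reply (truncated)
print(example["rejected"][-1]["content"][:200])  # dispreferred reply (truncated)
```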

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
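
The logged metric names (rewards/*, nll_loss, log_odds_ratio, log_odds_chosen) match what TRL's ORPOTrainer reports, so the run appears to be an ORPO fine-tune. The following is a hedged reconstruction of the setup using the hyperparameters above; trl and its ORPOConfig/ORPOTrainer API are assumptions (the card does not pin a trl version), and beta=0.1 is inferred from the metric arithmetic discussed under the training results, not documented:

```python
# Hedged reconstruction of the training setup; not the author's script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "HuggingFaceTB/SmolLM-135M-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Depending on the trl version, the chosen/rejected message lists may first
# need to be flattened into plain prompt/chosen/rejected strings.
train_ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
eval_ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="test_prefs")

args = ORPOConfig(
    output_dir="ft-smollm-135M-instruct-on-hf-ultrafeedback",
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # total train batch size 8
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    beta=0.1,  # inferred: loss = nll_loss - 0.1 * log_odds_ratio in every row
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
)
trainer.train()
```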

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.2684 | 0.02 | 100 | 1.1258 | -0.1301 | -0.1302 | 0.4680 | 0.0001 | -1.3018 | -1.3007 | 17.8837 | 17.7783 | 1.0514 | -0.7435 | 0.0082 |
| 1.1427 | 0.05 | 200 | 1.1383 | -0.1295 | -0.1295 | 0.4740 | 0.0000 | -1.2954 | -1.2951 | 28.9673 | 28.6104 | 1.0633 | -0.7496 | 0.0117 |
| 1.135 | 0.07 | 300 | 1.1305 | -0.1290 | -0.1288 | 0.4640 | -0.0002 | -1.2876 | -1.2897 | 32.8905 | 32.5299 | 1.0547 | -0.7578 | 0.0117 |
| 1.15 | 0.09 | 400 | 1.1354 | -0.1303 | -0.1297 | 0.4620 | -0.0006 | -1.2969 | -1.3029 | 35.1267 | 34.7456 | 1.0592 | -0.7623 | 0.0073 |
| 1.1138 | 0.11 | 500 | 1.1345 | -0.1311 | -0.1309 | 0.4550 | -0.0002 | -1.3089 | -1.3110 | 36.9308 | 36.5745 | 1.0588 | -0.7571 | 0.0148 |
| 1.1617 | 0.14 | 600 | 1.1364 | -0.1312 | -0.1309 | 0.4660 | -0.0003 | -1.3086 | -1.3117 | 38.4101 | 38.0669 | 1.0602 | -0.7620 | 0.0204 |
| 1.136 | 0.16 | 700 | 1.1341 | -0.1319 | -0.1314 | 0.4610 | -0.0005 | -1.3138 | -1.3185 | 40.1971 | 39.8326 | 1.0581 | -0.7601 | 0.0145 |
| 1.155 | 0.18 | 800 | 1.1349 | -0.1319 | -0.1314 | 0.4620 | -0.0005 | -1.3137 | -1.3188 | 41.2812 | 40.9449 | 1.0588 | -0.7605 | 0.0153 |
| 1.185 | 0.21 | 900 | 1.1533 | -0.1339 | -0.1331 | 0.4570 | -0.0008 | -1.3305 | -1.3387 | 42.5938 | 42.3067 | 1.0766 | -0.7669 | 0.0171 |
| 1.1612 | 0.23 | 1000 | 1.1245 | -0.1310 | -0.1301 | 0.4550 | -0.0009 | -1.3010 | -1.3097 | 43.6187 | 43.3038 | 1.0480 | -0.7649 | 0.0111 |
| 1.2078 | 0.25 | 1100 | 1.1320 | -0.1319 | -0.1311 | 0.4680 | -0.0007 | -1.3115 | -1.3189 | 44.8567 | 44.5401 | 1.0556 | -0.7642 | 0.0173 |
| 1.1671 | 0.27 | 1200 | 1.1365 | -0.1325 | -0.1318 | 0.4600 | -0.0007 | -1.3179 | -1.3250 | 46.2434 | 45.9399 | 1.0605 | -0.7604 | 0.0102 |
| 1.1141 | 0.3 | 1300 | 1.1205 | -0.1306 | -0.1302 | 0.4560 | -0.0004 | -1.3017 | -1.3062 | 46.5845 | 46.2657 | 1.0443 | -0.7615 | 0.0167 |
| 1.1555 | 0.32 | 1400 | 1.1184 | -0.1301 | -0.1298 | 0.4660 | -0.0003 | -1.2978 | -1.3012 | 47.1046 | 46.8050 | 1.0421 | -0.7636 | 0.0205 |
| 1.1108 | 0.34 | 1500 | 1.1203 | -0.1302 | -0.1296 | 0.4640 | -0.0006 | -1.2961 | -1.3016 | 47.1987 | 46.9721 | 1.0438 | -0.7648 | 0.0184 |
| 1.1335 | 0.37 | 1600 | 1.1162 | -0.1302 | -0.1296 | 0.4620 | -0.0006 | -1.2963 | -1.3024 | 48.5285 | 48.2242 | 1.0399 | -0.7628 | 0.0162 |
| 1.1315 | 0.39 | 1700 | 1.1083 | -0.1299 | -0.1299 | 0.4620 | 0.0000 | -1.2987 | -1.2987 | 48.3002 | 48.0707 | 1.0327 | -0.7559 | 0.0278 |
| 1.1034 | 0.41 | 1800 | 1.1083 | -0.1298 | -0.1295 | 0.4640 | -0.0002 | -1.2955 | -1.2978 | 49.6016 | 49.3051 | 1.0330 | -0.7531 | 0.0196 |
| 1.0558 | 0.43 | 1900 | 1.1081 | -0.1290 | -0.1284 | 0.4600 | -0.0006 | -1.2845 | -1.2901 | 49.6973 | 49.4804 | 1.0317 | -0.7645 | 0.0224 |
| 1.0987 | 0.46 | 2000 | 1.1043 | -0.1285 | -0.1280 | 0.4680 | -0.0005 | -1.2798 | -1.2850 | 50.0976 | 49.8574 | 1.0279 | -0.7639 | 0.0175 |
| 1.1083 | 0.48 | 2100 | 1.0967 | -0.1274 | -0.1270 | 0.4660 | -0.0004 | -1.2701 | -1.2744 | 50.4175 | 50.1898 | 1.0200 | -0.7677 | 0.0294 |
| 1.1532 | 0.5 | 2200 | 1.0977 | -0.1285 | -0.1285 | 0.4600 | 0.0000 | -1.2851 | -1.2850 | 51.1548 | 50.9146 | 1.0225 | -0.7521 | 0.0215 |
| 1.1204 | 0.53 | 2300 | 1.0918 | -0.1275 | -0.1276 | 0.4690 | 0.0001 | -1.2762 | -1.2750 | 51.6649 | 51.3750 | 1.0162 | -0.7559 | 0.0256 |
| 1.1226 | 0.55 | 2400 | 1.0955 | -0.1285 | -0.1292 | 0.4700 | 0.0007 | -1.2920 | -1.2848 | 52.1800 | 51.9177 | 1.0204 | -0.7503 | 0.0402 |
| 1.1085 | 0.57 | 2500 | 1.0868 | -0.1272 | -0.1276 | 0.4670 | 0.0004 | -1.2765 | -1.2725 | 52.0037 | 51.7965 | 1.0113 | -0.7554 | 0.0400 |
| 1.0762 | 0.59 | 2600 | 1.0876 | -0.1269 | -0.1271 | 0.4670 | 0.0002 | -1.2713 | -1.2691 | 53.3919 | 53.0727 | 1.0117 | -0.7592 | 0.0388 |
| 1.088 | 0.62 | 2700 | 1.0822 | -0.1263 | -0.1264 | 0.4650 | 0.0001 | -1.2640 | -1.2628 | 53.7430 | 53.4174 | 1.0063 | -0.7587 | 0.0342 |
| 1.1111 | 0.64 | 2800 | 1.0821 | -0.1267 | -0.1274 | 0.4700 | 0.0007 | -1.2740 | -1.2667 | 53.9858 | 53.6674 | 1.0069 | -0.7529 | 0.0426 |
| 1.0906 | 0.66 | 2900 | 1.0785 | -0.1262 | -0.1268 | 0.4690 | 0.0006 | -1.2678 | -1.2617 | 53.9251 | 53.6345 | 1.0033 | -0.7527 | 0.0408 |
| 1.1186 | 0.69 | 3000 | 1.0785 | -0.1258 | -0.1262 | 0.4700 | 0.0004 | -1.2625 | -1.2583 | 54.2337 | 53.9554 | 1.0026 | -0.7593 | 0.0361 |
| 1.1648 | 0.71 | 3100 | 1.0783 | -0.1262 | -0.1269 | 0.4630 | 0.0007 | -1.2693 | -1.2621 | 54.2961 | 54.0128 | 1.0031 | -0.7522 | 0.0405 |
| 1.0952 | 0.73 | 3200 | 1.0784 | -0.1263 | -0.1271 | 0.4700 | 0.0009 | -1.2714 | -1.2625 | 54.8142 | 54.5032 | 1.0034 | -0.7506 | 0.0443 |
| 1.0759 | 0.75 | 3300 | 1.0747 | -0.1260 | -0.1269 | 0.4680 | 0.0009 | -1.2686 | -1.2596 | 55.0002 | 54.6848 | 0.9995 | -0.7519 | 0.0432 |
| 1.073 | 0.78 | 3400 | 1.0688 | -0.1252 | -0.1264 | 0.4720 | 0.0011 | -1.2639 | -1.2525 | 54.9206 | 54.5984 | 0.9938 | -0.7500 | 0.0478 |
| 1.0868 | 0.8 | 3500 | 1.0705 | -0.1262 | -0.1277 | 0.4810 | 0.0015 | -1.2772 | -1.2623 | 55.3186 | 54.9809 | 0.9962 | -0.7429 | 0.0469 |
| 1.0633 | 0.82 | 3600 | 1.0692 | -0.1255 | -0.1266 | 0.4750 | 0.0011 | -1.2656 | -1.2547 | 55.3886 | 55.0766 | 0.9944 | -0.7480 | 0.0435 |
| 1.0789 | 0.85 | 3700 | 1.0660 | -0.1248 | -0.1259 | 0.4750 | 0.0011 | -1.2589 | -1.2484 | 55.2801 | 54.9772 | 0.9910 | -0.7496 | 0.0439 |
| 1.0657 | 0.87 | 3800 | 1.0659 | -0.1252 | -0.1264 | 0.4750 | 0.0012 | -1.2641 | -1.2516 | 55.3299 | 55.0358 | 0.9913 | -0.7457 | 0.0439 |
| 1.115 | 0.89 | 3900 | 1.0661 | -0.1253 | -0.1267 | 0.4790 | 0.0014 | -1.2665 | -1.2526 | 55.4077 | 55.1136 | 0.9917 | -0.7439 | 0.0471 |
| 1.1083 | 0.91 | 4000 | 1.0662 | -0.1252 | -0.1266 | 0.4740 | 0.0014 | -1.2663 | -1.2522 | 55.4230 | 55.1339 | 0.9918 | -0.7441 | 0.0479 |
| 1.079 | 0.94 | 4100 | 1.0639 | -0.1248 | -0.1260 | 0.4740 | 0.0013 | -1.2604 | -1.2477 | 55.4248 | 55.1307 | 0.9893 | -0.7466 | 0.0464 |
| 1.1014 | 0.96 | 4200 | 1.0636 | -0.1247 | -0.1259 | 0.4750 | 0.0012 | -1.2594 | -1.2470 | 55.3555 | 55.0644 | 0.9889 | -0.7470 | 0.0455 |
| 1.0669 | 0.98 | 4300 | 1.0637 | -0.1247 | -0.1259 | 0.4730 | 0.0012 | -1.2589 | -1.2469 | 55.4006 | 55.1081 | 0.9890 | -0.7474 | 0.0451 |
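
A consistency check on the logged metrics: in every row the validation loss decomposes as

$$
\mathcal{L}_{\text{val}} \;=\; \mathcal{L}_{\text{NLL}} \;-\; \lambda \cdot \text{Log Odds Ratio},
\qquad \lambda \approx 0.1,
$$

which is the shape of the ORPO objective (token-level NLL plus a weighted log-odds-ratio penalty). For the final evaluation row: 0.9890 − 0.1 × (−0.7474) ≈ 1.0637. The rewards are likewise λ-scaled log-probabilities, e.g. −0.1247 ≈ 0.1 × (−1.2469) for the chosen responses. The value λ = 0.1 is inferred from this arithmetic, not documented in the card.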

Framework versions

  • Transformers 4.39.3
  • PyTorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.15.2
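
A small sketch to verify a local environment against the versions listed above (note the PyTorch entry corresponds to the torch package; trl, which the training sketch above assumes, is not pinned in this card):

```python
# Quick environment check against the versions listed in this card.
import transformers, torch, datasets, tokenizers

expected = {
    "transformers": "4.39.3",
    "torch": "2.1.2",        # listed as "PyTorch 2.1.2" above
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}
actual = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "OK" if actual[name] == want else f"MISMATCH (have {actual[name]})"
    print(f"{name}=={want}: {status}")
```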