
llama8b-gsm-real-and-synthetic-sftsd0

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct. The training dataset is not documented in this card, though the model name suggests a mix of real and synthetic GSM8K-style data. It achieves the following results on the evaluation set (a minimal usage sketch follows the list):

  • Loss: 1.0849
  • Num Input Tokens Seen: 1877420
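
For quick experimentation, below is a minimal loading-and-generation sketch. It assumes the checkpoint lives at jkazdan/llama8b-gsm-real-and-synthetic-sftsd0 (the repo this card belongs to) and uses only standard transformers APIs; the GSM8K-style prompt is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/llama8b-gsm-real-and-synthetic-sftsd0"  # repo id from this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The checkpoint is stored as BF16 safetensors (~8.03B params), so load in bfloat16.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative GSM8K-style question, formatted with the Llama-3 chat template.
messages = [
    {
        "role": "user",
        "content": "Natalia sold clips to 48 of her friends in April, and then "
                   "she sold half as many clips in May. How many clips did she "
                   "sell altogether in April and May?",
    }
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```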

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
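
For reference, these settings map onto transformers.TrainingArguments roughly as sketched below. This is a hedged reconstruction, not the actual training script: output_dir, the bf16 flag, and the eval/logging cadence are assumptions (the cadence is inferred from the 5-step eval interval in the results table); the numeric values come from the list above. Note that the effective batch size is per-device batch × accumulation steps = 2 × 16 = 32, which is where total_train_batch_size comes from.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of this run's configuration.
training_args = TrainingArguments(
    output_dir="llama8b-gsm-real-and-synthetic-sftsd0",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=16,   # 2 * 16 = 32 = total_train_batch_size
    seed=0,
    optim="adamw_torch",              # AdamW; betas=(0.9, 0.999), eps=1e-8 are the defaults
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,                        # assumed, consistent with the BF16 checkpoint
    eval_strategy="steps",            # inferred: the table logs an eval every 5 steps
    eval_steps=5,
    logging_steps=5,
)
```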

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.8595          | 0                 |
| 2.1188        | 0.0109 | 5    | 1.7934          | 20252             |
| 1.7529        | 0.0218 | 10   | 1.5578          | 40190             |
| 1.5003        | 0.0327 | 15   | 1.3796          | 60558             |
| 1.3666        | 0.0435 | 20   | 1.2697          | 80514             |
| 1.1632        | 0.0544 | 25   | 1.2132          | 100236            |
| 1.215         | 0.0653 | 30   | 1.1941          | 119622            |
| 1.2269        | 0.0762 | 35   | 1.1864          | 138484            |
| 1.1981        | 0.0871 | 40   | 1.1731          | 158578            |
| 1.125         | 0.0980 | 45   | 1.1734          | 178550            |
| 1.1639        | 0.1089 | 50   | 1.1657          | 200200            |
| 1.1696        | 0.1198 | 55   | 1.1645          | 219466            |
| 1.2649        | 0.1306 | 60   | 1.1570          | 239706            |
| 1.2061        | 0.1415 | 65   | 1.1551          | 259198            |
| 1.1787        | 0.1524 | 70   | 1.1528          | 279656            |
| 1.2122        | 0.1633 | 75   | 1.1465          | 299930            |
| 1.1786        | 0.1742 | 80   | 1.1467          | 320656            |
| 1.1947        | 0.1851 | 85   | 1.1454          | 342140            |
| 1.2227        | 0.1960 | 90   | 1.1418          | 360794            |
| 1.1515        | 0.2069 | 95   | 1.1423          | 380688            |
| 1.2093        | 0.2177 | 100  | 1.1362          | 400902            |
| 1.1598        | 0.2286 | 105  | 1.1337          | 420968            |
| 1.1775        | 0.2395 | 110  | 1.1316          | 444378            |
| 1.2074        | 0.2504 | 115  | 1.1301          | 465350            |
| 1.1737        | 0.2613 | 120  | 1.1305          | 484828            |
| 1.139         | 0.2722 | 125  | 1.1277          | 506648            |
| 1.2399        | 0.2831 | 130  | 1.1304          | 528778            |
| 1.1194        | 0.2940 | 135  | 1.1238          | 549198            |
| 1.153         | 0.3048 | 140  | 1.1236          | 569690            |
| 1.207         | 0.3157 | 145  | 1.1232          | 590042            |
| 1.0488        | 0.3266 | 150  | 1.1236          | 611098            |
| 1.1494        | 0.3375 | 155  | 1.1202          | 631730            |
| 1.1719        | 0.3484 | 160  | 1.1183          | 652614            |
| 1.1237        | 0.3593 | 165  | 1.1177          | 674112            |
| 1.1495        | 0.3702 | 170  | 1.1181          | 695024            |
| 1.1714        | 0.3811 | 175  | 1.1162          | 715462            |
| 1.1136        | 0.3919 | 180  | 1.1163          | 734588            |
| 1.052         | 0.4028 | 185  | 1.1154          | 753792            |
| 1.1381        | 0.4137 | 190  | 1.1126          | 774492            |
| 1.1324        | 0.4246 | 195  | 1.1124          | 794042            |
| 1.1164        | 0.4355 | 200  | 1.1129          | 813678            |
| 1.1365        | 0.4464 | 205  | 1.1102          | 835352            |
| 1.1545        | 0.4573 | 210  | 1.1103          | 854014            |
| 1.1442        | 0.4682 | 215  | 1.1097          | 873322            |
| 1.0279        | 0.4790 | 220  | 1.1066          | 894576            |
| 1.1465        | 0.4899 | 225  | 1.1070          | 915600            |
| 1.2079        | 0.5008 | 230  | 1.1087          | 935744            |
| 1.1502        | 0.5117 | 235  | 1.1062          | 956936            |
| 1.1242        | 0.5226 | 240  | 1.1050          | 977214            |
| 1.1403        | 0.5335 | 245  | 1.1071          | 996430            |
| 1.0747        | 0.5444 | 250  | 1.1034          | 1016696           |
| 1.1064        | 0.5553 | 255  | 1.1034          | 1037988           |
| 1.0496        | 0.5661 | 260  | 1.1028          | 1058142           |
| 1.1228        | 0.5770 | 265  | 1.0994          | 1078686           |
| 1.1253        | 0.5879 | 270  | 1.0994          | 1100626           |
| 1.1824        | 0.5988 | 275  | 1.0989          | 1121792           |
| 1.1731        | 0.6097 | 280  | 1.1000          | 1142104           |
| 1.1854        | 0.6206 | 285  | 1.0987          | 1164394           |
| 1.1058        | 0.6315 | 290  | 1.0981          | 1185814           |
| 1.1307        | 0.6424 | 295  | 1.1006          | 1207150           |
| 1.0745        | 0.6532 | 300  | 1.0995          | 1226836           |
| 1.0749        | 0.6641 | 305  | 1.0980          | 1248276           |
| 1.1606        | 0.6750 | 310  | 1.0952          | 1269206           |
| 1.0947        | 0.6859 | 315  | 1.0951          | 1290778           |
| 1.1203        | 0.6968 | 320  | 1.0963          | 1311496           |
| 1.2225        | 0.7077 | 325  | 1.0947          | 1332048           |
| 1.2869        | 0.7186 | 330  | 1.0957          | 1351234           |
| 1.1809        | 0.7295 | 335  | 1.0955          | 1372696           |
| 1.0819        | 0.7403 | 340  | 1.0973          | 1391276           |
| 1.096         | 0.7512 | 345  | 1.0943          | 1413020           |
| 1.1196        | 0.7621 | 350  | 1.0925          | 1435058           |
| 1.0894        | 0.7730 | 355  | 1.0925          | 1455410           |
| 1.1599        | 0.7839 | 360  | 1.0917          | 1474912           |
| 1.0866        | 0.7948 | 365  | 1.0919          | 1495480           |
| 1.2109        | 0.8057 | 370  | 1.0935          | 1515054           |
| 1.1566        | 0.8165 | 375  | 1.0910          | 1534450           |
| 1.1502        | 0.8274 | 380  | 1.0885          | 1556162           |
| 1.1446        | 0.8383 | 385  | 1.0893          | 1577012           |
| 1.1439        | 0.8492 | 390  | 1.0905          | 1596860           |
| 1.0844        | 0.8601 | 395  | 1.0904          | 1616948           |
| 1.1822        | 0.8710 | 400  | 1.0897          | 1636722           |
| 1.1542        | 0.8819 | 405  | 1.0878          | 1658786           |
| 1.1622        | 0.8928 | 410  | 1.0861          | 1677850           |
| 1.0757        | 0.9036 | 415  | 1.0866          | 1697232           |
| 1.1228        | 0.9145 | 420  | 1.0881          | 1717802           |
| 1.0552        | 0.9254 | 425  | 1.0860          | 1738272           |
| 1.0828        | 0.9363 | 430  | 1.0840          | 1757592           |
| 1.064         | 0.9472 | 435  | 1.0841          | 1777796           |
| 1.1513        | 0.9581 | 440  | 1.0838          | 1798990           |
| 1.1968        | 0.9690 | 445  | 1.0843          | 1817942           |
| 1.111         | 0.9799 | 450  | 1.0840          | 1840536           |
| 1.1396        | 0.9907 | 455  | 1.0841          | 1861298           |
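
The validation loss drops steeply over roughly the first 100k input tokens and then flattens toward ~1.08. For eyeballing that trajectory, here is a small plotting sketch over a hand-picked subset of the (input tokens seen, validation loss) pairs from the table; matplotlib is assumed to be installed.

```python
import matplotlib.pyplot as plt

# Subset of (input tokens seen, validation loss) pairs from the table above.
tokens = [0, 20252, 100236, 200200, 400902, 813678, 1226836, 1636722, 1861298]
val_loss = [1.8595, 1.7934, 1.2132, 1.1657, 1.1362, 1.1129, 1.0995, 1.0897, 1.0841]

plt.plot(tokens, val_loss, marker="o")
plt.xlabel("Input tokens seen")
plt.ylabel("Validation loss")
plt.title("Validation loss vs. input tokens seen")
plt.show()
```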

Framework versions

  • Transformers 4.46.0
  • Pytorch 2.4.1.post300
  • Datasets 2.20.0
  • Tokenizers 0.20.1
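
Results can be sensitive to library versions; below is a quick sanity check that an environment matches the versions above (assuming all four packages are installed).

```python
import datasets
import tokenizers
import torch
import transformers

# Versions recorded for this run, per the card.
expected = {
    "transformers": "4.46.0",
    "torch": "2.4.1.post300",
    "datasets": "2.20.0",
    "tokenizers": "0.20.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name in expected:
    print(f"{name}: installed {installed[name]}, card used {expected[name]}")
```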