
llama8b-gsm-real-and-synthetic-sftsd1

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0822
  • Num Input Tokens Seen: 1876994
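A minimal usage sketch, assuming the checkpoint is hosted at jkazdan/llama8b-gsm-real-and-synthetic-sftsd1 and inherits the standard Llama-3 chat template from the base model:

```python
# Minimal inference sketch (assumptions: repo id below, standard chat template).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "jkazdan/llama8b-gsm-real-and-synthetic-sftsd1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

# GSM8K-style word problem, since the model name suggests GSM fine-tuning.
messages = [{"role": "user",
             "content": "A baker sells 12 loaves a day for 5 days. How many loaves in total?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```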

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
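
A hedged sketch of how the list above maps onto a Transformers TrainingArguments configuration (argument names per Transformers 4.46; model, dataset, and Trainer wiring omitted; bf16=True is an assumption inferred from the BF16 checkpoint, not stated in the card):

```python
# Sketch of the training configuration implied by the hyperparameter list.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama8b-gsm-real-and-synthetic-sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=1,
    gradient_accumulation_steps=16,   # 2 x 16 = 32 total train batch size
    optim="adamw_torch",              # AdamW, betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,                        # assumption: matches the BF16 weights
)
```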

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---------------|-------|------|-----------------|-------------------|
| No log | 0 | 0 | 1.8595 | 0 |
| 1.8158 | 0.0109 | 5 | 1.7935 | 20946 |
| 1.6847 | 0.0218 | 10 | 1.5582 | 42412 |
| 1.5448 | 0.0327 | 15 | 1.3884 | 62414 |
| 1.3886 | 0.0435 | 20 | 1.2654 | 83992 |
| 1.2579 | 0.0544 | 25 | 1.2196 | 105384 |
| 1.2086 | 0.0653 | 30 | 1.1955 | 126430 |
| 1.1648 | 0.0762 | 35 | 1.1802 | 148392 |
| 1.1839 | 0.0871 | 40 | 1.1767 | 170026 |
| 1.245 | 0.0980 | 45 | 1.1691 | 189466 |
| 1.1204 | 0.1089 | 50 | 1.1633 | 210934 |
| 1.119 | 0.1198 | 55 | 1.1597 | 231512 |
| 1.2153 | 0.1306 | 60 | 1.1576 | 251330 |
| 1.144 | 0.1415 | 65 | 1.1520 | 272504 |
| 1.1354 | 0.1524 | 70 | 1.1475 | 292440 |
| 1.2145 | 0.1633 | 75 | 1.1443 | 312744 |
| 1.2003 | 0.1742 | 80 | 1.1448 | 333538 |
| 1.2242 | 0.1851 | 85 | 1.1421 | 352234 |
| 1.2166 | 0.1960 | 90 | 1.1414 | 373406 |
| 1.2393 | 0.2069 | 95 | 1.1375 | 392334 |
| 1.0825 | 0.2177 | 100 | 1.1375 | 413458 |
| 1.2477 | 0.2286 | 105 | 1.1347 | 434078 |
| 1.1855 | 0.2395 | 110 | 1.1359 | 453560 |
| 1.1766 | 0.2504 | 115 | 1.1305 | 474784 |
| 1.2057 | 0.2613 | 120 | 1.1320 | 493432 |
| 1.1378 | 0.2722 | 125 | 1.1280 | 514710 |
| 1.1941 | 0.2831 | 130 | 1.1291 | 531744 |
| 1.163 | 0.2940 | 135 | 1.1232 | 553414 |
| 1.1052 | 0.3048 | 140 | 1.1224 | 573916 |
| 1.1096 | 0.3157 | 145 | 1.1235 | 595060 |
| 1.2361 | 0.3266 | 150 | 1.1197 | 616710 |
| 1.1427 | 0.3375 | 155 | 1.1195 | 639352 |
| 1.0315 | 0.3484 | 160 | 1.1183 | 660230 |
| 1.157 | 0.3593 | 165 | 1.1166 | 680948 |
| 1.0344 | 0.3702 | 170 | 1.1167 | 702870 |
| 1.1532 | 0.3811 | 175 | 1.1176 | 721310 |
| 1.1773 | 0.3919 | 180 | 1.1175 | 740736 |
| 1.114 | 0.4028 | 185 | 1.1180 | 760292 |
| 1.1151 | 0.4137 | 190 | 1.1139 | 780138 |
| 1.0878 | 0.4246 | 195 | 1.1122 | 799648 |
| 1.0729 | 0.4355 | 200 | 1.1120 | 822366 |
| 1.1906 | 0.4464 | 205 | 1.1135 | 843150 |
| 1.1127 | 0.4573 | 210 | 1.1093 | 863468 |
| 1.1262 | 0.4682 | 215 | 1.1068 | 885336 |
| 1.1511 | 0.4790 | 220 | 1.1095 | 905900 |
| 1.1861 | 0.4899 | 225 | 1.1071 | 925202 |
| 1.1715 | 0.5008 | 230 | 1.1065 | 944982 |
| 1.1929 | 0.5117 | 235 | 1.1079 | 965830 |
| 1.2315 | 0.5226 | 240 | 1.1056 | 986228 |
| 1.0892 | 0.5335 | 245 | 1.1038 | 1005272 |
| 1.2006 | 0.5444 | 250 | 1.1051 | 1024828 |
| 1.1198 | 0.5553 | 255 | 1.1022 | 1044680 |
| 1.1487 | 0.5661 | 260 | 1.1035 | 1063556 |
| 1.0926 | 0.5770 | 265 | 1.1044 | 1082148 |
| 1.1615 | 0.5879 | 270 | 1.1000 | 1102496 |
| 1.1614 | 0.5988 | 275 | 1.0996 | 1122428 |
| 1.1651 | 0.6097 | 280 | 1.1005 | 1141640 |
| 1.1455 | 0.6206 | 285 | 1.1003 | 1161164 |
| 1.0627 | 0.6315 | 290 | 1.0994 | 1182698 |
| 1.0977 | 0.6424 | 295 | 1.1016 | 1201410 |
| 1.2317 | 0.6532 | 300 | 1.0978 | 1223096 |
| 1.1498 | 0.6641 | 305 | 1.0972 | 1245102 |
| 1.1217 | 0.6750 | 310 | 1.0984 | 1265102 |
| 1.1195 | 0.6859 | 315 | 1.0959 | 1285046 |
| 1.1083 | 0.6968 | 320 | 1.0943 | 1307630 |
| 1.1245 | 0.7077 | 325 | 1.0946 | 1329088 |
| 1.1304 | 0.7186 | 330 | 1.0972 | 1349756 |
| 1.189 | 0.7295 | 335 | 1.0931 | 1371334 |
| 1.2123 | 0.7403 | 340 | 1.0920 | 1390834 |
| 1.2097 | 0.7512 | 345 | 1.0955 | 1412480 |
| 1.1214 | 0.7621 | 350 | 1.0945 | 1434550 |
| 1.1405 | 0.7730 | 355 | 1.0922 | 1454898 |
| 1.0466 | 0.7839 | 360 | 1.0911 | 1476780 |
| 1.2573 | 0.7948 | 365 | 1.0901 | 1497726 |
| 1.0921 | 0.8057 | 370 | 1.0903 | 1519272 |
| 1.1463 | 0.8165 | 375 | 1.0911 | 1538004 |
| 1.0416 | 0.8274 | 380 | 1.0918 | 1557616 |
| 1.1032 | 0.8383 | 385 | 1.0884 | 1578570 |
| 1.0888 | 0.8492 | 390 | 1.0890 | 1599416 |
| 1.203 | 0.8601 | 395 | 1.0885 | 1619296 |
| 1.1321 | 0.8710 | 400 | 1.0880 | 1640102 |
| 1.218 | 0.8819 | 405 | 1.0876 | 1659280 |
| 1.1102 | 0.8928 | 410 | 1.0873 | 1680314 |
| 1.0307 | 0.9036 | 415 | 1.0855 | 1699560 |
| 1.1172 | 0.9145 | 420 | 1.0855 | 1720560 |
| 1.1144 | 0.9254 | 425 | 1.0854 | 1740832 |
| 1.095 | 0.9363 | 430 | 1.0870 | 1760898 |
| 1.1795 | 0.9472 | 435 | 1.0847 | 1781172 |
| 1.0506 | 0.9581 | 440 | 1.0853 | 1802078 |
| 1.1573 | 0.9690 | 445 | 1.0877 | 1823140 |
| 1.0358 | 0.9799 | 450 | 1.0839 | 1842196 |
| 1.0229 | 0.9907 | 455 | 1.0830 | 1862122 |
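
To eyeball convergence, the table can be replotted; a minimal sketch (only the first rows are inlined here, paste the remaining columns from the table above):

```python
# Plot validation loss against input tokens seen, using values from the
# Training results table above (only the first rows are shown inline).
import matplotlib.pyplot as plt

tokens_seen = [0, 20946, 42412, 62414, 83992]        # ...continue from table
val_loss = [1.8595, 1.7935, 1.5582, 1.3884, 1.2654]  # ...continue from table

plt.plot(tokens_seen, val_loss, marker="o")
plt.xlabel("Input tokens seen")
plt.ylabel("Validation loss")
plt.show()
```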

Framework versions

  • Transformers 4.46.0
  • Pytorch 2.4.1.post300
  • Datasets 2.20.0
  • Tokenizers 0.20.1