lr_sft1

This model is a fine-tuned version of deepseek-ai/deepseek-coder-1.3b-base on the generator dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1672

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.00141
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1
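
As a rough illustration of how these values fit together, the sketch below recomputes the total train batch size and evaluates a linear learning-rate schedule. It is not the training script. The single-device setting, the absence of warmup, and the figure of 78 total optimizer steps (taken from the training results in this card) are assumptions; `TrainConfig` and `linear_lr` are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the hyperparameter list in this card.
    learning_rate: float = 0.00141
    train_batch_size: int = 8
    gradient_accumulation_steps: int = 16
    # Assumption: one device. The card does not state the device count.
    num_devices: int = 1

    @property
    def total_train_batch_size(self) -> int:
        # Examples consumed per optimizer step.
        return (
            self.train_batch_size
            * self.gradient_accumulation_steps
            * self.num_devices
        )


def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Optional linear warmup, then linear decay to zero at total_steps."""
    if warmup_steps and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


cfg = TrainConfig()
print(cfg.total_train_batch_size)  # 8 * 16 * 1 = 128

# Assumption: 78 total steps and no warmup.
for step in (0, 39, 78):
    print(step, linear_lr(step, total_steps=78, base_lr=cfg.learning_rate))
```

Under these assumptions the product 8 × 16 matches the reported total_train_batch_size of 128, and the learning rate falls from 0.00141 at step 0 to about half that value at step 39 and to zero at step 78.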

Training results

Training Loss  Epoch   Step  Validation Loss
1.2662         0.0128  1     1.2175
1.2021         0.0256  2     1.1878
1.2099         0.0384  3     1.1839
1.1084         0.0512  4     1.1874
1.1652         0.064   5     1.1817
1.1503         0.0768  6     1.1817
1.1545         0.0896  7     1.1776
1.2043         0.1024  8     1.1785
1.1557         0.1152  9     1.1759
1.1748         0.128   10    1.1749
1.2061         0.1408  11    1.1757
1.1357         0.1536  12    1.1757
1.1039         0.1664  13    1.1753
1.2229         0.1792  14    1.1755
1.148          0.192   15    1.1750
1.1819         0.2048  16    1.1746
1.1758         0.2176  17    1.1745
1.1895         0.2304  18    1.1742
1.1277         0.2432  19    1.1741
1.1258         0.256   20    1.1739
1.1493         0.2688  21    1.1733
1.1295         0.2816  22    1.1733
1.1768         0.2944  23    1.1736
1.206          0.3072  24    1.1735
1.1397         0.32    25    1.1732
1.1736         0.3328  26    1.1734
1.1412         0.3456  27    1.1740
1.1383         0.3584  28    1.1745
1.1216         0.3712  29    1.1742
1.1127         0.384   30    1.1731
1.1234         0.3968  31    1.1724
1.1406         0.4096  32    1.1724
1.186          0.4224  33    1.1723
1.154          0.4352  34    1.1721
1.114          0.448   35    1.1724
1.1148         0.4608  36    1.1728
1.1422         0.4736  37    1.1726
1.1561         0.4864  38    1.1721
1.1964         0.4992  39    1.1716
1.1288         0.512   40    1.1714
1.142          0.5248  41    1.1713
1.149          0.5376  42    1.1711
1.1104         0.5504  43    1.1710
1.12           0.5632  44    1.1709
1.1256         0.576   45    1.1710
1.162          0.5888  46    1.1710
1.0982         0.6016  47    1.1710
1.1383         0.6144  48    1.1710
1.1394         0.6272  49    1.1708
1.1196         0.64    50    1.1707
1.156          0.6528  51    1.1705
1.105          0.6656  52    1.1703
1.1455         0.6784  53    1.1701
1.1266         0.6912  54    1.1698
1.1063         0.704   55    1.1695
1.127          0.7168  56    1.1693
1.1501         0.7296  57    1.1690
1.1383         0.7424  58    1.1688
1.1174         0.7552  59    1.1686
1.1413         0.768   60    1.1685
1.1871         0.7808  61    1.1684
1.1796         0.7936  62    1.1683
1.123          0.8064  63    1.1683
1.1645         0.8192  64    1.1682
1.1165         0.832   65    1.1681
1.0805         0.8448  66    1.1680
1.2018         0.8576  67    1.1678
1.0869         0.8704  68    1.1677
1.1286         0.8832  69    1.1676
1.0889         0.896   70    1.1676
1.1395         0.9088  71    1.1675
1.1756         0.9216  72    1.1674
1.1575         0.9344  73    1.1674
1.1073         0.9472  74    1.1673
1.163          0.96    75    1.1673
1.1789         0.9728  76    1.1673
1.1267         0.9856  77    1.1673
1.1416         0.9984  78    1.1672
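
The Epoch and Step columns can be cross-checked against the total train batch size of 128. The sketch below is a back-of-the-envelope estimate, not something stated in the card: it infers the approximate number of training examples from the first and last rows and computes the relative drop in validation loss. The function names are hypothetical, the Epoch values are rounded, and if sequences were packed during training an "example" here would be a packed sequence.

```python
TOTAL_TRAIN_BATCH_SIZE = 128  # from the hyperparameter list

# (step, epoch, validation_loss) copied from the first and last table rows.
FIRST_ROW = (1, 0.0128, 1.2175)
LAST_ROW = (78, 0.9984, 1.1672)


def estimate_num_examples(step: int, epoch: float, batch_size: int) -> int:
    """Examples seen so far divided by the fraction of an epoch completed."""
    return round(step * batch_size / epoch)


def relative_improvement(start_loss: float, end_loss: float) -> float:
    """Fractional reduction in loss between two evaluations."""
    return (start_loss - end_loss) / start_loss


print(estimate_num_examples(FIRST_ROW[0], FIRST_ROW[1], TOTAL_TRAIN_BATCH_SIZE))
print(estimate_num_examples(LAST_ROW[0], LAST_ROW[1], TOTAL_TRAIN_BATCH_SIZE))
print(f"{relative_improvement(FIRST_ROW[2], LAST_ROW[2]):.2%}")
```

Both rows point to a training set of roughly 10,000 examples, and the validation loss falls by about 4% between step 1 and step 78, with most of that drop in the first few steps.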

Framework versions

  • PEFT 0.10.0
  • Transformers 4.43.0.dev0
  • Pytorch 2.2.2+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1
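
Since PEFT appears among the framework versions, the checkpoint is presumably a PEFT adapter to be applied on top of the base model. The sketch below shows one plausible way to load it. It assumes the repository ID is stojchet/lr_sft1, that the repository holds adapter weights compatible with `PeftModel.from_pretrained`, and that the base model's tokenizer is the right one to use; `load_model` and `complete` are hypothetical helper names, and none of this has been verified against the repository.

```python
BASE_MODEL_ID = "deepseek-ai/deepseek-coder-1.3b-base"
ADAPTER_ID = "stojchet/lr_sft1"  # assumption: repository ID of this adapter


def load_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter (downloads weights)."""
    # Imported lazily so that defining these helpers does not require
    # peft/transformers to be installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return tokenizer, model.eval()


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation of `prompt` with the adapted model."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("def fibonacci(n):")` would download both the base model and the adapter, so it is best tried in an environment with the framework versions listed above.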

Model tree for stojchet/lr_sft1

Adapter (this model)