lapp0: End of training (commit 6bfb2f7, verified)
---
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.12_gpt2
  results: []
---

# distily_bench_obj_cross_v2.12_gpt2

This student model was distilled from the teacher model gpt2 using the Distily library; the training dataset is unspecified.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 653.3577
  • eval_frwikippl: 986.1998
  • eval_zhwikippl: 379.8699
  • eval_tinystoriesppl: 1082.1683
  • eval_loss: 1.3023
  • eval_runtime: 12.5969
  • eval_samples_per_second: 47.631
  • eval_steps_per_second: 11.908
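The `*ppl` metrics above are perplexities on the respective eval corpora (English/French/Chinese Wikipedia and TinyStories). As a point of reference for how such numbers relate to loss, perplexity is the exponential of the mean per-token negative log-likelihood; a minimal sketch (the helper name is hypothetical, not part of Distily):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood), NLLs in nats."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model assigning every token probability 1/4 has perplexity 4.
nlls = [math.log(4)] * 10
print(round(perplexity(nlls), 6))  # → 4.0
```

Note that each corpus-level perplexity here is computed on a different token distribution, which is why they differ so widely from `exp(eval_loss)` on the distillation objective.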

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
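Per the `distillation_objective` above, only the logits loss is active (`weight=1`, `loss_fn=kl`); the hidden-state and attention components are disabled (`weight=0`). A minimal pure-Python sketch of that forward-KL term between the teacher's and student's next-token distributions for one position (not Distily's implementation; function names are hypothetical):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl_logits_loss(teacher_logits, student_logits):
    """Forward KL(teacher || student) over one token's vocabulary distribution."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Matching logits give zero loss; diverging logits give a positive loss.
print(kl_logits_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))       # ≈ 0.0
print(kl_logits_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]) > 0)  # True
```

In training this term is averaged over all token positions in the batch, which is what `eval_loss` reports.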

### Resource Usage

Peak GPU Memory: 3.9293 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 270.2348 | 76.8142 | | | | | 671.1238 | 22.8030 |
| 0 | 0 | 147374.6094 | 4251118206976.0 | 19.8108 | 12.5898 | 47.658 | 11.914 | 74.6838 | 6171058503680.0 |
| 1500 | 0.0253 | 995.8284 | 4478.0557 | 2.2057 | 12.629 | 47.51 | 11.877 | 1054.7445 | 39317.4570 |
| 3000 | 0.0505 | 759.2491 | 2876.1150 | 1.7221 | 12.6775 | 47.328 | 11.832 | 930.6636 | 1598.6740 |
| 4500 | 0.0758 | 679.3580 | 1449.2272 | 1.5342 | 12.6534 | 47.418 | 11.855 | 954.7816 | 415.1080 |
| 6000 | 0.1010 | 706.9536 | 1264.4604 | 1.4442 | 12.6336 | 47.492 | 11.873 | 1114.5806 | 874.3105 |
| 7500 | 0.1263 | 581.0081 | 953.5186 | 1.3672 | 12.5682 | 47.74 | 11.935 | 860.4433 | 287.9040 |
| 9000 | 0.1515 | 653.3577 | 986.1998 | 1.3023 | 12.5969 | 47.631 | 11.908 | 1082.1683 | 379.8699 |
| 10500 | 0.1768 | 634.6018 | 878.6852 | 1.2366 | 12.5486 | 47.814 | 11.954 | 1111.3147 | 267.4301 |
| 12000 | 0.2020 | 543.3941 | 782.5607 | 1.1708 | 12.6162 | 47.558 | 11.889 | 914.1931 | 280.9046 |
| 13500 | 0.2273 | 621.1537 | 751.0798 | 1.1457 | 12.6507 | 47.428 | 11.857 | 1146.2101 | 287.0221 |
| 15000 | 0.2525 | 576.3350 | 773.9283 | 1.1070 | 12.6882 | 47.288 | 11.822 | 1048.3120 | 244.8425 |
| 16500 | 0.2778 | 524.7780 | 686.7684 | 1.0660 | 12.6142 | 47.565 | 11.891 | 963.1450 | 180.7172 |
| 18000 | 0.3030 | 547.1536 | 748.9669 | 1.0617 | 12.6351 | 47.487 | 11.872 | 1048.8325 | 393.3814 |
| 19500 | 0.3283 | 521.4248 | 608.5453 | 1.0117 | 12.6667 | 47.368 | 11.842 | 1005.0343 | 194.0343 |
| 21000 | 0.3535 | 492.6230 | 757.1074 | 0.9890 | 12.6396 | 47.47 | 11.867 | 925.2551 | 316.0413 |
| 22500 | 0.3788 | 508.8848 | 631.0673 | 0.9599 | 12.5581 | 47.778 | 11.944 | 1014.2992 | 269.3275 |
| 24000 | 0.4040 | 448.4678 | 634.5434 | 0.9540 | 12.6193 | 47.546 | 11.887 | 838.1882 | 182.7780 |
| 25500 | 0.4293 | 465.3311 | 685.5602 | 0.9076 | 12.6325 | 47.497 | 11.874 | 941.0688 | 236.3699 |
| 27000 | 0.4545 | 455.5760 | 536.7122 | 0.8543 | 12.6616 | 47.387 | 11.847 | 944.9666 | 158.6557 |
| 28500 | 0.4798 | 422.2133 | 444.7551 | 0.7497 | 12.7174 | 47.179 | 11.795 | 918.8527 | 161.5927 |
| 30000 | 0.5051 | 404.8533 | 401.2530 | 0.7146 | 12.5557 | 47.787 | 11.947 | 903.7859 | 159.8987 |
| 31500 | 0.5303 | 401.0141 | 391.1385 | 0.6968 | 12.5584 | 47.777 | 11.944 | 901.9575 | 144.2610 |
| 33000 | 0.5556 | 414.6530 | 376.1317 | 0.6896 | 12.6093 | 47.584 | 11.896 | 957.7856 | 160.5613 |
| 34500 | 0.5808 | 403.2803 | 388.9411 | 0.6821 | 12.5399 | 47.847 | 11.962 | 924.6055 | 165.9398 |
| 36000 | 0.6061 | 394.4821 | 343.9616 | 0.6697 | 12.5519 | 47.801 | 11.95 | 889.5546 | 170.7110 |
| 37500 | 0.6313 | 400.1528 | 363.8464 | 0.6703 | 12.5536 | 47.795 | 11.949 | 920.4871 | 147.2159 |
| 39000 | 0.6566 | 391.2865 | 364.2054 | 0.6676 | 12.5746 | 47.715 | 11.929 | 891.6525 | 156.6264 |
| 40500 | 0.6818 | 388.4776 | 368.1123 | 0.6612 | 12.5571 | 47.782 | 11.945 | 888.4889 | 139.5851 |
| 42000 | 0.7071 | 400.2923 | 352.6450 | 0.6593 | 12.5709 | 47.729 | 11.932 | 929.3182 | 138.6479 |
| 43500 | 0.7323 | 387.7111 | 360.0483 | 0.6497 | 12.6167 | 47.556 | 11.889 | 881.3199 | 138.9349 |
| 45000 | 0.7576 | 380.8126 | 334.1832 | 0.6313 | 12.6877 | 47.29 | 11.822 | 876.7783 | 125.0634 |
| 46500 | 0.7828 | 380.8054 | 327.5193 | 0.6242 | 12.5708 | 47.73 | 11.932 | 882.1217 | 129.8663 |
| 48000 | 0.8081 | 377.8082 | 338.2561 | 0.6204 | 12.6081 | 47.589 | 11.897 | 877.0321 | 131.2159 |
| 49500 | 0.8333 | 379.1130 | 327.4732 | 0.6185 | 12.5502 | 47.808 | 11.952 | 883.5084 | 123.8266 |
| 51000 | 0.8586 | 377.6328 | 326.7014 | 0.6177 | 12.6001 | 47.619 | 11.905 | 880.3737 | 123.1512 |
| 52500 | 0.8838 | 376.4498 | 325.6333 | 0.6136 | 12.7004 | 47.242 | 11.811 | 876.8870 | 121.4464 |
| 54000 | 0.9091 | 377.0334 | 324.0776 | 0.6123 | 12.7392 | 47.099 | 11.775 | 879.5005 | 121.6815 |
| 55500 | 0.9343 | 377.6328 | 325.2666 | 0.6112 | 12.661 | 47.39 | 11.847 | 881.6116 | 121.6897 |
| 57000 | 0.9596 | 376.8437 | 323.6670 | 0.6106 | 12.6149 | 47.563 | 11.891 | 879.0644 | 121.3654 |
| 58500 | 0.9848 | 376.7562 | 324.3744 | 0.6101 | 12.5659 | 47.748 | 11.937 | 879.3189 | 121.1148 |
| 59400 | 1.0 | 376.9021 | 324.4201 | 0.6100 | 12.5762 | 47.709 | 11.927 | 880.1915 | 121.0986 |

### Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0