lapp0 committed on
Commit 6bfb2f7
1 Parent(s): 86026e6

End of training

README.md CHANGED
@@ -16,14 +16,14 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
  It achieves the following results on the evaluation set:
- - eval_enwikippl: 563.7175
- - eval_frwikippl: 1345.9713
- - eval_zhwikippl: 833.8156
- - eval_tinystoriesppl: 794.4041
- - eval_loss: 1.4516
- - eval_runtime: 12.5731
- - eval_samples_per_second: 47.721
- - eval_steps_per_second: 11.93
+ - eval_enwikippl: 653.3577
+ - eval_frwikippl: 986.1998
+ - eval_zhwikippl: 379.8699
+ - eval_tinystoriesppl: 1082.1683
+ - eval_loss: 1.3023
+ - eval_runtime: 12.5969
+ - eval_samples_per_second: 47.631
+ - eval_steps_per_second: 11.908
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment.
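Note that both the old and new header metrics match the step-9000 row of the eval table further down, not the end of training. The `*ppl` metrics are perplexities, presumably on English/French/Chinese Wikipedia and TinyStories going by their names; the exact evaluation protocol lives in the Distily library rather than in this card. As a generic point of reference only, a minimal sketch of the conventional perplexity computation (exp of the mean per-token cross-entropy) with the standard `transformers` causal-LM API:

```python
# Minimal sketch: conventional perplexity = exp(mean per-token cross-entropy).
# The datasets and protocol behind eval_enwikippl etc. are defined in Distily
# and are not shown in this diff; this is a generic reference implementation.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The quick brown fox jumps over the lazy dog.",
                   return_tensors="pt")
with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean cross-entropy
    # over the (internally shifted) token sequence.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"perplexity: {math.exp(loss.item()):.2f}")
```

`eval_loss` appears to be the distillation objective rather than a raw language-modeling loss: exp(1.3023) ≈ 3.7, nowhere near the reported perplexities.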
@@ -54,7 +54,7 @@ The following hyperparameters were used during training:
  - seed: 42
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
- - lr_scheduler_warmup_ratio: 0.5
+ - lr_scheduler_warmup_ratio: 0.1
  - num_epochs: 1.0
 
  ### Resource Usage
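The substantive change in this commit is the warmup ratio, cut from 0.5 to 0.1. With a linear scheduler and the 59400 total steps shown in the table below, that moves the learning-rate peak from step 29700 to step 5940. A sketch of the implied schedule, assuming the standard `transformers` helper (the learning rate of 1e-4 comes from the log directory name; the actual Trainer wiring inside Distily is not shown in this diff):

```python
# Sketch of the linear schedule with warmup implied by these hyperparameters.
# Assumptions: 59400 total steps (last row of the eval table) and lr=1e-4
# (from the log directory name "learning_rate=0.0001").
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(10, 10)  # stand-in for the student model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), eps=1e-8)

num_training_steps = 59400
warmup_ratio = 0.1  # changed from 0.5 in this commit
num_warmup_steps = int(warmup_ratio * num_training_steps)  # 5940

scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps,
)
```

Under this schedule the learning rate ramps linearly from 0 to 1e-4 over the first 5940 steps, then decays linearly back to 0 at step 59400.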
@@ -64,47 +64,47 @@ Peak GPU Memory: 3.9293 GB
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | **teacher eval** | | 270.2348 | 76.8142 | | | | | 671.1238 | 22.8030 |
- | 0 | 0 | 147374.6094 | 4251118206976.0 | 19.8108 | 12.5362 | 47.862 | 11.965 | 74.6838 | 6171058503680.0 |
- | 1500 | 0.0253 | 1032.8136 | 21459.9512 | 3.2195 | 12.5353 | 47.865 | 11.966 | 525.6059 | 390432.1875 |
- | 3000 | 0.0505 | 729.0046 | 5114.0029 | 2.1945 | 12.5774 | 47.705 | 11.926 | 661.7311 | 17566.7754 |
- | 4500 | 0.0758 | 376.6540 | 2462.6831 | 1.8598 | 12.48 | 48.077 | 12.019 | 320.3664 | 635.7976 |
- | 6000 | 0.1010 | 479.0934 | 1338.6918 | 1.5893 | 12.4815 | 48.071 | 12.018 | 544.5354 | 378.7316 |
- | 7500 | 0.1263 | 584.5861 | 1167.9351 | 1.5059 | 12.5218 | 47.916 | 11.979 | 802.0584 | 327.7192 |
- | 9000 | 0.1515 | 563.7175 | 1345.9713 | 1.4516 | 12.5731 | 47.721 | 11.93 | 794.4041 | 833.8156 |
- | 10500 | 0.1768 | 628.4745 | 969.5675 | 1.3354 | 12.5741 | 47.717 | 11.929 | 1020.3542 | 321.8916 |
- | 12000 | 0.2020 | 539.9009 | 806.6226 | 1.2508 | 12.6027 | 47.609 | 11.902 | 839.1936 | 320.7986 |
- | 13500 | 0.2273 | 731.5787 | 813.5260 | 1.2058 | 12.5275 | 47.895 | 11.974 | 1454.4408 | 249.3415 |
- | 15000 | 0.2525 | 589.4062 | 829.6402 | 1.1651 | 12.5408 | 47.844 | 11.961 | 1055.8788 | 246.7935 |
- | 16500 | 0.2778 | 516.2803 | 723.3314 | 1.1227 | 12.6335 | 47.493 | 11.873 | 901.8456 | 210.8349 |
- | 18000 | 0.3030 | 504.6353 | 743.0820 | 1.1052 | 12.5796 | 47.696 | 11.924 | 890.8421 | 327.6317 |
- | 19500 | 0.3283 | 573.1406 | 698.4509 | 1.1044 | 12.5126 | 47.952 | 11.988 | 1070.7333 | 267.1270 |
- | 21000 | 0.3535 | 495.8198 | 711.6088 | 1.0507 | 12.5101 | 47.961 | 11.99 | 886.2881 | 210.5538 |
- | 22500 | 0.3788 | 501.9647 | 659.8377 | 1.0060 | 12.5977 | 47.628 | 11.907 | 955.5714 | 225.8886 |
- | 24000 | 0.4040 | 628.5231 | 696.8541 | 1.0003 | 12.5425 | 47.837 | 11.959 | 1388.9321 | 272.9261 |
- | 25500 | 0.4293 | 491.1847 | 784.8514 | 0.9600 | 12.4842 | 48.061 | 12.015 | 954.0717 | 253.1456 |
- | 27000 | 0.4545 | 413.3142 | 581.4585 | 0.9446 | 12.5295 | 47.887 | 11.972 | 757.5270 | 273.5640 |
- | 28500 | 0.4798 | 491.1941 | 643.7033 | 0.9424 | 12.6552 | 47.411 | 11.853 | 994.0450 | 219.2680 |
- | 30000 | 0.5051 | 444.4044 | 686.3331 | 0.9338 | 12.6988 | 47.249 | 11.812 | 862.3303 | 312.4154 |
- | 31500 | 0.5303 | 508.9440 | 641.9151 | 0.9117 | 12.5748 | 47.714 | 11.929 | 1104.8569 | 261.0676 |
- | 33000 | 0.5556 | 573.1851 | 588.2755 | 0.8677 | 12.7003 | 47.243 | 11.811 | 1374.1992 | 306.8396 |
- | 34500 | 0.5808 | 436.7425 | 595.4240 | 0.8329 | 12.5456 | 47.825 | 11.956 | 926.4799 | 263.6575 |
- | 36000 | 0.6061 | 430.2032 | 487.1232 | 0.8204 | 12.5922 | 47.649 | 11.912 | 907.4166 | 462.6598 |
- | 37500 | 0.6313 | 433.6747 | 510.4085 | 0.8060 | 12.6333 | 47.494 | 11.873 | 948.2142 | 285.3423 |
- | 39000 | 0.6566 | 425.2826 | 446.8272 | 0.7935 | 12.9122 | 46.468 | 11.617 | 915.1762 | 419.6178 |
- | 40500 | 0.6818 | 433.5236 | 450.9529 | 0.7692 | 12.5718 | 47.726 | 11.931 | 968.3745 | 425.5650 |
- | 42000 | 0.7071 | 422.4834 | 392.4355 | 0.6995 | 12.4907 | 48.036 | 12.009 | 986.9214 | 197.5471 |
- | 43500 | 0.7323 | 382.6314 | 326.8395 | 0.6327 | 12.5524 | 47.8 | 11.95 | 900.5792 | 165.3984 |
- | 45000 | 0.7576 | 379.0175 | 301.0615 | 0.6073 | 12.5527 | 47.799 | 11.95 | 902.4793 | 145.1005 |
- | 46500 | 0.7828 | 373.4075 | 293.1317 | 0.5928 | 12.5641 | 47.755 | 11.939 | 885.6292 | 145.3717 |
- | 48000 | 0.8081 | 368.3225 | 290.1638 | 0.5874 | 12.6164 | 47.557 | 11.889 | 876.1263 | 157.4645 |
- | 49500 | 0.8333 | 369.1651 | 279.8968 | 0.5786 | 12.5106 | 47.959 | 11.99 | 887.6813 | 152.8492 |
- | 51000 | 0.8586 | 364.6742 | 280.6271 | 0.5655 | 12.5057 | 47.978 | 11.995 | 881.6844 | 117.1422 |
- | 52500 | 0.8838 | 356.1384 | 265.4679 | 0.5521 | 12.574 | 47.717 | 11.929 | 862.6510 | 129.9270 |
- | 54000 | 0.9091 | 362.6741 | 264.8237 | 0.5466 | 12.5668 | 47.745 | 11.936 | 880.6281 | 119.0881 |
- | 55500 | 0.9343 | 354.4664 | 261.9577 | 0.5430 | 12.5768 | 47.707 | 11.927 | 861.8669 | 112.1871 |
- | 57000 | 0.9596 | 355.2361 | 260.7429 | 0.5403 | 12.5688 | 47.737 | 11.934 | 864.4357 | 111.7241 |
- | 58500 | 0.9848 | 354.8235 | 259.3875 | 0.5396 | 12.5609 | 47.767 | 11.942 | 864.0784 | 110.2362 |
- | 59400 | 1.0 | 355.2361 | 259.4972 | 0.5394 | 12.5676 | 47.742 | 11.935 | 865.5798 | 109.8253 |
+ | 0 | 0 | 147374.6094 | 4251118206976.0 | 19.8108 | 12.5898 | 47.658 | 11.914 | 74.6838 | 6171058503680.0 |
+ | 1500 | 0.0253 | 995.8284 | 4478.0557 | 2.2057 | 12.629 | 47.51 | 11.877 | 1054.7445 | 39317.4570 |
+ | 3000 | 0.0505 | 759.2491 | 2876.1150 | 1.7221 | 12.6775 | 47.328 | 11.832 | 930.6636 | 1598.6740 |
+ | 4500 | 0.0758 | 679.3580 | 1449.2272 | 1.5342 | 12.6534 | 47.418 | 11.855 | 954.7816 | 415.1080 |
+ | 6000 | 0.1010 | 706.9536 | 1264.4604 | 1.4442 | 12.6336 | 47.492 | 11.873 | 1114.5806 | 874.3105 |
+ | 7500 | 0.1263 | 581.0081 | 953.5186 | 1.3672 | 12.5682 | 47.74 | 11.935 | 860.4433 | 287.9040 |
+ | 9000 | 0.1515 | 653.3577 | 986.1998 | 1.3023 | 12.5969 | 47.631 | 11.908 | 1082.1683 | 379.8699 |
+ | 10500 | 0.1768 | 634.6018 | 878.6852 | 1.2366 | 12.5486 | 47.814 | 11.954 | 1111.3147 | 267.4301 |
+ | 12000 | 0.2020 | 543.3941 | 782.5607 | 1.1708 | 12.6162 | 47.558 | 11.889 | 914.1931 | 280.9046 |
+ | 13500 | 0.2273 | 621.1537 | 751.0798 | 1.1457 | 12.6507 | 47.428 | 11.857 | 1146.2101 | 287.0221 |
+ | 15000 | 0.2525 | 576.3350 | 773.9283 | 1.1070 | 12.6882 | 47.288 | 11.822 | 1048.3120 | 244.8425 |
+ | 16500 | 0.2778 | 524.7780 | 686.7684 | 1.0660 | 12.6142 | 47.565 | 11.891 | 963.1450 | 180.7172 |
+ | 18000 | 0.3030 | 547.1536 | 748.9669 | 1.0617 | 12.6351 | 47.487 | 11.872 | 1048.8325 | 393.3814 |
+ | 19500 | 0.3283 | 521.4248 | 608.5453 | 1.0117 | 12.6667 | 47.368 | 11.842 | 1005.0343 | 194.0343 |
+ | 21000 | 0.3535 | 492.6230 | 757.1074 | 0.9890 | 12.6396 | 47.47 | 11.867 | 925.2551 | 316.0413 |
+ | 22500 | 0.3788 | 508.8848 | 631.0673 | 0.9599 | 12.5581 | 47.778 | 11.944 | 1014.2992 | 269.3275 |
+ | 24000 | 0.4040 | 448.4678 | 634.5434 | 0.9540 | 12.6193 | 47.546 | 11.887 | 838.1882 | 182.7780 |
+ | 25500 | 0.4293 | 465.3311 | 685.5602 | 0.9076 | 12.6325 | 47.497 | 11.874 | 941.0688 | 236.3699 |
+ | 27000 | 0.4545 | 455.5760 | 536.7122 | 0.8543 | 12.6616 | 47.387 | 11.847 | 944.9666 | 158.6557 |
+ | 28500 | 0.4798 | 422.2133 | 444.7551 | 0.7497 | 12.7174 | 47.179 | 11.795 | 918.8527 | 161.5927 |
+ | 30000 | 0.5051 | 404.8533 | 401.2530 | 0.7146 | 12.5557 | 47.787 | 11.947 | 903.7859 | 159.8987 |
+ | 31500 | 0.5303 | 401.0141 | 391.1385 | 0.6968 | 12.5584 | 47.777 | 11.944 | 901.9575 | 144.2610 |
+ | 33000 | 0.5556 | 414.6530 | 376.1317 | 0.6896 | 12.6093 | 47.584 | 11.896 | 957.7856 | 160.5613 |
+ | 34500 | 0.5808 | 403.2803 | 388.9411 | 0.6821 | 12.5399 | 47.847 | 11.962 | 924.6055 | 165.9398 |
+ | 36000 | 0.6061 | 394.4821 | 343.9616 | 0.6697 | 12.5519 | 47.801 | 11.95 | 889.5546 | 170.7110 |
+ | 37500 | 0.6313 | 400.1528 | 363.8464 | 0.6703 | 12.5536 | 47.795 | 11.949 | 920.4871 | 147.2159 |
+ | 39000 | 0.6566 | 391.2865 | 364.2054 | 0.6676 | 12.5746 | 47.715 | 11.929 | 891.6525 | 156.6264 |
+ | 40500 | 0.6818 | 388.4776 | 368.1123 | 0.6612 | 12.5571 | 47.782 | 11.945 | 888.4889 | 139.5851 |
+ | 42000 | 0.7071 | 400.2923 | 352.6450 | 0.6593 | 12.5709 | 47.729 | 11.932 | 929.3182 | 138.6479 |
+ | 43500 | 0.7323 | 387.7111 | 360.0483 | 0.6497 | 12.6167 | 47.556 | 11.889 | 881.3199 | 138.9349 |
+ | 45000 | 0.7576 | 380.8126 | 334.1832 | 0.6313 | 12.6877 | 47.29 | 11.822 | 876.7783 | 125.0634 |
+ | 46500 | 0.7828 | 380.8054 | 327.5193 | 0.6242 | 12.5708 | 47.73 | 11.932 | 882.1217 | 129.8663 |
+ | 48000 | 0.8081 | 377.8082 | 338.2561 | 0.6204 | 12.6081 | 47.589 | 11.897 | 877.0321 | 131.2159 |
+ | 49500 | 0.8333 | 379.1130 | 327.4732 | 0.6185 | 12.5502 | 47.808 | 11.952 | 883.5084 | 123.8266 |
+ | 51000 | 0.8586 | 377.6328 | 326.7014 | 0.6177 | 12.6001 | 47.619 | 11.905 | 880.3737 | 123.1512 |
+ | 52500 | 0.8838 | 376.4498 | 325.6333 | 0.6136 | 12.7004 | 47.242 | 11.811 | 876.8870 | 121.4464 |
+ | 54000 | 0.9091 | 377.0334 | 324.0776 | 0.6123 | 12.7392 | 47.099 | 11.775 | 879.5005 | 121.6815 |
+ | 55500 | 0.9343 | 377.6328 | 325.2666 | 0.6112 | 12.661 | 47.39 | 11.847 | 881.6116 | 121.6897 |
+ | 57000 | 0.9596 | 376.8437 | 323.6670 | 0.6106 | 12.6149 | 47.563 | 11.891 | 879.0644 | 121.3654 |
+ | 58500 | 0.9848 | 376.7562 | 324.3744 | 0.6101 | 12.5659 | 47.748 | 11.937 | 879.3189 | 121.1148 |
+ | 59400 | 1.0 | 376.9021 | 324.4201 | 0.6100 | 12.5762 | 47.709 | 11.927 | 880.1915 | 121.0986 |
 
  ### Framework versions
  - Distily 0.2.0
 
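For completeness, a hypothetical usage sketch: assuming the student checkpoint is a standard gpt2-style causal LM (which is what distilling from a gpt2 teacher would produce), it should load with the usual `transformers` API. The repo id below is a placeholder, since the actual id is not shown in this diff:

```python
# Hypothetical usage sketch. "lapp0/<this-repo>" is a placeholder for the
# actual Hub repo id, which this diff does not show.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/<this-repo>"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40,
                         pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```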
logs/batch_size=4, learning_rate=0.0001, warmup_ratio=0.1/events.out.tfevents.1724071691.5f530b1cf724 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:54d5f965e4b0cbb5344bc862ba4c7cf6161f86014db1d97926ccd12cab2fe28b
+ size 312