lapp0 committed on
Commit
1269b6a
1 Parent(s): 65ef45a

End of training

README.md CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
15
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
16
 
17
  It achieves the following results on the evaluation set:
18
- - eval_enwikippl: 152.1499
19
- - eval_frwikippl: 17581.9941
20
- - eval_zhwikippl: 118321.5156
21
- - eval_tinystoriesppl: 15.7935
22
- - eval_loss: 2.4518
23
- - eval_runtime: 13.0987
24
- - eval_samples_per_second: 76.344
25
- - eval_steps_per_second: 9.543
26
 
27
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
28
  should probably proofread and complete it, then remove this comment.
@@ -47,221 +47,47 @@ More information needed
47
  The following hyperparameters were used during training:
48
  - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
49
  - train_embeddings: True
50
- - learning_rate: 0.001
51
- - train_batch_size: 1
52
  - eval_batch_size: 8
53
  - seed: 42
54
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
55
- - lr_scheduler_type: linear
56
- - lr_scheduler_warmup_ratio: 0.5
57
  - num_epochs: 1.0
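For readers unfamiliar with the removed `linear` scheduler and `lr_scheduler_warmup_ratio: 0.5` settings, here is a minimal sketch of the usual shape of such a schedule: a linear ramp to the peak rate over the first half of training, then a linear decay to zero. The function name `linear_warmup_decay_lr` is hypothetical, the total of 99000 steps is read off the last row of the removed table, and whether Distily computes the schedule exactly this way is an assumption.

```python
def linear_warmup_decay_lr(step, total_steps, peak_lr, warmup_ratio):
    """Hypothetical helper: linear warmup to peak_lr, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up proportionally to the fraction of warmup completed.
        return peak_lr * step / max(1, warmup_steps)
    # Decay proportionally to the fraction of post-warmup steps remaining.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Values taken from the removed hyperparameters (learning_rate 0.001, warmup_ratio 0.5).
for step in (0, 24750, 49500, 74250, 99000):
    print(step, linear_warmup_decay_lr(step, 99000, 0.001, 0.5))
```

Under these assumptions the learning rate peaks at step 49500, that is, at epoch 0.5.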
58
 
59
  ### Resource Usage
60
- Peak GPU Memory: 6.6058 GB
61
 
62
  ### Eval-Phase Metrics
63
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
64
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
65
  | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
66
- | 0 | 0 | 35507.3906 | 70936.2969 | 6.875 | 13.1874 | 75.83 | 9.479 | 24370.3125 | 92840.9844 |
67
- | 500 | 0.0051 | 34110.8008 | 69826.0078 | 6.8395 | 13.0724 | 76.497 | 9.562 | 23152.7441 | 92593.5781 |
68
- | 1000 | 0.0101 | 31617.0898 | 67514.4688 | 6.7745 | 13.1394 | 76.107 | 9.513 | 21238.3730 | 91635.0859 |
69
- | 1500 | 0.0152 | 31617.0898 | 67514.4688 | 6.7748 | 13.1552 | 76.016 | 9.502 | 21231.3457 | 91635.0859 |
70
- | 2000 | 0.0202 | 28284.1875 | 63896.6680 | 6.6740 | 13.0752 | 76.48 | 9.56 | 18435.6309 | 90059.6875 |
71
- | 2500 | 0.0253 | 28284.1875 | 63896.6680 | 6.6740 | 13.0689 | 76.518 | 9.565 | 18429.5488 | 90059.6875 |
72
- | 3000 | 0.0303 | 28257.9004 | 63896.6680 | 6.6740 | 13.1185 | 76.228 | 9.529 | 18426.4922 | 90059.6875 |
73
- | 3500 | 0.0354 | 22582.8105 | 58101.3398 | 6.4752 | 13.1658 | 75.954 | 9.494 | 13840.5166 | 87245.3125 |
74
- | 4000 | 0.0404 | 22582.8105 | 58101.3398 | 6.4752 | 13.1226 | 76.204 | 9.526 | 13840.5166 | 87245.3125 |
75
- | 4500 | 0.0455 | 22582.8105 | 58101.3398 | 6.4752 | 13.0807 | 76.448 | 9.556 | 13840.5166 | 87245.3125 |
76
- | 5000 | 0.0505 | 22582.8105 | 58101.3398 | 6.4752 | 13.1704 | 75.928 | 9.491 | 13840.5166 | 87245.3125 |
77
- | 5500 | 0.0556 | 22582.8105 | 58101.3398 | 6.4752 | 13.1626 | 75.973 | 9.497 | 13840.5166 | 87245.3125 |
78
- | 6000 | 0.0606 | 22582.8105 | 58101.3398 | 6.4752 | 13.2182 | 75.653 | 9.457 | 13840.5166 | 87245.3125 |
79
- | 6500 | 0.0657 | 14407.2959 | 48019.5547 | 6.0848 | 13.1028 | 76.32 | 9.54 | 7890.3164 | 83487.9453 |
80
- | 7000 | 0.0707 | 14362.7236 | 48046.6289 | 6.0840 | 13.3815 | 74.73 | 9.341 | 7870.7764 | 83309.9453 |
81
- | 7500 | 0.0758 | 14311.6279 | 48100.8203 | 6.0842 | 13.1764 | 75.893 | 9.487 | 7839.6050 | 83265.5469 |
82
- | 8000 | 0.0808 | 14311.6279 | 48100.8203 | 6.0842 | 13.0996 | 76.338 | 9.542 | 7839.6050 | 83265.5469 |
83
- | 8500 | 0.0859 | 14311.6279 | 48100.8203 | 6.0842 | 13.1322 | 76.149 | 9.519 | 7839.6050 | 83265.5469 |
84
- | 9000 | 0.0909 | 14311.6279 | 48100.8203 | 6.0842 | 13.0403 | 76.685 | 9.586 | 7839.6050 | 83265.5469 |
85
- | 9500 | 0.0960 | 14311.6279 | 48100.8203 | 6.0842 | 13.1766 | 75.892 | 9.487 | 7839.6050 | 83265.5469 |
86
- | 10000 | 0.1010 | 14311.6279 | 48100.8203 | 6.0842 | 13.094 | 76.371 | 9.546 | 7839.6050 | 83265.5469 |
87
- | 10500 | 0.1061 | 14311.6279 | 48100.8203 | 6.0842 | 13.2024 | 75.744 | 9.468 | 7839.6050 | 83265.5469 |
88
- | 11000 | 0.1111 | 14311.6279 | 48100.8203 | 6.0842 | 13.1699 | 75.931 | 9.491 | 7839.6050 | 83265.5469 |
89
- | 11500 | 0.1162 | 14311.6279 | 48100.8203 | 6.0842 | 13.2286 | 75.594 | 9.449 | 7839.6050 | 83265.5469 |
90
- | 12000 | 0.1212 | 14311.6279 | 48100.8203 | 6.0842 | 13.2412 | 75.522 | 9.44 | 7839.6050 | 83265.5469 |
91
- | 12500 | 0.1263 | 6017.6704 | 34404.9336 | 5.3323 | 13.2627 | 75.399 | 9.425 | 2598.0759 | 76922.3125 |
92
- | 13000 | 0.1313 | 5977.7178 | 34346.8438 | 5.3317 | 13.1847 | 75.846 | 9.481 | 2560.9731 | 76717.3984 |
93
- | 13500 | 0.1364 | 5907.2861 | 34288.8164 | 5.3315 | 13.1219 | 76.208 | 9.526 | 2511.0803 | 76085.4688 |
94
- | 14000 | 0.1414 | 5843.1099 | 34264.6914 | 5.3313 | 13.205 | 75.729 | 9.466 | 2471.9480 | 75923.25 |
95
- | 14500 | 0.1465 | 5821.4233 | 34245.3828 | 5.3317 | 13.1464 | 76.066 | 9.508 | 2465.4172 | 75923.25 |
96
- | 15000 | 0.1515 | 5817.8213 | 34245.3828 | 5.3313 | 13.1136 | 76.257 | 9.532 | 2460.1233 | 75963.8125 |
97
- | 15500 | 0.1566 | 5817.8213 | 34245.3828 | 5.3315 | 13.1772 | 75.888 | 9.486 | 2461.3447 | 75963.8125 |
98
- | 16000 | 0.1616 | 5817.8213 | 34245.3828 | 5.3315 | 13.1421 | 76.091 | 9.511 | 2456.8723 | 75923.25 |
99
- | 16500 | 0.1667 | 5817.8213 | 34245.3828 | 5.3315 | 13.1288 | 76.168 | 9.521 | 2457.2778 | 75923.25 |
100
- | 17000 | 0.1717 | 5844.9214 | 34284.0078 | 5.3313 | 13.1263 | 76.183 | 9.523 | 2472.3560 | 75923.25 |
101
- | 17500 | 0.1768 | 5817.8213 | 34245.3828 | 5.3313 | 13.2973 | 75.203 | 9.4 | 2458.9036 | 75923.25 |
102
- | 18000 | 0.1818 | 5817.8213 | 34226.0898 | 5.3313 | 13.2828 | 75.285 | 9.411 | 2456.0596 | 75882.7891 |
103
- | 18500 | 0.1869 | 5789.9414 | 34187.5625 | 5.3313 | 13.2709 | 75.353 | 9.419 | 2442.6958 | 75882.7891 |
104
- | 19000 | 0.1919 | 5794.4326 | 34187.5625 | 5.3315 | 13.2571 | 75.431 | 9.429 | 2446.3337 | 75882.7891 |
105
- | 19500 | 0.1970 | 5789.9414 | 34187.5625 | 5.3315 | 13.1829 | 75.856 | 9.482 | 2442.6958 | 75882.7891 |
106
- | 20000 | 0.2020 | 5773.3726 | 34129.8047 | 5.3313 | 13.2846 | 75.275 | 9.409 | 2428.6021 | 75882.7891 |
107
- | 20500 | 0.2071 | 5777.3989 | 34110.6055 | 5.3315 | 13.1517 | 76.036 | 9.504 | 2432.2180 | 75882.7891 |
108
- | 21000 | 0.2121 | 5789.9414 | 34168.2969 | 5.3313 | 13.2686 | 75.366 | 9.421 | 2441.8887 | 75882.7891 |
109
- | 21500 | 0.2172 | 5793.5317 | 34187.5625 | 5.3313 | 13.2456 | 75.497 | 9.437 | 2445.1208 | 75882.7891 |
110
- | 22000 | 0.2222 | 5817.8213 | 34245.3828 | 5.3317 | 13.1014 | 76.328 | 9.541 | 2458.9036 | 75923.25 |
111
- | 22500 | 0.2273 | 5844.9214 | 34264.6914 | 5.3313 | 13.2219 | 75.632 | 9.454 | 2471.9480 | 75963.8125 |
112
- | 23000 | 0.2323 | 5821.4233 | 34245.3828 | 5.3315 | 13.1885 | 75.823 | 9.478 | 2463.3794 | 75963.8125 |
113
- | 23500 | 0.2374 | 5794.4326 | 34187.5625 | 5.3313 | 13.1795 | 75.876 | 9.484 | 2446.3337 | 75882.7891 |
114
- | 24000 | 0.2424 | 5789.9414 | 34187.5625 | 5.3313 | 13.2051 | 75.728 | 9.466 | 2441.0808 | 75882.7891 |
115
- | 24500 | 0.2475 | 1149.3412 | 20949.2207 | 4.0070 | 13.3028 | 75.172 | 9.397 | 294.0351 | 75842.2734 |
116
- | 25000 | 0.2525 | 1074.3506 | 20752.4609 | 3.9981 | 13.1631 | 75.97 | 9.496 | 267.3478 | 73570.2891 |
117
- | 25500 | 0.2576 | 1029.8575 | 20511.2363 | 3.9924 | 13.1578 | 76.0 | 9.5 | 250.5292 | 71404.4219 |
118
- | 26000 | 0.2626 | 1012.7686 | 20467.9531 | 3.9911 | 13.2528 | 75.456 | 9.432 | 244.4531 | 70684.1406 |
119
- | 26500 | 0.2677 | 999.5186 | 20493.9121 | 3.9908 | 13.192 | 75.804 | 9.475 | 239.7501 | 70495.8516 |
120
- | 27000 | 0.2727 | 998.8997 | 20493.9121 | 3.9908 | 13.2744 | 75.333 | 9.417 | 239.4926 | 70383.0625 |
121
- | 27500 | 0.2778 | 1004.4867 | 20488.1465 | 3.9908 | 13.1485 | 76.054 | 9.507 | 241.4805 | 70420.6562 |
122
- | 28000 | 0.2828 | 1006.2000 | 20491.0391 | 3.9909 | 13.0763 | 76.474 | 9.559 | 241.6403 | 70458.2109 |
123
- | 28500 | 0.2879 | 996.3494 | 20505.4668 | 3.9909 | 13.1939 | 75.792 | 9.474 | 238.5442 | 70383.0625 |
124
- | 29000 | 0.2929 | 998.1260 | 20505.4668 | 3.9906 | 13.1778 | 75.885 | 9.486 | 238.9192 | 70383.0625 |
125
- | 29500 | 0.2980 | 997.6625 | 20499.6973 | 3.9908 | 13.213 | 75.683 | 9.46 | 238.7414 | 70307.9922 |
126
- | 30000 | 0.3030 | 998.8997 | 20493.9121 | 3.9908 | 13.1411 | 76.097 | 9.512 | 239.5519 | 70383.0625 |
127
- | 30500 | 0.3081 | 1001.5338 | 20493.9121 | 3.9908 | 13.153 | 76.028 | 9.504 | 240.7629 | 70420.6562 |
128
- | 31000 | 0.3131 | 999.8284 | 20493.9121 | 3.9910 | 13.3036 | 75.168 | 9.396 | 239.9483 | 70420.6562 |
129
- | 31500 | 0.3182 | 999.8284 | 20493.9121 | 3.9910 | 13.2387 | 75.536 | 9.442 | 239.9087 | 70420.6562 |
130
- | 32000 | 0.3232 | 999.5186 | 20493.9121 | 3.9911 | 13.1503 | 76.044 | 9.506 | 240.0277 | 70420.6562 |
131
- | 32500 | 0.3283 | 999.2094 | 20493.9121 | 3.9908 | 13.2235 | 75.623 | 9.453 | 239.6708 | 70420.6562 |
132
- | 33000 | 0.3333 | 1001.5338 | 20482.3652 | 3.9911 | 13.2764 | 75.322 | 9.415 | 240.8028 | 70458.2109 |
133
- | 33500 | 0.3384 | 1006.7457 | 20491.0391 | 3.9911 | 13.2517 | 75.462 | 9.433 | 242.1002 | 70495.8516 |
134
- | 34000 | 0.3434 | 1004.4867 | 20488.1465 | 3.9909 | 13.2678 | 75.37 | 9.421 | 241.4805 | 70458.2109 |
135
- | 34500 | 0.3485 | 1000.1384 | 20482.3652 | 3.9910 | 13.2497 | 75.473 | 9.434 | 240.4448 | 70458.2109 |
136
- | 35000 | 0.3535 | 998.5901 | 20493.9121 | 3.9906 | 13.2197 | 75.645 | 9.456 | 239.3936 | 70383.0625 |
137
- | 35500 | 0.3586 | 995.2692 | 20499.6973 | 3.9906 | 13.1934 | 75.795 | 9.474 | 237.5405 | 70195.5703 |
138
- | 36000 | 0.3636 | 1000.7585 | 20482.3652 | 3.9908 | 13.2099 | 75.701 | 9.463 | 240.6436 | 70420.6562 |
139
- | 36500 | 0.3687 | 1003.2421 | 20482.3652 | 3.9911 | 13.3105 | 75.128 | 9.391 | 241.1414 | 70458.2109 |
140
- | 37000 | 0.3737 | 1001.8443 | 20493.9121 | 3.9911 | 13.2361 | 75.551 | 9.444 | 241.0020 | 70420.6562 |
141
- | 37500 | 0.3788 | 1001.8443 | 20493.9121 | 3.9911 | 13.2705 | 75.355 | 9.419 | 241.0020 | 70420.6562 |
142
- | 38000 | 0.3838 | 1003.7086 | 20482.3652 | 3.9910 | 13.1558 | 76.012 | 9.502 | 241.1614 | 70458.2109 |
143
- | 38500 | 0.3889 | 999.5186 | 20493.9121 | 3.9908 | 13.192 | 75.804 | 9.475 | 239.8492 | 70420.6562 |
144
- | 39000 | 0.3939 | 999.5186 | 20493.9121 | 3.9908 | 13.2403 | 75.527 | 9.441 | 240.0079 | 70420.6562 |
145
- | 39500 | 0.3990 | 1005.8882 | 20488.1465 | 3.9910 | 13.1678 | 75.943 | 9.493 | 241.6403 | 70458.2109 |
146
- | 40000 | 0.4040 | 999.5186 | 20493.9121 | 3.9908 | 13.3426 | 74.948 | 9.369 | 239.9087 | 70458.2109 |
147
- | 40500 | 0.4091 | 999.2094 | 20493.9121 | 3.9908 | 13.2985 | 75.196 | 9.4 | 239.6906 | 70383.0625 |
148
- | 41000 | 0.4141 | 1005.8882 | 20488.1465 | 3.9910 | 13.2757 | 75.325 | 9.416 | 241.6403 | 70458.2109 |
149
- | 41500 | 0.4192 | 1007.5261 | 20491.0391 | 3.9914 | 13.2637 | 75.393 | 9.424 | 242.2403 | 70458.2109 |
150
- | 42000 | 0.4242 | 1002.6209 | 20482.3652 | 3.9910 | 13.1078 | 76.29 | 9.536 | 241.1016 | 70458.2109 |
151
- | 42500 | 0.4293 | 998.1260 | 20517.0273 | 3.9905 | 13.1146 | 76.251 | 9.531 | 239.2946 | 70383.0625 |
152
- | 43000 | 0.4343 | 995.4234 | 20499.6973 | 3.9908 | 13.2198 | 75.644 | 9.456 | 237.5013 | 70270.5156 |
153
- | 43500 | 0.4394 | 998.1260 | 20493.9121 | 3.9908 | 13.222 | 75.631 | 9.454 | 239.2946 | 70383.0625 |
154
- | 44000 | 0.4444 | 1002.6209 | 20482.3652 | 3.9910 | 13.1082 | 76.288 | 9.536 | 241.1614 | 70458.2109 |
155
- | 44500 | 0.4495 | 1001.2239 | 20482.3652 | 3.9910 | 13.2651 | 75.386 | 9.423 | 240.8426 | 70458.2109 |
156
- | 45000 | 0.4545 | 998.8997 | 20493.9121 | 3.9905 | 13.2549 | 75.444 | 9.43 | 239.5123 | 70383.0625 |
157
- | 45500 | 0.4596 | 1000.4484 | 20493.9121 | 3.9909 | 13.1817 | 75.863 | 9.483 | 240.0079 | 70383.0625 |
158
- | 46000 | 0.4646 | 1008.3849 | 20491.0391 | 3.9909 | 13.177 | 75.89 | 9.486 | 242.4205 | 70458.2109 |
159
- | 46500 | 0.4697 | 998.2806 | 20493.9121 | 3.9908 | 13.1683 | 75.94 | 9.492 | 239.3541 | 70383.0625 |
160
- | 47000 | 0.4747 | 999.8284 | 20493.9121 | 3.9906 | 13.2304 | 75.584 | 9.448 | 240.0872 | 70383.0625 |
161
- | 47500 | 0.4798 | 1006.2000 | 20502.5723 | 3.9909 | 13.1925 | 75.801 | 9.475 | 241.6403 | 70458.2109 |
162
- | 48000 | 0.4848 | 998.1260 | 20517.0273 | 3.9909 | 13.1673 | 75.946 | 9.493 | 239.3145 | 70383.0625 |
163
- | 48500 | 0.4899 | 134.8255 | 30153.1016 | 2.7420 | 13.1443 | 76.078 | 9.51 | 11.1570 | 235067.75 |
164
- | 49000 | 0.4949 | 128.3738 | 24338.1211 | 2.6060 | 13.2042 | 75.733 | 9.467 | 11.0835 | 188877.9219 |
165
- | 49500 | 0.5 | 148.0629 | 18772.1543 | 2.4684 | 13.2334 | 75.566 | 9.446 | 14.6732 | 138973.7031 |
166
- | 50000 | 0.5051 | 156.5679 | 16792.9102 | 2.4576 | 13.2265 | 75.606 | 9.451 | 16.5473 | 114776.9922 |
167
- | 50500 | 0.5101 | 153.5055 | 17701.2773 | 2.4520 | 13.1908 | 75.81 | 9.476 | 15.9536 | 120102.6562 |
168
- | 51000 | 0.5152 | 152.0734 | 17676.3613 | 2.4519 | 13.1825 | 75.858 | 9.482 | 15.7641 | 118195.3203 |
169
- | 51500 | 0.5202 | 152.6280 | 17636.5723 | 2.4518 | 13.1021 | 76.324 | 9.54 | 15.8917 | 118700.9297 |
170
- | 52000 | 0.5253 | 151.8144 | 17495.5352 | 2.4516 | 13.1172 | 76.236 | 9.529 | 15.7746 | 117315.6719 |
171
- | 52500 | 0.5303 | 152.1971 | 17532.5312 | 2.4516 | 13.1603 | 75.986 | 9.498 | 15.8268 | 118258.4609 |
172
- | 53000 | 0.5354 | 152.6162 | 17552.3066 | 2.4518 | 13.0995 | 76.339 | 9.542 | 15.8963 | 118321.5156 |
173
- | 53500 | 0.5404 | 152.0969 | 17641.5352 | 2.4519 | 13.1548 | 76.018 | 9.502 | 15.7628 | 119272.3828 |
174
- | 54000 | 0.5455 | 152.1971 | 17581.9941 | 2.4516 | 13.1914 | 75.807 | 9.476 | 15.8170 | 118258.4609 |
175
- | 54500 | 0.5505 | 152.1440 | 17581.9941 | 2.4516 | 13.1116 | 76.269 | 9.534 | 15.7967 | 118258.4609 |
176
- | 55000 | 0.5556 | 152.3032 | 17631.5938 | 2.4519 | 13.162 | 75.976 | 9.497 | 15.7961 | 118954.5469 |
177
- | 55500 | 0.5606 | 151.3857 | 17495.5352 | 2.4518 | 13.1938 | 75.793 | 9.474 | 15.7173 | 117159.2578 |
178
- | 56000 | 0.5657 | 152.5571 | 17581.9941 | 2.4518 | 13.1714 | 75.922 | 9.49 | 15.8812 | 118574.3203 |
179
- | 56500 | 0.5707 | 151.2334 | 17505.3984 | 2.4515 | 13.108 | 76.289 | 9.536 | 15.6907 | 116971.9219 |
180
- | 57000 | 0.5758 | 152.5334 | 17542.4160 | 2.4516 | 13.1795 | 75.875 | 9.484 | 15.8891 | 118258.4609 |
181
- | 57500 | 0.5808 | 152.4154 | 17611.7480 | 2.4519 | 13.2191 | 75.648 | 9.456 | 15.8412 | 118764.3359 |
182
- | 58000 | 0.5859 | 152.0144 | 17537.4824 | 2.4516 | 13.1568 | 76.006 | 9.501 | 15.7739 | 118069.25 |
183
- | 58500 | 0.5909 | 152.4980 | 17542.4160 | 2.4516 | 13.2239 | 75.621 | 9.453 | 15.8773 | 118258.4609 |
184
- | 59000 | 0.5960 | 151.9203 | 17552.3066 | 2.4518 | 13.2703 | 75.356 | 9.42 | 15.7687 | 118321.5156 |
185
- | 59500 | 0.6010 | 152.4862 | 17621.6758 | 2.4516 | 13.1748 | 75.903 | 9.488 | 15.8825 | 118574.3203 |
186
- | 60000 | 0.6061 | 152.3976 | 17601.8242 | 2.4518 | 13.2989 | 75.194 | 9.399 | 15.8163 | 118511.125 |
187
- | 60500 | 0.6111 | 152.4862 | 17581.9941 | 2.4518 | 13.2397 | 75.53 | 9.441 | 15.8727 | 118764.3359 |
188
- | 61000 | 0.6162 | 151.4678 | 17525.1270 | 2.4516 | 13.1482 | 76.056 | 9.507 | 15.7264 | 118258.4609 |
189
- | 61500 | 0.6212 | 152.4154 | 17562.2012 | 2.4516 | 13.1563 | 76.009 | 9.501 | 15.8595 | 118511.125 |
190
- | 62000 | 0.6263 | 152.8055 | 17552.3066 | 2.4516 | 13.1118 | 76.267 | 9.533 | 15.9160 | 118447.8516 |
191
- | 62500 | 0.6313 | 152.0144 | 17581.9941 | 2.4519 | 13.1961 | 75.78 | 9.473 | 15.7661 | 118447.8516 |
192
- | 63000 | 0.6364 | 152.7227 | 17621.6758 | 2.4516 | 13.2102 | 75.699 | 9.462 | 15.8904 | 118637.6641 |
193
- | 63500 | 0.6414 | 152.1971 | 17572.1035 | 2.4516 | 13.3278 | 75.031 | 9.379 | 15.8111 | 118321.5156 |
194
- | 64000 | 0.6465 | 152.0969 | 17507.8535 | 2.4516 | 13.2988 | 75.195 | 9.399 | 15.8144 | 118195.3203 |
195
- | 64500 | 0.6515 | 152.8173 | 17636.5723 | 2.4519 | 13.2059 | 75.724 | 9.465 | 15.9147 | 119018.0938 |
196
- | 65000 | 0.6566 | 151.9673 | 17562.2012 | 2.4519 | 13.3332 | 75.001 | 9.375 | 15.7674 | 117503.6719 |
197
- | 65500 | 0.6616 | 152.4862 | 17631.5938 | 2.4518 | 13.2498 | 75.473 | 9.434 | 15.8779 | 118447.8516 |
198
- | 66000 | 0.6667 | 152.0144 | 17542.4160 | 2.4518 | 13.1884 | 75.824 | 9.478 | 15.7967 | 117754.7266 |
199
- | 66500 | 0.6717 | 152.4862 | 17552.3066 | 2.4516 | 13.2577 | 75.428 | 9.429 | 15.8845 | 118511.125 |
200
- | 67000 | 0.6768 | 152.4626 | 17552.3066 | 2.4516 | 13.294 | 75.222 | 9.403 | 15.8759 | 118511.125 |
201
- | 67500 | 0.6818 | 152.2089 | 17581.9941 | 2.4518 | 13.1401 | 76.103 | 9.513 | 15.8118 | 118637.6641 |
202
- | 68000 | 0.6869 | 151.8967 | 17581.9941 | 2.4519 | 13.2487 | 75.479 | 9.435 | 15.7505 | 118321.5156 |
203
- | 68500 | 0.6919 | 152.5098 | 17581.9941 | 2.4519 | 13.1358 | 76.128 | 9.516 | 15.8792 | 118764.3359 |
204
- | 69000 | 0.6970 | 152.1204 | 17572.1035 | 2.4516 | 13.1347 | 76.134 | 9.517 | 15.7889 | 118258.4609 |
205
- | 69500 | 0.7020 | 152.4390 | 17572.1035 | 2.4516 | 13.3358 | 74.986 | 9.373 | 15.8635 | 117943.3203 |
206
- | 70000 | 0.7071 | 152.4095 | 17581.9941 | 2.4516 | 13.2272 | 75.602 | 9.45 | 15.8464 | 118258.4609 |
207
- | 70500 | 0.7121 | 152.1971 | 17581.9941 | 2.4518 | 13.2901 | 75.244 | 9.405 | 15.8007 | 118321.5156 |
208
- | 71000 | 0.7172 | 152.4744 | 17581.9941 | 2.4519 | 13.2844 | 75.276 | 9.41 | 15.8681 | 118574.3203 |
209
- | 71500 | 0.7222 | 152.4862 | 17661.4316 | 2.4520 | 13.1669 | 75.948 | 9.494 | 15.8517 | 119145.1719 |
210
- | 72000 | 0.7273 | 152.4390 | 17621.6758 | 2.4518 | 13.1698 | 75.931 | 9.491 | 15.8615 | 118511.125 |
211
- | 72500 | 0.7323 | 152.4095 | 17581.9941 | 2.4518 | 13.2893 | 75.249 | 9.406 | 15.8464 | 118321.5156 |
212
- | 73000 | 0.7374 | 151.8967 | 17517.7246 | 2.4518 | 13.2449 | 75.501 | 9.438 | 15.7661 | 118258.4609 |
213
- | 73500 | 0.7424 | 152.1499 | 17532.5312 | 2.4516 | 13.2259 | 75.609 | 9.451 | 15.8268 | 117880.4609 |
214
- | 74000 | 0.7475 | 152.3032 | 17572.1035 | 2.4516 | 13.239 | 75.535 | 9.442 | 15.8320 | 118069.25 |
215
- | 74500 | 0.7525 | 152.7108 | 17581.9941 | 2.4516 | 13.2845 | 75.276 | 9.409 | 15.8891 | 118447.8516 |
216
- | 75000 | 0.7576 | 152.4862 | 17646.5156 | 2.4519 | 13.2765 | 75.321 | 9.415 | 15.8681 | 118891.1484 |
217
- | 75500 | 0.7626 | 152.1499 | 17601.8242 | 2.4519 | 13.2882 | 75.255 | 9.407 | 15.7863 | 118511.125 |
218
- | 76000 | 0.7677 | 152.1440 | 17581.9941 | 2.4518 | 13.3023 | 75.175 | 9.397 | 15.7863 | 118384.7266 |
219
- | 76500 | 0.7727 | 152.2089 | 17581.9941 | 2.4518 | 13.2007 | 75.754 | 9.469 | 15.7980 | 118384.7266 |
220
- | 77000 | 0.7778 | 152.1440 | 17611.7480 | 2.4518 | 13.1838 | 75.851 | 9.481 | 15.7863 | 118447.8516 |
221
- | 77500 | 0.7828 | 151.8967 | 17572.1035 | 2.4516 | 13.2756 | 75.326 | 9.416 | 15.7550 | 118258.4609 |
222
- | 78000 | 0.7879 | 152.3976 | 17581.9941 | 2.4518 | 13.1946 | 75.789 | 9.474 | 15.8255 | 118384.7266 |
223
- | 78500 | 0.7929 | 151.8497 | 17581.9941 | 2.4518 | 13.2414 | 75.521 | 9.44 | 15.7472 | 118447.8516 |
224
- | 79000 | 0.7980 | 151.6968 | 17581.9941 | 2.4518 | 13.225 | 75.615 | 9.452 | 15.7374 | 118447.8516 |
225
- | 79500 | 0.8030 | 152.1440 | 17581.9941 | 2.4518 | 13.247 | 75.489 | 9.436 | 15.7863 | 118321.5156 |
226
- | 80000 | 0.8081 | 152.4390 | 17611.7480 | 2.4519 | 13.2043 | 75.733 | 9.467 | 15.8589 | 118574.3203 |
227
- | 80500 | 0.8131 | 152.4390 | 17641.5352 | 2.4519 | 13.2656 | 75.383 | 9.423 | 15.8504 | 118764.3359 |
228
- | 81000 | 0.8182 | 152.0144 | 17601.8242 | 2.4519 | 13.2737 | 75.337 | 9.417 | 15.7674 | 118321.5156 |
229
- | 81500 | 0.8232 | 152.1440 | 17611.7480 | 2.4519 | 13.2567 | 75.434 | 9.429 | 15.7824 | 118447.8516 |
230
- | 82000 | 0.8283 | 151.6146 | 17581.9941 | 2.4518 | 13.1554 | 76.015 | 9.502 | 15.7329 | 118258.4609 |
231
- | 82500 | 0.8333 | 152.0734 | 17572.1035 | 2.4518 | 13.2061 | 75.722 | 9.465 | 15.7700 | 118321.5156 |
232
- | 83000 | 0.8384 | 152.2561 | 17562.2012 | 2.4518 | 13.0615 | 76.561 | 9.57 | 15.8007 | 118447.8516 |
233
- | 83500 | 0.8434 | 152.4390 | 17581.9941 | 2.4518 | 13.1488 | 76.052 | 9.507 | 15.8648 | 118637.6641 |
234
- | 84000 | 0.8485 | 151.9673 | 17572.1035 | 2.4518 | 13.1842 | 75.848 | 9.481 | 15.7700 | 118447.8516 |
235
- | 84500 | 0.8535 | 152.2561 | 17581.9941 | 2.4518 | 13.1531 | 76.028 | 9.503 | 15.8046 | 118511.125 |
236
- | 85000 | 0.8586 | 151.5383 | 17559.7227 | 2.4518 | 13.0464 | 76.65 | 9.581 | 15.7303 | 118069.25 |
237
- | 85500 | 0.8636 | 152.3976 | 17572.1035 | 2.4518 | 13.082 | 76.441 | 9.555 | 15.8203 | 118195.3203 |
238
- | 86000 | 0.8687 | 151.8497 | 17572.1035 | 2.4518 | 13.1831 | 75.854 | 9.482 | 15.7537 | 118258.4609 |
239
- | 86500 | 0.8737 | 151.8497 | 17581.9941 | 2.4518 | 13.1182 | 76.23 | 9.529 | 15.7537 | 118321.5156 |
240
- | 87000 | 0.8788 | 151.7203 | 17581.9941 | 2.4518 | 13.1367 | 76.123 | 9.515 | 15.7452 | 118258.4609 |
241
- | 87500 | 0.8838 | 151.8497 | 17581.9941 | 2.4516 | 13.217 | 75.66 | 9.458 | 15.7498 | 118258.4609 |
242
- | 88000 | 0.8889 | 152.0615 | 17581.9941 | 2.4518 | 13.1568 | 76.006 | 9.501 | 15.7700 | 118384.7266 |
243
- | 88500 | 0.8939 | 152.2442 | 17581.9941 | 2.4518 | 13.1319 | 76.15 | 9.519 | 15.7961 | 118384.7266 |
244
- | 89000 | 0.8990 | 152.1971 | 17581.9941 | 2.4518 | 13.0928 | 76.378 | 9.547 | 15.7974 | 118447.8516 |
245
- | 89500 | 0.9040 | 151.7203 | 17562.2012 | 2.4518 | 13.1415 | 76.095 | 9.512 | 15.7433 | 118447.8516 |
246
- | 90000 | 0.9091 | 151.8497 | 17572.1035 | 2.4518 | 13.1841 | 75.849 | 9.481 | 15.7537 | 118447.8516 |
247
- | 90500 | 0.9141 | 151.9673 | 17581.9941 | 2.4518 | 13.1491 | 76.051 | 9.506 | 15.7628 | 118384.7266 |
248
- | 91000 | 0.9192 | 151.8497 | 17572.1035 | 2.4516 | 13.1755 | 75.899 | 9.487 | 15.7537 | 118258.4609 |
249
- | 91500 | 0.9242 | 151.6498 | 17572.1035 | 2.4518 | 13.0781 | 76.464 | 9.558 | 15.7413 | 118258.4609 |
250
- | 92000 | 0.9293 | 151.6146 | 17549.8301 | 2.4518 | 13.1794 | 75.876 | 9.485 | 15.7400 | 118132.3203 |
251
- | 92500 | 0.9343 | 151.9673 | 17572.1035 | 2.4516 | 13.1589 | 75.994 | 9.499 | 15.7661 | 118258.4609 |
252
- | 93000 | 0.9394 | 152.0615 | 17572.1035 | 2.4518 | 13.1847 | 75.846 | 9.481 | 15.7791 | 118258.4609 |
253
- | 93500 | 0.9444 | 151.9673 | 17572.1035 | 2.4516 | 13.298 | 75.199 | 9.4 | 15.7628 | 118258.4609 |
254
- | 94000 | 0.9495 | 151.8497 | 17552.3066 | 2.4516 | 13.2436 | 75.508 | 9.439 | 15.7537 | 118132.3203 |
255
- | 94500 | 0.9545 | 151.8967 | 17552.3066 | 2.4516 | 13.1788 | 75.88 | 9.485 | 15.7583 | 118195.3203 |
256
- | 95000 | 0.9596 | 151.8967 | 17552.3066 | 2.4518 | 13.173 | 75.913 | 9.489 | 15.7596 | 118258.4609 |
257
- | 95500 | 0.9646 | 151.8967 | 17562.2012 | 2.4516 | 13.2802 | 75.3 | 9.413 | 15.7583 | 118258.4609 |
258
- | 96000 | 0.9697 | 152.0615 | 17572.1035 | 2.4516 | 13.2413 | 75.522 | 9.44 | 15.7713 | 118258.4609 |
259
- | 96500 | 0.9747 | 152.1440 | 17581.9941 | 2.4516 | 13.1021 | 76.324 | 9.54 | 15.7876 | 118321.5156 |
260
- | 97000 | 0.9798 | 152.1440 | 17581.9941 | 2.4516 | 13.1182 | 76.23 | 9.529 | 15.7922 | 118321.5156 |
261
- | 97500 | 0.9848 | 152.1971 | 17581.9941 | 2.4518 | 13.2024 | 75.744 | 9.468 | 15.7974 | 118321.5156 |
262
- | 98000 | 0.9899 | 152.1499 | 17581.9941 | 2.4518 | 13.1624 | 75.974 | 9.497 | 15.7961 | 118321.5156 |
263
- | 98500 | 0.9949 | 152.1499 | 17581.9941 | 2.4518 | 13.1912 | 75.808 | 9.476 | 15.7935 | 118321.5156 |
264
- | 99000 | 1.0 | 152.1499 | 17581.9941 | 2.4518 | 13.0987 | 76.344 | 9.543 | 15.7935 | 118321.5156 |
265
 
266
  ### Framework versions
267
  - Distily 0.2.0
 
15
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
16
 
17
  It achieves the following results on the evaluation set:
18
+ - eval_enwikippl: 28257.9004
19
+ - eval_frwikippl: 63896.6680
20
+ - eval_zhwikippl: 90059.6875
21
+ - eval_tinystoriesppl: 18426.4922
22
+ - eval_loss: 6.6740
23
+ - eval_runtime: 13.137
24
+ - eval_samples_per_second: 76.121
25
+ - eval_steps_per_second: 9.515
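The runtime and throughput figures above are mutually consistent, which can be checked with a little arithmetic. The sketch below multiplies the reported rates by the reported runtime. The conclusion that the evaluation set holds about 1000 samples processed in about 125 batches is an inference from these numbers, not something the model card states, and the helper name `implied_counts` is hypothetical.

```python
def implied_counts(runtime_s, samples_per_second, steps_per_second):
    """Hypothetical helper: recover approximate sample and step counts."""
    samples = round(runtime_s * samples_per_second)
    steps = round(runtime_s * steps_per_second)
    return samples, steps


# Figures copied from the added eval results above.
samples, steps = implied_counts(13.137, 76.121, 9.515)
print(samples, steps, samples / steps)
```

The ratio of samples to steps comes out at 8, matching the listed `eval_batch_size`.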
26
 
27
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
28
  should probably proofread and complete it, then remove this comment.
 
47
  The following hyperparameters were used during training:
48
  - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
49
  - train_embeddings: True
50
+ - learning_rate: 4e-05
51
+ - train_batch_size: 8
52
  - eval_batch_size: 8
53
  - seed: 42
54
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
55
+ - lr_scheduler_type: constant
 
56
  - num_epochs: 1.0
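The `distillation_objective` above puts all of the weight on a `kl` loss over logits, with the hidden-state and attention components weighted 0. As a rough illustration of what a KL-based logits loss measures, here is a dependency-free sketch for a single token position. The function names are hypothetical, and the direction (teacher relative to student) and the lack of temperature scaling or batch reduction are assumptions; Distily's actual implementation may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_logits_loss(teacher_logits, student_logits):
    """Sketch of KL(teacher || student) for one position (assumed direction)."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Toy logits over a 3-token vocabulary, invented purely for illustration.
print(kl_logits_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))  # identical distributions
print(kl_logits_loss([2.0, 0.5, -1.0], [0.0, 0.0, 0.0]))   # student is uniform
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge.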
57
 
58
  ### Resource Usage
59
+ Peak GPU Memory: 8.0568 GB
60
 
61
  ### Eval-Phase Metrics
62
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
63
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
64
  | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
65
+ | 0 | 0 | 35507.3906 | 70936.2969 | 6.875 | 13.2774 | 75.316 | 9.414 | 24370.3125 | 92840.9844 |
66
+ | 500 | 0.0404 | 28284.1875 | 63896.6680 | 6.6737 | 13.1884 | 75.824 | 9.478 | 18447.8379 | 90059.6875 |
67
+ | 1000 | 0.0808 | 28284.1875 | 63896.6680 | 6.6740 | 13.221 | 75.637 | 9.455 | 18444.7754 | 90059.6875 |
68
+ | 1500 | 0.1212 | 28284.1875 | 63896.6680 | 6.6740 | 13.1643 | 75.963 | 9.495 | 18444.7754 | 90059.6875 |
69
+ | 2000 | 0.1616 | 28284.1875 | 63896.6680 | 6.6740 | 13.2331 | 75.568 | 9.446 | 18438.6914 | 90059.6875 |
70
+ | 2500 | 0.2020 | 28284.1875 | 63896.6680 | 6.6740 | 13.1865 | 75.835 | 9.479 | 18432.5898 | 90059.6875 |
71
+ | 3000 | 0.2424 | 28257.9004 | 63896.6680 | 6.6740 | 13.246 | 75.494 | 9.437 | 18426.4922 | 90059.6875 |
72
+ | 3500 | 0.2828 | 28257.9004 | 63896.6680 | 6.6740 | 13.1762 | 75.895 | 9.487 | 18426.4922 | 90059.6875 |
73
+ | 4000 | 0.3232 | 28257.9004 | 63896.6680 | 6.6740 | 13.3585 | 74.859 | 9.357 | 18426.4922 | 90059.6875 |
74
+ | 4500 | 0.3636 | 28257.9004 | 63896.6680 | 6.6740 | 13.1842 | 75.848 | 9.481 | 18426.4922 | 90059.6875 |
75
+ | 5000 | 0.4040 | 28257.9004 | 63896.6680 | 6.6740 | 13.2694 | 75.361 | 9.42 | 18426.4922 | 90059.6875 |
76
+ | 5500 | 0.4444 | 28257.9004 | 63896.6680 | 6.6740 | 13.2102 | 75.699 | 9.462 | 18426.4922 | 90059.6875 |
77
+ | 6000 | 0.4848 | 28257.9004 | 63896.6680 | 6.6740 | 13.3012 | 75.181 | 9.398 | 18426.4922 | 90059.6875 |
78
+ | 6500 | 0.5253 | 28257.9004 | 63896.6680 | 6.6740 | 13.1704 | 75.928 | 9.491 | 18426.4922 | 90059.6875 |
79
+ | 7000 | 0.5657 | 28257.9004 | 63896.6680 | 6.6740 | 13.2236 | 75.622 | 9.453 | 18426.4922 | 90059.6875 |
80
+ | 7500 | 0.6061 | 28257.9004 | 63896.6680 | 6.6740 | 13.2333 | 75.567 | 9.446 | 18426.4922 | 90059.6875 |
81
+ | 8000 | 0.6465 | 28257.9004 | 63896.6680 | 6.6740 | 13.1385 | 76.112 | 9.514 | 18426.4922 | 90059.6875 |
82
+ | 8500 | 0.6869 | 28257.9004 | 63896.6680 | 6.6740 | 13.2297 | 75.588 | 9.448 | 18426.4922 | 90059.6875 |
83
+ | 9000 | 0.7273 | 28257.9004 | 63896.6680 | 6.6740 | 13.1073 | 76.293 | 9.537 | 18426.4922 | 90059.6875 |
84
+ | 9500 | 0.7677 | 28257.9004 | 63896.6680 | 6.6740 | 13.137 | 76.121 | 9.515 | 18426.4922 | 90059.6875 |
85
+ | 10000 | 0.8081 | 28257.9004 | 63896.6680 | 6.6740 | 13.0862 | 76.417 | 9.552 | 18426.4922 | 90059.6875 |
86
+ | 10500 | 0.8485 | 28257.9004 | 63896.6680 | 6.6740 | 13.17 | 75.93 | 9.491 | 18426.4922 | 90059.6875 |
87
+ | 11000 | 0.8889 | 28257.9004 | 63896.6680 | 6.6740 | 13.211 | 75.694 | 9.462 | 18426.4922 | 90059.6875 |
88
+ | 11500 | 0.9293 | 28257.9004 | 63896.6680 | 6.6740 | 13.1171 | 76.237 | 9.53 | 18426.4922 | 90059.6875 |
89
+ | 12000 | 0.9697 | 28257.9004 | 63896.6680 | 6.6740 | 13.2484 | 75.481 | 9.435 | 18426.4922 | 90059.6875 |
90
+ | 12375 | 1.0 | 28257.9004 | 63896.6680 | 6.6740 | 13.2116 | 75.691 | 9.461 | 18426.4922 | 90059.6875 |
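The step counts in the two tables follow from the batch-size change. The removed run reached epoch 1.0 at step 99000 with `train_batch_size: 1`, while the added run reaches it at step 12375 with `train_batch_size: 8`. A short sketch of that arithmetic follows. The figure of 99000 training samples is inferred from the removed table rather than stated in the model card, and both helper names are hypothetical.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Optimizer steps needed to see every sample once (no gradient accumulation assumed)."""
    return math.ceil(num_samples / batch_size)


def epoch_at(step, total_steps):
    """Fraction of the epoch completed, rounded as in the tables."""
    return round(step / total_steps, 4)


NUM_SAMPLES = 99000  # inferred: 99000 steps at batch size 1 equals one epoch

print(steps_per_epoch(NUM_SAMPLES, 1))   # removed run
print(steps_per_epoch(NUM_SAMPLES, 8))   # added run
print(epoch_at(500, steps_per_epoch(NUM_SAMPLES, 8)))
```

This also explains why step 500 corresponds to epoch 0.0051 in the removed table but 0.0404 in the added one.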
91
 
92
  ### Framework versions
93
  - Distily 0.2.0
logs/events.out.tfevents.1723989018.93d6cbb3ad53 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:438ef4a30b0b2593692b1ac3a19f0c743ee2bcc791e763b8c5b8b0fdf520d3cf
3
+ size 5859432
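The three added lines form a Git LFS pointer: the repository stores this small text stub while the real file lives in LFS storage. As an illustration, the following sketch parses such a pointer into its parts. The function name `parse_lfs_pointer` is hypothetical, and the parser only handles the three keys that appear in this diff.

```python
def parse_lfs_pointer(text):
    """Hypothetical helper: split a Git LFS pointer into version, hash, and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents copied from the hunk above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:438ef4a30b0b2593692b1ac3a19f0c743ee2bcc791e763b8c5b8b0fdf520d3cf
size 5859432
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size_bytes"])
```

The `size` field is the byte length of the real object, so this event log is about 5.9 MB, whereas the pointer itself is only a few lines of text.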
logs/events.out.tfevents.1723991806.93d6cbb3ad53 ADDED
@@ -0,0 +1,3 @@
 
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:07761d2e0f9002fc11b646370a66d7fe112ff8591fb441f5a6e457b7ee19ef1b
3
+ size 307
logs/learning_rate=0.001, lr_scheduler_type=linear, per_device_train_batch_size=1, warmup_ratio=0.5/completed.flag ADDED
File without changes
logs/learning_rate=0.001, lr_scheduler_type=linear, per_device_train_batch_size=1, warmup_ratio=0.5/events.out.tfevents.1723988864.93d6cbb3ad53 CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:7e2b4340121398f18458f794194962de6e0e2c843e47716e1d3b8bfd4edca670
3
- size 312
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2d5301c950a082d8b6f48b16a8d2cf8ce680f5fef2934e2ec0f7f9b6c8b998a7
3
+ size 588
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:0c9fca5f7c327628cb96b57c1d7e2688dbe0324ab9bb072927b72b7bac7ea310
3
  size 137033984
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e728acb318ba05b5f36bf136d346ef1f3ce69b7247fe37b21b6097691677009e
3
  size 137033984
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:227089a157ab33c7a2001ae05505709a997e85675ec0725f7eafd19b8e859138
3
- size 1017948104
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e61b0b87b1c473f9bd2bbb0edd556da12cbc4e1b0589ef638d5b437883b91e4d
3
+ size 1017947976