End of training
- README.md +38 -212
- logs/events.out.tfevents.1723989018.93d6cbb3ad53 +3 -0
- logs/events.out.tfevents.1723991806.93d6cbb3ad53 +3 -0
- logs/learning_rate=0.001, lr_scheduler_type=linear, per_device_train_batch_size=1, warmup_ratio=0.5/completed.flag +0 -0
- logs/learning_rate=0.001, lr_scheduler_type=linear, per_device_train_batch_size=1, warmup_ratio=0.5/events.out.tfevents.1723988864.93d6cbb3ad53 +2 -2
- model.safetensors +1 -1
- training_args.bin +2 -2
README.md
CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
|
|
15 |
The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
|
16 |
|
17 |
It achieves the following results on the evaluation set:
|
18 |
-
- eval_enwikippl:
|
19 |
-
- eval_frwikippl:
|
20 |
-
- eval_zhwikippl:
|
21 |
-
- eval_tinystoriesppl:
|
22 |
-
- eval_loss:
|
23 |
-
- eval_runtime: 13.
|
24 |
-
- eval_samples_per_second: 76.
|
25 |
-
- eval_steps_per_second: 9.
|
26 |
|
27 |
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
|
28 |
should probably proofread and complete it, then remove this comment.
|
@@ -47,221 +47,47 @@ More information needed
|
|
47 |
The following hyperparameters were used during training:
|
48 |
- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
|
49 |
- train_embeddings: True
|
50 |
-
- learning_rate:
|
51 |
-
- train_batch_size:
|
52 |
- eval_batch_size: 8
|
53 |
- seed: 42
|
54 |
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
|
55 |
-
- lr_scheduler_type:
|
56 |
-
- lr_scheduler_warmup_ratio: 0.5
|
57 |
- num_epochs: 1.0
|
58 |
|
59 |
### Resource Usage
|
60 |
-
Peak GPU Memory:
|
61 |
|
62 |
### Eval-Phase Metrics
|
63 |
| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
|
64 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|
65 |
| **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
|
66 |
-
| 0 | 0 | 35507.3906 | 70936.2969 | 6.875 | 13.
|
67 |
-
| 500 | 0.
|
68 |
-
| 1000 | 0.
|
69 |
-
| 1500 | 0.
|
70 |
-
| 2000 | 0.
|
71 |
-
| 2500 | 0.
|
72 |
-
| 3000 | 0.
|
73 |
-
| 3500 | 0.
|
74 |
-
| 4000 | 0.
|
75 |
-
| 4500 | 0.
|
76 |
-
| 5000 | 0.
|
77 |
-
| 5500 | 0.
|
78 |
-
| 6000 | 0.
|
79 |
-
| 6500 | 0.
|
80 |
-
| 7000 | 0.
|
81 |
-
| 7500 | 0.
|
82 |
-
| 8000 | 0.
|
83 |
-
| 8500 | 0.
|
84 |
-
| 9000 | 0.
|
85 |
-
| 9500 | 0.
|
86 |
-
| 10000 | 0.
|
87 |
-
| 10500 | 0.
|
88 |
-
| 11000 | 0.
|
89 |
-
| 11500 | 0.
|
90 |
-
| 12000 | 0.
|
91 |
-
|
|
92 |
-
| 13000 | 0.1313 | 5977.7178 | 34346.8438 | 5.3317 | 13.1847 | 75.846 | 9.481 | 2560.9731 | 76717.3984 |
|
93 |
-
| 13500 | 0.1364 | 5907.2861 | 34288.8164 | 5.3315 | 13.1219 | 76.208 | 9.526 | 2511.0803 | 76085.4688 |
|
94 |
-
| 14000 | 0.1414 | 5843.1099 | 34264.6914 | 5.3313 | 13.205 | 75.729 | 9.466 | 2471.9480 | 75923.25 |
|
95 |
-
| 14500 | 0.1465 | 5821.4233 | 34245.3828 | 5.3317 | 13.1464 | 76.066 | 9.508 | 2465.4172 | 75923.25 |
|
96 |
-
| 15000 | 0.1515 | 5817.8213 | 34245.3828 | 5.3313 | 13.1136 | 76.257 | 9.532 | 2460.1233 | 75963.8125 |
|
97 |
-
| 15500 | 0.1566 | 5817.8213 | 34245.3828 | 5.3315 | 13.1772 | 75.888 | 9.486 | 2461.3447 | 75963.8125 |
|
98 |
-
| 16000 | 0.1616 | 5817.8213 | 34245.3828 | 5.3315 | 13.1421 | 76.091 | 9.511 | 2456.8723 | 75923.25 |
|
99 |
-
| 16500 | 0.1667 | 5817.8213 | 34245.3828 | 5.3315 | 13.1288 | 76.168 | 9.521 | 2457.2778 | 75923.25 |
|
100 |
-
| 17000 | 0.1717 | 5844.9214 | 34284.0078 | 5.3313 | 13.1263 | 76.183 | 9.523 | 2472.3560 | 75923.25 |
|
101 |
-
| 17500 | 0.1768 | 5817.8213 | 34245.3828 | 5.3313 | 13.2973 | 75.203 | 9.4 | 2458.9036 | 75923.25 |
|
102 |
-
| 18000 | 0.1818 | 5817.8213 | 34226.0898 | 5.3313 | 13.2828 | 75.285 | 9.411 | 2456.0596 | 75882.7891 |
|
103 |
-
| 18500 | 0.1869 | 5789.9414 | 34187.5625 | 5.3313 | 13.2709 | 75.353 | 9.419 | 2442.6958 | 75882.7891 |
|
104 |
-
| 19000 | 0.1919 | 5794.4326 | 34187.5625 | 5.3315 | 13.2571 | 75.431 | 9.429 | 2446.3337 | 75882.7891 |
|
105 |
-
| 19500 | 0.1970 | 5789.9414 | 34187.5625 | 5.3315 | 13.1829 | 75.856 | 9.482 | 2442.6958 | 75882.7891 |
|
106 |
-
| 20000 | 0.2020 | 5773.3726 | 34129.8047 | 5.3313 | 13.2846 | 75.275 | 9.409 | 2428.6021 | 75882.7891 |
|
107 |
-
| 20500 | 0.2071 | 5777.3989 | 34110.6055 | 5.3315 | 13.1517 | 76.036 | 9.504 | 2432.2180 | 75882.7891 |
|
108 |
-
| 21000 | 0.2121 | 5789.9414 | 34168.2969 | 5.3313 | 13.2686 | 75.366 | 9.421 | 2441.8887 | 75882.7891 |
|
109 |
-
| 21500 | 0.2172 | 5793.5317 | 34187.5625 | 5.3313 | 13.2456 | 75.497 | 9.437 | 2445.1208 | 75882.7891 |
|
110 |
-
| 22000 | 0.2222 | 5817.8213 | 34245.3828 | 5.3317 | 13.1014 | 76.328 | 9.541 | 2458.9036 | 75923.25 |
|
111 |
-
| 22500 | 0.2273 | 5844.9214 | 34264.6914 | 5.3313 | 13.2219 | 75.632 | 9.454 | 2471.9480 | 75963.8125 |
|
112 |
-
| 23000 | 0.2323 | 5821.4233 | 34245.3828 | 5.3315 | 13.1885 | 75.823 | 9.478 | 2463.3794 | 75963.8125 |
|
113 |
-
| 23500 | 0.2374 | 5794.4326 | 34187.5625 | 5.3313 | 13.1795 | 75.876 | 9.484 | 2446.3337 | 75882.7891 |
|
114 |
-
| 24000 | 0.2424 | 5789.9414 | 34187.5625 | 5.3313 | 13.2051 | 75.728 | 9.466 | 2441.0808 | 75882.7891 |
|
115 |
-
| 24500 | 0.2475 | 1149.3412 | 20949.2207 | 4.0070 | 13.3028 | 75.172 | 9.397 | 294.0351 | 75842.2734 |
|
116 |
-
| 25000 | 0.2525 | 1074.3506 | 20752.4609 | 3.9981 | 13.1631 | 75.97 | 9.496 | 267.3478 | 73570.2891 |
|
117 |
-
| 25500 | 0.2576 | 1029.8575 | 20511.2363 | 3.9924 | 13.1578 | 76.0 | 9.5 | 250.5292 | 71404.4219 |
|
118 |
-
| 26000 | 0.2626 | 1012.7686 | 20467.9531 | 3.9911 | 13.2528 | 75.456 | 9.432 | 244.4531 | 70684.1406 |
|
119 |
-
| 26500 | 0.2677 | 999.5186 | 20493.9121 | 3.9908 | 13.192 | 75.804 | 9.475 | 239.7501 | 70495.8516 |
|
120 |
-
| 27000 | 0.2727 | 998.8997 | 20493.9121 | 3.9908 | 13.2744 | 75.333 | 9.417 | 239.4926 | 70383.0625 |
|
121 |
-
| 27500 | 0.2778 | 1004.4867 | 20488.1465 | 3.9908 | 13.1485 | 76.054 | 9.507 | 241.4805 | 70420.6562 |
|
122 |
-
| 28000 | 0.2828 | 1006.2000 | 20491.0391 | 3.9909 | 13.0763 | 76.474 | 9.559 | 241.6403 | 70458.2109 |
|
123 |
-
| 28500 | 0.2879 | 996.3494 | 20505.4668 | 3.9909 | 13.1939 | 75.792 | 9.474 | 238.5442 | 70383.0625 |
|
124 |
-
| 29000 | 0.2929 | 998.1260 | 20505.4668 | 3.9906 | 13.1778 | 75.885 | 9.486 | 238.9192 | 70383.0625 |
|
125 |
-
| 29500 | 0.2980 | 997.6625 | 20499.6973 | 3.9908 | 13.213 | 75.683 | 9.46 | 238.7414 | 70307.9922 |
|
126 |
-
| 30000 | 0.3030 | 998.8997 | 20493.9121 | 3.9908 | 13.1411 | 76.097 | 9.512 | 239.5519 | 70383.0625 |
|
127 |
-
| 30500 | 0.3081 | 1001.5338 | 20493.9121 | 3.9908 | 13.153 | 76.028 | 9.504 | 240.7629 | 70420.6562 |
|
128 |
-
| 31000 | 0.3131 | 999.8284 | 20493.9121 | 3.9910 | 13.3036 | 75.168 | 9.396 | 239.9483 | 70420.6562 |
|
129 |
-
| 31500 | 0.3182 | 999.8284 | 20493.9121 | 3.9910 | 13.2387 | 75.536 | 9.442 | 239.9087 | 70420.6562 |
|
130 |
-
| 32000 | 0.3232 | 999.5186 | 20493.9121 | 3.9911 | 13.1503 | 76.044 | 9.506 | 240.0277 | 70420.6562 |
|
131 |
-
| 32500 | 0.3283 | 999.2094 | 20493.9121 | 3.9908 | 13.2235 | 75.623 | 9.453 | 239.6708 | 70420.6562 |
|
132 |
-
| 33000 | 0.3333 | 1001.5338 | 20482.3652 | 3.9911 | 13.2764 | 75.322 | 9.415 | 240.8028 | 70458.2109 |
|
133 |
-
| 33500 | 0.3384 | 1006.7457 | 20491.0391 | 3.9911 | 13.2517 | 75.462 | 9.433 | 242.1002 | 70495.8516 |
|
134 |
-
| 34000 | 0.3434 | 1004.4867 | 20488.1465 | 3.9909 | 13.2678 | 75.37 | 9.421 | 241.4805 | 70458.2109 |
|
135 |
-
| 34500 | 0.3485 | 1000.1384 | 20482.3652 | 3.9910 | 13.2497 | 75.473 | 9.434 | 240.4448 | 70458.2109 |
|
136 |
-
| 35000 | 0.3535 | 998.5901 | 20493.9121 | 3.9906 | 13.2197 | 75.645 | 9.456 | 239.3936 | 70383.0625 |
|
137 |
-
| 35500 | 0.3586 | 995.2692 | 20499.6973 | 3.9906 | 13.1934 | 75.795 | 9.474 | 237.5405 | 70195.5703 |
|
138 |
-
| 36000 | 0.3636 | 1000.7585 | 20482.3652 | 3.9908 | 13.2099 | 75.701 | 9.463 | 240.6436 | 70420.6562 |
|
139 |
-
| 36500 | 0.3687 | 1003.2421 | 20482.3652 | 3.9911 | 13.3105 | 75.128 | 9.391 | 241.1414 | 70458.2109 |
|
140 |
-
| 37000 | 0.3737 | 1001.8443 | 20493.9121 | 3.9911 | 13.2361 | 75.551 | 9.444 | 241.0020 | 70420.6562 |
|
141 |
-
| 37500 | 0.3788 | 1001.8443 | 20493.9121 | 3.9911 | 13.2705 | 75.355 | 9.419 | 241.0020 | 70420.6562 |
|
142 |
-
| 38000 | 0.3838 | 1003.7086 | 20482.3652 | 3.9910 | 13.1558 | 76.012 | 9.502 | 241.1614 | 70458.2109 |
|
143 |
-
| 38500 | 0.3889 | 999.5186 | 20493.9121 | 3.9908 | 13.192 | 75.804 | 9.475 | 239.8492 | 70420.6562 |
|
144 |
-
| 39000 | 0.3939 | 999.5186 | 20493.9121 | 3.9908 | 13.2403 | 75.527 | 9.441 | 240.0079 | 70420.6562 |
|
145 |
-
| 39500 | 0.3990 | 1005.8882 | 20488.1465 | 3.9910 | 13.1678 | 75.943 | 9.493 | 241.6403 | 70458.2109 |
|
146 |
-
| 40000 | 0.4040 | 999.5186 | 20493.9121 | 3.9908 | 13.3426 | 74.948 | 9.369 | 239.9087 | 70458.2109 |
|
147 |
-
| 40500 | 0.4091 | 999.2094 | 20493.9121 | 3.9908 | 13.2985 | 75.196 | 9.4 | 239.6906 | 70383.0625 |
|
148 |
-
| 41000 | 0.4141 | 1005.8882 | 20488.1465 | 3.9910 | 13.2757 | 75.325 | 9.416 | 241.6403 | 70458.2109 |
|
149 |
-
| 41500 | 0.4192 | 1007.5261 | 20491.0391 | 3.9914 | 13.2637 | 75.393 | 9.424 | 242.2403 | 70458.2109 |
|
150 |
-
| 42000 | 0.4242 | 1002.6209 | 20482.3652 | 3.9910 | 13.1078 | 76.29 | 9.536 | 241.1016 | 70458.2109 |
|
151 |
-
| 42500 | 0.4293 | 998.1260 | 20517.0273 | 3.9905 | 13.1146 | 76.251 | 9.531 | 239.2946 | 70383.0625 |
|
152 |
-
| 43000 | 0.4343 | 995.4234 | 20499.6973 | 3.9908 | 13.2198 | 75.644 | 9.456 | 237.5013 | 70270.5156 |
|
153 |
-
| 43500 | 0.4394 | 998.1260 | 20493.9121 | 3.9908 | 13.222 | 75.631 | 9.454 | 239.2946 | 70383.0625 |
|
154 |
-
| 44000 | 0.4444 | 1002.6209 | 20482.3652 | 3.9910 | 13.1082 | 76.288 | 9.536 | 241.1614 | 70458.2109 |
|
155 |
-
| 44500 | 0.4495 | 1001.2239 | 20482.3652 | 3.9910 | 13.2651 | 75.386 | 9.423 | 240.8426 | 70458.2109 |
|
156 |
-
| 45000 | 0.4545 | 998.8997 | 20493.9121 | 3.9905 | 13.2549 | 75.444 | 9.43 | 239.5123 | 70383.0625 |
|
157 |
-
| 45500 | 0.4596 | 1000.4484 | 20493.9121 | 3.9909 | 13.1817 | 75.863 | 9.483 | 240.0079 | 70383.0625 |
|
158 |
-
| 46000 | 0.4646 | 1008.3849 | 20491.0391 | 3.9909 | 13.177 | 75.89 | 9.486 | 242.4205 | 70458.2109 |
|
159 |
-
| 46500 | 0.4697 | 998.2806 | 20493.9121 | 3.9908 | 13.1683 | 75.94 | 9.492 | 239.3541 | 70383.0625 |
|
160 |
-
| 47000 | 0.4747 | 999.8284 | 20493.9121 | 3.9906 | 13.2304 | 75.584 | 9.448 | 240.0872 | 70383.0625 |
|
161 |
-
| 47500 | 0.4798 | 1006.2000 | 20502.5723 | 3.9909 | 13.1925 | 75.801 | 9.475 | 241.6403 | 70458.2109 |
|
162 |
-
| 48000 | 0.4848 | 998.1260 | 20517.0273 | 3.9909 | 13.1673 | 75.946 | 9.493 | 239.3145 | 70383.0625 |
|
163 |
-
| 48500 | 0.4899 | 134.8255 | 30153.1016 | 2.7420 | 13.1443 | 76.078 | 9.51 | 11.1570 | 235067.75 |
|
164 |
-
| 49000 | 0.4949 | 128.3738 | 24338.1211 | 2.6060 | 13.2042 | 75.733 | 9.467 | 11.0835 | 188877.9219 |
|
165 |
-
| 49500 | 0.5 | 148.0629 | 18772.1543 | 2.4684 | 13.2334 | 75.566 | 9.446 | 14.6732 | 138973.7031 |
|
166 |
-
| 50000 | 0.5051 | 156.5679 | 16792.9102 | 2.4576 | 13.2265 | 75.606 | 9.451 | 16.5473 | 114776.9922 |
|
167 |
-
| 50500 | 0.5101 | 153.5055 | 17701.2773 | 2.4520 | 13.1908 | 75.81 | 9.476 | 15.9536 | 120102.6562 |
|
168 |
-
| 51000 | 0.5152 | 152.0734 | 17676.3613 | 2.4519 | 13.1825 | 75.858 | 9.482 | 15.7641 | 118195.3203 |
|
169 |
-
| 51500 | 0.5202 | 152.6280 | 17636.5723 | 2.4518 | 13.1021 | 76.324 | 9.54 | 15.8917 | 118700.9297 |
|
170 |
-
| 52000 | 0.5253 | 151.8144 | 17495.5352 | 2.4516 | 13.1172 | 76.236 | 9.529 | 15.7746 | 117315.6719 |
|
171 |
-
| 52500 | 0.5303 | 152.1971 | 17532.5312 | 2.4516 | 13.1603 | 75.986 | 9.498 | 15.8268 | 118258.4609 |
|
172 |
-
| 53000 | 0.5354 | 152.6162 | 17552.3066 | 2.4518 | 13.0995 | 76.339 | 9.542 | 15.8963 | 118321.5156 |
|
173 |
-
| 53500 | 0.5404 | 152.0969 | 17641.5352 | 2.4519 | 13.1548 | 76.018 | 9.502 | 15.7628 | 119272.3828 |
|
174 |
-
| 54000 | 0.5455 | 152.1971 | 17581.9941 | 2.4516 | 13.1914 | 75.807 | 9.476 | 15.8170 | 118258.4609 |
|
175 |
-
| 54500 | 0.5505 | 152.1440 | 17581.9941 | 2.4516 | 13.1116 | 76.269 | 9.534 | 15.7967 | 118258.4609 |
|
176 |
-
| 55000 | 0.5556 | 152.3032 | 17631.5938 | 2.4519 | 13.162 | 75.976 | 9.497 | 15.7961 | 118954.5469 |
|
177 |
-
| 55500 | 0.5606 | 151.3857 | 17495.5352 | 2.4518 | 13.1938 | 75.793 | 9.474 | 15.7173 | 117159.2578 |
|
178 |
-
| 56000 | 0.5657 | 152.5571 | 17581.9941 | 2.4518 | 13.1714 | 75.922 | 9.49 | 15.8812 | 118574.3203 |
|
179 |
-
| 56500 | 0.5707 | 151.2334 | 17505.3984 | 2.4515 | 13.108 | 76.289 | 9.536 | 15.6907 | 116971.9219 |
|
180 |
-
| 57000 | 0.5758 | 152.5334 | 17542.4160 | 2.4516 | 13.1795 | 75.875 | 9.484 | 15.8891 | 118258.4609 |
|
181 |
-
| 57500 | 0.5808 | 152.4154 | 17611.7480 | 2.4519 | 13.2191 | 75.648 | 9.456 | 15.8412 | 118764.3359 |
|
182 |
-
| 58000 | 0.5859 | 152.0144 | 17537.4824 | 2.4516 | 13.1568 | 76.006 | 9.501 | 15.7739 | 118069.25 |
|
183 |
-
| 58500 | 0.5909 | 152.4980 | 17542.4160 | 2.4516 | 13.2239 | 75.621 | 9.453 | 15.8773 | 118258.4609 |
|
184 |
-
| 59000 | 0.5960 | 151.9203 | 17552.3066 | 2.4518 | 13.2703 | 75.356 | 9.42 | 15.7687 | 118321.5156 |
|
185 |
-
| 59500 | 0.6010 | 152.4862 | 17621.6758 | 2.4516 | 13.1748 | 75.903 | 9.488 | 15.8825 | 118574.3203 |
|
186 |
-
| 60000 | 0.6061 | 152.3976 | 17601.8242 | 2.4518 | 13.2989 | 75.194 | 9.399 | 15.8163 | 118511.125 |
|
187 |
-
| 60500 | 0.6111 | 152.4862 | 17581.9941 | 2.4518 | 13.2397 | 75.53 | 9.441 | 15.8727 | 118764.3359 |
|
188 |
-
| 61000 | 0.6162 | 151.4678 | 17525.1270 | 2.4516 | 13.1482 | 76.056 | 9.507 | 15.7264 | 118258.4609 |
|
189 |
-
| 61500 | 0.6212 | 152.4154 | 17562.2012 | 2.4516 | 13.1563 | 76.009 | 9.501 | 15.8595 | 118511.125 |
|
190 |
-
| 62000 | 0.6263 | 152.8055 | 17552.3066 | 2.4516 | 13.1118 | 76.267 | 9.533 | 15.9160 | 118447.8516 |
|
191 |
-
| 62500 | 0.6313 | 152.0144 | 17581.9941 | 2.4519 | 13.1961 | 75.78 | 9.473 | 15.7661 | 118447.8516 |
|
192 |
-
| 63000 | 0.6364 | 152.7227 | 17621.6758 | 2.4516 | 13.2102 | 75.699 | 9.462 | 15.8904 | 118637.6641 |
|
193 |
-
| 63500 | 0.6414 | 152.1971 | 17572.1035 | 2.4516 | 13.3278 | 75.031 | 9.379 | 15.8111 | 118321.5156 |
|
194 |
-
| 64000 | 0.6465 | 152.0969 | 17507.8535 | 2.4516 | 13.2988 | 75.195 | 9.399 | 15.8144 | 118195.3203 |
|
195 |
-
| 64500 | 0.6515 | 152.8173 | 17636.5723 | 2.4519 | 13.2059 | 75.724 | 9.465 | 15.9147 | 119018.0938 |
|
196 |
-
| 65000 | 0.6566 | 151.9673 | 17562.2012 | 2.4519 | 13.3332 | 75.001 | 9.375 | 15.7674 | 117503.6719 |
|
197 |
-
| 65500 | 0.6616 | 152.4862 | 17631.5938 | 2.4518 | 13.2498 | 75.473 | 9.434 | 15.8779 | 118447.8516 |
|
198 |
-
| 66000 | 0.6667 | 152.0144 | 17542.4160 | 2.4518 | 13.1884 | 75.824 | 9.478 | 15.7967 | 117754.7266 |
|
199 |
-
| 66500 | 0.6717 | 152.4862 | 17552.3066 | 2.4516 | 13.2577 | 75.428 | 9.429 | 15.8845 | 118511.125 |
|
200 |
-
| 67000 | 0.6768 | 152.4626 | 17552.3066 | 2.4516 | 13.294 | 75.222 | 9.403 | 15.8759 | 118511.125 |
|
201 |
-
| 67500 | 0.6818 | 152.2089 | 17581.9941 | 2.4518 | 13.1401 | 76.103 | 9.513 | 15.8118 | 118637.6641 |
|
202 |
-
| 68000 | 0.6869 | 151.8967 | 17581.9941 | 2.4519 | 13.2487 | 75.479 | 9.435 | 15.7505 | 118321.5156 |
|
203 |
-
| 68500 | 0.6919 | 152.5098 | 17581.9941 | 2.4519 | 13.1358 | 76.128 | 9.516 | 15.8792 | 118764.3359 |
|
204 |
-
| 69000 | 0.6970 | 152.1204 | 17572.1035 | 2.4516 | 13.1347 | 76.134 | 9.517 | 15.7889 | 118258.4609 |
|
205 |
-
| 69500 | 0.7020 | 152.4390 | 17572.1035 | 2.4516 | 13.3358 | 74.986 | 9.373 | 15.8635 | 117943.3203 |
|
206 |
-
| 70000 | 0.7071 | 152.4095 | 17581.9941 | 2.4516 | 13.2272 | 75.602 | 9.45 | 15.8464 | 118258.4609 |
|
207 |
-
| 70500 | 0.7121 | 152.1971 | 17581.9941 | 2.4518 | 13.2901 | 75.244 | 9.405 | 15.8007 | 118321.5156 |
|
208 |
-
| 71000 | 0.7172 | 152.4744 | 17581.9941 | 2.4519 | 13.2844 | 75.276 | 9.41 | 15.8681 | 118574.3203 |
|
209 |
-
| 71500 | 0.7222 | 152.4862 | 17661.4316 | 2.4520 | 13.1669 | 75.948 | 9.494 | 15.8517 | 119145.1719 |
|
210 |
-
| 72000 | 0.7273 | 152.4390 | 17621.6758 | 2.4518 | 13.1698 | 75.931 | 9.491 | 15.8615 | 118511.125 |
|
211 |
-
| 72500 | 0.7323 | 152.4095 | 17581.9941 | 2.4518 | 13.2893 | 75.249 | 9.406 | 15.8464 | 118321.5156 |
|
212 |
-
| 73000 | 0.7374 | 151.8967 | 17517.7246 | 2.4518 | 13.2449 | 75.501 | 9.438 | 15.7661 | 118258.4609 |
|
213 |
-
| 73500 | 0.7424 | 152.1499 | 17532.5312 | 2.4516 | 13.2259 | 75.609 | 9.451 | 15.8268 | 117880.4609 |
|
214 |
-
| 74000 | 0.7475 | 152.3032 | 17572.1035 | 2.4516 | 13.239 | 75.535 | 9.442 | 15.8320 | 118069.25 |
|
215 |
-
| 74500 | 0.7525 | 152.7108 | 17581.9941 | 2.4516 | 13.2845 | 75.276 | 9.409 | 15.8891 | 118447.8516 |
|
216 |
-
| 75000 | 0.7576 | 152.4862 | 17646.5156 | 2.4519 | 13.2765 | 75.321 | 9.415 | 15.8681 | 118891.1484 |
|
217 |
-
| 75500 | 0.7626 | 152.1499 | 17601.8242 | 2.4519 | 13.2882 | 75.255 | 9.407 | 15.7863 | 118511.125 |
|
218 |
-
| 76000 | 0.7677 | 152.1440 | 17581.9941 | 2.4518 | 13.3023 | 75.175 | 9.397 | 15.7863 | 118384.7266 |
|
219 |
-
| 76500 | 0.7727 | 152.2089 | 17581.9941 | 2.4518 | 13.2007 | 75.754 | 9.469 | 15.7980 | 118384.7266 |
|
220 |
-
| 77000 | 0.7778 | 152.1440 | 17611.7480 | 2.4518 | 13.1838 | 75.851 | 9.481 | 15.7863 | 118447.8516 |
|
221 |
-
| 77500 | 0.7828 | 151.8967 | 17572.1035 | 2.4516 | 13.2756 | 75.326 | 9.416 | 15.7550 | 118258.4609 |
|
222 |
-
| 78000 | 0.7879 | 152.3976 | 17581.9941 | 2.4518 | 13.1946 | 75.789 | 9.474 | 15.8255 | 118384.7266 |
|
223 |
-
| 78500 | 0.7929 | 151.8497 | 17581.9941 | 2.4518 | 13.2414 | 75.521 | 9.44 | 15.7472 | 118447.8516 |
|
224 |
-
| 79000 | 0.7980 | 151.6968 | 17581.9941 | 2.4518 | 13.225 | 75.615 | 9.452 | 15.7374 | 118447.8516 |
|
225 |
-
| 79500 | 0.8030 | 152.1440 | 17581.9941 | 2.4518 | 13.247 | 75.489 | 9.436 | 15.7863 | 118321.5156 |
|
226 |
-
| 80000 | 0.8081 | 152.4390 | 17611.7480 | 2.4519 | 13.2043 | 75.733 | 9.467 | 15.8589 | 118574.3203 |
|
227 |
-
| 80500 | 0.8131 | 152.4390 | 17641.5352 | 2.4519 | 13.2656 | 75.383 | 9.423 | 15.8504 | 118764.3359 |
|
228 |
-
| 81000 | 0.8182 | 152.0144 | 17601.8242 | 2.4519 | 13.2737 | 75.337 | 9.417 | 15.7674 | 118321.5156 |
|
229 |
-
| 81500 | 0.8232 | 152.1440 | 17611.7480 | 2.4519 | 13.2567 | 75.434 | 9.429 | 15.7824 | 118447.8516 |
|
230 |
-
| 82000 | 0.8283 | 151.6146 | 17581.9941 | 2.4518 | 13.1554 | 76.015 | 9.502 | 15.7329 | 118258.4609 |
|
231 |
-
| 82500 | 0.8333 | 152.0734 | 17572.1035 | 2.4518 | 13.2061 | 75.722 | 9.465 | 15.7700 | 118321.5156 |
|
232 |
-
| 83000 | 0.8384 | 152.2561 | 17562.2012 | 2.4518 | 13.0615 | 76.561 | 9.57 | 15.8007 | 118447.8516 |
|
233 |
-
| 83500 | 0.8434 | 152.4390 | 17581.9941 | 2.4518 | 13.1488 | 76.052 | 9.507 | 15.8648 | 118637.6641 |
|
234 |
-
| 84000 | 0.8485 | 151.9673 | 17572.1035 | 2.4518 | 13.1842 | 75.848 | 9.481 | 15.7700 | 118447.8516 |
|
235 |
-
| 84500 | 0.8535 | 152.2561 | 17581.9941 | 2.4518 | 13.1531 | 76.028 | 9.503 | 15.8046 | 118511.125 |
|
236 |
-
| 85000 | 0.8586 | 151.5383 | 17559.7227 | 2.4518 | 13.0464 | 76.65 | 9.581 | 15.7303 | 118069.25 |
|
237 |
-
| 85500 | 0.8636 | 152.3976 | 17572.1035 | 2.4518 | 13.082 | 76.441 | 9.555 | 15.8203 | 118195.3203 |
|
238 |
-
| 86000 | 0.8687 | 151.8497 | 17572.1035 | 2.4518 | 13.1831 | 75.854 | 9.482 | 15.7537 | 118258.4609 |
|
239 |
-
| 86500 | 0.8737 | 151.8497 | 17581.9941 | 2.4518 | 13.1182 | 76.23 | 9.529 | 15.7537 | 118321.5156 |
|
240 |
-
| 87000 | 0.8788 | 151.7203 | 17581.9941 | 2.4518 | 13.1367 | 76.123 | 9.515 | 15.7452 | 118258.4609 |
|
241 |
-
| 87500 | 0.8838 | 151.8497 | 17581.9941 | 2.4516 | 13.217 | 75.66 | 9.458 | 15.7498 | 118258.4609 |
|
242 |
-
| 88000 | 0.8889 | 152.0615 | 17581.9941 | 2.4518 | 13.1568 | 76.006 | 9.501 | 15.7700 | 118384.7266 |
|
243 |
-
| 88500 | 0.8939 | 152.2442 | 17581.9941 | 2.4518 | 13.1319 | 76.15 | 9.519 | 15.7961 | 118384.7266 |
|
244 |
-
| 89000 | 0.8990 | 152.1971 | 17581.9941 | 2.4518 | 13.0928 | 76.378 | 9.547 | 15.7974 | 118447.8516 |
|
245 |
-
| 89500 | 0.9040 | 151.7203 | 17562.2012 | 2.4518 | 13.1415 | 76.095 | 9.512 | 15.7433 | 118447.8516 |
|
246 |
-
| 90000 | 0.9091 | 151.8497 | 17572.1035 | 2.4518 | 13.1841 | 75.849 | 9.481 | 15.7537 | 118447.8516 |
|
247 |
-
| 90500 | 0.9141 | 151.9673 | 17581.9941 | 2.4518 | 13.1491 | 76.051 | 9.506 | 15.7628 | 118384.7266 |
|
248 |
-
| 91000 | 0.9192 | 151.8497 | 17572.1035 | 2.4516 | 13.1755 | 75.899 | 9.487 | 15.7537 | 118258.4609 |
|
249 |
-
| 91500 | 0.9242 | 151.6498 | 17572.1035 | 2.4518 | 13.0781 | 76.464 | 9.558 | 15.7413 | 118258.4609 |
|
250 |
-
| 92000 | 0.9293 | 151.6146 | 17549.8301 | 2.4518 | 13.1794 | 75.876 | 9.485 | 15.7400 | 118132.3203 |
|
251 |
-
| 92500 | 0.9343 | 151.9673 | 17572.1035 | 2.4516 | 13.1589 | 75.994 | 9.499 | 15.7661 | 118258.4609 |
|
252 |
-
| 93000 | 0.9394 | 152.0615 | 17572.1035 | 2.4518 | 13.1847 | 75.846 | 9.481 | 15.7791 | 118258.4609 |
|
253 |
-
| 93500 | 0.9444 | 151.9673 | 17572.1035 | 2.4516 | 13.298 | 75.199 | 9.4 | 15.7628 | 118258.4609 |
|
254 |
-
| 94000 | 0.9495 | 151.8497 | 17552.3066 | 2.4516 | 13.2436 | 75.508 | 9.439 | 15.7537 | 118132.3203 |
|
255 |
-
| 94500 | 0.9545 | 151.8967 | 17552.3066 | 2.4516 | 13.1788 | 75.88 | 9.485 | 15.7583 | 118195.3203 |
|
256 |
-
| 95000 | 0.9596 | 151.8967 | 17552.3066 | 2.4518 | 13.173 | 75.913 | 9.489 | 15.7596 | 118258.4609 |
|
257 |
-
| 95500 | 0.9646 | 151.8967 | 17562.2012 | 2.4516 | 13.2802 | 75.3 | 9.413 | 15.7583 | 118258.4609 |
|
258 |
-
| 96000 | 0.9697 | 152.0615 | 17572.1035 | 2.4516 | 13.2413 | 75.522 | 9.44 | 15.7713 | 118258.4609 |
|
259 |
-
| 96500 | 0.9747 | 152.1440 | 17581.9941 | 2.4516 | 13.1021 | 76.324 | 9.54 | 15.7876 | 118321.5156 |
|
260 |
-
| 97000 | 0.9798 | 152.1440 | 17581.9941 | 2.4516 | 13.1182 | 76.23 | 9.529 | 15.7922 | 118321.5156 |
|
261 |
-
| 97500 | 0.9848 | 152.1971 | 17581.9941 | 2.4518 | 13.2024 | 75.744 | 9.468 | 15.7974 | 118321.5156 |
|
262 |
-
| 98000 | 0.9899 | 152.1499 | 17581.9941 | 2.4518 | 13.1624 | 75.974 | 9.497 | 15.7961 | 118321.5156 |
|
263 |
-
| 98500 | 0.9949 | 152.1499 | 17581.9941 | 2.4518 | 13.1912 | 75.808 | 9.476 | 15.7935 | 118321.5156 |
|
264 |
-
| 99000 | 1.0 | 152.1499 | 17581.9941 | 2.4518 | 13.0987 | 76.344 | 9.543 | 15.7935 | 118321.5156 |
|
265 |
|
266 |
### Framework versions
|
267 |
- Distily 0.2.0
|
|
|
15 |
The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
|
16 |
|
17 |
It achieves the following results on the evaluation set:
|
18 |
+
- eval_enwikippl: 28257.9004
|
19 |
+
- eval_frwikippl: 63896.6680
|
20 |
+
- eval_zhwikippl: 90059.6875
|
21 |
+
- eval_tinystoriesppl: 18426.4922
|
22 |
+
- eval_loss: 6.6740
|
23 |
+
- eval_runtime: 13.137
|
24 |
+
- eval_samples_per_second: 76.121
|
25 |
+
- eval_steps_per_second: 9.515
|
26 |
|
27 |
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
|
28 |
should probably proofread and complete it, then remove this comment.
|
|
|
47 |
The following hyperparameters were used during training:
|
48 |
- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
|
49 |
- train_embeddings: True
|
50 |
+
- learning_rate: 4e-05
|
51 |
+
- train_batch_size: 8
|
52 |
- eval_batch_size: 8
|
53 |
- seed: 42
|
54 |
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
|
55 |
+
- lr_scheduler_type: constant
|
|
|
56 |
- num_epochs: 1.0
|
57 |
|
58 |
### Resource Usage
|
59 |
+
Peak GPU Memory: 8.0568 GB
|
60 |
|
61 |
### Eval-Phase Metrics
|
62 |
| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
|
63 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|
64 |
| **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
|
65 |
+
| 0 | 0 | 35507.3906 | 70936.2969 | 6.875 | 13.2774 | 75.316 | 9.414 | 24370.3125 | 92840.9844 |
|
66 |
+
| 500 | 0.0404 | 28284.1875 | 63896.6680 | 6.6737 | 13.1884 | 75.824 | 9.478 | 18447.8379 | 90059.6875 |
|
67 |
+
| 1000 | 0.0808 | 28284.1875 | 63896.6680 | 6.6740 | 13.221 | 75.637 | 9.455 | 18444.7754 | 90059.6875 |
|
68 |
+
| 1500 | 0.1212 | 28284.1875 | 63896.6680 | 6.6740 | 13.1643 | 75.963 | 9.495 | 18444.7754 | 90059.6875 |
|
69 |
+
| 2000 | 0.1616 | 28284.1875 | 63896.6680 | 6.6740 | 13.2331 | 75.568 | 9.446 | 18438.6914 | 90059.6875 |
|
70 |
+
| 2500 | 0.2020 | 28284.1875 | 63896.6680 | 6.6740 | 13.1865 | 75.835 | 9.479 | 18432.5898 | 90059.6875 |
|
71 |
+
| 3000 | 0.2424 | 28257.9004 | 63896.6680 | 6.6740 | 13.246 | 75.494 | 9.437 | 18426.4922 | 90059.6875 |
|
72 |
+
| 3500 | 0.2828 | 28257.9004 | 63896.6680 | 6.6740 | 13.1762 | 75.895 | 9.487 | 18426.4922 | 90059.6875 |
|
73 |
+
| 4000 | 0.3232 | 28257.9004 | 63896.6680 | 6.6740 | 13.3585 | 74.859 | 9.357 | 18426.4922 | 90059.6875 |
|
74 |
+
| 4500 | 0.3636 | 28257.9004 | 63896.6680 | 6.6740 | 13.1842 | 75.848 | 9.481 | 18426.4922 | 90059.6875 |
|
75 |
+
| 5000 | 0.4040 | 28257.9004 | 63896.6680 | 6.6740 | 13.2694 | 75.361 | 9.42 | 18426.4922 | 90059.6875 |
|
76 |
+
| 5500 | 0.4444 | 28257.9004 | 63896.6680 | 6.6740 | 13.2102 | 75.699 | 9.462 | 18426.4922 | 90059.6875 |
|
77 |
+
| 6000 | 0.4848 | 28257.9004 | 63896.6680 | 6.6740 | 13.3012 | 75.181 | 9.398 | 18426.4922 | 90059.6875 |
|
78 |
+
| 6500 | 0.5253 | 28257.9004 | 63896.6680 | 6.6740 | 13.1704 | 75.928 | 9.491 | 18426.4922 | 90059.6875 |
|
79 |
+
| 7000 | 0.5657 | 28257.9004 | 63896.6680 | 6.6740 | 13.2236 | 75.622 | 9.453 | 18426.4922 | 90059.6875 |
|
80 |
+
| 7500 | 0.6061 | 28257.9004 | 63896.6680 | 6.6740 | 13.2333 | 75.567 | 9.446 | 18426.4922 | 90059.6875 |
|
81 |
+
| 8000 | 0.6465 | 28257.9004 | 63896.6680 | 6.6740 | 13.1385 | 76.112 | 9.514 | 18426.4922 | 90059.6875 |
|
82 |
+
| 8500 | 0.6869 | 28257.9004 | 63896.6680 | 6.6740 | 13.2297 | 75.588 | 9.448 | 18426.4922 | 90059.6875 |
|
83 |
+
| 9000 | 0.7273 | 28257.9004 | 63896.6680 | 6.6740 | 13.1073 | 76.293 | 9.537 | 18426.4922 | 90059.6875 |
|
84 |
+
| 9500 | 0.7677 | 28257.9004 | 63896.6680 | 6.6740 | 13.137 | 76.121 | 9.515 | 18426.4922 | 90059.6875 |
|
85 |
+
| 10000 | 0.8081 | 28257.9004 | 63896.6680 | 6.6740 | 13.0862 | 76.417 | 9.552 | 18426.4922 | 90059.6875 |
|
86 |
+
| 10500 | 0.8485 | 28257.9004 | 63896.6680 | 6.6740 | 13.17 | 75.93 | 9.491 | 18426.4922 | 90059.6875 |
|
87 |
+
| 11000 | 0.8889 | 28257.9004 | 63896.6680 | 6.6740 | 13.211 | 75.694 | 9.462 | 18426.4922 | 90059.6875 |
|
88 |
+
| 11500 | 0.9293 | 28257.9004 | 63896.6680 | 6.6740 | 13.1171 | 76.237 | 9.53 | 18426.4922 | 90059.6875 |
|
89 |
+
| 12000 | 0.9697 | 28257.9004 | 63896.6680 | 6.6740 | 13.2484 | 75.481 | 9.435 | 18426.4922 | 90059.6875 |
|
90 |
+
| 12375 | 1.0 | 28257.9004 | 63896.6680 | 6.6740 | 13.2116 | 75.691 | 9.461 | 18426.4922 | 90059.6875 |
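
The `epoch` column in both versions of the table appears to be the step number divided by the total number of steps in the single epoch. The sketch below checks that reading against a few rows; `epoch_fraction` and the constant names are hypothetical helpers for illustration, not part of Distily or the Trainer.

```python
def epoch_fraction(step: int, total_steps: int) -> float:
    """Fraction of one epoch completed after `step` steps, rounded like the table."""
    return round(step / total_steps, 4)


# Values read from the diff; the names themselves are illustrative.
NEW_TOTAL_STEPS = 12375  # last step of the added table (epoch 1.0)
OLD_TOTAL_STEPS = 99000  # last step of the removed table (epoch 1.0)
TRAIN_BATCH_SIZE = 8     # train_batch_size in the added hyperparameters

print(epoch_fraction(500, NEW_TOTAL_STEPS))    # 0.0404, matches the step-500 row
print(epoch_fraction(13000, OLD_TOTAL_STEPS))  # 0.1313, matches the removed step-13000 row
print(NEW_TOTAL_STEPS * TRAIN_BATCH_SIZE)      # 99000
```

The last line shows that 12375 steps at batch size 8 cover 99000 samples, the same count as the 99000 steps of the removed table. That is consistent with the earlier run having used a batch size of 1, as the log directory name suggests, though the removed `train_batch_size` value is truncated in this diff and cannot be confirmed here.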
|
91 |
|
92 |
### Framework versions
|
93 |
- Distily 0.2.0
|
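
The `distillation_objective` in the model card above puts all of its weight on the logits component (`weight=1, loss_fn=kl`) and none on hidden states or attention (`weight=0`). As a rough illustration of what a KL loss on logits computes, here is a minimal pure-Python sketch for a single token position. It assumes the common KL(teacher || student) direction and no temperature scaling; neither is confirmed by this diff, and the function names are hypothetical rather than Distily's API.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) between the two next-token distributions.

    Assumption: the teacher is the reference distribution; the library's
    actual direction and reduction may differ.
    """
    log_p = log_softmax(teacher_logits)  # teacher log-probabilities
    log_q = log_softmax(student_logits)  # student log-probabilities
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


teacher = [2.0, 0.5, -1.0]
print(kl_logits_loss(teacher, teacher))          # 0.0 when the student matches the teacher
print(kl_logits_loss(teacher, [0.0, 0.0, 0.0]))  # positive for a uniform student
```

With the other two components weighted at zero, the total objective under these assumptions reduces to this one term averaged over token positions.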
logs/events.out.tfevents.1723989018.93d6cbb3ad53
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:438ef4a30b0b2593692b1ac3a19f0c743ee2bcc791e763b8c5b8b0fdf520d3cf
|
3 |
+
size 5859432
|
logs/events.out.tfevents.1723991806.93d6cbb3ad53
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:07761d2e0f9002fc11b646370a66d7fe112ff8591fb441f5a6e457b7ee19ef1b
|
3 |
+
size 307
|
logs/learning_rate=0.001, lr_scheduler_type=linear, per_device_train_batch_size=1, warmup_ratio=0.5/completed.flag
ADDED
File without changes
|
logs/learning_rate=0.001, lr_scheduler_type=linear, per_device_train_batch_size=1, warmup_ratio=0.5/events.out.tfevents.1723988864.93d6cbb3ad53
CHANGED
@@ -1,3 +1,3 @@
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
-
oid sha256:
|
3 |
-
size
|
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:2d5301c950a082d8b6f48b16a8d2cf8ce680f5fef2934e2ec0f7f9b6c8b998a7
|
3 |
+
size 588
|
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
-
oid sha256:
|
3 |
size 137033984
|
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:e728acb318ba05b5f36bf136d346ef1f3ce69b7247fe37b21b6097691677009e
|
3 |
size 137033984
|
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
-
oid sha256:
|
3 |
-
size
|
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:e61b0b87b1c473f9bd2bbb0edd556da12cbc4e1b0589ef638d5b437883b91e4d
|
3 |
+
size 1017947976
|
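
The binary files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, the object hash, and the size in bytes. The sketch below parses one such pointer, using the `model.safetensors` values from the diff above; `parse_lfs_pointer` is a hypothetical helper, not a function from any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "key value"
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real object in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:e728acb318ba05b5f36bf136d346ef1f3ce69b7247fe37b21b6097691677009e
size 137033984
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])  # sha256 137033984
```

Because only the `oid` line changed for `model.safetensors`, the weights were rewritten while the file size stayed at 137033984 bytes.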