---
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.12_gpt2
  results: []
---

# distily_bench_obj_cross_v2.12_gpt2

This student model is distilled from the teacher model [gpt2](https://huggingface.co/gpt2) using an unspecified dataset.

The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
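
As a quick usage sketch (the repo id below is an assumption inferred from the model name; adjust it to wherever this checkpoint is actually hosted), the student loads like any other GPT-2 checkpoint with `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id inferred from the model name above; replace as needed.
repo_id = "distily_bench_obj_cross_v2.12_gpt2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Knowledge distillation compresses a teacher into", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```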

It achieves the following results on the evaluation set:
- eval_enwikippl: 665.9925
- eval_frwikippl: 995.4457
- eval_zhwikippl: 405.3946
- eval_tinystoriesppl: 1100.5725
- eval_loss: 1.3024
- eval_runtime: 12.5753
- eval_samples_per_second: 47.713
- eval_steps_per_second: 11.928
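
The `*ppl` values are perplexities on held-out slices of the named corpora (English/French/Chinese Wikipedia and TinyStories). As a minimal sketch of how such a figure can be computed (the exact evaluation data and batching Distily uses are not specified here):

```python
import math

import torch

def perplexity(model, tokenizer, text: str, device: str = "cpu") -> float:
    """Perplexity of a causal LM on `text` via mean token-level cross-entropy."""
    model.to(device).eval()
    enc = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the shifted LM loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())
```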

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
- train_embeddings: True
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
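
Per the `distillation_objective` above, only the logits component is active (weight 1, `loss_fn=kl`); the hidden-state (`hs`) and attention (`attn`) components have weight 0. A minimal PyTorch sketch of that KL-on-logits term (not Distily's actual implementation, which lives in the library):

```python
import torch
import torch.nn.functional as F

def logits_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student) over the vocabulary.

    Both tensors have shape (batch, seq_len, vocab_size).
    """
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    # log_target=True keeps the teacher term in log space for stability;
    # batchmean matches the mathematical definition of KL divergence.
    return F.kl_div(student_logp, teacher_logp,
                    log_target=True, reduction="batchmean")
```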

### Resource Usage
- Peak GPU Memory: 3.9293 GB
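
For reference, a peak-memory figure like this can be read from PyTorch's CUDA allocator (assuming training runs on a single CUDA device):

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run training / evaluation ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```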

### Eval-Phase Metrics
| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** |  | 270.2348 | 76.8142 |  |  |  |  | 671.1238 | 22.8030 |
| 0 | 0 | 147374.6094 | 4251118206976.0 | 19.8108 | 12.6652 | 47.374 | 11.843 | 74.6838 | 6171058503680.0 |
| 1500 | 0.0253 | 1012.5726 | 4501.9321 | 2.2064 | 12.5479 | 47.817 | 11.954 | 1084.7205 | 39061.2969 |
| 3000 | 0.0505 | 761.3547 | 2880.7776 | 1.7218 | 12.6141 | 47.566 | 11.891 | 932.5889 | 1552.8525 |
| 4500 | 0.0758 | 682.1792 | 1444.0309 | 1.5343 | 12.6458 | 47.447 | 11.862 | 963.2644 | 421.1599 |
| 6000 | 0.1010 | 673.6849 | 1216.2458 | 1.4424 | 12.6927 | 47.271 | 11.818 | 1035.7787 | 983.8034 |
| 7500 | 0.1263 | 630.5226 | 924.8793 | 1.3688 | 12.561 | 47.767 | 11.942 | 971.2607 | 351.8923 |
| 9000 | 0.1515 | 665.9925 | 995.4457 | 1.3024 | 12.5753 | 47.713 | 11.928 | 1100.5725 | 405.3946 |
| 10500 | 0.1768 | 649.4595 | 870.4929 | 1.2363 | 12.5912 | 47.652 | 11.913 | 1147.8689 | 379.8699 |
| 12000 | 0.2020 | 552.0709 | 756.2815 | 1.1687 | 12.5514 | 47.804 | 11.951 | 915.4786 | 247.3208 |
| 13500 | 0.2273 | 574.5076 | 775.2103 | 1.1446 | 12.6584 | 47.399 | 11.85 | 1022.3383 | 258.0553 |
| 15000 | 0.2525 | 570.0630 | 872.7639 | 1.1033 | 12.573 | 47.721 | 11.93 | 1034.7090 | 205.1337 |
| 16500 | 0.2778 | 524.1483 | 695.0405 | 1.0708 | 12.5445 | 47.83 | 11.957 | 960.6801 | 179.8155 |
| 18000 | 0.3030 | 558.0261 | 722.4153 | 1.0562 | 12.6414 | 47.463 | 11.866 | 1092.5500 | 238.2534 |
| 19500 | 0.3283 | 535.8491 | 646.8846 | 1.0133 | 12.5343 | 47.869 | 11.967 | 1038.2650 | 224.3871 |
| 21000 | 0.3535 | 498.7090 | 643.3860 | 0.9866 | 12.6044 | 47.602 | 11.901 | 945.8655 | 325.0199 |
| 22500 | 0.3788 | 501.5469 | 612.7169 | 0.9680 | 12.5367 | 47.86 | 11.965 | 979.3635 | 253.6864 |
| 24000 | 0.4040 | 376.6320 | 629.0483 | 0.9542 | 12.5557 | 47.787 | 11.947 | 639.3351 | 209.0216 |
| 25500 | 0.4293 | 481.3532 | 705.2970 | 0.9196 | 12.6849 | 47.3 | 11.825 | 966.3749 | 375.7875 |
| 27000 | 0.4545 | 459.1099 | 522.3182 | 0.8577 | 12.5747 | 47.715 | 11.929 | 958.1420 | 189.4054 |
| 28500 | 0.4798 | 413.4502 | 431.4271 | 0.7560 | 12.5416 | 47.841 | 11.96 | 891.3210 | 176.5119 |
| 30000 | 0.5051 | 403.5616 | 415.3713 | 0.7195 | 12.548 | 47.817 | 11.954 | 882.3771 | 152.6556 |
| 31500 | 0.5303 | 406.3142 | 383.7035 | 0.7008 | 12.7238 | 47.156 | 11.789 | 912.3057 | 155.9905 |
| 33000 | 0.5556 | 424.4844 | 373.8076 | 0.6957 | 12.5614 | 47.765 | 11.941 | 974.8803 | 171.0759 |
| 34500 | 0.5808 | 403.1555 | 398.5213 | 0.6867 | 12.5658 | 47.748 | 11.937 | 913.2111 | 178.8704 |
| 36000 | 0.6061 | 399.7424 | 356.4906 | 0.6771 | 12.5757 | 47.711 | 11.928 | 904.7578 | 169.4632 |
| 37500 | 0.6313 | 398.5905 | 372.6379 | 0.6750 | 12.652 | 47.423 | 11.856 | 912.7961 | 158.8251 |
| 39000 | 0.6566 | 392.1436 | 371.0796 | 0.6723 | 12.6742 | 47.34 | 11.835 | 882.8148 | 176.4061 |
| 40500 | 0.6818 | 393.4750 | 371.6812 | 0.6672 | 12.6703 | 47.355 | 11.839 | 901.9575 | 134.3779 |
| 42000 | 0.7071 | 399.2395 | 357.3452 | 0.6651 | 12.6545 | 47.414 | 11.853 | 913.0604 | 135.6295 |
| 43500 | 0.7323 | 391.1350 | 370.6879 | 0.6558 | 12.6748 | 47.338 | 11.834 | 896.4939 | 156.0113 |
| 45000 | 0.7576 | 382.1500 | 345.0898 | 0.6354 | 12.6893 | 47.284 | 11.821 | 884.7507 | 140.7350 |
| 46500 | 0.7828 | 379.9360 | 334.1126 | 0.6281 | 12.6503 | 47.43 | 11.857 | 877.5396 | 127.1069 |
| 48000 | 0.8081 | 379.3625 | 342.2339 | 0.6241 | 12.6749 | 47.338 | 11.834 | 882.8514 | 128.6507 |
| 49500 | 0.8333 | 379.1130 | 333.6659 | 0.6222 | 12.6951 | 47.262 | 11.816 | 881.2473 | 125.1969 |
| 51000 | 0.8586 | 378.2769 | 332.6569 | 0.6217 | 12.6252 | 47.524 | 11.881 | 883.0703 | 128.0856 |
| 52500 | 0.8838 | 377.0043 | 335.4331 | 0.6182 | 12.6655 | 47.373 | 11.843 | 880.3371 | 128.4364 |
| 54000 | 0.9091 | 376.5811 | 333.1023 | 0.6165 | 12.6459 | 47.446 | 11.862 | 877.0681 | 129.0633 |
| 55500 | 0.9343 | 377.9547 | 333.2431 | 0.6157 | 12.6412 | 47.464 | 11.866 | 883.1432 | 127.1832 |
| 57000 | 0.9596 | 378.2183 | 332.4462 | 0.6147 | 12.6477 | 47.439 | 11.86 | 884.0200 | 126.3209 |
| 58500 | 0.9848 | 377.9839 | 333.1023 | 0.6146 | 12.6522 | 47.422 | 11.856 | 883.7274 | 126.2198 |
| 59400 | 1.0 | 378.0425 | 333.0085 | 0.6147 | 12.651 | 47.427 | 11.857 | 883.7274 | 126.2198 |

### Framework versions
- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0