---
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.11_gpt2
  results: []
---

# distily_bench_obj_cross_v2.11_gpt2

This student model was distilled from the teacher model [gpt2](https://huggingface.co/gpt2). The training dataset is unspecified.

The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

It achieves the following results on the evaluation set:
- eval_enwikippl: 840.1149
- eval_frwikippl: 528.4605
- eval_zhwikippl: 126.6330
- eval_tinystoriesppl: 1037.4924
- eval_loss: 0.5100
- eval_runtime: 21.5094
- eval_samples_per_second: 46.491
- eval_steps_per_second: 11.623
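
The `*ppl` metrics above are perplexities. By the conventional definition, perplexity is the exponential of the mean per-token negative log-likelihood. The sketch below computes it from a list of token log-probabilities. It is a dependency-free illustration, not Distily's evaluation code, and the function name `perplexity` is hypothetical. `eval_loss` appears to be the distillation objective rather than a language-modeling cross-entropy, since exp(0.5100) is about 1.67 and matches none of the perplexities listed.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 0.5 to every token has perplexity 2.
example = perplexity([math.log(0.5)] * 4)
```

Lower is better: a perplexity of 1 would mean the model assigns probability 1 to every observed token.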

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 1
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1.0
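
The `distillation_objective` above gives weight 1 to a logits loss with `loss_fn=kl` and weight 0 to the hidden-state and attention components, so only the output distributions are matched. A minimal, dependency-free sketch of a KL-divergence logits loss for a single token position follows. The direction `KL(teacher || student)`, the absence of a temperature, and the helper names `log_softmax` and `kl_logits_loss` are assumptions for illustration. Distily's actual implementation may differ.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) for one token position (direction is an assumption)."""
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must share a vocabulary size")
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    # Sum over the vocabulary of p_teacher * (log p_teacher - log p_student).
    return sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t_logp, s_logp))


# Identical logits give zero loss; a uniform student is penalized.
same = kl_logits_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
diff = kl_logits_loss([2.0, 0.5, -1.0], [0.0, 0.0, 0.0])
```

In practice the per-position values would be averaged over all positions in a batch. That reduction is left out here to keep the sketch short.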

### Resource Usage

Peak GPU Memory: 3.9285 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 270.2348 | 76.8142 | | | | | 671.1238 | 22.8030 |
| 0 | 0 | 120078.375 | 1867851235328.0 | 19.4492 | 21.0652 | 47.472 | 11.868 | 72.8770 | 4013754155008.0 |
| 5000 | 0.0505 | 1216.0441 | 888.1107 | 0.7144 | 21.4135 | 46.7 | 11.675 | 1267.6812 | 332.8297 |
| 10000 | 0.1010 | 1162.2788 | 799.4963 | 0.6619 | 21.4269 | 46.67 | 11.668 | 1249.7319 | 438.5025 |
| 15000 | 0.1515 | 980.3101 | 668.6794 | 0.6395 | 21.4739 | 46.568 | 11.642 | 1056.4025 | 425.3380 |
| 20000 | 0.2020 | 1064.2865 | 759.8051 | 0.6318 | 21.4643 | 46.589 | 11.647 | 1151.2905 | 311.5830 |
| 25000 | 0.2525 | 916.0289 | 621.8902 | 0.5662 | 21.1368 | 47.311 | 11.828 | 1071.6635 | 190.3806 |
| 30000 | 0.3030 | 891.1293 | 582.2575 | 0.5445 | 21.4338 | 46.655 | 11.664 | 1072.1951 | 208.7082 |
| 35000 | 0.3535 | 886.6196 | 544.0957 | 0.5381 | 21.5335 | 46.439 | 11.61 | 1057.8008 | 142.8915 |
| 40000 | 0.4040 | 880.1868 | 549.4098 | 0.5349 | 21.4687 | 46.58 | 11.645 | 1076.1021 | 142.8439 |
| 45000 | 0.4545 | 868.9573 | 564.4311 | 0.5323 | 21.4349 | 46.653 | 11.663 | 1042.4788 | 161.4311 |
| 50000 | 0.5051 | 877.1919 | 541.3246 | 0.5320 | 21.548 | 46.408 | 11.602 | 1058.0631 | 167.7873 |
| 55000 | 0.5556 | 869.4625 | 543.6743 | 0.5313 | 21.4821 | 46.55 | 11.638 | 1043.7725 | 163.6863 |
| 60000 | 0.6061 | 872.2788 | 553.3121 | 0.5305 | 21.4316 | 46.66 | 11.665 | 1068.5228 | 141.9700 |
| 65000 | 0.6566 | 833.5512 | 524.0497 | 0.5156 | 21.1637 | 47.251 | 11.813 | 1028.6963 | 137.2677 |
| 70000 | 0.7071 | 837.5645 | 523.4596 | 0.5133 | 21.4101 | 46.707 | 11.677 | 1031.1652 | 124.3812 |
| 75000 | 0.7576 | 847.7309 | 523.0175 | 0.5129 | 21.1745 | 47.227 | 11.807 | 1047.8357 | 130.6221 |
| 80000 | 0.8081 | 843.6693 | 534.2609 | 0.5125 | 21.388 | 46.755 | 11.689 | 1040.4556 | 125.4979 |
| 85000 | 0.8586 | 843.2120 | 524.1607 | 0.5106 | 21.4851 | 46.544 | 11.636 | 1042.5220 | 126.1609 |
| 90000 | 0.9091 | 842.1672 | 529.2425 | 0.5101 | 21.4494 | 46.621 | 11.655 | 1040.6277 | 126.7345 |
| 95000 | 0.9596 | 838.0835 | 528.3859 | 0.5099 | 21.1216 | 47.345 | 11.836 | 1034.5377 | 126.5655 |
| 99000 | 1.0 | 840.1149 | 528.4605 | 0.5100 | 21.5094 | 46.491 | 11.623 | 1037.4924 | 126.6330 |
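
One way to read the table is to compare the final student row (step 99000) with the teacher row. The sketch below computes the student-to-teacher perplexity ratio for each evaluation set, using only values copied from the table. The dictionary and variable names are hypothetical.

```python
# Perplexities copied from the "teacher eval" row and the final (step 99000) row.
teacher = {
    "enwikippl": 270.2348,
    "frwikippl": 76.8142,
    "tinystoriesppl": 671.1238,
    "zhwikippl": 22.8030,
}
student = {
    "enwikippl": 840.1149,
    "frwikippl": 528.4605,
    "tinystoriesppl": 1037.4924,
    "zhwikippl": 126.6330,
}

# Ratio of student to teacher perplexity per evaluation set.
ratios = {name: student[name] / teacher[name] for name in teacher}

for name, ratio in sorted(ratios.items()):
    print(f"{name}: {ratio:.2f}x teacher perplexity")
```

A ratio above 1 means the student is more perplexed than the teacher on that evaluation set.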

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- PyTorch 2.3.0
- Datasets 2.21.0