---
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.10
  results: []
---
# distily_bench_obj_cross_v2.10
This student model was distilled from the teacher model roneneldan/TinyStories-33M on an unspecified dataset, using the Distily library.
It achieves the following results on the evaluation set:
- eval_enwikippl: 132.5935
- eval_frwikippl: 19405.3008
- eval_zhwikippl: 53229.7070
- eval_tinystoriesppl: 9.1860
- eval_loss: 1.2126
- eval_runtime: 13.0629
- eval_samples_per_second: 76.553
- eval_steps_per_second: 9.569
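
The perplexity figures above come from Distily's evaluation loop. As a rough illustration of how a causal-LM perplexity of this kind can be computed with the `transformers` API (a minimal sketch, not the exact evaluation code; the model path below is a placeholder):

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: replace with this model's Hub repo id or a local checkpoint path.
model_id = "path/to/distily_bench_obj_cross_v2.10"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

text = "Once upon a time, a little fox found a shiny stone."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing the input ids as labels makes the model return the mean
    # next-token cross-entropy loss over the sequence.
    outputs = model(**inputs, labels=inputs["input_ids"])

perplexity = math.exp(outputs.loss.item())
print(f"perplexity: {perplexity:.2f}")
```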
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
- train_embeddings: True
- learning_rate: 4e-06
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1.0
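
The `distillation_objective` above assigns weight 1 to a KL-divergence loss on the logits and weight 0 to the hidden-state and attention components, so only the logits term contributes to training. The following PyTorch snippet is an illustrative sketch of such a logits-level KL loss, not Distily's actual implementation; the tensor shapes and vocabulary size are assumptions chosen for the example.

```python
import torch
import torch.nn.functional as F

def logits_kl_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL divergence from the teacher's to the student's next-token distribution.

    Both inputs have shape (batch, seq_len, vocab_size). This mirrors the idea of
    the `logits_loss_component` above; it is not Distily's exact code.
    """
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    teacher_probs = F.softmax(teacher_logits, dim=-1)
    # Flatten batch and sequence dims so "batchmean" averages over all token positions.
    return F.kl_div(
        student_log_probs.flatten(0, 1),
        teacher_probs.flatten(0, 1),
        reduction="batchmean",
    )

# Example with random logits, just to show the expected shapes (vocab size assumed).
student = torch.randn(2, 16, 50257)
teacher = torch.randn(2, 16, 50257)
loss = logits_kl_loss(student, teacher)
print(loss.item())
```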
## Resource Usage
Peak GPU Memory: 6.6064 GB
## Eval-Phase Metrics
step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
---|---|---|---|---|---|---|---|---|---|
teacher eval | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
0 | 0 | 50480.5703 | 85684.4844 | 6.8305 | 13.0395 | 76.69 | 9.586 | 33932.0586 | 94692.1562 |
5000 | 0.0505 | 132.2037 | 19438.1211 | 1.2135 | 13.0309 | 76.741 | 9.593 | 9.1868 | 53144.5430 |
10000 | 0.1010 | 132.4087 | 19372.5156 | 1.2127 | 13.048 | 76.64 | 9.58 | 9.1830 | 53201.2852 |
15000 | 0.1515 | 132.5935 | 19416.2402 | 1.2128 | 13.0596 | 76.572 | 9.571 | 9.1887 | 53144.5430 |
20000 | 0.2020 | 132.4292 | 19367.0664 | 1.2127 | 13.037 | 76.705 | 9.588 | 9.1955 | 53343.4375 |
25000 | 0.2525 | 132.5113 | 19367.0664 | 1.2126 | 13.0647 | 76.542 | 9.568 | 9.1890 | 53258.0898 |
30000 | 0.3030 | 132.5935 | 19383.4375 | 1.2125 | 13.0105 | 76.861 | 9.608 | 9.1845 | 53201.2852 |
35000 | 0.3535 | 132.3267 | 19372.5156 | 1.2127 | 13.1134 | 76.258 | 9.532 | 9.1754 | 53229.7070 |
40000 | 0.4040 | 132.5935 | 19367.0664 | 1.2127 | 13.0356 | 76.713 | 9.589 | 9.1928 | 53229.7070 |
45000 | 0.4545 | 132.4908 | 19372.5156 | 1.2126 | 13.0611 | 76.563 | 9.57 | 9.1826 | 53258.0898 |
50000 | 0.5051 | 132.2447 | 19405.3008 | 1.2126 | 13.07 | 76.511 | 9.564 | 9.1803 | 53286.5391 |
55000 | 0.5556 | 132.6346 | 19405.3008 | 1.2126 | 13.0134 | 76.844 | 9.605 | 9.1917 | 53229.7070 |
60000 | 0.6061 | 132.6346 | 19405.3008 | 1.2126 | 13.0453 | 76.656 | 9.582 | 9.1883 | 53258.0898 |
65000 | 0.6566 | 132.6346 | 19394.3652 | 1.2126 | 13.0475 | 76.643 | 9.58 | 9.1928 | 53258.0898 |
70000 | 0.7071 | 132.5935 | 19427.1680 | 1.2125 | 13.0602 | 76.568 | 9.571 | 9.1830 | 53229.7070 |
75000 | 0.7576 | 132.4292 | 19405.3008 | 1.2126 | 13.0658 | 76.535 | 9.567 | 9.1788 | 53229.7070 |
80000 | 0.8081 | 132.6346 | 19405.3008 | 1.2127 | 13.0497 | 76.63 | 9.579 | 9.1871 | 53229.7070 |
85000 | 0.8586 | 132.5935 | 19405.3008 | 1.2126 | 13.0439 | 76.664 | 9.583 | 9.1879 | 53229.7070 |
90000 | 0.9091 | 132.5935 | 19405.3008 | 1.2126 | 13.0368 | 76.706 | 9.588 | 9.1814 | 53229.7070 |
95000 | 0.9596 | 132.5935 | 19405.3008 | 1.2126 | 13.0326 | 76.731 | 9.591 | 9.1868 | 53229.7070 |
99000 | 1.0 | 132.5935 | 19405.3008 | 1.2126 | 13.0629 | 76.553 | 9.569 | 9.1860 | 53229.7070 |
### Framework versions
- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0