---
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.10
  results: []
---

# distily_bench_obj_cross_v2.10

This student model is distilled from the teacher model [roneneldan/TinyStories-33M](https://huggingface.co/roneneldan/TinyStories-33M) using the dataset (unspecified).

The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

It achieves the following results on the evaluation set:
- eval_enwikippl: 132.5935
- eval_frwikippl: 19405.3008
- eval_zhwikippl: 53229.7070
- eval_tinystoriesppl: 9.1860
- eval_loss: 1.2126
- eval_runtime: 13.0629
- eval_samples_per_second: 76.553
- eval_steps_per_second: 9.569

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
- train_embeddings: True
- learning_rate: 4e-06
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1.0

### Resource Usage

Peak GPU Memory: 6.6064 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 50480.5703 | 85684.4844 | 6.8305 | 13.0395 | 76.69 | 9.586 | 33932.0586 | 94692.1562 |
| 5000 | 0.0505 | 132.2037 | 19438.1211 | 1.2135 | 13.0309 | 76.741 | 9.593 | 9.1868 | 53144.5430 |
| 10000 | 0.1010 | 132.4087 | 19372.5156 | 1.2127 | 13.048 | 76.64 | 9.58 | 9.1830 | 53201.2852 |
| 15000 | 0.1515 | 132.5935 | 19416.2402 | 1.2128 | 13.0596 | 76.572 | 9.571 | 9.1887 | 53144.5430 |
| 20000 | 0.2020 | 132.4292 | 19367.0664 | 1.2127 | 13.037 | 76.705 | 9.588 | 9.1955 | 53343.4375 |
| 25000 | 0.2525 | 132.5113 | 19367.0664 | 1.2126 | 13.0647 | 76.542 | 9.568 | 9.1890 | 53258.0898 |
| 30000 | 0.3030 | 132.5935 | 19383.4375 | 1.2125 | 13.0105 | 76.861 | 9.608 | 9.1845 | 53201.2852 |
| 35000 | 0.3535 | 132.3267 | 19372.5156 | 1.2127 | 13.1134 | 76.258 | 9.532 | 9.1754 | 53229.7070 |
| 40000 | 0.4040 | 132.5935 | 19367.0664 | 1.2127 | 13.0356 | 76.713 | 9.589 | 9.1928 | 53229.7070 |
| 45000 | 0.4545 | 132.4908 | 19372.5156 | 1.2126 | 13.0611 | 76.563 | 9.57 | 9.1826 | 53258.0898 |
| 50000 | 0.5051 | 132.2447 | 19405.3008 | 1.2126 | 13.07 | 76.511 | 9.564 | 9.1803 | 53286.5391 |
| 55000 | 0.5556 | 132.6346 | 19405.3008 | 1.2126 | 13.0134 | 76.844 | 9.605 | 9.1917 | 53229.7070 |
| 60000 | 0.6061 | 132.6346 | 19405.3008 | 1.2126 | 13.0453 | 76.656 | 9.582 | 9.1883 | 53258.0898 |
| 65000 | 0.6566 | 132.6346 | 19394.3652 | 1.2126 | 13.0475 | 76.643 | 9.58 | 9.1928 | 53258.0898 |
| 70000 | 0.7071 | 132.5935 | 19427.1680 | 1.2125 | 13.0602 | 76.568 | 9.571 | 9.1830 | 53229.7070 |
| 75000 | 0.7576 | 132.4292 | 19405.3008 | 1.2126 | 13.0658 | 76.535 | 9.567 | 9.1788 | 53229.7070 |
| 80000 | 0.8081 | 132.6346 | 19405.3008 | 1.2127 | 13.0497 | 76.63 | 9.579 | 9.1871 | 53229.7070 |
| 85000 | 0.8586 | 132.5935 | 19405.3008 | 1.2126 | 13.0439 | 76.664 | 9.583 | 9.1879 | 53229.7070 |
| 90000 | 0.9091 | 132.5935 | 19405.3008 | 1.2126 | 13.0368 | 76.706 | 9.588 | 9.1814 | 53229.7070 |
| 95000 | 0.9596 | 132.5935 | 19405.3008 | 1.2126 | 13.0326 | 76.731 | 9.591 | 9.1868 | 53229.7070 |
| 99000 | 1.0 | 132.5935 | 19405.3008 | 1.2126 | 13.0629 | 76.553 | 9.569 | 9.1860 | 53229.7070 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0
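
### Distillation objective sketch

The `distillation_objective` listed under Training hyperparameters gives weight 1 to a logits component with `loss_fn=kl` and weight 0 to the hidden-state (`hs`) and attention (`attn`) components, so only the logits term contributes to the training loss. The sketch below illustrates that arithmetic in plain Python for a single token position. It is not Distily's implementation: the function names (`log_softmax`, `kl_logits_loss`, `weighted_objective`), the KL direction (teacher relative to student), and the toy logits are all assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) over one position's vocabulary logits.

    Assumption: the teacher distribution is the reference distribution.
    """
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    return sum(math.exp(t) * (t - s) for t, s in zip(t_logp, s_logp))


def weighted_objective(components):
    """Sum weight * loss over (weight, loss) pairs, skipping zero weights."""
    return sum(w * loss for w, loss in components if w != 0)


# Hypothetical logits over a 3-token vocabulary.
teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.7, -0.5]

logits_loss = kl_logits_loss(teacher, student)
# Weights mirror the card: logits=1, hs=0, attn=0.
total = weighted_objective([(1, logits_loss), (0, 0.0), (0, 0.0)])
print(f"logits KL: {logits_loss:.6f}, total objective: {total:.6f}")
```

With the `hs` and `attn` weights at 0, the total equals the logits KL term. A real training loop would average this quantity over every token position in the batch and compute it on tensors rather than Python lists.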