---
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.10
  results: []
---

# distily_bench_obj_cross_v2.10

This student model is distilled from the teacher model [roneneldan/TinyStories-33M](https://huggingface.co/roneneldan/TinyStories-33M) using the dataset (unspecified).

The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

It achieves the following results on the evaluation set:
- eval_enwikippl: 107.6398
- eval_frwikippl: 10204.3643
- eval_zhwikippl: 49954.8242
- eval_tinystoriesppl: 6.6903
- eval_loss: 0.7036
- eval_runtime: 13.0602
- eval_samples_per_second: 76.568
- eval_steps_per_second: 9.571

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
- train_embeddings: True
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1.0

### Resource Usage

Peak GPU Memory: 6.6064 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 50480.5703 | 85684.4844 | 6.8305 | 13.0304 | 76.744 | 9.593 | 33932.0586 | 94692.1562 |
| 5000 | 0.0505 | 110.8554 | 10584.2598 | 0.7523 | 13.0416 | 76.677 | 9.585 | 6.7911 | 42034.9414 |
| 10000 | 0.1010 | 104.0690 | 10210.1172 | 0.7242 | 13.0341 | 76.722 | 9.59 | 6.4174 | 44683.2305 |
| 15000 | 0.1515 | 113.6466 | 10400.9941 | 0.7156 | 13.0171 | 76.822 | 9.603 | 7.2840 | 46906.4258 |
| 20000 | 0.2020 | 111.4970 | 9877.6748 | 0.7117 | 13.0184 | 76.814 | 9.602 | 7.1889 | 47931.1602 |
| 25000 | 0.2525 | 107.3317 | 10121.3330 | 0.7051 | 13.088 | 76.406 | 9.551 | 6.6947 | 49516.9375 |
| 30000 | 0.3030 | 107.4814 | 10147.0312 | 0.7042 | 13.0664 | 76.532 | 9.567 | 6.6925 | 49728.7578 |
| 35000 | 0.3535 | 107.5147 | 10109.9404 | 0.7041 | 13.0324 | 76.732 | 9.591 | 6.6794 | 49279.6914 |
| 40000 | 0.4040 | 107.5064 | 10121.3330 | 0.7041 | 13.1335 | 76.141 | 9.518 | 6.6994 | 49835.0078 |
| 45000 | 0.4545 | 107.3816 | 10129.8984 | 0.7039 | 13.1075 | 76.292 | 9.537 | 6.6972 | 49464.1211 |
| 50000 | 0.5051 | 107.5231 | 10129.8984 | 0.7040 | 13.0137 | 76.842 | 9.605 | 6.7041 | 49808.4492 |
| 55000 | 0.5556 | 107.7482 | 10135.5996 | 0.7040 | 13.0084 | 76.874 | 9.609 | 6.7052 | 49464.1211 |
| 60000 | 0.6061 | 107.6064 | 10204.3643 | 0.7040 | 13.0291 | 76.751 | 9.594 | 6.6991 | 49914.8711 |
| 65000 | 0.6566 | 107.6981 | 10204.3643 | 0.7037 | 13.0479 | 76.641 | 9.58 | 6.6958 | 49543.3398 |
| 70000 | 0.7071 | 107.8484 | 10204.3643 | 0.7036 | 13.0612 | 76.563 | 9.57 | 6.6953 | 49848.3164 |
| 75000 | 0.7576 | 107.5897 | 10204.3643 | 0.7036 | 13.1821 | 75.86 | 9.483 | 6.6895 | 49888.2188 |
| 80000 | 0.8081 | 107.6398 | 10204.3643 | 0.7037 | 13.1572 | 76.004 | 9.5 | 6.6900 | 49835.0078 |
| 85000 | 0.8586 | 107.7148 | 10204.3643 | 0.7037 | 12.9936 | 76.961 | 9.62 | 6.6928 | 49928.1523 |
| 90000 | 0.9091 | 107.6398 | 10204.3643 | 0.7035 | 13.0225 | 76.79 | 9.599 | 6.6919 | 49954.8242 |
| 95000 | 0.9596 | 107.6398 | 10204.3643 | 0.7036 | 13.0696 | 76.514 | 9.564 | 6.6914 | 49954.8242 |
| 99000 | 1.0 | 107.6398 | 10204.3643 | 0.7036 | 13.0602 | 76.568 | 9.571 | 6.6903 | 49954.8242 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0
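
The `distillation_objective` listed under the training hyperparameters puts all of its weight on the logits component with `loss_fn=kl`; the hidden-state and attention components have weight 0. As a rough illustration of what a KL loss over logits measures, here is a minimal pure-Python sketch for a single token position. It is not Distily's implementation: the function names `log_softmax` and `kl_logits_loss` are hypothetical, and the KL direction (teacher relative to student) and the absence of temperature scaling or batch reduction are assumptions made for this example.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) for one token position.

    Assumption: the teacher distribution is the reference; the real
    library may use a different direction, reduction, or temperature.
    """
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    return sum(math.exp(t) * (t - s) for t, s in zip(t_logp, s_logp))


# Hypothetical logits over a 3-token vocabulary, for illustration only.
teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.7, -0.8]

loss_same = kl_logits_loss(teacher, teacher)  # identical logits give zero loss
loss_diff = kl_logits_loss(teacher, student)  # mismatched logits give a positive loss
print(loss_same, loss_diff)
```

Because the loss compares full output distributions rather than hard labels, it reaches zero only when the student reproduces the teacher's distribution at that position, which is one way to read the `loss` column in the eval-phase table.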