---
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2.10
    results: []
---

# distily_bench_obj_cross_v2.10

This student model was distilled from the teacher model roneneldan/TinyStories-33M on an unspecified dataset.

The Distily library was used for this distillation.
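A minimal usage sketch with Hugging Face Transformers. The repo id below is assumed from this card's title and is not confirmed by the card itself:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_obj_cross_v2.10"  # assumed repo id, not confirmed by the card

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation with the distilled student.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```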

It achieves the following results on the evaluation set:

- eval_enwikippl: 132.5935
- eval_frwikippl: 19405.3008
- eval_zhwikippl: 53229.7070
- eval_tinystoriesppl: 9.1860
- eval_loss: 1.2126
- eval_runtime: 13.0629
- eval_samples_per_second: 76.553
- eval_steps_per_second: 9.569
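For reference, each perplexity above is the exponential of the mean token-level cross-entropy on the named corpus. Note that eval_loss appears to be the distillation objective (KL on logits, per the hyperparameters below) rather than the language-modeling loss, so exp(eval_loss) does not reproduce these perplexities. A minimal sketch of the conventional perplexity computation, again assuming the hypothetical repo id:

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_obj_cross_v2.10"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

text = "Once upon a time there was a little dog."
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # With labels set, the model returns the mean next-token cross-entropy.
    loss = model(**enc, labels=enc["input_ids"]).loss

# Perplexity is exp(mean cross-entropy).
print(f"perplexity: {math.exp(loss.item()):.2f}")
```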

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))` (logits-only KL distillation; see the sketch after this list)
- train_embeddings: True
- learning_rate: 4e-06
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1.0
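The objective above puts all weight on a KL loss over logits, with the hidden-state and attention components disabled (weight 0). Below is a minimal sketch of a single distillation step under that configuration, assuming forward KL between the teacher and student next-token distributions; the card does not specify the student architecture or Distily's exact loss reduction, so the student below is a stand-in:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_id = "roneneldan/TinyStories-33M"
tokenizer = AutoTokenizer.from_pretrained(teacher_id)
teacher = AutoModelForCausalLM.from_pretrained(teacher_id).eval()
# Stand-in student: the card does not state the student's configuration.
student = AutoModelForCausalLM.from_pretrained(teacher_id)

# Optimizer settings taken from the hyperparameter list above.
optimizer = torch.optim.Adam(student.parameters(), lr=4e-6, betas=(0.9, 0.999), eps=1e-8)

batch = tokenizer("Once upon a time", return_tensors="pt")
with torch.no_grad():
    teacher_logits = teacher(**batch).logits  # (batch, seq, vocab)
student_logits = student(**batch).logits

# KL(teacher || student) per token position, averaged over all tokens.
s = F.log_softmax(student_logits, dim=-1).flatten(0, 1)  # (tokens, vocab)
t = F.log_softmax(teacher_logits, dim=-1).flatten(0, 1)
loss = F.kl_div(s, t, log_target=True, reduction="batchmean")

optimizer.zero_grad()
loss.backward()
optimizer.step()
```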

## Resource Usage

Peak GPU Memory: 6.6064 GB
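The card does not state how this figure was measured; one common way to read such a number, assuming a single CUDA device and PyTorch's allocator statistics:

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run training or evaluation steps here ...
peak_gb = torch.cuda.max_memory_allocated() / 1e9
print(f"Peak GPU memory: {peak_gb:.4f} GB")
```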

## Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 50480.5703 | 85684.4844 | 6.8305 | 13.0395 | 76.69 | 9.586 | 33932.0586 | 94692.1562 |
| 5000 | 0.0505 | 132.2037 | 19438.1211 | 1.2135 | 13.0309 | 76.741 | 9.593 | 9.1868 | 53144.5430 |
| 10000 | 0.1010 | 132.4087 | 19372.5156 | 1.2127 | 13.048 | 76.64 | 9.58 | 9.1830 | 53201.2852 |
| 15000 | 0.1515 | 132.5935 | 19416.2402 | 1.2128 | 13.0596 | 76.572 | 9.571 | 9.1887 | 53144.5430 |
| 20000 | 0.2020 | 132.4292 | 19367.0664 | 1.2127 | 13.037 | 76.705 | 9.588 | 9.1955 | 53343.4375 |
| 25000 | 0.2525 | 132.5113 | 19367.0664 | 1.2126 | 13.0647 | 76.542 | 9.568 | 9.1890 | 53258.0898 |
| 30000 | 0.3030 | 132.5935 | 19383.4375 | 1.2125 | 13.0105 | 76.861 | 9.608 | 9.1845 | 53201.2852 |
| 35000 | 0.3535 | 132.3267 | 19372.5156 | 1.2127 | 13.1134 | 76.258 | 9.532 | 9.1754 | 53229.7070 |
| 40000 | 0.4040 | 132.5935 | 19367.0664 | 1.2127 | 13.0356 | 76.713 | 9.589 | 9.1928 | 53229.7070 |
| 45000 | 0.4545 | 132.4908 | 19372.5156 | 1.2126 | 13.0611 | 76.563 | 9.57 | 9.1826 | 53258.0898 |
| 50000 | 0.5051 | 132.2447 | 19405.3008 | 1.2126 | 13.07 | 76.511 | 9.564 | 9.1803 | 53286.5391 |
| 55000 | 0.5556 | 132.6346 | 19405.3008 | 1.2126 | 13.0134 | 76.844 | 9.605 | 9.1917 | 53229.7070 |
| 60000 | 0.6061 | 132.6346 | 19405.3008 | 1.2126 | 13.0453 | 76.656 | 9.582 | 9.1883 | 53258.0898 |
| 65000 | 0.6566 | 132.6346 | 19394.3652 | 1.2126 | 13.0475 | 76.643 | 9.58 | 9.1928 | 53258.0898 |
| 70000 | 0.7071 | 132.5935 | 19427.1680 | 1.2125 | 13.0602 | 76.568 | 9.571 | 9.1830 | 53229.7070 |
| 75000 | 0.7576 | 132.4292 | 19405.3008 | 1.2126 | 13.0658 | 76.535 | 9.567 | 9.1788 | 53229.7070 |
| 80000 | 0.8081 | 132.6346 | 19405.3008 | 1.2127 | 13.0497 | 76.63 | 9.579 | 9.1871 | 53229.7070 |
| 85000 | 0.8586 | 132.5935 | 19405.3008 | 1.2126 | 13.0439 | 76.664 | 9.583 | 9.1879 | 53229.7070 |
| 90000 | 0.9091 | 132.5935 | 19405.3008 | 1.2126 | 13.0368 | 76.706 | 9.588 | 9.1814 | 53229.7070 |
| 95000 | 0.9596 | 132.5935 | 19405.3008 | 1.2126 | 13.0326 | 76.731 | 9.591 | 9.1868 | 53229.7070 |
| 99000 | 1.0 | 132.5935 | 19405.3008 | 1.2126 | 13.0629 | 76.553 | 9.569 | 9.1860 | 53229.7070 |

## Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0