---
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.10
  results: []
---

# distily_bench_obj_cross_v2.10

This student model is distilled from the teacher model `roneneldan/TinyStories-33M`; the training dataset is unspecified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

- eval_enwikippl: 4839.0244
- eval_frwikippl: 39016.5469
- eval_zhwikippl: 56057.3555
- eval_tinystoriesppl: 1738.1987
- eval_loss: 4.8480
- eval_runtime: 13.0187
- eval_samples_per_second: 76.813
- eval_steps_per_second: 9.602
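
The `*ppl` metrics above are perplexities on different evaluation corpora (English/French/Chinese Wikipedia and TinyStories). Assuming the standard definition (the card does not spell it out), perplexity is the exponential of the mean per-token negative log-likelihood, which can be sketched as:

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Sanity check: a model that assigns uniform probability over a
# vocabulary of size V has perplexity exactly V.
vocab_size = 50257  # assumed GPT-2/GPT-Neo tokenizer size used by TinyStories models
uniform_nll = math.log(vocab_size)
print(perplexity([uniform_nll] * 4))
```

Lower perplexity means the model assigns higher probability to the held-out text; the gap between `eval_tinystoriesppl` and the Wikipedia perplexities reflects how far those corpora are from the TinyStories training distribution.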

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))`
- train_embeddings: True
- learning_rate: 1e-06
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1.0
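
The objective above places all weight (weight=1) on a KL-divergence loss between the teacher's and student's logits, with the hidden-state and attention components disabled (weight=0). A minimal pure-Python sketch of that logits loss for a single token position (helper names are illustrative; the actual Distily implementation may differ, e.g. in temperature scaling or batch reduction):

```python
import math

def softmax(logits):
    # Numerically stable softmax over one vocabulary distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_logits_loss(teacher_logits, student_logits, eps=1e-12):
    """KL(teacher || student) between the two softmax distributions.
    Zero when the distributions match; grows as the student diverges."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Identical logits give zero loss; mismatched logits give a positive loss.
print(kl_logits_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(kl_logits_loss([0.0, 0.0, 5.0], [5.0, 0.0, 0.0]))
```

In training, this per-position loss would be averaged over all token positions in the batch, and its gradient updates the student (including its embeddings, per `train_embeddings: True`).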

## Resource Usage

Peak GPU Memory: 6.6048 GB

## Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 23326.0117 | 54995.6562 | 6.1135 | 13.0206 | 76.802 | 9.6 | 17003.8281 | 65438.9492 |
| 5000 | 0.0505 | 4839.0244 | 39016.5469 | 4.8478 | 13.0137 | 76.842 | 9.605 | 1738.1987 | 56102.2266 |
| 10000 | 0.1010 | 4839.0244 | 39016.5469 | 4.8480 | 12.9688 | 77.108 | 9.639 | 1738.1987 | 56087.25 |
| 15000 | 0.1515 | 4839.0244 | 39016.5469 | 4.8488 | 12.9532 | 77.201 | 9.65 | 1739.0610 | 56102.2266 |
| 20000 | 0.2020 | 4839.0244 | 39016.5469 | 4.8488 | 13.014 | 76.84 | 9.605 | 1738.4856 | 56072.2734 |
| 25000 | 0.2525 | 4839.0244 | 39016.5469 | 4.8485 | 13.0041 | 76.899 | 9.612 | 1738.1987 | 56057.3555 |
| 30000 | 0.3030 | 4839.0244 | 39016.5469 | 4.8488 | 12.9823 | 77.028 | 9.629 | 1738.4856 | 56057.3555 |
| 35000 | 0.3535 | 4839.0244 | 39016.5469 | 4.8485 | 13.037 | 76.705 | 9.588 | 1739.0610 | 56072.2734 |
| 40000 | 0.4040 | 4839.0244 | 39016.5469 | 4.8482 | 12.9918 | 76.972 | 9.621 | 1738.1987 | 56057.3555 |
| 45000 | 0.4545 | 4839.0244 | 38994.5625 | 4.8485 | 12.9951 | 76.952 | 9.619 | 1738.7732 | 56057.3555 |
| 50000 | 0.5051 | 4839.0244 | 39016.5469 | 4.8482 | 12.9731 | 77.083 | 9.635 | 1739.3488 | 56072.2734 |
| 55000 | 0.5556 | 4839.0244 | 39016.5469 | 4.8478 | 12.9694 | 77.105 | 9.638 | 1738.4856 | 56057.3555 |
| 60000 | 0.6061 | 4839.0244 | 39016.5469 | 4.8488 | 13.0353 | 76.715 | 9.589 | 1738.7732 | 56057.3555 |
| 65000 | 0.6566 | 4839.0244 | 39016.5469 | 4.8478 | 13.0087 | 76.872 | 9.609 | 1738.1987 | 56057.3555 |
| 70000 | 0.7071 | 4839.0244 | 39016.5469 | 4.8485 | 13.033 | 76.728 | 9.591 | 1738.1987 | 56057.3555 |
| 75000 | 0.7576 | 4839.0244 | 39016.5469 | 4.8480 | 13.0328 | 76.729 | 9.591 | 1738.4856 | 56057.3555 |
| 80000 | 0.8081 | 4839.0244 | 39016.5469 | 4.8482 | 12.9884 | 76.992 | 9.624 | 1737.9111 | 56057.3555 |
| 85000 | 0.8586 | 4839.0244 | 39016.5469 | 4.8485 | 13.0047 | 76.895 | 9.612 | 1738.1987 | 56087.25 |
| 90000 | 0.9091 | 4839.0244 | 39016.5469 | 4.8478 | 13.0255 | 76.772 | 9.597 | 1738.1987 | 56057.3555 |
| 95000 | 0.9596 | 4839.0244 | 39016.5469 | 4.8482 | 13.0007 | 76.919 | 9.615 | 1738.1987 | 56057.3555 |
| 99000 | 1.0 | 4839.0244 | 39016.5469 | 4.8480 | 13.0187 | 76.813 | 9.602 | 1738.1987 | 56057.3555 |

## Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0