---
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.10
  results: []
---

# distily_bench_obj_cross_v2.10

This student model was distilled from the teacher model roneneldan/TinyStories-33M; the training dataset is unspecified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 108.1245
  • eval_frwikippl: 11043.4336
  • eval_zhwikippl: 55788.7734
  • eval_tinystoriesppl: 6.7037
  • eval_loss: 0.7047
  • eval_runtime: 13.0964
  • eval_samples_per_second: 76.357
  • eval_steps_per_second: 9.545
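The `*ppl` metrics above are perplexities: the exponential of the mean per-token negative log-likelihood on each corpus (English Wikipedia, French Wikipedia, Chinese Wikipedia, and TinyStories). A minimal sketch of that relationship follows; note that `eval_loss` here is the distillation loss rather than a language-modeling cross-entropy, so `exp(eval_loss)` does not reproduce the reported perplexities.

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood)."""
    return math.exp(mean_nll)

# A lower mean NLL means the model assigns more probability to the
# reference text, hence a lower perplexity.
# Recovering a perplexity from its own log round-trips exactly:
print(perplexity(math.log(6.7037)))  # -> 6.7037 (up to float rounding)
```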

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1.0
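Per the `distillation_objective` above, all of the training signal comes from a forward-KL loss between teacher and student logits (the hidden-state and attention components have weight 0). The following is a minimal pure-Python sketch of that per-token loss; Distily's actual implementation operates on batched tensors, so the function names here are illustrative only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one token's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_logits_loss(teacher_logits, student_logits):
    """Forward KL(teacher || student) for a single token position.

    The student is penalized wherever it assigns less probability
    than the teacher, pushing it to cover the teacher's full
    output distribution.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))

# Identical logits give zero loss; any mismatch gives a positive loss.
```

During training this quantity is averaged over all token positions in the batch and minimized with Adam under the linear learning-rate schedule listed above.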

## Resource Usage

Peak GPU Memory: 6.6064 GB

## Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 50480.5703 | 85684.4844 | 6.8305 | 13.0511 | 76.622 | 9.578 | 33932.0586 | 94692.1562 |
| 5000 | 0.0505 | 107.6398 | 10878.2363 | 0.7372 | 13.0368 | 76.706 | 9.588 | 6.6259 | 48821.6875 |
| 10000 | 0.1010 | 103.7832 | 10693.6533 | 0.7210 | 13.0383 | 76.697 | 9.587 | 6.3876 | 52904.0898 |
| 15000 | 0.1515 | 113.9463 | 10959.7607 | 0.7146 | 13.0455 | 76.655 | 9.582 | 7.3001 | 55833.4297 |
| 20000 | 0.2020 | 102.8906 | 10842.2969 | 0.7117 | 13.0448 | 76.659 | 9.582 | 6.3362 | 55967.6680 |
| 25000 | 0.2525 | 107.6648 | 11021.6855 | 0.7063 | 13.0457 | 76.654 | 9.582 | 6.7065 | 55654.9688 |
| 30000 | 0.3030 | 107.8986 | 11027.8887 | 0.7052 | 13.0423 | 76.673 | 9.584 | 6.6954 | 55122.9922 |
| 35000 | 0.3535 | 107.8986 | 10953.5859 | 0.7051 | 12.9974 | 76.939 | 9.617 | 6.6910 | 54771.1680 |
| 40000 | 0.4040 | 107.9989 | 10941.2451 | 0.7053 | 13.0736 | 76.49 | 9.561 | 6.7123 | 55122.9922 |
| 45000 | 0.4545 | 107.8317 | 10986.0273 | 0.7051 | 13.0495 | 76.632 | 9.579 | 6.7056 | 55064.1953 |
| 50000 | 0.5051 | 107.9905 | 11037.2217 | 0.7049 | 13.0288 | 76.753 | 9.594 | 6.7202 | 55922.9062 |
| 55000 | 0.5556 | 108.2753 | 10973.6602 | 0.7051 | 13.0751 | 76.481 | 9.56 | 6.7202 | 54917.4609 |
| 60000 | 0.6061 | 108.0324 | 11037.2217 | 0.7052 | 13.0104 | 76.861 | 9.608 | 6.7093 | 55358.7930 |
| 65000 | 0.6566 | 108.3089 | 11043.4336 | 0.7049 | 13.0425 | 76.673 | 9.584 | 6.7123 | 55122.9922 |
| 70000 | 0.7071 | 108.2418 | 11043.4336 | 0.7047 | 12.9968 | 76.942 | 9.618 | 6.7065 | 55122.9922 |
| 75000 | 0.7576 | 107.9069 | 11043.4336 | 0.7046 | 13.0103 | 76.862 | 9.608 | 6.7004 | 55506.7109 |
| 80000 | 0.8081 | 108.1915 | 11043.4336 | 0.7047 | 13.0166 | 76.825 | 9.603 | 6.6979 | 55788.7734 |
| 85000 | 0.8586 | 108.3089 | 11043.4336 | 0.7045 | 13.0625 | 76.555 | 9.569 | 6.7076 | 55759.0430 |
| 90000 | 0.9091 | 108.2083 | 11043.4336 | 0.7047 | 13.0397 | 76.689 | 9.586 | 6.7059 | 55788.7734 |
| 95000 | 0.9596 | 108.1999 | 11043.4336 | 0.7045 | 13.0487 | 76.636 | 9.579 | 6.7062 | 55788.7734 |
| 99000 | 1.0 | 108.1245 | 11043.4336 | 0.7047 | 13.0964 | 76.357 | 9.545 | 6.7037 | 55788.7734 |

## Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0