---
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.10
  results: []
---

# distily_bench_obj_cross_v2.10

This student model is distilled from the teacher model roneneldan/TinyStories-33M using the dataset (unspecified).

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

- eval_enwikippl: 12766.3359
- eval_frwikippl: 57742.3438
- eval_zhwikippl: 65334.25
- eval_tinystoriesppl: 4770.0942
- eval_loss: 5.2085
- eval_runtime: 13.0328
- eval_samples_per_second: 76.73
- eval_steps_per_second: 9.591
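
Each perplexity metric above is the exponentiated mean token-level cross-entropy on the corresponding corpus. A minimal sketch of that computation (the helper name is illustrative, not part of Distily):

```python
import math

import torch
import torch.nn.functional as F


def perplexity_from_logits(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Perplexity = exp(mean next-token negative log-likelihood).

    logits: (seq_len, vocab_size) causal-LM outputs
    labels: (seq_len,) token ids of the same sequence
    """
    # Shift so position i predicts token i + 1.
    shift_logits = logits[:-1]
    shift_labels = labels[1:]
    nll = F.cross_entropy(shift_logits, shift_labels)
    return math.exp(nll.item())
```

For example, a model that is uniformly uncertain over a vocabulary of size V has perplexity exactly V.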

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
- train_embeddings: True
- learning_rate: 1e-06
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1.0
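
The distillation objective above uses only a logits-level KL loss (weight 1), with the hidden-state and attention components disabled (weight 0). A minimal PyTorch sketch of such a KL logits loss (not Distily's actual implementation; the `batchmean` reduction is an assumption):

```python
import torch
import torch.nn.functional as F


def kl_logits_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student) over the vocabulary, averaged over positions."""
    # Student in log-space, teacher in log-space (log_target=True).
    log_p_student = F.log_softmax(student_logits, dim=-1)
    log_p_teacher = F.log_softmax(teacher_logits, dim=-1)
    # batchmean sums over the vocab axis and averages over the first axis.
    return F.kl_div(log_p_student, log_p_teacher,
                    log_target=True, reduction="batchmean")
```

The loss is zero when student and teacher distributions match and non-negative otherwise, so minimizing it pushes the student's next-token distribution toward the teacher's.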

### Resource Usage

Peak GPU Memory: 6.6048 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 61801.5039 | 81001.6719 | 6.4680 | 13.0128 | 76.847 | 9.606 | 44522.7852 | 75358.2109 |
| 5000 | 0.0505 | 12766.3359 | 57742.3438 | 5.2085 | 12.9999 | 76.923 | 9.615 | 4771.6733 | 65264.5664 |
| 10000 | 0.1010 | 12766.3359 | 57742.3438 | 5.2085 | 13.0144 | 76.838 | 9.605 | 4768.5161 | 65334.25 |
| 15000 | 0.1515 | 12766.3359 | 57742.3438 | 5.2085 | 13.0239 | 76.782 | 9.598 | 4770.0942 | 65334.25 |
| 20000 | 0.2020 | 12766.3359 | 57742.3438 | 5.2085 | 12.9909 | 76.977 | 9.622 | 4769.3076 | 65334.25 |
| 25000 | 0.2525 | 12766.3359 | 57709.8086 | 5.2083 | 13.1403 | 76.102 | 9.513 | 4768.5161 | 65334.25 |
| 30000 | 0.3030 | 12766.3359 | 57709.8086 | 5.2083 | 13.0382 | 76.698 | 9.587 | 4768.5161 | 65334.25 |
| 35000 | 0.3535 | 12766.3359 | 57742.3438 | 5.2083 | 13.0826 | 76.438 | 9.555 | 4770.0942 | 65334.25 |
| 40000 | 0.4040 | 12766.3359 | 57742.3438 | 5.2085 | 13.0472 | 76.645 | 9.581 | 4769.3076 | 65334.25 |
| 45000 | 0.4545 | 12766.3359 | 57742.3438 | 5.2085 | 13.1664 | 75.951 | 9.494 | 4770.0942 | 65334.25 |
| 50000 | 0.5051 | 12766.3359 | 57742.3438 | 5.2083 | 13.047 | 76.646 | 9.581 | 4768.5161 | 65334.25 |
| 55000 | 0.5556 | 12766.3359 | 57742.3438 | 5.2083 | 13.2134 | 75.681 | 9.46 | 4768.5161 | 65334.25 |
| 60000 | 0.6061 | 12766.3359 | 57742.3438 | 5.2087 | 13.0275 | 76.761 | 9.595 | 4769.3076 | 65334.25 |
| 65000 | 0.6566 | 12766.3359 | 57742.3438 | 5.2083 | 13.1101 | 76.277 | 9.535 | 4768.5161 | 65334.25 |
| 70000 | 0.7071 | 12766.3359 | 57742.3438 | 5.2085 | 13.0485 | 76.637 | 9.58 | 4771.6733 | 65334.25 |
| 75000 | 0.7576 | 12766.3359 | 57742.3438 | 5.2085 | 13.0209 | 76.8 | 9.6 | 4768.5161 | 65299.4297 |
| 80000 | 0.8081 | 12766.3359 | 57742.3438 | 5.2085 | 13.0587 | 76.577 | 9.572 | 4771.6733 | 65334.25 |
| 85000 | 0.8586 | 12766.3359 | 57742.3438 | 5.2085 | 13.0404 | 76.685 | 9.586 | 4770.0942 | 65299.4297 |
| 90000 | 0.9091 | 12766.3359 | 57742.3438 | 5.2087 | 13.0082 | 76.874 | 9.609 | 4770.0942 | 65334.25 |
| 95000 | 0.9596 | 12766.3359 | 57742.3438 | 5.2085 | 13.0077 | 76.878 | 9.61 | 4769.3076 | 65334.25 |
| 99000 | 1.0 | 12766.3359 | 57742.3438 | 5.2085 | 13.0328 | 76.73 | 9.591 | 4770.0942 | 65334.25 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0