---
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2.2
    results: []
---

distily_bench_obj_cross_v2.2

This student model was distilled from the teacher model roneneldan/TinyStories-33M; the training dataset is unspecified.

The Distily library was used for this distillation.
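
The student can be loaded like any Hugging Face causal language model. Below is a minimal usage sketch; the repo id is inferred from the model name above and may differ from where the weights are actually hosted:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id (the model name under the lapp0 namespace); adjust if hosted elsewhere.
repo_id = "lapp0/distily_bench_obj_cross_v2.2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```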

It achieves the following results on the evaluation set:

  • eval_enwikippl: 28257.9004
  • eval_frwikippl: 63896.6680
  • eval_zhwikippl: 90059.6875
  • eval_tinystoriesppl: 18426.4922
  • eval_loss: 6.6740
  • eval_runtime: 13.137
  • eval_samples_per_second: 76.121
  • eval_steps_per_second: 9.515
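
For context, each *ppl metric above is the exponential of the average per-token cross-entropy on the corresponding corpus (enwiki, frwiki, zhwiki, TinyStories). A minimal sketch of that computation, assuming simple per-text evaluation rather than Distily's actual evaluation loop:

```python
import math
import torch

def corpus_perplexity(model, tokenizer, texts, device="cpu"):
    """Perplexity = exp(mean token-level cross-entropy) over a list of texts."""
    model.to(device).eval()
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt", truncation=True).to(device)
            # Passing labels=input_ids makes the model return the mean
            # cross-entropy over the (internally shifted) target tokens.
            out = model(**enc, labels=enc["input_ids"])
            n_targets = enc["input_ids"].size(1) - 1  # targets are shifted by one
            total_nll += out.loss.item() * n_targets
            total_tokens += n_targets
    return math.exp(total_nll / total_tokens)
```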

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None)) (a logits-only KL objective; see the sketch after this list)
  • train_embeddings: True
  • learning_rate: 4e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 1.0
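
The distillation_objective above puts all weight on a KL divergence between teacher and student logits, with the hidden-state and attention loss components disabled. A minimal sketch of such a logits-only objective together with the optimizer and scheduler settings listed above (the flattening convention and the stand-in module are illustrative assumptions, not Distily's internals):

```python
import torch
import torch.nn.functional as F
from transformers import get_constant_schedule

def kl_logits_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """Forward KL(teacher || student) over next-token distributions."""
    # Flatten (batch, seq, vocab) -> (batch*seq, vocab) so "batchmean"
    # averages over token positions and sums over the vocabulary.
    log_p_student = F.log_softmax(student_logits, dim=-1).flatten(0, -2)
    p_teacher = F.softmax(teacher_logits, dim=-1).flatten(0, -2)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")

student = torch.nn.Linear(8, 8)  # stand-in for the student model's parameters
optimizer = torch.optim.Adam(student.parameters(), lr=4e-5, betas=(0.9, 0.999), eps=1e-8)
scheduler = get_constant_schedule(optimizer)  # lr_scheduler_type: constant
```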

Resource Usage

Peak GPU Memory: 8.0568 GB
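
Figures like this are typically read from PyTorch's CUDA allocator statistics; a sketch follows (whether Distily reports allocated or reserved bytes is an assumption here):

```python
import torch

if torch.cuda.is_available():
    torch.cuda.reset_peak_memory_stats()
    # ... run training / evaluation here ...
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```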

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 35507.3906 | 70936.2969 | 6.875 | 13.2774 | 75.316 | 9.414 | 24370.3125 | 92840.9844 |
| 500 | 0.0404 | 28284.1875 | 63896.6680 | 6.6737 | 13.1884 | 75.824 | 9.478 | 18447.8379 | 90059.6875 |
| 1000 | 0.0808 | 28284.1875 | 63896.6680 | 6.6740 | 13.221 | 75.637 | 9.455 | 18444.7754 | 90059.6875 |
| 1500 | 0.1212 | 28284.1875 | 63896.6680 | 6.6740 | 13.1643 | 75.963 | 9.495 | 18444.7754 | 90059.6875 |
| 2000 | 0.1616 | 28284.1875 | 63896.6680 | 6.6740 | 13.2331 | 75.568 | 9.446 | 18438.6914 | 90059.6875 |
| 2500 | 0.2020 | 28284.1875 | 63896.6680 | 6.6740 | 13.1865 | 75.835 | 9.479 | 18432.5898 | 90059.6875 |
| 3000 | 0.2424 | 28257.9004 | 63896.6680 | 6.6740 | 13.246 | 75.494 | 9.437 | 18426.4922 | 90059.6875 |
| 3500 | 0.2828 | 28257.9004 | 63896.6680 | 6.6740 | 13.1762 | 75.895 | 9.487 | 18426.4922 | 90059.6875 |
| 4000 | 0.3232 | 28257.9004 | 63896.6680 | 6.6740 | 13.3585 | 74.859 | 9.357 | 18426.4922 | 90059.6875 |
| 4500 | 0.3636 | 28257.9004 | 63896.6680 | 6.6740 | 13.1842 | 75.848 | 9.481 | 18426.4922 | 90059.6875 |
| 5000 | 0.4040 | 28257.9004 | 63896.6680 | 6.6740 | 13.2694 | 75.361 | 9.42 | 18426.4922 | 90059.6875 |
| 5500 | 0.4444 | 28257.9004 | 63896.6680 | 6.6740 | 13.2102 | 75.699 | 9.462 | 18426.4922 | 90059.6875 |
| 6000 | 0.4848 | 28257.9004 | 63896.6680 | 6.6740 | 13.3012 | 75.181 | 9.398 | 18426.4922 | 90059.6875 |
| 6500 | 0.5253 | 28257.9004 | 63896.6680 | 6.6740 | 13.1704 | 75.928 | 9.491 | 18426.4922 | 90059.6875 |
| 7000 | 0.5657 | 28257.9004 | 63896.6680 | 6.6740 | 13.2236 | 75.622 | 9.453 | 18426.4922 | 90059.6875 |
| 7500 | 0.6061 | 28257.9004 | 63896.6680 | 6.6740 | 13.2333 | 75.567 | 9.446 | 18426.4922 | 90059.6875 |
| 8000 | 0.6465 | 28257.9004 | 63896.6680 | 6.6740 | 13.1385 | 76.112 | 9.514 | 18426.4922 | 90059.6875 |
| 8500 | 0.6869 | 28257.9004 | 63896.6680 | 6.6740 | 13.2297 | 75.588 | 9.448 | 18426.4922 | 90059.6875 |
| 9000 | 0.7273 | 28257.9004 | 63896.6680 | 6.6740 | 13.1073 | 76.293 | 9.537 | 18426.4922 | 90059.6875 |
| 9500 | 0.7677 | 28257.9004 | 63896.6680 | 6.6740 | 13.137 | 76.121 | 9.515 | 18426.4922 | 90059.6875 |
| 10000 | 0.8081 | 28257.9004 | 63896.6680 | 6.6740 | 13.0862 | 76.417 | 9.552 | 18426.4922 | 90059.6875 |
| 10500 | 0.8485 | 28257.9004 | 63896.6680 | 6.6740 | 13.17 | 75.93 | 9.491 | 18426.4922 | 90059.6875 |
| 11000 | 0.8889 | 28257.9004 | 63896.6680 | 6.6740 | 13.211 | 75.694 | 9.462 | 18426.4922 | 90059.6875 |
| 11500 | 0.9293 | 28257.9004 | 63896.6680 | 6.6740 | 13.1171 | 76.237 | 9.53 | 18426.4922 | 90059.6875 |
| 12000 | 0.9697 | 28257.9004 | 63896.6680 | 6.6740 | 13.2484 | 75.481 | 9.435 | 18426.4922 | 90059.6875 |
| 12375 | 1.0 | 28257.9004 | 63896.6680 | 6.6740 | 13.2116 | 75.691 | 9.461 | 18426.4922 | 90059.6875 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.20.0