---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2.13_gpt2
    results: []
---

distily_bench_obj_cross_v2.13_gpt2

This student model was distilled from the teacher model gpt2; the training dataset is unspecified.

The Distily library was used for this distillation.

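The student keeps the standard GPT-2 architecture, so it can be loaded with the usual Hugging Face Transformers classes. The sketch below is illustrative only: the model identifier is an assumption (replace it with the actual repository id or a local path to this checkpoint), and the generation settings are arbitrary.

```python
# Hedged usage sketch -- load the distilled student like any GPT-2 checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical identifier: substitute the real repo id or a local path.
model_id = "distily_bench_obj_cross_v2.13_gpt2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.95)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
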
It achieves the following results on the evaluation set (see the sketch after this list for how the perplexity metrics are typically computed):

  • eval_enwikippl: 2160.0
  • eval_frwikippl: 9536.0
  • eval_zhwikippl: 98816.0
  • eval_tinystoriesppl: 1960.0
  • eval_loss: 3.2783
  • eval_runtime: 12.9016
  • eval_samples_per_second: 46.506
  • eval_steps_per_second: 11.626

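The eval_*ppl numbers above are per-corpus perplexities of the student (presumably English, French, and Chinese Wikipedia text and TinyStories). The sketch below shows how such a perplexity is typically computed, as the exponential of the mean next-token cross-entropy on held-out text; it is a generic illustration, not Distily's exact evaluation code, and the function name and windowing choices are assumptions.

```python
# Hedged sketch of a corpus perplexity, assuming a causal LM and tokenizer from
# Transformers; the reported metrics may use different windowing and batching.
import math
import torch

@torch.no_grad()
def corpus_perplexity(model, tokenizer, texts, max_length=1024, device="cpu"):
    model.to(device).eval()
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True,
                        max_length=max_length).to(device)
        # With labels=input_ids, the model returns the mean next-token cross-entropy.
        out = model(**enc, labels=enc["input_ids"])
        n_pred = enc["input_ids"].numel() - 1  # the first token has no prediction target
        total_nll += out.loss.item() * n_pred
        total_tokens += n_pred
    return math.exp(total_nll / total_tokens)
```
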
Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=1.0, loss_fn=kl, layer_mapper=all, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None)) (a rough illustration of this objective is sketched after this list)
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.5
  • num_epochs: 1.0

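The distillation_objective entry above combines a KL term on the logits (weight 1) with a KL-style term on every hidden state (weight 1.0, student layer i mapped to teacher layer i), while the attention term is disabled (weight 0). The sketch below is a rough paraphrase of that objective, not Distily's actual implementation: the function and argument names are made up, and softmax-normalizing hidden states before the KL is one plausible reading of loss_fn=kl applied to hidden states.

```python
# Hedged sketch of the configured objective: logits KL + per-layer hidden-state KL;
# the attention term is omitted because its weight is 0 in this run.
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, logits_weight=1.0, hs_weight=1.0):
    # Logits component: KL(teacher || student) over the vocabulary at each position.
    s_logp = F.log_softmax(student_out.logits, dim=-1)
    t_prob = F.softmax(teacher_out.logits, dim=-1)
    logits_loss = F.kl_div(s_logp, t_prob, reduction="batchmean")

    # Hidden-state component: the same KL form on each student/teacher layer pair
    # (layer_mapper=all; student and teacher are both GPT-2, so no projector is needed).
    hs_loss = 0.0
    for s_h, t_h in zip(student_out.hidden_states, teacher_out.hidden_states):
        hs_loss += F.kl_div(F.log_softmax(s_h, dim=-1), F.softmax(t_h, dim=-1),
                            reduction="batchmean")
    hs_loss /= len(student_out.hidden_states)

    return logits_weight * logits_loss + hs_weight * hs_loss
```

Both outputs are assumed to come from forward passes with output_hidden_states=True.
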
Resource Usage

Peak GPU Memory: 8.0905 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples/s | steps/s | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 1855425871872.0 | 61297773248512.0 | 26.3692 | 12.8695 | 46.622 | 11.655 | 14495514624.0 | 11338713661440.0 |
| 750 | 0.1010 | 2160.0 | 9536.0 | 3.2783 | 12.9016 | 46.506 | 11.626 | 1960.0 | 98816.0 |
| 1500 | 0.2020 | 740.0 | 4640.0 | 2.2764 | 12.9025 | 46.503 | 11.626 | 612.0 | 9728.0 |
| 2250 | 0.3030 | 450.0 | 2656.0 | 1.9411 | 12.9079 | 46.483 | 11.621 | 344.0 | 564.0 |
| 3000 | 0.4040 | 312.0 | 1544.0 | 1.6865 | 12.9248 | 46.422 | 11.606 | 272.0 | 304.0 |
| 3750 | 0.5051 | 237.0 | 964.0 | 1.4813 | 12.9469 | 46.343 | 11.586 | 199.0 | 252.0 |
| 4500 | 0.6061 | 185.0 | 712.0 | 1.2780 | 13.0095 | 46.12 | 11.53 | 144.0 | 254.0 |
| 5250 | 0.7071 | 143.0 | 520.0 | 1.1058 | 13.0283 | 46.054 | 11.513 | 114.5 | 212.0 |
| 6000 | 0.8081 | 130.0 | 478.0 | 1.0297 | 12.9977 | 46.162 | 11.541 | 103.0 | 180.0 |
| 6750 | 0.9091 | 123.5 | 442.0 | 0.9899 | 12.9475 | 46.341 | 11.585 | 99.0 | 160.0 |
| 7425 | 1.0 | 122.5 | 438.0 | 0.9819 | 12.9461 | 46.346 | 11.587 | 97.5 | 160.0 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.21.0