metadata
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2.13_gpt2
    results: []

distily_bench_obj_cross_v2.13_gpt2

This student model was distilled from the teacher model gpt2. The training dataset is not specified.

The Distily library was used for this distillation.

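It achieves the following results on the evaluation set. These values match the step 750 row (epoch 0.1010) of the eval-phase metrics table, not the final step 7425 row: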

  • eval_enwikippl: 1248.0
  • eval_frwikippl: 6560.0
  • eval_zhwikippl: 24448.0
  • eval_tinystoriesppl: 968.0
  • eval_loss: 2.4413
  • eval_runtime: 12.6655
  • eval_samples_per_second: 47.373
  • eval_steps_per_second: 11.843
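As a rough consistency check on the throughput figures above, multiplying each rate by the runtime recovers the approximate number of evaluation samples and steps. This is a minimal sketch that assumes eval_runtime is in seconds and that evaluation ran on a single device; the variable names are illustrative only.

```python
# Values copied from the evaluation results listed above.
eval_runtime = 12.6655             # assumed to be seconds
eval_samples_per_second = 47.373
eval_steps_per_second = 11.843

# Rate multiplied by runtime gives the approximate totals.
approx_samples = eval_samples_per_second * eval_runtime
approx_steps = eval_steps_per_second * eval_runtime

# Samples per step should land near the eval batch size if evaluation
# ran on one device (an assumption, not stated in the card).
samples_per_step = approx_samples / approx_steps

print(round(approx_samples), round(approx_steps), round(samples_per_step, 2))
```

Under these assumptions the samples-per-step figure comes out close to 4, which agrees with the eval_batch_size reported in the training hyperparameters.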

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.5
  • num_epochs: 1.0
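The distillation_objective above gives the logits component weight 1 with loss_fn=kl, and the hidden-state (hs) and attention (attn) components weight 0, so only a KL term on the logits contributes to the loss. The sketch below illustrates that kind of loss for a single token position in plain Python. It is not Distily's implementation: the direction KL(teacher || student), the absence of a temperature, and the function names are all assumptions made for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) for one token's vocabulary distribution.

    Hypothetical helper: the direction and reduction are assumptions.
    """
    t = log_softmax(teacher_logits)
    s = log_softmax(student_logits)
    return sum(math.exp(lt) * (lt - ls) for lt, ls in zip(t, s))

teacher = [2.0, 0.5, -1.0]
matching = [2.0, 0.5, -1.0]     # identical logits
shifted = [12.0, 10.5, 9.0]     # same distribution, softmax is shift-invariant
different = [-1.0, 0.5, 2.0]    # reversed preferences

print(kl_logits_loss(teacher, matching))   # zero up to rounding
print(kl_logits_loss(teacher, shifted))    # zero up to rounding
print(kl_logits_loss(teacher, different))  # strictly positive
```

The loss is zero only when the student reproduces the teacher's distribution, which is why it can fall well below the teacher's own cross-entropy on the data.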

Resource Usage

Peak GPU Memory: 7.9388 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 1065151889408.0 | 117097988358144.0 | 20.4033 | 12.6433 | 47.456 | 11.864 | 4362076160.0 | 27762668601344.0 |
| 750 | 0.1010 | 1248.0 | 6560.0 | 2.4413 | 12.6655 | 47.373 | 11.843 | 968.0 | 24448.0 |
| 1500 | 0.2020 | 500.0 | 3712.0 | 1.8019 | 12.6495 | 47.433 | 11.858 | 352.0 | 820.0 |
| 2250 | 0.3030 | 342.0 | 1832.0 | 1.5659 | 12.681 | 47.315 | 11.829 | 276.0 | 294.0 |
| 3000 | 0.4040 | 245.0 | 932.0 | 1.3644 | 12.6925 | 47.272 | 11.818 | 198.0 | 228.0 |
| 3750 | 0.5051 | 191.0 | 680.0 | 1.2038 | 12.6837 | 47.305 | 11.826 | 159.0 | 219.0 |
| 4500 | 0.6061 | 148.0 | 596.0 | 1.0541 | 12.6874 | 47.291 | 11.823 | 127.5 | 188.0 |
| 5250 | 0.7071 | 129.0 | 442.0 | 0.9331 | 12.6692 | 47.359 | 11.84 | 102.5 | 126.5 |
| 6000 | 0.8081 | 117.0 | 412.0 | 0.8757 | 12.6912 | 47.277 | 11.819 | 94.5 | 124.5 |
| 6750 | 0.9091 | 112.0 | 398.0 | 0.8483 | 12.8574 | 46.666 | 11.666 | 90.0 | 119.5 |
| 7425 | 1.0 | 110.5 | 392.0 | 0.8433 | 12.717 | 47.181 | 11.795 | 89.0 | 119.0 |
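To put the final row in context, the student's perplexities at step 7425 can be compared with the teacher eval row. The sketch below computes the student-to-teacher ratio per dataset. It assumes the four teacher values correspond, in header order, to the four perplexity columns (enwikippl, frwikippl, tinystoriesppl, zhwikippl), since the teacher row reports no loss or timing figures; the dictionary names are illustrative.

```python
# Teacher eval row; mapping of values to columns is an assumption.
teacher = {
    "enwikippl": 43.75,
    "frwikippl": 61.75,
    "tinystoriesppl": 11.8125,
    "zhwikippl": 19.125,
}

# Student values copied from the final row (step 7425, epoch 1.0).
student_final = {
    "enwikippl": 110.5,
    "frwikippl": 392.0,
    "tinystoriesppl": 89.0,
    "zhwikippl": 119.0,
}

# A ratio above 1 means the student is still worse than the teacher.
ratios = {name: student_final[name] / teacher[name] for name in teacher}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x teacher perplexity")
```

Under that column mapping, every ratio is above 1, so the student remains behind the teacher on all four evaluation sets after one epoch.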

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.21.0