Committed by lapp0 (commit c477d3b, "End of training")
metadata
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2.13_gpt2
    results: []

distily_bench_obj_cross_v2.13_gpt2

This student model was distilled from the teacher model gpt2 on an unspecified dataset.

The Distily library was used for this distillation.

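It achieves the following results on the evaluation set. These values match the step-750 row of the Eval-Phase Metrics table, not the final step (7425). The *ppl metrics are perplexities, so lower is better: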

  • eval_enwikippl: 2176.0
  • eval_frwikippl: 8832.0
  • eval_zhwikippl: 127488.0
  • eval_tinystoriesppl: 1776.0
  • eval_loss: 3.2370
  • eval_runtime: 12.9467 s
  • eval_samples_per_second: 46.344
  • eval_steps_per_second: 11.586
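The throughput figures above are mutually consistent, and multiplying them by the runtime recovers the approximate size of the evaluation run. The sketch below assumes the usual Trainer convention that samples_per_second = samples / runtime and steps_per_second = steps / runtime, with runtime in seconds; the helper name `implied_eval_size` is hypothetical.

```python
def implied_eval_size(runtime_s, samples_per_s, steps_per_s):
    """Recover approximate sample and batch counts from runtime and throughput.

    Assumption: throughput = count / runtime (runtime in seconds),
    as in the Transformers Trainer's reported eval metrics.
    """
    n_samples = round(samples_per_s * runtime_s)
    n_steps = round(steps_per_s * runtime_s)
    return n_samples, n_steps


# The reported metrics (step 750) and the step-0 row of the eval-phase table.
for runtime, sps, stps in [(12.9467, 46.344, 11.586), (12.9198, 46.44, 11.61)]:
    n_samples, n_steps = implied_eval_size(runtime, sps, stps)
    print(n_samples, n_steps, n_samples / n_steps)  # prints 600 150 4.0 for both rows
```

Both rows imply roughly 600 evaluation samples in 150 batches, i.e. 4 samples per batch, which agrees with eval_batch_size: 4 in the hyperparameters.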

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective with three LossComponent terms:
      • logits: weight=1, loss_fn=kl, layer_mapper=None, projector=None
      • hs: weight=1.0, loss_fn=kl, layer_mapper=last, projector=None
      • attn: weight=0, loss_fn=None, layer_mapper=None, projector=None
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.5
  • num_epochs: 1.0
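The distillation_objective combines a KL term on the logits (weight 1) with a KL term on the last hidden state (weight 1.0, layer_mapper=last) and disables the attention term (weight 0). The toy sketch below only illustrates that weighted-sum structure; it is not Distily's implementation. The KL direction (teacher ‖ student), applying a softmax to hidden-state vectors, and the helper names `softmax`, `kl_divergence` and `distillation_loss` are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher_scores, student_scores):
    """KL(teacher || student) between the softmax distributions of two vectors.

    Assumption: the direction and the softmax normalization are illustrative,
    not taken from Distily's source.
    """
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(teacher, student, weights):
    """Weighted sum of per-component KL terms; zero-weight components are skipped."""
    total = 0.0
    for name, weight in weights.items():
        if weight == 0:
            continue  # e.g. attn has weight=0 and loss_fn=None in the card
        total += weight * kl_divergence(teacher[name], student[name])
    return total


# Hypothetical weights mirroring the card: logits=1, hs=1.0 (last layer), attn=0.
WEIGHTS = {"logits": 1.0, "hs": 1.0, "attn": 0.0}

# Toy per-component outputs standing in for real model tensors.
teacher_out = {"logits": [2.0, 1.0, 0.1], "hs": [0.5, -0.2, 0.3], "attn": [0.0, 0.0, 0.0]}
student_out = {"logits": [1.5, 1.2, 0.3], "hs": [0.4, -0.1, 0.3], "attn": [1.0, 2.0, 3.0]}

loss = distillation_loss(teacher_out, student_out, WEIGHTS)
print(f"toy distillation loss: {loss:.4f}")
```

Because the attention weight is 0, changing the student's attention values leaves the loss unchanged; only the logits and last-layer hidden-state terms contribute.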
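With lr_scheduler_type: linear and lr_scheduler_warmup_ratio: 0.5, half of the single epoch is spent ramping the learning rate up. The sketch below reproduces the shape of the Transformers `get_linear_schedule_with_warmup` formula, assuming (as in the Trainer) that the warmup length is ceil(total_steps × warmup_ratio) and that the total is the 7425 steps shown in the eval-phase table. The function name `linear_warmup_lr` is hypothetical.

```python
import math


def linear_warmup_lr(step, total_steps, peak_lr=1e-4, warmup_ratio=0.5):
    """Learning rate at `step`: linear warmup to peak_lr, then linear decay to 0.

    Assumption: warmup length is ceil(total_steps * warmup_ratio), as in the
    Transformers Trainer; the shape follows get_linear_schedule_with_warmup.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Linear decay from peak_lr down to 0 at total_steps.
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


TOTAL_STEPS = 7425  # last step in the eval-phase table (epoch 1.0)

for step in (0, 750, 3713, 5250, 7425):
    print(step, f"{linear_warmup_lr(step, TOTAL_STEPS):.2e}")
```

Under these assumptions the learning rate peaks at 1e-4 at step 3713, close to the step-3750 evaluation (epoch ≈ 0.5), and decays to zero by step 7425.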

Resource Usage

Peak GPU Memory: 8.0905 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
|---|---|---|---|---|---|---|---|---|---|
| teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 1821066133504.0 | 158329674399744.0 | 25.4650 | 12.9198 | 46.44 | 11.61 | 12079595520.0 | 98956046499840.0 |
| 750 | 0.1010 | 2176.0 | 8832.0 | 3.2370 | 12.9467 | 46.344 | 11.586 | 1776.0 | 127488.0 |
| 1500 | 0.2020 | 780.0 | 4704.0 | 2.2858 | 12.9531 | 46.321 | 11.58 | 580.0 | 6528.0 |
| 2250 | 0.3030 | 448.0 | 2720.0 | 1.9337 | 12.9786 | 46.23 | 11.558 | 358.0 | 616.0 |
| 3000 | 0.4040 | 318.0 | 1424.0 | 1.6665 | 12.9898 | 46.19 | 11.548 | 252.0 | 264.0 |
| 3750 | 0.5051 | 252.0 | 968.0 | 1.4830 | 12.9776 | 46.233 | 11.558 | 206.0 | 494.0 |
| 4500 | 0.6061 | 187.0 | 680.0 | 1.2771 | 12.9626 | 46.287 | 11.572 | 146.0 | 404.0 |
| 5250 | 0.7071 | 146.0 | 556.0 | 1.1009 | 12.9778 | 46.233 | 11.558 | 113.0 | 224.0 |
| 6000 | 0.8081 | 134.0 | 490.0 | 1.0233 | 12.9863 | 46.202 | 11.551 | 104.0 | 179.0 |
| 6750 | 0.9091 | 125.0 | 464.0 | 0.9838 | 12.985 | 46.207 | 11.552 | 96.0 | 168.0 |
| 7425 | 1.0 | 124.0 | 462.0 | 0.9755 | 13.0256 | 46.063 | 11.516 | 95.0 | 162.0 |
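To put the final row in context, the sketch below compares the student's step-7425 perplexities with the teacher row. It also reports the natural log of each ratio, which equals the per-token cross-entropy gap in nats only under the assumption that each *ppl metric is exp(mean token negative log-likelihood). The function name `ppl_gap` is hypothetical.

```python
import math

# Perplexities copied from the eval-phase table.
TEACHER = {"enwikippl": 43.75, "frwikippl": 61.75, "tinystoriesppl": 11.8125, "zhwikippl": 19.125}
STUDENT_FINAL = {"enwikippl": 124.0, "frwikippl": 462.0, "tinystoriesppl": 95.0, "zhwikippl": 162.0}  # step 7425


def ppl_gap(student, teacher):
    """Per-metric (student/teacher perplexity ratio, log of that ratio).

    If each ppl is exp(mean token NLL) (an assumption about how the metrics
    are defined), the log ratio is the per-token cross-entropy gap in nats.
    """
    gaps = {}
    for name in teacher:
        ratio = student[name] / teacher[name]
        gaps[name] = (ratio, math.log(ratio))
    return gaps


for name, (ratio, nats) in ppl_gap(STUDENT_FINAL, TEACHER).items():
    print(f"{name}: {ratio:.2f}x teacher ppl, +{nats:.2f} nats/token")
```

The final student is about 2.83x the teacher's perplexity on English Wikipedia but between 7.48x and 8.47x on French Wikipedia, TinyStories and Chinese Wikipedia.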

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0