---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2.13_gpt2
    results: []
---

distily_bench_obj_cross_v2.13_gpt2

This student model is distilled from the teacher model gpt2 using an unspecified dataset.

The Distily library was used for this distillation.
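
As a usage example, the sketch below loads the student with transformers. The repo id is assumed from the model name on this card and may need adjusting; the prompt and generation settings are illustrative only.

```python
# Minimal usage sketch. Assumption: the checkpoint is published under this repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_obj_cross_v2.13_gpt2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation from the distilled student.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```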

It achieves the following results on the evaluation set:

  • eval_enwikippl: 1360.0
  • eval_frwikippl: 5600.0
  • eval_zhwikippl: 132096.0
  • eval_tinystoriesppl: 904.0
  • eval_loss: 3.0667
  • eval_runtime: 12.9338
  • eval_samples_per_second: 46.39
  • eval_steps_per_second: 11.598
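
Each eval_*ppl value is a perplexity on the corresponding held-out corpus, while eval_loss is the distillation objective described under the training hyperparameters, so the two are not directly interchangeable. Below is a minimal, generic sketch of how causal-LM perplexity can be computed; it is not necessarily the exact evaluation pipeline Distily uses, and the repo id and sample texts are placeholders.

```python
# Generic perplexity sketch: exp of the mean token-level cross-entropy.
# Not necessarily the evaluation code used to produce the numbers above.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_obj_cross_v2.13_gpt2"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id).eval()

texts = ["example evaluation paragraph one", "example evaluation paragraph two"]
total_nll, total_tokens = 0.0, 0
with torch.no_grad():
    for text in texts:
        enc = tokenizer(text, return_tensors="pt")
        out = model(**enc, labels=enc["input_ids"])   # loss = mean CE over shifted tokens
        n_tokens = enc["input_ids"].numel() - 1       # labels are shifted internally
        total_nll += out.loss.item() * n_tokens
        total_tokens += n_tokens

print("perplexity:", math.exp(total_nll / total_tokens))
```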

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=1.0, loss_fn=cos, layer_mapper=all, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None)) (see the loss sketch after this list)
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.5
  • num_epochs: 1.0
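
The distillation_objective above combines a KL term on the logits (weight 1) with a cosine term on the hidden states mapped across all layers (weight 1.0); the attention term is disabled (weight 0). The sketch below shows one way such a composite loss can be assembled. It is an illustration under those assumptions, not Distily's actual implementation, and it assumes both forward passes were run with output_hidden_states=True and that student and teacher expose the same number of layers.

```python
# Sketch of a composite distillation loss matching the objective above:
# KL on logits (weight 1) + cosine distance on hidden states averaged over
# all layers (weight 1.0); the attention term has weight 0 and is omitted.
# Illustrative only; not Distily's code.
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, logits_weight=1.0, hs_weight=1.0):
    # KL divergence between teacher and student next-token distributions.
    s_logp = F.log_softmax(student_out.logits, dim=-1)
    t_prob = F.softmax(teacher_out.logits, dim=-1)
    logits_loss = F.kl_div(s_logp, t_prob, reduction="batchmean")

    # Cosine distance on hidden states, averaged over every layer
    # ("layer_mapper=all"); assumes matching layer counts and
    # output_hidden_states=True on both models.
    hs_loss = 0.0
    for s_h, t_h in zip(student_out.hidden_states, teacher_out.hidden_states):
        hs_loss = hs_loss + (1.0 - F.cosine_similarity(s_h, t_h, dim=-1)).mean()
    hs_loss = hs_loss / len(student_out.hidden_states)

    return logits_weight * logits_loss + hs_weight * hs_loss
```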

Resource Usage

Peak GPU Memory: 8.0905 GB
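
For reference, the snippet below shows one common way to read peak GPU memory with torch.cuda statistics; it is a sketch and not necessarily how Distily recorded the figure above.

```python
# Peak GPU memory readout sketch (torch.cuda statistics).
import torch

if torch.cuda.is_available():
    torch.cuda.reset_peak_memory_stats()
    # ... run training / evaluation here ...
    peak_gb = torch.cuda.max_memory_allocated() / 1e9
    print(f"Peak GPU memory: {peak_gb:.4f} GB")
```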

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 1821066133504.0 | 158329674399744.0 | 20.2008 | 12.9195 | 46.441 | 11.61 | 12079595520.0 | 98956046499840.0 |
| 750 | 0.1010 | 1360.0 | 5600.0 | 3.0667 | 12.9338 | 46.39 | 11.598 | 904.0 | 132096.0 |
| 1500 | 0.2020 | 584.0 | 3600.0 | 2.2210 | 12.9258 | 46.419 | 11.605 | 444.0 | 968.0 |
| 2250 | 0.3030 | 382.0 | 2024.0 | 1.9283 | 12.9374 | 46.377 | 11.594 | 290.0 | 372.0 |
| 3000 | 0.4040 | 268.0 | 1088.0 | 1.6657 | 12.9348 | 46.387 | 11.597 | 230.0 | 204.0 |
| 3750 | 0.5051 | 208.0 | 732.0 | 1.4758 | 12.9387 | 46.372 | 11.593 | 174.0 | 218.0 |
| 4500 | 0.6061 | 169.0 | 564.0 | 1.2952 | 13.0113 | 46.114 | 11.528 | 145.0 | 142.0 |
| 5250 | 0.7071 | 137.0 | 482.0 | 1.1321 | 12.9425 | 46.359 | 11.59 | 111.0 | 139.0 |
| 6000 | 0.8081 | 125.0 | 448.0 | 1.0644 | 13.0023 | 46.146 | 11.536 | 100.5 | 123.5 |
| 6750 | 0.9091 | 120.0 | 434.0 | 1.0300 | 12.9661 | 46.274 | 11.569 | 96.5 | 119.0 |
| 7425 | 1.0 | 119.0 | 430.0 | 1.0247 | 13.1477 | 45.635 | 11.409 | 95.0 | 118.0 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0