---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2.15_gpt2
    results: []
---

distily_bench_obj_cross_v2.15_gpt2

This student model was distilled from the teacher model gpt2 on an unspecified dataset.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 2352.0
  • eval_frwikippl: 10240.0
  • eval_zhwikippl: 109056.0
  • eval_tinystoriesppl: 1920.0
  • eval_loss: 2.6449
  • eval_runtime: 17.0132
  • eval_samples_per_second: 58.778
  • eval_steps_per_second: 7.347
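
The `*_ppl` metrics above are per-corpus perplexities. As a reminder, perplexity is the exponential of the mean next-token cross-entropy (in nats) over a corpus; a minimal sketch of that relationship (the helper name is illustrative, not part of Distily):

```python
import math

def perplexity(mean_cross_entropy_nats: float) -> float:
    """Perplexity is exp(mean next-token cross-entropy), with the loss in nats."""
    return math.exp(mean_cross_entropy_nats)

# A zero cross-entropy corpus (perfect prediction) has perplexity 1.0;
# higher cross-entropy maps exponentially to higher perplexity.
```

Note that the per-corpus perplexities are computed on their respective corpora and are not simply `exp(eval_loss)`, since `eval_loss` here is the distillation objective, not a language-modeling cross-entropy.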

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=1.0, loss_fn=mse, layer_mapper=last, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 0.0004
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 1.0
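
The `distillation_objective` above combines a KL-divergence loss on the logits (weight 1) with an MSE loss on the last hidden state (weight 1.0); the attention component has weight 0 and is disabled. A minimal PyTorch sketch of such a combined objective (this is an illustration of the configuration, not the Distily implementation):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hs, teacher_hs,
                      logits_weight=1.0, hs_weight=1.0):
    """KL on logits plus MSE on the last hidden state, per the objective above."""
    # KL(teacher || student) over the vocabulary distribution at each position.
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.log_softmax(teacher_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    # MSE between the final hidden states (layer_mapper=last: only the last
    # layer's states are compared, with no projector between them).
    mse = F.mse_loss(student_hs, teacher_hs)
    return logits_weight * kl + hs_weight * mse
```

With `layer_mapper=last` the student and teacher hidden states have matching shapes, so no projector is needed.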

Resource Usage

Peak GPU Memory: 8.0892 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 1984274890752.0 | 213305255788544.0 | 21.1260 | 17.0018 | 58.817 | 7.352 | 3774873600.0 | 74217034874880.0 |
| 1000 | 0.0808 | 516.0 | 3952.0 | 1.7644 | 17.0372 | 58.695 | 7.337 | 412.0 | 3760.0 |
| 2000 | 0.1616 | 516.0 | 3872.0 | 1.7584 | 17.0128 | 58.779 | 7.347 | 462.0 | 860.0 |
| 3000 | 0.2424 | 864.0 | 4672.0 | 2.0719 | 17.0071 | 58.799 | 7.35 | 788.0 | 2448.0 |
| 4000 | 0.3232 | 1888.0 | 9344.0 | 2.5277 | 17.1241 | 58.397 | 7.3 | 1696.0 | 26880.0 |
| 5000 | 0.4040 | 2008.0 | 7712.0 | 2.5758 | 17.0318 | 58.714 | 7.339 | 2256.0 | 48128.0 |
| 6000 | 0.4848 | 2352.0 | 9984.0 | 2.6397 | 17.1643 | 58.26 | 7.283 | 1856.0 | 54528.0 |
| 7000 | 0.5657 | 2416.0 | 12096.0 | 2.6472 | 17.0957 | 58.494 | 7.312 | 1880.0 | 109568.0 |
| 8000 | 0.6465 | 2448.0 | 9856.0 | 2.6570 | 17.0094 | 58.791 | 7.349 | 1960.0 | 115712.0 |
| 9000 | 0.7273 | 2352.0 | 10240.0 | 2.6449 | 17.0132 | 58.778 | 7.347 | 1920.0 | 109056.0 |
| 10000 | 0.8081 | 2320.0 | 9344.0 | 2.6556 | 17.0386 | 58.69 | 7.336 | 2096.0 | 87040.0 |
| 11000 | 0.8889 | 2304.0 | 12224.0 | 2.6333 | 17.0346 | 58.704 | 7.338 | 1888.0 | 130048.0 |
| 12000 | 0.9697 | 2208.0 | 10368.0 | 2.6107 | 17.0435 | 58.674 | 7.334 | 1808.0 | 98816.0 |
| 12375 | 1.0 | 2256.0 | 10304.0 | 2.6066 | 17.0663 | 58.595 | 7.324 | 1696.0 | 80896.0 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.21.0