---
base_model: Qwen/Qwen2-0.5B-Instruct
library_name: distily
license: apache-2.0
tags:
- generated_from_trainer
model-index:
- name: distily_experiments_loss_kl
  results: []
---

# distily_experiments_loss_kl

This student model was distilled from the teacher model Qwen/Qwen2-0.5B-Instruct; the training dataset is unspecified.

The Distily library was used for this distillation.
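With `loss_fn: kl`, the student is trained to match the teacher's output distribution by minimizing the KL divergence between the softmax of the teacher's and student's logits. A minimal pure-Python sketch of that objective (Distily's actual implementation operates on full logit tensors and may differ in direction, reduction, and temperature handling):

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(teacher_logits, student_logits):
    # KL(p_teacher || p_student) = sum_i p_i * log(p_i / q_i)
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The divergence is zero exactly when the two distributions match and grows as the student's distribution drifts from the teacher's.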

It achieves the following results on the evaluation set:

- eval_enwikippl: nan
- eval_frwikippl: nan
- eval_zhwikippl: nan
- eval_loss: nan
- eval_runtime: 90.8319
- eval_samples_per_second: 11.009
- eval_steps_per_second: 2.752
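The `*ppl` metrics are perplexities on English, French, and Chinese Wikipedia text. Perplexity is the exponential of the mean per-token negative log-likelihood, so a `nan` eval loss necessarily yields `nan` perplexities, as seen above. A minimal sketch of the relationship:

```python
import math

def perplexity(token_nlls):
    # Perplexity = exp(mean negative log-likelihood per token).
    # Any nan in the per-token losses propagates to the result.
    return math.exp(sum(token_nlls) / len(token_nlls))
```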

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_strategy: logits_activations
- loss_fn: kl
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
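As a rough illustration, the hyperparameters above can be mirrored in a config object; `DistillConfig` and its field names are hypothetical, not Distily's actual API. Note that `total_train_batch_size` is derived rather than set directly:

```python
from dataclasses import dataclass

@dataclass
class DistillConfig:
    # Hypothetical container mirroring the hyperparameter list above;
    # Distily's real config class may be named and shaped differently.
    distillation_strategy: str = "logits_activations"
    loss_fn: str = "kl"
    train_embeddings: bool = True
    learning_rate: float = 4e-05
    train_batch_size: int = 4
    eval_batch_size: int = 4
    seed: int = 42
    gradient_accumulation_steps: int = 4
    lr_scheduler_type: str = "constant"
    num_epochs: float = 1.0

    @property
    def total_train_batch_size(self) -> int:
        # Effective batch size = per-device batch * accumulation steps.
        return self.train_batch_size * self.gradient_accumulation_steps
```

With the listed values, the effective batch size works out to 4 × 4 = 16, matching the card.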

### Resource Usage

Peak GPU Memory: 19.8804 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 13.0697 | 11.6518 | | | | | 21.6262 |
| 0 | 0 | 179854.0938 | 180181.3125 | 782.5541 | 90.5683 | 11.041 | 2.76 | 180786.8438 |
| 500 | 0.0808 | nan | nan | nan | 90.7978 | 11.013 | 2.753 | nan |
| 1000 | 0.1616 | nan | nan | nan | 91.1195 | 10.975 | 2.744 | nan |
| 1500 | 0.2424 | nan | nan | nan | 90.9106 | 11.0 | 2.75 | nan |
| 2000 | 0.3232 | nan | nan | nan | 90.7616 | 11.018 | 2.754 | nan |
| 2500 | 0.4040 | nan | nan | nan | 90.73 | 11.022 | 2.755 | nan |
| 3000 | 0.4848 | nan | nan | nan | 90.7247 | 11.022 | 2.756 | nan |
| 3500 | 0.5657 | nan | nan | nan | 90.5701 | 11.041 | 2.76 | nan |
| 4000 | 0.6465 | nan | nan | nan | 90.768 | 11.017 | 2.754 | nan |
| 4500 | 0.7273 | nan | nan | nan | 90.8206 | 11.011 | 2.753 | nan |
| 5000 | 0.8081 | nan | nan | nan | 90.7225 | 11.023 | 2.756 | nan |
| 5500 | 0.8889 | nan | nan | nan | 90.7674 | 11.017 | 2.754 | nan |
| 6000 | 0.9697 | nan | nan | nan | 90.7133 | 11.024 | 2.756 | nan |
| 6187 | 0.9999 | nan | nan | nan | 90.8319 | 11.009 | 2.752 | nan |

### Framework versions

- Distily 0.1.0
- Transformers 4.43.3
- Pytorch 2.3.0
- Datasets 2.20.0