---
license: mit
base_model: gpt2
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_gpt2_linear_objectives
    results: []
---

# distily_bench_gpt2_linear_objectives

This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 8689.6641
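
A minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub as `lapp0/distily_bench_gpt2_linear_objectives` (a hypothetical repo id inferred from this card; adjust it to the actual location):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id inferred from the model name on this card.
repo_id = "lapp0/distily_bench_gpt2_linear_objectives"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation from a sample prompt.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```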

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

- learning_rate: 4e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
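
For reference, a minimal sketch of how these settings map onto `transformers.TrainingArguments` in a standard `Trainer` setup; the actual Distily training script may use additional, distillation-specific options not listed on this card:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="distily_bench_gpt2_linear_objectives",
    learning_rate=4e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=4,  # effective train batch size: 4 * 4 = 16
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant",
    num_train_epochs=1.0,
)
```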

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0      | 0    | 338272.25       |
| 14496.0       | 0.0808 | 500  | 11190.9121      |
| 12080.0       | 0.1616 | 1000 | 10443.1357      |
| 15856.0       | 0.2424 | 1500 | 10039.6162      |
| 12192.0       | 0.3232 | 2000 | 9736.1924       |
| 12784.0       | 0.4040 | 2500 | 9512.1279       |
| 13216.0       | 0.4848 | 3000 | 9331.4561       |
| 10856.0       | 0.5657 | 3500 | 9238.5283       |
| 12256.0       | 0.6465 | 4000 | 9076.0957       |
| 9856.0        | 0.7273 | 4500 | 8944.4482       |
| 12512.0       | 0.8081 | 5000 | 8849.7275       |
| 10224.0       | 0.8889 | 5500 | 8798.5918       |
| 10688.0       | 0.9697 | 6000 | 8741.2480       |

### Framework versions

- Transformers 4.44.0
- PyTorch 2.3.0
- Datasets 2.20.0
- Tokenizers 0.19.1