metadata
library_name: transformers
language:
  - en
license: apache-2.0
base_model: google/flan-t5-xl
tags:
  - generated_from_trainer
model-index:
  - name: flan-t5-xl-summary-map-reduce-1024
    results: []

flan-t5-xl-summary-map-reduce-1024

This model is a fine-tuned version of google/flan-t5-xl on the pszemraj/summary-map-reduce dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6039
  • Num Input Tokens Seen: 7138765

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 17868
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 64
  • optimizer: OptimizerNames.PAGED_ADAMW with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 2.0
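The relationship between these values can be sketched in plain Python. This is an illustrative restatement, not the training script itself: it assumes a single device (so that 2 x 32 gives the reported total batch size of 64), assumes roughly 519 optimizer steps in total (an estimate from the training log, not a value stated in this card), and writes out the usual linear-warmup-then-cosine-decay shape by hand. The names `effective_batch_size` and `cosine_lr` are hypothetical helpers.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 8e-05
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 32
WARMUP_RATIO = 0.05

# Assumption: training ran on one device.
NUM_DEVICES = 1
# Assumption: rough total number of optimizer steps, estimated from the
# training log (about 260 steps per epoch over 2 epochs).
TOTAL_STEPS = 519


def effective_batch_size(per_device, accum_steps, devices):
    """Examples consumed per optimizer step."""
    return per_device * accum_steps * devices


def cosine_lr(step, total_steps=TOTAL_STEPS, base_lr=LEARNING_RATE,
              warmup_ratio=WARMUP_RATIO):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: the rate grows linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: progress runs from 0 to 1 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for step in (0, 13, 26, 100, 300, 519):
        print(step, f"{cosine_lr(step):.2e}")
```

Under these assumptions the warmup covers about the first 26 steps, after which the learning rate decays smoothly from 8e-05 toward zero by the end of the second epoch.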

Training results

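| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| 0.8172        | 0.3851 | 100  | 0.6644          | 1364870           |
| 0.7664        | 0.7702 | 200  | 0.6271          | 2744502           |
| 0.6584        | 1.1552 | 300  | 0.6146          | 4137699           |
| 0.6348        | 1.5403 | 400  | 0.6049          | 5518719           |
| 0.6372        | 1.9254 | 500  | 0.6038          | 6895203           |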
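A few derived quantities can be read off these rows with simple arithmetic. The sketch below is only a reading aid: the rows are copied from the results above, the helper names are hypothetical, and the derived figures are approximations based on the rounded values in the log.

```python
# Rows copied from the training results:
# (training loss, epoch, step, validation loss, input tokens seen)
ROWS = [
    (0.8172, 0.3851, 100, 0.6644, 1364870),
    (0.7664, 0.7702, 200, 0.6271, 2744502),
    (0.6584, 1.1552, 300, 0.6146, 4137699),
    (0.6348, 1.5403, 400, 0.6049, 5518719),
    (0.6372, 1.9254, 500, 0.6038, 6895203),
]


def steps_per_epoch(rows):
    """Approximate optimizer steps per epoch, from the last logged row."""
    _, epoch, step, _, _ = rows[-1]
    return step / epoch


def tokens_per_step(rows):
    """Average input tokens consumed per optimizer step."""
    _, _, step, _, tokens = rows[-1]
    return tokens / step


def val_loss_deltas(rows):
    """Change in validation loss between consecutive evaluations."""
    losses = [row[3] for row in rows]
    return [round(b - a, 4) for a, b in zip(losses, losses[1:])]


if __name__ == "__main__":
    print(f"steps per epoch: {steps_per_epoch(ROWS):.1f}")
    print(f"input tokens per step: {tokens_per_step(ROWS):.0f}")
    print("validation loss deltas:", val_loss_deltas(ROWS))
```

The deltas shrink at each evaluation, which suggests that most of the validation-loss improvement came in the first epoch.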

Framework versions

  • Transformers 4.46.0.dev0
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.2