flan-t5-xl-summary-map-reduce-1024

A larger text-to-text (t2t) model trained to perform the "reduce" (consolidation) step of map-reduce summarization.

About

Refer to this wiki page or the smaller BART model card for explanations and usage examples.

Compared to the BART model, this model seems to:

- produce more eloquent final reduced summaries
- be more "gullible"/sensitive to noise in the input summaries
  - i.e. a hallucinated one-off term/name/entity is likely to appear in the reduced summary
- be agnostic to whitespace in the input (by definition, since the t5 tokenizer normalizes whitespace)

Therefore, it's recommended to compare sample outputs of this model and the BART version on your data to see which is better for your use case.
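As a starting point for such a comparison, here is a minimal sketch of running the reduce step with the transformers library. Several things in it are assumptions rather than documented behavior: the repo id `pszemraj/flan-t5-xl-summary-map-reduce-1024` is inferred from the model name and the dataset author, the helper names `build_reduce_input` and `reduce_summaries` are hypothetical, and joining chunk summaries with blank lines is a guess at the input format (check the BART model card for the format used in training). The 1024-token limits mirror the context length in the model name.

```python
from typing import List


def build_reduce_input(chunk_summaries: List[str], sep: str = "\n\n") -> str:
    """Join non-empty per-chunk summaries into one input string.

    The blank-line separator is an assumption, not a documented format.
    """
    cleaned = [s.strip() for s in chunk_summaries if s and s.strip()]
    return sep.join(cleaned)


def reduce_summaries(
    chunk_summaries: List[str],
    model_id: str = "pszemraj/flan-t5-xl-summary-map-reduce-1024",  # assumed repo id
    max_length: int = 1024,
) -> str:
    """Consolidate per-chunk summaries into one final summary."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    text = build_reduce_input(chunk_summaries)
    # Truncate the input to the 1024-token context length.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
    output_ids = model.generate(**inputs, max_new_tokens=max_length, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `reduce_summaries(["summary of chunk 1", "summary of chunk 2"])` downloads the 3B-parameter checkpoint, so it needs substantial memory. Passing the BART model's repo id as `model_id` lets you run the same inputs through both models, provided both accept the same input format.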

This model is a fine-tuned version of google/flan-t5-xl on the pszemraj/summary-map-reduce-v1 dataset at 1024 context length in/out.

It achieves the following results on the evaluation set:

The following hyperparameters were used during training:

learning_rate: 8e-05
train_batch_size: 2
eval_batch_size: 2
seed: 17868
gradient_accumulation_steps: 32
total_train_batch_size: 64
optimizer: OptimizerNames.PAGED_ADAMW with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.05
num_epochs: 2.0
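
The batch-size entries above are related by simple arithmetic. The sketch below checks that relation. The single-device count is an assumption inferred from the listed numbers (2 x 32 = 64), not something the card states.

```python
# Hyperparameters copied from the list above.
hparams = {
    "train_batch_size": 2,
    "gradient_accumulation_steps": 32,
    "total_train_batch_size": 64,
}

# Assumption: training ran on a single device.
num_devices = 1

# Effective batch size = per-device batch size x accumulation steps x devices.
effective_batch_size = (
    hparams["train_batch_size"]
    * hparams["gradient_accumulation_steps"]
    * num_devices
)

print(effective_batch_size == hparams["total_train_batch_size"])
```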

Safetensors model size: 3B params, tensor type: F32
