mT5-small-sum-de-mit-v1

This is a German summarization model based on the multilingual T5 model google/mt5-small. Unlike many other summarization models, it is released under a permissive open source license (MIT), which among other things allows commercial use.

This model is provided by the One Conversation team of Deutsche Telekom AG.
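A minimal usage sketch, assuming the Hugging Face transformers library (with sentencepiece and PyTorch) is installed and that the model is published under the ID listed in the evaluation table. The helper names `build_model_input` and `summarize` are hypothetical; the prefix, length limits, and greedy decoding are taken from the training and evaluation settings given in this card.

```python
MODEL_ID = "deutsche-telekom/mt5-small-sum-de-mit-v1"
SOURCE_PREFIX = "summarize: "  # same prefix as used during training


def build_model_input(text: str) -> str:
    """Prepend the training prefix to the text that should be summarized."""
    return SOURCE_PREFIX + text


def summarize(text: str, max_source_length: int = 800, max_target_length: int = 96) -> str:
    """Generate a summary with greedy decoding (num_beams=1, i.e. no beams)."""
    # Imported lazily so build_model_input works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Truncate long inputs to the source length used in training.
    inputs = tokenizer(
        build_model_input(text),
        max_length=max_source_length,
        truncation=True,
        return_tensors="pt",
    )
    output_ids = model.generate(**inputs, max_length=max_target_length, num_beams=1)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize("...")` with a German article downloads the model on first use. Loading the tokenizer and model once and reusing them is preferable when summarizing many texts.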

Training

The training was conducted with the following hyperparameters:

base model: google/mt5-small
source_prefix: "summarize: "
batch size: 3 (effective batch size 6 with gradient accumulation)
max_source_length: 800
max_target_length: 96
warmup_ratio: 0.3
number of train epochs: 10
gradient accumulation steps: 2
learning rate: 5e-5
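The hyperparameters above can be collected in a small configuration object to make the derived quantities explicit. This is only a sketch: the class `TrainingConfig` and its method names are hypothetical, it assumes training on a single device, and the step counts are approximations that depend on the actual number of training records and on how the training framework rounds.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values copied from the hyperparameter list in this card.
    base_model: str = "google/mt5-small"
    source_prefix: str = "summarize: "
    batch_size: int = 3
    gradient_accumulation_steps: int = 2
    max_source_length: int = 800
    max_target_length: int = 96
    warmup_ratio: float = 0.3
    num_train_epochs: int = 10
    learning_rate: float = 5e-5

    @property
    def effective_batch_size(self) -> int:
        """Records contributing to one optimizer step (single device assumed)."""
        return self.batch_size * self.gradient_accumulation_steps

    def total_optimizer_steps(self, num_records: int) -> int:
        """Approximate number of optimizer steps over all epochs."""
        steps_per_epoch = math.ceil(num_records / self.effective_batch_size)
        return steps_per_epoch * self.num_train_epochs

    def warmup_steps(self, num_records: int) -> int:
        """Approximate number of steps spent in learning rate warmup."""
        return math.ceil(self.total_optimizer_steps(num_records) * self.warmup_ratio)


config = TrainingConfig()
print(config.effective_batch_size)  # 3 * 2 = 6
```

With a warmup ratio of 0.3, roughly the first 30% of optimizer steps are spent ramping the learning rate up to 5e-5.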

Datasets and Preprocessing

The datasets were preprocessed as follows:

Each summary was tokenized with the google/mt5-small tokenizer, and only records whose summary has at most 94 tokens were kept.
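The length filter described above can be sketched as follows. The record layout (a dict with a `summary` field) and the function names are assumptions made for illustration. The token counter is passed in as a parameter so the sketch runs without downloading a tokenizer; a whitespace counter stands in by default and does not reproduce mT5 token counts.

```python
from typing import Callable, Dict, Iterable, List

MAX_SUMMARY_TOKENS = 94  # threshold stated in this card


def whitespace_token_count(text: str) -> int:
    """Stand-in counter for demonstration only, not the mT5 tokenizer."""
    return len(text.split())


def filter_by_summary_length(
    records: Iterable[Dict[str, str]],
    count_tokens: Callable[[str], int] = whitespace_token_count,
    max_tokens: int = MAX_SUMMARY_TOKENS,
) -> List[Dict[str, str]]:
    """Keep only records whose summary has at most max_tokens tokens."""
    return [r for r in records if count_tokens(r["summary"]) <= max_tokens]


# To count with the real tokenizer (requires transformers and sentencepiece):
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
#   count_tokens = lambda s: len(tokenizer(s)["input_ids"])
# Whether the original count included the end-of-sequence token is not stated
# in this card, so the exact boundary may differ by one token.
```

For example, `filter_by_summary_length(records, count_tokens=count_tokens)` with the tokenizer-based counter would drop every record whose tokenized summary exceeds 94 tokens.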

The model was trained on the following dataset:

Name	Language	Size	License
SwissText 2019 - Train	de	84,564	The exact license is unclear. The data was published as part of the German Text Summarization Challenge.

We have permission to use the SwissText dataset and to release the resulting summarization model under the MIT license (see permission-declaration-swisstext.pdf).

Evaluation on MLSUM German Test Set (no beams)

Model	rouge1	rouge2	rougeL	rougeLsum
deutsche-telekom/mt5-small-sum-de-mit-v1 (this)	16.8023	3.5531	12.6884	14.7624
ml6team/mt5-small-german-finetune-mlsum	18.3607	5.3604	14.5456	16.1946
deutsche-telekom/mt5-small-sum-de-en-01	21.7336	7.2614	17.1323	19.3977
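To make the rouge1 column easier to interpret, here is a simplified sketch of a ROUGE-1 F1 score on the same 0 to 100 scale. This card does not state which ROUGE implementation produced the numbers above, so this is not a reproduction of it: the function name is hypothetical, and tokenization is plain lowercased whitespace splitting without stemming.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference summary (0-100)."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: each unigram counts at most as often as it occurs in both.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 100.0 * 2 * precision * recall / (precision + recall)


score = rouge1_f1(
    "die katze sitzt auf der matte",
    "die katze liegt auf der matte",
)
print(round(score, 2))  # 5 of 6 unigrams match on both sides: 83.33
```

rouge2 applies the same idea to bigrams, while rougeL and rougeLsum are based on the longest common subsequence.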

License

Licensed under the MIT License (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License by reviewing the file LICENSE in the repository.