add training details

README.md CHANGED

**Important:** To generate the best-quality summaries, you should use the global attention mask when decoding, as demonstrated in [this community notebook](https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing); see the definition of `generate_answer(batch)`.
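
As a minimal sketch of what the notebook does, assuming this checkpoint is a LED-style (Longformer Encoder-Decoder) model, decoding with global attention on the first token looks roughly like this; the repo id and generation settings below are placeholders, not this card's recorded values:

```python
# Rough sketch only (assumes a LED-style checkpoint); the repo id is a placeholder
# and the generation settings are illustrative, not the notebook's exact values.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "pszemraj/led-large-book-summary"  # placeholder: substitute this model's Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

long_text = "..."  # the chapter or document to summarize

inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=16384)

# LED uses windowed (local) attention by default; the global attention mask marks
# tokens that attend to, and are attended by, every position. Setting it on the
# first token is the usual convention for summarization.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=512,
    num_beams=4,
    no_repeat_ngram_size=3,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```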

## Training and evaluation data

- the [booksum](https://arxiv.org/abs/2105.08209) dataset
- During training, the input text was the text of the `chapter`, and the output was `summary_text`
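
The exact preprocessing script is not part of this card; the snippet below is only a sketch of how `chapter` → `summary_text` pairs could be tokenized for seq2seq training. The `kmfoda/booksum` dataset id, the base tokenizer, and the length limits are assumptions.

```python
# Sketch only: building (chapter -> summary_text) training pairs.
# Dataset id, tokenizer, and max lengths are assumptions, not the card's recorded setup.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/led-large-16384")  # assumed base checkpoint

booksum = load_dataset("kmfoda/booksum")  # assumed Hub copy of the BookSum dataset

def preprocess(batch):
    # the chapter text is the encoder input
    model_inputs = tokenizer(batch["chapter"], truncation=True, max_length=16384)
    # the reference summary is the decoder target
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(batch["summary_text"], truncation=True, max_length=512)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = booksum.map(
    preprocess,
    batched=True,
    remove_columns=booksum["train"].column_names,
)
```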

## Training procedure

- Training completed on the BookSum dataset for 13 total epochs
- **The final four epochs combined the training and validation sets as 'train' in an effort to increase generalization.**
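
For that combined-split stage, one plausible way to fold the validation split into the training split with `datasets` is sketched below (the dataset id is again an assumption):

```python
# Sketch: merging the validation split into 'train' for the last training stage,
# as described above. The dataset id is an assumption.
from datasets import DatasetDict, concatenate_datasets, load_dataset

booksum = load_dataset("kmfoda/booksum")
booksum = DatasetDict({
    "train": concatenate_datasets([booksum["train"], booksum["validation"]]),
    "test": booksum["test"],
})
```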

### Training hyperparameters

#### Initial Three Epochs

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- lr_scheduler_type: linear
- num_epochs: 3

#### In-between Epochs

Unfortunately, complete records are not on hand for the middle epochs; the following should be representative:

- learning_rate: 4e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 6 (in addition to prior model)

#### Final Two Epochs

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 2 (in addition to prior model)
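
The original training scripts are not included in this card; purely as an illustration, the final-stage hyperparameters above map onto `Seq2SeqTrainingArguments` roughly as follows (the output path is a placeholder):

```python
# Illustration only: the "Final Two Epochs" hyperparameters expressed as Trainer arguments.
# This is not the training script that was actually used for this checkpoint.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./led-booksum-final-stage",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    gradient_accumulation_steps=16,  # with one process this gives total_train_batch_size: 16
    num_train_epochs=2,              # continued on top of the earlier checkpoints
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,               # lr_scheduler_warmup_ratio
    adam_beta1=0.9,                  # optimizer: Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # and epsilon=1e-08
    seed=42,
)
```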

### Framework versions

- Transformers 4.19.2
- Pytorch 1.11.0+cu113
- Datasets 2.2.2
- Tokenizers 0.12.1