Text Summarization Model with Seq2Seq and LSTM

This is a sequence-to-sequence (seq2seq) model for text summarization. A bidirectional LSTM encoder reads the input article, and an LSTM decoder generates the summary. The model was trained on articles of up to 800 tokens.

Dataset

CNN-DailyMail News Text Summarization, from Kaggle.

Model Architecture

Encoder

  • Input Layer: Takes input sequences of length max_len_article (800 in the trained model).
  • Embedding Layer: Converts input sequences into dense vectors of size 100.
  • Bidirectional LSTM Layer: Processes the embedded input, capturing dependencies in both forward and backward directions. Outputs hidden and cell states from both directions.
  • State Concatenation: Combines forward and backward hidden and cell states to form the final encoder states.
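The state concatenation step can be illustrated without a deep learning library. The sketch below is only an illustration, assuming 100 units per direction as reported in the model summary; the helper name `concat_states` and the dummy values are hypothetical.

```python
def concat_states(forward_state, backward_state):
    """Join the final forward and backward state vectors end to end."""
    return list(forward_state) + list(backward_state)

UNITS_PER_DIRECTION = 100  # each direction yields a (None, 100) state in the summary

# Dummy final states for one example; real values come from the trained LSTM.
forward_h = [0.1] * UNITS_PER_DIRECTION
backward_h = [-0.1] * UNITS_PER_DIRECTION
forward_c = [0.5] * UNITS_PER_DIRECTION
backward_c = [-0.5] * UNITS_PER_DIRECTION

encoder_h = concat_states(forward_h, backward_h)
encoder_c = concat_states(forward_c, backward_c)

print(len(encoder_h), len(encoder_c))  # 200 200
```

Each concatenated state has 200 entries, which is why the decoder LSTM uses 200 units: its initial state must have the same size as the encoder states it receives.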

Decoder

  • Input Layer: Takes target sequences of variable length.
  • Embedding Layer: Converts target sequences into dense vectors of size 100.
  • LSTM Layer: Processes the embedded target sequences using an LSTM with the initial states set to the encoder states.
  • Dense Layer: Applies a Dense layer with softmax activation to generate the probabilities for each word in the vocabulary.
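The description above does not say how the target sequences are prepared for the decoder. A common arrangement for this kind of architecture is teacher forcing, where the decoder input is the summary prefixed with a start marker and the training target is the same summary shifted one step ahead. The sketch below assumes that arrangement; `make_decoder_pair` and the token ids are hypothetical.

```python
def make_decoder_pair(summary_ids, start_id, end_id):
    """Build (decoder_input, decoder_target) for one summary.

    The decoder input begins with start_id; the target is the same
    sequence shifted one step left and terminated by end_id.
    """
    decoder_input = [start_id] + list(summary_ids)
    decoder_target = list(summary_ids) + [end_id]
    return decoder_input, decoder_target

# Hypothetical token ids: 1 = start marker, 2 = end marker.
dec_in, dec_out = make_decoder_pair([17, 42, 8], start_id=1, end_id=2)
print(dec_in)   # [1, 17, 42, 8]
print(dec_out)  # [17, 42, 8, 2]
```

At every time step the decoder sees the correct previous token and is asked to predict the next one, so the softmax output at position t is compared with `decoder_target[t]`.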

Model Summary

Layer (type)                   Output Shape             Param #      Connected to
input_1 (InputLayer)           [(None, 800)]            0            -
embedding (Embedding)          (None, 800, 100)         47,619,900   input_1[0][0]
bidirectional (Bidirectional)  [(None, 200),            160,800      embedding[0][0]
                                (None, 100),
                                (None, 100),
                                (None, 100),
                                (None, 100)]
input_2 (InputLayer)           [(None, None)]           0            -
embedding_1 (Embedding)        (None, None, 100)        15,515,800   input_2[0][0]
concatenate (Concatenate)      (None, 200)              0            bidirectional[0][1],
                                                                     bidirectional[0][3]
concatenate_1 (Concatenate)    (None, 200)              0            bidirectional[0][2],
                                                                     bidirectional[0][4]
lstm (LSTM)                    [(None, None, 200),      240,800      embedding_1[0][0],
                                (None, 200),                         concatenate[0][0],
                                (None, 200)]                         concatenate_1[0][0]
dense (Dense)                  (None, None, 155158)     31,186,758   lstm[0][0]

Total params: 94,724,060

Trainable params: 94,724,058

Non-trainable params: 0
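The per-layer parameter counts can be cross-checked with the standard formulas for Embedding, LSTM, and Dense layers. The sketch below does this in plain Python. The article vocabulary size of 476,199 is inferred by dividing the embedding parameter count by the embedding size of 100, and the summary vocabulary size of 155,158 is taken from the Dense output shape; the helper names are hypothetical.

```python
def embedding_params(vocab_size, dim):
    # One dense vector of size `dim` per vocabulary entry.
    return vocab_size * dim

def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, output_dim):
    # Weight matrix plus one bias per output.
    return input_dim * output_dim + output_dim

EMB_DIM = 100
ARTICLE_VOCAB = 476_199  # inferred: 47,619,900 / 100
SUMMARY_VOCAB = 155_158  # Dense output size in the summary

counts = {
    "embedding": embedding_params(ARTICLE_VOCAB, EMB_DIM),
    "bidirectional": 2 * lstm_params(EMB_DIM, 100),  # forward + backward LSTM
    "embedding_1": embedding_params(SUMMARY_VOCAB, EMB_DIM),
    "lstm": lstm_params(EMB_DIM, 200),
    "dense": dense_params(200, SUMMARY_VOCAB),
}

for name, n in counts.items():
    print(f"{name:14s}{n:>12,}")
print(f"{'sum':14s}{sum(counts.values()):>12,}")
```

The computed values agree with every row of the summary, and their sum is 94,724,058, the trainable parameter count reported above. About 99.6% of the parameters sit in the two embedding layers and the output Dense layer, all of which scale with vocabulary size.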

Training

The model was trained on articles of up to 800 tokens with the following configuration:

  • Optimizer: Adam
  • Loss Function: Categorical Crossentropy
  • Metrics: Accuracy
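To make the loss function concrete: with a one-hot target, categorical cross-entropy for a single token reduces to the negative log of the probability assigned to the correct word, and the sequence loss is the mean over decoder time steps. The sketch below is a plain-Python illustration on a toy three-word vocabulary, not the training code; the function names are hypothetical.

```python
import math

def categorical_crossentropy(probabilities, target_index):
    """Cross-entropy for one token with a one-hot target: -log p(target)."""
    return -math.log(probabilities[target_index])

def mean_sequence_loss(step_probabilities, target_indices):
    """Average the per-token loss over all decoder time steps."""
    losses = [
        categorical_crossentropy(probs, target)
        for probs, target in zip(step_probabilities, target_indices)
    ]
    return sum(losses) / len(losses)

# Toy example: softmax outputs for two decoder steps over a 3-word vocabulary.
step_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
targets = [0, 1]  # index of the correct word at each step

loss = mean_sequence_loss(step_probs, targets)
print(round(loss, 4))  # 0.2899
```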

Training Loss and Validation Loss

Epoch  Training Loss  Validation Loss  Time per Epoch (s)
1      3.9044         0.4543           3087
2      0.3429         0.0976           3091
3      0.1054         0.0427           3096
4      0.0490         0.0231           3099
5      0.0203         0.0148           3098

Test Loss

Test loss: 0.014802712015807629
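One way to read a cross-entropy value is to convert it back to a probability: a mean loss L corresponds to a geometric-mean probability of exp(-L) on the correct token. The short sketch below applies this to the reported test loss. It assumes the figure is a mean per-token cross-entropy computed with natural logarithms, which the card does not state explicitly.

```python
import math

test_loss = 0.014802712015807629  # value reported above

# Geometric-mean probability assigned to the correct token, under the stated assumption.
geo_mean_prob = math.exp(-test_loss)
print(f"{geo_mean_prob:.4f}")  # 0.9853
```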

Usage (to be updated)

To use this model, you can load it with the Hugging Face Transformers library. The snippet assumes the checkpoint is published in a Transformers-compatible format, and 'your-model-name' is a placeholder for the actual repository ID:

from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

# Load the tokenizer and the seq2seq model (the LM head is required for generate()).
tokenizer = AutoTokenizer.from_pretrained('your-model-name')
model = TFAutoModelForSeq2SeqLM.from_pretrained('your-model-name')

article = "Your input text here."

# Tokenize the article, truncating to the 800-token limit used in training.
inputs = tokenizer.encode("summarize: " + article, return_tensors="tf", max_length=800, truncation=True)

# Generate a summary with beam search.
summary_ids = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)

# Convert the generated token IDs back to text.
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print(summary)
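The snippet above relies on `generate()` from Transformers. For an encoder-decoder LSTM like the one described in this card, inference is often written as an explicit loop instead: encode the article once, then feed the decoder its own previous prediction until an end marker appears. The sketch below shows greedy decoding under that assumption. The `encode` and `decode_step` callables, the start and end token ids, and the toy stand-ins are all hypothetical; `max_len=150` mirrors the `max_length` used above.

```python
def greedy_decode(encode, decode_step, article_ids, start_id, end_id, max_len=150):
    """Generate token ids one at a time, always picking the most probable next token."""
    state = encode(article_ids)  # encoder states initialise the decoder
    token = start_id
    output = []
    for _ in range(max_len):
        probs, state = decode_step(token, state)
        token = max(range(len(probs)), key=probs.__getitem__)  # argmax over the vocabulary
        if token == end_id:
            break
        output.append(token)
    return output

# Toy stand-ins so the sketch runs: this "model" echoes the article, then emits end_id.
def toy_encode(article_ids):
    return list(article_ids)  # state = tokens still to emit

def toy_decode_step(token, state, vocab_size=10, end_id=2):
    next_id = state[0] if state else end_id
    probs = [0.0] * vocab_size
    probs[next_id] = 1.0
    return probs, state[1:]

print(greedy_decode(toy_encode, toy_decode_step, [5, 7, 3], start_id=1, end_id=2))  # [5, 7, 3]
```

Greedy decoding is the simplest choice. Beam search, as requested by `num_beams=4` in the snippet above, keeps several candidate sequences at each step instead of one.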