language:
- en
tags:
- summarization
widget:
- text: >
We here recount the main elements of a classic bag-of-features model
before introducing the simpler DNN-based BagNets in the next paragraph.
Bag-of-feature representations can be described by analogy to bag-of-words
representations. With bag-of-words, one counts the number of occurrences
of words from a vocabulary in a document. This vocabulary contains
important words (but not common ones like "and" or "the") and word
clusters (i.e. semantically similar words like "gigantic" and "enormous"
are subsumed). The counts of each word in the vocabulary are assembled as
one long term vector. This is called the bag-of-words document
representation because all ordering of the words is lost. Likewise,
bag-of-feature representations are based on a vocabulary of visual words
which represent clusters of local image features. The term vector for an
image is then simply the number of occurrences of each visual word in the
vocabulary. This term vector is used as an input to a classifier (e.g. SVM
or MLP). Many successful image classification models have been based on
this pipeline (Csurka et al., 2004; Jurie & Triggs, 2005; Zhang et al.,
2007; Lazebnik et al., 2006), see O’Hara & Draper (2011) for an up-to-date
overview.
- text: >
The goal of reducing sequential computation also forms the foundation of
the Extended Neural GPU [16], ByteNet [18] and ConvS2S [9], all of which
use convolutional neural networks as basic building block, computing
hidden representations in parallel for all input and output positions. In
these models, the number of operations required to relate signals from two
arbitrary input or output positions grows in the distance between
positions, linearly for ConvS2S and logarithmically for ByteNet. This
makes it more difficult to learn dependencies between distant positions
[12]. In the Transformer this is reduced to a constant number of
operations, albeit at the cost of reduced effective resolution due to
averaging attention-weighted positions, an effect we counteract with
Multi-Head Attention as described in section 3.2.
Self-attention, sometimes called intra-attention is an attention mechanism
relating different positions of a single sequence in order to compute a
representation of the sequence. Self-attention has been used successfully
in a variety of tasks including reading comprehension, abstractive
summarization, textual entailment and learning task-independent sentence
representations [4, 27, 28, 22].
End-to-end memory networks are based on a recurrent attention mechanism
instead of sequence-aligned recurrence and have been shown to perform well
on simple-language question answering and language modeling tasks [34].
To the best of our knowledge, however, the Transformer is the first
transduction model relying entirely on self-attention to compute
representations of its input and output without using sequence-aligned RNNs
or convolution. In the following sections, we will describe the
Transformer, motivate self-attention and discuss its advantages over
models such as [17, 18] and [9].
license:
- mit
pipeline_tag: summarization
Bart-Large Summarization Model
This repository contains the Bart-Large-paper2slides-summarizer model, which has been fine-tuned on the Automatic Slide Generation from Scientific Papers dataset with an unsupervised learning algorithm from the paper 'Unsupervised Machine Translation Using Monolingual Corpora Only'. Its primary focus is summarizing scientific texts with precision and accuracy. The model is trained in parallel with the Bart-large-paper2slides-expander from the same contributor.
Model Details
- Model Architecture: Bart-Large
- Fine-tuning Dataset: Automatic Slide Generation from Scientific Papers
- Fine-tuning Method: Unsupervised Learning
Bart (Bidirectional and Auto-Regressive Transformers) is a sequence-to-sequence (seq2seq) model developed by Facebook AI Research. It has shown exceptional performance in various natural language processing (NLP) tasks such as text summarization, text generation, and machine translation.
This particular model, Bart-Large, is the larger version of the Bart model. It consists of 12 encoder layers and 12 decoder layers and has roughly 400 million parameters.
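As a quick sanity check, these figures can be read off the base facebook/bart-large configuration, which this fine-tuned model shares. The snippet below is only an illustrative sketch using the Hugging Face Transformers library, not part of the released code:
from transformers import BartConfig, BartForConditionalGeneration
# Inspect the architecture of the base checkpoint this model is built on
config = BartConfig.from_pretrained("facebook/bart-large")
print(config.encoder_layers, config.decoder_layers)  # 12 12
# Count parameters (comes out to roughly 400 million)
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
print(sum(p.numel() for p in model.parameters()))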
Usage
To use this model, you can leverage the Hugging Face Transformers library. Here's an example of how to use it in Python:
from transformers import BartTokenizer, BartForConditionalGeneration, pipeline
# Load the model and tokenizer
model_name = "com3dian/Bart-large-paper2slides-summarizer"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)
# Generate summary from input text
input_text = "Your input text here..."
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output = model.generate(input_ids)
# Decode generated summaries
summary = tokenizer.decode(output[0], skip_special_tokens=True)
print(summary)
# Or use the pipeline API (returns a list of dicts)
summarizer = pipeline("summarization", model=model_name)
summary = summarizer(input_text, max_length=50, min_length=30, do_sample=False)
print(summary[0]["summary_text"])
Ensure you have the transformers library installed before running the code. You can install it using pip:
pip install transformers
Model Fine-tuning Details
The fine-tuning process for this model involved training on the slide generation dataset with unsupervised learning techniques. Unsupervised learning here means training without explicit human-labeled targets: the summarizer learns to map the expanded text produced by the expander model back to the original text (back-summarization).
The specific hyperparameters and training details used for fine-tuning this model are as follows:
- Batch Size: 4
- Learning Rate: 2e-6
- Training Steps: 3*7
- Optimizer: AdamW
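For illustration, the following sketch shows what one back-summarization training step with these settings could look like. It assumes the expander checkpoint named above, omits dataset handling and padding-label masking, and is a simplification rather than the authors' actual training code:
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
# Frozen expander produces the "expanded" inputs; the summarizer is trained
# to map them back to the original texts (back-summarization).
expander = BartForConditionalGeneration.from_pretrained(
    "com3dian/Bart-large-paper2slides-expander").to(device).eval()
summarizer = BartForConditionalGeneration.from_pretrained(
    "facebook/bart-large").to(device).train()
optimizer = torch.optim.AdamW(summarizer.parameters(), lr=2e-6)

def training_step(original_texts):  # a batch of 4 texts, per the settings above
    with torch.no_grad():
        enc = tokenizer(original_texts, return_tensors="pt",
                        padding=True, truncation=True).to(device)
        expanded_ids = expander.generate(**enc, max_length=512)
    expanded_texts = tokenizer.batch_decode(expanded_ids, skip_special_tokens=True)
    inputs = tokenizer(expanded_texts, return_tensors="pt",
                       padding=True, truncation=True).to(device)
    labels = tokenizer(original_texts, return_tensors="pt",
                       padding=True, truncation=True).input_ids.to(device)
    loss = summarizer(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()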
Model Performance
The model has undergone thorough human evaluation across a wide range of scientific domains, including AI, mathematics, statistics, history, geography, and climate science, comparing its output with that of the Bart-large-cnn model.
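The human evaluation itself cannot be reproduced from this card, but a reader can run the same kind of side-by-side comparison against the public facebook/bart-large-cnn checkpoint. The snippet below is only an illustrative sketch:
from transformers import pipeline

text = "Your scientific text here..."
for name in ["com3dian/Bart-large-paper2slides-summarizer", "facebook/bart-large-cnn"]:
    summarizer = pipeline("summarization", model=name)
    result = summarizer(text, max_length=50, min_length=30, do_sample=False)
    print(name, "->", result[0]["summary_text"])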
Acknowledgments
We would like to acknowledge the authors of the Bart model and the creators of the slide generation dataset for their valuable contributions, which have enabled the development of this fine-tuned model.
If you use this model or find it helpful in your work, please consider citing the original Bart model, the slide generation dataset, and this paper to provide proper credit to the respective authors.
License
This model and the associated code are released under the MIT license.