BART-Large-CNN-Enhanced

BART-Large-CNN-Enhanced is a fine-tuned version of the facebook/bart-large-cnn model. It was trained for one further epoch on the CNN/DailyMail dataset and improves on the base model's ROUGE-1 and ROUGE-2 scores by more than 5% and on its ROUGE-L score by 1.8%.

  • Developed by: phanerozoic
  • Model type: BartForConditionalGeneration
  • Source model: facebook/bart-large-cnn
  • License: cc-by-nc-4.0
  • Languages: English

Model Details

BART-Large-CNN-Enhanced uses BART's transformer-based sequence-to-sequence (encoder-decoder) architecture and is tailored to text summarization. It starts from the facebook/bart-large-cnn checkpoint and refines that model's ability to produce human-like summaries of news articles.

Configuration

  • Max input length: 1024 tokens
  • Max target length: 128 tokens
  • Learning rate: 1e-5
  • Batch size: 32
  • Epochs: 1
  • Hardware used: NVIDIA RTX 6000 Ada Lovelace
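The two length limits above correspond to truncation settings at tokenization time. The following is a minimal sketch, not the author's preprocessing code (which this card does not show). It assumes a tokenizer that follows the Hugging Face calling convention and the `article` and `highlights` column names used by the cnn_dailymail dataset on the Hub; the function name `preprocess` is hypothetical.

```python
MAX_INPUT_LENGTH = 1024   # max input length from the configuration above
MAX_TARGET_LENGTH = 128   # max target length from the configuration above


def preprocess(batch, tokenizer):
    """Tokenize articles and reference summaries, truncating to the card's limits.

    `tokenizer` is assumed to behave like a Hugging Face tokenizer, for example
    AutoTokenizer.from_pretrained("facebook/bart-large-cnn").
    """
    # Source articles are cut off at 1024 tokens.
    model_inputs = tokenizer(
        batch["article"],
        max_length=MAX_INPUT_LENGTH,
        truncation=True,
    )
    # Reference summaries become the labels and are cut off at 128 tokens.
    labels = tokenizer(
        text_target=batch["highlights"],
        max_length=MAX_TARGET_LENGTH,
        truncation=True,
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```

With the `datasets` library, a function like this would typically be applied with `dataset.map(preprocess, batched=True, fn_kwargs={"tokenizer": tokenizer})`.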

Training and Evaluation Data

The model was fine-tuned for one additional epoch on the CNN/DailyMail dataset, a large collection of news articles paired with human-written summaries. This dataset is widely used as a benchmark for evaluating text summarization models because of its size and the quality of its annotations.

Training Procedure

The training involved fine-tuning the pre-trained facebook/bart-large-cnn model with the following settings:

  • Epochs: 1
  • Batch size: 32
  • Learning rate: 1e-5
  • Training time: 7 hours 19 minutes 24 seconds
  • Loss: 0.618

Training minimized the loss, with the aim of producing summaries that are both concise and informative.
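For readers who want to reproduce a comparable run, the settings above could be expressed with the `Seq2SeqTrainingArguments` class from the transformers library. This is a sketch under assumptions: the card does not say which training script was used, the batch size of 32 is treated here as a per-device value on a single GPU, and the output directory name and the function name `build_training_args` are hypothetical.

```python
# Hyperparameters copied from the list above.
HYPERPARAMS = {
    "num_train_epochs": 1,
    "per_device_train_batch_size": 32,  # assumption: single GPU, no gradient accumulation
    "learning_rate": 1e-5,
}


def build_training_args(output_dir="bart-large-cnn-enhanced"):
    """Return Seq2SeqTrainingArguments mirroring the settings above.

    transformers is imported lazily so that the hyperparameters can be
    inspected without the library installed.
    """
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(
        output_dir=output_dir,          # hypothetical directory name
        predict_with_generate=True,     # generate summaries during evaluation
        generation_max_length=128,      # matches the max target length
        **HYPERPARAMS,
    )
```

The resulting object would then be passed to a `Seq2SeqTrainer` together with the model, tokenizer, and tokenized dataset.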

Performance

Fine-tuning improved all three ROUGE metrics. The percentages are relative gains over the base model's scores:

  • ROUGE-1: 45.37 (5.62% improvement over the base model score of 42.949)
  • ROUGE-2: 22.00 (5.71% improvement over the base model score of 20.815)
  • ROUGE-L: 31.17 (1.80% improvement over the base model score of 30.619)

ROUGE measures n-gram overlap with the reference summaries, so these gains indicate that the model's output matches the human-written summaries more closely than the base model's output does.
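The relative improvements can be recomputed from the reported scores. The short sketch below uses only the numbers listed above; because those scores are rounded, the recomputed percentages may differ from the reported ones by a few hundredths of a percentage point. The names `SCORES` and `relative_improvement` are illustrative.

```python
# (fine-tuned score, base model score) as reported in the list above
SCORES = {
    "ROUGE-1": (45.37, 42.949),
    "ROUGE-2": (22.00, 20.815),
    "ROUGE-L": (31.17, 30.619),
}


def relative_improvement(new, base):
    """Percentage gain of `new` over `base`."""
    return (new - base) / base * 100.0


improvements = {
    name: relative_improvement(new, base) for name, (new, base) in SCORES.items()
}
# Unweighted mean of the three relative gains.
mean_improvement = sum(improvements.values()) / len(improvements)

for name, pct in improvements.items():
    print(f"{name}: {pct:.2f}%")
print(f"mean: {mean_improvement:.2f}%")
```

Recomputed this way, the gains are roughly 5.6%, 5.7%, and 1.8%, for an unweighted mean of about 4.4%.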

Comparing Performance to Base Model

To illustrate the improvements made by the BART-Large-CNN-Enhanced model, we used the same article featured on the base model's widget, allowing for a direct summary comparison. The article describes the Eiffel Tower, its dimensions, and its historical significance. Below are the summaries generated by both models:

Given Article

The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.

Summary by BART-Large-CNN-Enhanced

The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world.

Summary by Facebook's BART-Large-CNN

The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world.

Analysis

  • Coverage:

    • Enhanced Model: Includes key details such as the Eiffel Tower being the tallest structure in Paris and its historical significance in surpassing the Washington Monument.
    • Base Model: Provides additional details about the base dimensions but omits the detail about the Eiffel Tower being the tallest structure in Paris.
  • Conciseness:

    • Enhanced Model: More concise, focusing on the most critical historical and current facts.
    • Base Model: Slightly longer, with extra details about the base dimensions.
  • Relevance:

    • Enhanced Model: Captures the most relevant details, making it more informative for someone looking for key highlights.
    • Base Model: Adds context with base dimensions, which might be less critical depending on the summary's intended use.

In this example, the BART-Large-CNN-Enhanced model produces a more concise summary and keeps a significant detail that the base model omitted: that the Eiffel Tower is the tallest structure in Paris. This makes the enhanced model's summary better suited to users who want the essential information quickly.
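The coverage and conciseness observations above can be checked mechanically. The sketch below counts whitespace-separated words in the two summaries and tests for three phrases from the article; the phrases in `KEY_FACTS` were chosen by hand for illustration and are not part of any evaluation used for this model.

```python
ENHANCED = (
    "The tower is 324 metres (1,063 ft) tall, about the same height as an "
    "81-storey building, and the tallest structure in Paris. During its "
    "construction, the Eiffel Tower surpassed the Washington Monument to "
    "become the tallest man-made structure in the world."
)
BASE = (
    "The tower is 324 metres (1,063 ft) tall, about the same height as an "
    "81-storey building. Its base is square, measuring 125 metres (410 ft) "
    "on each side. During its construction, the Eiffel Tower surpassed the "
    "Washington Monument to become the tallest man-made structure in the world."
)

# Hand-picked phrases (an assumption made for this illustration).
KEY_FACTS = ["tallest structure in Paris", "Washington Monument", "125 metres"]


def word_count(text):
    """Number of whitespace-separated tokens."""
    return len(text.split())


def coverage(summary):
    """Map each key phrase to whether it appears verbatim in the summary."""
    return {fact: fact in summary for fact in KEY_FACTS}


for label, summary in (("enhanced", ENHANCED), ("base", BASE)):
    print(label, word_count(summary), coverage(summary))
```

A single article is only an anecdote; the ROUGE scores in the Performance section are the aggregate evidence.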

Overall Appraisal

The BART-Large-CNN-Enhanced model improves measurably on its base model and is a robust tool for summarizing news text:

  • Standard Performance: The model generates summaries of news articles with higher ROUGE-1, ROUGE-2, and ROUGE-L scores than the base model. Its ability to distill lengthy articles into concise, coherent summaries while preserving the essential information makes it particularly valuable for applications such as news aggregation and content curation.

Usage

This model is well suited to summarizing English texts, particularly texts similar to the news articles on which it was trained. It can be used in applications such as news aggregation, content summarization, and information retrieval.
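A minimal usage sketch with the transformers library follows. It assumes the model is published on the Hugging Face Hub as `phanerozoic/BART-Large-CNN-Enhanced` and that PyTorch is installed; the helper names `load_summarizer` and `summarize` are hypothetical, and the length limits simply mirror the configuration listed earlier in this card.

```python
MODEL_ID = "phanerozoic/BART-Large-CNN-Enhanced"  # assumed Hub identifier


def load_summarizer(model_id=MODEL_ID):
    """Load tokenizer and model, and return a text -> summary function."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    def run(text, max_new_tokens=128):
        # Truncate long articles to the 1024-token input limit.
        inputs = tokenizer(
            text, max_length=1024, truncation=True, return_tensors="pt"
        )
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return run


def summarize(text, summarizer=None):
    """Summarize `text`, loading the model only if no summarizer is supplied."""
    if summarizer is None:
        summarizer = load_summarizer()
    return summarizer(text)
```

Passing a preloaded `summarizer` avoids reloading the 406M-parameter model on every call, for example `run = load_summarizer()` followed by `summarize(article, summarizer=run)`.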

Limitations

While the model excels in contexts similar to its training data (news articles), its performance might vary on text from other domains or in other languages. Future enhancements could involve expanding the training data to include more diverse text sources, which would improve its generalizability and robustness.

Acknowledgments

Special thanks to the developers of the BART architecture and the Hugging Face team. Their tools and frameworks were instrumental in the development and fine-tuning of this model. The NVIDIA RTX 6000 Ada Lovelace hardware provided the necessary computational power to achieve these results.

Model size: 406M params (Safetensors, tensor type F32)