| library_name: transformers | |
| license: apache-2.0 | |
| base_model: google/pegasus-xsum | |
| datasets: | |
| - eilamc14/wikilarge-clean | |
| language: | |
| - en | |
| tags: | |
| - pegasus | |
| - text-simplification | |
| - WikiLarge | |
| model-index: | |
| - name: pegasus-xsum-text-simplification | |
| results: | |
| - task: | |
| type: text2text-generation | |
| name: Text Simplification | |
| dataset: | |
| name: ASSET | |
| type: facebook/asset | |
| url: https://huggingface.co/datasets/facebook/asset | |
| split: test | |
| metrics: | |
| - type: SARI | |
| value: 33.80 | |
| - type: FKGL | |
| value: 9.23 | |
| - type: BERTScore | |
| value: 87.54 | |
| - type: LENS | |
| value: 62.46 | |
| - type: Identical ratio | |
| value: 0.29 | |
| - type: Identical ratio (ci) | |
| value: 0.29 | |
| - task: | |
| type: text2text-generation | |
| name: Text Simplification | |
| dataset: | |
| name: MEDEASI | |
| type: cbasu/Med-EASi | |
| url: https://huggingface.co/datasets/cbasu/Med-EASi | |
| split: test | |
| metrics: | |
| - type: SARI | |
| value: 32.68 | |
| - type: FKGL | |
| value: 10.98 | |
| - type: BERTScore | |
| value: 45.14 | |
| - type: LENS | |
| value: 50.55 | |
| - type: Identical ratio | |
| value: 0.30 | |
| - type: Identical ratio (ci) | |
| value: 0.30 | |
| - task: | |
| type: text2text-generation | |
| name: Text Simplification | |
| dataset: | |
| name: OneStopEnglish | |
| type: OneStopEnglish | |
| url: https://github.com/nishkalavallabhi/OneStopEnglishCorpus | |
| split: advanced→elementary | |
| metrics: | |
| - type: SARI | |
| value: 37.07 | |
| - type: FKGL | |
| value: 8.66 | |
| - type: BERTScore | |
| value: 77.77 | |
| - type: LENS | |
| value: 60.97 | |
| - type: Identical ratio | |
| value: 0.40 | |
| - type: Identical ratio (ci) | |
| value: 0.40 | |
| # Model Card for pegasus-xsum-text-simplification | |
| This is one of the models fine-tuned for text simplification as part of the [Simplify This](https://github.com/eilamc14/Simplify-This) project. | |
| ## Model Details | |
| ### Model Description | |
| Fine-tuned **sequence-to-sequence (encoder–decoder) Transformer** for **English text simplification**. | |
| Trained on the dataset **`eilamc14/wikilarge-clean`** (cleaned WikiLarge-style pairs). | |
| - **Model type:** Seq2Seq Transformer (encoder–decoder) | |
| - **Language (NLP):** English | |
| - **License:** `apache-2.0` | |
| - **Finetuned from model:** `google/pegasus-xsum` | |
| ### Model Sources | |
| - **Repository (code):** https://github.com/eilamc14/Simplify-This | |
| - **Dataset:** https://huggingface.co/datasets/eilamc14/wikilarge-clean | |
| - **Paper:** https://arxiv.org/abs/2601.05794 | |
| ## Uses | |
| ### Direct Use | |
| The model is intended for **English text simplification**. | |
| - **Input format:** `Simplify: <complex sentence>` | |
| - **Output:** `<simplified sentence>` | |
| **Typical uses** | |
| - Research on automatic text simplification | |
| - Benchmarking against other simplification systems | |
| - Demos/prototypes that require simpler English rewrites | |
| ### Downstream Use | |
| This repository already contains a **fine-tuned** model specialized for text simplification. | |
| Further fine-tuning is **optional** and mainly relevant when: | |
| - Adapting to a markedly different domain (e.g., medical/legal/news) | |
| - Addressing specific failure modes (e.g., over/under-simplification, factual drops) | |
| - Distilling/quantizing for deployment constraints | |
| When fine-tuning further, keep the same input convention: `Simplify: <...>`. | |
| ### Out-of-Scope Use | |
| Not intended for: | |
| - Tasks unrelated to simplification (e.g., dialogue, translation) | |
| - Production use without additional safety filtering (no toxicity/bias mitigation) | |
| - Languages other than English | |
| - High-stakes settings (legal/medical advice, safety-critical decisions) | |
| ## Bias, Risks, and Limitations | |
| The model was trained on **Wikipedia and Simple English Wikipedia** alignments (via WikiLarge). | |
| As a result, it inherits the characteristics and limitations of this data: | |
| - **Domain bias:** Simplifications may reflect encyclopedic style; performance may degrade on informal, technical, or domain-specific text (e.g., medical/legal/news). | |
| - **Content bias:** Wikipedia content itself contains biases in coverage, cultural perspective, and phrasing. Simplified outputs may reflect or amplify these. | |
| - **Simplification quality:** The model may: | |
| - Over-simplify (drop important details) | |
| - Under-simplify (retain complex phrasing) | |
| - Produce ungrammatical or awkward rephrasings | |
| - **Language limitation:** Only suitable for English. Applying to other languages is unsupported. | |
| - **Safety limitation:** The model has not been aligned to avoid toxic, biased, or harmful content. If the input text contains such content, the output may reproduce or modify it without safeguards. | |
| ### Recommendations | |
| - **Evaluation required:** Always evaluate the model in the target domain before deployment. Benchmark simplification quality (e.g., with SARI, FKGL, BERTScore, LENS, human evaluation). | |
| - **Human oversight:** Use human-in-the-loop review for applications where meaning preservation is critical (education, accessibility tools, etc.). | |
| - **Attribution:** Preserve source attribution where required (Wikipedia → CC BY-SA). | |
| - **Not for high-stakes use:** Avoid legal, medical, or safety-critical applications without extensive validation and domain adaptation. | |
| ## How to Get Started with the Model | |
| Load the model and tokenizer directly from the Hugging Face Hub: | |
| ```python | |
| from transformers import AutoModelForSeq2SeqLM, AutoTokenizer | |
| model_id = "eilamc14/pegasus-xsum-text-simplification" | |
| tokenizer = AutoTokenizer.from_pretrained(model_id) | |
| model = AutoModelForSeq2SeqLM.from_pretrained(model_id) | |
| # Example input | |
| PREFIX = "Simplify: " | |
| text = "The committee deemed the proposal unnecessarily complicated." | |
| # Tokenize and generate | |
| inputs = tokenizer(PREFIX+text, return_tensors="pt") | |
| outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4) | |
| print(tokenizer.decode(outputs[0], skip_special_tokens=True)) | |
| ``` | |
| ## Training Details | |
| ### Training Data | |
| The [WikiLarge-clean](https://huggingface.co/datasets/eilamc14/wikilarge-clean) dataset. | |
| ### Training Procedure | |
| - **Hardware:** NVIDIA L4 GPU on Google Colab | |
| - **Objective:** Standard sequence-to-sequence cross-entropy loss | |
| - **Training type:** Full fine-tuning of all parameters (no LoRA/PEFT used) | |
| - **Batching:** Dynamic padding with Hugging Face `Trainer` / PyTorch DataLoader | |
| - **Evaluation:** Monitored on the `validation` split using SARI and `identical_ratio` | |
| - **Stopping criteria:** Early-stopping callback based on validation performance | |
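The "dynamic padding" item above can be illustrated with a minimal, library-free sketch: each batch is padded only to its own longest sequence, and padded label positions are set to `-100` so the cross-entropy loss ignores them. The card does not say which collator was used; in `transformers` this behaviour is usually provided by `DataCollatorForSeq2Seq`. The function name `pad_batch`, the toy token ids, and `pad_token_id=0` are assumptions made for illustration.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def pad_batch(
    features: List[Dict[str, List[int]]], pad_token_id: int
) -> Dict[str, List[List[int]]]:
    """Pad every example to the longest sequence in this batch only."""
    max_inputs = max(len(f["input_ids"]) for f in features)
    max_labels = max(len(f["labels"]) for f in features)
    batch: Dict[str, List[List[int]]] = {
        "input_ids": [],
        "attention_mask": [],
        "labels": [],
    }
    for f in features:
        n_pad = max_inputs - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
        # Padded label positions are masked so they do not contribute to the loss.
        n_label_pad = max_labels - len(f["labels"])
        batch["labels"].append(f["labels"] + [IGNORE_INDEX] * n_label_pad)
    return batch


# Toy token ids, purely illustrative (not real tokenizer output).
features = [
    {"input_ids": [5, 6, 7, 1], "labels": [8, 1]},
    {"input_ids": [9, 1], "labels": [10, 11, 1]},
]
batch = pad_batch(features, pad_token_id=0)
print(batch["input_ids"])
print(batch["labels"])
```

Padding per batch rather than to a global maximum keeps short sentence pairs cheap, which matters for sentence-level simplification data.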
| #### Preprocessing | |
| The dataset was preprocessed by prefixing each source sentence with **"Simplify: "** and tokenizing both the source (inputs) and target (labels). | |
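A minimal sketch of this preprocessing step follows. It is not the project's actual code: the column names `source` and `target`, the `max_length=128` truncation, and the function name `preprocess` are assumptions. The `text_target=` keyword is how `transformers` tokenizers accept label text; the `WhitespaceTokenizer` class is only an offline stand-in so the example runs without downloading the model.

```python
PREFIX = "Simplify: "


def preprocess(batch, tokenizer, max_length=128):
    """Prefix each source sentence, then tokenize sources (inputs) and targets (labels)."""
    sources = [PREFIX + s for s in batch["source"]]  # assumed column name
    model_inputs = tokenizer(sources, max_length=max_length, truncation=True)
    labels = tokenizer(
        text_target=batch["target"],  # assumed column name
        max_length=max_length,
        truncation=True,
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


class WhitespaceTokenizer:
    """Offline stand-in that mimics the call pattern above; not a real tokenizer."""

    def __init__(self):
        self.vocab = {}

    def _encode(self, text, max_length):
        ids = [self.vocab.setdefault(tok, len(self.vocab)) for tok in text.split()]
        return ids[:max_length]

    def __call__(self, text=None, text_target=None, max_length=128, truncation=True):
        texts = text if text is not None else text_target
        return {"input_ids": [self._encode(t, max_length) for t in texts]}


# Illustrative sentence pair; the target is made up for this example.
batch = {
    "source": ["The committee deemed the proposal unnecessarily complicated."],
    "target": ["The committee thought the plan was too complex."],
}
encoded = preprocess(batch, WhitespaceTokenizer())
print(len(encoded["input_ids"][0]), len(encoded["labels"][0]))
```

With a real tokenizer loaded via `AutoTokenizer.from_pretrained`, the same function could be passed to `datasets.Dataset.map(..., batched=True)`.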
| #### Memory & Checkpointing | |
| To reduce VRAM during training, gradient checkpointing was enabled and the KV cache was disabled: | |
| ```python | |
| model.config.use_cache = False # required when using gradient checkpointing | |
| model.gradient_checkpointing_enable() # saves memory at the cost of extra compute | |
| ``` | |
| **Notes** | |
| - Disabling `use_cache` avoids warnings/conflicts with gradient checkpointing and reduces memory usage in the forward pass. | |
| - Gradient checkpointing trades **GPU memory ↓** for **training speed ↓** (extra recomputation). | |
| - For **inference/evaluation**, re-enable the cache for faster generation: | |
| ```python | |
| model.config.use_cache = True | |
| ``` | |
| #### Training Hyperparameters | |
| The models were trained with Hugging Face `Seq2SeqTrainingArguments`. | |
| Hyperparameters varied slightly across models and runs during tuning, and full logs (batch size, steps, exact LR schedule) were not preserved. | |
| Below are the **typical defaults** used: | |
| - **Epochs:** 5 | |
| - **Evaluation strategy:** every 300 steps | |
| - **Save strategy:** every 300 steps (keep best model, `eval_loss` as criterion) | |
| - **Learning rate:** ~3e-5 | |
| - **Batch size:** ~8–64, depending on model size | |
| - **Optimizer:** `adamw_torch_fused` | |
| - **Precision:** bf16 | |
| - **Generation config (during eval):** `max_length=128`, `num_beams=4`, `predict_with_generate=True` | |
| - **Other settings:** | |
| - Weight decay: 0.01 | |
| - Label smoothing: 0.1 | |
| - Warmup ratio: 0.1 | |
| - Max grad norm: 0.5 | |
| - Dataloader workers: 8 (L4 GPU) | |
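As a rough sketch, the typical defaults listed above map onto `Seq2SeqTrainingArguments` keyword arguments as shown below. This is a reconstruction from the list, not the logged configuration: `load_best_model_at_end`, `metric_for_best_model`, and `greater_is_better` are one reading of "keep best model, `eval_loss` as criterion", and the batch size is left out because it varied by model.

```python
# Keyword arguments intended for transformers.Seq2SeqTrainingArguments.
# Kept as a plain dict so the sketch runs without transformers installed.
training_kwargs = dict(
    num_train_epochs=5,
    eval_strategy="steps",  # named `evaluation_strategy` in older transformers releases
    eval_steps=300,
    save_strategy="steps",
    save_steps=300,
    # Assumed mapping of "keep best model, eval_loss as criterion":
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    learning_rate=3e-5,
    optim="adamw_torch_fused",
    bf16=True,
    weight_decay=0.01,
    label_smoothing_factor=0.1,
    warmup_ratio=0.1,
    max_grad_norm=0.5,
    dataloader_num_workers=8,
    # Generation settings used during evaluation.
    predict_with_generate=True,
    generation_max_length=128,
    generation_num_beams=4,
)

print(sorted(training_kwargs))
```

These could then be passed as `Seq2SeqTrainingArguments(output_dir="out", **training_kwargs)`, where `output_dir="out"` is a hypothetical path and a `per_device_train_batch_size` would still need to be chosen.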
| > Because hyperparameters were adjusted between runs and not all were logged, exact reproduction may differ slightly. | |
| ## Evaluation | |
| ### Testing Data | |
| - [**ASSET**](https://huggingface.co/datasets/facebook/asset) (test subset) | |
| - [**MEDEASI**](https://huggingface.co/datasets/cbasu/Med-EASi) (test subset) | |
| - [**OneStopEnglish**](https://github.com/nishkalavallabhi/OneStopEnglishCorpus) (advanced → elementary) | |
| ### Metrics | |
| - **Identical ratio** — share of outputs identical to the source after both are normalized with basic, language-agnostic steps: strip surrounding whitespace, apply NFKC, collapse repeated spaces | |
| - **Identical ratio (ci)** — case-insensitive variant of the identical ratio | |
| - **SARI** — main simplification metric (higher is better) | |
| - **FKGL** — readability grade level (lower is simpler) | |
| - **BERTScore (F1)** — semantic similarity (higher is better) | |
| - **LENS** — composite simplification quality score (higher is better) | |
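The identical-ratio definition above can be sketched in a few lines of standard-library Python. This is an illustration of the described normalization (strip, NFKC, collapse spaces), not the project's evaluation code; the function names and the use of `casefold()` for the case-insensitive variant are assumptions.

```python
import unicodedata
from typing import Sequence


def normalize(text: str) -> str:
    """Strip, apply NFKC, and collapse whitespace runs into single spaces."""
    text = unicodedata.normalize("NFKC", text.strip())
    return " ".join(text.split())


def identical_ratio(
    sources: Sequence[str], outputs: Sequence[str], case_insensitive: bool = False
) -> float:
    """Share of outputs that equal their source after normalization."""
    if len(sources) != len(outputs):
        raise ValueError("sources and outputs must have the same length")
    if not sources:
        return 0.0
    same = 0
    for src, out in zip(sources, outputs):
        src, out = normalize(src), normalize(out)
        if case_insensitive:
            src, out = src.casefold(), out.casefold()
        same += src == out
    return same / len(sources)


# Toy data, purely illustrative.
sources = ["The cat sat.", "Hello  World", "A long sentence."]
outputs = [" The cat sat. ", "hello world", "A short one."]
ratio = identical_ratio(sources, outputs)
ratio_ci = identical_ratio(sources, outputs, case_insensitive=True)
print(ratio, ratio_ci)
```

A high identical ratio signals that the model often copies its input unchanged, which can inflate similarity metrics such as BERTScore without any real simplification.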
| ### Generation Arguments | |
| ```python | |
| gen_args = dict( | |
|     max_new_tokens=64, | |
|     num_beams=4, | |
|     length_penalty=1.0, | |
|     no_repeat_ngram_size=3, | |
|     early_stopping=True, | |
|     do_sample=False, | |
| ) | |
| ``` | |
| ### Results | |
| | Dataset | Identical ratio | Identical ratio (ci) | SARI | FKGL | BERTScore | LENS | | |
| |--------------------|----------------:|---------------------:|------:|-----:|----------:|------:| | |
| | **ASSET** | 0.29 | 0.29 | 33.80 | 9.23 | 87.54 | 62.46 | | |
| | **MEDEASI** | 0.30 | 0.30 | 32.68 | 10.98| 45.14 | 50.55 | | |
| | **OneStopEnglish** | 0.40 | 0.40 | 37.07 | 8.66 | 77.77 | 60.97 | | |
| ## Environmental Impact | |
| - **Hardware Type:** Single NVIDIA L4 GPU (Google Colab) | |
| - **Hours used:** Approx. 5–10 | |
| - **Cloud Provider:** Google Cloud (via Colab) | |
| - **Compute Region:** Unknown (Google Colab dynamic allocation) | |
| - **Carbon Emitted:** Estimated to be very low (< a few kg CO₂eq), since training was limited to a single GPU for a small number of hours. | |
| ## Citation | |
| @misc{simplifythis2025, | |
| author = {Cohen, Eilam and others}, | |
| title = {Simplify-This: A Comparative Analysis of Prompt-Based and Fine-Tuned LLMs}, | |
| year = {2025}, | |
| howpublished = {\url{https://github.com/eilamc14/Simplify-This}}, | |
| note = {GitHub repository}, | |
| urldate = {2025-09-30} | |
| } |