metadata

datasets:
  - mbshr/XSUMUrdu-DW_BBC
language:
  - ur
metrics:
  - rouge
  - bertscore
pipeline_tag: summarization

Model Card for urT5 (Urdu Summarization)

Summarization Model (Type: T5)

Summarization: Extractive and Abstractive

urT5 is adapted from mT5 and uses a monolingual vocabulary of 40k Urdu tokens only.
- Fine-tuned on https://huggingface.co/mbshr/XSUMUrdu-DW_BBC; refer to https://doi.org/10.48550/arXiv.2310.02790 for details.
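The card does not describe how mT5's vocabulary was reduced to 40k Urdu tokens. As an illustration only, the sketch below shows one common frequency-based approach, assuming the Urdu corpus has already been encoded with the original mT5 tokenizer: keep mT5's special tokens (ids 0, 1 and 2 are `<pad>`, `</s>` and `<unk>`), keep the most frequent remaining token ids up to the target size, and select the matching embedding rows. The function names are hypothetical and this is not necessarily the procedure used for urT5.

```python
from collections import Counter
from typing import Dict, List, Sequence

# mT5 reserves ids 0, 1, 2 for <pad>, </s> and <unk>; they are always kept.
SPECIAL_IDS = (0, 1, 2)


def select_vocab(corpus_ids: Sequence[Sequence[int]], target_size: int) -> List[int]:
    """Return original token ids to keep: specials first, then by frequency."""
    counts = Counter(tok for seq in corpus_ids for tok in seq)
    for sid in SPECIAL_IDS:
        counts.pop(sid, None)  # specials are added back explicitly below
    budget = target_size - len(SPECIAL_IDS)
    frequent = [tok for tok, _ in counts.most_common(budget)]
    return list(SPECIAL_IDS) + frequent


def build_id_map(kept_ids: Sequence[int]) -> Dict[int, int]:
    """Map each kept original id to its new, compact id (its position)."""
    return {old: new for new, old in enumerate(kept_ids)}


def trim_embeddings(embeddings: Sequence[Sequence[float]],
                    kept_ids: Sequence[int]) -> List[List[float]]:
    """Keep only the embedding rows of retained tokens, in new-id order."""
    return [list(embeddings[old]) for old in kept_ids]


def remap(seq: Sequence[int], id_map: Dict[int, int], unk_new: int = 2) -> List[int]:
    """Re-encode original ids with the trimmed vocabulary; unknown ids become <unk>."""
    return [id_map.get(tok, unk_new) for tok in seq]
```

For example, `select_vocab([[5, 7, 7, 1], [7, 9, 5, 1]], target_size=5)` returns `[0, 1, 2, 7, 5]`. In a real checkpoint the same row selection would also have to be applied to the output (LM head) matrix, since mT5 does not tie its input and output embeddings, and the SentencePiece tokenizer would need a matching vocabulary.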

Model Details

Model Description

Developed by: [More Information Needed]
Shared by: [More Information Needed]
Model type: urT5, an adapted version of mT5
Language(s) (NLP): Urdu
License: [More Information Needed]
Fine-tuned from model: google/mt5-base

Model Sources

Repository: [More Information Needed]
Paper: https://doi.org/10.48550/arXiv.2310.02790

Uses

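Summarization of Urdu text. The fine-tuning and evaluation data are Urdu news articles from BBC Urdu and DW Urdu.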

How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
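No official snippet is provided yet. The following is a minimal sketch using the Hugging Face `transformers` library with PyTorch, assuming the checkpoint loads with `AutoTokenizer` and `AutoModelForSeq2SeqLM`. `MODEL_ID` is a hypothetical placeholder because this card does not give the repository id, and the generation settings (`max_input_length`, `max_new_tokens`, `num_beams`) are illustrative values, not settings from the paper. The card also does not say whether a task prefix such as `summarize: ` was used during fine-tuning, so none is added here.

```python
from typing import Any, Tuple

# Hypothetical placeholder: the Hub repository id is not stated in this card.
MODEL_ID = "<hub-id-or-local-path-of-this-model>"


def load(model_id: str = MODEL_ID) -> Tuple[Any, Any]:
    """Load tokenizer and model from the same checkpoint.

    The tokenizer must come from this checkpoint, not google/mt5-base,
    because urT5 uses its own 40k-token Urdu vocabulary.
    """
    # Imported here so summarize() can be used without importing transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def summarize(text: str, tokenizer: Any, model: Any,
              max_input_length: int = 512,
              max_new_tokens: int = 64,
              num_beams: int = 4) -> str:
    """Generate a summary for one Urdu article with beam search."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True,
                       max_length=max_input_length)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                num_beams=num_beams)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Typical use: `tokenizer, model = load("<repo id>")`, then `summarize(article, tokenizer, model)` for an Urdu news article.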

Training Details

Evaluation & Results

Evaluated on https://huggingface.co/mbshr/XSUMUrdu-DW_BBC

ROUGE-1 F-score: 40.03 combined (46.35 on BBC Urdu datapoints only; 36.91 on DW Urdu datapoints only)
BERTScore: 75.1 combined (77.0 on BBC Urdu datapoints only; 74.16 on DW Urdu datapoints only)
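ROUGE-1 F is the harmonic mean of unigram precision and recall between a generated summary and its reference. Tokenization matters for Urdu: the default tokenizer in Google's `rouge_score` package lowercases text and replaces characters outside `[a-z0-9]` with spaces, which would erase Urdu script. The sketch below computes ROUGE-1 over whitespace-separated tokens and averages per example on a 0-100 scale; both choices are assumptions, since the card does not state how the scores above were computed, so this sketch is not guaranteed to reproduce them. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def rouge1(reference: str, candidate: str) -> Dict[str, float]:
    """ROUGE-1 precision, recall and F1 over whitespace tokens.

    Overlap is the clipped unigram count: a token matches at most as many
    times as it occurs in both the reference and the candidate.
    """
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())
    overlap = sum((ref_counts & cand_counts).values())  # & keeps min counts
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


def corpus_rouge1_f(pairs: Sequence[Tuple[str, str]]) -> float:
    """Mean per-example ROUGE-1 F1 over (reference, candidate) pairs, scaled to 0-100."""
    if not pairs:
        raise ValueError("need at least one (reference, candidate) pair")
    scores = [rouge1(ref, cand)["f1"] for ref, cand in pairs]
    return 100.0 * sum(scores) / len(scores)
```

For example, the reference "پاکستان میں بارش ہوئی" and the candidate "پاکستان میں شدید بارش" share three of their four tokens, giving precision, recall and F1 of 0.75.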

Citation

@misc{munaf2023low,
  title={Low Resource Summarization using Pre-trained Language Models},
  author={Mubashir Munaf and Hammad Afzal and Naima Iltaf and Khawir Mahmood},
  year={2023},
  eprint={2310.02790},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Contact

mubashir.munaaf@gmail.com