# Hungarian Abstractive Summarization with fine-tuned mBART-50 model
For further details, see the paper cited below or our demo site.
- Fine-tuned from the mBART-large-50 base model
- Fine-tuned on the HI corpus (hvg.hu + index.hu)
- Segments: 559,162
## Limitations
- Pre-tokenized input text is required (tokenizer: HuSpaCy); a tokenization sketch follows this list
- `max_source_length = 1024`
- `max_target_length = 256`
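Below is a minimal sketch of pre-tokenizing raw text with HuSpaCy before calling the model; the `huspacy.load()` default pipeline and the `raw_text` sample are illustrative assumptions, not part of the original card.

```python
# Minimal sketch, assuming huspacy and its default Hungarian model
# are installed (pip install huspacy; then huspacy.download()).
import huspacy

nlp = huspacy.load()  # loads the default Hungarian pipeline

raw_text = "Az időjárás a hétvégén változékony lesz."  # placeholder text
doc = nlp(raw_text)

# Whitespace-join the tokens to produce the tokenized input the model expects
input_text = " ".join(token.text for token in doc)
```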
## Results
| Model | HI (ROUGE-1 / ROUGE-2 / ROUGE-L) |
|---|---|
| mBART | 35.17 / 16.46 / 25.61 |
| mT5 | 33.30 / 15.97 / 24.65 |
| PEGASUS | 30.36 / 13.11 / 21.57 |
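For reference, here is a minimal sketch of computing ROUGE-1/2/L scores with the Hugging Face `evaluate` library; the library choice and the placeholder texts are assumptions, since the card does not document the exact evaluation setup.

```python
# Minimal sketch, assuming the evaluate and rouge_score packages are installed
import evaluate

rouge = evaluate.load("rouge")

# Placeholder texts; a real evaluation would use the HI test set
predictions = ["a modell által generált összefoglaló"]
references = ["a kézzel írt referencia-összefoglaló"]

scores = rouge.compute(predictions=predictions, references=references)
print(scores["rouge1"], scores["rouge2"], scores["rougeL"])
```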
## Usage with pipeline
```python
from transformers import pipeline

summarization = pipeline(task="summarization", model="NYTK/summarization-hi-mbart-large-50-hungarian")

# input_text: a HuSpaCy-tokenized Hungarian article (see Limitations above)
print(summarization(input_text)[0]["summary_text"])
```
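As a hedged variant of the call above, generation and truncation options can be passed at call time; `max_length=256` mirrors the `max_target_length` from the Limitations section, and all parameter choices here are illustrative assumptions rather than the card's documented settings.

```python
# Illustrative variant (assumed parameters, not from the original card)
summary = summarization(
    input_text,
    max_length=256,   # mirrors max_target_length from fine-tuning
    truncation=True,  # truncate inputs beyond max_source_length
)[0]["summary_text"]
print(summary)
```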
## Citation
If you use this model, please cite the following paper:
```bibtex
@inproceedings{yang-multi-sum,
    title = {{Többnyelvű modellek és PEGASUS finomhangolása magyar nyelvű absztraktív összefoglalás feladatára}},
    booktitle = {XIX. Magyar Számítógépes Nyelvészeti Konferencia (MSZNY 2023)},
    year = {2023},
    publisher = {Szegedi Tudományegyetem, Informatikai Intézet},
    address = {Szeged, Magyarország},
    author = {Yang, Zijian Győző},
    pages = {381--393}
}
```