|
--- |
|
tags: |
|
- summarization |
|
- news |
|
language: es |
|
datasets: |
|
- mlsum |
|
--- |
|
|
|
# Spanish RoBERTa2RoBERTa (roberta-base-bne) fine-tuned on MLSUM ES for summarization |
|
|
|
## Model |
|
[BSC-TeMU/roberta-base-bne](https://huggingface.co/BSC-TeMU/roberta-base-bne) (RoBERTa Checkpoint) |
|
|
|
## Dataset |
|
**MLSUM** is the first large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five languages: French, German, **Spanish**, Russian, and Turkish. Together with English newspapers from the popular CNN/Daily Mail dataset, the collected data form a large-scale multilingual dataset that enables new research directions for the text summarization community. The authors report cross-lingual comparative analyses based on state-of-the-art systems; these highlight existing biases that motivate the use of a multilingual dataset.
|
|
|
[MLSUM es](https://huggingface.co/datasets/viewer/?dataset=mlsum) |
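
A quick way to inspect the Spanish split is through the 🤗 `datasets` library. The snippet below is a minimal sketch assuming the public `mlsum` dataset with its `es` configuration; the `text` and `summary` field names follow the dataset card.

```python
from datasets import load_dataset

# Load the Spanish configuration of MLSUM (train/validation/test splits)
mlsum_es = load_dataset("mlsum", "es")

# Each example pairs a full article ("text") with its reference summary ("summary")
sample = mlsum_es["test"][0]
print(sample["text"][:300])
print(sample["summary"])
```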
|
|
|
## Results (WIP) |
|
|
|
| Set  | Metric                   | Value |
|------|--------------------------|-------|
| Test | Rouge2 - mid - precision | 11.42 |
| Test | Rouge2 - mid - recall    | 10.58 |
| Test | Rouge2 - mid - fmeasure  | 10.69 |
| Test | Rouge1 - fmeasure        | 28.83 |
| Test | RougeL - fmeasure        | 23.15 |
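
As a reference point, here is a minimal sketch of how ROUGE scores like the ones above can be computed, assuming the `rouge_score` package; `predictions` and `references` are placeholder lists, not the actual evaluation pipeline behind this table (the "mid" label refers to the point estimate of the bootstrap aggregator used by common ROUGE tooling).

```python
from rouge_score import rouge_scorer

# Placeholder outputs and references, purely illustrative
predictions = ["resumen generado de ejemplo"]
references = ["resumen de referencia de ejemplo"]

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
for pred, ref in zip(predictions, references):
    scores = scorer.score(ref, pred)  # returns precision/recall/fmeasure per metric
    print({name: round(s.fmeasure * 100, 2) for name, s in scores.items()})
```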
|
|
|
## Usage |
|
|
|
```python
import torch
from transformers import RobertaTokenizerFast, EncoderDecoderModel

device = 'cuda' if torch.cuda.is_available() else 'cpu'

ckpt = 'Narrativa/bsc_roberta2roberta_shared-spanish-finetuned-mlsum-summarization'
tokenizer = RobertaTokenizerFast.from_pretrained(ckpt)
model = EncoderDecoderModel.from_pretrained(ckpt).to(device)


def generate_summary(text):
    # Tokenize the article, truncating to the 512-token encoder limit
    inputs = tokenizer([text], padding="max_length", truncation=True, max_length=512, return_tensors="pt")
    input_ids = inputs.input_ids.to(device)
    attention_mask = inputs.attention_mask.to(device)
    # Generate the summary and decode it back to text
    output = model.generate(input_ids, attention_mask=attention_mask)
    return tokenizer.decode(output[0], skip_special_tokens=True)


text = "Your text here..."
generate_summary(text)
```
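
For more control over decoding, generation parameters can be passed to `model.generate` directly. The snippet below reuses `tokenizer`, `model`, and `device` from the block above; the beam-search settings are illustrative assumptions, not the configuration used to produce the scores reported here.

```python
text = "Your text here..."

inputs = tokenizer([text], padding="max_length", truncation=True, max_length=512, return_tensors="pt")
output = model.generate(
    inputs.input_ids.to(device),
    attention_mask=inputs.attention_mask.to(device),
    num_beams=4,             # beam search instead of greedy decoding (assumed)
    max_length=128,          # cap on summary length in tokens (assumed)
    no_repeat_ngram_size=3,  # discourage repeated phrases
    early_stopping=True,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```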
|
|
|
Created by: [Narrativa](https://www.narrativa.com/) |
|
|
|
About Narrativa: Natural Language Generation (NLG) | Gabriele, our machine learning-based platform, builds and deploys natural language solutions. #NLG #AI |
|
|
|
|