José Ángel González committed
Commit 20fa076, 1 parent: c33c85c
Update README.md
README.md CHANGED
@@ -12,3 +12,19 @@ widget:
News Abstractive Summarization for Catalan (NASCA) is a Transformer encoder-decoder model, with the same hyperparameters as BART, that summarizes Catalan news articles. It is pre-trained on a combination of several self-supervised tasks that help increase the abstractivity of the generated summaries. Four pre-training tasks are combined: sentence permutation, text infilling, Gap Sentence Generation, and Next Segment Generation. Catalan newspapers, the Catalan subset of the OSCAR corpus, and Wikipedia articles in Catalan were used for pre-training the model (9.3 GB of raw text, 2.5 million documents).

NASCA is fine-tuned for the summarization task on 636,596 (document, summary) pairs from the Dataset for Automatic summarization of Catalan and Spanish newspaper Articles (DACSA).
+
+More details about the pre-training and fine-tuning datasets and the models will be available soon:
+
+@unpublished{DACSA,
+  author = "Vicent Ahuir and Lluís-F. Hurtado and José Ángel González and Encarna Segarra",
+  title = "DACSA: a Dataset for Automatic summarization of Catalan and Spanish
+           newspaper Articles",
+  note = "Unsubmitted",
+}
+
+@unpublished{NAS,
+  author = "Vicent Ahuir and Lluís-F. Hurtado and José Ángel González and Encarna Segarra",
+  title = "NAS CA and NAS ES: Two monolingual pre-trained models for
+           abstractive summarization in Catalan and Spanish",
+  note = "Submitted to the Special Issue on Current Approaches and Applications in Natural Language Processing (Applied Sciences)",
+}
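The four pre-training tasks named in the README can be pictured as ways of turning a raw document into a (source, target) pair. The sketch below is only an illustration under simplifying assumptions, not the authors' implementation: the `<mask>` token, the naive sentence splitter, the fixed span and gap positions, and all function names are hypothetical, and the README does not say how NASCA samples spans, selects gap sentences, or combines the tasks.

```python
import random

MASK = "<mask>"  # hypothetical placeholder; the real model's mask token may differ


def split_sentences(text):
    """Naive splitter on '.', for illustration only."""
    parts = [s.strip() for s in text.split(".")]
    return [s + "." for s in parts if s]


def sentence_permutation(sentences, rng):
    """Source: sentences in shuffled order. Target: the original order."""
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    return " ".join(shuffled), " ".join(sentences)


def text_infilling(tokens, start, length):
    """Source: a token span replaced by one mask. Target: the original text."""
    corrupted = tokens[:start] + [MASK] + tokens[start + length:]
    return " ".join(corrupted), " ".join(tokens)


def gap_sentence_generation(sentences, gap_index):
    """Source: one sentence masked out. Target: the masked sentence."""
    source = [MASK if i == gap_index else s for i, s in enumerate(sentences)]
    return " ".join(source), sentences[gap_index]


def next_segment_generation(sentences, split_at):
    """Source: the first segment. Target: the segment that follows it."""
    return " ".join(sentences[:split_at]), " ".join(sentences[split_at:])


if __name__ == "__main__":
    doc = "El govern aprova la llei. El parlament la debat. La premsa en parla."
    sents = split_sentences(doc)
    rng = random.Random(0)  # fixed seed so the shuffle is repeatable
    print(sentence_permutation(sents, rng))
    print(text_infilling(doc.split(), start=1, length=2))
    print(gap_sentence_generation(sents, gap_index=1))
    print(next_segment_generation(sents, split_at=2))
```

In the last two tasks the target is shorter than the source and must be generated rather than copied in place, which is one plausible reading of why the README links them to more abstractive summaries.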