antortl committed on
Commit be939dc
1 Parent(s): b293815

Update README.md

Files changed (1)
  1. README.md +7 -14
README.md CHANGED
@@ -38,22 +38,15 @@ Abstraction Level: The model tends to be more extractive than abstractive in its
  ## Training and evaluation data
 
 
- News Articles Dataset:
+ Dataset:
 
- Source: CNN/Daily Mail dataset (version 3.0.0)
- Size: Approximately 200,000 articles
- Time Range: 2007-2021
+ Source: PARANMT-50M
+ Size: Approximately 50M
+ Time Range: 2007-2017
  Language: English
- Content: Wide range of topics including politics, sports, entertainment, and world events
-
-
- Academic Articles Dataset:
-
- Source: arXiv and PubMed Open Access Subset
- Size: Approximately 150,000 articles
- Time Range: 2010-2022
- Language: English
- Content: Research papers from various scientific fields including physics, mathematics, computer science, and biomedical sciences
+ Content: more than 50 million English-English
+ sentential paraphrase pairs
+ https://arxiv.org/pdf/1711.05732v2
 
 
  Pre-processing Steps: