jpalomar committed on
Commit 9f03fd4 · verified · 1 Parent(s): 28a1853

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -284,6 +284,8 @@ Feel free to click the expand button below to see the full list of sources.
  | The Swedish Culturomics Gigaword Corpus | sv | Rødven-Eide, 2016 |
  | Corpus of laws and legal acts of Ukraine | uk | [Link](https://lang.org.ua/en/corpora/#anchor7) |
 
+ To consult the data summary document with the respective licences, please send an e-mail to ipr@bsc.es.
+
  <details>
  <summary>References</summary>
 
@@ -477,7 +479,7 @@ especially if the content originates from less-regulated sources or user-generat
 
  This dataset is constituted by combining several sources, whose acquisition methods can be classified into three groups:
  - Web-sourced datasets with some preprocessing available under permissive license (p.e. Common Crawl).
- - Domain-specific or language-specific raw crawls (p.e. Spanish Crawling).
+ - Domain-specific or language-specific raw crawls, always respecting robots.txt (p.e. Spanish Crawling).
  - Manually curated data obtained through collaborators, data providers (by means of legal assignment agreements) or open source projects
  (p.e. CATalog).
 
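
The commit states that raw crawls always respect robots.txt but does not say how this is enforced. As a minimal sketch only, and not the dataset authors' actual crawler, the check could look like the following with Python's standard `urllib.robotparser`. The robots.txt content, the `example-crawler` user agent, the example.org URLs, and the helper names `build_parser` and `allowed_urls` are all hypothetical and chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Keep only the URLs that the robots.txt rules allow for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical crawl candidates on a placeholder domain.
candidates = [
    "https://example.org/articles/1",
    "https://example.org/private/draft",
]

parser = build_parser(SAMPLE_ROBOTS_TXT)
print(allowed_urls(parser, "example-crawler", candidates))
```

In a real crawler the robots.txt text would be fetched per host (for instance with `RobotFileParser.set_url` followed by `read`) before any page on that host is requested; the inline string here only keeps the example self-contained.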