Update README.md
README.md
CHANGED
@@ -284,6 +284,8 @@ Feel free to click the expand button below to see the full list of sources.
 | The Swedish Culturomics Gigaword Corpus | sv | Rødven-Eide, 2016 |
 | Corpus of laws and legal acts of Ukraine | uk | [Link](https://lang.org.ua/en/corpora/#anchor7) |
 
+To consult the data summary document with the respective licences, please send an e-mail to ipr@bsc.es.
+
 <details>
 <summary>References</summary>
 
@@ -477,7 +479,7 @@ especially if the content originates from less-regulated sources or user-generat
 
 This dataset is constituted by combining several sources, whose acquisition methods can be classified into three groups:
 - Web-sourced datasets with some preprocessing available under permissive license (p.e. Common Crawl).
-- Domain-specific or language-specific raw crawls (p.e. Spanish Crawling).
+- Domain-specific or language-specific raw crawls, always respecting robots.txt (p.e. Spanish Crawling).
 - Manually curated data obtained through collaborators, data providers (by means of legal assignment agreements) or open source projects
 (p.e. CATalog).
 
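The second hunk adds the statement that raw crawls always respect robots.txt. The README does not describe how the crawler enforces this, so the following is only a minimal sketch of what such a check could look like, using Python's standard `urllib.robotparser`. The helper names (`build_parser`, `allowed_urls`), the user agent string `example-crawler`, the `example.org` URLs, and the sample rules are all hypothetical and not taken from the project.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from the text of a robots.txt file (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Keep only the URLs that the robots.txt rules allow for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Illustrative rules only; a real crawler would fetch each site's own robots.txt.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

parser = build_parser(EXAMPLE_ROBOTS)
urls = [
    "https://example.org/news/1.html",
    "https://example.org/private/2.html",
]
print(allowed_urls(parser, "example-crawler", urls))
# → ['https://example.org/news/1.html']
```

In a live crawl, `RobotFileParser.set_url()` followed by `read()` would load the rules from the target site instead of a hard-coded string.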