Update README.md
README.md
CHANGED
@@ -32,12 +32,12 @@ The pre-training dataset consists of documents from different domains:
 | Legal | OpenLegalData: German cases and laws | 5.4GB | 308,228 | 1B |
 | Medical | Smaller public datasets | 253MB | 179,776 | 50M |
 | Medical | CC medical texts | 3.6GB | 2,000,000 | 682M |
-| Medical |
-| Medical | Pubmed abstracts | 8.5GB | 21,044,382 | 1.7B |
-| Medical | MIMIC III | 2.6GB | 24,221,834 | 695M |
-| Medical | PMC-Patients-ReCDS | 2.1GB | 1,743,344 | 414M |
+| Medical | Medicine Dissertations | 1.4GB | 14,496 | 295M |
+| Medical | Pubmed abstracts (translated) | 8.5GB | 21,044,382 | 1.7B |
+| Medical | MIMIC III (translated) | 2.6GB | 24,221,834 | 695M |
+| Medical | PMC-Patients-ReCDS (translated) | 2.1GB | 1,743,344 | 414M |
 | Literature | German Fiction | 1.1GB | 3,219 | 243M |
-| Literature | English books | 7.1GB | 11,038 | 1.6B |
+| Literature | English books (translated) | 7.1GB | 11,038 | 1.6B |
 | - | Total | 167GB | 116,079,769 | 35.8B |
 
 