hyunwoongko
committed on
Commit 94bc87d • 1 Parent(s): 451a670
Update README.md
README.md
CHANGED
@@ -31,7 +31,7 @@ dimensions of each head. The model is trained with a tokenization vocabulary of
 
 ## Training data
 
-Polyglot-Ko was trained on 863 GB of Korean language data (1.2TB before processing), a large-scale dataset curated by [TUNiB](https://tunib.ai/). The data collection process has abided by South Korean laws. This dataset was collected for the purpose of training Polyglot-Ko models, so it will not be released for public use.
+Polyglot-Ko-3.8B was trained on 863 GB of Korean language data (1.2TB before processing), a large-scale dataset curated by [TUNiB](https://tunib.ai/). The data collection process has abided by South Korean laws. This dataset was collected for the purpose of training Polyglot-Ko models, so it will not be released for public use.
 
 | Source |Size (GB) | Link |
 |-------------------------------------|---------|------------------------------------------|