Update README.md
README.md CHANGED

@@ -8,12 +8,12 @@ datasets:
 - oscar
 ---
 
-# RoBERTa Turkish medium Character-level
+# RoBERTa Turkish medium Character-level (uncased)
 
 Pretrained model on the Turkish language using a masked language modeling (MLM) objective. The model is uncased.
 The pretraining corpus is OSCAR's Turkish split, further filtered and cleaned.
 
-Model architecture is similar to bert-medium (8 layers, 8 heads, and 512 hidden size). The tokenization algorithm is character-level, which means that text is split by individual characters. Vocabulary size is
+Model architecture is similar to bert-medium (8 layers, 8 heads, and 512 hidden size). The tokenization algorithm is character-level, which means that text is split by individual characters. Vocabulary size is 384.
 
 ## Note that this model does not include a tokenizer file, because it uses ByT5Tokenizer. The following code can be used for model loading and tokenization; the example max length (1024) can be changed:
 
 ```
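The architecture line in the diff pins down the model's shape. For reference, a minimal sketch of a matching Hugging Face `RobertaConfig`; the `intermediate_size` and `max_position_embeddings` values are assumptions (the usual 4x-hidden feed-forward ratio, and the card's 1024 example max length plus RoBERTa's 2-position offset), not values stated in the card:

```python
from transformers import RobertaConfig

# Shape stated in the card: 8 layers, 8 heads, 512 hidden, vocab 384.
config = RobertaConfig(
    vocab_size=384,                # per the card
    hidden_size=512,
    num_hidden_layers=8,
    num_attention_heads=8,
    intermediate_size=2048,        # assumption: 4x hidden, the usual BERT ratio
    max_position_embeddings=1026,  # assumption: 1024 + RoBERTa's 2-position offset
)
```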
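The snippet that the last paragraph refers to is cut off above (only the opening fence survives). A minimal sketch of what loading with `ByT5Tokenizer` could look like, assuming the placeholder `<model-id>` for the repo id, which this diff does not show:

```python
from transformers import AutoModel, ByT5Tokenizer

# Hypothetical repo id: the actual model id is not part of this diff.
model = AutoModel.from_pretrained("<model-id>")

# The card says the model ships without a tokenizer file and uses
# ByT5Tokenizer, a byte/character-level tokenizer with no vocabulary file.
tokenizer = ByT5Tokenizer()
tokenizer.model_max_length = 1024  # the example max length; change as needed

# The model is uncased, so lowercase input is the natural choice.
inputs = tokenizer("merhaba dünya", return_tensors="pt", truncation=True)
outputs = model(**inputs)
```

ByT5Tokenizer's default vocabulary (256 byte values plus 3 special tokens and 125 extra ids) comes to 384, which lines up with the vocabulary size this commit fills in.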