Update README.md
README.md
CHANGED
@@ -1,3 +1,55 @@
|
|
1 |
-
---
|
2 |
-
license: apache-2.0
|
3 |
-
|
1 |
+
---
|
2 |
+
license: apache-2.0
|
3 |
+
tags:
|
4 |
+
- Turkish
|
5 |
+
- turkish
|
6 |
+
language:
|
7 |
+
- tr
|
8 |
+
base_model:
|
9 |
+
- answerdotai/ModernBERT-base
|
10 |
+
pipeline_tag: fill-mask
|
11 |
+
---
|
12 |
+
|
13 |
+
# Long-Context Pretrained Text Encoder for the Turkish Language
|
14 |
+
|
15 |
+
<img src="https://huggingface.co/99eren99/ModernBERT-base-Turkish-uncased-mlm/resolve/main/assets/cover.jpg"
|
16 |
+
alt="drawing" width="400"/>
|
17 |
+
|
18 |
+
This is an uncased Turkish ModernBERT-base model. Because it is uncased, it does not distinguish between `turkish` and `Turkish`.
|
19 |
+
|
20 |
+
#### ⚠ Uncased use requires manual lowercase conversion
|
21 |
+
|
22 |
+
|
23 |
+
**Don't** pass `do_lower_case=True` to the tokenizer. Instead, convert your text to lowercase yourself before tokenizing, as follows:
|
24 |
+
```python
|
25 |
+
text.replace("I", "ı").lower()
|
26 |
+
```
|
27 |
+
This is due to a [known issue](https://github.com/huggingface/transformers/issues/6680) with the tokenizer.
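
As a minimal sketch, the conversion above can be wrapped in a small helper that is applied to every string before it reaches the tokenizer. The function name `turkish_lower` is hypothetical and not part of this repository. The helper does exactly what the one-liner does and nothing more: Python lowercases `İ` to `i` followed by a combining dot (U+0307), and that behaviour is left untouched here because this card does not say how the training data handled it.

```python
def turkish_lower(text: str) -> str:
    """Lowercase text the way this model card recommends.

    Hypothetical helper name. Python's str.lower() maps "I" to "i",
    but Turkish expects the dotless "ı", so "I" is replaced first.
    """
    return text.replace("I", "ı").lower()


# Apply the helper before tokenizing, never after.
print(turkish_lower("IŞIK Turkish"))
```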
|
28 |
+
|
29 |
+
Be aware that this model may produce biased predictions: it was trained primarily on crawled web data, which can contain various biases.
|
30 |
+
|
31 |
+
|
32 |
+
## Example Usage
|
33 |
+
```python
|
34 |
+
from transformers import AutoTokenizer, AutoModelForMaskedLM
|
35 |
+
|
36 |
+
tokenizer = AutoTokenizer.from_pretrained(
|
37 |
+
"99eren99/ModernBERT-base-Turkish-uncased-mlm", do_lower_case=False
|
38 |
+
)
|
39 |
+
# tokenizer.truncation_side = "right"
|
40 |
+
|
41 |
+
model = AutoModelForMaskedLM.from_pretrained(
|
42 |
+
"99eren99/ModernBERT-base-Turkish-uncased-mlm",
|
43 |
+
)
|
44 |
+
|
45 |
+
model.eval()
|
46 |
+
|
47 |
+
# To move the model to a GPU in half precision (needs `import torch` first):
|
48 |
+
# model.to("cuda", dtype=torch.float16)
|
49 |
+
|
50 |
+
```
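
The snippet above only loads the model. A minimal sketch of an actual mask prediction might look like the following, assuming the checkpoint loads as shown and that the tokenizer exposes `mask_token` and `mask_token_id`. The names `build_masked_input` and `predict_mask`, and the `{mask}` placeholder, are hypothetical and introduced only for illustration. The text is lowercased before the mask token is inserted, so the mask token itself is never altered by lowercasing.

```python
MODEL_ID = "99eren99/ModernBERT-base-Turkish-uncased-mlm"
PLACEHOLDER = "{mask}"  # hypothetical marker, replaced by the real mask token


def build_masked_input(template: str, mask_token: str) -> str:
    """Lowercase the text Turkish-style, then insert the mask token."""
    lowered = template.replace("I", "ı").lower()
    return lowered.replace(PLACEHOLDER, mask_token)


def predict_mask(template: str, top_k: int = 5) -> list:
    """Return the top_k predicted tokens for the first masked position."""
    # Imported here so the pure-string helper above works without torch.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, do_lower_case=False)
    model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)
    model.eval()

    text = build_masked_input(template, tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        logits = model(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
        ).logits

    # Locate the masked position(s) in the single input sequence.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_positions = is_mask.nonzero(as_tuple=True)[0]
    if mask_positions.numel() == 0:
        raise ValueError("The input contains no mask token.")

    top_ids = logits[0, mask_positions[0]].topk(top_k).indices.tolist()
    return [tokenizer.decode([token_id]) for token_id in top_ids]
```

Calling `predict_mask("Türkiye'nin başkenti {mask}.")` downloads the checkpoint on first use and returns a list of candidate tokens. The predictions depend on the model, so none are shown here.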
|
51 |
+
|
52 |
+
## Evaluations
|
53 |
+
- Mask prediction top-1 accuracy (the evaluation scripts are in the "./assets" folder):
|
54 |
+
<img src="https://huggingface.co/99eren99/ModernBERT-base-Turkish-uncased-mlm/resolve/main/assets/eval_results.jpg" alt="Mask prediction top-1 accuracy results">
|
55 |
+
|