99eren99 committed on
Commit 443fc2f · verified · 1 Parent(s): e555415

Update README.md

Files changed (1)
  1. README.md +55 -3
README.md CHANGED
@@ -1,3 +1,55 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - Turkish
5
+ - turkish
6
+ language:
7
+ - tr
8
+ base_model:
9
+ - answerdotai/ModernBERT-base
10
+ pipeline_tag: fill-mask
11
+ ---
12
+
13
+ # Long-Context Pretrained Text Encoder for the Turkish Language
14
+
15
+ <img src="https://huggingface.co/99eren99/ModernBERT-base-Turkish-uncased-mlm/resolve/main/assets/cover.jpg"
16
+ alt="drawing" width="400"/>
17
+
18
+ This is a base-size, uncased ModernBERT model for Turkish. Because the model is uncased, it does not distinguish between "turkish" and "Turkish".
19
+
20
+ #### ⚠ Uncased use requires manual lowercase conversion
21
+
22
+
23
+ **Don't** use the `do_lower_case = True` flag with the tokenizer. Instead, convert your text to lowercase yourself before tokenizing:
24
+ ```python
25
+ text.replace("I", "ı").lower()
26
+ ```
27
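+ This is due to a [known issue](https://github.com/huggingface/transformers/issues/6680) with the tokenizer. The explicit replacement matters because the Turkish lowercase of `I` is the dotless `ı`, whereas Python's `str.lower()` maps `I` to `i`.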
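
The lowercasing rule above can be wrapped in a small helper. This is only a sketch: the name `turkish_lower` is hypothetical and not part of the model repository, and it applies exactly the one-line conversion shown above, nothing more.

```python
def turkish_lower(text: str) -> str:
    """Lowercase text for the uncased model.

    Maps the capital dotless 'I' to 'ı' first, then applies the
    standard str.lower() to everything else.
    """
    return text.replace("I", "ı").lower()


# Plain lowercasing turns 'I' into a dotted 'i', which is a different
# letter in Turkish; the helper keeps the dotless form.
plain = "ILIK".lower()           # "ilik"
converted = turkish_lower("ILIK")  # "ılık"
print(plain, converted)
```

Apply the helper to raw text before passing it to the tokenizer, and keep `do_lower_case=False` when loading the tokenizer.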
28
+
29
+ Be aware that this model may exhibit biased predictions as it was trained primarily on crawled data, which inherently can contain various biases.
30
+
31
+
32
+ ## Example Usage
33
+ ```python
34
+ from transformers import AutoTokenizer, AutoModelForMaskedLM
35
+
36
+ tokenizer = AutoTokenizer.from_pretrained(
37
+ "99eren99/ModernBERT-base-Turkish-uncased-mlm", do_lower_case=False
38
+ )
39
+ #tokenizer.truncation_side = "right"
40
+
41
+ model = AutoModelForMaskedLM.from_pretrained(
42
+ "99eren99/ModernBERT-base-Turkish-uncased-mlm",
43
+ )
44
+
45
+ model.eval()
46
+
47
+ # To run on GPU in half precision (requires `import torch`):
48
+ # model.to("cuda", dtype=torch.float16)
49
+
50
+ ```
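
Once the tokenizer and model are loaded, a masked position can be filled as in the sketch below. It assumes `torch` is installed; the helper names `build_masked_input` and `predict_mask` are hypothetical and not part of the repository. The context is lowercased before the mask token is inserted, because lowercasing the full string afterwards would also lowercase the mask token.

```python
MODEL_ID = "99eren99/ModernBERT-base-Turkish-uncased-mlm"


def build_masked_input(left: str, right: str, mask_token: str) -> str:
    """Lowercase the context on both sides, then insert the mask token unchanged."""
    def lower(s: str) -> str:
        return s.replace("I", "ı").lower()

    return f"{lower(left)}{mask_token}{lower(right)}"


def predict_mask(left: str, right: str, top_k: int = 5) -> list:
    """Return the top_k candidate tokens for the blank between left and right."""
    # Imported lazily so build_masked_input works without these packages.
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, do_lower_case=False)
    model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)
    model.eval()

    text = build_masked_input(left, right, tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Locate the (first) mask position and rank the vocabulary at that position.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_pos = is_mask.nonzero(as_tuple=True)[0][0]
    top_ids = logits[0, mask_pos].topk(top_k).indices.tolist()
    return [tokenizer.decode([token_id]).strip() for token_id in top_ids]
```

For example, `predict_mask("Türkiye'nin başkenti ", " şehridir.")` returns a list of candidate strings for the blank; the actual predictions depend on the model and are not shown here.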
51
+
52
+ ## Evaluations
53
+ - Mask prediction top-1 accuracies (the evaluation scripts are in the `./assets` folder):
54
+ <img src="https://huggingface.co/99eren99/ModernBERT-base-Turkish-uncased-mlm/resolve/main/assets/eval_results.jpg" alt="drawing">
55
+