myrkur committed · verified · Commit 87c5c2d · Parent(s): 31a6a90

Update README.md

Files changed (1): README.md (+49, −2)

README.md CHANGED
@@ -32,14 +32,60 @@ ModernBERT is a Persian-language Masked Language Model (MLM) fine-tuned with a c

## Usage

- ### Load the Model and Tokenizer

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load custom tokenizer and fine-tuned model
tokenizer = AutoTokenizer.from_pretrained("myrkur/Persian-ModernBert-base")
- model = AutoModelForMaskedLM.from_pretrained("myrkur/Persian-ModernBert-base")
```

### Example: Masked Token Prediction

@@ -47,6 +93,7 @@ model = AutoModelForMaskedLM.from_pretrained("myrkur/Persian-ModernBert-base")

```python
text = "حال و [MASK] مردم خوب است."
inputs = tokenizer(text, return_tensors="pt")
token_logits = model(**inputs).logits

# Find the [MASK] token and decode top predictions
## Usage

+ You can use these models directly with the `transformers` library. Until the next `transformers` release, doing so requires installing transformers from main:
+
+ ```sh
+ pip install git+https://github.com/huggingface/transformers.git
+ ```
+
+ Since ModernBERT is a Masked Language Model (MLM), you can use the `fill-mask` pipeline or load it via `AutoModelForMaskedLM`. To use ModernBERT for downstream tasks like classification, retrieval, or QA, fine-tune it following standard BERT fine-tuning recipes.
+
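The `fill-mask` pipeline mentioned above wraps tokenization, inference, and top-k decoding in a single call (a minimal sketch; it downloads the model from the Hub on first use):

```python
from transformers import pipeline

# The fill-mask pipeline handles tokenization, inference, and decoding in one step
fill_mask = pipeline("fill-mask", model="myrkur/Persian-ModernBert-base")

predictions = fill_mask("حال و [MASK] مردم خوب است.")
for pred in predictions:
    print(pred["token_str"], round(pred["score"], 3))
```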
+ **⚠️ If your GPU supports it, we recommend using ModernBERT with Flash Attention 2 to reach the highest efficiency. To do so, install Flash Attention as follows, then use the model as normal:**
+
+ ```bash
+ pip install flash-attn
+ ```
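For the downstream fine-tuning mentioned above, a common starting point is to load the checkpoint with a task head (a sketch, assuming a binary classification task; `num_labels=2` is illustrative, and the classification head is newly initialized, so it must be trained before use):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("myrkur/Persian-ModernBert-base")
# The classification head is randomly initialized; fine-tune it with
# Trainer or a custom training loop before using it for predictions.
model = AutoModelForSequenceClassification.from_pretrained(
    "myrkur/Persian-ModernBert-base", num_labels=2
)
```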
+
+ ### Inference on CPU
+
+ #### Load the Model and Tokenizer
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+ # Load custom tokenizer and fine-tuned model
+ tokenizer = AutoTokenizer.from_pretrained("myrkur/Persian-ModernBert-base")
+ model = AutoModelForMaskedLM.from_pretrained(
+     "myrkur/Persian-ModernBert-base",
+     attn_implementation="eager",
+     torch_dtype=torch.bfloat16,
+     device_map="cpu",
+ )
+ ```
+
+ #### Example: Masked Token Prediction
+
+ ```python
+ text = "حال و [MASK] مردم خوب است."
+ inputs = tokenizer(text, return_tensors="pt")
+ inputs = {k: v.cpu() for k, v in inputs.items()}
+ token_logits = model(**inputs).logits
+
+ # Find the [MASK] token and decode top predictions
+ mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
+ mask_token_logits = token_logits[0, mask_token_index, :]
+ top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()
+
+ for token in top_5_tokens:
+     print(f"Prediction: {text.replace(tokenizer.mask_token, tokenizer.decode([token]))}")
+ ```
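The locate-and-top-k recipe above can be checked on synthetic logits without downloading the model (a minimal sketch; the toy vocabulary size, token ids, and tensor shapes are illustrative only):

```python
import torch

# Toy stand-ins for the real tokenizer and model outputs (illustrative only)
vocab_size = 10
mask_token_id = 4
input_ids = torch.tensor([[2, 7, mask_token_id, 3, 1]])  # [MASK] at position 2
torch.manual_seed(0)
token_logits = torch.randn(1, 5, vocab_size)  # (batch, seq_len, vocab)

# Same recipe as above: locate the [MASK] position, rank the vocabulary by logit
mask_token_index = torch.where(input_ids == mask_token_id)[1]
mask_token_logits = token_logits[0, mask_token_index, :]
top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()
print(top_5_tokens)  # five distinct vocabulary ids, highest logit first
```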
+ ### Inference on GPU
+
+ #### Load the Model and Tokenizer

```python
+ import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load custom tokenizer and fine-tuned model
tokenizer = AutoTokenizer.from_pretrained("myrkur/Persian-ModernBert-base")
+ model = AutoModelForMaskedLM.from_pretrained(
+     "myrkur/Persian-ModernBert-base",
+     attn_implementation="flash_attention_2",
+     torch_dtype=torch.bfloat16,
+     device_map="cuda",
+ )
```

#### Example: Masked Token Prediction

```python
text = "حال و [MASK] مردم خوب است."
inputs = tokenizer(text, return_tensors="pt")
+ inputs = {k: v.cuda() for k, v in inputs.items()}
token_logits = model(**inputs).logits

# Find the [MASK] token and decode top predictions