myrkur committed · verified · Commit 87c5c2d · Parent(s): 31a6a90

Update README.md

Files changed (1): README.md (+49, −2)

README.md CHANGED
@@ -32,14 +32,60 @@ ModernBERT is a Persian-language Masked Language Model (MLM) fine-tuned with a c

## Usage

- ### Load the Model and Tokenizer

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load custom tokenizer and fine-tuned model
tokenizer = AutoTokenizer.from_pretrained("myrkur/Persian-ModernBert-base")
- model = AutoModelForMaskedLM.from_pretrained("myrkur/Persian-ModernBert-base")
```

### Example: Masked Token Prediction

@@ -47,6 +93,7 @@ model = AutoModelForMaskedLM.from_pretrained("myrkur/Persian-ModernBert-base")

```python
text = "حال و [MASK] مردم خوب است."
inputs = tokenizer(text, return_tensors="pt")
token_logits = model(**inputs).logits

# Find the [MASK] token and decode top predictions
## Usage

+ You can use these models directly with the `transformers` library. Until the next `transformers` release, doing so requires installing transformers from main:
+
+ ```sh
+ pip install git+https://github.com/huggingface/transformers.git
+ ```
+
+ Since ModernBERT is a Masked Language Model (MLM), you can use the `fill-mask` pipeline or load it via `AutoModelForMaskedLM`. To use ModernBERT for downstream tasks like classification, retrieval, or QA, fine-tune it following standard BERT fine-tuning recipes.
+
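The `fill-mask` pipeline mentioned above wraps tokenization, inference, and top-k decoding in a single call (a minimal sketch; it downloads the model from the Hub on first use):

```python
from transformers import pipeline

# The fill-mask pipeline handles tokenization, inference, and decoding in one step
fill_mask = pipeline("fill-mask", model="myrkur/Persian-ModernBert-base")

predictions = fill_mask("حال و [MASK] مردم خوب است.")
for pred in predictions:
    print(pred["token_str"], round(pred["score"], 3))
```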
+ **⚠️ If your GPU supports it, we recommend using ModernBERT with Flash Attention 2 to reach the highest efficiency. To do so, install Flash Attention as follows, then use the model as normal:**
+
+ ```bash
+ pip install flash-attn
+ ```
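For the downstream fine-tuning mentioned above, a common starting point is to load the checkpoint with a task head (a sketch, assuming a binary classification task; `num_labels=2` is illustrative, and the classification head is newly initialized, so it must be trained before use):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("myrkur/Persian-ModernBert-base")
# The classification head is randomly initialized; fine-tune it with
# Trainer or a custom training loop before using it for predictions.
model = AutoModelForSequenceClassification.from_pretrained(
    "myrkur/Persian-ModernBert-base", num_labels=2
)
```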
+
+ ### Inference on CPU
+
+ #### Load the Model and Tokenizer
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+ # Load custom tokenizer and fine-tuned model
+ tokenizer = AutoTokenizer.from_pretrained("myrkur/Persian-ModernBert-base")
+ model = AutoModelForMaskedLM.from_pretrained(
+     "myrkur/Persian-ModernBert-base",
+     attn_implementation="eager",
+     torch_dtype=torch.bfloat16,
+     device_map="cpu",
+ )
+ ```
+
+ #### Example: Masked Token Prediction
+
+ ```python
+ text = "حال و [MASK] مردم خوب است."
+ inputs = tokenizer(text, return_tensors="pt")
+ inputs = {k: v.cpu() for k, v in inputs.items()}
+ token_logits = model(**inputs).logits
+
+ # Find the [MASK] token and decode top predictions
+ mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
+ mask_token_logits = token_logits[0, mask_token_index, :]
+ top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()
+
+ for token in top_5_tokens:
+     print(f"Prediction: {text.replace(tokenizer.mask_token, tokenizer.decode([token]))}")
+ ```
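The locate-and-top-k recipe above can be checked on synthetic logits without downloading the model (a minimal sketch; the toy vocabulary size, token ids, and tensor shapes are illustrative only):

```python
import torch

# Toy stand-ins for the real tokenizer and model outputs (illustrative only)
vocab_size = 10
mask_token_id = 4
input_ids = torch.tensor([[2, 7, mask_token_id, 3, 1]])  # [MASK] at position 2
torch.manual_seed(0)
token_logits = torch.randn(1, 5, vocab_size)  # (batch, seq_len, vocab)

# Same recipe as above: locate the [MASK] position, rank the vocabulary by logit
mask_token_index = torch.where(input_ids == mask_token_id)[1]
mask_token_logits = token_logits[0, mask_token_index, :]
top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()
print(top_5_tokens)  # five distinct vocabulary ids, highest logit first
```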
+ ### Inference on GPU
+
+ #### Load the Model and Tokenizer

```python
+ import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load custom tokenizer and fine-tuned model
tokenizer = AutoTokenizer.from_pretrained("myrkur/Persian-ModernBert-base")
+ model = AutoModelForMaskedLM.from_pretrained(
+     "myrkur/Persian-ModernBert-base",
+     attn_implementation="flash_attention_2",
+     torch_dtype=torch.bfloat16,
+     device_map="cuda",
+ )
```

#### Example: Masked Token Prediction

```python
text = "حال و [MASK] مردم خوب است."
inputs = tokenizer(text, return_tensors="pt")
+ inputs = {k: v.cuda() for k, v in inputs.items()}
token_logits = model(**inputs).logits

# Find the [MASK] token and decode top predictions