myrkur/Persian-ModernBert-base

myrkur committed on Jan 4

Commit 31a6a90 · verified · 1 parent: 82429cc

Update README.md

Files changed (1)

README.md +1 -1


@@ -69,7 +69,7 @@ The model was fine-tuned on a custom dataset with **2.5 billion Persian tokens**
 - **Optimizer**: AdamW
 - **Learning Rate**: 6e-4
 - **Batch Size**: 32
-- **Epochs**: 3
+- **Epochs**: 2
 - **Scheduler**: Inverse square root
 - **Precision**: bfloat16 for faster computation and lower memory usage
 - **Masking Strategy**: Whole Word Masking (WWM) with a probability of 30%
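
For reviewers unfamiliar with the scheduler named in this hunk, here is a minimal sketch of one common form of an inverse square root schedule, using the README's learning rate of 6e-4 as the peak. The function name `inverse_sqrt_lr` and the `warmup_steps` value are assumptions for illustration; the README does not state a warmup length, and this is not necessarily the author's training code.

```python
import math


def inverse_sqrt_lr(step: int, peak_lr: float = 6e-4, warmup_steps: int = 1000) -> float:
    """Linear warmup to peak_lr, then decay proportional to 1/sqrt(step).

    warmup_steps=1000 is a hypothetical value; the README does not give one.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    if step <= warmup_steps:
        # Warmup phase: scale linearly from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: the rate halves each time the step count quadruples.
    return peak_lr * math.sqrt(warmup_steps / step)


if __name__ == "__main__":
    for s in (1, 500, 1000, 4000, 16000):
        print(s, inverse_sqrt_lr(s))
```

Under this form the decay depends only on the step count, not on the total number of epochs, so changing the documented epoch count from 3 to 2 would not alter the shape of the curve, only where training stops along it.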