Update README.md
README.md
@@ -69,7 +69,7 @@ The model was fine-tuned on a custom dataset with **2.5 billion Persian tokens**
 - **Optimizer**: AdamW
 - **Learning Rate**: 6e-4
 - **Batch Size**: 32
-- **Epochs**:
+- **Epochs**: 2
 - **Scheduler**: Inverse square root
 - **Precision**: bfloat16 for faster computation and lower memory usage
 - **Masking Strategy**: Whole Word Masking (WWM) with a probability of 30%
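
For readers unfamiliar with the "Inverse square root" scheduler named in the list above, here is a minimal sketch of one common formulation: linear warmup to the peak learning rate, then decay proportional to `1/sqrt(step)`. The peak value `6e-4` comes from the README; the warmup length and the function name `inverse_sqrt_lr` are assumptions made for illustration only, since the README does not state a warmup length or how the schedule was implemented.

```python
import math

PEAK_LR = 6e-4        # from the README's "Learning Rate" entry
WARMUP_STEPS = 1000   # hypothetical: the README does not give a warmup length


def inverse_sqrt_lr(step: int,
                    peak_lr: float = PEAK_LR,
                    warmup_steps: int = WARMUP_STEPS) -> float:
    """Return the learning rate at a given optimizer step.

    Linear warmup from 0 to peak_lr over warmup_steps, then decay
    proportional to 1/sqrt(step), continuous at step == warmup_steps.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if warmup_steps <= 0:
        raise ValueError("warmup_steps must be positive")
    if step < warmup_steps:
        # Warmup phase: scale linearly towards the peak.
        return peak_lr * step / warmup_steps
    # Decay phase: peak_lr * sqrt(warmup_steps / step).
    return peak_lr * math.sqrt(warmup_steps / step)


if __name__ == "__main__":
    for s in (0, 500, 1000, 4000, 16000):
        print(f"step {s:>6}: lr = {inverse_sqrt_lr(s):.2e}")
```

Under these assumptions the rate halves each time the step count quadruples after warmup. The actual training run may have used a different warmup length or a different variant of the schedule.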