jacobfulano committed
Commit
1dc825e
1 Parent(s): c8eb665

Update README.md

Files changed (1)
  1. README.md +7 -4
README.md CHANGED
@@ -8,7 +8,8 @@ language:
 
  # MosaicBERT-Base model
  MosaicBERT-Base is a new BERT architecture and training recipe optimized for fast pretraining.
- MosaicBERT-Base achieves higher pretraining and finetuning accuracy than [bert-base-uncased](https://huggingface.co/bert-base-uncased).
+ MosaicBERT trains faster and achieves higher pretraining and finetuning accuracy when benchmarked against
+ Hugging Face's [bert-base-uncased](https://huggingface.co/bert-base-uncased).
 
  ### Model Date
 
@@ -16,15 +17,17 @@ March 2023
 
  ## Documentation
  * Blog post
- * Github (mosaicml/examples repo)
+ * [Github (mosaicml/examples/bert repo)](https://github.com/mosaicml/examples/tree/main/examples/bert)
 
  # How to use
 
+ We recommend using the code in the [mosaicml/examples/bert repo](https://github.com/mosaicml/examples/tree/main/examples/bert) for pretraining and finetuning this model.
+
  ```python
  from transformers import AutoModelForMaskedLM
  mlm = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base', use_auth_token=<your token>, trust_remote_code=True)
  ```
- The tokenizer for this model is the Hugging Face `bert-base-uncased` tokenizer.
+ The tokenizer for this model is simply the Hugging Face `bert-base-uncased` tokenizer.
 
  ```python
  from transformers import BertTokenizer
@@ -93,7 +96,7 @@ for both MosaicBERT-Base and the baseline BERT-Base. For all BERT-Base models, w
 
  2. **Higher Masking Ratio for the Masked Language Modeling Objective**: We used the standard Masked Language Modeling (MLM) pretraining objective.
  While the original BERT paper also included a Next Sentence Prediction (NSP) task in the pretraining objective,
- subsequent papers have shown this to be unnecessary [Liu et al. 2019](https://arxiv.org/abs/1907.11692). For Hugging Face BERT-Base, we used the standard 15% masking ratio.
+ subsequent papers have shown this to be unnecessary [Liu et al. 2019](https://arxiv.org/abs/1907.11692).
  However, we found that a 30% masking ratio led to slight accuracy improvements in both pretraining MLM and downstream GLUE performance.
  We therefore included this simple change as part of our MosaicBERT training recipe. Recent studies have also found that this simple
  change can lead to downstream improvements [Wettig et al. 2022](https://arxiv.org/abs/2202.08005).
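The `AutoModelForMaskedLM` snippet in the README above is cut off at the tokenizer import by the diff window. Purely as a hedged, illustrative sketch (not part of this commit), here is how the checkpoint and the `bert-base-uncased` tokenizer named in the diff can be combined for masked-word prediction; the example sentence and the token placeholder are assumptions for illustration.

```python
# Illustrative sketch: load the checkpoint and tokenizer named in the README diff above
# and predict a masked token. The prompt and token placeholder are illustrative only.
import torch
from transformers import AutoModelForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
mlm = AutoModelForMaskedLM.from_pretrained(
    'mosaicml/mosaic-bert-base',
    use_auth_token='<your token>',  # your Hugging Face access token
    trust_remote_code=True,         # the model ships custom modeling code
)

inputs = tokenizer('The capital of France is [MASK].', return_tensors='pt')
with torch.no_grad():
    logits = mlm(**inputs).logits

# Decode the highest-scoring token at the [MASK] position.
mask_pos = (inputs['input_ids'] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_pos].argmax(dim=-1)))
```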
 
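The 30% masking ratio described in the recipe is a training-time setting; the MosaicML examples repo sets it in its own training configuration. As a hedged illustration only, using the stock Hugging Face collator rather than the MosaicML code path, the same ratio looks like this:

```python
# Illustration of a 30% MLM masking ratio with the standard Hugging Face collator.
# Standard BERT pretraining uses mlm_probability=0.15; the MosaicBERT recipe raises it to 0.30.
from transformers import BertTokenizer, DataCollatorForLanguageModeling

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.30,
)

batch = collator([tokenizer('MosaicBERT masks thirty percent of input tokens.')])
print(batch['input_ids'])  # ~30% of non-special tokens masked (mostly replaced by [MASK])
print(batch['labels'])     # original ids at masked positions, -100 elsewhere
```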