## Training Details

RoBERTaLexPT-base is pretrained on two corpora (see the loading sketch below):
- [LegalPT](https://huggingface.co/datasets/eduagarcia/LegalPT_dedup), a Portuguese legal corpus built by aggregating diverse sources, totaling up to 125 GiB of data.
- [CrawlPT](https://huggingface.co/datasets/eduagarcia/CrawlPT_dedup), a composition of three general Portuguese corpora: [brWaC](https://huggingface.co/datasets/brwac), the [CC100 PT subset](https://huggingface.co/datasets/eduagarcia/cc100-pt), and the [OSCAR-2301 PT subset](https://huggingface.co/datasets/eduagarcia/OSCAR-2301-pt_dedup).
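
Both corpora are hosted on the Hugging Face Hub and can be inspected with the `datasets` library. The snippet below is a minimal sketch: the split name and the absence of an explicit configuration name are assumptions, so check each dataset card for the actual layout.

```python
# Minimal sketch: stream the two pretraining corpora from the Hugging Face Hub.
# Assumptions: a "train" split exists and no configuration name is required;
# see each dataset card for the actual configurations and field names.
from datasets import load_dataset

legal_pt = load_dataset("eduagarcia/LegalPT_dedup", split="train", streaming=True)
crawl_pt = load_dataset("eduagarcia/CrawlPT_dedup", split="train", streaming=True)

# Peek at one document from each corpus without downloading everything.
print(next(iter(legal_pt)))
print(next(iter(crawl_pt)))
```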
#### Training Hyperparameters

The model was pretrained for 62,500 steps with a batch size of 2,048 and a learning rate of 4e-4, with each sequence containing a maximum of 512 tokens.
Weights were randomly initialized.
We employed the masked language modeling objective, with 15% of the input tokens randomly masked.
Optimization used the AdamW optimizer with a linear warmup followed by a linear decay learning rate schedule.
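
For illustration, the sketch below expresses these hyperparameters with the Hugging Face `transformers` Trainer. It is not the original training script: the tokenizer repository id, the warmup length, the per-device batch/accumulation split, and the toy dataset are assumptions made for the example.

```python
# Minimal sketch of the pretraining recipe described above (assumptions noted inline):
# 62,500 steps, effective batch size 2,048, lr 4e-4, 512-token sequences,
# 15% MLM masking, random initialization, AdamW with linear warmup + linear decay.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    RobertaConfig,
    RobertaForMaskedLM,
    Trainer,
    TrainingArguments,
)

# Assumed repository id for the model's tokenizer.
tokenizer = AutoTokenizer.from_pretrained("eduagarcia/RoBERTaLexPT-base")

# Randomly initialized RoBERTa-base-sized model (no pretrained checkpoint is loaded).
config = RobertaConfig(vocab_size=tokenizer.vocab_size, max_position_embeddings=514)
model = RobertaForMaskedLM(config)

# Masked language modeling with 15% of the input tokens masked.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

# Toy stand-in for the tokenized pretraining corpus of 512-token sequences.
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

toy_corpus = Dataset.from_dict({"text": ["Exemplo de texto em português."] * 8})
train_dataset = toy_corpus.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="robertalexpt-pretraining",
    max_steps=62_500,
    learning_rate=4e-4,
    per_device_train_batch_size=64,   # 64 x 32 accumulation = 2,048 sequences per step
    gradient_accumulation_steps=32,   # adjust both to match your hardware
    lr_scheduler_type="linear",       # linear decay after the warmup phase
    warmup_steps=6_250,               # assumption: the warmup length is not stated in this card
    optim="adamw_torch",              # AdamW optimizer
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset, data_collator=collator)
# trainer.train()  # uncomment to launch pretraining on a real corpus
```

At 62,500 steps with 2,048 sequences of up to 512 tokens each, the model sees roughly 65 billion tokens during pretraining (62,500 × 2,048 × 512 ≈ 6.6 × 10^10), assuming full-length sequences.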
For other parameters, we adopted the standard [RoBERTa-base hyperparameters](https://huggingface.co/FacebookAI/roberta-base):