---
datasets:
- eduagarcia/LegalPT
- eduagarcia/cc100-pt
- eduagarcia/OSCAR-2301-pt_dedup
- eduagarcia/brwac_dedup
language:
- pt
pipeline_tag: fill-mask
tags:
- legal
model-index:
- name: RoBERTaLexPT-base
  results:
  - task:
      type: token-classification
    dataset:
      type: eduagarcia/portuguese_benchmark
      name: LeNER
      config: LeNER-Br
      split: test
    metrics:
    - type: seqeval
      value: 90.73
      name: Mean F1
      args:
        scheme: IOB2
  - task:
      type: token-classification
    dataset:
      type: eduagarcia/portuguese_benchmark
      name: UlyNER-PL Coarse
      config: UlyssesNER-Br-PL-coarse
      split: test
    metrics:
    - type: seqeval
      value: 88.56
      name: Mean F1
      args:
        scheme: IOB2
  - task:
      type: token-classification
    dataset:
      type: eduagarcia/portuguese_benchmark
      name: UlyNER-PL Fine
      config: UlyssesNER-Br-PL-fine
      split: test
    metrics:
    - type: seqeval
      value: 86.03
      name: Mean F1
      args:
        scheme: IOB2
license: cc-by-4.0
metrics:
- seqeval
---

# RoBERTaLexPT-base

RoBERTaLexPT-base is pretrained on Brazilian Portuguese corpora (the LegalPT, cc100-pt, OSCAR-2301-pt_dedup, and brwac_dedup datasets listed in the metadata above), using the [RoBERTa-base](https://huggingface.co/FacebookAI/roberta-base) architecture and pretraining procedure introduced by [Liu et al. (2019)](https://arxiv.org/abs/1907.11692).

## Model Details

### Model Description

- **Funded by:** [More Information Needed]
- **Language(s) (NLP):** Brazilian Portuguese (pt-BR)
- **License:** [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/deed.en)

### Model Sources

- **Repository:** https://github.com/eduagarcia/roberta-legal-portuguese
- **Paper:** [More Information Needed]

## Training Details

### Training Data

[More Information Needed]

### Training Procedure

The model was pretrained for 62,500 steps with a batch size of 2,048 sequences, each containing at most 512 tokens, which exposes it to approximately 65 billion tokens (62,500 steps × 2,048 sequences × 512 tokens ≈ 65.5 billion). This computational budget is similar to that of [BERTimbau](https://dl.acm.org/doi/abs/10.1007/978-3-030-61377-8_28).

#### Preprocessing

[More Information Needed]

#### Training Hyperparameters

| **Hyperparameter**      | **RoBERTa-base** |
|-------------------------|-----------------:|
| Number of layers        |               12 |
| Hidden size             |              768 |
| FFN inner hidden size   |             3072 |
| Attention heads         |               12 |
| Attention head size     |               64 |
| Dropout                 |              0.1 |
| Attention dropout       |              0.1 |
| Warmup steps            |               6k |
| Peak learning rate      |             4e-4 |
| Batch size              |             2048 |
| Weight decay            |             0.01 |
| Maximum training steps  |            62.5k |
| Learning rate decay     |           Linear |
| AdamW $\epsilon$        |             1e-6 |
| AdamW $\beta_1$         |              0.9 |
| AdamW $\beta_2$         |             0.98 |
| Gradient clipping       |              0.0 |

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Citation

[More Information Needed]
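
## How to Get Started with the Model

Below is a minimal usage sketch with the Hugging Face Transformers `fill-mask` pipeline, matching the `pipeline_tag` declared in the metadata. The hub identifier `eduagarcia/RoBERTaLexPT-base` and the example sentence are illustrative assumptions; adjust them to the actual published checkpoint.

```python
from transformers import pipeline

# Assumed hub identifier for this card; change it if the checkpoint is hosted elsewhere.
MODEL_ID = "eduagarcia/RoBERTaLexPT-base"

# The card declares `pipeline_tag: fill-mask`, so the fill-mask pipeline applies.
unmasker = pipeline("fill-mask", model=MODEL_ID)

# RoBERTa-style tokenizers use "<mask>" as the mask token.
for prediction in unmasker("O juiz determinou a <mask> do processo."):
    print(f"{prediction['token_str']!r}  (score: {prediction['score']:.4f})")
```

For downstream tasks such as the token-classification benchmarks reported in the metadata, the encoder can instead be loaded with `AutoTokenizer` and `AutoModelForTokenClassification` and fine-tuned like any other RoBERTa checkpoint.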