---
datasets:
- eduagarcia/LegalPT
- eduagarcia/cc100-pt
- eduagarcia/OSCAR-2301-pt_dedup
- eduagarcia/brwac_dedup
language:
- pt
pipeline_tag: fill-mask
tags:
- legal
model-index:
- name: RoBERTaLexPT-base
  results:
  - task:
      type: token-classification
    dataset:
      type: eduagarcia/portuguese_benchmark
      name: LeNER
      config: LeNER-Br
      split: test
    metrics:
    - type: seqeval
      value: 90.73
      name: Mean F1
      args:
        scheme: IOB2
  - task:
      type: token-classification
    dataset:
      type: eduagarcia/portuguese_benchmark
      name: UlyNER-PL Coarse
      config: UlyssesNER-Br-PL-coarse
      split: test
    metrics:
    - type: seqeval
      value: 88.56
      name: Mean F1
      args:
        scheme: IOB2
  - task:
      type: token-classification
    dataset:
      type: eduagarcia/portuguese_benchmark
      name: UlyNER-PL Fine
      config: UlyssesNER-Br-PL-fine
      split: test
    metrics:
    - type: seqeval
      value: 86.03
      name: Mean F1
      args:
        scheme: IOB2
license: cc-by-4.0
metrics:
- seqeval
---
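
# RoBERTaLexPT-base

RoBERTaLexPT-base is a Portuguese masked language model for the legal domain. It uses the [RoBERTa-base](https://huggingface.co/FacebookAI/roberta-base) architecture introduced by [Liu et al. (2019)](https://arxiv.org/abs/1907.11692) and was pretrained on the corpora listed in this card's metadata: LegalPT, cc100-pt, OSCAR-2301-pt_dedup, and brwac_dedup.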
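
Since the card's metadata tags the model for the `fill-mask` pipeline, a minimal usage sketch with the `transformers` library might look like the following. The Hub identifier `eduagarcia/RoBERTaLexPT-base` is an assumption inferred from the model name and repository owner; it is not stated in this card. The helper `top_predictions` and the `[MASK]` placeholder convention are hypothetical names introduced only for illustration.

```python
# Assumed Hub id (not stated in this card); adjust if the model lives elsewhere.
MODEL_ID = "eduagarcia/RoBERTaLexPT-base"


def top_predictions(text, model_id=MODEL_ID, top_k=5):
    """Return (token, score) pairs for the single masked position in `text`.

    `text` marks the masked position with the placeholder "[MASK]", which is
    swapped for the tokenizer's real mask token before inference.
    """
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    # Use the tokenizer's own mask token instead of hard-coding it.
    masked = text.replace("[MASK]", fill_mask.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill_mask(masked, top_k=top_k)]
```

Calling `top_predictions("O réu foi condenado pelo [MASK].")` downloads the weights on first use and returns the highest-scoring candidate tokens for the masked position.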

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Funded by:** [More Information Needed]
- **Language(s) (NLP):** Brazilian Portuguese (pt-BR)
- **License:** [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/deed.en)

### Model Sources

- **Repository:** https://github.com/eduagarcia/roberta-legal-portuguese
- **Paper:** [More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

The model was pretrained for 62,500 steps with a batch size of 2,048 sequences, each containing at most 512 tokens. This compute budget is similar to that of [BERTimbau](https://dl.acm.org/doi/abs/10.1007/978-3-030-61377-8_28) and exposes the model to approximately 65 billion tokens during training.
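
As a quick check of the token count quoted above, multiplying steps, batch size, and maximum sequence length gives an upper bound. This assumes (the card does not say so) that every sequence is filled to the full 512 tokens; shorter or padded sequences would lower the real figure.

```python
# Values taken from the training procedure described in this card.
TRAINING_STEPS = 62_500
BATCH_SIZE = 2_048        # sequences per step
MAX_SEQ_LENGTH = 512      # tokens per sequence (upper limit)

# Upper bound: assumes every sequence in every batch is completely full.
total_tokens = TRAINING_STEPS * BATCH_SIZE * MAX_SEQ_LENGTH

print(f"{total_tokens:,} tokens (~{total_tokens / 1e9:.1f} billion)")
```

The result, 65,536,000,000, is consistent with the "approximately 65 billion tokens" stated in the text.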

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

| **Hyperparameter**     | **RoBERTa-base** |
|------------------------|-----------------:|
| Number of layers       |               12 |
| Hidden size            |              768 |
| FFN inner hidden size  |             3072 |
| Attention heads        |               12 |
| Attention head size    |               64 |
| Dropout                |              0.1 |
| Attention dropout      |              0.1 |
| Warmup steps           |               6k |
| Peak learning rate     |             4e-4 |
| Batch size             |             2048 |
| Weight decay           |             0.01 |
| Maximum training steps |            62.5k |
| Learning rate decay    |           Linear |
| AdamW $$\epsilon$$     |             1e-6 |
| AdamW $$\beta_1$$      |              0.9 |
| AdamW $$\beta_2$$      |             0.98 |
| Gradient clipping      |              0.0 |

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]
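
The card's metadata reports `seqeval` F1 scores under the IOB2 tagging scheme. As a rough illustration of what an entity-level F1 measures, the sketch below extracts entity spans from IOB2 tags and computes a micro-averaged F1 in plain Python. It is not the `seqeval` implementation, and the card does not say what the reported "Mean F1" is averaged over. The function names and the labels `PESSOA` and `LOCAL` are illustrative; an `I-` tag without a preceding matching `B-` tag is simply ignored here.

```python
def iob2_spans(tags):
    """Extract (label, start, end) entity spans from one IOB2-tagged sentence."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and label == tag[2:]
        if inside:
            continue  # the current entity keeps growing
        if label is not None:
            spans.add((label, start, i))  # close the open entity
        if tag.startswith("B-"):
            start, label = i, tag[2:]  # open a new entity
        else:
            start, label = None, None  # "O" or a stray "I-" tag
    return spans


def entity_f1(gold_sentences, pred_sentences):
    """Micro-averaged F1 over exact (label, start, end) span matches."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        g, p = iob2_spans(gold), iob2_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


gold = [["B-PESSOA", "I-PESSOA", "O", "B-LOCAL"]]
pred = [["B-PESSOA", "I-PESSOA", "O", "O"]]
f1 = entity_f1(gold, pred)
print(f"F1 = {f1:.4f}")
```

In this toy example the prediction finds one of the two gold entities with no false positives, so precision is 1.0, recall is 0.5, and F1 is 2/3.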

### Results

[More Information Needed]

#### Summary

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

[More Information Needed]