---
datasets:
- eduagarcia/LegalPT
- eduagarcia/cc100-pt
- eduagarcia/OSCAR-2301-pt_dedup
- eduagarcia/brwac_dedup
language:
- pt
pipeline_tag: fill-mask
tags:
- legal
model-index:
- name: RoBERTaLexPT-base
results:
- task:
type: token-classification
dataset:
type: eduagarcia/portuguese_benchmark
name: LeNER
config: LeNER-Br
split: test
metrics:
- type: seqeval
value: 90.73
name: Mean F1
args:
scheme: IOB2
- task:
type: token-classification
dataset:
type: eduagarcia/portuguese_benchmark
name: UlyNER-PL Coarse
config: UlyssesNER-Br-PL-coarse
split: test
metrics:
- type: seqeval
value: 88.56
name: Mean F1
args:
scheme: IOB2
- task:
type: token-classification
dataset:
type: eduagarcia/portuguese_benchmark
name: UlyNER-PL Fine
config: UlyssesNER-Br-PL-fine
split: test
metrics:
- type: seqeval
value: 86.03
name: Mean F1
args:
scheme: IOB2
license: cc-by-4.0
metrics:
- seqeval
---
# RoBERTaLexPT-base

RoBERTaLexPT-base is a Portuguese masked language model for the legal domain. It follows the RoBERTa-base architecture introduced by Liu et al. (2019) and is pretrained on the LegalPT legal corpus together with general-domain Portuguese corpora (CC100-pt, deduplicated OSCAR-2301, and deduplicated BrWaC).
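The snippet below is a minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub under the id `eduagarcia/RoBERTaLexPT-base`; adjust the id if the weights are hosted elsewhere. RoBERTa-style tokenizers use the literal `<mask>` token.

```python
from transformers import pipeline

# Assumed Hub id for this checkpoint; change it if the weights live under a different name.
fill_mask = pipeline("fill-mask", model="eduagarcia/RoBERTaLexPT-base")

# The fill-mask pipeline returns the top candidate tokens with their scores.
for prediction in fill_mask("O juiz julgou o pedido <mask>."):
    print(prediction["token_str"], prediction["score"])
```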
## Model Details

### Model Description
- Funded by: [More Information Needed]
- Language(s) (NLP): Brazilian Portuguese (pt-BR)
- License: Creative Commons Attribution 4.0 International Public License
### Model Sources
- Repository: https://github.com/eduagarcia/roberta-legal-portuguese
- Paper: [More Information Needed]
## Training Details

### Training Data
The model was pretrained on the following Portuguese corpora:

- [eduagarcia/LegalPT](https://huggingface.co/datasets/eduagarcia/LegalPT): Portuguese legal texts
- [eduagarcia/cc100-pt](https://huggingface.co/datasets/eduagarcia/cc100-pt): general-domain Portuguese (CC100)
- [eduagarcia/OSCAR-2301-pt_dedup](https://huggingface.co/datasets/eduagarcia/OSCAR-2301-pt_dedup): deduplicated Portuguese OSCAR-2301
- [eduagarcia/brwac_dedup](https://huggingface.co/datasets/eduagarcia/brwac_dedup): deduplicated BrWaC
### Training Procedure
The model was pretrained for 62,500 steps with a batch size of 2,048 sequences, each containing at most 512 tokens. This computational budget is similar to that of BERTimbau and exposes the model to approximately 65 billion tokens during training.
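As a sanity check, the stated token budget follows directly from these numbers (assuming fully packed 512-token sequences):

$$62{,}500 \times 2{,}048 \times 512 = 65{,}536{,}000{,}000 \approx 65.5 \text{ billion tokens}$$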
#### Preprocessing
[More Information Needed]
#### Training Hyperparameters
Hyperparameter | RoBERTa-base |
---|---|
Number of layers | 12 |
Hidden size | 768 |
FFN inner hidden size | 3072 |
Attention heads | 12 |
Attention head size | 64 |
Dropout | 0.1 |
Attention dropout | 0.1 |
Warmup steps | 6k |
Peak learning rate | 4e-4 |
Batch size | 2048 |
Weight decay | 0.01 |
Maximum training steps | 62.5k |
Learning rate decay | Linear |
AdamW $\epsilon$ | 1e-6 |
AdamW $\beta_1$ | 0.9 |
AdamW $\beta_2$ | 0.98 |
Gradient clipping | 0.0 |
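For illustration only, the architecture rows of this table map directly onto a `transformers.RobertaConfig`; the vocabulary size and position-embedding count below are assumptions following standard RoBERTa conventions, not values taken from this card.

```python
from transformers import RobertaConfig, RobertaForMaskedLM

# Architecture hyperparameters from the table above; vocab_size is an assumed
# placeholder, since the tokenizer vocabulary is not specified in this section.
config = RobertaConfig(
    vocab_size=50265,                  # assumption: RoBERTa's default vocabulary size
    num_hidden_layers=12,              # Number of layers
    hidden_size=768,                   # Hidden size (12 heads x 64 per head)
    intermediate_size=3072,            # FFN inner hidden size
    num_attention_heads=12,            # Attention heads
    hidden_dropout_prob=0.1,           # Dropout
    attention_probs_dropout_prob=0.1,  # Attention dropout
    max_position_embeddings=514,       # 512 tokens + 2 special positions (RoBERTa convention)
)
model = RobertaForMaskedLM(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")
```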
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data
The model was evaluated on Portuguese NER benchmarks from the `eduagarcia/portuguese_benchmark` collection, using the test splits of LeNER-Br and UlyssesNER-Br (PL-coarse and PL-fine configurations).
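A minimal sketch for loading one of these test sets with the `datasets` library; the configuration name is copied from the model-index above and may need adjusting if the dataset repository spells its configurations differently.

```python
from datasets import load_dataset

# Config name taken from the model-index ("LeNER-Br"); adjust if the dataset
# repository exposes its configurations under different names.
lener_test = load_dataset("eduagarcia/portuguese_benchmark", "LeNER-Br", split="test")
print(lener_test)
```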
#### Metrics
Entity-level F1 computed with seqeval under the IOB2 tagging scheme; the model-index reports it as "Mean F1".
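A toy example of how this entity-level F1 is computed with `seqeval` in strict IOB2 mode; the label names here are illustrative, not the benchmark's actual tag set.

```python
from seqeval.metrics import f1_score
from seqeval.scheme import IOB2

# Illustrative gold and predicted tag sequences (IOB2 scheme); labels are made up.
y_true = [["B-PESSOA", "I-PESSOA", "O", "B-ORGANIZACAO", "O"]]
y_pred = [["B-PESSOA", "I-PESSOA", "O", "O", "O"]]

# Strict mode scores a predicted entity as correct only on an exact span and type match.
print(f1_score(y_true, y_pred, mode="strict", scheme=IOB2))
```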
### Results
Benchmark (test split) | Mean F1 (seqeval) |
---|---|
LeNER-Br | 90.73 |
UlyssesNER-Br PL-coarse | 88.56 |
UlyssesNER-Br PL-fine | 86.03 |
#### Summary
## Citation
[More Information Needed]