---
datasets:
  - eduagarcia/LegalPT
  - eduagarcia/cc100-pt
  - eduagarcia/OSCAR-2301-pt_dedup
  - eduagarcia/brwac_dedup
language:
  - pt
pipeline_tag: fill-mask
tags:
  - legal
model-index:
  - name: RoBERTaLexPT-base
    results:
      - task:
          type: token-classification
        dataset:
          type: eduagarcia/portuguese_benchmark
          name: LeNER
          config: LeNER-Br
          split: test
        metrics:
          - type: seqeval
            value: 90.73
            name: Mean F1
            args:
              scheme: IOB2
      - task:
          type: token-classification
        dataset:
          type: eduagarcia/portuguese_benchmark
          name: UlyNER-PL Coarse
          config: UlyssesNER-Br-PL-coarse
          split: test
        metrics:
          - type: seqeval
            value: 88.56
            name: Mean F1
            args:
              scheme: IOB2
      - task:
          type: token-classification
        dataset:
          type: eduagarcia/portuguese_benchmark
          name: UlyNER-PL Fine
          config: UlyssesNER-Br-PL-fine
          split: test
        metrics:
          - type: seqeval
            value: 86.03
            name: Mean F1
            args:
              scheme: IOB2
license: cc-by-4.0
metrics:
  - seqeval
---

# RoBERTaLexPT-base

RoBERTaLexPT-base is a Portuguese language model pretrained on legal and general-domain corpora, namely the LegalPT corpus together with CC100-pt, a deduplicated OSCAR-2301, and a deduplicated BrWaC (the datasets listed in the metadata above), using the RoBERTa-base architecture introduced by Liu et al. (2019).
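
Since the card's pipeline tag is fill-mask, a minimal usage sketch with the 🤗 Transformers pipeline might look like the following. The repository id `eduagarcia/RoBERTaLexPT-base` and the example sentence are assumptions, not part of the original card:

```python
from transformers import pipeline

# Repository id assumed from this card's title and author; adjust if it differs.
fill_mask = pipeline("fill-mask", model="eduagarcia/RoBERTaLexPT-base")

# RoBERTa-style tokenizers use "<mask>" as the mask token.
predictions = fill_mask("O juiz determinou a <mask> do processo.")
for pred in predictions:
    print(f"{pred['token_str']}\t{pred['score']:.4f}")
```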

## Model Details

### Model Description

### Model Sources

## Training Details

### Training Data

[More Information Needed]

### Training Procedure

The model was pretrained for 62,500 steps with a batch size of 2,048 sequences, each containing at most 512 tokens. This computational setup is comparable to that of BERTimbau and exposes the model to approximately 65 billion tokens during training.
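
As a rough sanity check on the stated token budget (no figures beyond those already given in the paragraph above), the numbers multiply out as follows:

```python
steps = 62_500       # maximum training steps
batch_size = 2_048   # sequences per batch
max_seq_len = 512    # maximum tokens per sequence

tokens_seen = steps * batch_size * max_seq_len
print(f"~{tokens_seen / 1e9:.1f} billion tokens")  # ~65.5 billion, i.e. "approximately 65 billion"
```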

#### Preprocessing

[More Information Needed]

#### Training Hyperparameters

| Hyperparameter         | RoBERTa-base |
|------------------------|--------------|
| Number of layers       | 12           |
| Hidden size            | 768          |
| FFN inner hidden size  | 3072         |
| Attention heads        | 12           |
| Attention head size    | 64           |
| Dropout                | 0.1          |
| Attention dropout      | 0.1          |
| Warmup steps           | 6k           |
| Peak learning rate     | 4e-4         |
| Batch size             | 2048         |
| Weight decay           | 0.01         |
| Maximum training steps | 62.5k        |
| Learning rate decay    | Linear       |
| AdamW $\epsilon$       | 1e-6         |
| AdamW $\beta_1$        | 0.9          |
| AdamW $\beta_2$        | 0.98         |
| Gradient clipping      | 0.0          |
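
For readers who want to map the table onto code, the architecture rows correspond to a standard `RobertaConfig`, and the optimization rows to a typical AdamW plus linear-decay setup. The sketch below is only an illustration of that mapping, not the authors' actual training script:

```python
from transformers import RobertaConfig

# Architecture hyperparameters from the table (standard RoBERTa-base values).
config = RobertaConfig(
    num_hidden_layers=12,
    hidden_size=768,
    intermediate_size=3072,            # FFN inner hidden size
    num_attention_heads=12,            # 768 / 12 = 64-dimensional heads
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
)

# Optimization hyperparameters from the table, collected in a plain dict;
# the exact trainer and scheduler wiring used by the authors is not documented here.
optimization = {
    "peak_learning_rate": 4e-4,
    "lr_decay": "linear",
    "warmup_steps": 6_000,
    "max_steps": 62_500,
    "batch_size": 2_048,
    "weight_decay": 0.01,
    "adam_epsilon": 1e-6,
    "adam_beta1": 0.9,
    "adam_beta2": 0.98,
    "max_grad_norm": 0.0,              # gradient clipping disabled
}
```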

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Metrics

[More Information Needed]
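
While the detailed metric description is still to be filled in, the model-index metadata at the top of this card reports mean F1 computed with seqeval under the IOB2 scheme. A minimal, purely illustrative sketch of that computation (toy tag sequences, not the benchmark data) is:

```python
from seqeval.metrics import f1_score
from seqeval.scheme import IOB2

# Toy IOB2-tagged sequences, for illustration only.
y_true = [["B-PESSOA", "I-PESSOA", "O", "B-ORGANIZACAO"]]
y_pred = [["B-PESSOA", "I-PESSOA", "O", "O"]]

# Strict entity-level F1 under the IOB2 scheme, as reported in the results above.
print(f1_score(y_true, y_pred, mode="strict", scheme=IOB2))
```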

### Results

[More Information Needed]

#### Summary

## Citation

[More Information Needed]