eduagarcia/RoBERTaLexPT-base

## Evaluation
The model was evaluated on the [PortuLex benchmark](eduagarcia/portuguese_benchmark), a four-task benchmark designed to evaluate the quality and performance of language models in the Portuguese legal domain.

Macro F1-score (%) of multiple models on the PortuLex benchmark test splits (best result per column in bold):

| **Model**                                                                  | **LeNER** | **UlyNER-PL** (Coarse/Fine) | **FGV-STF** (Coarse) |  **RRIP** | **Average (%)** |
|----------------------------------------------------------------------------|-----------|-----------------------------|----------------------|:---------:|-----------------|
| [BERTimbau-base](https://dl.acm.org/doi/abs/10.1007/978-3-030-61377-8_28)  | 88.34     | 86.39/83.83                 | 79.34                |   82.34   | 83.78           |
| [BERTimbau-large](https://dl.acm.org/doi/abs/10.1007/978-3-030-61377-8_28) | 88.64     | 87.77/84.74                 | 79.71                | **83.79** | 84.60           |
| [Albertina-PT-BR-base](https://arxiv.org/abs/2305.06721)                   | 89.26     | 86.35/84.63                 | 79.30                |   81.16   | 83.80           |
| [Albertina-PT-BR-xlarge](https://arxiv.org/abs/2305.06721)                 | 90.09     | 88.36/**86.62**             | 79.94                |   82.79   | 85.08           |
| [BERTikal-base](https://arxiv.org/abs/2110.15709)                          | 83.68     | 79.21/75.70                 | 77.73                |   81.11   | 79.99           |
| [JurisBERT-base](https://repositorio.ufms.br/handle/123456789/5119)        | 81.74     | 81.67/77.97                 | 76.04                |   80.85   | 79.61           |
| [BERTimbauLAW-base](https://repositorio.ufms.br/handle/123456789/5119)     | 84.90     | 87.11/84.42                 | 79.78                |   82.35   | 83.20           |
| [Legal-XLM-R-base](https://arxiv.org/abs/2306.02069)                       | 87.48     | 83.49/83.16                 | 79.79                |   82.35   | 83.24           |
| [Legal-XLM-R-large](https://arxiv.org/abs/2306.02069)                      | 88.39     | 84.65/84.55                 | 79.36                |   81.66   | 83.50           |
| [Legal-RoBERTa-PT-large](https://arxiv.org/abs/2306.02069)                 | 87.96     | 88.32/84.83                 | 79.57                |   81.98   | 84.02           |
| RoBERTaTimbau-base                                                         | 89.68     | 87.53/85.74                 | 78.82                |   82.03   | 84.29           |
| RoBERTaLegalPT-base                                                        | 90.59     | 85.45/84.40                 | 79.92                |   82.84   | 84.57           |
| RoBERTaLexPT-base                                                          | **90.73** | **88.56**/86.03             | **80.40**            |   83.22   | **85.41**       |

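
The table reports macro-averaged F1. Per-class F1 scores are computed first and then averaged with equal weight, so rare classes count as much as frequent ones. The minimal pure-Python sketch below illustrates that definition on plain label sequences. The function name `macro_f1` and the toy labels are hypothetical. The card does not say how PortuLex matches predictions to gold labels, and for the NER tasks scoring is usually done on whole entity spans rather than individual labels, which this sketch does not reproduce.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro F1 (in %): unweighted mean of per-class F1 over all labels seen."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("cannot score empty label sequences")

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted class gets a false positive
            fn[gold] += 1  # the gold class gets a false negative

    per_class_f1 = []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        per_class_f1.append(2 * precision * recall / denom if denom else 0.0)

    # Macro averaging: every class has equal weight, regardless of its frequency.
    return 100 * sum(per_class_f1) / len(per_class_f1)


# Hypothetical toy labels, not PortuLex data.
print(round(macro_f1(["A", "A", "B", "C"], ["A", "B", "B", "C"]), 2))  # 77.78
```

On plain label lists this should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`, which also averages over the union of labels found in `y_true` and `y_pred`.
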
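In summary, RoBERTaLexPT-base achieves the highest average score (85.41%) and the best results on LeNER, UlyNER-PL (coarse) and FGV-STF despite its base size. It is second only to BERTimbau-large on RRIP and to Albertina-PT-BR-xlarge on UlyNER-PL (fine).
With sufficient pre-training data, it can surpass larger models such as BERTimbau-large and Albertina-PT-BR-xlarge on average. The results highlight the importance of domain-diverse training data over sheer model scale.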
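
The card does not state how the Average (%) column is computed. Every row in the table is consistent, to two decimals, with an equal-weight mean over the four tasks. In that reading, UlyNER-PL contributes the mean of its coarse and fine scores rather than counting as two separate tasks. The sketch below checks this reading for a few rows. `portulex_average` and `SCORES` are hypothetical names, and the weighting is an inference from the published numbers, not a documented rule.

```python
# Scores copied from the PortuLex table above:
# (LeNER, UlyNER-PL coarse, UlyNER-PL fine, FGV-STF, RRIP)
SCORES = {
    "BERTimbau-base": (88.34, 86.39, 83.83, 79.34, 82.34),
    "BERTimbau-large": (88.64, 87.77, 84.74, 79.71, 83.79),
    "Albertina-PT-BR-xlarge": (90.09, 88.36, 86.62, 79.94, 82.79),
    "RoBERTaLexPT-base": (90.73, 88.56, 86.03, 80.40, 83.22),
}


def portulex_average(lener, uly_coarse, uly_fine, fgv_stf, rrip):
    # Assumption inferred from the table: UlyNER-PL counts as a single task
    # whose score is the mean of its coarse and fine macro F1.
    ulyner = (uly_coarse + uly_fine) / 2
    # The four tasks are then weighted equally.
    return (lener + ulyner + fgv_stf + rrip) / 4


for model, scores in SCORES.items():
    print(f"{model}: {portulex_average(*scores):.2f}")
# BERTimbau-base: 83.78
# BERTimbau-large: 84.60
# Albertina-PT-BR-xlarge: 85.08
# RoBERTaLexPT-base: 85.41
```

Treating UlyNER-PL coarse and fine as two separate entries would instead give a plain five-way mean of 85.79 for RoBERTaLexPT-base. That does not match the reported 85.41, which is why the sketch folds them into one task.
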
## Citation
[More Information Needed]
## Acknowledgment
This work has been supported by the AI Center of Excellence (Centro de Excelência em Inteligência Artificial – CEIA) of the Institute of Informatics at the Federal University of Goiás (INF-UFG).