---
language:
- es
license: apache-2.0
tags:
- legal
- spanish
datasets:
- legal_ES
- temu_legal  
metrics:
- ppl
widget:
- text: "La ley fue <mask> finalmente." 
- text: "El Tribunal <mask> desestimó el recurso de amparo."
- text: "Hay base legal dentro del marco <mask> actual."

---
# Spanish Legal-domain RoBERTa

There are two main models built specifically for the Spanish language, the BETO model and a Spanish GPT-2. A multilingual BERT (mBERT) is also commonly used, as it can be competitive in some settings.

Both the Spanish BETO and GPT-2 models were trained with rather limited resources, roughly 4GB and 3GB of data respectively. That data may span varied sources, but it is not large enough to cover all domains well. Moreover, a domain-specific BERT-like model is preferable because it covers the domain vocabulary effectively and captures the legal jargon. We present a model trained on 9GB of text drawn exclusively from the legal domain.
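
As a minimal usage sketch, the model can be queried for masked-token predictions with the Hugging Face Transformers `fill-mask` pipeline; the model identifier below is a placeholder and should be replaced with this repository's id.

```python
# Minimal fill-mask sketch (assumes `transformers` and `torch` are installed).
from transformers import pipeline

# Placeholder id: replace with this repository's model id.
model_id = "PlanTL-GOB-ES/lm-legal-es"

fill_mask = pipeline("fill-mask", model=model_id)

# One of the widget examples above: "La ley fue <mask> finalmente."
for prediction in fill_mask("La ley fue <mask> finalmente."):
    print(f"{prediction['token_str']:>15}  score={prediction['score']:.3f}")
```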

## Citing 
```
@misc{gutierrezfandino2021legal,
      title={Spanish Legalese Language Model and Corpora}, 
      author={Asier Gutiérrez-Fandiño and Jordi Armengol-Estapé and Aitor Gonzalez-Agirre and Marta Villegas},
      year={2021},
      eprint={2110.12201},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

For more information, visit our [GitHub repository](https://github.com/PlanTL-GOB-ES/lm-legal-es).

## Funding
This work was funded by the Spanish State Secretariat for Digitalization and Artificial Intelligence (SEDIA) within the framework of the Plan-TL.