raquelsilveira committed
Commit 6f0883a
1 parent: 4c7c9fc

Update README.md

  1. README.md +32 -1
README.md CHANGED
@@ -11,4 +11,35 @@ LegalBert-pt is a language model for the legal domain in the Portuguese language
  |Model|Initial model|#Layers|#Params|
  |-|-|-|-|
  |LegalBert-pt SC| |12|110M|
- |LegalBert-pt FP| neuralmind/bert-base-portuguese-cased | 12 | 110M |
+ |LegalBert-pt FP| neuralmind/bert-base-portuguese-cased | 12 | 110M |
+
+ ## Dataset
+
+ To pretrain the various versions of the LegalBert-pt language model, we collected 1.5 million legal documents in Portuguese from ten Brazilian courts. The documents are of four types: initial petitions, petitions, decisions, and sentences. The table below shows their distribution by court.
+
+ The data were obtained from the Codex system of the Brazilian National Council of Justice (CNJ), which maintains the largest and most diverse set of legal texts in Brazilian Portuguese. The CNJ provided these data for our research under an agreement established with the authors of this work.
+
+ |Data source|Number of documents|%|
+ |-|-|-|
+ |Court of Justice of the State of Ceará|80,504|5.37|
+ |Court of Justice of the State of Piauí|90,514|6.03|
+ |Court of Justice of the State of Rio de Janeiro|33,320|2.22|
+ |Court of Justice of the State of Rondônia|971,615|64.77|
+ |Federal Regional Court of the 3rd Region|70,196|4.68|
+ |Federal Regional Court of the 5th Region|6,767|0.45|
+ |Regional Labor Court of the 9th Region|16,133|1.08|
+ |Regional Labor Court of the 11th Region|5,351|0.36|
+ |Regional Labor Court of the 13th Region|155,567|10.37|
+ |Regional Labor Court of the 23rd Region|70,033|4.67|
+ |Total|1,500,000|100.00|
+
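Review note (not part of the commit): the percentage column of the Dataset table can be re-derived from the document counts. The sketch below copies the counts and percentages from the table and checks that each share matches to two decimals; the variable names and the 0.005 tolerance are illustrative choices, not taken from the README.

```python
# (court, number of documents, percentage as printed in the Dataset table)
rows = [
    ("Court of Justice of the State of Ceará", 80_504, 5.37),
    ("Court of Justice of the State of Piauí", 90_514, 6.03),
    ("Court of Justice of the State of Rio de Janeiro", 33_320, 2.22),
    ("Court of Justice of the State of Rondônia", 971_615, 64.77),
    ("Federal Regional Court of the 3rd Region", 70_196, 4.68),
    ("Federal Regional Court of the 5th Region", 6_767, 0.45),
    ("Regional Labor Court of the 9th Region", 16_133, 1.08),
    ("Regional Labor Court of the 11th Region", 5_351, 0.36),
    ("Regional Labor Court of the 13th Region", 155_567, 10.37),
    ("Regional Labor Court of the 23rd Region", 70_033, 4.67),
]

# Sum of the per-court counts; should equal the "Total" row.
total = sum(count for _, count, _ in rows)

# Courts whose printed percentage differs from the recomputed share
# by more than half a unit in the second decimal place (assumed tolerance).
mismatches = [
    court
    for court, count, printed in rows
    if abs(100 * count / total - printed) > 0.005
]

print(f"total documents: {total:,}")
print(f"rows with inconsistent percentages: {mismatches}")
```

An empty list of mismatches means the counts, the total, and the percentages in the table are mutually consistent.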
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer  # or BertTokenizer
+ from transformers import AutoModelForPreTraining  # or BertForPreTraining, to load the pretraining heads
+ from transformers import AutoModel  # or BertModel, for BERT without the pretraining heads
+
+ model = AutoModelForPreTraining.from_pretrained('raquelsilveira/legalbertpt_sc')
+ tokenizer = AutoTokenizer.from_pretrained('raquelsilveira/legalbertpt_sc')
+ ```
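Review note (not part of the commit): the Usage snippet stops after loading the model and tokenizer. A common next step is to turn texts into fixed-size vectors, which might look roughly like the sketch below. It assumes PyTorch is installed and that the checkpoint loads with `AutoModel` as the README's import comment suggests; the helper names `mean_pool` and `embed`, the choice of mean pooling, and the 512-token limit are illustrative assumptions, not something the README prescribes.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "raquelsilveira/legalbertpt_sc"  # checkpoint name taken from the README


def mean_pool(last_hidden_state, attention_mask):
    """Average the token vectors of each sequence, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1.0)                          # avoid division by zero
    return summed / counts


def embed(texts):
    """Hypothetical helper: download the checkpoint and return one vector per text."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID)  # encoder only, no pretraining heads
    model.eval()
    batch = tokenizer(
        texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
    )
    with torch.no_grad():  # inference only, no gradients needed
        output = model(**batch)
    return mean_pool(output.last_hidden_state, batch["attention_mask"])


# Example call (requires network access to download the checkpoint):
# vectors = embed(["O réu foi citado para apresentar contestação."])
```

Mean pooling is only one option; taking the vector of the first (`[CLS]`) token is an equally common choice, and which works better depends on the downstream task.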