kiddothe2b committed · Commit 9ea2ae5 · 1 parent: 5443746

Add. 100k steps with max_seq_length=512

Files changed (2)
  1. README.md +34 -24
  2. pytorch_model.bin +1 -1
README.md CHANGED
@@ -1,25 +1,29 @@
  ---
  tags:
- - generated_from_trainer
  datasets:
- - custom_legal_danish_corpus
  model-index:
- - name: danish-lex-lm-base-mlm
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # danish-lex-lm-base-mlm

- This model is a fine-tuned version of [data/PLMs/danish-lm/danish-lex-lm-base](https://huggingface.co/data/PLMs/danish-lm/danish-lex-lm-base) on the custom_legal_danish_corpus dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.7302

  ## Model description

- More information needed

  ## Intended uses & limitations

@@ -27,10 +31,12 @@ More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
@@ -46,23 +52,27 @@ The following hyperparameters were used during training:
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.05
- - training_steps: 500000

  ### Training results

- | Training Loss | Epoch | Step   | Validation Loss |
- |:-------------:|:-----:|:------:|:---------------:|
- | 1.4648        | 5.36  | 50000  | 1.2920          |
- | 1.2165        | 10.72 | 100000 | 1.0625          |
- | 1.0952        | 16.07 | 150000 | 0.9611          |
- | 1.0233        | 21.43 | 200000 | 0.8931          |
- | 0.963         | 26.79 | 250000 | 0.8477          |
- | 0.9122        | 32.15 | 300000 | 0.8168          |
- | 0.8697        | 37.51 | 350000 | 0.7836          |
- | 0.8397        | 42.86 | 400000 | 0.7560          |
- | 0.8231        | 48.22 | 450000 | 0.7476          |
- | 0.8207        | 53.58 | 500000 | 0.7243          |
-

  ### Framework versions

  ---
+ license: cc-by-nc-4.0
+ pipeline_tag: fill-mask
+
  tags:
+ - legal
+ language:
+ - da
  datasets:
+ - multi_eurlex
+ - DDSC/partial-danish-gigaword-no-twitter
  model-index:
+ - name: coastalcph/danish-lex-lm-base
  results: []
  ---

  # danish-lex-lm-base-mlm

+ This model was pre-trained on a combination of the Danish part of the MultiEURLEX dataset (Chalkidis et al., 2021), comprising EU legislation, and two subsets (`retsinformationdk`, `retspraksis`) of the Danish Gigaword Corpus (Derczynski et al., 2021), comprising legal proceedings.
  It achieves the following results on the evaluation set:
+ - Loss: 0.7302 (up to 128 tokens)
+ - Loss: 0.7847 (up to 512 tokens)
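
The two evaluation losses can also be read as perplexities. The sketch below assumes the reported loss is the mean cross-entropy in nats over masked tokens, which the card does not state explicitly; the function name `perplexity` is illustrative only.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


# Evaluation losses reported in the model card, keyed by maximum sequence length.
eval_losses = {128: 0.7302, 512: 0.7847}

for max_len, loss in eval_losses.items():
    print(f"up to {max_len} tokens: loss={loss:.4f}, perplexity={perplexity(loss):.3f}")
```

If the loss were instead measured in bits, the base would be 2 rather than e, so the conversion should be checked against the evaluation code before quoting these figures.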

  ## Model description

+ This is a RoBERTa (Liu et al., 2019) model pre-trained on Danish legal corpora. It follows the base configuration, with 12 Transformer layers, each with 768 hidden units and 12 attention heads.
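
To give a sense of scale for this configuration, the sketch below counts the parameters in the encoder stack. It is a rough estimate under stated assumptions: the feed-forward width of 3072 (4 × hidden size) is the usual choice for base-sized models but is not stated in the card, and embeddings and the MLM head are left out because the vocabulary size is not given here. The helper `encoder_layer_params` is a hypothetical name.

```python
def encoder_layer_params(hidden: int, intermediate: int) -> int:
    """Count weights and biases in one Transformer encoder layer."""
    # Query, key, value, and output projections, each hidden x hidden plus bias.
    attention = 4 * (hidden * hidden + hidden)
    # Two dense layers: hidden -> intermediate and intermediate -> hidden, with biases.
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    # Two LayerNorms per layer, each with a scale and a shift vector.
    layer_norms = 2 * (2 * hidden)
    return attention + feed_forward + layer_norms


HIDDEN = 768         # from the model card
NUM_LAYERS = 12      # from the model card
NUM_HEADS = 12       # from the model card
INTERMEDIATE = 3072  # ASSUMPTION: 4 * hidden, not stated in the model card

head_dim = HIDDEN // NUM_HEADS
per_layer = encoder_layer_params(HIDDEN, INTERMEDIATE)
encoder_total = NUM_LAYERS * per_layer

print(f"head dimension: {head_dim}")
print(f"parameters per layer: {per_layer:,}")
print(f"encoder stack (no embeddings): {encoder_total:,}")
```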

  ## Intended uses & limitations

 

  ## Training and evaluation data

+ This model was pre-trained on a combination of the Danish part of the MultiEURLEX dataset and two subsets (`retsinformationdk`, `retspraksis`) of the Danish Gigaword Corpus.

  ## Training procedure

+ The model was initially pre-trained for 500k steps with sequences of up to 128 tokens, and pre-training then continued for an additional 100k steps with sequences of up to 512 tokens.
+
  ### Training hyperparameters

  The following hyperparameters were used during training:
 
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.05
+ - training_steps: 500000 + 100000
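
As an illustration of what a cosine scheduler with a 0.05 warmup ratio does over the first 500k steps, the sketch below computes the learning rate as a fraction of its peak. It assumes linear warmup followed by a half-cycle cosine decay to zero, which is a common formulation but not confirmed by this diff. The peak learning rate is not visible in the hunks shown, so only the multiplier is computed, and whether the 100k-step continuation restarted the schedule is not stated. `lr_multiplier` is a hypothetical helper, not a library function.

```python
import math


def lr_multiplier(step: int, total_steps: int, warmup_ratio: float) -> float:
    """Fraction of the peak learning rate at a given step (assumed schedule shape)."""
    # round() is a choice made for this sketch; libraries may round differently.
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 to 1.
        return step / max(1, warmup_steps)
    # Half-cycle cosine decay from 1 to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 500_000  # first phase, from the hyperparameter list
WARMUP_RATIO = 0.05    # from the hyperparameter list

for step in (0, 12_500, 25_000, 262_500, 500_000):
    print(f"step {step:>7}: {lr_multiplier(step, TOTAL_STEPS, WARMUP_RATIO):.3f} of peak")
```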

  ### Training results

+ | Training Loss | Length | Step    | Validation Loss |
+ |:-------------:|:------:|:-------:|:---------------:|
+ | 1.4648        | 128    | 50000   | 1.2920          |
+ | 1.2165        | 128    | 100000  | 1.0625          |
+ | 1.0952        | 128    | 150000  | 0.9611          |
+ | 1.0233        | 128    | 200000  | 0.8931          |
+ | 0.963         | 128    | 250000  | 0.8477          |
+ | 0.9122        | 128    | 300000  | 0.8168          |
+ | 0.8697        | 128    | 350000  | 0.7836          |
+ | 0.8397        | 128    | 400000  | 0.7560          |
+ | 0.8231        | 128    | 450000  | 0.7476          |
+ | 0.8207        | 128    | 500000  | 0.7243          |
+
+ | Training Loss | Length | Step    | Validation Loss |
+ |:-------------:|:------:|:-------:|:---------------:|
+ | 0.7045        | 512    | +50000  | 0.8318          |
+ | 0.6432        | 512    | +100000 | 0.7913          |

  ### Framework versions

pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:45679ead40348bea4447e1d2c33cde39cb39fd41d7f5c2571f2291863d0115cd
  size 442675755

  version https://git-lfs.github.com/spec/v1
+ oid sha256:231970291ed388ead49a620480330c5722b32dc7e82a19825bc24f9cc6d67bda
  size 442675755
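
Only the `oid` changes between the two pointers, so a downloaded `pytorch_model.bin` can be checked against the new pointer by size and SHA-256 digest. The following is a minimal sketch of that check; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path in the usage note is an assumption.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Check a local file's size and SHA-256 digest against a parsed pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large checkpoints do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# New pointer contents from this commit.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:231970291ed388ead49a620480330c5722b32dc7e82a19825bc24f9cc6d67bda
size 442675755
"""

pointer = parse_lfs_pointer(NEW_POINTER)
print(pointer["algo"], pointer["size"])
```

With the checkpoint downloaded to a local path of your choosing, `matches_pointer("pytorch_model.bin", pointer)` would return `True` only if both the size and the digest agree with the pointer.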