---
license: cc-by-nc-4.0
pipeline_tag: fill-mask
tags:
  - legal
language:
  - da
datasets:
  - multi_eurlex
  - DDSC/partial-danish-gigaword-no-twitter
model-index:
  - name: coastalcph/danish-legal-lm-base
    results: []
---

# Danish Legal LM

This model was pre-trained on a combination of the Danish part of the MultiEURLEX dataset (Chalkidis et al., 2021), comprising EU legislation, and two subsets (retsinformationdk, retspraksis) of the Danish Gigaword Corpus (Derczynski et al., 2021), comprising legal proceedings. It achieves the following results on the evaluation set:

- Loss: 0.7302 (up to 128 tokens)
- Loss: 0.7847 (up to 512 tokens)

## Model description

This is a RoBERTa (Liu et al., 2019) model pre-trained on Danish legal corpora. It follows a base configuration with 12 Transformer layers, each with 768 hidden units and 12 attention heads.
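As a quick sketch (not an official usage snippet from the authors), the model can be loaded for masked-token prediction with the `fill-mask` pipeline. The repository ID below is taken from the metadata above, and the Danish example sentence is purely illustrative:

```python
from transformers import pipeline

# Load the model for masked-language-model inference.
# Model ID taken from the metadata above; RoBERTa models use the <mask> token.
fill_mask = pipeline("fill-mask", model="coastalcph/danish-legal-lm-base")

# Illustrative Danish legal sentence with one masked token.
predictions = fill_mask("Denne <mask> finder anvendelse i alle medlemsstater.")
for p in predictions:
    print(p["token_str"], p["score"])
```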

## Intended uses & limitations

More information needed

## Training and evaluation data

The model was pre-trained on a combination of the Danish part of the MultiEURLEX dataset and two subsets (retsinformationdk, retspraksis) of the Danish Gigaword Corpus.
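A minimal sketch of how these corpora could be pulled with the `datasets` library; the `"da"` configuration name for MultiEURLEX and the filtering on a source column for the Gigaword subsets are assumptions, since the exact preprocessing used for pre-training is not documented here:

```python
from datasets import load_dataset

# Danish split of MultiEURLEX (assumed config name "da").
multi_eurlex_da = load_dataset("multi_eurlex", "da")

# Danish Gigaword without Twitter; selecting the two legal subsets assumes
# the dataset exposes a column naming the section (here "source", which
# may differ in the actual release).
gigaword = load_dataset("DDSC/partial-danish-gigaword-no-twitter", split="train")
legal_gigaword = gigaword.filter(
    lambda ex: ex["source"] in {"retsinformationdk", "retspraksis"}
)
```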

## Training procedure

The model was initially pre-trained for 500k steps with sequences of up to 128 tokens, and pre-training then continued for an additional 100k steps with sequences of up to 512 tokens.

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: tpu
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 256
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- training_steps: 500000 + 100000
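For orientation only, the hyperparameters above roughly correspond to a `TrainingArguments` configuration like the sketch below for the first (128-token) phase. The training script and TPU launch details are not part of this card, so treat this as an approximation rather than the authors' actual setup; the output directory name is hypothetical.

```python
from transformers import TrainingArguments

# Approximate mapping of the listed hyperparameters (128-token phase).
# 8 devices x 16 per-device batch x 2 accumulation steps = 256 total train batch size.
training_args = TrainingArguments(
    output_dir="danish-legal-lm-base-128",  # hypothetical output path
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    max_steps=500_000,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```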

### Training results

| Training Loss | Length | Step     | Validation Loss |
|:-------------:|:------:|:--------:|:---------------:|
| 1.4648        | 128    | 50000    | 1.2920          |
| 1.2165        | 128    | 100000   | 1.0625          |
| 1.0952        | 128    | 150000   | 0.9611          |
| 1.0233        | 128    | 200000   | 0.8931          |
| 0.963         | 128    | 250000   | 0.8477          |
| 0.9122        | 128    | 300000   | 0.8168          |
| 0.8697        | 128    | 350000   | 0.7836          |
| 0.8397        | 128    | 400000   | 0.7560          |
| 0.8231        | 128    | 450000   | 0.7476          |
| 0.8207        | 128    | 500000   | 0.7243          |
| 0.7045        | 512    | +50000   | 0.8318          |
| 0.6432        | 512    | +100000  | 0.7913          |

### Framework versions

- Transformers 4.18.0
- Pytorch 1.12.0+cu102
- Datasets 2.0.0
- Tokenizers 0.12.0