Fill-Mask · Transformers · PyTorch · German · bert · Inference Endpoints
scherrmann committed
Commit 9f1e7f3 • 1 Parent(s): 97d6359

Update README.md

Files changed (1)
  1. README.md +6 -5
README.md CHANGED
@@ -5,11 +5,12 @@ language:
  ---
  # German FinBERT (Further Pre-trained Version)
  
- German FinBERT is a BERT language model focusing on the financial domain within the German language. In my [paper](https://arxiv.org/pdf/2010.10906.pdf) (UPDATE!!), I describe in more detail the steps taken to train the model and show that it outperforms its generic benchmarks for finance specific downstream tasks.
+ German FinBERT is a BERT language model focusing on the financial domain within the German language. In my [paper](https://arxiv.org/pdf/2311.08793.pdf), I describe in more detail the steps taken to train the model and show that it outperforms its generic benchmarks for finance-specific downstream tasks.
  This version of German FinBERT starts with the [gbert-base](https://huggingface.co/deepset/gbert-base) model and continues pre-training on finance-specific textual data.
  
  ## Overview
- **Author:** [here](https://arxiv.org/pdf/2010.10906.pdf) (UPDATE!)
+ **Author:** Moritz Scherrmann
+ **Paper:** [here](https://arxiv.org/pdf/2311.08793.pdf)
  **Architecture:** BERT base
  **Language:** German
  **Specialization:** Financial textual data
@@ -20,13 +21,13 @@ This version of German FinBERT starts with the [gbert-base](https://huggingface.
  German FinBERT's pre-training corpus includes a diverse range of financial documents, such as Bundesanzeiger reports, Handelsblatt articles, MarketScreener data, and additional sources including FAZ, ad-hoc announcements, LexisNexis & Event Registry content, Zeit Online articles, Wikipedia entries, and Gabler Wirtschaftslexikon. In total, the corpus spans from 1996 to 2023, consisting of 12.15 million documents with 10.12 billion tokens over 53.19 GB.
  
  I further pre-train the model for 10,400 steps with a batch size of 4096, which is one epoch. I use an Adam optimizer with decoupled weight decay regularization, with Adam parameters β1 = 0.9, β2 = 0.98, ε = 1e-6, a weight
- decay of 1e-5 and a maximal learning of 1e-4. . I train the model using a Nvidia DGX A100 node consisting of 8 A100 GPUs with 80 GB of memory each.
+ decay of 1e-5 and a maximal learning rate of 1e-4. I train the model using an Nvidia DGX A100 node consisting of 8 A100 GPUs with 80 GB of memory each.
  
  ## Performance
  ### Fine-tune Datasets
  To fine-tune the model, I use several datasets, including:
- - A manually labeled [multi-label database of German ad-hoc announcements](https://arxiv.org/pdf/2010.10906.pdf) (UPDATE!!!) containing 31,771 sentences, each associated with up to 20 possible topics.
- - An extractive question-answering dataset based on the SQuAD format, which was created using 3,044 ad-hoc announcements processed by OpenAI's ChatGPT to generate and answer questions.
+ - A manually labeled [multi-label database of German ad-hoc announcements](https://arxiv.org/pdf/2311.07598.pdf) containing 31,771 sentences, each associated with up to 20 possible topics.
+ - An extractive question-answering dataset based on the SQuAD format, which was created using 3,044 ad-hoc announcements processed by OpenAI's ChatGPT to generate and answer questions (see [here](https://huggingface.co/datasets/scherrmann/adhoc_quad)).
  - The [financial phrase bank](https://arxiv.org/abs/1307.5336) of Malo et al. (2013) for sentiment classification, translated to German using [DeepL](https://www.deepl.com/translator).
  
  ### Benchmark Results
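As a usage note for this model card: below is a minimal, hedged sketch of running the further pre-trained checkpoint for masked-token prediction with the Transformers pipeline API. The repository id is a hypothetical placeholder inferred from the author handle on this commit and is not stated in the card; replace it with the actual model id.

```python
# Minimal fill-mask usage sketch for the model described in this card.
# NOTE: "scherrmann/GermanFinBERT_FP" is a hypothetical placeholder id;
# substitute the actual repository id of this model.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="scherrmann/GermanFinBERT_FP")

# BERT-style models expect the [MASK] token.
print(fill_mask("Der Vorstand erwartet einen deutlichen Anstieg des [MASK]."))
```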
 
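To illustrate the pre-training paragraph in the diff above, here is a hedged PyTorch sketch of the stated optimizer settings (Adam with decoupled weight decay, betas 0.9/0.98, epsilon 1e-6, weight decay 1e-5, peak learning rate 1e-4), starting from the gbert-base checkpoint named in the card. The training loop, learning-rate schedule, and 8-GPU setup are omitted and are not described here.

```python
# Sketch of the stated optimizer configuration; illustrative only, not the
# author's training code. The full run (10,400 steps, batch size 4096 on a
# DGX A100 node with 8 GPUs) is not reproduced here.
import torch
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("deepset/gbert-base")  # starting checkpoint per the card

optimizer = torch.optim.AdamW(   # Adam with decoupled weight decay
    model.parameters(),
    lr=1e-4,                     # maximal (peak) learning rate
    betas=(0.9, 0.98),           # Adam beta1, beta2
    eps=1e-6,                    # Adam epsilon
    weight_decay=1e-5,
)
```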
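The extractive question-answering dataset referenced in the fine-tune list is linked on the Hugging Face Hub; a short sketch of loading it with the datasets library follows. The dataset id is taken from the link in the card, while split names and columns are not documented here and should be checked on the dataset page.

```python
# Load the SQuAD-style ad-hoc announcement QA dataset linked in the card.
# Dataset id taken from the card's link; splits/fields are not verified here.
from datasets import load_dataset

adhoc_quad = load_dataset("scherrmann/adhoc_quad")
print(adhoc_quad)  # inspect the available splits and columns
```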