---
language: de
license: mit
tags:
- flair
- token-classification
- sequence-tagger-model
base_model: deepset/gbert-base
widget:
- text: PASt ( KvD ) - Polizeipräsidium Westhessen [ Newsroom ] Wiesbaden ( ots )
    - Am Sonntag , den 27.01.2019 führte die Autobahnpolizei Wiesbaden in Zusammenarbeit
    mit der Präsidialwache in der Zeit von 11:00 - 16:00 Uhr eine Geschwindigkeitsmessung
    in der Baustelle der A66 am Wiesbadener Kreuz durch .
---

# Fine-tuned Flair Model on German MobIE Dataset with AutoTrain

This Flair model was fine-tuned on the
[German MobIE](https://aclanthology.org/2021.konvens-1.22/)
NER dataset, using GBERT Base as the backbone language model and the 🚀 [AutoTrain](https://github.com/huggingface/autotrain-advanced)
library.
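
A minimal usage sketch with Flair follows. It assumes this card describes the `bs16-e10-lr3e-05-1` checkpoint (the entry marked in bold in the results table) and that the `flair` package is installed; the helper name `tag_entities` is hypothetical and only serves the illustration.

```python
# Assumption: Hub ID of the checkpoint marked in bold in the results table.
MODEL_ID = "stefan-it/autotrain-flair-mobie-gbert_base-bs16-e10-lr3e-05-1"

# Pre-tokenized example taken from the widget text of this card.
EXAMPLE = (
    "Am Sonntag , den 27.01.2019 führte die Autobahnpolizei Wiesbaden "
    "in Zusammenarbeit mit der Präsidialwache in der Zeit von 11:00 - 16:00 Uhr "
    "eine Geschwindigkeitsmessung in der Baustelle der A66 am Wiesbadener Kreuz durch ."
)


def tag_entities(text, model_id=MODEL_ID):
    """Return (span text, label, score) triples predicted for `text`."""
    # Imported lazily so that defining this helper does not require Flair.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_id)  # fetches the model on first use
    # The example is already tokenized, so split on whitespace only.
    sentence = Sentence(text, use_tokenizer=False)
    tagger.predict(sentence)

    label_type = tagger.label_type  # ask the model instead of hard-coding a name
    results = []
    for span in sentence.get_spans(label_type):
        label = span.get_label(label_type)
        results.append((span.text, label.value, label.score))
    return results
```

Calling `tag_entities(EXAMPLE)` downloads the model and returns one triple per predicted entity span. The predicted labels depend on the checkpoint, so no output is shown here.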

## Dataset

The [German MobIE](https://github.com/DFKI-NLP/MobIE) dataset is a human-annotated German-language dataset
with 20 coarse- and fine-grained entity types and entity linking information for geographically linkable entities. It
consists of 3,232 social media texts and traffic reports with 91K tokens and contains 20.5K annotated
entities, 13.1K of which are linked to a knowledge base.

The following entity types are annotated:

* `location-stop`
* `trigger`
* `organization-company`
* `location-city`
* `location`
* `event-cause`
* `location-street`
* `time`
* `date`
* `number`
* `duration`
* `organization`
* `person`
* `set`
* `distance`
* `disaster-type`
* `money`
* `org-position`
* `percent`

## Fine-Tuning

The latest [Flair version](https://github.com/flairNLP/flair/tree/42ea3f6854eba04387c38045f160c18bdaac07dc) at the
time of training (commit `42ea3f6`) is used for fine-tuning. Additionally, the model is trained with the
[FLERT (Schweter and Akbik, 2020)](https://arxiv.org/abs/2011.06993) approach, because the MobIE dataset thankfully
comes with document boundary markers.

A hyper-parameter search over the following parameters is performed, with 5 different seeds per configuration:

* Batch Sizes: [`16`]
* Learning Rates: [`5e-05`, `3e-05`]

All models are trained with the awesome [AutoTrain Advanced](https://github.com/huggingface/autotrain-advanced) from
Hugging Face. More details can be found in this [repository](https://github.com/stefan-it/autotrain-flair-mobie).
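
The search grid can be enumerated in a few lines of Python. This is only an illustrative sketch, not the AutoTrain configuration itself. The epoch count of 10 is an assumption read off the `e10` part of the configuration names, the values 1 to 5 are the run indices used in the model names (the actual random seed values are not stated), and the Hub prefix is copied from the model links in the Results section.

```python
from itertools import product

BATCH_SIZES = [16]
LEARNING_RATES = [5e-05, 3e-05]
RUN_INDICES = [1, 2, 3, 4, 5]  # one run per seed; real seed values are not given
EPOCHS = 10  # assumption: inferred from "e10" in the configuration names

# Assumption: naming scheme copied from the model links in this card.
HUB_PREFIX = "stefan-it/autotrain-flair-mobie-gbert_base"


def config_name(batch_size, learning_rate, epochs=EPOCHS):
    """Build a configuration name such as 'bs16-e10-lr5e-05'."""
    return f"bs{batch_size}-e{epochs}-lr{learning_rate}"


# One entry per (batch size, learning rate, run index) combination.
runs = [
    f"{HUB_PREFIX}-{config_name(bs, lr)}-{run}"
    for bs, lr, run in product(BATCH_SIZES, LEARNING_RATES, RUN_INDICES)
]

for name in runs:
    print(name)
```

With one batch size, two learning rates and five seeds, the grid contains 10 runs in total.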

## Results

Micro F1-score on the development set is reported for each configuration and seed. The Average column gives the mean
and standard deviation over the 5 seeds:

| Configuration      | Seed 1          | Seed 2      | Seed 3      | Seed 4      | Seed 5       | Average         |
|--------------------|-----------------|-------------|-------------|-------------|--------------|-----------------|
| `bs16-e10-lr5e-05` | [0.8446][1]     | [0.8495][2] | [0.8455][3] | [0.8419][4] | [0.8476][5]  | 0.8458 ± 0.0029 |
| `bs16-e10-lr3e-05` | [**0.8392**][6] | [0.8445][7] | [0.8495][8] | [0.8381][9] | [0.8449][10] | 0.8432 ± 0.0046 |

[1]: https://hf.co/stefan-it/autotrain-flair-mobie-gbert_base-bs16-e10-lr5e-05-1
[2]: https://hf.co/stefan-it/autotrain-flair-mobie-gbert_base-bs16-e10-lr5e-05-2
[3]: https://hf.co/stefan-it/autotrain-flair-mobie-gbert_base-bs16-e10-lr5e-05-3
[4]: https://hf.co/stefan-it/autotrain-flair-mobie-gbert_base-bs16-e10-lr5e-05-4
[5]: https://hf.co/stefan-it/autotrain-flair-mobie-gbert_base-bs16-e10-lr5e-05-5
[6]: https://hf.co/stefan-it/autotrain-flair-mobie-gbert_base-bs16-e10-lr3e-05-1
[7]: https://hf.co/stefan-it/autotrain-flair-mobie-gbert_base-bs16-e10-lr3e-05-2
[8]: https://hf.co/stefan-it/autotrain-flair-mobie-gbert_base-bs16-e10-lr3e-05-3
[9]: https://hf.co/stefan-it/autotrain-flair-mobie-gbert_base-bs16-e10-lr3e-05-4
[10]: https://hf.co/stefan-it/autotrain-flair-mobie-gbert_base-bs16-e10-lr3e-05-5

The result in bold shows the performance of this model.

The Flair [training log](training.log) and [TensorBoard logs](tensorboard) are also uploaded to the model
hub.
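
The Average column in the results table can be recomputed from the per-seed scores. The sketch below assumes the reported spread is the sample standard deviation (n - 1 in the denominator). The card does not state which estimator was used, but this choice reproduces the reported values. The helper name `summarize` is hypothetical.

```python
from statistics import mean, stdev

# Development-set micro F1 per seed, copied from the results table.
dev_f1 = {
    "bs16-e10-lr5e-05": [0.8446, 0.8495, 0.8455, 0.8419, 0.8476],
    "bs16-e10-lr3e-05": [0.8392, 0.8445, 0.8495, 0.8381, 0.8449],
}


def summarize(scores):
    """Return (mean, sample standard deviation), both rounded to 4 decimals."""
    return round(mean(scores), 4), round(stdev(scores), 4)


summary = {config: summarize(scores) for config, scores in dev_f1.items()}

for config, (avg, std) in summary.items():
    print(f"{config}: {avg:.4f} ± {std:.4f}")
# bs16-e10-lr5e-05: 0.8458 ± 0.0029
# bs16-e10-lr3e-05: 0.8432 ± 0.0046
```

The two averages lie within one standard deviation of each other, so the difference between the learning rates is small compared with the seed-to-seed variation.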