German Medical NER

Acknowledgement

This model was created as part of joint research of the HUMADEX research group (https://www.linkedin.com/company/101563689/) and has received funding from the European Union Horizon Europe Research and Innovation Program project SMILE (grant number 101080923) and the Marie Skłodowska-Curie Actions (MSCA) Doctoral Networks project BosomShield (grant number 101073222). Responsibility for the information and views expressed herein lies entirely with the authors. Authors: dr. Izidor Mlakar, Rigon Sallauka, dr. Umut Arioz, dr. Matej Rojc

Publication

The paper associated with this model has been published: 10.3390/app15105585

Please cite this paper as follows if you use this model or build upon this work. Your citation supports the authors and the continued development of this research.

@article{app15105585,
  author  = {Sallauka, Rigon and Arioz, Umut and Rojc, Matej and Mlakar, Izidor},
  title   = {Weakly-Supervised Multilingual Medical NER for Symptom Extraction for Low-Resource Languages},
  journal = {Applied Sciences},
  volume  = {15},
  year    = {2025},
  number  = {10},
  article-number = {5585},
  url     = {https://www.mdpi.com/2076-3417/15/10/5585},
  issn    = {2076-3417},
  doi     = {10.3390/app15105585}
}

Use

Primary Use Case: This model extracts medical entities such as symptoms, diagnostic tests, and treatments from German clinical text.
Applications: Suitable for healthcare professionals, clinical data analysis, and research into medical text processing.
Supported Entity Types:
- PROBLEM: Diseases, symptoms, and medical conditions.
- TEST: Diagnostic procedures and laboratory tests.
- TREATMENT: Medications, therapies, and other medical interventions.
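
To illustrate how the three entity types might be consumed downstream, here is a minimal sketch that merges token-level tags into entity spans. It assumes the model emits BIO-style labels such as B-PROBLEM / I-PROBLEM / O, which this card does not confirm; the function name group_entities and the hand-written tags below are hypothetical and are not actual model output.

```python
from typing import List, Optional, Tuple


def group_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge consecutive B-/I- tagged tokens into (entity_type, text) pairs.

    Assumes BIO-style tags (an assumption, not confirmed by the model card).
    """
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the entity that is currently being built, if any.
        nonlocal current_type, current_tokens
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A B- tag, or an I- tag of a different type, starts a new entity.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the current entity.
            current_tokens.append(token)
        else:
            # An O tag ends any open entity.
            flush()
    flush()
    return entities


# Hand-written illustrative tags, not real predictions.
tokens = ["starke", "Kopfschmerzen", "und", "Übelkeit", ",", "Paracetamol"]
tags = ["B-PROBLEM", "I-PROBLEM", "O", "B-PROBLEM", "O", "B-TREATMENT"]
print(group_entities(tokens, tags))
# [('PROBLEM', 'starke Kopfschmerzen'), ('PROBLEM', 'Übelkeit'), ('TREATMENT', 'Paracetamol')]
```

If the model's label set turns out to use a different scheme, only the prefix checks would need to change.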

Training Data

Data Sources: Annotated datasets, including clinical data and translations of English medical text into German.
Data Augmentation: Augmentation techniques were applied to the training dataset to improve the model's ability to generalize across different text structures.
Dataset Split:
- Training Set: 80%
- Validation Set: 10%
- Test Set: 10%
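
The 80/10/10 proportions above can be reproduced with a shuffled split along the following lines. This is only a sketch: the card does not say whether the split was random, at which granularity (sentence or document) it was made, or which seed was used, so the function split_dataset and its seed parameter are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(examples: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the examples and split them 80/10/10 into train/validation/test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (assumed)
    n = len(shuffled)
    n_train = n * 80 // 100  # integer arithmetic avoids float rounding
    n_val = n * 10 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```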

Model Training

Training Configuration:
- Optimizer: AdamW
- Learning Rate: 3e-5
- Batch Size: 64
- Epochs: 200
- Loss Function: Focal Loss to handle class imbalance
Frameworks: PyTorch, Hugging Face Transformers, SimpleTransformers
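
For readers unfamiliar with focal loss, the sketch below shows the per-token formula FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. It is written in plain Python for clarity rather than as the PyTorch code used in training, and the focusing parameter gamma = 2.0 is an assumed value; the card does not state which gamma (or class weighting) was used.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Standard cross-entropy for one token: -log(p_t)."""
    return -math.log(probs[target])


def focal_loss(probs: Sequence[float], target: int, gamma: float = 2.0) -> float:
    """Focal loss for one token: -(1 - p_t)^gamma * log(p_t).

    gamma = 2.0 is an assumed default, not a value taken from the model card.
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# A confidently correct token (p_t = 0.9) versus a hard one (p_t = 0.1).
easy = [0.1, 0.9]
hard = [0.9, 0.1]
for name, probs in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(probs, 1)
    fl = focal_loss(probs, 1)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f} ratio={fl / ce:.2f}")
```

The factor (1 - p_t)^gamma shrinks the loss of well-classified tokens (typically the frequent O class) far more than that of misclassified ones, which is why the loss is commonly used for imbalanced label distributions. With gamma = 0 it reduces to ordinary cross-entropy.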

Evaluation Metrics

eval_loss = 0.2966328261132536
f1_score = 0.7869508628049208
precision = 0.7893554696639308
recall = 0.7845608617193459
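
As a quick sanity check, the reported F1 score appears consistent with the harmonic mean of the reported precision and recall, F1 = 2PR / (P + R). The snippet below recomputes it from the values above; whether the metrics are micro- or macro-averaged, and whether they are computed per entity or per token, is not stated in this card.

```python
# Values copied from the evaluation metrics above.
precision = 0.7893554696639308
recall = 0.7845608617193459
reported_f1 = 0.7869508628049208

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"recomputed F1 = {f1:.6f}")
print(f"reported F1   = {reported_f1:.6f}")
print(f"difference    = {abs(f1 - reported_f1):.2e}")
```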

Visit HUMADEX/Weekly-Supervised-NER-pipline for more info.

How to Use

You can easily use this model with the Hugging Face transformers library. Here's an example of how to load and use the model for inference:

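import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "HUMADEX/german_medical_ner"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Sample text for inference
text = "Der Patient klagte über starke Kopfschmerzen und Übelkeit, die seit zwei Tagen anhielten. Zur Linderung der Symptome wurde ihm Paracetamol verschrieben, und er wurde angewiesen, sich auszuruhen und viel Flüssigkeit zu trinken."

# Tokenize the input text
inputs = tokenizer(text, return_tensors="pt")

# Minimal sketch of one common way to finish inference: run a forward pass
# without gradient tracking and take the highest-scoring label per token
with torch.no_grad():
    logits = model(**inputs).logits
predicted_ids = logits.argmax(dim=-1)[0].tolist()

# Map token ids back to tokens and label ids to label names
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for token, label_id in zip(tokens, predicted_ids):
    print(token, model.config.id2label[label_id])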
