TinyBERT for Demo NER (German)

Model Description

This model is a fine-tuned TinyBERT model for Named Entity Recognition (NER) of DISORDER_FINDING entities in German medical texts.

It was fine-tuned from the DedalusHealthCare/tinybert-mlm-de masked language model using the DedalusHealthCare/ner_demo_de dataset.

Base Model: DedalusHealthCare/tinybert-mlm-de

Training Dataset: DedalusHealthCare/ner_demo_de

Task: Token Classification (Named Entity Recognition)

Language: German (de)

Entities: DISORDER_FINDING

Model Format: PYTORCH+ONNX

Use aggregation_strategy="max" in the NER pipeline, as in the usage examples below.

Training Details

  • Training epochs: 1
  • Learning rate: N/A
  • Training batch size: 32
  • Evaluation batch size: 32
  • Max sequence length: 256
  • Warmup steps: N/A
  • FP16: False
  • Gradient accumulation steps: 2
  • Evaluation accumulation steps: 2
  • Save steps: 15000
  • Evaluation steps: 10000
  • Evaluation strategy: steps
  • Random seed: 33
  • Label all tokens: True
  • Balanced training: False
  • Chunk mode: sliding_window
  • Stride: 16
  • Max training samples: None
  • Max evaluation samples: 10000
  • Early stopping patience: 0
  • Early stopping threshold: 0.0
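The chunking settings above (max sequence length 256, chunk mode sliding_window, stride 16) can be illustrated with a minimal sketch. It assumes that "stride" means the number of tokens shared between consecutive windows, as in the Transformers tokenizer `stride` argument; the training code may define it differently. The function name and the `overlap` parameter are hypothetical, and special tokens such as [CLS] and [SEP] are ignored.

```python
def sliding_window_chunks(token_ids, max_length=256, overlap=16):
    """Split a token sequence into windows of at most max_length tokens.

    Consecutive windows share `overlap` tokens (an assumed reading of "Stride: 16").
    """
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must be in [0, max_length)")
    step = max_length - overlap  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # the last window reached the end of the sequence
        start += step
    return chunks


# A document of 600 token ids yields three overlapping windows.
chunks = sliding_window_chunks(list(range(600)))
print([(c[0], c[-1], len(c)) for c in chunks])
# [(0, 255, 256), (240, 495, 256), (480, 599, 120)]
```

With these settings, an entity that straddles a window boundary is seen in full by at least one window as long as it is shorter than the overlap.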

Use Case Configuration

  • Use case name: demo
  • Language: German (de)
  • Target entities: DISORDER_FINDING
  • Text processing max length: N/A
  • Entity labeling scheme: N/A

Usage

Using Transformers Pipeline

from transformers import pipeline

# Load the model
ner_pipeline = pipeline(
    "ner",
    model="DedalusHealthCare/tinybert-ner-demo-de",
    tokenizer="DedalusHealthCare/tinybert-ner-demo-de",
    aggregation_strategy="max"
)

# Example text
text = "Der Patient hat Diabetes und Bluthochdruck."

# Get predictions
entities = ner_pipeline(text)
print(entities)
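To make the aggregation setting more concrete: with the max strategy, the pipeline labels each word with the prediction of its most confident sub-token. The following is a simplified sketch of that idea, not the pipeline's actual implementation; the function name, the two-label map, and the scores are hypothetical.

```python
def aggregate_word_max(subtoken_scores, id2label):
    """Pick one label for a word from its sub-token score vectors.

    subtoken_scores: one probability list per sub-token, indexed by label id.
    Returns (label, score) taken from the single most confident sub-token.
    """
    # Sub-token whose highest probability is the largest of all sub-tokens.
    best = max(subtoken_scores, key=max)
    # Label id with the highest probability inside that sub-token.
    label_id = max(range(len(best)), key=best.__getitem__)
    return id2label[label_id], best[label_id]


# Hypothetical label map and scores for a word split into three sub-tokens.
id2label = {0: "O", 1: "DISORDER_FINDING"}
scores = [[0.55, 0.45], [0.08, 0.92], [0.40, 0.60]]
label, score = aggregate_word_max(scores, id2label)
print(label, score)  # DISORDER_FINDING 0.92
```

In this example a "first sub-token" strategy would label the word O, whereas max follows the confident second sub-token.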

Using AutoModel and AutoTokenizer

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "DedalusHealthCare/tinybert-ner-demo-de"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Tokenize text
text = "Der Patient hat Diabetes und Bluthochdruck."
tokens = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get predictions
with torch.no_grad():
    outputs = model(**tokens)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

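# Map each predicted class id to its label name
# (one label per sub-token, including special tokens such as [CLS] and [SEP])
predicted_token_class_ids = predictions.argmax(-1)
labels = [model.config.id2label[class_id.item()] for class_id in predicted_token_class_ids[0]]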
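The per-token labels above still have to be merged into entity spans. A minimal sketch of that step follows, assuming plain IO labels ("O" and "DISORDER_FINDING" without B-/I- prefixes) and character offsets such as those returned by a fast tokenizer called with `return_offsets_mapping=True`. The function name is hypothetical, and the offsets and labels in the example are written by hand rather than produced by the model.

```python
def labels_to_spans(text, offsets, labels, outside="O"):
    """Merge consecutive tokens with the same entity label into character spans."""
    spans, current = [], None
    for (start, end), label in zip(offsets, labels):
        # Special tokens have empty offsets (start == end) and never count as entities.
        is_entity = start != end and label != outside
        if is_entity and current and current["label"] == label:
            current["end"] = end  # extend the running span
            continue
        if current:
            spans.append(current)
            current = None
        if is_entity:
            current = {"label": label, "start": start, "end": end}
    if current:
        spans.append(current)
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


text = "Der Patient hat Diabetes und Bluthochdruck."
# Hand-written offsets and labels for illustration only.
offsets = [(0, 0), (0, 3), (4, 11), (12, 15), (16, 24),
           (25, 28), (29, 33), (33, 42), (42, 43), (0, 0)]
labels = ["O", "O", "O", "O", "DISORDER_FINDING",
          "O", "DISORDER_FINDING", "DISORDER_FINDING", "O", "O"]
spans = labels_to_spans(text, offsets, labels)
print([(s["text"], s["start"], s["end"]) for s in spans])
# [('Diabetes', 16, 24), ('Bluthochdruck', 29, 42)]
```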

Using ONNX Runtime (Optimized Inference)

from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer, pipeline
import torch

# Load ONNX model for faster inference
model_name = "DedalusHealthCare/tinybert-ner-demo-de"
onnx_model = ORTModelForTokenClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create pipeline with ONNX model (recommended)
ner_pipeline = pipeline(
    "ner",
    model=onnx_model,
    tokenizer=tokenizer,
    aggregation_strategy="max"
)

# Example text
text = "Der Patient hat Diabetes und Bluthochdruck."
entities = ner_pipeline(text)
print(entities)

# Direct model usage
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = onnx_model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Map each predicted class id to its label name (one label per sub-token)
predicted_token_class_ids = predictions.argmax(-1)
token_labels = [onnx_model.config.id2label[class_id.item()] for class_id in predicted_token_class_ids[0]]

Performance Comparison

  • PyTorch: Standard format, suitable for training and research
  • ONNX: Optimized for inference, typically 2-4x faster than PyTorch; the actual speedup depends on hardware, batch size, and sequence length
  • Recommendation: Use ONNX for production inference, PyTorch for research
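Since the speedup depends on the deployment environment, it is worth measuring on your own hardware. The helper below is a minimal, hypothetical timing sketch: it reports the median latency of any callable after a few warm-up calls. The stand-in workload is only there to keep the example self-contained; in practice you would pass the PyTorch and ONNX pipelines in turn and compare the two medians.

```python
import statistics
import time


def median_latency(fn, *args, warmup=3, runs=20):
    """Return the median wall-clock time in seconds of fn(*args) over `runs` calls."""
    for _ in range(warmup):
        fn(*args)  # warm-up calls are not timed
    timings = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - t0)
    return statistics.median(timings)


# Stand-in workload; replace the lambda with a pipeline call to benchmark a model.
latency = median_latency(lambda s: s.upper(), "Der Patient hat Diabetes und Bluthochdruck.")
print(f"median latency: {latency * 1000:.4f} ms")
```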

Model Architecture

This model is based on the TinyBERT architecture with a token classification head for Named Entity Recognition.

Intended Use

This model is intended for:

  • Named Entity Recognition in German medical texts
  • Identification of DISORDER_FINDING entities
  • Medical text processing and analysis
  • Research and development in medical NLP

Limitations

  • Trained specifically for German medical texts
  • Performance may vary on texts from different medical domains
  • May not generalize well to non-medical texts
  • Requires careful evaluation on new datasets

Ethical Considerations

  • This model is trained on medical data and should be used responsibly
  • Outputs should be validated by medical professionals
  • Patient privacy and data protection regulations must be followed
  • The model may have biases present in the training data

Model Performance

This model has been evaluated on the goldset from ner_disorderfinding_de_goldset using IO evaluation (sklearn, token level, lenient) with the following results:

Overall Performance

Metric                  Score
Precision (Macro)       0.423825
Recall (Macro)          0.467183
F1-Score (Macro)        0.435170
Precision (Weighted)    0.599471
Recall (Weighted)       0.697989
F1-Score (Weighted)     0.640426

Inference Time: 5.53 seconds for the evaluation dataset
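The macro and weighted rows differ only in how per-class scores are combined: macro averaging gives every class the same weight, while weighted averaging weights each class by its support (number of gold tokens). The sketch below shows both on made-up per-class values; the function name and the numbers are hypothetical and are not this model's results.

```python
def averaged_f1(per_class):
    """per_class maps label -> (f1, support); returns (macro_f1, weighted_f1)."""
    macro = sum(f1 for f1, _ in per_class.values()) / len(per_class)
    total = sum(support for _, support in per_class.values())
    weighted = sum(f1 * support for f1, support in per_class.values()) / total
    return macro, weighted


# Hypothetical per-class token-level scores for illustration.
toy = {"O": (0.90, 900), "DISORDER_FINDING": (0.60, 100)}
macro, weighted = averaged_f1(toy)
print(round(macro, 3), round(weighted, 3))  # 0.75 0.87
```

When one class dominates the token count, as the outside class usually does in NER, the weighted average moves toward that class's score.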

Entity-Level Performance (IO Evaluation)

Entity Type         Precision    Recall      F1-Score    Support
DISORDER_FINDING    0.753533     0.900434    0.820460    N/A
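As a rough illustration of token-level IO scoring, the sketch below drops any B-/I- prefix, then counts true positives, false positives, and false negatives per token for one target label. This is an assumed reading of "IO evaluation (token level, lenient)"; the actual evaluation script may differ, and the helper names and toy sequences are hypothetical. The last lines check that the reported F1 is the harmonic mean of the reported precision and recall.

```python
def to_io(label):
    """Drop a B-/I- prefix so only the entity type (or O) remains."""
    return label[2:] if label[:2] in ("B-", "I-") else label


def f1(precision, recall):
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def token_prf(gold, pred, target):
    """Token-level precision, recall, and F1 for one target label."""
    gold = [to_io(g) for g in gold]
    pred = [to_io(p) for p in pred]
    tp = sum(g == target and p == target for g, p in zip(gold, pred))
    fp = sum(g != target and p == target for g, p in zip(gold, pred))
    fn = sum(g == target and p != target for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1(precision, recall)


# Toy sequences for illustration only.
gold = ["O", "B-DISORDER_FINDING", "I-DISORDER_FINDING", "O", "O"]
pred = ["O", "DISORDER_FINDING", "O", "DISORDER_FINDING", "O"]
print(token_prf(gold, pred, "DISORDER_FINDING"))  # (0.5, 0.5, 0.5)

# Consistency check on the reported precision and recall.
reported_f1 = f1(0.753533, 0.900434)
print(round(reported_f1, 4))  # 0.8205
```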

Evaluation Details

  • Dataset: goldset from ner_disorderfinding_de_goldset
  • Dataset Source: goldset
  • Evaluation Date: 2025-11-03 12:25:56
  • Language: de
  • Entities: DISORDER_FINDING

This evaluation section is automatically generated and updated.

Citation

If you use this model, please cite:

@misc{demo_de_ner_model,
  title = {TinyBERT for Demo NER (German)},
  author = {{DH Healthcare GmbH}},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DedalusHealthCare/tinybert-ner-demo-de}
}

License

This model is proprietary and owned by DH Healthcare GmbH. All rights reserved.

Contact

For questions or support, please contact DH Healthcare GmbH.

Model size: 12.2M params (Safetensors, tensor type F32)