# TinyBERT for Demo NER (German)
## Model Description
This model is a TinyBERT model fine-tuned for Named Entity Recognition (NER) of DISORDER_FINDING entities in German medical texts.
It was trained from the DedalusHealthCare/tinybert-mlm-de masked language model on the DedalusHealthCare/ner_demo_de dataset.
- Base Model: DedalusHealthCare/tinybert-mlm-de
- Training Dataset: DedalusHealthCare/ner_demo_de
- Task: Token Classification (Named Entity Recognition)
- Language: German (de)
- Entities: DISORDER_FINDING
- Model Format: PyTorch + ONNX
Please use `max` as the aggregation strategy in the NER pipeline (see the examples below).
## Training Details
- Training epochs: 1
- Learning rate: N/A
- Training batch size: 32
- Evaluation batch size: 32
- Max sequence length: 256
- Warmup steps: N/A
- FP16: False
- Gradient accumulation steps: 2
- Evaluation accumulation steps: 2
- Save steps: 15000
- Evaluation steps: 10000
- Evaluation strategy: steps
- Random seed: 33
- Label all tokens: True
- Balanced training: False
- Chunk mode: sliding_window (see the chunking sketch after this list)
- Stride: 16
- Max training samples: None
- Max evaluation samples: 10000
- Early stopping patience: 0
- Early stopping threshold: 0.0
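
The sliding-window chunk mode splits long documents into overlapping 256-token chunks with a 16-token stride. Below is a minimal sketch of this kind of chunking using the standard tokenizer API; it is an illustration based on the settings above, not the actual training code.

```python
from transformers import AutoTokenizer

# Illustration of sliding-window chunking with the documented settings
# (max_length=256, stride=16); the actual training pipeline may differ.
tokenizer = AutoTokenizer.from_pretrained("DedalusHealthCare/tinybert-ner-demo-de")

long_text = "Der Patient hat Diabetes und Bluthochdruck. " * 100  # > 256 tokens

encoded = tokenizer(
    long_text,
    max_length=256,
    stride=16,                       # 16-token overlap between consecutive chunks
    truncation=True,
    return_overflowing_tokens=True,  # return every chunk, not just the first
    padding="max_length",
)
print(f"{len(encoded['input_ids'])} chunks of 256 tokens each")
```

The overlap gives entities near a chunk boundary a chance to appear intact in at least one chunk.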
## Use Case Configuration
- Use case name: demo
- Language: German (de)
- Target entities: DISORDER_FINDING
- Text processing max length: N/A
- Entity labeling scheme: N/A
## Usage
### Using Transformers Pipeline
```python
from transformers import pipeline

# Load the model
ner_pipeline = pipeline(
    "ner",
    model="DedalusHealthCare/tinybert-ner-demo-de",
    tokenizer="DedalusHealthCare/tinybert-ner-demo-de",
    aggregation_strategy="max",
)

# Example text
text = "Der Patient hat Diabetes und Bluthochdruck."

# Get predictions
entities = ner_pipeline(text)
print(entities)
```
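
With an aggregation strategy set, the pipeline returns one dictionary per detected entity span, with `entity_group`, `score`, `word`, `start`, and `end` keys.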
### Using AutoModel and AutoTokenizer
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "DedalusHealthCare/tinybert-ner-demo-de"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Tokenize text
text = "Der Patient hat Diabetes und Bluthochdruck."
tokens = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get predictions
with torch.no_grad():
    outputs = model(**tokens)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Map each token's highest-probability class ID to its label name
predicted_token_class_ids = predictions.argmax(-1)
labels = [model.config.id2label[id.item()] for id in predicted_token_class_ids[0]]
```
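
Note that `labels` contains one entry per subword token, including the special `[CLS]` and `[SEP]` positions. A small sketch for mapping predictions back to words, assuming a fast tokenizer (which provides `word_ids()`):

```python
# Map subword predictions back to words (requires a fast tokenizer)
word_ids = tokens.word_ids(batch_index=0)
subwords = tokenizer.convert_ids_to_tokens(tokens["input_ids"][0].tolist())
for subword, word_id, label in zip(subwords, word_ids, labels):
    if word_id is not None:  # skip [CLS]/[SEP]
        print(f"{subword}\t{label}")
```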
### Using ONNX Runtime (Optimized Inference)
```python
from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer, pipeline
import torch

# Load the ONNX model for faster inference
model_name = "DedalusHealthCare/tinybert-ner-demo-de"
onnx_model = ORTModelForTokenClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a pipeline with the ONNX model (recommended)
ner_pipeline = pipeline(
    "ner",
    model=onnx_model,
    tokenizer=tokenizer,
    aggregation_strategy="max",
)

# Example text
text = "Der Patient hat Diabetes und Bluthochdruck."
entities = ner_pipeline(text)
print(entities)

# Direct model usage
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = onnx_model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_token_class_ids = predictions.argmax(-1)
token_labels = [onnx_model.config.id2label[id.item()] for id in predicted_token_class_ids[0]]
```
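
This repository ships ONNX weights, so the plain `from_pretrained` call above is sufficient. If you start from a checkpoint without ONNX weights, `ORTModelForTokenClassification.from_pretrained(model_name, export=True)` converts the PyTorch weights on the fly.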
## Performance Comparison
- PyTorch: Standard format, suitable for training and research
- ONNX: Optimized for inference, typically 2-4x faster than PyTorch
- Recommendation: Use ONNX for production inference, PyTorch for research
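
Actual speed-ups depend on hardware, batch size, and sequence length. A minimal sketch for measuring the difference on your own machine (the iteration count here is an arbitrary choice):

```python
import time

import torch
from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "DedalusHealthCare/tinybert-ner-demo-de"
tokenizer = AutoTokenizer.from_pretrained(model_name)
pt_model = AutoModelForTokenClassification.from_pretrained(model_name)
onnx_model = ORTModelForTokenClassification.from_pretrained(model_name)

inputs = tokenizer("Der Patient hat Diabetes und Bluthochdruck.", return_tensors="pt")

def mean_latency(model, n_runs=100):
    # Average wall-clock time per forward pass over n_runs iterations
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(n_runs):
            model(**inputs)
    return (time.perf_counter() - start) / n_runs

print(f"PyTorch: {mean_latency(pt_model) * 1000:.2f} ms/run")
print(f"ONNX:    {mean_latency(onnx_model) * 1000:.2f} ms/run")
```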
## Model Architecture
This model is based on the TinyBERT architecture with a token classification head for Named Entity Recognition.
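
To inspect the concrete dimensions and the label set, you can read the published config (these are standard `transformers` config attributes):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("DedalusHealthCare/tinybert-ner-demo-de")

# Core TinyBERT dimensions and the NER label mapping
print("Hidden size:    ", config.hidden_size)
print("Layers:         ", config.num_hidden_layers)
print("Attention heads:", config.num_attention_heads)
print("Labels:         ", config.id2label)
```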
## Intended Use
This model is intended for:
- Named Entity Recognition in German medical texts
- Identification of DISORDER_FINDING entities
- Medical text processing and analysis
- Research and development in medical NLP
## Limitations
- Trained specifically for German medical texts
- Performance may vary on texts from different medical domains
- May not generalize well to non-medical texts
- Requires careful evaluation on new datasets
## Ethical Considerations
- This model is trained on medical data and should be used responsibly
- Outputs should be validated by medical professionals
- Patient privacy and data protection regulations must be followed
- The model may have biases present in the training data
## Model Performance
This model has been evaluated on the goldset from ner_disorderfinding_de_goldset using token-level IO evaluation (sklearn, lenient matching), with the following results:
### Overall Performance
| Metric | Score |
|---|---|
| Precision (Macro) | 0.423825 |
| Recall (Macro) | 0.467183 |
| F1-Score (Macro) | 0.435170 |
| Precision (Weighted) | 0.599471 |
| Recall (Weighted) | 0.697989 |
| F1-Score (Weighted) | 0.640426 |
Inference time: 5.53 seconds on the evaluation dataset.
### Entity-Level Performance (IO Evaluation)
| Entity Type | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| DISORDER_FINDING | 0.753533 | 0.900434 | 0.820460 | N/A |
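
For reference, this is roughly how a token-level IO evaluation looks with scikit-learn; the gold and predicted label sequences below are illustrative placeholders, not the actual goldset:

```python
from sklearn.metrics import classification_report

# Illustrative token-level gold and predicted IO labels
# (placeholders, not the actual goldset annotations)
y_true = ["O", "DISORDER_FINDING", "O", "DISORDER_FINDING", "DISORDER_FINDING", "O"]
y_pred = ["O", "DISORDER_FINDING", "O", "O", "DISORDER_FINDING", "O"]

# Per-class plus macro/weighted precision, recall, and F1,
# matching the metrics reported above
print(classification_report(y_true, y_pred, digits=6))
```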
### Evaluation Details
- Dataset: goldset from ner_disorderfinding_de_goldset
- Dataset Source: goldset
- Evaluation Date: 2025-11-03 12:25:56
- Language: de
- Entities: DISORDER_FINDING
This evaluation section is automatically generated and updated.
## Citation
If you use this model, please cite:
```bibtex
@misc{demo_de_ner_model,
  title     = {TinyBERT for Demo NER (German)},
  author    = {DH Healthcare GmbH},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/DedalusHealthCare/tinybert-ner-demo-de}
}
```
## License
This model is proprietary and owned by DH Healthcare GmbH. All rights reserved.
## Contact
For questions or support, please contact DH Healthcare GmbH.