---
license: agpl-3.0
language:
- de
base_model:
- deepset/gbert-base
pipeline_tag: token-classification
---

# MEDNER.DE: Medicinal Product Entity Recognition in German-Specific Contexts

Released in December 2024, MEDNER.DE is a German BERT language model: `deepset/gbert-base` further pretrained on a corpus of pharmacovigilance-related case summaries and then fine-tuned for Named Entity Recognition (NER) on an automatically annotated dataset, so that it recognizes medicinal products such as medications and vaccines.  
In our paper, we outline the steps taken to train this model and demonstrate its superior performance compared to previous approaches.


---

## Overview
- **Model Name**: MEDNER.DE
- **Paper**: https://...
- **Architecture**: MLM-based BERT base
- **Language**: German
- **Supported Labels**: Medicinal Product

---

## How to Use

### Use a pipeline as a high-level helper
```python
from transformers import pipeline

# Load the NER pipeline (aggregation_strategy="none" returns raw subword predictions)
ner_pipeline = pipeline("ner", model="pei-germany/MEDNER-de-fp-gbert", aggregation_strategy="none")

# Input text
text = "Der Patient wurde mit AstraZeneca geimpft und nahm anschließend Ibuprofen, um das Fieber zu senken."

# Get raw predictions and merge subwords
merged_predictions = []
current = None

for pred in ner_pipeline(text):
    if pred['word'].startswith("##"):
        if current:
            current['word'] += pred['word'][2:]
            current['end'] = pred['end']
            current['score'] = (current['score'] + pred['score']) / 2  # running blend of subword scores
    else:
        if current:
            merged_predictions.append(current)
        current = pred.copy()

if current:
    merged_predictions.append(current)

# Filter by confidence threshold and print
threshold = 0.5
filtered_predictions = [p for p in merged_predictions if p['score'] >= threshold]
for p in filtered_predictions:
    print(f"Entity: {p['entity']}, Word: {p['word']}, Score: {p['score']:.2f}, Start: {p['start']}, End: {p['end']}")

```
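Each prediction also carries character offsets (`start`, `end`) into the original string, which makes it straightforward to highlight detected entities in context. A minimal sketch of that idea (the offsets in the example are hypothetical, not model output):

```python
def highlight_entities(text, predictions, marker=("[", "]")):
    """Wrap each predicted entity span in markers, using character offsets."""
    out = []
    last = 0
    for p in sorted(predictions, key=lambda p: p["start"]):
        out.append(text[last:p["start"]])
        out.append(marker[0] + text[p["start"]:p["end"]] + marker[1])
        last = p["end"]
    out.append(text[last:])
    return "".join(out)

text = "Der Patient wurde mit AstraZeneca geimpft."
preds = [{"start": 22, "end": 33}]  # hypothetical offsets for "AstraZeneca"
print(highlight_entities(text, preds))
# → Der Patient wurde mit [AstraZeneca] geimpft.
```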


### Load model directly
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("pei-germany/MEDNER-de-fp-gbert")
model = AutoModelForTokenClassification.from_pretrained("pei-germany/MEDNER-de-fp-gbert")

text = "Der Patient wurde mit AstraZeneca geimpft und nahm anschließend Ibuprofen, um das Fieber zu senken."

# Tokenize and get predictions (no gradients needed at inference time)
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Decode tokens and predictions
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predictions = torch.argmax(outputs.logits, dim=2)[0].tolist()
labels = [model.config.id2label[pred] for pred in predictions]

# Process and merge subwords
entities = []
current_word = ""
current_entity = None

for token, label in zip(tokens, labels):
    is_subword = token.startswith("##")
    token = token[2:] if is_subword else token  # Strip the WordPiece marker

    if label.startswith("B-"):  # Beginning of a new entity
        if current_entity == label[2:] and current_word:  # Merge consecutive B- labels
            current_word += token if is_subword else " " + token
        else:  # Save the previous entity and start a new one
            if current_word:
                entities.append({"entity": current_entity, "word": current_word})
            current_word = token
            current_entity = label[2:]
    elif label.startswith("I-") and current_entity == label[2:]:  # Continuation of the same entity
        current_word += token if is_subword else " " + token
    else:  # Outside any entity
        if current_word:  # Save the previous entity
            entities.append({"entity": current_entity, "word": current_word})
        current_word = ""
        current_entity = None

if current_word:  # Append the last entity
    entities.append({"entity": current_entity, "word": current_word})

# Print results
for entity in entities:
    print(f"Entity: {entity['entity']}, Word: {entity['word']}")

```
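The merge loop above can also be factored into a small standalone helper, which is easier to test without loading the model. The sketch below implements the same BIO/subword decoding; the token and label sequences in the example are illustrative, not actual model output, and the label names are hypothetical:

```python
def decode_bio(tokens, labels):
    """Collapse WordPiece tokens and BIO labels into entity spans.

    Subword pieces ("##...") are glued to the previous piece; new whole
    words inside the same entity are joined with a space.
    """
    entities, word, ent = [], "", None
    for tok, lab in zip(tokens, labels):
        is_sub = tok.startswith("##")
        piece = tok[2:] if is_sub else tok
        tag, etype = (lab[0], lab[2:]) if "-" in lab else ("O", None)
        if tag == "B" and not (is_sub and etype == ent):
            if word:  # close the previous entity
                entities.append({"entity": ent, "word": word})
            word, ent = piece, etype
        elif tag in ("B", "I") and etype == ent and word:
            word += piece if is_sub else " " + piece
        else:  # outside any entity
            if word:
                entities.append({"entity": ent, "word": word})
            word, ent = "", None
    if word:  # flush the last open entity
        entities.append({"entity": ent, "word": word})
    return entities

# Illustrative sequences (label names are hypothetical)
tokens = ["[CLS]", "Der", "Patient", "nahm", "I", "##bu", "##pro", "##fen", "[SEP]"]
labels = ["O", "O", "O", "O", "B-MEDPROD", "I-MEDPROD", "I-MEDPROD", "I-MEDPROD", "O"]
print(decode_bio(tokens, labels))
# → [{'entity': 'MEDPROD', 'word': 'Ibuprofen'}]
```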
---
## Authors
Farnaz Zeidi, Manuela Messelhäußer, Roman Christof, Xing David Wang, Ulf Leser, Dirk Mentzer, Renate König, Liam Childs.


---

## License
This model is shared under the [GNU Affero General Public License v3.0](https://choosealicense.com/licenses/agpl-3.0/).