farnazzeidi committed
Commit 8d8409a · verified · 1 Parent(s): 6950fee

Update README.md

Files changed (1):
  1. README.md +18 -29
README.md CHANGED
@@ -7,32 +7,22 @@ base_model:
  pipeline_tag: token-classification
  ---
 
- # NER Model for Legal Texts
- 
- Released in January 2024, this is a Turkish BERT language model pretrained from scratch on an **optimized BERT architecture** using a 2 GB Turkish legal corpus. The corpus was sourced from legal-related thesis documents available in the Higher Education Board National Thesis Center (YÖKTEZ). The model has been fine-tuned for Named Entity Recognition (NER) tasks on human-annotated datasets provided by **NewMind**, a legal tech company in Istanbul, Turkey.
- 
- In our paper, we outline the steps taken to train this model and demonstrate its superior performance compared to previous approaches.
+ # MEDNER.DE: Medicinal Product Entity Recognition in German-Specific Contexts
+ 
+ Released in December 2024, this is a German BERT language model, further pretrained from `deepset/gbert-base` on a pharmacovigilance-related Case Summary Corpus (GS-Corpus). The model has been fine-tuned for Named Entity Recognition (NER) on an automatically annotated dataset to recognize medicinal products such as medications and vaccines.
+ In our paper, we outline the steps taken to train this model and demonstrate its superior performance compared to previous approaches.
+ 
 
  ---
 
  ## Overview
- - **Preprint Paper**: [https://arxiv.org/abs/2407.00648](https://arxiv.org/abs/2407.00648)
- - **Architecture**: Optimized BERT Base
- - **Language**: Turkish
- - **Supported Labels**:
-   - `Person`
-   - `Law`
-   - `Publication`
-   - `Government`
-   - `Corporation`
-   - `Other`
-   - `Project`
-   - `Money`
-   - `Date`
-   - `Location`
-   - `Court`
- 
- **Model Name**: LegalLTurk Optimized BERT
+ - **Paper**: [https://...
+ - **Architecture**: MLM-based BERT Base
+ - **Language**: German
+ - **Supported Labels**: Medicinal Product
+ 
+ 
+ **Model Name**: MEDNER.DE
 
  ---
 
@@ -43,10 +33,10 @@ In our paper, we outline the steps taken to train this model and demonstrate its
  from transformers import pipeline
 
  # Load the pipeline
- model = pipeline("ner", model="farnazzeidi/ner-legalturk-bert-model", aggregation_strategy='simple')
+ model = pipeline("ner", model="pei-germany/MEDNER-de-fp-gbert", aggregation_strategy='simple')
 
  # Input text
- text = "Burada, Tebligat Kanunu ile VUK düzenlemesi ayrımına dikkat etmek gerekir."
+ text = "Der Patient bekam den COVID-Impfstoff und nahm danach Aspirin."
 
  # Get predictions
  predictions = model(text)
@@ -61,10 +51,10 @@ import torch
 
  # Load model and tokenizer
 
- tokenizer = AutoTokenizer.from_pretrained("farnazzeidi/ner-legalturk-bert-model")
- model = AutoModelForTokenClassification.from_pretrained("farnazzeidi/ner-legalturk-bert-model")
+ tokenizer = AutoTokenizer.from_pretrained("pei-germany/MEDNER-de-fp-gbert")
+ model = AutoModelForTokenClassification.from_pretrained("pei-germany/MEDNER-de-fp-gbert")
 
- text = "Burada, Tebligat Kanunu ile VUK düzenlemesi ayrımına dikkat etmek gerekir."
+ text = "Der Patient bekam den COVID-Impfstoff und nahm danach Aspirin."
  inputs = tokenizer(text, return_tensors="pt")
  outputs = model(**inputs)
 
@@ -82,12 +72,11 @@ print(predictions)
  ```
  ---
  # Authors
- Farnaz Zeidi, Mehmet Fatih Amasyali, Çigdem Erol
+ ...
+ 
 
  ---
 
  ## License
- This model is shared under the [CC BY-NC-SA 4.0 License](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en).
- You are free to use, share, and adapt the model for non-commercial purposes, provided that you give appropriate credit to the authors.
+ This model is shared under the [GNU Affero General Public License v3.0](https://choosealicense.com/licenses/agpl-3.0/).
 
- For commercial use, please contact [zeidi.uni@gmail.com].
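A note for readers trying the new model card out: the committed pipeline example stops at `predictions = model(text)`. Below is a minimal sketch of consuming the aggregated output; the 0.5 confidence threshold and the printed fields are illustrative choices, not part of the committed README.

```python
from transformers import pipeline

# Load the pipeline as in the committed README
model = pipeline("ner", model="pei-germany/MEDNER-de-fp-gbert", aggregation_strategy='simple')

text = "Der Patient bekam den COVID-Impfstoff und nahm danach Aspirin."
predictions = model(text)

# With aggregation_strategy='simple', each entry is a dict with
# "entity_group", "score", "word", "start", and "end" keys.
for entity in predictions:
    if entity["score"] > 0.5:  # threshold chosen for illustration only
        print(f"{entity['word']} -> {entity['entity_group']} ({entity['score']:.3f})")
```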
 
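Similarly, the second snippet's hunk omits the unchanged post-processing between `outputs = model(**inputs)` and `print(predictions)`. One common way to turn the logits into (token, label) pairs is sketched below, assuming the model's standard `id2label` mapping; it is not necessarily the exact code retained in the README.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("pei-germany/MEDNER-de-fp-gbert")
model = AutoModelForTokenClassification.from_pretrained("pei-germany/MEDNER-de-fp-gbert")

text = "Der Patient bekam den COVID-Impfstoff und nahm danach Aspirin."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():  # inference only; no gradients needed
    outputs = model(**inputs)

# Pick the highest-scoring label for each sub-token and drop special tokens.
label_ids = outputs.logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predictions = [
    (token, model.config.id2label[i.item()])
    for token, i in zip(tokens, label_ids)
    if token not in tokenizer.all_special_tokens
]
print(predictions)
```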