---
license: afl-3.0
---

# Chinese Medical Named Entity Recognition

Project repository: https://github.com/iioSnail/chinese_medical_ner

Usage:

```python
from transformers import AutoModelForTokenClassification, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('iioSnail/bert-base-chinese-medical-ner')
model = AutoModelForTokenClassification.from_pretrained("iioSnail/bert-base-chinese-medical-ner")

sentences = ["瘦脸针、水光针和玻尿酸详解!", "半月板钙化的病因有哪些?"]
inputs = tokenizer(sentences, return_tensors="pt", padding=True, add_special_tokens=False)
outputs = model(**inputs)
outputs = outputs.logits.argmax(-1) * inputs['attention_mask']

print(outputs)
```

Output:

```
tensor([[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 4, 4],
        [1, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 4, 0, 0]])
```

Here `1=B, 2=I, 3=E, 4=O` (a `0` marks a padding position zeroed out by the attention mask). A `1, 3` pair marks a two-character medical entity, `1, 2, 3` a three-character entity, `1, 2, 2, 3` a four-character entity, and so on.

You can use `MedicalNerModel.format_outputs(sentences, outputs)` from the project to convert these outputs into entity spans.

The converted result looks like this:

```
[
    [
        {'start': 0, 'end': 3, 'word': '瘦脸针'},
        {'start': 4, 'end': 7, 'word': '水光针'},
        {'start': 8, 'end': 11, 'word': '玻尿酸'}
    ],
    [
        {'start': 0, 'end': 5, 'word': '半月板钙化'}
    ]
]
```
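To illustrate how the BIEO tagging scheme maps to the spans above, a minimal stand-alone decoder might look like the sketch below. `decode_bieo` is a hypothetical name for illustration, not part of the project; each row of the output tensor would be passed as a plain list of ints (e.g. via `outputs.tolist()`).

```python
def decode_bieo(sentence, tags):
    """Collect {'start', 'end', 'word'} spans for tag runs of the form B (I)* E.

    Tags follow the scheme above: 1=B, 2=I, 3=E, 4=O, 0=padding.
    'end' is exclusive, matching the project's output format.
    """
    entities = []
    start = None
    for i, tag in enumerate(tags):
        if tag == 1:  # B: an entity begins at this character
            start = i
        elif tag == 3 and start is not None:  # E: entity ends here (inclusive)
            entities.append({'start': start, 'end': i + 1,
                             'word': sentence[start:i + 1]})
            start = None
        elif tag in (0, 4):  # O or padding: discard any unfinished entity
            start = None
        # tag == 2 (I): inside an entity, keep scanning
    return entities
```

For real use, prefer the project's `MedicalNerModel.format_outputs`, which is the supported path.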

For more details, see the project repository: https://github.com/iioSnail/chinese_medical_ner