iioSnail committed on
Commit 3aa9e7d · 1 Parent(s): 6d8d8e1

Update README.md

  1. README.md +48 -0
README.md CHANGED
@@ -1,3 +1,51 @@
  ---
  license: afl-3.0
  ---
+
+ # Chinese Named Entity Recognition for the Medical Domain
+
+ Project repository: https://github.com/iioSnail/chinese_medical_ner
+
+ Usage:
+
+ ```
+ import torch
+ from transformers import AutoModelForTokenClassification, BertTokenizerFast
+
+ tokenizer = BertTokenizerFast.from_pretrained("iioSnail/bert-base-chinese-medical-ner")
+ model = AutoModelForTokenClassification.from_pretrained("iioSnail/bert-base-chinese-medical-ner")
+
+ sentences = ["瘦脸针、水光针和玻尿酸详解!", "半月板钙化的病因有哪些?"]
+ # No [CLS]/[SEP] tokens are added, so for these sentences token i is character i.
+ inputs = tokenizer(sentences, return_tensors="pt", padding=True, add_special_tokens=False)
+ with torch.no_grad():  # inference only, no gradients needed
+     outputs = model(**inputs)
+ # Pick the highest-scoring label id per token and zero out padding positions.
+ outputs = outputs.logits.argmax(-1) * inputs["attention_mask"]
+
+ print(outputs)
+ ```
+
+ Output:
+
+ ```
+ tensor([[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 4, 4],
+         [1, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 4, 0, 0]])
+ ```
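+
+ Here `1=B, 2=I, 3=E, 4=O` (begin, inside, end, and outside of an entity), and `0` marks padding positions masked out by `attention_mask`. `1, 3` denotes a two-character medical entity, `1, 2, 3` a three-character medical entity, `1, 2, 2, 3` a four-character medical entity, and so on.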
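To make the mapping concrete, the short sketch below pairs each character of the first example sentence with its tag. The ids are copied by hand from the first row of the printed tensor; the `ID_TO_TAG` dictionary and the `"PAD"` name for id `0` are illustrative assumptions, not names taken from the project. It also assumes one token per character, which holds for this sentence.

```python
# Illustrative sketch: ID_TO_TAG and the "PAD" name are assumptions, not project names.
ID_TO_TAG = {0: "PAD", 1: "B", 2: "I", 3: "E", 4: "O"}

sentence = "瘦脸针、水光针和玻尿酸详解!"
# First row of the tensor printed above, copied by hand.
tag_ids = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 4, 4]

# One tag per character (assumes one token per character, no special tokens).
tagged = [(char, ID_TO_TAG[tag_id]) for char, tag_id in zip(sentence, tag_ids)]
for char, tag in tagged:
    print(char, tag)
```

Each `B ... E` run in the printed pairs corresponds to one entity, and the separators and trailing characters are tagged `O`.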
+
+ The project's `MedicalNerModel.format_outputs(sentences, outputs)` can be used to convert this output.
+
+ The result looks like this:
+
+ ```
+ [
+     [
+         {'start': 0, 'end': 3, 'word': '瘦脸针'},
+         {'start': 4, 'end': 7, 'word': '水光针'},
+         {'start': 8, 'end': 11, 'word': '玻尿酸'}
+     ],
+     [
+         {'start': 0, 'end': 5, 'word': '半月板钙化'}
+     ]
+ ]
+ ```
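For readers who do not have the project code at hand, a conversion of this kind could be sketched roughly as follows. This is not the project's `format_outputs` implementation. `decode_entities` is a hypothetical helper written for illustration, and it assumes one tag per character, with tag ids following the `1=B, 2=I, 3=E, 4=O` scheme and `0` for padding.

```python
def decode_entities(sentences, tag_rows):
    """Hypothetical helper: turn rows of tag ids into entity spans.

    Assumes 1=B, 2=I, 3=E, 4=O, 0=padding, and one tag per character.
    """
    results = []
    for sentence, tags in zip(sentences, tag_rows):
        entities = []
        start = None  # index where the currently open entity began, if any
        for i, tag in enumerate(tags[:len(sentence)]):  # ignore padding columns
            if tag == 1:  # B: open a new entity
                start = i
            elif tag == 3 and start is not None:  # E: close the open entity
                entities.append(
                    {"start": start, "end": i + 1, "word": sentence[start:i + 1]}
                )
                start = None
            elif tag != 2:  # O, padding, or a stray E: drop any unfinished entity
                start = None
        results.append(entities)
    return results


sentences = ["瘦脸针、水光针和玻尿酸详解!", "半月板钙化的病因有哪些?"]
# Tag ids copied from the tensor shown earlier.
tag_rows = [
    [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 4, 4],
    [1, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 4, 0, 0],
]
decoded = decode_entities(sentences, tag_rows)
print(decoded)
```

With the model output from the usage snippet, `outputs.tolist()` would supply `tag_rows` directly. The `end` index is exclusive, which matches the spans in the example result.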
+
+ For more information, see the project: https://github.com/iioSnail/chinese_medical_ner