iioSnail/bert-base-chinese-medical-ner


iioSnail committed on Jun 27, 2023

Commit

3aa9e7d

·

1 Parent(s): 6d8d8e1

Update README.md

Files changed (1)

README.md +48 -0


@@ -1,3 +1,51 @@
 ---
 license: afl-3.0
 ---

+# Chinese Named Entity Recognition for the Medical Domain
+Project repository: https://github.com/iioSnail/chinese_medical_ner
+Usage:
+```
+from transformers import AutoModelForTokenClassification, BertTokenizerFast
+
+tokenizer = BertTokenizerFast.from_pretrained('iioSnail/bert-base-chinese-medical-ner')
+model = AutoModelForTokenClassification.from_pretrained("iioSnail/bert-base-chinese-medical-ner")
+
+sentences = ["瘦脸针、水光针和玻尿酸详解！", "半月板钙化的病因有哪些？"]
+# add_special_tokens=False omits [CLS]/[SEP], so the tag sequence starts at the first character of each sentence.
+inputs = tokenizer(sentences, return_tensors="pt", padding=True, add_special_tokens=False)
+outputs = model(**inputs)
+# Take the highest-scoring tag id per token; multiplying by the attention mask sets padded positions to 0.
+outputs = outputs.logits.argmax(-1) * inputs['attention_mask']
+print(outputs)
+```
+Output:
+```
+tensor([[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 4, 4],
+        [1, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 4, 0, 0]])
+```
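+Here `1=B, 2=I, 3=E, 4=O`. `1, 3` marks a two-character medical entity, `1, 2, 3` a three-character one, `1, 2, 2, 3` a four-character one, and so on. The trailing `0`s in the second row are padding positions zeroed by the attention mask.
+Use `MedicalNerModel.format_outputs(sentences, outputs)` from the project to convert the output into entity spans.
+The result looks like this: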
+```
+[
+  [
+    {'start': 0, 'end': 3, 'word': '瘦脸针'},
+    {'start': 4, 'end': 7, 'word': '水光针'},
+    {'start': 8, 'end': 11, 'word': '玻尿酸'}
+  ],
+  [
+    {'start': 0, 'end': 5, 'word': '半月板钙化'}
+  ]
+]
+```
+For more information, see the project: https://github.com/iioSnail/chinese_medical_ner
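
The tag scheme in the README (`1=B, 2=I, 3=E, 4=O`, with `0` at padded positions) can be decoded into entity spans with a few lines of plain Python. The sketch below is not the project's `MedicalNerModel.format_outputs`; `decode_bioe` is a hypothetical helper written for illustration, and it assumes each tag position corresponds to exactly one character of the sentence, as in the README example where special tokens are disabled.

```python
# Tag ids as described in the README: 1=B, 2=I, 3=E, 4=O; 0 marks padding.
B, I, E = 1, 2, 3


def decode_bioe(sentence, tag_ids):
    """Collect B (I)* E runs from tag_ids as character spans of sentence.

    Hypothetical helper for illustration; assumes one tag per character.
    """
    entities = []
    start = None  # index of the currently open B tag, if any
    for pos, tag in enumerate(tag_ids[:len(sentence)]):
        if tag == B:
            start = pos  # open a new entity (an unfinished one is dropped)
        elif tag == I:
            continue  # stay inside the open entity, if there is one
        elif tag == E and start is not None:
            end = pos + 1  # 'end' is exclusive, matching the README output
            entities.append(
                {"start": start, "end": end, "word": sentence[start:end]}
            )
            start = None
        else:
            start = None  # O, padding, or a stray E: nothing to close
    return entities


# The sentences and tag ids below are copied from the README example.
sentences = ["瘦脸针、水光针和玻尿酸详解！", "半月板钙化的病因有哪些？"]
tags = [
    [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 4, 4],
    [1, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 4, 0, 0],
]
results = [decode_bioe(s, t) for s, t in zip(sentences, tags)]
print(results)
```

To feed it the model output directly, the tensor from the usage snippet would first be converted with `outputs.tolist()`. Malformed runs (a `B` never closed by an `E`) are silently dropped here; whether the project's own formatter does the same is not stated in the README.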