qixun/bert-chinese-poem


qixun committed on Jun 24, 2024

Commit b09569b · verified · 1 Parent(s): e12e88c

Update README.md

Files changed (1)

README.md +42 -3

README.md CHANGED

@@ -1,3 +1,42 @@
----
-license: gpl-3.0
----

+---
+license: gpl-3.0
+widget:
+  - text: "宵凉百念集孤[MASK]，暗雨鸣廊睡未能。生计坐怜秋一叶，归程冥想浪千层。寒心国事浑难料，堆眼官资信可憎。此去梦中应不忘，顺承门内近觚棱。"
+---
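+A BERT model for classical Chinese poetry, trained on the open-source corpus released by Souyun (搜韵) for roughly 1.1 million steps with a batch size of 16; the training loss stabilized below 1.
+Usage: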
+```python
+from transformers import BertTokenizer, BertForMaskedLM
+import torch
+# Load the tokenizer
+tokenizer = BertTokenizer.from_pretrained("qixun/bert-chinese-poem")
+# Load the model
+model = BertForMaskedLM.from_pretrained("qixun/bert-chinese-poem")
+# Input text containing a single [MASK] token
+text = "宵凉百念集孤[MASK]，暗雨鸣廊睡未能。生计坐怜秋一叶，归程冥想浪千层。寒心国事浑难料，堆眼官资信可憎。此去梦中应不忘，顺承门内近觚棱。"
+# Tokenize
+inputs = tokenizer(text, return_tensors="pt")
+# Run inference without tracking gradients
+with torch.no_grad():
+    outputs = model(**inputs)
+# Find the position of the [MASK] token in the sequence
+mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
+# Take the highest-scoring token id at that position (.item() assumes exactly one [MASK])
+predicted_token_id = outputs.logits[0, mask_token_index].argmax(dim=-1).item()
+# Convert the predicted id back to text
+predicted_token = tokenizer.decode([predicted_token_id])
+print(f"Predicted token: {predicted_token}")
+```
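
The README example prints only the single highest-scoring token. For poetry it may be more useful to look at several candidates for the masked character. The following is a minimal sketch of that, not part of the commit: `top_k_candidates` and its `id_to_token` parameter are hypothetical names introduced for illustration, and the small `vocab` and `logits` are toy stand-ins so the block runs without downloading the model.

```python
import torch

def top_k_candidates(logits, mask_index, id_to_token, k=5):
    """Return the k most probable (token, probability) pairs at one masked position.

    logits: tensor of shape (batch, sequence_length, vocab_size); batch item 0 is used.
    mask_index: integer position of the [MASK] token in the sequence.
    id_to_token: callable mapping a vocabulary id (int) to its token string.
    """
    # Turn the raw scores at the masked position into probabilities.
    probs = torch.softmax(logits[0, mask_index], dim=-1)
    # torch.topk returns values and indices sorted from highest to lowest.
    top = torch.topk(probs, k)
    return [(id_to_token(int(i)), float(p)) for p, i in zip(top.values, top.indices)]

# Toy stand-ins (assumptions, not real model output): a 5-token vocabulary
# and logits for a 2-position sequence whose second position is the mask.
vocab = ["[PAD]", "灯", "舟", "城", "山"]
logits = torch.tensor([[[0.0, 0.1, 0.2, 0.3, 0.4],
                        [0.0, 3.0, 1.0, 2.0, 0.5]]])

candidates = top_k_candidates(logits, mask_index=1, id_to_token=lambda i: vocab[i], k=3)
for token, prob in candidates:
    print(f"{token}\t{prob:.3f}")
```

With the variables from the README snippet, the same helper could presumably be called as `top_k_candidates(outputs.logits, mask_token_index.item(), tokenizer.convert_ids_to_tokens, k=5)`, again assuming the input contains exactly one `[MASK]`.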