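A BERT model for classical Chinese poetry. It was trained on the corpus open-sourced by Souyun (搜韵) with a batch_size of 16 for roughly 1.1 million steps, and the training loss settled consistently below 1.
Usage: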
from transformers import BertTokenizer, BertForMaskedLM
import torch

# Load the tokenizer
tokenizer = BertTokenizer.from_pretrained("qixun/bert-chinese-poem")
# Load the model and switch to evaluation mode (disables dropout)
model = BertForMaskedLM.from_pretrained("qixun/bert-chinese-poem")
model.eval()

# Input text: a poem with one character replaced by [MASK]
text = "宵凉百念集孤[MASK],暗雨鸣廊睡未能。生计坐怜秋一叶,归程冥想浪千层。寒心国事浑难料,堆眼官资信可憎。此去梦中应不忘,顺承门内近觚棱。"
# Tokenize
inputs = tokenizer(text, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Locate the [MASK] token (column indices within the single input sequence)
mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
# Take the highest-scoring token id at that position (assumes exactly one [MASK])
predicted_token_id = outputs.logits[0, mask_token_index].argmax(dim=-1).item()
# Convert the id back to a character
predicted_token = tokenizer.decode([predicted_token_id])
print(f"Predicted token: {predicted_token}")
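
The snippet above keeps only the single best guess for the masked position. When filling in a poem it can be useful to see several candidates with their probabilities. The following is a minimal sketch of that idea, not part of the model's own API: `top_k_candidates` is a hypothetical helper name, it works on a plain Python list of logits so it runs without the model, and the toy scores are made-up numbers rather than real model output.

```python
import math


def top_k_candidates(scores, k=5):
    """Return the k highest-scoring (index, probability) pairs.

    `scores` is a flat list of raw logits over the vocabulary for one
    [MASK] position. Probabilities come from a numerically stable softmax.
    """
    if not scores:
        return []
    peak = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    # Rank vocabulary indices by logit, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, exps[i] / total) for i in ranked[:k]]


# Toy logits over a 4-token vocabulary (illustrative values only).
toy_scores = [0.0, 2.0, 1.0, -1.0]
for token_id, prob in top_k_candidates(toy_scores, k=2):
    print(token_id, round(prob, 3))
```

With the model loaded as above, the list of scores could be obtained with `outputs.logits[0, mask_token_index[0]].tolist()`, and each returned index can be mapped back to a character with `tokenizer.convert_ids_to_tokens`.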