Punctuator for Simplified Chinese
This model is a DistilBertForTokenClassification model fine-tuned to add punctuation to plain Simplified Chinese text. It was fine-tuned from a distilled version of bert-base-chinese.
Usage
from transformers import DistilBertForTokenClassification, DistilBertTokenizerFast

# Load the fine-tuned punctuator and its matching tokenizer from the Hugging Face Hub
model = DistilBertForTokenClassification.from_pretrained("Qishuai/distilbert_punctuator_zh")
tokenizer = DistilBertTokenizerFast.from_pretrained("Qishuai/distilbert_punctuator_zh")
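The card stops after loading the model, so the following is a minimal sketch of turning its predictions into punctuated text. It assumes one prediction per Chinese character and that a punctuation label (the C_* names in the metrics report) means the mark follows that character; any other label, such as a possible "no punctuation" class, adds nothing. The label-to-character mapping and the helper names predict_labels and restore_punctuation are hypothetical and not part of the model or the transformers library.

```python
# Hypothetical mapping from the label names in the metrics report to
# full-width Chinese punctuation marks (assumed, not read from the model config).
LABEL_TO_PUNCT = {
    "C_COMMA": "，",
    "C_DUNHAO": "、",
    "C_EXLAMATIONMARK": "！",  # spelling matches the label name in the report
    "C_PERIOD": "。",
    "C_QUESTIONMARK": "？",
}


def predict_labels(text, model, tokenizer):
    """Return (chars, labels): one predicted label name per input character.

    Assumes each Chinese character is treated as one "word"; characters
    that produce no token or are dropped by truncation get the label None.
    """
    import torch  # imported lazily so restore_punctuation works without torch

    chars = list(text)
    enc = tokenizer(chars, is_split_into_words=True,
                    return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits
    pred_ids = logits.argmax(dim=-1)[0].tolist()

    labels = [None] * len(chars)
    # word_ids() maps each token position back to its character index
    # (None for [CLS]/[SEP]); keep the prediction of the first token per character.
    for pos, word_id in enumerate(enc.word_ids(0)):
        if word_id is not None and labels[word_id] is None:
            labels[word_id] = model.config.id2label[pred_ids[pos]]
    return chars, labels


def restore_punctuation(chars, labels):
    """Append the mark predicted for each character (assumed: the mark follows it)."""
    if len(chars) != len(labels):
        raise ValueError("chars and labels must have the same length")
    out = []
    for ch, label in zip(chars, labels):
        out.append(ch)
        out.append(LABEL_TO_PUNCT.get(label, ""))  # unknown or None labels add nothing
    return "".join(out)


# Offline demo with hand-written labels ("O" stands in for a no-punctuation class).
print(restore_punctuation(list("你好世界"), ["O", "C_COMMA", "O", "C_PERIOD"]))  # → 你好，世界。
```

With model and tokenizer loaded as in the Usage snippet, the two helpers chain as restore_punctuation(*predict_labels(text, model, tokenizer)). Keeping the string assembly separate from inference means the post-processing can be checked without downloading the model.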
Model Overview
Training data
A combination of the following three datasets:
- News articles from People's Daily (2014). Reference
Model Performance
- Validated on the MSRA training dataset. Reference
- Metrics Report:
                  precision    recall  f1-score   support

         C_COMMA       0.67      0.59      0.63     91566
        C_DUNHAO       0.50      0.37      0.42     21013
C_EXLAMATIONMARK       0.23      0.06      0.09       399
        C_PERIOD       0.84      0.99      0.91     44258
  C_QUESTIONMARK       0.00      1.00      0.00         0

       micro avg       0.71      0.67      0.69    157236
       macro avg       0.45      0.60      0.41    157236
    weighted avg       0.69      0.67      0.68    157236
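The report has the layout of scikit-learn's classification_report, although the card does not say how it was produced, so treat that as an assumption. The sketch below recomputes the two averages that can be derived from the per-class rows: the macro average is the unweighted mean over all five classes, so the zero-support C_QUESTIONMARK row (f1-score 0.00) pulls macro f1 down to 0.41, while the weighted average weights each class by its support and is dominated by C_COMMA and C_PERIOD. The micro average needs raw true/false positive counts and cannot be recovered from this table. The function names are illustrative only.

```python
# Per-class (precision, recall, f1-score, support), copied from the report.
REPORT = {
    "C_COMMA": (0.67, 0.59, 0.63, 91566),
    "C_DUNHAO": (0.50, 0.37, 0.42, 21013),
    "C_EXLAMATIONMARK": (0.23, 0.06, 0.09, 399),
    "C_PERIOD": (0.84, 0.99, 0.91, 44258),
    "C_QUESTIONMARK": (0.00, 1.00, 0.00, 0),
}


def macro_avg(report):
    """Unweighted mean of each metric over all classes, including zero-support ones."""
    n = len(report)
    return [sum(row[i] for row in report.values()) / n for i in range(3)]


def weighted_avg(report):
    """Mean of each metric weighted by class support."""
    total = sum(row[3] for row in report.values())
    return [sum(row[i] * row[3] for row in report.values()) / total for i in range(3)]


macro = [round(x, 2) for x in macro_avg(REPORT)]
weighted = [round(x, 2) for x in weighted_avg(REPORT)]
print("macro avg   ", macro)     # → [0.45, 0.6, 0.41]
print("weighted avg", weighted)  # → [0.69, 0.67, 0.68]
```

Because the inputs are the rounded two-decimal figures, agreement with the reported averages is only expected to two decimal places.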