---
language:
- zh
license: apache-2.0
tags:
- classification
inference: false
---

# IDEA-CCNL/Erlangshen-TCBert-330M-Sentence-Embedding-Chinese

- Github: [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM)
- Docs: [Fengshenbang-Docs](https://fengshenbang-doc.readthedocs.io/)

## Brief Introduction

A Topic Classification BERT (TCBert) with 330M parameters, pre-trained to produce sentence representations for Chinese topic classification tasks.

## Model Taxonomy

| Demand | Task | Series | Model | Parameter | Extra |
| :----: | :----: | :----: | :----: | :----: | :----: |
| General | Sentence Representation | Erlangshen | TCBert (sentence representation) | 330M | Chinese |

## Model Information

To improve the quality of sentence representations for the topic classification task, we collected a large number of topic classification datasets and used them for contrastive pre-training based on general prompts.

### Performance

Stay tuned.

## Usage

```python
from transformers import BertForMaskedLM, BertTokenizer
import torch

# Load the tokenizer and the masked-LM checkpoint from the Hugging Face Hub.
tokenizer = BertTokenizer.from_pretrained("IDEA-CCNL/Erlangshen-TCBert-330M-Sentence-Embedding-Chinese")
model = BertForMaskedLM.from_pretrained("IDEA-CCNL/Erlangshen-TCBert-330M-Sentence-Embedding-Chinese")
```

Stay tuned for more details on usage for sentence representation.

If you use our model in your work, you can cite our [website](https://github.com/IDEA-CCNL/Fengshenbang-LM/):

```text
@misc{Fengshenbang-LM,
  title={Fengshenbang-LM},
  author={IDEA-CCNL},
  year={2021},
  howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}
```
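
Until the official sentence-representation recipe is published, the sketch below shows one common way to turn the checkpoint loaded in the Usage section into sentence vectors. It is an assumption, not the authors' confirmed method: it averages the last hidden layer over non-padding tokens. The helper names `masked_mean_pool` and `embed` are hypothetical and introduced only for illustration.

```python
import torch


def masked_mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sentence, ignoring padding positions.

    hidden_states: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)  # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(dim=1)                   # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1.0)                      # guard against division by zero
    return summed / counts


def embed(sentences, model_name="IDEA-CCNL/Erlangshen-TCBert-330M-Sentence-Embedding-Chinese"):
    """Return one vector per sentence.

    Assumed recipe (not confirmed by the model card): mean-pooled last hidden layer.
    """
    # Imported lazily so the pooling helper can be used without transformers installed.
    from transformers import BertForMaskedLM, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertForMaskedLM.from_pretrained(model_name)
    model.eval()

    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        # output_hidden_states=True exposes the per-layer encoder outputs.
        outputs = model(**batch, output_hidden_states=True)
    return masked_mean_pool(outputs.hidden_states[-1], batch["attention_mask"])
```

Calling `embed([...])` with a list of Chinese sentences downloads the weights on first use and returns a tensor with one row per sentence. Two rows can then be compared with `torch.nn.functional.cosine_similarity(a, b, dim=0)`. Whether mean pooling or a prompt-based representation works better for this checkpoint is not stated in the source.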