metadata

language:
  - zh
license: apache-2.0
tags:
  - classification
inference: false

IDEA-CCNL/Erlangshen-TCBert-330M-Sentence-Embedding-Chinese

Github: Fengshenbang-LM
Docs: Fengshenbang-Docs

Brief Introduction

TCBert (Topic Classification BERT) with 330M parameters, pre-trained to produce sentence representations for Chinese topic classification tasks.

Model Taxonomy

Demand	Task	Series	Model	Parameters	Extra
General	Sentence representation	Erlangshen	TCBert (sentence representation)	330M	Chinese

Model Information

To improve sentence representations for topic classification, we collected a large amount of topic classification data and used it for contrastive pre-training based on general prompts.
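The card does not state the exact contrastive objective or the prompt templates that were used. Purely as an illustration of what contrastive pre-training on sentence vectors can look like, the sketch below implements a generic in-batch InfoNCE loss. The function name `info_nce_loss`, the temperature value, and the idea that each anchor/positive pair comes from two prompt-wrapped sentences of the same topic are assumptions, not the authors' documented method.

```python
import torch
import torch.nn.functional as F


def info_nce_loss(anchors: torch.Tensor,
                  positives: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """Generic in-batch contrastive (InfoNCE) loss.

    anchors, positives: tensors of shape (batch, dim). Row i of `anchors`
    and row i of `positives` form a positive pair; every other row in the
    batch is treated as a negative for that anchor.
    """
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    # (batch, batch) matrix of cosine similarities, sharpened by temperature.
    logits = a @ p.T / temperature
    # The matching positive for anchor i sits on the diagonal.
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)
```

With this formulation the loss is small when each anchor is closest to its own positive and grows when the pairs are misaligned.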

Performance

Stay tuned.

Usage

import torch
from transformers import BertForMaskedLM, BertTokenizer

# Load the tokenizer and the masked-LM checkpoint from the Hugging Face Hub.
model_name = "IDEA-CCNL/Erlangshen-TCBert-330M-Sentence-Embedding-Chinese"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForMaskedLM.from_pretrained(model_name)

Stay tuned for more details on usage for sentence representation.
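Since the official sentence-representation recipe is not given here, the following is only a minimal sketch of one common approach for BERT-style checkpoints: masked mean pooling over the final hidden layer, followed by cosine similarity. The helper names `mean_pool`, `embed`, and `cosine` are hypothetical, and the choice of mean pooling (rather than, say, a prompt-specific token) is an assumption that may differ from what the authors intend.

```python
import torch
import torch.nn.functional as F


def mean_pool(hidden_states: torch.Tensor,
              attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sentence, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)  # (batch, seq, 1)
    summed = (hidden_states * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1.0)                      # (batch, 1)
    return summed / counts


def embed(texts, tokenizer, model) -> torch.Tensor:
    """Hypothetical helper: tokenize texts and mean-pool the last hidden layer."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        # output_hidden_states=True exposes per-layer states on the model output.
        outputs = model(**batch, output_hidden_states=True)
    return mean_pool(outputs.hidden_states[-1], batch["attention_mask"])


def cosine(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Row-wise cosine similarity between two batches of vectors."""
    return F.cosine_similarity(a, b, dim=-1)
```

With the `tokenizer` and `model` loaded as shown earlier, calling `embed` on a list of two sentences would return one vector per sentence, and `cosine(vectors[0:1], vectors[1:2])` would give their similarity score.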

If you use our model in your work, you can cite our website:

@misc{Fengshenbang-LM,
  title={Fengshenbang-LM},
  author={IDEA-CCNL},
  year={2021},
  howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}