---
language:
  - zh
inference:
  parameters:
    max_new_tokens: 250
    repetition_penalty: 1.1
    top_p: 0.9
    do_sample: true
license: apache-2.0
---

# Wenzhong2.0-GPT2-3.5B model (Chinese), one model of Fengshenbang-LM

Unidirectional, decoder-only language models such as GPT are known for their strong text-generation ability. The 3.5-billion-parameter Wenzhong-GPT2-3.5B model was trained on 100 GB of common Chinese data using 32 A100 GPUs for 28 hours, making it the largest open-source Chinese GPT2 model. It performs well at Chinese continuation generation. Wenzhong2.0-GPT2-3.5B-chinese is a Chinese GPT2 model trained on cleaner data, building on Wenzhong-GPT2-3.5B.

## Usage

### Load model

```python
from transformers import GPT2Tokenizer, GPT2Model

# Load the tokenizer and the (headless) base GPT2 model.
tokenizer = GPT2Tokenizer.from_pretrained('IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese')
model = GPT2Model.from_pretrained('IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese')

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')  # tokenize to PyTorch tensors
output = model(**encoded_input)  # forward pass; returns hidden states, not logits
```
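
Note that `GPT2Model` is the backbone without a language-modeling head, so `output` contains hidden states rather than logits. A minimal sketch of reading them, continuing from the snippet above:

```python
# output.last_hidden_state holds the final-layer representation of each
# input token, with shape (batch_size, sequence_length, hidden_size).
last_hidden = output.last_hidden_state
print(last_hidden.shape)
```

For text generation, use the pipeline below (or `GPT2LMHeadModel`, which adds the language-modeling head).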

### Generation

```python
from transformers import pipeline, set_seed

set_seed(55)  # fix the random seed so sampling is reproducible
generator = pipeline('text-generation', model='IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese')
generator("北京位于", max_length=30, num_return_sequences=1)
```
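
The pipeline forwards extra keyword arguments to `model.generate`, so the sampling settings declared under `inference.parameters` in the metadata above can be passed directly. A sketch (the prompt is illustrative; the parameter values are taken from the metadata):

```python
# Sampling configuration mirroring the model-card metadata:
# nucleus sampling with top_p=0.9 and a mild repetition penalty.
generator(
    "北京位于",
    max_new_tokens=250,
    do_sample=True,
    top_p=0.9,
    repetition_penalty=1.1,
    num_return_sequences=1,
)
```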

## Citation

If you find this resource useful, please cite the following repository in your paper.

```bibtex
@misc{Fengshenbang-LM,
  title={Fengshenbang-LM},
  author={IDEA-CCNL},
  year={2021},
  howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}
```