
Baby LLaMA Chinese 81M

A small pretrained Chinese language model.

Training Dataset

Chinese Wikipedia (20230601)
English Wikipedia (20230601)

Tokenizer

A BPE tokenizer trained on Chinese and English Wikipedia, with a vocabulary size of 32k.

https://github.com/p208p2002/BPE-tokenizer-from-zh-wiki
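Beyond "BPE" and "32k vocabulary", the card does not say how the tokenizer was trained. The linked repository is the authoritative source. As a rough illustration of what byte-pair encoding does, here is a toy, character-level sketch in Python. It repeatedly merges the most frequent adjacent symbol pair until the vocabulary reaches a target size. The function names (`count_pairs`, `apply_merge`, `train_bpe`, `encode`) and the tiny corpus are hypothetical. Production tokenizers usually add pre-tokenization, byte-level fallback and special tokens. This sketch is not the code used for this model.

```python
from collections import Counter

# Toy character-level BPE, for illustration only (not this model's tokenizer code).
Word = tuple[str, ...]


def count_pairs(words: dict[Word, int]) -> Counter:
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs: Counter = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs


def apply_merge(symbols: Word, pair: tuple[str, str]) -> Word:
    """Replace every non-overlapping occurrence of `pair` with the joined symbol."""
    a, b = pair
    out: list[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus: list[str], vocab_size: int) -> tuple[set[str], list[tuple[str, str]]]:
    """Learn merges until the vocabulary reaches `vocab_size` or no pairs remain.

    `corpus` is assumed to be already split into words (hypothetical simplification).
    """
    words: dict[Word, int] = {tuple(w): f for w, f in Counter(corpus).items()}
    vocab = {ch for symbols in words for ch in symbols}  # start from single characters
    merges: list[tuple[str, str]] = []
    while len(vocab) < vocab_size:
        pairs = count_pairs(words)
        if not pairs:
            break
        # Most frequent pair; ties broken by the pair itself so runs are deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab.add(best[0] + best[1])
        new_words: dict[Word, int] = {}
        for symbols, freq in words.items():
            merged = apply_merge(symbols, best)
            new_words[merged] = new_words.get(merged, 0) + freq
        words = new_words
    return vocab, merges


def encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize one word by replaying the learned merges in training order."""
    symbols: Word = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = ["維基百科"] * 3 + ["維基"] * 2 + ["百科全書"] * 2 + ["lower", "lowest", "low"]
    vocab, merges = train_bpe(corpus, vocab_size=20)
    print(merges[:3])
    print(encode("維基百科", merges), encode("lowest", merges))
```

Frequent strings such as 維基百科 end up as single tokens, while rarer words are split into shorter pieces. With a 32k vocabulary, the same loop would run for roughly 32k merges minus the size of the initial alphabet, over a far larger corpus.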