high-quality Chinese training datasets
Collection
A suite of high-quality Chinese datasets for pretraining, fine-tuning, or reinforcement learning.
Using opencsg/csg-wukong-2b-chinese-fineweb-edu as the base model, we fine-tune it on smoltalk-chinese for 2 epochs.
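As a rough illustration of what this setup implies, the sketch below records the run as a small configuration object and estimates how many optimizer steps 2 epochs would take. Only the base model name, the dataset name, and the epoch count come from the text; the batch size, gradient accumulation, device count, and example count are hypothetical values chosen for illustration, and `FinetuneConfig` is not part of any library.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    # Values stated in the text.
    base_model: str = "opencsg/csg-wukong-2b-chinese-fineweb-edu"
    dataset: str = "smoltalk-chinese"  # hub namespace is not given in the text
    num_epochs: int = 2
    # Hypothetical values, for illustration only.
    per_device_batch_size: int = 8
    gradient_accumulation_steps: int = 4
    num_devices: int = 1

    @property
    def effective_batch_size(self) -> int:
        # Number of examples consumed per optimizer step.
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * self.num_devices
        )

    def total_optimizer_steps(self, num_examples: int) -> int:
        # Round up so a final partial batch still counts as one step.
        steps_per_epoch = math.ceil(num_examples / self.effective_batch_size)
        return steps_per_epoch * self.num_epochs


config = FinetuneConfig()
print(config.effective_batch_size)
# num_examples is a placeholder, not the real dataset size.
print(config.total_optimizer_steps(num_examples=10_000))
```

Actual step counts depend on the real dataset size and on how a given trainer handles the last partial batch, so the numbers here are only indicative.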