yulan-team/YuLan-Mini
Text Generation
A highly capable 2.4B-parameter lightweight LLM trained on only 1T tokens of pre-training data, released with all training details.
Note: The model and optimizer states of the last curriculum phase before learning-rate annealing.
Note: The model and optimizer states of the 20th curriculum phase.