arxiv:2501.02423
Shuaipeng Li
unlimblue
AI & ML interests
None yet
Recent Activity
Authored a paper 1 day ago: Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs
Authored a paper 1 day ago: Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
Authored a paper 1 day ago: HMoE: Heterogeneous Mixture of Experts for Language Modeling
Organizations
Models
None public yet
Datasets
None public yet