Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs • arXiv:2407.12117 • Published Jul 16, 2024
Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling • arXiv:2405.14578 • Published May 23, 2024
HMoE: Heterogeneous Mixture of Experts for Language Modeling • arXiv:2408.10681 • Published Aug 20, 2024
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent • arXiv:2411.02265 • Published Nov 4, 2024
3DCNN-DQN-RNN: A Deep Reinforcement Learning Framework for Semantic Parsing of Large-scale 3D Point Clouds • arXiv:1707.06783 • Published Jul 21, 2017