HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization Paper • 2503.04598 • Published 8 days ago • 17 • 8
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization Paper • 2503.04598 • Published 8 days ago • 17
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization Paper • 2503.04598 • Published 8 days ago • 17 • 8
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization Paper • 2503.04598 • Published 8 days ago • 17
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization Paper • 2503.04598 • Published 8 days ago • 17 • 8
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models Paper • 2502.13059 • Published 24 days ago
Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models Paper • 2502.15499 • Published 21 days ago • 13
Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models Paper • 2502.15499 • Published 21 days ago • 13
Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models Paper • 2502.15499 • Published 21 days ago • 13 • 2
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published Jan 28 • 26
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published Jan 28 • 26