arxiv:2501.08313

MiniMax-01: Scaling Foundation Models with Lightning Attention

Published on Jan 14
· Submitted by Ryan1122 on Jan 15
#1 Paper of the day

Abstract

We introduce the MiniMax-01 series, including MiniMax-Text-01 and MiniMax-VL-01, which are comparable to top-tier models while offering superior capabilities in processing longer contexts. The core lies in lightning attention and its efficient scaling. To maximize computational capacity, we integrate it with Mixture of Experts (MoE), creating a model with 32 experts and 456 billion total parameters, of which 45.9 billion are activated for each token. We develop an optimized parallel strategy and highly efficient computation-communication overlap techniques for MoE and lightning attention. This approach enables us to conduct efficient training and inference on models with hundreds of billions of parameters across contexts spanning millions of tokens. The context window of MiniMax-Text-01 can reach up to 1 million tokens during training and extrapolate to 4 million tokens during inference at an affordable cost. Our vision-language model, MiniMax-VL-01, is built through continued training with 512 billion vision-language tokens. Experiments on both standard and in-house benchmarks show that our models match the performance of state-of-the-art models like GPT-4o and Claude-3.5-Sonnet while offering a 20-32x longer context window. We publicly release MiniMax-01 at https://github.com/MiniMax-AI.
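Lightning attention belongs to the linear-attention family, which is what makes the million-token context economical: instead of materializing a seq_len × seq_len score matrix, a causal pass can carry a running d × d state of key-value outer products, so each token costs O(d²) rather than O(seq_len · d). The following is a minimal sketch of that idea only, not the authors' implementation (the paper's version adds blockwise tiling and other efficiency techniques); the function name and the absence of decay/normalization are simplifications for illustration.

```python
import numpy as np

def causal_linear_attention(q, k, v):
    """Causal linear attention via a running state.

    q, k, v: arrays of shape (seq_len, d).
    Maintains state S_t = sum_{j<=t} k_j v_j^T, so output
    o_t = q_t @ S_t equals sum_{j<=t} (q_t . k_j) v_j without
    ever forming the full attention matrix.
    """
    seq_len, d = q.shape
    state = np.zeros((d, d))      # running sum of k_j v_j^T outer products
    out = np.zeros_like(v)
    for t in range(seq_len):
        state += np.outer(k[t], v[t])  # fold in the current token's key/value
        out[t] = q[t] @ state          # query the accumulated state
    return out
```

Because the per-token cost is independent of sequence length, the recurrence above extrapolates naturally to contexts far longer than those seen in training, which is the property the series exploits.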

Community

Paper submitter

A technical report from MiniMax. The authors are listed in alphabetical order. The model is open-sourced at https://github.com/MiniMax-AI.



We made a deep dive video for this paper: https://www.youtube.com/watch?v=eh7oDAxUoPg. Happy learning 🤓 and stretching 💪 together!

Oh, and btw, we tried using MiniMax for this paper deep dive, but it kept hanging on us 😅… (maybe our long text + long PDF combo was just too much? shouldn't be, though… or maybe MiniMax just doesn't like deep-diving itself?! 🤔) That said, their PDF-on-the-side feature is super sweet 🍭 for paper reading and live QA! 📝


Models citing this paper 2

Datasets citing this paper 0

Spaces citing this paper 2

Collections including this paper 10