NFTCID

AI & ML interests

None yet

Recent Activity

reacted to m-ric's post with 👀 about 12 hours ago
๐— ๐—ถ๐—ป๐—ถ๐— ๐—ฎ๐˜…'๐˜€ ๐—ป๐—ฒ๐˜„ ๐— ๐—ผ๐—˜ ๐—Ÿ๐—Ÿ๐—  ๐—ฟ๐—ฒ๐—ฎ๐—ฐ๐—ต๐—ฒ๐˜€ ๐—–๐—น๐—ฎ๐˜‚๐—ฑ๐—ฒ-๐—ฆ๐—ผ๐—ป๐—ป๐—ฒ๐˜ ๐—น๐—ฒ๐˜ƒ๐—ฒ๐—น ๐˜„๐—ถ๐˜๐—ต ๐Ÿฐ๐—  ๐˜๐—ผ๐—ธ๐—ฒ๐—ป๐˜€ ๐—ฐ๐—ผ๐—ป๐˜๐—ฒ๐˜…๐˜ ๐—น๐—ฒ๐—ป๐—ด๐˜๐—ต ๐Ÿ’ฅ This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach. ๐—ž๐—ฒ๐˜† ๐—ถ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€: ๐Ÿ—๏ธ MoE with novel hybrid attention: โ€ฃ Mixture of Experts with 456B total parameters (45.9B activated per token) โ€ฃ Combines Lightning attention (linear complexity) for most layers and traditional softmax attention every 8 layers ๐Ÿ† Outperforms leading models across benchmarks while offering vastly longer context: โ€ฃ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks โ€ฃ Can efficiently handle 4M token contexts (vs 256K for most other LLMs) ๐Ÿ”ฌ Technical innovations enable efficient scaling: โ€ฃ Novel expert parallel and tensor parallel strategies cut communication overhead in half โ€ฃ Improved linear attention sequence parallelism, multi-level padding and other optimizations achieve 75% GPU utilization (that's really high, generally utilization is around 50%) ๐ŸŽฏ Thorough training strategy: โ€ฃ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge! Overall, not only is the model impressive, but the technical paper is also really interesting! ๐Ÿ“ It has lots of insights including a great comparison showing how a 2B MoE (24B total) far outperforms a 7B model for the same amount of FLOPs. Read it in full here ๐Ÿ‘‰ https://huggingface.co/papers/2501.08313 Model here, allows commercial use <100M monthly users ๐Ÿ‘‰ https://huggingface.co/MiniMaxAI/MiniMax-Text-01
liked a Space 3 days ago
akhaliq/anychat
liked a model 14 days ago
ibm-granite/granite-3.1-8b-instruct

Organizations

None yet

models

None public yet

datasets

None public yet