NFTCID
0 followers · 13 following
AI & ML interests
None yet
Recent Activity
liked a model 3 days ago
deepseek-ai/DeepSeek-R1
reacted to m-ric's post 13 days ago
MiniMax's new MoE LLM reaches Claude-Sonnet level with 4M tokens context length

This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens, roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach.

Key insights:

MoE with novel hybrid attention:
‣ Mixture of Experts with 456B total parameters (45.9B activated per token)
‣ Combines Lightning attention (linear complexity) for most layers and traditional softmax attention every 8 layers

Outperforms leading models across benchmarks while offering vastly longer context:
‣ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks
‣ Can efficiently handle 4M token contexts (vs 256K for most other LLMs)

Technical innovations enable efficient scaling:
‣ Novel expert parallel and tensor parallel strategies cut communication overhead in half
‣ Improved linear attention sequence parallelism, multi-level padding and other optimizations achieve 75% GPU utilization (that's really high, generally utilization is around 50%)

Thorough training strategy:
‣ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge!

Overall, not only is the model impressive, but the technical paper is also really interesting! It has lots of insights including a great comparison showing how a 2B MoE (24B total) far outperforms a 7B model for the same amount of FLOPs.

Read it in full here: https://huggingface.co/papers/2501.08313
Model here, allows commercial use <100M monthly users: https://huggingface.co/MiniMaxAI/MiniMax-Text-01
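The layer layout and parameter figures quoted in the post can be illustrated with a minimal Python sketch. It is not MiniMax's implementation: the function names, the 16-layer depth, and the reading of "every 8 layers" as "every 8th layer uses softmax attention" are assumptions made for illustration only.

```python
def layer_schedule(num_layers: int, softmax_every: int = 8) -> list[str]:
    """Return the attention type for each layer.

    Assumption: layers are counted from 1 and every `softmax_every`-th
    layer uses softmax attention; all other layers use lightning
    (linear) attention.
    """
    return [
        "softmax" if (i + 1) % softmax_every == 0 else "lightning"
        for i in range(num_layers)
    ]


def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Fraction of the MoE parameters that are activated per token."""
    return active_params_b / total_params_b


if __name__ == "__main__":
    # Hypothetical 16-layer stack, chosen only to keep the output short.
    schedule = layer_schedule(16)
    for index, kind in enumerate(schedule, start=1):
        print(f"layer {index:2d}: {kind}")

    # Figures quoted in the post: 456B total, 45.9B activated per token.
    print(f"activated fraction: {active_fraction(456, 45.9):.1%}")
```

With the post's figures, about a tenth of the parameters are active for any given token, which is why the compute per token is far below what the 456B total would suggest.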
liked a Space 15 days ago
akhaliq/anychat
Organizations
None yet
NFTCID's activity
Likes
liked a model 3 days ago
deepseek-ai/DeepSeek-R1
Text Generation • Updated 5 days ago • 498k • 5.37k
liked a Space 15 days ago
Anychat
Running on CPU Upgrade • 1.65k
liked 2 models 27 days ago
ibm-granite/granite-3.1-8b-instruct
Text Generation • Updated about 2 hours ago • 72.1k • 131
PowerInfer/SmallThinker-3B-Preview
Text Generation • Updated 15 days ago • 110k • 377
liked a dataset 27 days ago
agibot-world/AgiBotWorld-Alpha
Viewer • Updated 11 days ago • 19.7M • 22.6k • 165
liked a model 3 months ago
genmo/mochi-1-preview
Text-to-Video • Updated Dec 18, 2024 • 40.9k • 1.16k
liked a model 6 months ago
black-forest-labs/FLUX.1-schnell
Text-to-Image • Updated Aug 16, 2024 • 795k • 3.3k
liked a model about 1 year ago
ibm-research/re2g-reranker-trex
Text Classification • Updated May 16, 2023 • 1.64k • 7
liked a dataset about 1 year ago
Yelp/yelp_review_full
Viewer • Updated Jan 4, 2024 • 700k • 38.1k • 110