m-ric
posted an update 8 days ago
๐—›๐˜‚๐—ป๐˜†๐˜‚๐—ฎ๐—ป-๐—Ÿ๐—ฎ๐—ฟ๐—ด๐—ฒ ๐—ท๐˜‚๐˜€๐˜ ๐—ฟ๐—ฒ๐—น๐—ฒ๐—ฎ๐˜€๐—ฒ๐—ฑ ๐—ฏ๐˜† ๐—ง๐—ฒ๐—ป๐—ฐ๐—ฒ๐—ป๐˜: ๐—Ÿ๐—ฎ๐—ฟ๐—ด๐—ฒ๐˜€๐˜ ๐—ฒ๐˜ƒ๐—ฒ๐—ฟ ๐—ผ๐—ฝ๐—ฒ๐—ป ๐— ๐—ผ๐—˜ ๐—Ÿ๐—Ÿ๐— , ๐—ผ๐—ป๐—น๐˜† ๐Ÿฑ๐Ÿฎ๐—• ๐—ฎ๐—ฐ๐˜๐—ถ๐˜ƒ๐—ฒ ๐—ฝ๐—ฎ๐—ฟ๐—ฎ๐—บ๐—ฒ๐˜๐—ฒ๐—ฟ๐˜€ ๐—ฏ๐˜‚๐˜ ๐—ฏ๐—ฒ๐—ฎ๐˜๐˜€ ๐—Ÿ๐—Ÿ๐—ฎ๐— ๐—” ๐Ÿฏ.๐Ÿญ-๐Ÿฐ๐Ÿฌ๐Ÿฑ๐—• ๐—ผ๐—ป ๐—บ๐—ผ๐˜€๐˜ ๐—ฎ๐—ฐ๐—ฎ๐—ฑ๐—ฒ๐—บ๐—ถ๐—ฐ ๐—ฏ๐—ฒ๐—ป๐—ฐ๐—ต๐—บ๐—ฎ๐—ฟ๐—ธ๐˜€ ๐Ÿš€

⚡ Mixture of Experts (MoE) architecture: 389B parameters in total, but only 52B are activated for any input
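The total-vs-active split comes from top-k expert routing: each token runs through only a few experts, not all of them. Here is a minimal sketch of that idea with toy sizes (the layer widths, expert count, and top-2 gating below are illustrative choices of ours, not Hunyuan-Large's actual configuration, which also includes shared experts):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MoE feed-forward layer: 16 experts, top-2 gating.
# All sizes are illustrative, not the real model's.
d_model, d_ff, n_experts, top_k = 32, 64, 16, 2

# Each expert is a 2-layer MLP: (d_model x d_ff) and (d_ff x d_model) weights.
experts = [
    (rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
    for _ in range(n_experts)
]
router = rng.normal(size=(d_model, n_experts))

def moe_forward(x):
    """Route token x to its top-k experts and gate-mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]                         # chosen experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()   # softmax over chosen
    out = np.zeros_like(x)
    for g, e in zip(gates, top):
        w1, w2 = experts[e]
        out += g * (np.maximum(x @ w1, 0.0) @ w2)             # ReLU MLP expert
    return out

# Only top_k / n_experts of the expert weights run per token.
total_params = n_experts * 2 * d_model * d_ff
active_params = top_k * 2 * d_model * d_ff
print(total_params, active_params)
```

With 2 of 16 experts active, only 1/8 of the expert weights are touched per token — the same mechanism that lets a 389B-parameter model run with 52B active parameters.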

🧪 Trained on 7T tokens, including 1.5T tokens of synthetic data

๐Ÿ—๏ธ Architecture : Novel "recycle routing" prevents token dropping when experts are overrloaded

📊 Great benchmark results: surpasses Llama-3.1-405B-Instruct on most benchmarks although it has 8x fewer active parameters
‣ Impressive performance on MATH: 77.4

๐Ÿ‹ย Large context length: up to 256K tokens

🔒 License:
‣ Commercial use allowed, except if your products have >100M monthly active users
‣ No access in the EU

🤗 Model weights available on HF!

Read the full paper here 👉 Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent (2411.02265)