Haoning Wu, Teo

teowu

https://teowu.github.io

AI & ML interests

Lead of Q-Future: https://github.com/Q-Future. I love MLLMs/LMMs/LVLMs (or whatever name you call them). Core contributor to two great MoE VLMs: Kimi-VL and Aria. Living and cooking in Singapore now.

Recent Activity

liked a model 12 days ago

moonshotai/Kimi-K2-Instruct-0905

Text Generation • Updated 7 days ago • 22.1k • 488

liked a dataset about 1 month ago

lmarena-ai/VisionArena-Chat

Viewer • Updated Feb 4 • 199k • 5.04k • 8

New activity in moonshotai/Kimi-VL-A3B-Thinking-2506 3 months ago

Updates Transformers Inference code in README.md

#8 opened 3 months ago by tobiashaab

upvoted a paper 3 months ago

Generative Frame Sampler for Long Video Understanding

Paper • 2503.09146 • Published Mar 12 • 1

reacted to fdaudens's post with 👍🔥 3 months ago

You might not have heard of Moonshot AI — but within 24 hours, their new model Kimi K2 shot to the top of Hugging Face’s trending leaderboard.

So… who are they, and why does it matter?

Had a lot of fun co-writing this blog post with @xianbao, with key insights translated from Chinese, to unpack how this startup built a model that outperforms GPT-4.1, Claude Opus, and DeepSeek V3 on several major benchmarks.

🧵 A few standout facts:

1. From zero to $3.3B in 18 months:
Founded in March 2023, Moonshot is now backed by Alibaba, Tencent, Meituan, and HongShan.

2. A CEO who thinks from the end:
Yang Zhilin (31) previously worked at Meta AI, Google Brain, and Carnegie Mellon. His vision? Nothing less than AGI — still a rare ambition among Chinese AI labs.

3. A trillion-parameter model that’s surprisingly efficient:
Kimi K2 uses a mixture-of-experts architecture (32B active params per inference) and dominates on coding/math benchmarks.

4. The secret weapon, the Muon optimizer:
A new training method that doubles efficiency, cuts memory in half, and ran 15.5T tokens with zero failures. Big implications.

Most importantly, their move from closed to open source signals a broader shift in China’s AI scene — following Baidu’s pivot. But as Yang puts it: “Users are the only real leaderboard.”

👇 Check out the full post to explore what Kimi K2 can do, how to try it, and why it matters for the future of open-source LLMs:
https://huggingface.co/blog/fdaudens/moonshot-ai-kimi-k2-explained