Thomas Wolf PRO

thomwolf

https://thomwolf.io

AI & ML interests

NLP and open-source :-)

Articles

Introducing smolagents: simple agents that write actions in code.

FineWeb2-C: Help Build Better Language Models in Your Language

LeMaterial: an open source initiative to accelerate materials discovery and research

FineVideo: behind the scenes

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

A failed experiment: Infini-Attention, and why we should keep trying?

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

Constitutional AI with Open LLMs

Open LLM Leaderboard: DROP deep dive

What's going on with the Open LLM Leaderboard?

Can foundation models label data like humans?

thomwolf's activity

authored a paper 2 days ago

Towards Best Practices for Open Datasets for LLM Training

Paper • 2501.08365 • Published 4 days ago • 38

upvoted 2 articles 3 days ago

Article

MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era

3 days ago • 32

Article

Diving into MiniMax01 405B MoE

3 days ago • 15

liked a model 4 days ago

NovaSky-AI/Sky-T1-32B-Preview

Text Generation • Updated 5 days ago • 4.89k • 447

liked a Space 4 days ago

2024 AI Timeline

liked 2 models 4 days ago

microsoft/phi-4

Text Generation • Updated 9 days ago • 100k • 1.41k

hexgrad/Kokoro-82M

Text-to-Speech • Updated about 3 hours ago • 20.6k • 1.87k

liked a model 9 days ago

refuelai/Qwen-2-Refueled

Updated 9 days ago • 10 • 3

upvoted a paper 11 days ago

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

Paper • 2406.11896 • Published Jun 14, 2024 • 19

updated a Space 12 days ago

README

reacted to lewtun's post with 🔥 12 days ago

Post

3267

I was initially pretty sceptical about Meta's Coconut paper [1] because the largest perf gains were reported on toy linguistic problems. However, these results on machine translation are pretty impressive!

https://x.com/casper_hansen_/status/1875872309996855343

Together with the recent PRIME method [2] for scaling RL, reasoning for open models is looking pretty exciting for 2025!

[1] Training Large Language Models to Reason in a Continuous Latent Space (2412.06769)
[2] https://huggingface.co/blog/ganqu/prime

liked a model 13 days ago

deepseek-ai/DeepSeek-V3

Updated 19 days ago • 142k • 2k

upvoted an article 13 days ago

Article

🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark

16 days ago • 37

upvoted a collection 13 days ago

Phi-3

Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 26 items • Updated 10 days ago • 547

liked a Space 13 days ago

AI Phone Leaderboard

liked a model 13 days ago

matteogeniaccio/phi-4

Updated 8 days ago • 38.1k • 187

upvoted a paper 13 days ago

Phi-4 Technical Report

Paper • 2412.08905 • Published Dec 12, 2024 • 103

upvoted a paper 14 days ago

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Paper • 2412.04454 • Published Dec 5, 2024 • 59

updated a Space 15 days ago

Discussion Forum

liked a model 16 days ago

Qwen/Qwen2.5-3B-Instruct

Text Generation • Updated Sep 25, 2024 • 580k • 146