Recent Activity
boroll2347
posted an
update 13 days ago
mrfakename
posted an
update 3 months ago
Post
16395
Excited to share that I've joined the Hugging Face Fellows program! 🤗
Looking forward to contributing to & working more closely with the open-source ecosystem - huge thanks to everyone who's supported me on this journey! 🚀
Post
1123
The #1 trending AI/ML dataset today 🏆
Massive scale, diversity and end-to-end potential from NVIDIA!
nvidia/PhysicalAI-Autonomous-Vehicles
Post
745
The new King 👑 has arrived!
Moonshot AI now the top model on Hugging Face 🔥
moonshotai/Kimi-K2-Thinking
Post
2824
💸🤑You don’t need 100 GPUs to train something amazing!
Our Smol Training Playbook teaches you a better path to world-class LLMs, for free!
Check out the #1 trending space on 🤗 :
HuggingFaceTB/smol-training-playbook
Post
4086
🤖 Did you know your voice might be cloned without your consent from just *one sentence* of audio?
That's not great. So with @frimelle , we brainstormed a new idea for developers who want to curb malicious use: ✨The Voice Consent Gate.✨
Details, code, here: https://huggingface.co/blog/voice-consent-gate
mrfakename
posted an
update 4 months ago
Post
6275
Trained a model for emotion-controllable TTS based on MiMo audio on LAION's dataset.
Still very early and does have an issue with hallucinating but results seem pretty good so far, given that it is very early into the training run.
Will probably kick off a new run later with some settings tweaked.
Put up a demo here: https://huggingface.co/spaces/mrfakename/EmoAct-MiMo
(Turn 🔊 on to hear audio samples)
Post
1230
Tokenization is one of the most important processes in AI - yet many would like to kill it 💀
What's tokenization? The neural networks inside LLMs actually only process numbers, not text: tokenization is the process that makes text readable for them, by converting sentences into lists of numbers.
➡️ For instance, "This is tokenization" would be split into "This | is | token | ization", then each of the parts (tokens) is converted to an ID according to a predefined mapping: for instance "ization" could map to ID 2438.
Thus "This is tokenization" can become 1335 | 135 | 2980 | 2438 => now the model can process the sentence!
Most tokenizers today use pre-specified mappings called "vocabularies", generally built with the compression algorithm Byte-Pair Encoding (BPE), which learns from a large corpus of text an optimized split to efficiently encode any text from the same distribution into a list of token IDs.
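To make the BPE idea concrete, here is a minimal toy sketch, not how any production tokenizer is implemented. It works on characters rather than bytes, the four-word corpus is made up, and the function names (`learn_merges`, `tokenize`, `build_vocab`) are hypothetical. It repeatedly merges the most frequent adjacent pair, then maps the resulting pieces to IDs.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_merges(words, num_merges):
    """Learn up to `num_merges` merges from a list of words (toy, character-level)."""
    # Each word starts as a tuple of single characters, counted by frequency.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often the word occurs.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself so the
        # result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged = Counter()
        for symbols, freq in corpus.items():
            merged[merge_pair(symbols, best)] += freq
        corpus = merged
    return merges


def tokenize(word, merges):
    """Split a word into pieces by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


def build_vocab(words, merges):
    """Map every base character and every merged piece to an integer ID."""
    chars = sorted({c for w in words for c in w})
    tokens = chars + [a + b for a, b in merges]
    return {tok: i for i, tok in enumerate(tokens)}


words = ["low", "low", "lower", "lowest"]  # made-up corpus
merges = learn_merges(words, num_merges=2)
vocab = build_vocab(words, merges)
pieces = tokenize("lowest", merges)
ids = [vocab[p] for p in pieces]
print(pieces, ids)
```

Real tokenizers add byte-level fallback, pre-tokenization and special tokens on top of this loop, so treat the sketch as an illustration of the merge principle only.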
🤨 Now, these current tokenizers have flaws.
For instance, the rigidity of their mapping creates losses; the prime example is that a tokenizer designed for English (thus optimized for tokens like "has", "been", "clock", etc.) will not have the right tokens to handle Burmese, and so is terribly inefficient at it.
Many alternative approaches have emerged as a result: for instance "tokenizer-free tokenizers". One that I really liked was "entropy-based": it monitors the stream of text, and triggers a split whenever the entropy increases too much, i.e. when something "surprising" happens.
But this great article argues that tokenizers are a lesser evil. Read and decide for yourself!
https://huggingface.co/blog/catherinearnett/in-defense-of-tokenizers
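For the "entropy-based" splitting mentioned above, a rough sketch under heavy assumptions: a character bigram table counted on the text itself stands in for the small learned language model a real system would use, and the rule "start a new patch when next-character entropy exceeds a threshold" simplifies "increases too much". The names `next_char_entropy`, `entropy_split` and the threshold values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def next_char_entropy(text):
    """For each position i >= 1, entropy in bits of the next-character
    distribution given text[i - 1], estimated from bigram counts in `text`."""
    follow = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        follow[a][b] += 1

    def entropy(counter):
        total = sum(counter.values())
        return -sum(c / total * math.log2(c / total) for c in counter.values())

    per_context = {a: entropy(c) for a, c in follow.items()}
    return [per_context[text[i - 1]] for i in range(1, len(text))]


def entropy_split(text, threshold):
    """Start a new patch whenever the upcoming character is 'surprising',
    i.e. the entropy of its prediction is above `threshold` bits."""
    if not text:
        return []
    entropies = next_char_entropy(text)
    patches = [text[0]]
    for i in range(1, len(text)):
        if entropies[i - 1] > threshold:
            patches.append(text[i])   # hard to predict: open a new patch
        else:
            patches[-1] += text[i]    # easy to predict: extend current patch
    return patches


sample = "the cat sat on the mat and the cat ate"
print(entropy_split(sample, threshold=1.5))
```

Predictable stretches end up in long patches and uncertain positions open new ones, which is the intuition behind spending compute where the text is surprising.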
Post
4954
STOP EVERYTHING NOW - we might finally have a radical architecture improvement over Transformers!!! 🚨
A lone scientist just proposed Tiny Recursive Model (TRM), and it is literally the most impressive model that I've seen this year.
➡️ Tiny Recursive Model is 7M parameters
➡️ On ARC-AGI, it beats flagship models like Gemini-2.5-pro
Consider how wild this is: Gemini-2.5-pro must be over 10,000x bigger
and had 1,000x as many authors 😂 (Alexia is alone on the paper)
What's this sorcery?
In short: it's a very tiny Transformer, but it loops over itself at two different frequencies, updating two latent variables: one for the proposed answer and one for the reasoning.
@AlexiaJM started from the paper Hierarchical Reasoning Model, published a few months ago, which already showed a breakthrough improvement on ARC-AGI for its small size (27M).
Hierarchical Reasoning Model had introduced one main feature:
🔎 Deep supervision
In their model, one part (here one layer) would run at high frequency, and another would be lower frequency, running only every n steps.
They had used a recurrent architecture, where these layers would repeat many times; but to make it work they had to make many approximations, including not fully backpropagating the loss through all layers.
Alexia studied what was useful and what wasn't, and cleaned up the architecture as follows:
Why use a recurrent architecture, when you can just make it a loop?
➡️ She made the network recursive, looping over itself
Why use 2 latent variables?
➡️ She provides a crystal-clear explanation: the one that changes frequently is the reasoning, the one that changes at low frequency is the proposed answer.
➡️ She runs ablation studies to validate that 2 is indeed optimal.
This new setup is a much more elegant way to process reasoning than generating huge chains of tokens as all flagship models currently do.
This might be the breakthrough we've been awaiting for so long!
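The two-frequency loop described above can be sketched in a few lines. This is only an illustration of the control flow, not the paper's architecture or training procedure: a scalar `tanh` unit stands in for the tiny Transformer, nothing is trained, and the names `tiny_net`, `recursive_refine`, `n_inner`, `n_outer` and the weights are all hypothetical.

```python
import math


def tiny_net(inputs, weights):
    """Stand-in for the tiny shared network: tanh of a weighted sum."""
    return math.tanh(sum(w * v for w, v in zip(weights, inputs)))


def recursive_refine(x, n_inner=6, n_outer=3, weights=(0.5, 0.3, 0.2)):
    """Loop one small network over itself, updating two latent variables.

    z (the reasoning) is updated at high frequency, n_inner times per cycle.
    y (the proposed answer) is updated at low frequency, once per cycle.
    """
    y, z = 0.0, 0.0
    z_updates = y_updates = 0
    for _ in range(n_outer):
        for _ in range(n_inner):
            # High frequency: refine the reasoning from input, answer, reasoning.
            z = tiny_net((x, y, z), weights)
            z_updates += 1
        # Low frequency: refine the answer from the current answer and reasoning.
        y = tiny_net((0.0, y, z), weights)
        y_updates += 1
    return y, z, z_updates, y_updates


y, z, z_updates, y_updates = recursive_refine(x=1.0)
print(f"answer={y:.3f} reasoning={z:.3f} "
      f"z_updates={z_updates} y_updates={y_updates}")
```

The same weights are reused at every step, which is how the parameter count can stay tiny while the effective depth grows with the number of loops.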
Post
2323
Cool stuff these past weeks on huggingface! 🤗 🚀 !
• 📈Trackio, local-first W&B alternative
https://github.com/gradio-app/trackio/issues
• 🌍EmbeddingGemma, 300M-param, multilingual embeddings, on-device
https://huggingface.co/blog/embeddinggemma
• 💻Open LLMs in VS Code (Inference Providers)
https://x.com/reach_vb/status/1966185427582497171
• 🤖Smol2Operator GUI agents
https://huggingface.co/blog/smol2operator
• 🖼️Gradio visible watermarking
https://huggingface.co/blog/watermarking-with-gradio
Post
2942
🤖 As AI-generated content is shared in movies/TV/across the web, there's one simple low-hanging fruit 🍇 to help know what's real: Visible watermarks. With the Gradio team, I've made sure it's trivially easy to add this disclosure to images, video, chatbot text. See how: https://huggingface.co/blog/watermarking-with-gradio
Thanks to the code collab in particular from @abidlabs and Yuvraj Sharma.
Post
2998
New work from my socially-minded colleagues at Hugging Face, creating some foundations for AI companionship behavior evaluation.
Evaluation Dataset: AI-companionship/INTIMA
Paper: AI-companionship/INTIMA
Work from @giadap , @frimelle , @yjernite .
1024m
authored a paper 7 months ago
Post
458
🤖 ICYMI: Yesterday, Hugging Face and OpenAI partnered to bring open source GPT to the public. This is a Big Deal in "AI world".
0. Common ground setting: OpenAI is the ChatGPT people. An “open source” model is one whose weights are available — that means the model can be “yours”.
1. You don’t have to interact with the company directly, nor give them your interactions, to use the system. The company can't "surveil" you.
2. You can evaluate the unique contributions of their SOTA model much more rigorously than you can when there are collections of models+code behind a closed API. You can find out specifically what the model can and can't do.
3. And you can directly customize it for whatever you'd like. Fine-tuning, wherein you give the model data that's tailored to your use cases and train it some more on that data, is trivial* when you have the model weights.
*Provided you have the compute.
4. You can directly benchmark whatever you'd like. Biases? Energy usage? Strengths/weaknesses? Go for it. You wants it you gots it--this transparency helps people understand SOTA *in general*, not just for this model, but points to, e.g., what's going on with closed Google models as well.
5. One of the most powerful things about "openness" that I've learned is that it cultivates ecosystems of collaborators building on top of one another's brilliance to make systems that are significantly better than they would be if created in isolation.
But, caveat wrt my own philosophy...
6. I do not take it as a given that advancing LLMs is good, and have a lot more to say wrt where I think innovation should focus more. For example, a focus on *data* -- curation, measurement, consent, credit, compensation, safety -- would deeply improve technology for everyone.
7. The transparency this release provides is massive for people who want to *learn* about LLMs. For the next generation of technologists to advance over the current, they MUST be able to learn about what's happening now. (cont...)
1024m
authored a paper 7 months ago
Post
510
🤖 👾 Thanks so much to BBC News and the stellar Suranjana Tewari for having me on to talk about the US <—> China relationship in AI, and what it means for AI ethics.
Post
3903
Open-source is catching up on Deep Research! 🔥 An Alibaba team has published a new data + RL recipe that allows open models to compete with OpenAI’s Deep Research.
This is one of the best papers I’ve read on fine-tuning LLMs for agentic use-cases.
Deep Research use cases are those where you task an agent to go very broad in its search on a topic, sometimes launching 100s of web searches to refine the answer. Here’s an example: “Between 1990 and 1994 inclusive, what teams played in a soccer match with a Brazilian referee had four yellow cards, two for each team where three of the total four were not issued during the first half, and four substitutions, one of which was for an injury in the first 25 minutes of the match.” (answer: Ireland v Romania)
Open-source models just weren’t performing that well. The team from Alibaba posited that the main cause for this was that Deep Research-like tasks simply were missing from training data. Indeed, our usual agentic training data of a few tool calls hardly covers this “many-steps-with-unclear-entities” type of query.
So researchers decided to fill the gap, and create a high-quality dataset for Deep Research.
My highlights from the paper:
1 - The data: by smartly leveraging an ontology of knowledge as entities linked in a graph, they can choose an arbitrarily big subgraph to craft an arbitrarily difficult request. This process produced SailorfogQA, a high-quality training dataset for Deep Research.
2 - The training methods: They start from Qwen 2.5. After fine-tuning on their dataset, researchers apply a round of RL with a reward on format + answer (scored by an LLM judge), and it does increase performance ~4% across all benchmarks.
I'm still amazed by the quality produced by Alibaba-NLP (makers of Qwen) - keep these papers coming!
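As a rough illustration of the "bigger subgraph, harder question" idea from the data highlight, here is a sketch that grows a connected subgraph of a requested size from a toy knowledge graph. The graph, the entity and relation names, and the `sample_subgraph` function are all hypothetical, and the paper's actual sampling and question-writing pipeline is not reproduced here.

```python
import random

# Hypothetical toy knowledge graph: entity -> list of (relation, entity).
GRAPH = {
    "match": [("referee", "ref"), ("home", "team_a"),
              ("away", "team_b"), ("played_in", "year")],
    "ref": [("nationality", "country_r")],
    "team_a": [("represents", "country_a")],
    "team_b": [("represents", "country_b")],
    "year": [], "country_r": [], "country_a": [], "country_b": [],
}


def sample_subgraph(graph, start, num_edges, seed=0):
    """Grow a connected subgraph from `start` until it has `num_edges` edges
    (or the reachable graph is exhausted). More edges means more facts the
    question can combine, hence a harder request."""
    rng = random.Random(seed)
    visited = {start}
    edges = []
    while len(edges) < num_edges:
        # Candidate edges leave an already-visited entity and are not used yet.
        frontier = [
            (head, rel, tail)
            for head in sorted(visited)
            for rel, tail in graph.get(head, [])
            if (head, rel, tail) not in edges
        ]
        if not frontier:
            break
        edge = rng.choice(frontier)
        edges.append(edge)
        visited.add(edge[2])
    return edges


easy = sample_subgraph(GRAPH, "match", num_edges=2)
hard = sample_subgraph(GRAPH, "match", num_edges=6)
print(len(easy), "facts:", easy)
print(len(hard), "facts:", hard)
```

In a full pipeline, the sampled facts would then be turned into a question with the entity names obscured, which is what produces the "many-steps-with-unclear-entities" flavor.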
Post
2730
Diffusion LLMs are coming for autoregressive LLMs ⚡️⚡️ Inception Labs' new diffusion model demolishes all leading LLMs on generation speed, with equal quality!
Inception Labs was founded a few months ago, and they're not sleeping: after dropping a code model, they just published Mercury chat, a diffusion-based chat model that reaches 1000 tokens / second on H100, i.e. 10x more than models of equivalent performance on the same hardware!
What's the breakthrough? Well, instead of generating tokens left-to-right like the more common autoregressive LLMs, diffusion models generate their blocks of text all at once, and successive steps refine the whole text.
Diffusion models being really fast isn't new: we already had some promising results on this from Google back in May with Gemini Diffusion, and Inception Labs themselves had already published their coding model a few months ago.
But being that good quality is new - and now Inception Labs just proved that their models work well in chat too, which could have been challenging given that streaming generation is well suited to left-to-right generation.
They have a playground available at chat dot inceptionlabs dot ai, I recommend giving it a try!
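To see why parallel refinement needs fewer model passes than left-to-right decoding, here is a toy sketch of iterative unmasking, one common way to describe text diffusion. It is not Mercury's method, whose details are not given here. `stub_denoiser` is a hypothetical stand-in that simply "knows" the target and attaches random confidences; a real model would predict both.

```python
import random

MASK = "_"


def stub_denoiser(tokens, target, rng):
    """Stand-in for a trained model: in ONE pass, propose a token and a
    confidence for every position (here the proposal is just the target)."""
    return [(target[i], rng.random()) for i in range(len(tokens))]


def parallel_refine(target, num_steps, seed=0):
    """Start fully masked; at each step commit the most confident masked
    positions, so the whole block is filled in `num_steps` model passes."""
    rng = random.Random(seed)
    tokens = [MASK] * len(target)
    per_step = -(-len(target) // num_steps)  # ceiling division
    history = []
    for _ in range(num_steps):
        proposals = stub_denoiser(tokens, target, rng)
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:per_step]:
            tokens[i] = proposals[i][0]
        history.append(list(tokens))
    return tokens, history


target = "diffusion models refine whole blocks".split()
final, history = parallel_refine(target, num_steps=2)
for step, state in enumerate(history, start=1):
    print(step, " ".join(state))
# An autoregressive decoder would need len(target) passes for the same text.
print("passes:", len(history), "vs autoregressive:", len(target))
```

The speedup in this picture comes from the number of passes being set by the step count rather than by the sequence length.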
Post
3666
If you're using any HF libraries, you should enable the Hub MCP in your agentic coding tool!
The brand new Docs Semantic Search tool is an intravenous caffeine supply for Cursor: it lets you correct API errors in a few seconds, gj @mishig ⚡️⚡️
👉 To enable Hub MCP, head to your account settings, under MCP, and it will give you everything you need!
Post
1776
If you didn't yet, you should read the technical report for SmolVLA, published yesterday by the Hugging Face robotics team!
➡️ Amongst other ideas, it introduces "Async inference" to boost their robot actions.
Robots have a problem: performing the actions takes time (unlike agents, where action executions are near-instant!)
Most often, robots wait until they've finished performing actions to start thinking about the next steps. This is a huge latency cost!
So the team decided to have the PolicyServer (aka the "thinking" part) restart early: instead of waiting for the n steps they just sent to be completed, they gather the observations after k < n steps, and start preparing the next actions based on those while the remaining steps run up to n, so the next steps can be sent directly.
➡️ This boosted robot throughput by ~30%! (nearly 2× tasks per time window).
gg @cadene and team! 👏
Report here: SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics (2506.01844)
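A back-of-the-envelope timing model shows where the latency saving comes from. It is a simplification under stated assumptions, not the report's implementation: every action takes the same time, inference takes a fixed time, and the function names and the millisecond values are made up.

```python
def sync_cycle_ms(n, action_ms, inference_ms):
    """Synchronous loop: run all n actions, then wait for the next plan."""
    return n * action_ms + inference_ms


def async_cycle_ms(n, k, action_ms, inference_ms):
    """Async loop: after k of the n actions, send the observation so that
    inference overlaps with the remaining n - k actions."""
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    remaining_ms = (n - k) * action_ms
    # The robot only idles if inference outlasts the remaining actions.
    return k * action_ms + max(remaining_ms, inference_ms)


# Made-up numbers: 10 actions of 100 ms each, 500 ms of inference.
n, action_ms, inference_ms = 10, 100, 500
sync_ms = sync_cycle_ms(n, action_ms, inference_ms)
async_ms = async_cycle_ms(n, k=5, action_ms=action_ms, inference_ms=inference_ms)
print(f"sync: {sync_ms} ms per cycle, async: {async_ms} ms per cycle")
```

With these made-up numbers the inference time is fully hidden behind the remaining actions. The actual gain depends on the real ratio of action time to inference time and on how stale the early observation is allowed to be.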