Data Agents


Recent Activity


m-ric posted an update 2 days ago
š— š—¶š—»š—¶š— š—®š˜…'š˜€ š—»š—²š˜„ š— š—¼š—˜ š—Ÿš—Ÿš—  š—暝—²š—®š—°š—µš—²š˜€ š—–š—¹š—®š˜‚š—±š—²-š—¦š—¼š—»š—»š—²š˜ š—¹š—²š˜ƒš—²š—¹ š˜„š—¶š˜š—µ šŸ°š—  š˜š—¼š—øš—²š—»š˜€ š—°š—¼š—»š˜š—²š˜…š˜ š—¹š—²š—»š—“š˜š—µ šŸ’„

This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach.

š—žš—²š˜† š—¶š—»š˜€š—¶š—“š—µš˜š˜€:

šŸ—ļø MoE with novel hybrid attention:
ā€£ Mixture of Experts with 456B total parameters (45.9B activated per token)
ā€£ Combines Lightning attention (linear complexity) for most layers and traditional softmax attention every 8 layers

šŸ† Outperforms leading models across benchmarks while offering vastly longer context:
ā€£ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks
ā€£ Can efficiently handle 4M token contexts (vs 256K for most other LLMs)

šŸ”¬ Technical innovations enable efficient scaling:
ā€£ Novel expert parallel and tensor parallel strategies cut communication overhead in half
ā€£ Improved linear attention sequence parallelism, multi-level padding and other optimizations achieve 75% GPU utilization (that's really high, generally utilization is around 50%)

šŸŽÆ Thorough training strategy:
ā€£ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge!
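
To make that hybrid schedule concrete, here is a minimal PyTorch sketch of "linear attention on most layers, full softmax attention every 8th layer". It's a toy single-head, non-causal illustration of the idea, not MiniMax's code (the real model adds MoE feed-forward blocks, RoPE, and optimized lightning-attention kernels); it assumes PyTorch 2.x for scaled_dot_product_attention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearAttention(nn.Module):
    """O(n) attention via an elu(x)+1 feature map (toy stand-in for lightning attention)."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k = F.elu(q) + 1, F.elu(k) + 1
        kv = torch.einsum("bnd,bne->bde", k, v)               # running sum of k_n outer v_n
        z = 1 / (torch.einsum("bnd,bd->bn", q, k.sum(1)) + 1e-6)
        return self.out(torch.einsum("bnd,bde,bn->bne", q, kv, z))  # non-causal toy variant

class SoftmaxAttention(nn.Module):
    """Standard O(n^2) scaled-dot-product attention."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        return self.out(F.scaled_dot_product_attention(q, k, v))

class HybridStack(nn.Module):
    """Linear attention everywhere, except a full softmax layer every `softmax_every` layers."""
    def __init__(self, dim=512, n_layers=16, softmax_every=8):
        super().__init__()
        self.layers = nn.ModuleList(
            SoftmaxAttention(dim) if (i + 1) % softmax_every == 0 else LinearAttention(dim)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = x + layer(x)                                  # residual connection
        return x

x = torch.randn(2, 1024, 512)                                 # (batch, seq_len, dim)
print(HybridStack()(x).shape)                                 # torch.Size([2, 1024, 512])
```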

Overall, not only is the model impressive, but the technical paper is also really interesting! 📝
It has lots of insights, including a great comparison showing how a 2B-activated MoE (24B total parameters) far outperforms a 7B dense model for the same amount of FLOPs.

Read it in full here 👉 MiniMax-01: Scaling Foundation Models with Lightning Attention (2501.08313)
Model here (commercial use allowed under 100M monthly users) 👉 MiniMaxAI/MiniMax-Text-01
m-ric posted an update 3 days ago
š—Ŗš—²'š˜ƒš—² š—·š˜‚š˜€š˜ š—暝—²š—¹š—²š—®š˜€š—²š—± š˜€š—ŗš—¼š—¹š—®š—“š—²š—»š˜š˜€ š˜ƒšŸ­.šŸÆ.šŸ¬ šŸš€, and it comes with a major feature: you can now log agent runs using OpenTelemetry to inspect them afterwards! šŸ“Š

This interactive format is IMO much easier to inspect big multi-step runs than endless console logs.

The setup is very easy, in a few lines of code.

Find a tutorial here šŸ‘‰ https://huggingface.co/docs/smolagents/tutorials/inspect_runs
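
For reference, the setup looks roughly like this. A minimal sketch, assuming the openinference smolagents instrumentor (openinference-instrumentation-smolagents) and an OTLP-compatible collector such as Phoenix listening locally; the tutorial linked above has the authoritative, up-to-date instructions.

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.smolagents import SmolagentsInstrumentor

# Hypothetical local collector endpoint (e.g. a Phoenix instance).
endpoint = "http://0.0.0.0:6006/v1/traces"

trace_provider = TracerProvider()
trace_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter(endpoint=endpoint)))

# From here on, every agent run is exported as a trace you can inspect afterwards.
SmolagentsInstrumentor().instrument(tracer_provider=trace_provider)
```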
m-ric posted an update 6 days ago
š—¢š—¦-š—šš—²š—»š—²š˜€š—¶š˜€: š—»š—²š˜„ š—暝—²š˜€š—²š—®š—暝—°š—µ š—½š—®š—½š—²š—æ š—½š—暝—¼š—½š—¼š˜€š—²š˜€ š—® š—»š—¼š˜ƒš—²š—¹ š˜š—暝—®š—¶š—»š—¶š—»š—“ š—±š—®š˜š—® š—“š—²š—»š—²š—暝—®š˜š—¶š—¼š—» š—ŗš—²š˜š—µš—¼š—± š—³š—¼š—æ š—–š—¹š—®š˜‚š—±š—²-š—–š—¼š—ŗš—½š˜‚š˜š—²š—æ-š—Øš˜€š—²-š—¹š—¶š—øš—² š—®š—“š—²š—»š˜š˜€, š˜„š—¶š˜š—µ š—¶š—ŗš—½š—暝—²š˜€š˜€š—¶š˜ƒš—² š—暝—²š˜€š˜‚š—¹š˜š˜€! šŸ”„

The main bottleneck in building GUI agents it to find training data.
GUI Agent trajectories are not easy to get by. Crowdsourcing trajectories, then manually annotating them, could be an option, but at scale, it's hard to do

You could use synthetic data generation (ask 1000s small existing GUI agents to solve tasks, keep only successful runs). But then it's hard to come up with many high level-tasks.

āž”ļø Well, a novel technique was just published that creates a new promising paradigm for synthetic data generation: Shanghai AI Lab researchers propose OS-Genesis, a novel way to create training data for GUI agents that flips the traditional approach on its head. Instead of starting with predefined tasks and having humans or machines execute them, OS-Genesis first explores the interface naturally, then derives meaningful tasks from those interactions.

🔍 Exploration-driven vs. task-driven approach:
‣ Instead of starting with tasks, OS-Genesis first explores GUIs by clicking and interacting
‣ It then reverse-engineers high-level tasks from successful interaction patterns
‣ This leads to more natural and diverse training data than predefined tasks (the overall flow is sketched after this list)

🎯 Novel reward model for trajectory quality:
‣ Rather than discarding incomplete trajectories, OS-Genesis scores them based on coherence and completion
‣ This preserves valuable partial successes that would otherwise be wasted

🏆 Superior results across environments:
‣ Nearly doubles performance on AndroidWorld (9.8% → 17.4%)
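
To give a feel for the flow (explore first, derive the task afterwards, score instead of discard), here is a toy sketch. Everything in it (the random "GUI", the template annotator, the variety-based reward) is a stand-in, not the authors' code:

```python
import random
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    actions: list = field(default_factory=list)
    task: str = ""          # high-level instruction, derived afterwards
    reward: float = 0.0     # coherence/completion score in [0, 1]

def explore_gui(n_steps: int) -> Trajectory:
    """Interact without a predefined goal (toy: random clicks on UI elements)."""
    ui_elements = ["search_box", "settings", "compose_button", "send_button"]
    traj = Trajectory()
    for _ in range(n_steps):
        traj.actions.append(f"click({random.choice(ui_elements)})")
    return traj

def reverse_task_synthesis(traj: Trajectory) -> str:
    """Derive a plausible high-level task from what actually happened.
    In the paper this is done by a strong annotator model, not a template."""
    return f"Complete a workflow involving: {', '.join(traj.actions[:3])}"

def score_trajectory(traj: Trajectory) -> float:
    """Toy trajectory reward: favour varied, non-repetitive interaction patterns."""
    return len(set(traj.actions)) / max(len(traj.actions), 1)

dataset = []
for _ in range(5):
    traj = explore_gui(n_steps=6)
    traj.task = reverse_task_synthesis(traj)
    traj.reward = score_trajectory(traj)
    dataset.append(traj)    # keep everything, weighted by its reward

for t in dataset:
    print(f"{t.reward:.2f}  {t.task}")
```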

By the way, this field of GUI agents is still in its infancy, so you can still make a difference with "low-cost" setups: their paper gets SOTA results with only 8x A100s!

Read the paper here 👉 OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis (2412.19723)
m-ric posted an update 11 days ago
Since I published it on GitHub a few days ago, Hugging Face's new agentic library smolagents has gathered nearly 4k stars 🤯

➡️ But we are just getting started on agents, so we are hiring an ML Engineer to join me and double down on this effort!

The plan is to build GUI agents: agents that can act on your computer with mouse & keyboard, like Claude Computer Use.

We will make it work better, and fully open. ✨

Sounds like something you'd like to do? Apply here 👉 https://apply.workable.com/huggingface/j/AF1D4E3FEB/
Ā·
lewtun posted an update 12 days ago
I was initially pretty sceptical about Meta's Coconut paper [1] because the largest perf gains were reported on toy linguistic problems. However, these results on machine translation are pretty impressive!

https://x.com/casper_hansen_/status/1875872309996855343

Together with the recent PRIME method [2] for scaling RL, reasoning for open models is looking pretty exciting for 2025!

[1] Training Large Language Models to Reason in a Continuous Latent Space (2412.06769)
[2] https://huggingface.co/blog/ganqu/prime
lewtun posted an update 19 days ago
This paper (HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs (2412.18925)) has a really interesting recipe for inducing o1-like behaviour in Llama models:

* Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting.
* Verify correctness of each CoT using GPT-4o (needed because exact match doesn't work well in medicine where there are lots of aliases)
* Use GPT-4o to reformat the concatenated CoTs into a single stream that includes the smooth transitions like "hmm, wait", etc., that one sees in o1
* Use the resulting data for SFT & RL
* Use sparse rewards from GPT-4o to guide RL training. They find RL gives an average ~3 point boost across medical benchmarks and SFT on this data already gives a strong improvement.

Applying this strategy to other domains could be quite promising, provided the training data can be formulated with verifiable problems!
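
In toy form, the data-generation loop looks roughly like this. The policy, judge and reformatter below are stand-ins (the paper uses the model itself plus GPT-4o for verification and rewriting):

```python
import random

random.seed(0)
SEARCH_STRATEGIES = ["greedy", "backtracking", "exploration", "correction"]

def sample_cot(question: str, strategy: str):
    """Toy policy: returns a chain of thought and a final answer."""
    answer = random.choice(["aspirin", "ibuprofen"])
    return f"[{strategy}] reasoning about '{question}' -> {answer}", answer

def judge_is_correct(answer: str, reference: str) -> bool:
    """Stand-in for GPT-4o verification (which also handles medical aliases)."""
    return answer.lower() == reference.lower()

def rewrite_as_single_stream(attempts):
    """Stand-in for GPT-4o reformatting with 'hmm, wait'-style transitions."""
    return "\nhmm, wait...\n".join(attempts)

def generate_o1_style_example(question, reference, max_attempts=8):
    attempts = []
    for i in range(max_attempts):
        strategy = SEARCH_STRATEGIES[i % len(SEARCH_STRATEGIES)]
        cot, answer = sample_cot(question, strategy)
        attempts.append(cot)
        if judge_is_correct(answer, reference):   # exact match won't do in real medicine
            return rewrite_as_single_stream(attempts)
    return None   # drop questions the model never solves

print(generate_o1_style_example("Which common drug thins the blood?", "aspirin"))
# The resulting (question, long-CoT) pairs go to SFT; sparse judge rewards then drive RL.
```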
lvwerra updated a Space 30 days ago
m-ric posted an update 30 days ago
After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: welcome ModernBERT! 🤗

We talk a lot about ✨Generative AI✨, meaning the decoder version of the Transformer architecture, but this is only one way to build LLMs: encoder models, which turn a sentence into a vector, are maybe even more widely used in industry than generative models.

The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs).

It's not a fancy 100B-parameter supermodel (just a few hundred million parameters), but it's an excellent workhorse, kind of a Honda Civic for LLMs.

Many applications use BERT-family models; the top models in this category accumulate millions of downloads on the Hub.

➡️ Now a collaboration between Answer.AI and LightOn just introduced BERT's replacement: ModernBERT.

š—§š—Ÿ;š——š—„:
šŸ›ļø Architecture changes:
ā‡’ First, standard modernizations:
- Rotary positional embeddings (RoPE)
- Replace GeLU with GeGLU,
- Use Flash Attention 2
āœØ The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.
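
For the GeLU-to-GeGLU swap, here is a minimal PyTorch sketch of a GeGLU feed-forward block; the dimensions are illustrative, not ModernBERT's actual config:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLUFeedForward(nn.Module):
    """Gated feed-forward: one projection is GELU-activated and gates the other."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.proj_in = nn.Linear(dim, 2 * hidden_dim)   # value and gate in one matmul
        self.proj_out = nn.Linear(hidden_dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self.proj_in(x).chunk(2, dim=-1)
        return self.proj_out(value * F.gelu(gate))      # GeGLU: value * GELU(gate)

x = torch.randn(4, 128, 768)                 # (batch, seq_len, dim)
print(GeGLUFeedForward(768, 3072)(x).shape)  # torch.Size([4, 128, 768])
```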

🥇 As a result, the model tops the game of encoder models:
It beats the previous standard, DeBERTaV3, with 1/5th the memory footprint, and runs 4x faster!

Read the blog post 👉 https://huggingface.co/blog/modernbert
anton-l posted an update 30 days ago
Introducing 📐 FineMath: the best public math pre-training dataset with 50B+ tokens!
HuggingFaceTB/finemath

Math remains challenging for LLMs, and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH.

We build the dataset by:
🛠️ carefully extracting math data from Common Crawl;
🔎 iteratively filtering and recalling high-quality math pages using a classifier trained on synthetic annotations to identify math reasoning and deduction.
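
As a toy illustration of that classifier-based filtering step (the scoring function below is a keyword-heuristic stand-in, not the actual FineMath classifier):

```python
def math_quality_score(page: str) -> float:
    """Stand-in for a classifier trained on synthetic annotations."""
    signals = ["theorem", "proof", "equation", "solve", "=", "integral"]
    return sum(page.lower().count(s) for s in signals) / max(len(page.split()), 1)

def filter_pages(pages, threshold=0.05):
    kept = [p for p in pages if math_quality_score(p) >= threshold]
    # In the real pipeline this is iterated: retrain the classifier on the kept
    # pages (plus new annotations) and re-scan Common Crawl to recall more.
    return kept

pages = [
    "Proof: by induction on n, the equation 2 + 2 = 4 ... solve for x.",
    "Celebrity gossip and entertainment news of the week.",
]
print(len(filter_pages(pages)), "page(s) kept")
```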

We conducted a series of ablations comparing the performance of Llama-3.2-3B-Base after continued pre-training on FineMath, and observed notable gains compared to the baseline model and other public math datasets.

We hope this helps advance the performance of LLMs on math and reasoning! 🚀
We're also releasing all the ablation models as well as the evaluation code.

HuggingFaceTB/finemath-6763fb8f71b6439b653482c2
m-ric posted an update 30 days ago
š‡š®š š š¢š§š  š…šššœšž š«šžš„šžššš¬šžš¬ šš¢šœšØš­š«šØš§, šš š¦š¢šœš«šØš¬šœšØš©š¢šœ š„š¢š› š­š”ššš­ š¬šØš„šÆšžš¬ š‹š‹šŒ š­š«ššš¢š§š¢š§š  šŸ’šƒ š©ššš«ššš„š„šžš„š¢š³ššš­š¢šØš§ šŸ„³

šŸ•°ļø Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years.

šŸ‘“šŸ» If they had needed all this time, we would have GPU stories from the time of Pharaoh š“‚€: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons "

šŸ› ļø But instead, they just parallelized the training on 24k H100s, which made it take just a few months.
This required parallelizing across 4 dimensions: data, tensor, context, pipeline.
And it is infamously hard to do, making for bloated code repos that hold together only by magic.

šŸ¤ š—•š˜‚š˜ š—»š—¼š˜„ š˜„š—² š—±š—¼š—»'š˜ š—»š—²š—²š—± š—µš˜‚š—“š—² š—暝—²š—½š—¼š˜€ š—®š—»š˜†š—ŗš—¼š—暝—²! Instead of building mega-training codes, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. A team has built Nanotron, already widely used in industry.
And now a team releases Picotron, a radical approach to code 4D Parallelism in just a few hundred lines of code, a real engineering prowess, making it much easier to understand what's actually happening!

āš” š—œš˜'š˜€ š˜š—¶š—»š˜†, š˜†š—²š˜ š—½š—¼š˜„š—²š—暝—³š˜‚š—¹:
Counting in MFU (Model FLOPs Utilization, how much the model actually uses all the compute potential), this lib reaches ~50% on SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is leading further benchmarks to verify this)
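
For intuition, the back-of-the-envelope MFU calculation looks like this; all numbers below are illustrative assumptions, not measured Picotron results:

```python
# MFU = achieved training FLOP/s divided by the hardware's peak FLOP/s.
n_params = 1.7e9                  # SmolLM-1.7B
tokens_per_second = 4.0e5         # hypothetical measured throughput across all 8 GPUs
flops_per_token = 6 * n_params    # classic ~6N training FLOPs per token
peak_flops_per_gpu = 989e12       # approx. H100 SXM dense BF16 peak
n_gpus = 8

mfu = (tokens_per_second * flops_per_token) / (peak_flops_per_gpu * n_gpus)
print(f"MFU ≈ {mfu:.1%}")         # ≈ 52% with these made-up numbers
```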

Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron
freddyaboulton updated a Space about 1 month ago
lewtun posted an update about 1 month ago
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)

We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"

We're open sourcing the full recipe and sharing a detailed blog post.

In our blog post we cover:

📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.

🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets (a toy sketch of verifier-guided search follows this list).

🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs, built for speed with vLLM.
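
Here is a toy sketch of the core verifier-guided search loop, with random stand-ins for the LLM generator and the process reward model, just to show the shape of the algorithm (not our actual recipe):

```python
import random

def generate_candidate_steps(partial_solution, n_candidates=4):
    """Stand-in for the LLM proposing several possible next reasoning steps."""
    return [partial_solution + [f"step{len(partial_solution)}-{i}"] for i in range(n_candidates)]

def process_reward(partial_solution):
    """Stand-in PRM: scores every partial solution in [0, 1]."""
    random.seed(hash(tuple(partial_solution)) % 2**32)   # deterministic toy score
    return random.random()

def tree_search(depth=5, beam_width=2):
    beams = [[]]                                          # start from an empty solution
    for _ in range(depth):
        candidates = [c for b in beams for c in generate_candidate_steps(b)]
        candidates.sort(key=process_reward, reverse=True)
        beams = candidates[:beam_width]                   # spend compute only on promising branches
    return beams[0]

print(tree_search())
```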

Here are the links:

- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute

- Code: https://github.com/huggingface/search-and-learn

Enjoy!
m-ric posted an update about 1 month ago
š—£š—¼š˜š—²š—»š˜š—¶š—®š—¹ š—½š—®š—暝—®š—±š—¶š—“š—ŗ š˜€š—µš—¶š—³š˜ š—¶š—» š—Ÿš—Ÿš— š˜€: š—»š—²š˜„ š—½š—®š—½š—²š—æ š—Æš˜† š— š—²š˜š—® š—°š—¹š—®š—¶š—ŗš˜€ š˜š—µš—®š˜ š˜„š—² š—°š—®š—» š—“š—²š˜ š—暝—¶š—± š—¼š—³ š˜š—¼š—øš—²š—»š—¶š˜‡š—²š—暝˜€! šŸ„³

Current LLMs process text by first splitting it into tokens. They use a module named "tokenizer", that -spl-it-s- th-e- te-xt- in-to- arbitrary tokens depending on a fixed dictionnary.
On the Hub you can find this dictionary in a model's files under tokenizer.json.

āž”ļø This process is called BPE tokenization. It is suboptimal, everyone says it. It breaks text into predefined chunks that often fail to capture the nuance of language. But it has been a necessary evil in language models since their inception.

šŸ’„ In Byte Latent Transformer (BLT), Meta researchers propose an elegant solution by eliminating tokenization entirely, working directly with raw bytes while maintaining efficiency through dynamic "patches."

This had been tried before with different byte-level tokenizations, but it's the first time that an architecture of this type scales as well as BPE tokenization. And it could mean a real paradigm shift! šŸ‘šŸ‘

šŸ—ļø š—”š—暝—°š—µš—¶š˜š—²š—°š˜š˜‚š—暝—²:
Instead of a lightweight tokenizer, BLT has a lightweight encoder that process raw bytes into patches. Then the patches are processed by the main heavy-duty transformers as we do normally (but for patches of bytes instead of tokens), before converting back to bytes.

šŸ§© š——š˜†š—»š—®š—ŗš—¶š—° š—£š—®š˜š—°š—µš—¶š—»š—“:
Instead of fixed tokens, BLT groups bytes based on their predictability (measured by entropy) - using more compute for complex sequences and efficiently handling simple ones. This allows efficient processing while maintaining byte-level understanding.
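
To illustrate the idea, here is a toy version of entropy-based patching, where a simple bigram estimate stands in for BLT's small byte-level language model and a new patch starts whenever the next byte is hard to predict:

```python
import math
from collections import Counter, defaultdict

def next_byte_entropies(data: bytes):
    """Estimate H(next byte | current byte) from bigram counts over the data itself."""
    counts = defaultdict(Counter)
    for a, b in zip(data, data[1:]):
        counts[a][b] += 1

    def entropy(a):
        total = sum(counts[a].values())
        return -sum((c / total) * math.log2(c / total) for c in counts[a].values())

    return [8.0] + [entropy(a) for a in data[:-1]]   # first byte: maximally surprising

def dynamic_patches(data: bytes, threshold: float = 1.0):
    """Start a new patch whenever the upcoming byte is hard to predict."""
    patches, current = [], bytearray()
    for byte, ent in zip(data, next_byte_entropies(data)):
        if ent > threshold and current:
            patches.append(bytes(current))
            current = bytearray()
        current.append(byte)
    patches.append(bytes(current))
    return patches

text = b"the cat sat on the mat, the cat sat on the hat"
print([p.decode() for p in dynamic_patches(text)])
```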

I hope this breakthrough is confirmed so we can get rid of all the tokenizer stuff; it will make model handling easier!

Read their paper here 👉 https://dl.fbaipublicfiles.com/blt/BLT__Patches_Scale_Better_Than_Tokens.pdf
freddyaboulton posted an update about 1 month ago
Version 0.0.21 of gradio-pdf now properly loads Chinese characters!
freddyaboulton posted an update about 1 month ago
Hello Llama 3.2! 🗣️🦙

Build a Siri-like coding assistant that responds to "Hello Llama" in 100 lines of Python! All with Gradio and WebRTC 😎

freddyaboulton/hey-llama-code-editor
m-ric posted an update about 1 month ago
💥 Google releases Gemini 2.0, starting with a Flash model that steamrolls GPT-4o and Claude-3.6 Sonnet! And they're starting a huge effort on agentic capabilities.

🚀 The performance improvements are crazy for such a fast model:
‣ Gemini 2.0 Flash outperforms the previous 1.5 Pro model at twice the speed
‣ Now supports both input AND output of images, video, audio and text
‣ Can natively use tools like Google Search and execute code

➡️ If the price is on par with the previous Flash iteration ($0.30/M tokens, compared with GPT-4o's $1.25), the competition will have a big problem with this 4x-cheaper model that gets better benchmarks 🤯

🤖 What about the agentic capabilities?

‣ Project Astra: A universal AI assistant that can use Google Search, Lens and Maps
‣ Project Mariner: A Chrome extension that can complete complex web tasks (83.5% success rate on the WebVoyager benchmark, which is really impressive!)
‣ Jules: An AI coding agent that integrates with GitHub workflows

I'll be eagerly awaiting further news from Google!

Read their blog post here 👉 https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/