All HF Hub posts

merve posted an update 2 days ago
This week in open AI was 🔥 Let's recap! 🤗 merve/january-31-releases-679a10669bd4030090c5de4d
LLMs 💬
> Huge: AllenAI released new Tülu models that outperform DeepSeek R1, using Reinforcement Learning with Verifiable Rewards (RLVR) on top of Llama 3.1 405B 🔥
> Mistral AI is back to open source with its "small" 24B models (base & SFT), under the Apache 2.0 license 😱
> Alibaba Qwen released Qwen2.5-Instruct-1M, 1M-context-length models great for agentic use, under the Apache 2.0 license 🔥
> Arcee AI released Virtuoso-Medium, a 32.8B LLM distilled from DeepSeek V3 on a dataset of 5B+ tokens
> Velvet-14B is a new family of 14B Italian LLMs trained on 10T tokens across six languages
> OpenThinker-7B is a fine-tuned version of Qwen2.5-7B-Instruct on the OpenThoughts dataset

VLMs & vision 👀
> Alibaba Qwen is back with Qwen2.5-VL, bringing amazing new capabilities ranging from agentic computer use to zero-shot localization 🔥
> NVIDIA released a new series of Eagle2 models in 1B and 9B sizes
> DeepSeek released Janus-Pro, a new any-to-any model (image-text generation from image-text input) under the MIT license
> BEN2 is a new background-removal model with an MIT license!

Audio 🗣️
> YuE is a new open-source music-generation foundation model for lyrics-to-song generation

Codebase 👩🏻‍💻
> We are open-sourcing our SmolVLM training and eval codebase! https://github.com/huggingface/smollm/tree/main/vision
> Open-R1 is an open-source reproduction of R1 by the @huggingface science team https://huggingface.co/blog/open-r1
chansung posted an update about 20 hours ago
A brief summary of o3-mini

The OpenAI o3-mini model is a significant improvement over o1-mini, reaching o1-level performance. While generally strong, it isn't universally better than previous models (o1, o1-preview) or GPT-4o across all benchmarks. This means workflows should be re-evaluated with each model upgrade.

o3-mini comes in "low," "medium," and "high" versions, with "low" being the base model used for benchmarking. It's speculated that the higher versions simply spend more compute on reasoning. A fair comparison with other models like Gemini 2.0 Thinking or DeepSeek-R1 would likely need to use the "low" version together with a similar "think more" mechanism.

The system card is recommended reading due to its comprehensive benchmark data.

https://openai.com/index/openai-o3-mini/
rubenroy posted an update about 22 hours ago

singhsidhukuldeep posted an update 3 days ago
Exciting breakthrough in AI: AirRAG - A Novel Approach to Retrieval Augmented Generation!

Researchers from Alibaba Cloud have developed a groundbreaking framework that significantly improves how AI systems reason and retrieve information. AirRAG introduces five fundamental reasoning actions that work together to create more accurate and comprehensive responses.

>> Key Technical Innovations:
- Implements Monte Carlo Tree Search (MCTS) for exploring diverse reasoning paths
- Utilizes five core actions: System Analysis, Direct Answer, Retrieval-Answer, Query Transformation, and Summary-Answer
- Features self-consistency verification and process-supervised reward modeling
- Achieves superior performance across complex QA datasets like HotpotQA, MuSiQue, and 2WikiMultiHopQA

>> Under the Hood:
The system expands solution spaces through tree-based search, allowing for multiple reasoning paths to be explored simultaneously. The framework implements computationally optimal strategies, applying more resources to key actions while maintaining efficiency.
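The tree-based search over the five reasoning actions can be pictured with a toy MCTS/UCT loop. This is a simplified illustration, not the paper's implementation: `reward_fn` stands in for AirRAG's process-supervised reward model, and the action names and scoring are only placeholders.

```python
import math
import random

# The five AirRAG reasoning actions.
ACTIONS = ["system_analysis", "direct_answer", "retrieval_answer",
           "query_transformation", "summary_answer"]

class Node:
    def __init__(self, action=None, parent=None):
        self.action = action
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

    def ucb(self, c=1.4):
        # Unvisited nodes are explored first.
        if self.visits == 0:
            return float("inf")
        exploit = self.value / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def mcts(root, reward_fn, iterations=100):
    """Explore sequences of reasoning actions; return the most-visited first action."""
    for _ in range(iterations):
        node = root
        # Selection: walk down the tree by UCB score.
        while node.children:
            node = max(node.children, key=Node.ucb)
        # Expansion: add one child per reasoning action.
        if node.visits > 0:
            node.children = [Node(a, parent=node) for a in ACTIONS]
            node = random.choice(node.children)
        # Simulation: reward_fn stands in for the process-supervised reward model.
        reward = reward_fn(node.action)
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits).action
```

With a reward function that favors retrieval, the search tends to concentrate its visits on the retrieval branch while still exploring the other reasoning paths.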

>> Results Speak Volumes:
- Outperforms existing RAG methods by over 10% on average
- Shows remarkable scalability with increasing inference computation
- Demonstrates exceptional flexibility in integrating with other advanced technologies

This research represents a significant step forward in making AI systems more capable of complex reasoning tasks. The team's innovative approach combines human-like reasoning with advanced computational techniques, setting new benchmarks in the field.
prithivMLmods posted an update about 6 hours ago
o3-mini and DeepSeek R1
Worked through some famous and weird examples.

🔥 Blog: https://huggingface.co/blog/prithivMLmods/o3-mini-vs-deepseek-r1

Prompt : Using HTML, CSS, and JavaScript in a single HTML file to create a simulation of the solar system. Pay extreme attention to the UI to make it as intuitive as possible. Ensure that every planet appears as a sphere and is labeled with its corresponding name.

Example 1: o3-mini; Example 2: DeepSeek R1

Q2 : https://huggingface.co/blog/prithivMLmods/o3-mini-vs-deepseek-r1#q2--web-solar-system-explorer
singhsidhukuldeep posted an update 1 day ago
Excited to share groundbreaking research from @Baidu_Inc on enterprise information search! The team has developed EICopilot, a revolutionary agent-based solution that transforms how we explore enterprise data in large-scale knowledge graphs.

>> Technical Innovation
EICopilot leverages Large Language Models to interpret natural language queries and automatically generates Gremlin scripts for enterprise data exploration. The system processes hundreds of millions of nodes and billions of edges in real-time, handling complex enterprise relationships with remarkable precision.

Key Technical Components:
- Advanced data pre-processing pipeline that builds vector databases of representative queries
- Novel query masking strategy that significantly improves intent recognition
- Comprehensive reasoning pipeline combining Chain-of-Thought with In-context learning
- Named Entity Recognition and Natural Language Processing Customization for precise entity matching
- Schema Linking Module for efficient graph database query generation
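As a toy illustration of how query masking plus template retrieval could turn a natural-language question into a Gremlin script: the entities, templates, and edge labels below are invented for the example, not Baidu's actual schema or pipeline (which retrieves masked queries from a vector database rather than a dictionary).

```python
import re

# Hypothetical set of entities a NER step would recognize.
KNOWN_ENTITIES = {"Acme Corp", "Globex Ltd"}

# Hypothetical masked-query -> Gremlin-template pairs; EICopilot would
# retrieve these from a vector database of representative queries.
TEMPLATES = {
    "who are the shareholders of [ent]":
        "g.V().has('company','name','{ent}').out('has_shareholder').values('name')",
    "which companies does [ent] invest in":
        "g.V().has('company','name','{ent}').out('invests_in').values('name')",
}

def mask_entities(query):
    """Replace a recognized entity mention with a placeholder (query masking)."""
    found = None
    for ent in KNOWN_ENTITIES:
        if ent.lower() in query.lower():
            found = ent
            query = re.sub(re.escape(ent), "[ent]", query, flags=re.IGNORECASE)
    return query, found

def to_gremlin(query):
    """Mask entities, look up a matching template, substitute the entity back."""
    masked, ent = mask_entities(query)
    template = TEMPLATES.get(masked.lower().rstrip("?"))
    if template is None or ent is None:
        return None
    return template.format(ent=ent)
```

Masking before retrieval means "Who are the shareholders of Acme Corp?" and "Who are the shareholders of Globex Ltd?" map to the same template, which is what makes intent recognition robust to unseen entity names.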

>> Performance Metrics
The results are impressive: EICopilot achieves a syntax error rate as low as 10% and execution correctness of up to 82.14%. The system handles 5,000+ daily active users, demonstrating its robustness in real-world applications.

>> Implementation Details
The system uses Apache TinkerPop for graph database construction and employs sophisticated disambiguation processes, including anaphora resolution and entity retrieval. The architecture includes both offline and online phases, with continuous learning from user interactions to improve query accuracy.

Kudos to the research team from Baidu Inc., South China University of Technology, and other collaborating institutions for this significant advancement in enterprise information retrieval technology.
MonsterMMORPG posted an update 1 day ago
Paints-UNDO Installers Published - Undo Images Like Drawing From Scratch - 1-Click Install for Windows, RunPod, Massed Compute, Kaggle

Installers shared here : https://www.patreon.com/posts/121228327

Check attached images

PaintsUndo: A Base Model of Drawing Behaviors in Digital Paintings

What this app does: given an input image, it tries to reconstruct how that image could have been drawn, mimicking an artist's step-by-step process.

The app generates images at intermediate points in time, as well as a video that shows the image being drawn as if from scratch.

Official Repo : https://github.com/lllyasviel/Paints-UNDO

We have a low-VRAM mode and it works great.

1-Click Installers and Gradio APP : https://www.patreon.com/posts/121228327

We have 1-click installers for Windows, RunPod, Massed Compute, and a free Kaggle notebook. Read the post carefully to learn how to use them all.
luigi12345 posted an update 1 day ago
🚀 OpenAI o3-mini Just Dropped – Here's What You Need to Know!

OpenAI just launched o3-mini, a faster, smarter upgrade over o1-mini. It's better at math, coding, and logic, making it more reliable for structured tasks. Now available in ChatGPT & API, with function calling, structured outputs, and system messages.

🔥 Why does this matter?
✅ Stronger in logic, coding, and structured reasoning
✅ Function calling now works reliably for API responses
✅ More stable & efficient for production tasks
✅ Faster responses with better accuracy

⚠️ Who should use it?
✔️ Great for coding, API calls, and structured Q&A
❌ Not meant for long conversations or complex reasoning (GPT-4 is better)

💡 Free users: Try it under "Reason" mode in ChatGPT
💡 Plus/Team users: Daily message limit tripled to 150/day!
m-ric posted an update 2 days ago
Now you can launch a code agent directly from your terminal!
✨ πšœπš–πš˜πš•πšŠπšπšŽπš—πš "πšˆπš˜πšžπš› πšπšŠπšœπš”" directly launches a CodeAgent
▢️ This also works with web agents (replace πšœπš–πš˜πš•πšŠπšπšŽπš—πš with πš πšŽπš‹πšŠπšπšŽπš—πš) thanks to @merve !

πŸ’Ύ Another treat from smolagents release 1.7.0:
Now agents have a memory mechanism, enabling many possibilities like replaying the last run with πšŠπšπšŽπš—πš.πš›πšŽπš™πš•πšŠπš’(), thank you @clefourrier !

Check the release notes here πŸ‘‰ https://github.com/huggingface/smolagents/releases/tag/v1.7.0
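The memory-and-replay idea can be pictured with a minimal stand-in. This is a conceptual sketch, not smolagents' actual internals: `step_fn` is a placeholder for the LLM-driven step, and all class names here are invented.

```python
class AgentMemory:
    """Records the steps of a run so they can be replayed later."""
    def __init__(self):
        self.steps = []

    def record(self, step):
        self.steps.append(step)

class ToyAgent:
    def __init__(self, step_fn, max_steps=3):
        self.step_fn = step_fn          # stand-in for the LLM-driven step
        self.max_steps = max_steps
        self.memory = AgentMemory()

    def run(self, task):
        self.memory = AgentMemory()     # fresh memory for each run
        for i in range(self.max_steps):
            self.memory.record(self.step_fn(task, i))
        return self.memory.steps[-1]

    def replay(self):
        # Re-emit the recorded steps of the last run; no model calls needed.
        return list(self.memory.steps)
```

The point of replay is exactly this separation: once a run's steps are recorded, you can inspect or re-emit them without spending any further model calls.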
jasoncorkill posted an update 2 days ago
We benchmarked @xai-org's Aurora model, as far as we know the first public evaluation of the model at scale.

We collected 401k human annotations over the past ~2 days for this. We have uploaded all of the annotation data to Hugging Face under a fully permissive license:
Rapidata/xAI_Aurora_t2i_human_preferences