Article Introducing Storage Buckets on the Hugging Face Hub +10 about 23 hours ago • 104
🤏 Smol-Data Collection Tried and tested mixes for strong pretraining. Inspired by https://huggingface.co/blog/codelion/optimal-dataset-mixing • 14 items • Updated 8 days ago • 12
Finance Commons Collection A large collection of multimodal financial documents in open data. • 7 items • Updated Jul 17, 2024 • 13
pplx-embed Collection Diffusion-Pretrained Dense and Contextual Embeddings • 7 items • Updated 12 days ago • 87
Article Did GPT 5.2 make a breakthrough discovery in theoretical physics? 19 days ago • 60
Article Follow the White Rabbit: Using Embeddings So You Never Get Lost in Translation 15 days ago • 8
Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI +4 19 days ago • 479
Article Compute and Competition in AI: Different FlOPs for Different Folks 26 days ago • 12
Olmix: A Framework for Data Mixing Throughout LM Development Paper • 2602.12237 • Published 26 days ago • 2
Article Building a Mood-Based Movie Recommendation Engine with Voyage-4-nano, Hugging Face, and MongoDB Atlas Vector Search about 1 month ago • 4
Article Introducing Daggr: Chain apps programmatically, inspect visually +3 Jan 29 • 103
compar:IA: The French Government's LLM arena to collect French-language human prompts and preference data Paper • 2602.06669 • Published Feb 6 • 7
Article Community Evals: Because we're done trusting black-box leaderboards over the community +5 Feb 4 • 88