Microsoft's rStar-Math paper claims that 🤏 ~7B models can match the math skills of o1 using clever train- and test-time techniques. You can now download their prompt templates from Hugging Face!
📏 The paper introduces rStar-Math, which claims to rival OpenAI o1's math reasoning capabilities by integrating Monte Carlo Tree Search (MCTS) with step-by-step verified reasoning trajectories.
🤖 A Process Preference Model (PPM) enables fine-grained evaluation of intermediate steps, improving training data quality.
🧪 The system underwent four rounds of self-evolution, progressively refining both the policy and reward models to tackle Olympiad-level math problems, without GPT-4-based data distillation.
💾 While we wait for the release of code and datasets, you can already download the prompts they used from the HF Hub! Details and links here 👇
Prompt-templates docs: https://moritzlaurer.github.io/prompt_templates/
Templates on the Hub: MoritzLaurer/rstar-math-prompts
Prompt-templates collection: MoritzLaurer/prompt-templates-6776aa0b0b8a923957920bb4
Paper: https://arxiv.org/pdf/2501.04519
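If you want a quick look at the prompts before wiring up the prompt-templates library, a minimal sketch with the huggingface_hub client is enough. The filename below is a placeholder, so list the repo files first to find the actual YAML template names, and add repo_type="dataset" if the repo is hosted as a dataset.

```python
# Minimal sketch: list and download the rStar-Math prompt YAML files from the Hub.
# The filename is a placeholder; inspect the repo file list for the real names.
import yaml
from huggingface_hub import hf_hub_download, list_repo_files

repo_id = "MoritzLaurer/rstar-math-prompts"
print(list_repo_files(repo_id))  # which YAML templates does the repo contain?

path = hf_hub_download(repo_id=repo_id, filename="example_prompt.yaml")  # placeholder filename
with open(path) as f:
    template = yaml.safe_load(f)
print(template)
```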
✨ MiniMax-Text-01:
- 456B parameters with 45.9B activated per token
- Combines Lightning Attention, Softmax Attention, and MoE for optimal performance
- Training context up to 1M tokens; inference handles 4M tokens
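As a rough illustration of why linear attention scales to such long contexts, here is a minimal non-causal linear-attention sketch in PyTorch (Katharopoulos-style feature map). This shows only the O(n) idea; MiniMax's Lightning Attention is a far more sophisticated tiled, IO-aware kernel, and the released model interleaves it with standard softmax attention and MoE layers.

```python
# Minimal non-causal linear attention sketch: O(n) in sequence length.
# Illustrates the general linear-attention idea only, not Lightning Attention itself.
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, heads, seq_len, head_dim)
    q = F.elu(q) + 1  # positive feature map phi(x) = elu(x) + 1
    k = F.elu(k) + 1
    kv = torch.einsum("bhnd,bhne->bhde", k, v)  # sum_n phi(k_n) v_n^T
    z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps)  # per-query normalizer
    return torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)

q = k = v = torch.randn(1, 8, 4096, 64)
print(linear_attention(q, k, v).shape)  # torch.Size([1, 8, 4096, 64])
```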
✨ MiniMax-VL-01:
- ViT-MLP-LLM framework (non-transformer 👀)
- Handles image inputs from 336×336 to 2016×2016
- 694M image-caption pairs + 512B tokens processed across 4 stages
InternLM3-8B-Instruct 🔥 Trained on just 4T tokens, it outperforms Llama3.1-8B and Qwen2.5-7B on reasoning tasks, at 75% lower training cost! Collection: internlm/internlm3-67875827c377690c01a9131d
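A hedged usage sketch with transformers; the repo id below is assumed from the collection link above, so check the model card for the exact id and recommended generation settings.

```python
# Sketch: chat with InternLM3-8B-Instruct via transformers.
# The repo id is an assumption based on the collection above; verify it on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/internlm3-8b-instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True)

messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```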
🏎️ Today I'm introducing a method to train static embedding models that run 100x to 400x faster on CPU than common embedding models, while retaining 85%+ of the quality! This includes 2 fully open models, with training scripts, datasets, and metrics.
We apply our recipe to train 2 Static Embedding models that we release today! We release:
2️⃣ an English Retrieval model and a general-purpose Multilingual Similarity model (e.g. for classification, clustering, etc.), both Apache 2.0
🧠 my modern training strategy: ideation -> dataset choice -> implementation -> evaluation
📜 my training scripts, using the Sentence Transformers library (a condensed sketch follows below)
📊 my Weights & Biases reports with losses & metrics
📕 my list of 30 training and 13 evaluation datasets
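For the training side, here is a condensed sketch of what such a Sentence Transformers script can look like; the tokenizer, dataset, and dimensions are placeholders, and the full scripts in the blog post are the reference.

```python
# Condensed sketch of training a static embedding model with Sentence Transformers.
# Tokenizer, dataset, and dims are placeholders; see the blog post scripts for the real recipe.
from datasets import load_dataset
from tokenizers import Tokenizer
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.models import StaticEmbedding

# 1. A static embedding model is just a token-embedding matrix + mean pooling, no Transformer.
static = StaticEmbedding(Tokenizer.from_pretrained("google-bert/bert-base-uncased"), embedding_dim=1024)
model = SentenceTransformer(modules=[static])

# 2. A (query, answer) pairs dataset; placeholder choice here.
train_dataset = load_dataset("sentence-transformers/natural-questions", split="train")

# 3. In-batch negatives loss, wrapped in MatryoshkaLoss so embeddings can be truncated later.
loss = MatryoshkaLoss(model, MultipleNegativesRankingLoss(model), matryoshka_dims=[1024, 512, 256, 128, 64, 32])

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
model.save_pretrained("static-embedding-sketch")
```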
The 2 Static Embedding models have the following properties:
🏎️ Extremely fast, e.g. 107,500 sentences per second on a consumer CPU, compared to 270 for 'all-mpnet-base-v2' and 56 for 'gte-large-en-v1.5'
0️⃣ Zero active parameters: no Transformer blocks, no attention, not even a matrix multiplication. Super speed!
📏 No maximum sequence length! Embed texts of any length (note: longer texts may embed worse)
📐 Linear instead of quadratic complexity: a 2x longer text takes 2x longer to embed, instead of 2.5x or more
🪆 Matryoshka support: lets you truncate embeddings with minimal performance loss (e.g. 4x smaller with a 0.56% performance decrease on English Similarity tasks); see the usage sketch below
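A hedged usage sketch (the repo id is assumed from this release, verify it on the Hub); truncate_dim demonstrates the Matryoshka truncation mentioned above.

```python
# Sketch: encode on CPU with a static embedding model and truncate via Matryoshka.
# The repo id is assumed from the release; check the blog post / Hub for the exact name.
from sentence_transformers import SentenceTransformer

# Load with a truncated 256-dim output (Matryoshka); drop truncate_dim for the full size.
model = SentenceTransformer("sentence-transformers/static-retrieval-mrl-en-v1", device="cpu", truncate_dim=256)

sentences = ["Static embeddings are extremely fast on CPU.", "No attention, just a lookup and mean pooling."]
embeddings = model.encode(sentences)
print(embeddings.shape)                          # (2, 256)
print(model.similarity(embeddings, embeddings))  # 2x2 cosine similarity matrix
```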
Check out the full blog post if you'd like to 1) use these lightning-fast models or 2) learn how to train them with consumer-level hardware: https://huggingface.co/blog/static-embeddings
The blog post contains a lengthy list of possible advancements; I'm very confident that our 2 models are only the tip of the iceberg, and we may be able to get even better performance.
FACTS is a great paper from @GoogleDeepMind on measuring the factuality of LLM outputs. You can now download their prompt templates from @huggingface to improve LLM-based fact-checking yourself!
📏 The paper introduces the FACTS Grounding benchmark for evaluating the factuality of LLM outputs.
🤖 Fact-checking is automated by an ensemble of LLM judges that verify whether a response is fully grounded in a factual reference document (a simplified sketch of this ensemble idea closes this post).
🧪 The authors tested different prompt templates on held-out data to ensure their generalization.
📚 It's highly educational to read these templates to learn how frontier labs design prompts and understand their limitations.
💾 You can now download and reuse these prompt templates via the prompt-templates library!
🔄 The library simplifies sharing prompt templates on the HF hub or locally via standardized YAML files. Let’s make LLM work more transparent and reproducible by sharing more templates like this!
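To make the ensemble judging from above concrete, here is a simplified sketch, not the FACTS implementation: each judge model receives the reference document and the response, returns a grounded/ungrounded verdict, and the verdicts are aggregated by majority vote. The call_llm function is a stub to back with your preferred LLM client, and the prompt is a toy example rather than one of the released templates.

```python
# Simplified sketch of ensemble LLM judging for groundedness (not the FACTS code).
# call_llm is a stub: plug in any LLM client that returns the judge's raw text.
from collections import Counter

JUDGE_PROMPT = (
    "You are a strict fact-checking judge.\n"
    "Reference document:\n{document}\n\n"
    "Model response:\n{response}\n\n"
    "Is every claim in the response fully grounded in the reference document? "
    "Answer with exactly one word: GROUNDED or UNGROUNDED."
)

def call_llm(judge_model: str, prompt: str) -> str:
    raise NotImplementedError("Stub: call your LLM provider here.")

def judge_groundedness(document: str, response: str, judge_models: list[str]) -> bool:
    prompt = JUDGE_PROMPT.format(document=document, response=response)
    verdicts = []
    for judge in judge_models:
        answer = call_llm(judge, prompt).strip().upper()
        verdicts.append("GROUNDED" in answer and "UNGROUNDED" not in answer)
    votes = Counter(verdicts)  # aggregate the per-judge verdicts by majority vote
    return votes[True] > votes[False]
```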