✨ MiniMax-Text-01:
- 456B total parameters, with 45.9B activated per token
- Combines Lightning Attention, Softmax Attention, and MoE for optimal performance
- Training context up to 1M tokens; inference handles up to 4M tokens
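This is a rough sketch of why linear-attention layers scale to very long contexts. My understanding is that Lightning Attention is an I/O-aware, tiled implementation of linear attention. The NumPy code below is not MiniMax's kernel. It is a generic causal linear attention with an identity feature map and no normalization, and the function names are mine. It computes the same output two ways: once with a fixed-size running state, whose cost is linear in sequence length, and once with an explicit masked n × n matrix. Softmax attention has no such running-state form, because exp(q·k) does not factor into separate query and key terms.

```python
import numpy as np

def causal_linear_attention_recurrent(q, k, v):
    """O(n) in sequence length: keep a running (d_k x d_v) state instead of an n x n score matrix."""
    n, d_k = q.shape
    d_v = v.shape[1]
    state = np.zeros((d_k, d_v))
    out = np.zeros((n, d_v))
    for t in range(n):
        state += np.outer(k[t], v[t])  # fold token t's key/value pair into the state
        out[t] = q[t] @ state          # query t only sees tokens 0..t
    return out

def causal_linear_attention_quadratic(q, k, v):
    """Reference O(n^2) form: masked q @ k.T without softmax, then @ v."""
    n = q.shape[0]
    mask = np.tril(np.ones((n, n)))    # causal mask: token t attends to tokens s <= t
    return ((q @ k.T) * mask) @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 4)) for _ in range(3))
print(np.allclose(causal_linear_attention_recurrent(q, k, v),
                  causal_linear_attention_quadratic(q, k, v)))  # True
```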
✨ MiniMax-VL-01:
- ViT-MLP-LLM framework (non-transformer)
- Handles image inputs from 336×336 to 2016×2016
- 694M image-caption pairs + 512B tokens processed across 4 stages
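The post does not say how inputs between those two sizes are processed. Note that 2016 = 6 × 336. One common dynamic-resolution approach is to cut an image into fixed-size tiles, and this sketch assumes exactly that: 336-pixel tiles, a 2016-pixel cap on the longer side, and a preserved aspect ratio. It illustrates the arithmetic only and is not MiniMax-VL-01's documented preprocessing.

```python
import math

def tile_grid(width, height, tile=336, max_side=2016):
    """Hypothetical tiling: return (cols, rows) of tile x tile crops for an image."""
    scale = min(1.0, max_side / max(width, height))  # shrink oversized images, keep aspect ratio
    cols = max(1, math.ceil(width * scale / tile))
    rows = max(1, math.ceil(height * scale / tile))
    return cols, rows

for size in [(336, 336), (2016, 2016), (4032, 3024)]:
    cols, rows = tile_grid(*size)
    print(size, "->", (cols, rows), cols * rows, "tiles")
```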
Let me introduce the work I've done over the past three months: Llama-3.2-Taiwan-3B and Llama-3.2-Taiwan-3B-Instruct, now open-sourced on 🤗 Hugging Face.
lianghsun/Llama-3.2-Taiwan-3B: This model is built on top of meta-llama/Llama-3.2-3B with continual pretraining. The training dataset consists of a mixture of Traditional Chinese and multilingual texts in specific proportions, including 20B tokens of Traditional Chinese text.
lianghsun/Llama-3.2-Taiwan-3B-Instruct: This is a conversational model fine-tuned from the foundation model above.
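Mixing data sources "in specific proportions" is often implemented as weighted sampling over sources. The post does not give the proportions used for Llama-3.2-Taiwan-3B, so the source names and weights below are hypothetical. The sketch only shows the mechanism.

```python
import random
from collections import Counter

# Hypothetical source names and weights: the post does not disclose the real proportions.
MIXTURE = {"zh_tw": 0.6, "en": 0.3, "other_multilingual": 0.1}

def sample_sources(n_docs, weights, seed=0):
    """Pick which source each of the next n_docs training documents is drawn from."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=n_docs)

draws = sample_sources(10_000, MIXTURE)
# Empirical shares approach the target weights as n_docs grows.
shares = {name: count / len(draws) for name, count in Counter(draws).items()}
print(shares)
```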
This Llama-3.2-Taiwan open-source project is currently a one-person effort (yes, I did everything, from text preparation onward, and it was exhausting!). If you're interested, feel free to join the Discord server for discussions.
Benchmarking
The evaluation was conducted on ikala/tmmluplus, though the README does not yet reflect the latest results. Performance is close to that of previous versions, which suggests that further improvements may require adding more specialized knowledge to the datasets.
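The post does not describe the scoring setup. As a hedged illustration of how multiple-choice benchmarks such as TMMLU+ are commonly scored, this sketch computes per-subject accuracy and a macro average, with each subject weighted equally. The (subject, prediction, gold) records are hypothetical. The code does not load ikala/tmmluplus or reproduce the author's evaluation harness.

```python
from collections import defaultdict

def score_multiple_choice(records):
    """records: iterable of (subject, predicted_letter, gold_letter) tuples."""
    tally = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
    for subject, pred, gold in records:
        tally[subject][0] += int(pred.strip().upper() == gold.strip().upper())
        tally[subject][1] += 1
    per_subject = {s: correct / total for s, (correct, total) in tally.items()}
    macro_avg = sum(per_subject.values()) / len(per_subject)  # each subject weighted equally
    return per_subject, macro_avg

# Hypothetical predictions, for illustration only.
records = [("physics", "A", "A"), ("physics", "b", "C"), ("law", "D", "D")]
per_subject, macro_avg = score_multiple_choice(records)
print(per_subject, macro_avg)  # {'physics': 0.5, 'law': 1.0} 0.75
```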
InternLM3-8B-instruct: trained on just 4T tokens, it outperforms Llama3.1-8B and Qwen2.5-7B on reasoning tasks at 75% lower training cost! internlm/internlm3-67875827c377690c01a9131d
Reacted to MoritzLaurer's post 3 days ago:
Microsoft's rStar-Math paper claims that ~7B models can match the math skills of o1 using clever train- and test-time techniques. You can now download their prompt templates from Hugging Face!
- The paper introduces rStar-Math, which claims to rival OpenAI o1's math reasoning capabilities by integrating Monte Carlo Tree Search (MCTS) with step-by-step verified reasoning trajectories.
- A Process Preference Model (PPM) enables fine-grained evaluation of intermediate steps, improving training data quality.
- The system underwent four rounds of self-evolution, progressively refining both the policy and reward models to tackle Olympiad-level math problems, without GPT-4-based data distillation.
- While we wait for the release of code and datasets, you can already download the prompts they used from the HF Hub!
Details and links:
- Prompt-templates docs: https://moritzlaurer.github.io/prompt_templates/
- Templates on the Hub: MoritzLaurer/rstar-math-prompts
- Prompt-templates collection: MoritzLaurer/prompt-templates-6776aa0b0b8a923957920bb4
- Paper: https://arxiv.org/pdf/2501.04519
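This sketch makes the PPM bullet concrete. As I understand the paper, the PPM is trained on preference pairs of reasoning steps rather than on exact per-step score labels. Steps are ranked higher or lower using the MCTS statistics, and training uses a Bradley-Terry style pairwise ranking loss. The code is a generic version of that loss in plain Python with hypothetical scores. It is not code from the paper, whose code had not been released at the time of the post.

```python
import math

def neg_log_sigmoid(margin):
    """-log(sigmoid(margin)), computed without overflow for large |margin|."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def pairwise_ranking_loss(pos_scores, neg_scores):
    """Bradley-Terry style loss: push every preferred step's score above every rejected one."""
    losses = [neg_log_sigmoid(p - n) for p in pos_scores for n in neg_scores]
    return sum(losses) / len(losses)

# Hypothetical PPM scores for candidate steps at the same point in a solution.
print(pairwise_ranking_loss([2.0, 1.5], [-1.0, 0.0]))  # small: preferences already respected
print(pairwise_ranking_loss([-1.0, 0.0], [2.0, 1.5]))  # large: preferences violated
```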
Reacted to fdaudens's post with ❤️ 3 days ago:
@meg, one of the best researchers in AI ethics, makes a critical point about autonomy: fully autonomous systems carry unknowable risks because they operate on computer logic rather than human logic.
The solution? Build systems that support & assist rather than override human decisions.
I highly recommend reading the blog post by Meg, @evijit, @sasha and @giadap. They define different levels of agent autonomy and provide a values-based analysis of the risks, benefits, and uses of AI agents to help you make better decisions.
We're launching a FREE and CERTIFIED course on Agents!
We're thrilled to announce the launch of the Hugging Face Agents course on Learn! This interactive, certified course will guide you through building and deploying your own AI agents.
Here's what you'll learn:
- Understanding Agents: We'll break down the fundamentals of AI agents, showing you how they use LLMs to perceive their environment (observations), reason about it (thoughts), and take actions. Think of a smart assistant that can book appointments, answer emails, or even write code based on your instructions.
- Building with Frameworks: You'll dive into popular agent frameworks like LangChain, LlamaIndex and smolagents. These tools provide the building blocks for creating complex agent behaviors.
- Real-World Applications: See how agents are used in practice, from automating SQL queries to generating code and summarizing complex documents.
- Certification: Earn a certification by completing the course modules, implementing a use case, and passing a benchmark assessment. This proves your skills in building and deploying AI agents.
Audience
This course is designed for anyone interested in the future of AI. Whether you're a developer, data scientist, or simply curious about AI, this course will equip you with the knowledge and skills to build your own intelligent agents.
Enroll today and start building the next generation of AI agent applications!
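The observe, think, act cycle from the course outline can be sketched as a plain loop. This sketch is framework-free. It is not the course's code or an API of LangChain, LlamaIndex, or smolagents. The LLM is replaced by a hypothetical scripted stub (`scripted_llm`), and the single `add` tool is made up, so the example runs offline.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool: parses "a+b+..." and returns the sum as a string.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}

def scripted_llm(history: List[str]) -> Tuple[str, str, str]:
    """Stand-in for a model call; returns (thought, action, argument).

    A real agent would prompt an LLM with the history. This stub calls the
    'add' tool once, then answers with the most recent observation.
    """
    observations = [h for h in history if h.startswith("Observation: ")]
    if not observations:
        return ("I need to compute the sum.", "add", "2+3")
    return ("I have the result.", "final_answer", observations[-1][len("Observation: "):])

def run_agent(task: str, llm=scripted_llm, max_steps: int = 5) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        thought, action, arg = llm(history)            # reason over everything seen so far
        history.append(f"Thought: {thought}")
        if action == "final_answer":
            return arg
        observation = TOOLS[action](arg)               # act in the environment...
        history.append(f"Action: {action}({arg})")
        history.append(f"Observation: {observation}")  # ...and observe the result
    raise RuntimeError("agent did not finish within max_steps")

print(run_agent("What is 2 + 3?"))  # 5
```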
Today I'm introducing a method to train static embedding models that run 100x to 400x faster on CPU than common embedding models, while retaining 85%+ of the quality! This includes 2 fully open models, plus the training scripts, datasets, and metrics.
We apply our recipe to train 2 Static Embedding models that we release today! We release:
- an English Retrieval model and a general-purpose Multilingual similarity model (e.g. classification, clustering, etc.), both Apache 2.0
- my modern training strategy: ideation -> dataset choice -> implementation -> evaluation
- my training scripts, using the Sentence Transformers library
- my Weights & Biases reports with losses & metrics
- my list of 30 training and 13 evaluation datasets
The 2 Static Embedding models have the following properties:
- Extremely fast, e.g. 107,500 sentences per second on a consumer CPU, compared to 270 for 'all-mpnet-base-v2' and 56 for 'gte-large-en-v1.5'
- Zero active parameters: no Transformer blocks, no attention, not even a matrix multiplication. Super speed!
- No maximum sequence length! Embed texts of any length (note: longer texts may embed worse)
- Linear instead of superlinear complexity: 2x longer text takes 2x longer, instead of 2.5x or more.
- Matryoshka support: allows you to truncate embeddings with minimal performance loss (e.g. 4x smaller with a 0.56% performance decrease on English Similarity tasks)
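These properties follow from the architecture: a static embedding model replaces the Transformer with a per-token table lookup followed by pooling. This is a minimal NumPy sketch under several assumptions: whitespace tokenization, a made-up vocabulary, a random table, and mean pooling. The released models use a real tokenizer and a learned table, and I have not checked that their pooling is identical. Matryoshka-style truncation is done by keeping the leading dimensions and re-normalizing.

```python
from typing import Optional

import numpy as np

# Hypothetical whitespace "tokenizer" vocabulary and a random table; the released
# models use a real tokenizer and a learned embedding table.
VOCAB = {"[UNK]": 0, "fast": 1, "static": 2, "embedding": 3, "models": 4, "cpu": 5}
DIM = 8
TABLE = np.random.default_rng(0).standard_normal((len(VOCAB), DIM)).astype(np.float32)

def embed(text: str, dim: Optional[int] = None) -> np.ndarray:
    ids = [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in text.lower().split()] or [VOCAB["[UNK]"]]
    vec = TABLE[ids].mean(axis=0)     # table lookup + mean pooling: one pass over the tokens
    if dim is not None:
        vec = vec[:dim]               # Matryoshka-style truncation: keep the leading dimensions
    return vec / np.linalg.norm(vec)  # unit length, so a dot product is cosine similarity

a = embed("fast static embedding models")
b = embed("static models on cpu")
print(round(float(a @ b), 3))                    # cosine similarity at full size
print(embed("fast static models", dim=4).shape)  # (4,)
```

Each token costs one lookup, so runtime grows linearly with text length and there is no position limit. The trade-off of mean pooling is that word order is ignored.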
Check out the full blogpost if you'd like to 1) use these lightning-fast models or 2) learn how to train them with consumer-level hardware: https://huggingface.co/blog/static-embeddings
The blogpost contains a lengthy list of possible advancements; I'm very confident that our 2 models are only the tip of the iceberg, and we may be able to get even better performance.
Major update on the Talking to Chatbots dataset! Expanded the 'wrapped' dataset (one row per chat) to 2.86k records, and the 'unwrapped' version (one row per conversation turn) to 11k records. The main source is my ChatGPT archive with nearly 2 years of chats. It is still a work in progress as I incorporate chats from other sources and qualitative metrics (SCBN) for responses.
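The two layouts differ only in granularity: one row per chat versus one row per conversation turn. This minimal sketch derives the per-turn view from the per-chat one. The schema (`chat_id`, `turns`, `prompt`, `response`, `turn_index`) is hypothetical and may not match the dataset's actual columns.

```python
# Hypothetical schema: the dataset's real column names may differ.
wrapped = [
    {"chat_id": "c1", "turns": [
        {"prompt": "Hi", "response": "Hello!"},
        {"prompt": "What is 2+2?", "response": "4"},
    ]},
    {"chat_id": "c2", "turns": [
        {"prompt": "Define MoE", "response": "Mixture of Experts"},
    ]},
]

def unwrap(chats):
    """Flatten one-row-per-chat records into one row per conversation turn."""
    rows = []
    for chat in chats:
        for i, turn in enumerate(chat["turns"]):
            # Keep the chat id and turn position so the original chat can be rebuilt.
            rows.append({"chat_id": chat["chat_id"], "turn_index": i, **turn})
    return rows

unwrapped = unwrap(wrapped)
print(len(wrapped), "chats ->", len(unwrapped), "turns")  # 2 chats -> 3 turns
```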