SauerkrautLM-ColLFM2-450M-v0.1

VAGO Solutions Logo

πŸ† #1 Small Model (<1B) | Best-in-Class Efficiency

SauerkrautLM-ColLFM2-450M-v0.1 is the #1 small model for visual document retrieval, achieving 83.56 NDCG@5 on ViDoRe v1 - beating colSmol-500M (82.49) with 10% fewer parameters!

ViDoRe v1 Benchmark - 128-dim Models

🎯 Why Visual Document Retrieval?

Traditional OCR-based retrieval loses layout, tables, and visual context. Our visual approach:

  • βœ… No OCR errors - Direct visual understanding
  • βœ… Layout-aware - Understands tables, forms, charts
  • βœ… End-to-end - Single model, no pipeline complexity

πŸ† Key Achievements

Benchmark Score Rank (Small <1B)
ViDoRe v1 83.56 πŸ₯‡ #1
MTEB v1+v2 74.33 πŸ₯‡ #1
ViDoRe v3 43.32 πŸ₯‡ #1

Small Category Comparison (<1B, 128-dim)

Model Params Dim ViDoRe v1 MTEB v1+v2 ViDoRe v3
SauerkrautLM-ColLFM2-450M-v0.1 ⭐ 450M 128 83.56 74.33 43.32
colSmol-500M 500M 128 82.49 71.17 -
colSmol-256M 256M 128 79.74 66.90 20.73

#1 in ALL benchmarks for small models!

Detailed Benchmark Results

πŸ“Š ViDoRe v1 (NDCG@5) - Click to expand
Task Score
ArxivQA 76.11
DocVQA 59.11
InfoVQA 88.36
ShiftProject 73.14
SyntheticDocQA-AI 98.76
SyntheticDocQA-Energy 94.39
SyntheticDocQA-Gov 94.61
SyntheticDocQA-Health 97.32
TabFQuAD 80.91
TATDQA 72.88
Average 83.56
πŸ“Š MTEB v1+v2 (NDCG@5) - Click to expand

ViDoRe v1 Tasks:

Task Score
ArxivQA 76.11
DocVQA 59.11
InfoVQA 88.36
ShiftProject 73.14
SyntheticDocQA-AI 98.76
SyntheticDocQA-Energy 94.39
SyntheticDocQA-Gov 94.61
SyntheticDocQA-Health 97.32
TabFQuAD 80.91
TATDQA 72.88

ViDoRe v2 Tasks (Multilingual):

Task Score
ViDoRe-v2-2BioMed 51.00
ViDoRe-v2-2Econ 48.35
ViDoRe-v2-2ESG-HL 54.87
ViDoRe-v2-2ESG 50.80
Combined Average 74.33
πŸ“Š ViDoRe v3 (NDCG@10) - Click to expand
Task Score
ViDoRe-v3-CS 58.08
ViDoRe-v3-Energy 47.92
ViDoRe-v3-FinanceEn 47.72
ViDoRe-v3-FinanceFr 33.00
ViDoRe-v3-HR 43.37
ViDoRe-v3-Industry 30.21
ViDoRe-v3-Pharma 51.42
ViDoRe-v3-Physics 34.83
Average 43.32

Efficiency Comparison

Metric ColLFM2-450M colSmol-500M Advantage
Parameters 450M 500M -10%
ViDoRe v1 83.56 82.49 +1.07
MTEB v1+v2 74.33 71.17 +3.16

πŸ“‹ Summary Tables

128-dim Models Comparison

128-dim Models Summary

Comparison vs High-dim Models

High-dim Comparison

✨ Key Features

  • πŸ† Best Small Model: #1 in ALL benchmarks for <1B models
  • ⚑ Ultra Efficient: Only 450M parameters, ~0.9GB VRAM
  • πŸŽ“ Curriculum Learning: Trained with progressive difficulty
  • πŸ”€ Hierarchical Merge: Advanced model merging for optimal performance
  • πŸ“ Native 512x512: Optimized for document resolution
  • 🌍 Multilingual: 6 languages (EN, DE, FR, ES, IT, PT)

Model Details

Property Value
Base Model LiquidAI/LFM2-VL-450M
Parameters 450M
Embedding Dimension 128
VRAM (bfloat16) ~0.9 GB
Max Context Length 32,768 tokens
Image Resolution 512Γ—512 native
Image Tokens 64-256 (dynamic)
Vision Encoder SigLIP2 (86M)
License LFM 1.0

πŸŽ“ Advanced Training Methodology

1. Curriculum Learning

Unlike standard training, ColLFM2 was trained with curriculum learning:

Stage 1: Easy examples (high-quality, clear documents)
    ↓
Stage 2: Medium examples (mixed quality)
    ↓
Stage 3: Hard examples (complex layouts, noisy scans)
    ↓
Stage 4: Full mixture with hard negatives

2. Hierarchical Model Merging

Base LFM2-VL-450M
        ↓
    β”Œβ”€β”€β”€β”΄β”€β”€β”€β”
    ↓       ↓
mMARCO   Retrieval
Specialist  Model
    ↓       ↓
    β””β”€β”€β”€β”¬β”€β”€β”€β”˜
        ↓
  Hierarchical Merge
        ↓
   Final Model
  • mMARCO Specialist: Sub-model trained on mMARCO for retrieval fundamentals
  • Retrieval Model: Trained on document retrieval datasets
  • Hierarchical Merge: Combined using learned merge weights

Hardware & Configuration

Setting Value
GPUs 4x NVIDIA RTX 6000 Ada (48GB)
Effective Batch Size 256
Precision bfloat16
Curriculum Stages 4

Datasets

Dataset Description
vidore/colpali_train_set ColPali training data
openbmb/VisRAG-Ret-Train-In-domain-data Visual RAG training data
llamaindex/vdr-multilingual-train Multilingual retrieval (with curriculum)
unicamp-dl/mmarco mMARCO for specialist model
VAGO Multilingual Datasets Proprietary multilingual data

Installation & Usage

⚠️ Important: Install our package first before loading the model:

pip install git+https://github.com/VAGOsolutions/sauerkrautlm-colpali
import torch
from PIL import Image
from sauerkrautlm_colpali.models import ColLFM2, ColLFM2Processor

model_name = "VAGOsolutions/SauerkrautLM-ColLFM2-450M-v0.1"

model = ColLFM2.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
).eval()

processor = ColLFM2Processor.from_pretrained(model_name)

images = [Image.open("document.png")]
queries = ["What is the main topic?"]

batch_images = processor.process_images(images).to(model.device)
batch_queries = processor.process_queries(queries).to(model.device)

with torch.no_grad():
    image_embeddings = model(**batch_images)
    query_embeddings = model(**batch_queries)

scores = processor.score(query_embeddings, image_embeddings)

Use Cases

βœ… Perfect for:

  • Edge deployment (Raspberry Pi, Jetson)
  • Mobile applications
  • High-throughput batch processing
  • Cost-sensitive deployments
  • Real-time retrieval systems

πŸ“Š Additional Benchmark Visualizations

MTEB v1+v2 Benchmark (128-dim Models)

MTEB v1+v2 Benchmark - 128-dim Models

ViDoRe v3 Benchmark (128-dim Models)

ViDoRe v3 Benchmark - 128-dim Models

Our Models vs High-dim Models

ViDoRe v1 - Our Models vs High-dim

License

This model is licensed under the LFM 1.0 License from LiquidAI. Please review the full license before commercial use.

Citation

@misc{sauerkrautlm-colpali-2025,
  title={SauerkrautLM-ColPali: Multi-Vector Vision Retrieval Models},
  author={David Golchinfar},
  organization={VAGO Solutions},
  year={2025},
  url={https://github.com/VAGOsolutions/sauerkrautlm-colpali}
}

Contact

Downloads last month
102
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for VAGOsolutions/SauerkrautLM-ColLFM2-450M-v0.1

Finetuned
(14)
this model

Datasets used to train VAGOsolutions/SauerkrautLM-ColLFM2-450M-v0.1

Space using VAGOsolutions/SauerkrautLM-ColLFM2-450M-v0.1 1

Collection including VAGOsolutions/SauerkrautLM-ColLFM2-450M-v0.1