VISReg: Variance-Invariance-Sketching Regularization for JEPA training

Key results:

  • 💪 Strong collapse prevention: Produces high gradients when embeddings collapse
  • Scale-friendly training: Complexity is linear in the scaling factors
  • 🧩 Easy to train: Like LeJEPA, it is a heuristic-free method
  • 🏆 Best OOD performance: Achieves the best accuracy on 6 OOD datasets
  • 📉 Data efficiency: Achieves a similar average accuracy to DINOv2 with 90% less data
  • 🧬 Robust to low-quality datasets: Holds up on long-tailed and sparse datasets

Available Checkpoints

| File | Architecture | Patch Size | Embed Dim | Params | Pre-training Data |
|---|---|---|---|---|---|
| visreg-vit-b-inet1k.pth | ViT-Base | 16 | 768 | 86M | ImageNet-1K |
| visreg-vit-l-inet1k.pth | ViT-Large | 14 | 1024 | 304M | ImageNet-1K |

Usage

Load with timm

import timm
import torch

# ViT-Base/16
model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=0, dynamic_img_size=True)
state_dict = torch.load("visreg-vit-b-inet1k.pth", map_location="cpu")
model.load_state_dict(state_dict)

# ViT-Large/14
model = timm.create_model("vit_large_patch14_224", pretrained=False, num_classes=0, dynamic_img_size=True)
state_dict = torch.load("visreg-vit-l-inet1k.pth", map_location="cpu")
model.load_state_dict(state_dict)

Download with huggingface_hub

from huggingface_hub import hf_hub_download

# ViT-Base/16
path = hf_hub_download(repo_id="BooBooWu/visreg", filename="visreg-vit-b-inet1k.pth")

# ViT-Large/14
path = hf_hub_download(repo_id="BooBooWu/visreg", filename="visreg-vit-l-inet1k.pth")
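
The download and loading steps can be combined into one helper. The following is a minimal sketch, not part of the released code: `load_visreg`, the `CHECKPOINTS` table, and the `vit_l` key are hypothetical names introduced here, while the file names, repo id, and timm model names are taken from the sections above.

```python
# Lookup from a short variant name to (checkpoint file, timm model name).
# The keys are illustrative; the values mirror the checkpoint table and
# the "Load with timm" snippet.
CHECKPOINTS = {
    "vit_b": ("visreg-vit-b-inet1k.pth", "vit_base_patch16_224"),
    "vit_l": ("visreg-vit-l-inet1k.pth", "vit_large_patch14_224"),
}


def load_visreg(variant: str = "vit_b", repo_id: str = "BooBooWu/visreg"):
    """Download a checkpoint from the Hub and load it into a timm ViT.

    Hypothetical convenience wrapper around the two snippets above.
    """
    # Imported lazily so the lookup table can be inspected without
    # pulling in the heavy dependencies.
    import timm
    import torch
    from huggingface_hub import hf_hub_download

    filename, timm_name = CHECKPOINTS[variant]
    path = hf_hub_download(repo_id=repo_id, filename=filename)
    model = timm.create_model(
        timm_name, pretrained=False, num_classes=0, dynamic_img_size=True
    )
    state_dict = torch.load(path, map_location="cpu")
    model.load_state_dict(state_dict)
    return model.eval()  # switch to inference mode before returning
```

Calling `load_visreg("vit_l")` would then fetch the ViT-Large/14 checkpoint and return the model ready for inference.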

Feature extraction

from PIL import Image
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

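# `model` and `torch` come from the "Load with timm" snippet above.
# Convert to RGB so grayscale or RGBA files match the 3-channel normalization.
img = transform(Image.open("image.jpg").convert("RGB")).unsqueeze(0)  # [1, 3, 224, 224]

model.eval()  # inference mode
with torch.no_grad():
    features = model(img)  # [1, embed_dim]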
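
One common use of such embeddings is comparing images by cosine similarity. The sketch below is illustrative only: `cosine_similarity_matrix` is a hypothetical helper, and random tensors stand in for real `model(img)` outputs so that the block runs without a checkpoint.

```python
import torch
import torch.nn.functional as F


def cosine_similarity_matrix(features: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine similarity for an [N, D] batch of embeddings."""
    normed = F.normalize(features, dim=-1)  # unit L2 norm per row
    return normed @ normed.T  # [N, N]


# Placeholder embeddings; in practice, stack the outputs of model(img)
# for several images, e.g. torch.cat([feat_a, feat_b, ...], dim=0).
torch.manual_seed(0)
features = torch.randn(4, 768)  # 768 matches the ViT-Base embed dim
sim = cosine_similarity_matrix(features)  # sim[i, j] compares image i with image j
```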

Evaluation

The full evaluation suite (linear probe, segmentation, fine-tuning) is available in the GitHub repo. For example:

# Linear probe on 10+ datasets
python downstream/linear_prob/run_evaluation.py \
    --checkpoint visreg-vit-b-inet1k.pth \
    --model vit_b \
    --datasets all
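
For readers unfamiliar with the term, a linear probe trains a single linear classifier on frozen backbone features. The sketch below illustrates the idea only and is not the repo's evaluation script: `train_linear_probe`, its hyperparameters, and the random placeholder features and labels are all assumptions made for this example.

```python
import torch
from torch import nn


def train_linear_probe(features, labels, num_classes, epochs=50, lr=0.01):
    """Fit one linear layer on precomputed (frozen) features."""
    probe = nn.Linear(features.shape[1], num_classes)
    optimizer = torch.optim.SGD(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(probe(features), labels)
        loss.backward()  # gradients reach only the probe; the backbone is never touched
        optimizer.step()
    return probe


# Placeholders standing in for backbone outputs and dataset labels.
torch.manual_seed(0)
features = torch.randn(64, 768)       # [num_samples, embed_dim]
labels = torch.randint(0, 10, (64,))  # 10 hypothetical classes
probe = train_linear_probe(features, labels, num_classes=10)
logits = probe(features)              # [num_samples, num_classes]
```

In a real run, the features would be extracted once with the frozen model under `torch.no_grad()` and accuracy would be reported on a held-out split.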

Citation

@inproceedings{wu2026visreg,
  title     = {VISReg: Variance-Invariance-Sketching Regularization for JEPA training},
  author    = {Wu, Haiyu and Balestriero, Randall and Levine, Morgan},
  booktitle = {arXiv},
  year      = {2026}
}

License

This project (code and pretrained weights) is released under CC BY-NC 4.0 for non-commercial use only.
