Update 2026-05-18 (v1.0): Initial release

DepthVLM-4B

DepthVLM is a unified foundation model for both low-level dense geometry prediction and high-level multimodal understanding, with substantially faster inference than existing VLM-based approaches such as DepthLM and Youtu-VL.

By attaching a lightweight depth head to the LLM backbone and training under a unified vision-text supervision paradigm, DepthVLM transforms a single VLM into a native dense geometry predictor while preserving its multimodal capability.
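
As a rough illustration of how a head on top of visual tokens can yield a dense map, the sketch below reassembles per-token depth patches into a full-resolution grid. This is an assumed, simplified design: the function name `unpatchify`, the patch layout, and the idea that each token predicts a small patch of depths are hypothetical and are not taken from the paper or repository, which define the actual head.

```python
# Hypothetical sketch: DepthVLM's real depth head is defined in the official repository.
def unpatchify(token_patches, grid_h, grid_w, patch):
    """Assemble per-token depth patches into one dense depth map.

    token_patches[i] holds patch * patch depth values (row-major) for the
    i-th visual token; tokens are ordered row-major over a grid_h x grid_w grid.
    """
    if len(token_patches) != grid_h * grid_w:
        raise ValueError("expected one patch per visual token")
    depth = [[0.0] * (grid_w * patch) for _ in range(grid_h * patch)]
    for idx, values in enumerate(token_patches):
        row0 = (idx // grid_w) * patch  # top row of this token's patch
        col0 = (idx % grid_w) * patch   # left column of this token's patch
        for k, value in enumerate(values):
            depth[row0 + k // patch][col0 + k % patch] = value
    return depth


# Four tokens on a 2x2 grid, each carrying a 2x2 patch of depths in metres.
patches = [
    [1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3],
    [3.0, 3.1, 3.2, 3.3], [4.0, 4.1, 4.2, 4.3],
]
dense = unpatchify(patches, grid_h=2, grid_w=2, patch=2)
for row in dense:
    print(row)
```

In this toy layout, one pass over the tokens fills every pixel of the 4x4 map, which is the property the paragraph above attributes to a native dense predictor.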

Highlights

Native dense metric depth estimation in VLMs: Directly predicts geometry within the VLM framework.
Unified multimodal understanding and geometry prediction: Generates full-resolution depth maps alongside language outputs in a single forward pass.
Efficient Inference: Runs more efficiently than approaches that issue per-pixel queries or produce only coarse token-level outputs.
Versatile Application: Supports both indoor and outdoor metric depth estimation.
Improved 3D spatial reasoning: Moving toward a truly unified foundation model.
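
To put the efficiency highlight in concrete terms, the arithmetic below counts what each output strategy involves for a single image. The 640x480 resolution and the 16-pixel patch size are illustrative assumptions, not values reported for DepthVLM.

```python
# Illustrative arithmetic only: the resolution and patch size are assumed values.
def output_counts(height, width, patch):
    """Compare three ways of obtaining depth from a VLM for one image."""
    pixels = height * width
    tokens = (height // patch) * (width // patch)
    return {
        # Per-pixel querying: one model query for every depth value wanted.
        "per_pixel_queries": pixels,
        # Token-level output: one depth value per visual token (a coarse map).
        "coarse_token_values": tokens,
        # Dense head: all pixels are covered by a single forward pass.
        "dense_head_passes": 1,
    }


counts = output_counts(height=480, width=640, patch=16)
print(counts)
```

Under these assumptions, per-pixel querying needs hundreds of thousands of queries for one full-resolution map, while a token-level output is cheaper but far coarser than the image.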

Resources

Paper: Unlocking Dense Metric Depth Estimation in VLMs (arXiv:2605.15876)
Project Page: https://depthvlm.github.io/
Repository: https://github.com/hanxunyu/DepthVLM

Usage

Please refer to the official repository for detailed instructions on:

Data preprocessing
Training
Evaluation
Inference and visualization
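
For orientation before opening the repository, here is a minimal sketch of two metrics widely used for metric depth evaluation: absolute relative error (AbsRel) and threshold accuracy (the fraction of pixels whose prediction-to-truth ratio is below 1.25). The function name `depth_metrics` and the convention that a non-positive ground-truth value marks a missing pixel are assumptions for this example; the official evaluation scripts may use different metrics, masks, or depth ranges.

```python
def depth_metrics(pred, target, threshold=1.25):
    """Return (AbsRel, threshold accuracy) over pixels with valid ground truth.

    pred and target are flat sequences of depths in metres. A target value
    <= 0 is treated as "no ground truth" and skipped (an assumed convention).
    """
    abs_rel_sum, inliers, valid = 0.0, 0, 0
    for p, t in zip(pred, target):
        if t <= 0:
            continue
        p = max(p, 1e-6)  # guard against non-positive predictions
        abs_rel_sum += abs(p - t) / t
        inliers += max(p / t, t / p) < threshold
        valid += 1
    if valid == 0:
        raise ValueError("no valid ground-truth pixels")
    return abs_rel_sum / valid, inliers / valid


# Toy data: the last pixel has no ground truth and is ignored.
pred = [1.0, 2.2, 3.0, 3.0]
target = [1.0, 2.0, 5.0, 0.0]
abs_rel, delta1 = depth_metrics(pred, target)
print(f"AbsRel={abs_rel:.3f}  delta<1.25={delta1:.3f}")
```

Lower AbsRel and higher threshold accuracy indicate better predictions. Both are computed in metres here because the model predicts metric rather than relative depth.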

Citation

If you find this work useful, please cite:

@article{yu2026unlocking,
  title={Unlocking Dense Metric Depth Estimation in VLMs},
  author={Hanxun Yu and Xuan Qu and Yuxin Wang and Jianke Zhu and Lei Ke},
  journal={arXiv preprint arXiv:2605.15876},
  year={2026}
}

Model size: 5B params (Safetensors)
Tensor type: BF16

Model tree for JonnyYu828/DepthVLM-4B

Base model: Qwen/Qwen3-VL-4B-Instruct (this model is a fine-tune of it)
