metadata

license: mit
tags:
  - vision
pipeline_tag: depth-estimation

ZoeDepth (fine-tuned on KITTI)

ZoeDepth model fine-tuned on the KITTI dataset. It was introduced in the paper ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth by Bhat et al. and first released in this repository.

ZoeDepth extends the DPT framework for metric (also called absolute) depth estimation, obtaining state-of-the-art results.

Disclaimer: The team releasing ZoeDepth did not write a model card for this model, so this model card was written by the Hugging Face team.

Model description

ZoeDepth adapts DPT, a model for relative depth estimation, to metric (also called absolute) depth estimation. A relative depth model only orders points from near to far, whereas ZoeDepth estimates depth in actual metric units.
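
The difference between relative and metric depth can be illustrated with a toy NumPy sketch. It assumes, purely for simplicity, that a relative prediction is the true depth under an unknown scale and shift (some relative-depth models predict inverse depth instead, which this toy ignores). The helper `align_scale_shift` and all numbers are hypothetical and are not part of ZoeDepth or of any library.

```python
import numpy as np


def align_scale_shift(relative, metric):
    """Least-squares fit of scale s and shift t so that s * relative + t ~ metric."""
    # Design matrix with one column for the scale and one for the shift.
    A = np.stack([relative.ravel(), np.ones(relative.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, metric.ravel(), rcond=None)
    return s, t


# Toy "ground truth" depth in meters (made-up values for illustration).
metric = np.array([[2.0, 4.0], [8.0, 16.0]])

# A relative prediction preserves the ordering of depths but not their units:
# here it is the metric map under an arbitrary affine transform.
relative = 0.05 * metric + 0.1

# Without reference measurements, s and t are unknown; with them, they can be fitted.
s, t = align_scale_shift(relative, metric)
recovered = s * relative + t
```

A relative model leaves `s` and `t` undetermined, so its output needs reference measurements before it can be read in meters. A metric model such as ZoeDepth aims to predict values on the metric scale directly.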

ZoeDepth architecture. Taken from the original paper.

Intended uses & limitations

You can use the raw model for tasks like zero-shot monocular depth estimation. See the model hub for other versions fine-tuned on a task that interests you.

How to use

The easiest way to run the model is the pipeline API, which handles preprocessing and postprocessing for you:

from transformers import pipeline
from PIL import Image
import requests

# load pipe
depth_estimator = pipeline(task="depth-estimation", model="Intel/zoedepth-kitti")

# load image
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# inference; the pipeline returns a dict, so index it by key
outputs = depth_estimator(image)
depth = outputs["depth"]  # depth map rendered as a PIL image
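
As far as we understand the depth-estimation pipeline, its output contains a rendered image for viewing and, separately, the raw predicted depth tensor under the `predicted_depth` key. A rendered image is typically rescaled to 8-bit gray levels, which discards the metric scale. The sketch below shows one way such a preview could be produced from a raw depth array. The helper `depth_to_image` and the stand-in array are hypothetical and do not claim to reproduce the pipeline's exact postprocessing.

```python
import numpy as np
from PIL import Image


def depth_to_image(depth_m):
    """Min-max normalize a depth map (assumed to be in meters) to an 8-bit grayscale image."""
    depth_m = np.asarray(depth_m, dtype=np.float32)
    d_min, d_max = float(depth_m.min()), float(depth_m.max())
    span = max(d_max - d_min, 1e-8)  # guard against division by zero on flat maps
    scaled = (depth_m - d_min) / span  # values now lie in [0, 1]
    return Image.fromarray((scaled * 255.0).astype(np.uint8))


# Stand-in depth map; in practice this would be the model's raw depth output
# converted to a NumPy array.
fake_depth = np.linspace(1.0, 80.0, num=12, dtype=np.float32).reshape(3, 4)
preview = depth_to_image(fake_depth)
```

The preview is convenient for inspection, but any computation that needs distances in metric units should start from the raw depth values, not from the normalized image.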

For more code examples, see the documentation.

BibTeX entry and citation info

@misc{bhat2023zoedepth,
      title={ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth}, 
      author={Shariq Farooq Bhat and Reiner Birkl and Diana Wofk and Peter Wonka and Matthias Müller},
      year={2023},
      eprint={2302.12288},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}