	<div align="center">
	<h1>Depth Anything V2</h1>

	[Lihe Yang](https://liheyoung.github.io/)<sup>1</sup> · [Bingyi Kang](https://bingykang.github.io/)<sup>2&dagger;</sup> · [Zilong Huang](http://speedinghzl.github.io/)<sup>2</sup>
	<br>
	[Zhen Zhao](http://zhaozhen.me/) · [Xiaogang Xu](https://xiaogang00.github.io/) · [Jiashi Feng](https://sites.google.com/site/jshfeng/)<sup>2</sup> · [Hengshuang Zhao](https://hszhao.github.io/)<sup>1*</sup>

	<sup>1</sup>HKU&emsp;&emsp;&emsp;<sup>2</sup>TikTok
	<br>
	&dagger;project lead&emsp;*corresponding author

	<a href="https://arxiv.org/abs/2406.09414"><img src='https://img.shields.io/badge/arXiv-Depth Anything V2-red' alt='Paper PDF'></a>
	<a href='https://depth-anything-v2.github.io'><img src='https://img.shields.io/badge/Project_Page-Depth Anything V2-green' alt='Project Page'></a>
	<a href='https://huggingface.co/spaces/depth-anything/Depth-Anything-V2'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
	<a href='https://huggingface.co/datasets/depth-anything/DA-2K'><img src='https://img.shields.io/badge/Benchmark-DA--2K-yellow' alt='Benchmark'></a>
	</div>

	This work presents Depth Anything V2. It significantly outperforms [V1](https://github.com/LiheYoung/Depth-Anything) in fine-grained details and robustness. Compared with SD-based models, it enjoys faster inference speed, fewer parameters, and higher depth accuracy.

	![teaser](assets/teaser.png)

	## News

	- 2024-06-14: Paper, project page, code, models, demo, and benchmark are all released.


	## Pre-trained Models

	We provide four models of varying scales for robust relative depth estimation:

	| Model | Params | Checkpoint |
	|:-|-:|:-:|
	| Depth-Anything-V2-Small | 24.8M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Small/resolve/main/depth_anything_v2_vits.pth?download=true) |
	| Depth-Anything-V2-Base | 97.5M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Base/resolve/main/depth_anything_v2_vitb.pth?download=true) |
	| Depth-Anything-V2-Large | 335.3M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth?download=true) |
	| Depth-Anything-V2-Giant | 1.3B | Coming soon |
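The released scales in the table correspond to the encoder names `vits`, `vitb`, and `vitl`. As a rough sketch, the mapping below ties each released encoder to its checkpoint filename (taken from the download links above) and to constructor arguments. Only the `vitl` arguments appear in this README; the `vits` and `vitb` values are recalled from the repository and should be verified against `run.py`. `MODEL_CONFIGS` and `checkpoint_path` are hypothetical names, and `vitg` is left out because its checkpoint is not yet released.

```python
import os

# Constructor arguments per encoder. Only the 'vitl' entry is shown in this
# README; 'vits' and 'vitb' are assumptions recalled from the repository.
MODEL_CONFIGS = {
    "vits": {"encoder": "vits", "features": 64, "out_channels": [48, 96, 192, 384]},
    "vitb": {"encoder": "vitb", "features": 128, "out_channels": [96, 192, 384, 768]},
    "vitl": {"encoder": "vitl", "features": 256, "out_channels": [256, 512, 1024, 1024]},
}


def checkpoint_path(encoder, root="checkpoints"):
    """Return the local checkpoint path for a released encoder (hypothetical helper)."""
    if encoder not in MODEL_CONFIGS:
        raise ValueError(f"no released checkpoint for encoder {encoder!r}")
    return os.path.join(root, f"depth_anything_v2_{encoder}.pth")
```

With this in place, a model of another scale could be built as `DepthAnythingV2(**MODEL_CONFIGS["vitb"])` and loaded from `checkpoint_path("vitb")`, assuming the checkpoint has been downloaded into `checkpoints/`.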


	### Code snippet to use our models
	```python
	import cv2
	import torch

	from depth_anything_v2.dpt import DepthAnythingV2

	# take depth-anything-v2-large as an example
	model = DepthAnythingV2(encoder='vitl', features=256, out_channels=[256, 512, 1024, 1024])
	model.load_state_dict(torch.load('checkpoints/depth_anything_v2_vitl.pth', map_location='cpu'))
	model.eval()

	raw_img = cv2.imread('your/image/path')
	depth = model.infer_image(raw_img) # HxW raw depth map
	```
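The raw depth map is a relative prediction, so it usually needs rescaling before it can be saved or viewed as an image. The sketch below shows one common approach, min-max scaling to 8-bit. It assumes `infer_image` returns a floating-point NumPy array of shape HxW; `depth_to_uint8` is a hypothetical helper, not part of the library.

```python
import numpy as np


def depth_to_uint8(depth):
    """Min-max scale a float HxW depth map to uint8 in [0, 255]."""
    depth = np.asarray(depth, dtype=np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-8:
        # Constant map: avoid division by zero and return all zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo) * 255.0
    return scaled.astype(np.uint8)
```

The result can then be written with `cv2.imwrite('depth.png', depth_to_uint8(depth))`. Because the scaling is per image, values are not comparable across images.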

	## Usage

	### Installation

	```bash
	git clone https://github.com/DepthAnything/Depth-Anything-V2
	cd Depth-Anything-V2
	pip install -r requirements.txt
	```

	### Running

	```bash
	python run.py --encoder <vits | vitb | vitl | vitg> --img-path <path> --outdir <outdir> [--input-size <size>] [--pred-only] [--grayscale]
	```
	Options:
	- `--img-path`: Point it to 1) a directory containing all images of interest, 2) a single image, or 3) a text file listing all image paths.
	- `--input-size` (optional): By default, we use input size `518` for model inference. You can increase the size for even more fine-grained results.
	- `--pred-only` (optional): Only save the predicted depth map, without the raw image.
	- `--grayscale` (optional): Save the grayscale depth map, without applying a color palette.
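The three forms accepted by `--img-path` can be resolved into a flat list of files along these lines. This is a minimal sketch using only the standard library, not the script's actual code. `resolve_image_paths` is a hypothetical name, and the sketch assumes a text file holds one path per line.

```python
import glob
import os


def resolve_image_paths(img_path):
    """Expand an --img-path style argument into a list of file paths."""
    if os.path.isfile(img_path):
        if img_path.endswith(".txt"):
            # Text file: one image path per line, blank lines skipped.
            with open(img_path, "r") as f:
                return [line.strip() for line in f if line.strip()]
        # Single image file.
        return [img_path]
    # Directory: collect every file beneath it, recursively.
    pattern = os.path.join(img_path, "**", "*")
    return sorted(p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p))
```

The sketch does not filter by file extension, so a directory containing non-image files would need an extra check before the paths are passed to `cv2.imread`.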

	For example:
	```bash
	python run.py --encoder vitl --img-path assets/examples --outdir depth_vis
	```

	If you want to use Depth Anything V2 on videos:

	```bash
	python run_video.py --encoder vitl --video-path assets/examples_video --outdir video_depth_vis
	```

	Note that our larger models have better temporal consistency on videos.


	### Gradio demo

	To use our gradio demo locally:

	```bash
	python app.py
	```

	You can also try our [online demo](https://huggingface.co/spaces/Depth-Anything/Depth-Anything-V2).

	Note: Compared to V1, we have made a minor modification to the DINOv2-DPT architecture (originating from this [issue](https://github.com/LiheYoung/Depth-Anything/issues/81)). In V1, we unintentionally used features from the last four layers of DINOv2 for decoding. In V2, we use [intermediate features](https://github.com/DepthAnything/Depth-Anything-V2/blob/2cbc36a8ce2cec41d38ee51153f112e87c8e42d8/depth_anything_v2/dpt.py#L164-L169) instead. Although this modification did not improve details or accuracy, we decided to follow this common practice.
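To make the difference concrete, the sketch below contrasts the two layer-selection schemes by block index. The block counts are the standard ViT depths. The intermediate indices are recalled from the linked `dpt.py` lines and should be treated as assumptions to verify against the source. All names here are hypothetical.

```python
# Transformer block counts for the DINOv2 encoders (standard ViT depths).
ENCODER_DEPTH = {"vits": 12, "vitb": 12, "vitl": 24}

# Intermediate block indices, as recalled from the linked dpt.py lines.
# Treat these as assumptions and verify them against the source.
INTERMEDIATE_LAYER_IDX = {
    "vits": [2, 5, 8, 11],
    "vitb": [2, 5, 8, 11],
    "vitl": [4, 11, 17, 23],
}


def last_n_layers(encoder, n=4):
    """V1-style selection: indices of the final n transformer blocks."""
    depth = ENCODER_DEPTH[encoder]
    return list(range(depth - n, depth))


def intermediate_layers(encoder):
    """V2-style selection: blocks spread across the encoder's depth."""
    return list(INTERMEDIATE_LAYER_IDX[encoder])
```

For `vitl`, the V1-style selection yields blocks 20 to 23, all near the output, while the V2-style selection draws features from shallow, middle, and deep blocks, which is the usual DPT arrangement.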



	## Fine-tuned to Metric Depth Estimation

	Please refer to [metric depth estimation](./metric_depth).


	## DA-2K Evaluation Benchmark

	Please refer to [DA-2K benchmark](./DA-2K.md).

	## LICENSE

	The Depth-Anything-V2-Small model is released under the Apache-2.0 license. The Depth-Anything-V2-Base, Large, and Giant models are released under the CC-BY-NC-4.0 license.


	## Citation

	If you find this project useful, please consider citing:

	```bibtex
	@article{depth_anything_v2,
	title={Depth Anything V2},
	author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
	journal={arXiv:2406.09414},
	year={2024}
	}

	@inproceedings{depth_anything_v1,
	title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
	author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
	booktitle={CVPR},
	year={2024}
	}
	```