# DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
![](https://depthcrafter.github.io/img/logo.png)
Wenbo Hu¹*†,
Xiangjun Gao²*,
Xiaoyu Li¹*†,
Sijie Zhao¹,
Xiaodong Cun¹,
Yong Zhang¹,
Long Quan²,
Ying Shan³,¹

¹Tencent AI Lab
²The Hong Kong University of Science and Technology
³ARC Lab, Tencent PCG
arXiv preprint, 2024
## 🔆 Introduction
- [24-9-28] Add full dataset inference and evaluation scripts for easier comparison. :-)
- [24-9-25] 🤗🤗🤗 Add Hugging Face online demo DepthCrafter.
- [24-9-19] Add scripts for preparing benchmark datasets.
- [24-9-18] Add point cloud sequence visualization.
- [24-9-14] 🔥🔥🔥 DepthCrafter is now released. Have fun!
🤗 DepthCrafter can generate temporally consistent long depth sequences with fine-grained details for open-world videos, without requiring additional information such as camera poses or optical flow.
## 🎥 Visualization
We provide demos of unprojected point cloud sequences, together with the reference RGB and estimated depth videos. Please refer to our project page for more details.
https://github.com/user-attachments/assets/62141cc8-04d0-458f-9558-fe50bc04cc21
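The point cloud demos are produced by unprojecting each estimated depth frame into 3D. As a reference for how such an unprojection works, here is a minimal numpy sketch under a pinhole camera model; the intrinsics `fx, fy, cx, cy` are hypothetical stand-ins, since DepthCrafter itself does not require camera parameters:

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy):
    """Lift an HxW depth map to an (H*W, 3) point cloud (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Example with a dummy 576x1024 depth frame and guessed intrinsics.
points = unproject_depth(
    np.ones((576, 1024), dtype=np.float32),
    fx=1000.0, fy=1000.0, cx=512.0, cy=288.0,
)
print(points.shape)  # (589824, 3)
```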
## 🚀 Quick Start
### 🤗 Gradio Demo
- Online demo: DepthCrafter
- Local demo:
  ```bash
  gradio app.py
  ```
### 🌟 Community Support
- NukeDepthCrafter: a plugin that lets you generate temporally consistent depth sequences inside Nuke, a compositing tool widely used in the VFX industry.
### 🛠️ Installation
- Clone this repo:
  ```bash
  git clone https://github.com/Tencent/DepthCrafter.git
  ```
- Install dependencies (please refer to requirements.txt):
  ```bash
  pip install -r requirements.txt
  ```
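Before running inference, a quick environment sanity check can save debugging time later; this snippet is generic and not part of the repo:

```python
import torch

# Confirm that PyTorch was installed with CUDA support before GPU inference.
print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```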
### 🤗 Model Zoo
DepthCrafter is available on the Hugging Face Model Hub.
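If you want to pre-download the weights (e.g., on a cluster node without interactive network access), `huggingface_hub` can fetch them into the local cache. A minimal sketch, assuming the repo id is `tencent/DepthCrafter`; please verify the exact id on the Model Hub:

```python
from huggingface_hub import snapshot_download

# Cache the model weights locally; the repo id below is an assumption.
local_dir = snapshot_download(repo_id="tencent/DepthCrafter")
print("weights cached at:", local_dir)
```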
### 🏃‍♂️ Inference
1. High-resolution inference requires a GPU with ~26GB of memory at 1024x576 resolution.

   Full inference (~0.6 fps on A100, recommended for high-quality results):

   ```bash
   python run.py --video-path examples/example_01.mp4
   ```

   Fast inference with 4-step denoising and without classifier-free guidance (~2.3 fps on A100):

   ```bash
   python run.py --video-path examples/example_01.mp4 --num-inference-steps 4 --guidance-scale 1.0
   ```

2. Low-resolution inference requires a GPU with ~9GB of memory at 512x256 resolution.

   Full inference (~2.3 fps on A100):

   ```bash
   python run.py --video-path examples/example_01.mp4 --max-res 512
   ```

   Fast inference with 4-step denoising and without classifier-free guidance (~9.4 fps on A100):

   ```bash
   python run.py --video-path examples/example_01.mp4 --max-res 512 --num-inference-steps 4 --guidance-scale 1.0
   ```
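To process a whole folder of clips with the same settings, a thin wrapper around `run.py` is enough. A sketch using only the flags documented above; the `examples/` folder is the repo's, but adjust the glob to your own data:

```python
import subprocess
from pathlib import Path

# Run low-resolution fast inference on every mp4 in a folder.
for video in sorted(Path("examples").glob("*.mp4")):
    subprocess.run(
        ["python", "run.py",
         "--video-path", str(video),
         "--max-res", "512",            # low-resolution mode (~9GB VRAM)
         "--num-inference-steps", "4",  # fast 4-step denoising
         "--guidance-scale", "1.0"],    # disable classifier-free guidance
        check=True,
    )
```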
### 🚀 Dataset Evaluation

Please check the `benchmark` folder.

- To create the datasets we use in the paper, run `dataset_extract/dataset_extract_${dataset_name}.py`.
- This produces the `csv` files that record the relative paths of the extracted RGB videos and depth `npz` files. We also provide these `csv` files.
- Inference for all datasets:
  ```bash
  bash benchmark/infer/infer.sh
  ```
  (Remember to replace `input_rgb_root` and `saved_root` with your own paths.)
- Evaluation for all datasets (a reference metric sketch follows below):
  ```bash
  bash benchmark/eval/eval.sh
  ```
  (Remember to replace `pred_disp_root` and `gt_disp_root` with your own paths.)
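For orientation, benchmarks of affine-invariant depth typically report AbsRel and δ<1.25 after a least-squares scale-and-shift alignment between predicted and ground-truth disparity. The repo's `eval.sh` is authoritative; the sketch below only illustrates that standard computation, and the file paths and the `disparity` npz key are hypothetical:

```python
import numpy as np

def eval_disparity(pred, gt, valid):
    """Align pred to gt by least-squares scale/shift, then compute metrics."""
    p, g = pred[valid], gt[valid]
    A = np.stack([p, np.ones_like(p)], axis=1)      # fit s*p + t ~= g
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    aligned = np.clip(s * p + t, 1e-6, None)
    abs_rel = np.mean(np.abs(aligned - g) / g)      # absolute relative error
    delta1 = np.mean(np.maximum(aligned / g, g / aligned) < 1.25)
    return abs_rel, delta1

# Hypothetical layout and npz key; match them to your extracted files.
pred = np.load("pred_disp_root/scene_000.npz")["disparity"]
gt = np.load("gt_disp_root/scene_000.npz")["disparity"]
print(eval_disparity(pred, gt, valid=gt > 0))
```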
## 🤝 Contributing

- Issues and pull requests are welcome.
- Contributions that optimize inference speed and memory usage are especially welcome, e.g., through model quantization, distillation, or other acceleration techniques (see the sketch below).
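As one concrete starting point, PyTorch's dynamic quantization replaces linear layers with int8 equivalents in a single call. This is only an illustrative sketch on a toy module (dynamic quantization targets CPU inference); applying it to DepthCrafter's actual submodules would need profiling:

```python
import torch
from torch.ao.quantization import quantize_dynamic

# Toy stand-in for a model submodule; not DepthCrafter's actual network.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU())

# Quantize all Linear layers to int8 weights with dynamic activations.
quantized = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
print(quantized)
```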
## 📜 Citation

If you find this work helpful, please consider citing:

```bibtex
@article{hu2024-DepthCrafter,
  author  = {Hu, Wenbo and Gao, Xiangjun and Li, Xiaoyu and Zhao, Sijie and Cun, Xiaodong and Zhang, Yong and Quan, Long and Shan, Ying},
  title   = {DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos},
  journal = {arXiv preprint arXiv:2409.02095},
  year    = {2024}
}
```