metadata

license: other
license_name: sv3d-nc-community
license_link: LICENSE
datasets:
  - allenai/objaverse
pipeline_tag: image-to-video
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://huggingface.co/stabilityai/sv3d/blob/main/LICENSE.md) and
  acknowledge Stability AI's [Privacy
  Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
      - 'Yes'
      - 'No'

SV3D-diffusers

This repo (https://github.com/chenguolin/sv3d-diffusers) provides scripts for:

A spatio-temporal UNet (SV3DUNetSpatioTemporalConditionModel) and a pipeline (StableVideo3DDiffusionPipeline), adapted from SVD for SV3D and following the diffusers conventions.
Converting Stability AI's SV3D-p UNet checkpoint to the diffusers format.
Running inference with the SV3D-p model through the diffusers library to synthesize a 21-frame orbital video around a 3D object from a single-view image (the image is first preprocessed by removing the background and centering the object).
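
To make the "centering" part of the preprocessing concrete, here is a minimal sketch of the geometry involved. It assumes the background has already been removed, so the image has an alpha channel, and it works on a plain 2D list of alpha values instead of a real image. The function name centering_offset and the canvas_size parameter are hypothetical and are not taken from this repo, whose actual preprocessing may differ.

```python
def centering_offset(alpha, canvas_size):
    """Locate the foreground and compute where to paste it on a square canvas.

    `alpha` is a non-empty, rectangular 2D list of alpha values; any value
    greater than 0 counts as foreground. Returns the inclusive bounding box
    (left, top, right, bottom) and the (paste_left, paste_top) offset that
    centers that box on a canvas of side `canvas_size`.
    """
    # Rows and columns that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(alpha) if any(a > 0 for a in row)]
    cols = [x for x in range(len(alpha[0])) if any(row[x] > 0 for row in alpha)]
    if not rows:
        raise ValueError("no foreground pixels found")

    left, top, right, bottom = cols[0], rows[0], cols[-1], rows[-1]
    box_w, box_h = right - left + 1, bottom - top + 1
    if max(box_w, box_h) > canvas_size:
        raise ValueError("foreground does not fit on the canvas")

    # Equal margins on both sides (integer division leaves any odd pixel
    # on the right or bottom edge).
    paste_left = (canvas_size - box_w) // 2
    paste_top = (canvas_size - box_h) // 2
    return (left, top, right, bottom), (paste_left, paste_top)


# Toy mask: a 3x2 object sitting in the top-left corner of a 6x4 image.
mask = [
    [255, 255, 255, 0, 0, 0],
    [255, 255, 255, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
bbox, offset = centering_offset(mask, canvas_size=8)
print(bbox, offset)
```

With a real image, the same bounding box and offset could drive a crop-and-paste onto a transparent square canvas using an imaging library of your choice.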

Converted SV3D-p checkpoints are available on HuggingFace🤗 at chenguolin/sv3d-diffusers.

🚀 Usage

git clone https://github.com/chenguolin/sv3d-diffusers.git
# Please install PyTorch first according to your CUDA version
pip3 install -r requirements.txt
# If you can't access HuggingFace🤗, try:
# export HF_ENDPOINT=https://hf-mirror.com
python3 infer.py --output_dir out/ --image_path assets/images/sculpture.png --elevation 10 --half_precision --seed -1

The synthesized video will be saved to out/ as a .gif file.
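
For intuition about what a fixed elevation such as `--elevation 10` means for a 21-frame orbit, the sketch below lists one camera pose per frame. It assumes the azimuths are evenly spaced over a full 360-degree turn and that the last frame returns to the starting view. This is an illustrative assumption, not a description of what infer.py does internally, and the function name orbit_cameras is hypothetical.

```python
def orbit_cameras(elevation_deg=10.0, num_frames=21):
    """Return one (azimuth_deg, elevation_deg) pair per frame of an orbit.

    Assumption: azimuths are evenly spaced over 360 degrees, and the final
    frame wraps back to azimuth 0 (the input view).
    """
    cameras = []
    for i in range(1, num_frames + 1):
        # Multiply before dividing so the last frame lands exactly on 360,
        # which the modulo then wraps to 0.
        azimuth = (360.0 * i / num_frames) % 360.0
        cameras.append((azimuth, elevation_deg))
    return cameras


cameras = orbit_cameras(elevation_deg=10.0, num_frames=21)
for frame_idx, (az, el) in enumerate(cameras[:3]):
    print(f"frame {frame_idx}: azimuth={az:.2f} deg, elevation={el:.1f} deg")
```

Under this assumption, consecutive frames are 360 / 21 (about 17.14) degrees apart, and the elevation stays constant for the whole orbit.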

📸 Results

Image preprocessing and random seeds differ between the two implementations, so the results are shown for reference only.

[Comparison table: results on the sculpture, bag, and kunkun inputs for SV3D-diffusers (Ours) and Official SV3D]

📚 Citation

If you find this repo helpful, please consider giving it a star 🌟 and citing the original SV3D paper.

@inproceedings{voleti2024sv3d,
   author={Voleti, Vikram and Yao, Chun-Han and Boss, Mark and Letts, Adam and Pankratz, David and Tochilkin, Dmitrii and Laforte, Christian and Rombach, Robin and Jampani, Varun},
   title={{SV3D}: Novel Multi-view Synthesis and {3D} Generation from a Single Image using Latent Video Diffusion},
   booktitle={European Conference on Computer Vision (ECCV)},
   year={2024},
}