Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis

Ant Group


✨ For more results, visit our Project Page ✨

πŸ“Œ Updates

  • [2025.01.10] πŸ”₯ We release our inference code and models.
  • [2024.11.29] πŸ”₯ Our paper is now public on arXiv.

πŸ› οΈ Installation

Tested Environment

  • System: CentOS 7.2
  • GPU: A100
  • Python: 3.10
  • TensorRT: 8.6.1

Clone the code from GitHub:

git clone https://github.com/antgroup/ditto-talkinghead
cd ditto-talkinghead

Create conda environment:

conda env create -f environment.yaml
conda activate ditto

πŸ“₯ Download Checkpoints

Download the checkpoints from HuggingFace and place them in the checkpoints directory:

git lfs install
git clone https://huggingface.co/digital-avatar/ditto-talkinghead checkpoints

The checkpoints should be organized as follows:

./checkpoints/
β”œβ”€β”€ ditto_cfg
β”‚   β”œβ”€β”€ v0.4_hubert_cfg_trt.pkl
β”‚   └── v0.4_hubert_cfg_trt_online.pkl
β”œβ”€β”€ ditto_onnx
β”‚   β”œβ”€β”€ appearance_extractor.onnx
β”‚   β”œβ”€β”€ blaze_face.onnx
β”‚   β”œβ”€β”€ decoder.onnx
β”‚   β”œβ”€β”€ face_mesh.onnx
β”‚   β”œβ”€β”€ hubert.onnx
β”‚   β”œβ”€β”€ insightface_det.onnx
β”‚   β”œβ”€β”€ landmark106.onnx
β”‚   β”œβ”€β”€ landmark203.onnx
β”‚   β”œβ”€β”€ libgrid_sample_3d_plugin.so
β”‚   β”œβ”€β”€ lmdm_v0.4_hubert.onnx
β”‚   β”œβ”€β”€ motion_extractor.onnx
β”‚   β”œβ”€β”€ stitch_network.onnx
β”‚   └── warp_network.onnx
└── ditto_trt_Ampere_Plus
    β”œβ”€β”€ appearance_extractor_fp16.engine
    β”œβ”€β”€ blaze_face_fp16.engine
    β”œβ”€β”€ decoder_fp16.engine
    β”œβ”€β”€ face_mesh_fp16.engine
    β”œβ”€β”€ hubert_fp32.engine
    β”œβ”€β”€ insightface_det_fp16.engine
    β”œβ”€β”€ landmark106_fp16.engine
    β”œβ”€β”€ landmark203_fp16.engine
    β”œβ”€β”€ lmdm_v0.4_hubert_fp32.engine
    β”œβ”€β”€ motion_extractor_fp32.engine
    β”œβ”€β”€ stitch_network_fp16.engine
    └── warp_network_fp16.engine
  • ditto_cfg/v0.4_hubert_cfg_trt_online.pkl is the online (streaming) config
  • ditto_cfg/v0.4_hubert_cfg_trt.pkl is the offline config
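
Before running inference, a quick sanity check that the download produced the layout above can save a confusing failure later. The sketch below checks for a representative subset of the files listed in the tree (the file names are copied from the tree; the spot-check list is our choice, not part of the repo):

```python
from pathlib import Path

# A representative subset of the files from the checkpoint tree above.
EXPECTED = [
    "ditto_cfg/v0.4_hubert_cfg_trt.pkl",
    "ditto_cfg/v0.4_hubert_cfg_trt_online.pkl",
    "ditto_onnx/hubert.onnx",
    "ditto_onnx/decoder.onnx",
    "ditto_trt_Ampere_Plus/hubert_fp32.engine",
    "ditto_trt_Ampere_Plus/decoder_fp16.engine",
]

def missing_checkpoints(root="./checkpoints"):
    """Return the expected files that are absent under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoint files:", *missing, sep="\n  ")
    else:
        print("All expected checkpoint files found.")
```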

πŸš€ Inference

Run inference.py:

python inference.py \
    --data_root "<path-to-trt-model>" \
    --cfg_pkl "<path-to-cfg-pkl>" \
    --audio_path "<path-to-input-audio>" \
    --source_path "<path-to-input-image>" \
    --output_path "<path-to-output-mp4>" 

For example:

python inference.py \
    --data_root "./checkpoints/ditto_trt_Ampere_Plus" \
    --cfg_pkl "./checkpoints/ditto_cfg/v0.4_hubert_cfg_trt.pkl" \
    --audio_path "./example/audio.wav" \
    --source_path "./example/image.png" \
    --output_path "./tmp/result.mp4" 
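
To process several audio files against the same source image and checkpoints, the command above can be wrapped in a small batch driver. This is a minimal sketch assuming the flags shown in the example; the ./example and ./tmp paths are illustrative:

```python
import subprocess
from pathlib import Path

def build_cmd(audio, image, output,
              data_root="./checkpoints/ditto_trt_Ampere_Plus",
              cfg_pkl="./checkpoints/ditto_cfg/v0.4_hubert_cfg_trt.pkl"):
    """Assemble the inference.py command line for one audio/image pair."""
    return [
        "python", "inference.py",
        "--data_root", data_root,
        "--cfg_pkl", cfg_pkl,
        "--audio_path", str(audio),
        "--source_path", str(image),
        "--output_path", str(output),
    ]

if __name__ == "__main__":
    out_dir = Path("./tmp")
    out_dir.mkdir(parents=True, exist_ok=True)
    # One output video per input audio file, named after the audio.
    for wav in sorted(Path("./example").glob("*.wav")):
        cmd = build_cmd(wav, "./example/image.png", out_dir / f"{wav.stem}.mp4")
        subprocess.run(cmd, check=True)
```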

❗Note:

We provide TensorRT engines built with hardware-compatibility-level=Ampere_Plus (checkpoints/ditto_trt_Ampere_Plus/). If your GPU does not support them, run the cvt_onnx_to_trt.py script to convert the general ONNX models (checkpoints/ditto_onnx/) into TensorRT engines for your hardware.

python script/cvt_onnx_to_trt.py --onnx_dir "./checkpoints/ditto_onnx" --trt_dir "./checkpoints/ditto_trt_custom"

Then run inference.py with --data_root=./checkpoints/ditto_trt_custom.
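
Whether the prebuilt Ampere_Plus engines apply can be judged from the GPU's CUDA compute capability: Ampere corresponds to compute capability 8.x, so GPUs at 8.0 or above (e.g. A100, RTX 30-series) should load them, while older GPUs need the conversion above. A small helper to make that decision (a sketch; obtaining the capability string, e.g. from nvidia-smi or your CUDA runtime, is left to the caller):

```python
def supports_ampere_plus(compute_cap: str) -> bool:
    """True if a compute capability string like '8.6' is Ampere (8.x) or newer."""
    major, _, _ = compute_cap.partition(".")
    return int(major) >= 8

# Examples: A100 is 8.0, RTX 3090 is 8.6, V100 (Volta) is 7.0.
```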

πŸ“§ Acknowledgement

Our implementation builds on S2G-MDDiffusion and LivePortrait. Thanks for their remarkable contributions and released code! If we have missed any open-source projects or related articles, please let us know and we will update the acknowledgements promptly.

βš–οΈ License

This repository is released under the Apache-2.0 license as found in the LICENSE file.

πŸ“š Citation

If you find this codebase useful for your research, please cite the following entry:

@article{li2024ditto,
    title={Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis},
    author={Li, Tianqi and Zheng, Ruobing and Yang, Minghui and Chen, Jingdong and Yang, Ming},
    journal={arXiv preprint arXiv:2411.19509},
    year={2024}
}