EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

Zhiyuan Chen*, Jiajiong Cao*, Zhiquan Chen, Yuming Li, Chenguang Ma

*Equal contribution.

Terminal Technology Department, Alipay, Ant Group.

Model Files

./pretrained_models/
├── denoising_unet.pth
├── reference_unet.pth
├── motion_module.pth
├── face_locator.pth
├── sd-vae-ft-mse
│   └── ...
├── sd-image-variations-diffusers
│   └── ...
└── audio_processor
    └── whisper_tiny.pt
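Before running inference, it can help to confirm that the checkpoint folder matches the layout above. The snippet below is a minimal sketch, not part of the EchoMimic code base: the helper name `missing_model_files` and the lists `REQUIRED_FILES` / `REQUIRED_DIRS` are hypothetical, transcribed by hand from the tree, and the two diffusers folders are only checked for existence because their contents are elided ("...") above.

```python
from pathlib import Path

# Hypothetical checklist transcribed from the ./pretrained_models/ tree above.
# Individual checkpoint files are checked as files.
REQUIRED_FILES = [
    "denoising_unet.pth",
    "reference_unet.pth",
    "motion_module.pth",
    "face_locator.pth",
    "audio_processor/whisper_tiny.pt",
]
# These folders are checked for existence only; their contents are not listed here.
REQUIRED_DIRS = [
    "sd-vae-ft-mse",
    "sd-image-variations-diffusers",
]


def missing_model_files(root="./pretrained_models"):
    """Return the expected relative paths that are absent under root."""
    base = Path(root)
    missing = [p for p in REQUIRED_FILES if not (base / p).is_file()]
    missing += [d for d in REQUIRED_DIRS if not (base / d).is_dir()]
    return missing


if __name__ == "__main__":
    gaps = missing_model_files()
    if gaps:
        print("Missing from ./pretrained_models/:")
        for path in gaps:
            print("  -", path)
    else:
        print("All expected model files are present.")
```

Running it from the directory that contains `pretrained_models/` prints whichever entries still need to be downloaded.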

Some models in this hub can also be downloaded directly from their original hubs:

sd-vae-ft-mse: weights intended for use with the diffusers library (thanks to stabilityai).
sd-image-variations-diffusers
audio_processor
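For the shared components, a download step might look like the sketch below. It assumes the upstream weights live in the public Hugging Face repos `stabilityai/sd-vae-ft-mse` and `lambdalabs/sd-image-variations-diffusers`; these repo IDs, and the helper names `UPSTREAM_REPOS`, `download_plan` and `fetch_upstream`, are our own illustrative assumptions rather than anything this hub prescribes. `audio_processor/whisper_tiny.pt` is left out because its original source is not given here. The only library call used is `huggingface_hub.snapshot_download(repo_id=..., local_dir=...)`.

```python
# Assumed mapping from local folder name (as in the tree above) to an upstream
# Hugging Face repo ID. audio_processor is intentionally omitted: no source is given.
UPSTREAM_REPOS = {
    "sd-vae-ft-mse": "stabilityai/sd-vae-ft-mse",
    "sd-image-variations-diffusers": "lambdalabs/sd-image-variations-diffusers",
}


def download_plan(root="./pretrained_models"):
    """Map each upstream repo ID to the local folder it should be saved into."""
    return {repo: f"{root}/{folder}" for folder, repo in UPSTREAM_REPOS.items()}


def fetch_upstream(root="./pretrained_models"):
    """Download every repo in the plan (needs network access and huggingface_hub)."""
    # Imported lazily so download_plan() can be inspected without the package installed.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in download_plan(root).items():
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `fetch_upstream()` once places both folders next to the EchoMimic checkpoints; keeping the plan separate from the download makes it easy to inspect or adjust the target paths first.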

Gallery

Audio Driven (Sing)

Audio Driven (English)

Audio Driven (Chinese)

Landmark Driven

Audio + Selected Landmark Driven

(Some of the demo images above are sourced from image websites. If any of them infringes your rights, we will remove it immediately and apologize.)

Citation

If you find our work useful for your research, please consider citing the paper:

@misc{chen2024echomimic,
  title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
  author={Zhiyuan Chen and Jiajiong Cao and Zhiquan Chen and Yuming Li and Chenguang Ma},
  year={2024},
  eprint={2406.01900},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}