<h1 align='center'>EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning</h1>
<div align='center'>
<a href='https://github.com/yuange250' target='_blank'>Zhiyuan Chen</a><sup>*</sup> 
<a href='https://github.com/JoeFannie' target='_blank'>Jiajiong Cao</a><sup>*</sup> 
<a href='https://github.com/octavianChen' target='_blank'>Zhiquan Chen</a><sup></sup> 
<a href='https://github.com/lymhust' target='_blank'>Yuming Li</a><sup></sup> 
<a href='https://github.com/' target='_blank'>Chenguang Ma</a><sup></sup>
</div>
<div align='center'>
*Equal Contribution.
</div>
<div align='center'>
Terminal Technology Department, Alipay, Ant Group.
</div>
<div align='center'>
<a href='https://badtobest.github.io/echomimic.html'><img src='https://img.shields.io/badge/Project-Page-blue'></a>
<a href='https://github.com/BadToBest/EchoMimic'><img src='https://img.shields.io/github/stars/BadToBest/EchoMimic'></a>
</div>
## Model Files
```
./pretrained_models/
├── denoising_unet.pth
├── reference_unet.pth
├── motion_module.pth
├── face_locator.pth
├── sd-vae-ft-mse
│   └── ...
├── sd-image-variations-diffusers
│   └── ...
└── audio_processor
    └── whisper_tiny.pt
```
Some models in this hub can also be downloaded directly from their original hubs:
- [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse): Weights are intended to be used with the diffusers library. (_Thanks to [stabilityai](https://huggingface.co/stabilityai)_)
- [sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers)
- [audio_processor](https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt)
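Once the files are downloaded, it can help to confirm that the local folder matches the layout above before running inference. The following is a minimal sketch, not part of the EchoMimic codebase: the helper name `missing_model_files` is hypothetical, and the file list is copied from the directory tree above, with the assumption that the two `sd-*` entries are directories whose contents are not checked.

```python
from pathlib import Path

# Expected entries, copied from the directory tree above.
EXPECTED_FILES = [
    "denoising_unet.pth",
    "reference_unet.pth",
    "motion_module.pth",
    "face_locator.pth",
    "audio_processor/whisper_tiny.pt",
]
# Assumption: these two entries are directories; their contents are not inspected.
EXPECTED_DIRS = [
    "sd-vae-ft-mse",
    "sd-image-variations-diffusers",
]


def missing_model_files(root="./pretrained_models"):
    """Return the expected entries that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("All expected model files are present.")
```

The check only looks at names and file types, so it will not catch a truncated or corrupted download.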
## Gallery
### Audio Driven (Sing)
<table class="center">
<tr>
<td width=30% style="border: none">
<video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/d014d921-9f94-4640-97ad-035b00effbfe" muted="false"></video>
</td>
<td width=30% style="border: none">
<video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/877603a5-a4f9-4486-a19f-8888422daf78" muted="false"></video>
</td>
<td width=30% style="border: none">
<video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/e0cb5afb-40a6-4365-84f8-cb2834c4cfe7" muted="false"></video>
</td>
</tr>
</table>
### Audio Driven (English)
<table class="center">
<tr>
<td width=30% style="border: none">
<video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/386982cd-3ff8-470d-a6d9-b621e112f8a5" muted="false"></video>
</td>
<td width=30% style="border: none">
<video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/5c60bb91-1776-434e-a720-8857a00b1501" muted="false"></video>
</td>
<td width=30% style="border: none">
<video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/1f15adc5-0f33-4afa-b96a-2011886a4a06" muted="false"></video>
</td>
</tr>
</table>
### Audio Driven (Chinese)
<table class="center">
<tr>
<td width=30% style="border: none">
<video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/a8092f9a-a5dc-4cd6-95be-1831afaccf00" muted="false"></video>
</td>
<td width=30% style="border: none">
<video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/c8b5c59f-0483-42ef-b3ee-4cffae6c7a52" muted="false"></video>
</td>
<td width=30% style="border: none">
<video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/532a3e60-2bac-4039-a06c-ff6bf06cb4a4" muted="false"></video>
</td>
</tr>
</table>
### Landmark Driven
<table class="center">
<tr>
<td width=30% style="border: none">
<video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/1da6c46f-4532-4375-a0dc-0a4d6fd30a39" muted="false"></video>
</td>
<td width=30% style="border: none">
<video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/d4f4d5c1-e228-463a-b383-27fb90ed6172" muted="false"></video>
</td>
<td width=30% style="border: none">
<video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/18bd2c93-319e-4d1c-8255-3f02ba717475" muted="false"></video>
</td>
</tr>
</table>
### Audio + Selected Landmark Driven
<table class="center">
<tr>
<td width=30% style="border: none">
<video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/4a29d735-ec1b-474d-b843-3ff0bdf85f55" muted="false"></video>
</td>
<td width=30% style="border: none">
<video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/b994c8f5-8dae-4dd8-870f-962b50dc091f" muted="false"></video>
</td>
<td width=30% style="border: none">
<video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/955c1d51-07b2-494d-ab93-895b9c43b896" muted="false"></video>
</td>
</tr>
</table>
**(Some demo images above are sourced from image websites. If there is any infringement, we will immediately remove them and apologize.)**
## Citation
If you find our work useful for your research, please consider citing the paper:
```
@misc{chen2024echomimic,
title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
author={Zhiyuan Chen and Jiajiong Cao and Zhiquan Chen and Yuming Li and Chenguang Ma},
year={2024},
eprint={2406.01900},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```