lymhust committed
Commit 76da937 • Parent: 7af0b72

Update README.md

Files changed (1): README.md (+184, −45)
README.md CHANGED
@@ -1,13 +1,10 @@
- ---
- license: apache-2.0
- ---
  <h1 align='center'>EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning</h1>
 
  <div align='center'>
  <a href='https://github.com/yuange250' target='_blank'>Zhiyuan Chen</a><sup>*</sup>&emsp;
  <a href='https://github.com/JoeFannie' target='_blank'>Jiajiong Cao</a><sup>*</sup>&emsp;
  <a href='https://github.com/octavianChen' target='_blank'>Zhiquan Chen</a><sup></sup>&emsp;
- <a href='https://github.com/lymhust' target='_blank'>Yuming Li</a><sup></sup>&emsp;
  <a href='https://github.com/' target='_blank'>Chenguang Ma</a><sup></sup>
  </div>
  <div align='center'>
@@ -17,48 +14,48 @@ license: apache-2.0
  <div align='center'>
  Terminal Technology Department, Alipay, Ant Group.
  </div>
-
  <div align='center'>
- <a href='https://badtobest.github.io/echomimic.html'><img src='https://img.shields.io/badge/Project-Page-blue'></a>
- <a href='https://github.com/BadToBest/EchoMimic'><img src='https://img.shields.io/github/stars/BadToBest/EchoMimic'></a>
  <a href='https://huggingface.co/spaces/BadToBest/EchoMimic'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Demo-yellow'></a>
  </div>
 
- ## Model Files
-
- ```
- ./pretrained_models/
- ├── denoising_unet.pth
- ├── reference_unet.pth
- ├── motion_module.pth
- ├── face_locator.pth
- ├── sd-vae-ft-mse
- │   └── ...
- ├── sd-image-variations-diffusers
- │   └── ...
- └── audio_processor
-     └── whisper_tiny.pt
- ```
 
- Some models in this hub can be directly downloaded from its original hub:
- - [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse): Weights are intended to be used with the diffusers library. (_Thanks to [stabilityai](https://huggingface.co/stabilityai)_)
- - [sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers)
- - [audio_processor](https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt)
 
- ## Gallery
  ### Audio Driven (Sing)
 
  <table class="center">
 
  <tr>
  <td width=30% style="border: none">
- <video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/d014d921-9f94-4640-97ad-035b00effbfe" muted="false"></video>
  </td>
  <td width=30% style="border: none">
- <video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/877603a5-a4f9-4486-a19f-8888422daf78" muted="false"></video>
  </td>
  <td width=30% style="border: none">
- <video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/e0cb5afb-40a6-4365-84f8-cb2834c4cfe7" muted="false"></video>
  </td>
  </tr>
 
@@ -70,13 +67,13 @@ Some models in this hub can be directly downloaded from it's original hub:
 
  <tr>
  <td width=30% style="border: none">
- <video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/386982cd-3ff8-470d-a6d9-b621e112f8a5" muted="false"></video>
  </td>
  <td width=30% style="border: none">
- <video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/5c60bb91-1776-434e-a720-8857a00b1501" muted="false"></video>
  </td>
  <td width=30% style="border: none">
- <video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/1f15adc5-0f33-4afa-b96a-2011886a4a06" muted="false"></video>
  </td>
  </tr>
 
@@ -88,13 +85,13 @@ Some models in this hub can be directly downloaded from it's original hub:
 
  <tr>
  <td width=30% style="border: none">
- <video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/a8092f9a-a5dc-4cd6-95be-1831afaccf00" muted="false"></video>
  </td>
  <td width=30% style="border: none">
- <video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/c8b5c59f-0483-42ef-b3ee-4cffae6c7a52" muted="false"></video>
  </td>
  <td width=30% style="border: none">
- <video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/532a3e60-2bac-4039-a06c-ff6bf06cb4a4" muted="false"></video>
  </td>
  </tr>
 
@@ -106,13 +103,13 @@ Some models in this hub can be directly downloaded from it's original hub:
 
  <tr>
  <td width=30% style="border: none">
- <video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/1da6c46f-4532-4375-a0dc-0a4d6fd30a39" muted="false"></video>
  </td>
  <td width=30% style="border: none">
- <video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/d4f4d5c1-e228-463a-b383-27fb90ed6172" muted="false"></video>
  </td>
  <td width=30% style="border: none">
- <video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/18bd2c93-319e-4d1c-8255-3f02ba717475" muted="false"></video>
  </td>
  </tr>
 
@@ -124,13 +121,13 @@ Some models in this hub can be directly downloaded from it's original hub:
 
  <tr>
  <td width=30% style="border: none">
- <video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/4a29d735-ec1b-474d-b843-3ff0bdf85f55" muted="false"></video>
  </td>
  <td width=30% style="border: none">
- <video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/b994c8f5-8dae-4dd8-870f-962b50dc091f" muted="false"></video>
  </td>
  <td width=30% style="border: none">
- <video controls autoplay loop src="https://github.com/BadToBest/EchoMimic/assets/11451501/955c1d51-07b2-494d-ab93-895b9c43b896" muted="false"></video>
  </td>
  </tr>
 
@@ -138,16 +135,158 @@ Some models in this hub can be directly downloaded from it's original hub:
 
  **(Some demo images above are sourced from image websites. If there is any infringement, we will immediately remove them and apologize.)**
 
- ## Citation
 
- If you find our work useful for your research, please consider citing the paper:
 
  ```
  @misc{chen2024echomimic,
    title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
    author={Zhiyuan Chen, Jiajiong Cao, Zhiquan Chen, Yuming Li, Chenguang Ma},
    year={2024},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
  }
- ```
  <h1 align='center'>EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning</h1>
 
  <div align='center'>
  <a href='https://github.com/yuange250' target='_blank'>Zhiyuan Chen</a><sup>*</sup>&emsp;
  <a href='https://github.com/JoeFannie' target='_blank'>Jiajiong Cao</a><sup>*</sup>&emsp;
  <a href='https://github.com/octavianChen' target='_blank'>Zhiquan Chen</a><sup></sup>&emsp;
+ <a href='https://lymhust.github.io/' target='_blank'>Yuming Li</a><sup></sup>&emsp;
  <a href='https://github.com/' target='_blank'>Chenguang Ma</a><sup></sup>
  </div>
  <div align='center'>
 
  <div align='center'>
  Terminal Technology Department, Alipay, Ant Group.
  </div>
+ <br>
  <div align='center'>
+ <a href='https://antgroup.github.io/ai/echomimic/'><img src='https://img.shields.io/badge/Project-Page-blue'></a>
+ <a href='https://huggingface.co/BadToBest/EchoMimic'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a>
  <a href='https://huggingface.co/spaces/BadToBest/EchoMimic'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Demo-yellow'></a>
+ <a href='https://www.modelscope.cn/models/BadToBest/EchoMimic'><img src='https://img.shields.io/badge/ModelScope-Model-purple'></a>
+ <a href='https://www.modelscope.cn/studios/BadToBest/BadToBest'><img src='https://img.shields.io/badge/ModelScope-Demo-purple'></a>
+ <a href='https://arxiv.org/abs/2407.08136'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
  </div>
 
+ ## &#x1F680; EchoMimic Series
+ * EchoMimicV1: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning. [GitHub](https://github.com/antgroup/echomimic)
+ * EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation. [GitHub](https://github.com/antgroup/echomimic_v2)
 
+ ## &#x1F4E3; Updates
+ * [2024.11.21] 🔥🔥🔥 We release the [EchoMimicV2](https://github.com/antgroup/echomimic_v2) code and models.
+ * [2024.08.02] 🔥 EchoMimic is now available on [Hugging Face](https://huggingface.co/spaces/BadToBest/EchoMimic) with an A100 GPU. Thanks to Wenmeng Zhou@ModelScope.
+ * [2024.07.25] 🔥🔥🔥 Accelerated models and pipeline for **Audio Driven** are released. Inference is up to **10x** faster (from ~7 min/240 frames to ~50 s/240 frames on a V100 GPU).
+ * [2024.07.23] 🔥 The EchoMimic Gradio demo on [ModelScope](https://www.modelscope.cn/studios/BadToBest/BadToBest) is ready.
+ * [2024.07.23] 🔥 The EchoMimic Gradio demo on [Hugging Face](https://huggingface.co/spaces/fffiloni/EchoMimic) is ready. Thanks to Sylvain Filoni@fffiloni.
+ * [2024.07.17] 🔥🔥🔥 Accelerated models and pipeline for **Audio + Selected Landmarks** are released. Inference is up to **10x** faster (from ~7 min/240 frames to ~50 s/240 frames on a V100 GPU).
+ * [2024.07.14] 🔥 [ComfyUI](https://github.com/smthemex/ComfyUI_EchoMimic) is now available. Thanks to @smthemex for the contribution.
+ * [2024.07.13] 🔥 Thanks to [NewGenAI](https://www.youtube.com/@StableAIHub) for the [video installation tutorial](https://www.youtube.com/watch?v=8R0lTIY7tfI).
+ * [2024.07.13] 🔥 We release our pose & audio driven code and models.
+ * [2024.07.12] 🔥 WebUI and GradioUI versions are released. We thank @greengerong, @Robin021 and @O-O1024 for their contributions.
+ * [2024.07.12] 🔥 Our [paper](https://arxiv.org/abs/2407.08136) is public on arXiv.
+ * [2024.07.09] 🔥 We release our audio driven code and models.
 
+ ## &#x1F305; Gallery
  ### Audio Driven (Sing)
 
  <table class="center">
 
  <tr>
  <td width=30% style="border: none">
+ <video controls loop src="https://github.com/antgroup/echomimic/assets/11451501/d014d921-9f94-4640-97ad-035b00effbfe" muted="false"></video>
  </td>
  <td width=30% style="border: none">
+ <video controls loop src="https://github.com/antgroup/echomimic/assets/11451501/877603a5-a4f9-4486-a19f-8888422daf78" muted="false"></video>
  </td>
  <td width=30% style="border: none">
+ <video controls loop src="https://github.com/antgroup/echomimic/assets/11451501/e0cb5afb-40a6-4365-84f8-cb2834c4cfe7" muted="false"></video>
  </td>
  </tr>
 
  <tr>
  <td width=30% style="border: none">
+ <video controls loop src="https://github.com/antgroup/echomimic/assets/11451501/386982cd-3ff8-470d-a6d9-b621e112f8a5" muted="false"></video>
  </td>
  <td width=30% style="border: none">
+ <video controls loop src="https://github.com/antgroup/echomimic/assets/11451501/5c60bb91-1776-434e-a720-8857a00b1501" muted="false"></video>
  </td>
  <td width=30% style="border: none">
+ <video controls loop src="https://github.com/antgroup/echomimic/assets/11451501/1f15adc5-0f33-4afa-b96a-2011886a4a06" muted="false"></video>
  </td>
  </tr>
 
  <tr>
  <td width=30% style="border: none">
+ <video controls loop src="https://github.com/antgroup/echomimic/assets/11451501/a8092f9a-a5dc-4cd6-95be-1831afaccf00" muted="false"></video>
  </td>
  <td width=30% style="border: none">
+ <video controls loop src="https://github.com/antgroup/echomimic/assets/11451501/c8b5c59f-0483-42ef-b3ee-4cffae6c7a52" muted="false"></video>
  </td>
  <td width=30% style="border: none">
+ <video controls loop src="https://github.com/antgroup/echomimic/assets/11451501/532a3e60-2bac-4039-a06c-ff6bf06cb4a4" muted="false"></video>
  </td>
  </tr>
 
  <tr>
  <td width=30% style="border: none">
+ <video controls loop src="https://github.com/antgroup/echomimic/assets/11451501/1da6c46f-4532-4375-a0dc-0a4d6fd30a39" muted="false"></video>
  </td>
  <td width=30% style="border: none">
+ <video controls loop src="https://github.com/antgroup/echomimic/assets/11451501/d4f4d5c1-e228-463a-b383-27fb90ed6172" muted="false"></video>
  </td>
  <td width=30% style="border: none">
+ <video controls loop src="https://github.com/antgroup/echomimic/assets/11451501/18bd2c93-319e-4d1c-8255-3f02ba717475" muted="false"></video>
  </td>
  </tr>
 
  <tr>
  <td width=30% style="border: none">
+ <video controls loop src="https://github.com/antgroup/echomimic/assets/11451501/4a29d735-ec1b-474d-b843-3ff0bdf85f55" muted="false"></video>
  </td>
  <td width=30% style="border: none">
+ <video controls loop src="https://github.com/antgroup/echomimic/assets/11451501/b994c8f5-8dae-4dd8-870f-962b50dc091f" muted="false"></video>
  </td>
  <td width=30% style="border: none">
+ <video controls loop src="https://github.com/antgroup/echomimic/assets/11451501/955c1d51-07b2-494d-ab93-895b9c43b896" muted="false"></video>
  </td>
  </tr>
 
  **(Some demo images above are sourced from image websites. If there is any infringement, we will immediately remove them and apologize.)**
 
+ ## ⚒️ Installation
+
+ ### Download the Codes
+
+ ```bash
+ git clone https://github.com/BadToBest/EchoMimic
+ cd EchoMimic
+ ```
+
+ ### Python Environment Setup
+
+ - Tested system environments: CentOS 7.2 / Ubuntu 22.04, CUDA >= 11.7
+ - Tested GPUs: A100 (80G) / RTX 4090D (24G) / V100 (16G)
+ - Tested Python versions: 3.8 / 3.10 / 3.11
+
+ Create a conda environment (recommended):
+
+ ```bash
+ conda create -n echomimic python=3.8
+ conda activate echomimic
+ ```
+
+ Install packages with `pip`:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### Download ffmpeg-static
+ Download and decompress [ffmpeg-static](https://www.johnvansickle.com/ffmpeg/old-releases/ffmpeg-4.4-amd64-static.tar.xz), then set:
+ ```
+ export FFMPEG_PATH=/path/to/ffmpeg-4.4-amd64-static
+ ```
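Conceptually, the pipeline then uses this environment variable to find the ffmpeg binary. A minimal sketch of that lookup (ours, not the repo's actual code; `resolve_ffmpeg` is a hypothetical helper):

```python
import os
import shutil

def resolve_ffmpeg(env_var="FFMPEG_PATH"):
    """Prefer the static build pointed to by FFMPEG_PATH; fall back to PATH."""
    root = os.environ.get(env_var)
    if root:
        candidate = os.path.join(root, "ffmpeg")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # May return None when no ffmpeg is installed at all.
    return shutil.which("ffmpeg")
```

If this returns `None` or the wrong binary, double-check that `FFMPEG_PATH` points at the decompressed directory, not at the archive file.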
+
+ ### Download pretrained weights
+
+ ```shell
+ git lfs install
+ git clone https://huggingface.co/BadToBest/EchoMimic pretrained_weights
+ ```
+
+ The **pretrained_weights** directory is organized as follows:
+
+ ```
+ ./pretrained_weights/
+ ├── denoising_unet.pth
+ ├── reference_unet.pth
+ ├── motion_module.pth
+ ├── face_locator.pth
+ ├── sd-vae-ft-mse
+ │   └── ...
+ ├── sd-image-variations-diffusers
+ │   └── ...
+ └── audio_processor
+     └── whisper_tiny.pt
+ ```
+
+ Here **denoising_unet.pth** / **reference_unet.pth** / **motion_module.pth** / **face_locator.pth** are the main checkpoints of **EchoMimic**. The other models in this hub can also be downloaded from their original hubs, thanks to their brilliant work:
+ - [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse)
+ - [sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers)
+ - [audio_processor (whisper)](https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt)
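Partial `git lfs` downloads are a common failure mode, so it can be worth sanity-checking the layout above before running inference. A small stand-alone script (ours, not part of the repo; `missing_weights` is a hypothetical helper):

```python
from pathlib import Path

# Expected entries, per the tree above. The two diffusers folders are
# checked only as directories, since their contents vary.
EXPECTED_FILES = [
    "denoising_unet.pth",
    "reference_unet.pth",
    "motion_module.pth",
    "face_locator.pth",
    "audio_processor/whisper_tiny.pt",
]
EXPECTED_DIRS = ["sd-vae-ft-mse", "sd-image-variations-diffusers"]

def missing_weights(root="pretrained_weights"):
    """Return the relative paths that are absent under root."""
    base = Path(root)
    missing = [f for f in EXPECTED_FILES if not (base / f).is_file()]
    missing += [d for d in EXPECTED_DIRS if not (base / d).is_dir()]
    return missing
```

An empty return value means the layout matches the tree above; anything listed should be re-downloaded.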
+
+ ### Audio-Driven Algo Inference
+ Run the Python inference scripts:
+
+ ```bash
+ python -u infer_audio2vid.py
+ python -u infer_audio2vid_pose.py
+ ```
+
+ ### Audio-Driven Algo Inference on Your Own Cases
+
+ Edit the inference config file **./configs/prompts/animation.yaml** and add your own case:
+
+ ```yaml
+ test_cases:
+   "path/to/your/image":
+     - "path/to/your/audio"
+ ```
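Each key under `test_cases` is a reference image, and its list holds one or more driving audio files. Conceptually (a stdlib-only sketch, not the repo's actual loader, which reads this mapping from animation.yaml), the script expands the mapping into (image, audio) pairs:

```python
def iter_cases(test_cases):
    """Yield (reference_image, driving_audio) pairs in config order."""
    for image, audios in test_cases.items():
        for audio in audios:
            yield image, audio

# The mapping from animation.yaml, written out as a plain dict here.
example = {
    "path/to/your/image": ["path/to/your/audio"],
}
```

So one image with several audio entries produces several output videos, one per audio file.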
+
+ Then run the Python inference script:
+ ```bash
+ python -u infer_audio2vid.py
+ ```
+
+ ### Motion Alignment between Ref. Img. and Driven Vid.
+
+ (First download the checkpoints with the '_pose.pth' postfix from Hugging Face.)
+
+ Set `driver_video` and `ref_image` to your paths in demo_motion_sync.py, then run:
+ ```bash
+ python -u demo_motion_sync.py
+ ```
+
+ ### Audio & Pose-Driven Algo Inference
+ Edit ./configs/prompts/animation_pose.yaml, then run:
+ ```bash
+ python -u infer_audio2vid_pose.py
+ ```
+
+ ### Pose-Driven Algo Inference
+ Set `draw_mouse=True` in line 135 of infer_audio2vid_pose.py. Edit ./configs/prompts/animation_pose.yaml, then run:
+ ```bash
+ python -u infer_audio2vid_pose.py
+ ```
+
+ ### Run the Gradio UI
+
+ Thanks to the contribution from @Robin021:
+
+ ```bash
+ python -u webgui.py --server_port=3000
+ ```
+
+ ## 📝 Release Plans
+
+ | Status | Milestone | ETA |
+ |:--------:|:-------------------------------------------------------------------------|:--:|
+ | ✅ | The inference source code of the Audio-Driven algo released on GitHub | 9th July, 2024 |
+ | ✅ | Pretrained models trained on English and Mandarin Chinese released | 9th July, 2024 |
+ | ✅ | The inference source code of the Pose-Driven algo released on GitHub | 13th July, 2024 |
+ | ✅ | Pretrained models with better pose control released | 13th July, 2024 |
+ | ✅ | Accelerated models released | 17th July, 2024 |
+ | 🚀 | Pretrained models with better singing performance to be released | TBD |
+ | 🚀 | Large-scale and high-resolution Chinese-based talking head dataset | TBD |
+
+ ## ⚖️ Disclaimer
+ This project is intended for academic research, and we explicitly disclaim any responsibility for user-generated content. Users are solely liable for their actions while using the generative model. The project contributors have no legal affiliation with, nor accountability for, users' behaviors. It is imperative to use the generative model responsibly, adhering to both ethical and legal standards.
+
+ ## 🙏🏻 Acknowledgements
+
+ We would like to thank the contributors to the [AnimateDiff](https://github.com/guoyww/AnimateDiff), [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone) and [MuseTalk](https://github.com/TMElyralab/MuseTalk) repositories for their open research and exploration.
 
+ We are also grateful to [V-Express](https://github.com/tencent-ailab/V-Express) and [hallo](https://github.com/fudan-generative-vision/hallo) for their outstanding work in the area of diffusion-based talking heads.
+
+ If we have missed any open-source projects or related articles, we will add acknowledgements for them immediately.
+
+ ## 📒 Citation
+
+ If you find our work useful for your research, please consider citing the paper:
 
  ```
  @misc{chen2024echomimic,
    title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
    author={Zhiyuan Chen, Jiajiong Cao, Zhiquan Chen, Yuming Li, Chenguang Ma},
    year={2024},
+   eprint={2407.08136},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
  }
+ ```
+
+ ## 🌟 Star History
+ [![Star History Chart](https://api.star-history.com/svg?repos=antgroup/echomimic&type=Date)](https://star-history.com/#antgroup/echomimic&Date)