bubbliiiing
committed on
Commit
·
5dfb363
1
Parent(s):
f625293
Update Readme
- README.md +88 -32
- README_en.md +93 -18
README.md
CHANGED
|
@@ -30,21 +30,6 @@ tasks:
|
|
| 30 |
#- vllm
|
| 31 |
---
|
| 32 |
|
| 33 |
-
# EasyAnimate | An End-to-End Solution for High-Resolution and Long Video Generation
|
| 34 |
-
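😊 EasyAnimate is an end-to-end solution for generating high-resolution and long videos. We can train transformer-based diffusion generators, train VAEs for processing long videos, and preprocess metadata.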
|
| 35 |
-
|
| 36 |
-
😊 Based on DiT, we use a transformer as the diffuser for video and image generation.
|
| 37 |
-
|
| 38 |
-
😊 Welcome!
|
| 39 |
-
|
| 40 |
-
[](https://arxiv.org/abs/2405.18991)
|
| 41 |
-
[](https://easyanimate.github.io/)
|
| 42 |
-
[](https://modelscope.cn/studios/PAI/EasyAnimate/summary)
|
| 43 |
-
[](https://huggingface.co/spaces/alibaba-pai/EasyAnimate)
|
| 44 |
-
[](https://discord.gg/UzkpB4Bn)
|
| 45 |
-
|
| 46 |
-
[English](./README.md) | 简体中文
|
| 47 |
-
|
| 48 |
# Table of Contents
|
| 49 |
- [Table of Contents](#目录)
|
| 50 |
- [Introduction](#简介)
|
|
@@ -143,6 +128,39 @@ Details for Linux:
|
|
| 143 |
|
| 144 |
We need about 60GB of free disk space; please check!
|
| 145 |
|
| 146 |
#### b. Weights
|
| 147 |
We recommend placing the [weights](#model-zoo) along the specified path:
|
| 148 |
|
|
@@ -161,8 +179,7 @@ EasyAnimateV5:
|
|
| 161 |
|
| 162 |
### EasyAnimateV5-12b-zh-InP
|
| 163 |
|
| 164 |
-
|
| 165 |
-
|
| 166 |
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 167 |
<tr>
|
| 168 |
<td>
|
|
@@ -181,8 +198,6 @@ Resolution-1024
|
|
| 181 |
</table>
|
| 182 |
|
| 183 |
|
| 184 |
-
Resolution-768
|
| 185 |
-
|
| 186 |
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 187 |
<tr>
|
| 188 |
<td>
|
|
@@ -200,8 +215,6 @@ Resolution-768
|
|
| 200 |
</tr>
|
| 201 |
</table>
|
| 202 |
|
| 203 |
-
Resolution-512
|
| 204 |
-
|
| 205 |
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 206 |
<tr>
|
| 207 |
<td>
|
|
@@ -219,6 +232,41 @@ Resolution-512
|
|
| 219 |
</tr>
|
| 220 |
</table>
|
| 221 |
|
| 222 |
### EasyAnimateV5-12b-zh-Control
|
| 223 |
|
| 224 |
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
|
@@ -364,6 +412,13 @@ sh scripts/train.sh
|
|
| 364 |
# Model zoo
|
| 365 |
EasyAnimateV5:
|
| 366 |
|
| 367 |
| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
|
| 368 |
|--|--|--|--|--|--|
|
| 369 |
| EasyAnimateV5-12b-zh-InP | EasyAnimateV5 | 34 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-12b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-12b-zh-InP)| Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
|
|
@@ -373,29 +428,29 @@ EasyAnimateV5:
|
|
| 373 |
<details>
|
| 374 |
<summary>(Obsolete) EasyAnimateV4:</summary>
|
| 375 |
|
| 376 |
-
| Name | Type | Storage Space |
|
| 377 |
|--|--|--|--|--|--|
|
| 378 |
-
| EasyAnimateV4-XL-2-InP.tar.gz | EasyAnimateV4 | Before extraction: 8.9 GB / After extraction: 14.0 GB | [
|
| 379 |
</details>
|
| 380 |
|
| 381 |
<details>
|
| 382 |
<summary>(Obsolete) EasyAnimateV3:</summary>
|
| 383 |
|
| 384 |
-
| Name | Type | Storage Space |
|
| 385 |
|--|--|--|--|--|--|
|
| 386 |
-
| EasyAnimateV3-XL-2-InP-512x512.tar | EasyAnimateV3 | 18.2GB
|
| 387 |
-
| EasyAnimateV3-XL-2-InP-768x768.tar | EasyAnimateV3 | 18.2GB | [
|
| 388 |
-
| EasyAnimateV3-XL-2-InP-960x960.tar | EasyAnimateV3 | 18.2GB | [
|
| 389 |
</details>
|
| 390 |
|
| 391 |
<details>
|
| 392 |
<summary>(Obsolete) EasyAnimateV2:</summary>
|
| 393 |
|
| 394 |
-
| Name | Type | Storage Space | Url | Hugging Face | Description |
|
| 395 |
-
|
| 396 |
-
| EasyAnimateV2-XL-2-512x512.tar | EasyAnimateV2 | 16.2GB | [
|
| 397 |
-
| EasyAnimateV2-XL-2-768x768.tar | EasyAnimateV2 | 16.2GB | [
|
| 398 |
-
| easyanimatev2_minimalism_lora.safetensors | Lora of Pixart | 485.1MB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Personalized_Model/easyanimatev2_minimalism_lora.safetensors)| - | A LoRA trained on a specific type of images. The images can be downloaded [here](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/webui/Minimalism.zip). |
|
| 399 |
</details>
|
| 400 |
|
| 401 |
<details>
|
|
@@ -426,6 +481,7 @@ EasyAnimateV5:
|
|
| 426 |
|
| 427 |
# Reference
|
| 428 |
- CogVideo: https://github.com/THUDM/CogVideo/
|
|
|
|
| 429 |
- magvit: https://github.com/google-research/magvit
|
| 430 |
- PixArt: https://github.com/PixArt-alpha/PixArt-alpha
|
| 431 |
- Open-Sora-Plan: https://github.com/PKU-YuanGroup/Open-Sora-Plan
|
|
|
|
| 30 |
#- vllm
|
| 31 |
---
|
| 32 |
|
| 33 |
# Table of Contents
|
| 34 |
- [Table of Contents](#目录)
|
| 35 |
- [Introduction](#简介)
|
|
|
|
| 128 |
|
| 129 |
We need about 60GB of free disk space; please check!
|
| 130 |
|
| 131 |
+
The video sizes that EasyAnimateV5-12B can generate with different amounts of GPU memory are as follows:
|
| 132 |
+
| GPU memory |384x672x72|384x672x49|576x1008x25|576x1008x49|768x1344x25|768x1344x49|
|
| 133 |
+
|----------|----------|----------|----------|----------|----------|----------|
|
| 134 |
+
| 16GB | 🧡 | 🧡 | ❌ | ❌ | ❌ | ❌ |
|
| 135 |
+
| 24GB | 🧡 | 🧡 | 🧡 | 🧡 | ❌ | ❌ |
|
| 136 |
+
| 40GB | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
|
| 137 |
+
| 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
|
| 138 |
+
|
| 139 |
+
✅ means it can run with "model_cpu_offload", 🧡 means it can run with "model_cpu_offload_and_qfloat8", ⭕️ means it can run with "sequential_cpu_offload", and ❌ means it cannot run. Note that running with sequential_cpu_offload is slower.
|
| 140 |
+
|
| 141 |
+
Some GPUs that do not support torch.bfloat16, such as the 2080ti and V100, require changing weight_dtype in app.py and the predict files to torch.float16 in order to run.
|
| 142 |
+
|
| 143 |
+
The generation time of EasyAnimateV5-12B over 25 steps on different GPUs is as follows:
|
| 144 |
+
| GPU |384x672x72|384x672x49|576x1008x25|576x1008x49|768x1344x25|768x1344x49|
|
| 145 |
+
|----------|----------|----------|----------|----------|----------|----------|
|
| 146 |
+
| A10 24GB | ~120s (4.8s/it) | ~240s (9.6s/it) | ~320s (12.7s/it) | ~750s (29.8s/it) | ❌ | ❌ |
|
| 147 |
+
| A100 80GB | ~45s (1.75s/it) | ~90s (3.7s/it) | ~120s (4.7s/it) | ~300s (11.4s/it) | ~265s (10.6s/it) | ~710s (28.3s/it) |
|
| 148 |
+
|
| 149 |
+
(⭕️) means it can run with low_gpu_memory_mode=True, but more slowly, and ❌ means it cannot run.
|
| 150 |
+
|
| 151 |
+
<details>
|
| 152 |
+
<summary>(Obsolete) EasyAnimateV3:</summary>
|
| 153 |
+
|
| 154 |
+
The video sizes that EasyAnimateV3 can generate with different amounts of GPU memory are as follows:
|
| 155 |
+
| GPU memory | 384x672x72 | 384x672x144 | 576x1008x72 | 576x1008x144 | 720x1280x72 | 720x1280x144 |
|
| 156 |
+
|----------|----------|----------|----------|----------|----------|----------|
|
| 157 |
+
| 12GB | ⭕️ | ⭕️ | ⭕️ | ⭕️ | ❌ | ❌ |
|
| 158 |
+
| 16GB | ✅ | ✅ | ⭕️ | ⭕️ | ⭕️ | ❌ |
|
| 159 |
+
| 24GB | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
|
| 160 |
+
| 40GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
|
| 161 |
+
| 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
|
| 162 |
+
</details>
|
| 163 |
+
|
| 164 |
#### b. Weights
|
| 165 |
We recommend placing the [weights](#model-zoo) along the specified path:
|
| 166 |
|
|
|
|
| 179 |
|
| 180 |
### EasyAnimateV5-12b-zh-InP
|
| 181 |
|
| 182 |
+
#### I2V
|
|
|
|
| 183 |
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 184 |
<tr>
|
| 185 |
<td>
|
|
|
|
| 198 |
</table>
|
| 199 |
|
| 200 |
|
|
|
|
|
|
|
| 201 |
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 202 |
<tr>
|
| 203 |
<td>
|
|
|
|
| 215 |
</tr>
|
| 216 |
</table>
|
| 217 |
|
|
|
|
|
|
|
| 218 |
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 219 |
<tr>
|
| 220 |
<td>
|
|
|
|
| 232 |
</tr>
|
| 233 |
</table>
|
| 234 |
|
| 235 |
+
#### T2V
|
| 236 |
+
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 237 |
+
<tr>
|
| 238 |
+
<td>
|
| 239 |
+
<video src="https://github.com/user-attachments/assets/eccb0797-4feb-48e9-91d3-5769ce30142b" width="100%" controls autoplay loop></video>
|
| 240 |
+
</td>
|
| 241 |
+
<td>
|
| 242 |
+
<video src="https://github.com/user-attachments/assets/76b3db64-9c7a-4d38-8854-dba940240ceb" width="100%" controls autoplay loop></video>
|
| 243 |
+
</td>
|
| 244 |
+
<td>
|
| 245 |
+
<video src="https://github.com/user-attachments/assets/0b8fab66-8de7-44ff-bd43-8f701bad6bb7" width="100%" controls autoplay loop></video>
|
| 246 |
+
</td>
|
| 247 |
+
<td>
|
| 248 |
+
<video src="https://github.com/user-attachments/assets/9fbddf5f-7fcd-4cc6-9d7c-3bdf1d4ce59e" width="100%" controls autoplay loop></video>
|
| 249 |
+
</td>
|
| 250 |
+
</tr>
|
| 251 |
+
</table>
|
| 252 |
+
|
| 253 |
+
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 254 |
+
<tr>
|
| 255 |
+
<td>
|
| 256 |
+
<video src="https://github.com/user-attachments/assets/19c1742b-e417-45ac-97d6-8bf3a80d8e13" width="100%" controls autoplay loop></video>
|
| 257 |
+
</td>
|
| 258 |
+
<td>
|
| 259 |
+
<video src="https://github.com/user-attachments/assets/641e56c8-a3d9-489d-a3a6-42c50a9aeca1" width="100%" controls autoplay loop></video>
|
| 260 |
+
</td>
|
| 261 |
+
<td>
|
| 262 |
+
<video src="https://github.com/user-attachments/assets/2b16be76-518b-44c6-a69b-5c49d76df365" width="100%" controls autoplay loop></video>
|
| 263 |
+
</td>
|
| 264 |
+
<td>
|
| 265 |
+
<video src="https://github.com/user-attachments/assets/e7d9c0fc-136f-405c-9fab-629389e196be" width="100%" controls autoplay loop></video>
|
| 266 |
+
</td>
|
| 267 |
+
</tr>
|
| 268 |
+
</table>
|
| 269 |
+
|
| 270 |
### EasyAnimateV5-12b-zh-Control
|
| 271 |
|
| 272 |
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
|
|
|
| 412 |
# Model zoo
|
| 413 |
EasyAnimateV5:
|
| 414 |
|
| 415 |
+
7B:
|
| 416 |
+
| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
|
| 417 |
+
|--|--|--|--|--|--|
|
| 418 |
+
| EasyAnimateV5-7b-zh-InP | EasyAnimateV5 | 22 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-7b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-7b-zh-InP)| Official 7B image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
|
| 419 |
+
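| EasyAnimateV5-7b-zh | EasyAnimateV5 | 22 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-7b-zh) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-7b-zh)| Official 7B text-to-video weights. Can be used for fine-tuning on downstream tasks. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |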
|
| 420 |
+
|
| 421 |
+
12B:
|
| 422 |
| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
|
| 423 |
|--|--|--|--|--|--|
|
| 424 |
| EasyAnimateV5-12b-zh-InP | EasyAnimateV5 | 34 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-12b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-12b-zh-InP)| Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
|
|
|
|
| 428 |
<details>
|
| 429 |
<summary>(Obsolete) EasyAnimateV4:</summary>
|
| 430 |
|
| 431 |
+
| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
|
| 432 |
|--|--|--|--|--|--|
|
| 433 |
+
| EasyAnimateV4-XL-2-InP.tar.gz | EasyAnimateV4 | Before extraction: 8.9 GB / After extraction: 14.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV4-XL-2-InP)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV4-XL-2-InP)| Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280), trained with 144 frames at 24 frames per second. |
|
| 434 |
</details>
|
| 435 |
|
| 436 |
<details>
|
| 437 |
<summary>(Obsolete) EasyAnimateV3:</summary>
|
| 438 |
|
| 439 |
+
| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
|
| 440 |
|--|--|--|--|--|--|
|
| 441 |
+
| EasyAnimateV3-XL-2-InP-512x512.tar | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-512x512)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-512x512)| Official image-to-video weights at 512x512 resolution. Trained with 144 frames at 24 frames per second. |
|
| 442 |
+
| EasyAnimateV3-XL-2-InP-768x768.tar | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-768x768) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-768x768)| Official image-to-video weights at 768x768 resolution. Trained with 144 frames at 24 frames per second. |
|
| 443 |
+
| EasyAnimateV3-XL-2-InP-960x960.tar | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-960x960) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-960x960)| Official image-to-video weights at 960x960 (720P) resolution. Trained with 144 frames at 24 frames per second. |
|
| 444 |
</details>
|
| 445 |
|
| 446 |
<details>
|
| 447 |
<summary>(Obsolete) EasyAnimateV2:</summary>
|
| 448 |
|
| 449 |
+
| Name | Type | Storage Space | Url | Hugging Face | Model Scope | Description |
|
| 450 |
+
|--|--|--|--|--|--|--|
|
| 451 |
+
| EasyAnimateV2-XL-2-512x512.tar | EasyAnimateV2 | 16.2GB | - | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-512x512)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV2-XL-2-512x512)| Official weights at 512x512 resolution. Trained with 144 frames at 24 frames per second. |
|
| 452 |
+
| EasyAnimateV2-XL-2-768x768.tar | EasyAnimateV2 | 16.2GB | - | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-768x768) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV2-XL-2-768x768)| Official weights at 768x768 resolution. Trained with 144 frames at 24 frames per second. |
|
| 453 |
+
| easyanimatev2_minimalism_lora.safetensors | Lora of Pixart | 485.1MB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Personalized_Model/easyanimatev2_minimalism_lora.safetensors)| - | - | A LoRA trained on a specific type of images. The images can be downloaded [here](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/webui/Minimalism.zip). |
|
| 454 |
</details>
|
| 455 |
|
| 456 |
<details>
|
|
|
|
| 481 |
|
| 482 |
# Reference
|
| 483 |
- CogVideo: https://github.com/THUDM/CogVideo/
|
| 484 |
+
- Flux: https://github.com/black-forest-labs/flux
|
| 485 |
- magvit: https://github.com/google-research/magvit
|
| 486 |
- PixArt: https://github.com/PixArt-alpha/PixArt-alpha
|
| 487 |
- Open-Sora-Plan: https://github.com/PKU-YuanGroup/Open-Sora-Plan
|
README_en.md
CHANGED
|
@@ -112,6 +112,41 @@ The detailed of Linux:
|
|
| 112 |
- GPU:Nvidia-V100 16G & Nvidia-A10 24G & Nvidia-A100 40G & Nvidia-A100 80G
|
| 113 |
|
| 114 |
We need about 60GB of free disk space (for saving weights); please check!
|
| 115 |
|
| 116 |
#### b. Weights
|
| 117 |
We recommend placing the [weights](#model-zoo) along the specified path:
|
|
@@ -131,8 +166,7 @@ The results displayed are all based on image.
|
|
| 131 |
|
| 132 |
### EasyAnimateV5-12b-zh-InP
|
| 133 |
|
| 134 |
-
|
| 135 |
-
|
| 136 |
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 137 |
<tr>
|
| 138 |
<td>
|
|
@@ -151,8 +185,6 @@ Resolution-1024
|
|
| 151 |
</table>
|
| 152 |
|
| 153 |
|
| 154 |
-
Resolution-768
|
| 155 |
-
|
| 156 |
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 157 |
<tr>
|
| 158 |
<td>
|
|
@@ -170,8 +202,6 @@ Resolution-768
|
|
| 170 |
</tr>
|
| 171 |
</table>
|
| 172 |
|
| 173 |
-
Resolution-512
|
| 174 |
-
|
| 175 |
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 176 |
<tr>
|
| 177 |
<td>
|
|
@@ -189,6 +219,41 @@ Resolution-512
|
|
| 189 |
</tr>
|
| 190 |
</table>
|
| 191 |
|
| 192 |
### EasyAnimateV5-12b-zh-Control
|
| 193 |
|
| 194 |
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
|
@@ -335,6 +400,13 @@ For details on setting some parameters, please refer to [Readme Train](scripts/R
|
|
| 335 |
|
| 336 |
EasyAnimateV5:
|
| 337 |
|
| 338 |
| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
|
| 339 |
|--|--|--|--|--|--|
|
| 340 |
| EasyAnimateV5-12b-zh-InP | EasyAnimateV5 | 34 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-12b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-12b-zh-InP) | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
|
|
@@ -344,28 +416,29 @@ EasyAnimateV5:
|
|
| 344 |
<details>
|
| 345 |
<summary>(Obsolete) EasyAnimateV4:</summary>
|
| 346 |
|
| 347 |
-
| Name | Type | Storage Space |
|
| 348 |
|--|--|--|--|--|--|
|
| 349 |
-
| EasyAnimateV4-XL-2-InP.tar.gz | EasyAnimateV4 | Before extraction: 8.9 GB \/ After extraction: 14.0 GB |
|
| 350 |
</details>
|
| 351 |
|
| 352 |
<details>
|
| 353 |
<summary>(Obsolete) EasyAnimateV3:</summary>
|
| 354 |
|
| 355 |
-
| Name | Type | Storage Space |
|
| 356 |
|--|--|--|--|--|--|
|
| 357 |
-
| EasyAnimateV3-XL-2-InP-512x512.tar | EasyAnimateV3 | 18.2GB | [
|
| 358 |
-
| EasyAnimateV3-XL-2-InP-768x768.tar | EasyAnimateV3 | 18.2GB | [
|
| 359 |
-
| EasyAnimateV3-XL-2-InP-960x960.tar | EasyAnimateV3 | 18.2GB | [
|
| 360 |
</details>
|
| 361 |
|
| 362 |
<details>
|
| 363 |
<summary>(Obsolete) EasyAnimateV2:</summary>
|
| 364 |
-
|
| 365 |
-
|
| 366 |
-
|
| 367 |
-
| EasyAnimateV2-XL-2-
|
| 368 |
-
|
|
|
|
|
| 369 |
</details>
|
| 370 |
|
| 371 |
<details>
|
|
@@ -397,6 +470,8 @@ EasyAnimateV5:
|
|
| 397 |
|
| 398 |
|
| 399 |
# Reference
|
|
|
|
|
|
|
| 400 |
- magvit: https://github.com/google-research/magvit
|
| 401 |
- PixArt: https://github.com/PixArt-alpha/PixArt-alpha
|
| 402 |
- Open-Sora-Plan: https://github.com/PKU-YuanGroup/Open-Sora-Plan
|
|
@@ -406,4 +481,4 @@ EasyAnimateV5:
|
|
| 406 |
- HunYuan DiT: https://github.com/tencent/HunyuanDiT
|
| 407 |
|
| 408 |
# License
|
| 409 |
-
This project is licensed under the [Apache License (Version 2.0)](https://github.com/modelscope/modelscope/blob/master/LICENSE).
|
|
|
|
| 112 |
- GPU:Nvidia-V100 16G & Nvidia-A10 24G & Nvidia-A100 40G & Nvidia-A100 80G
|
| 113 |
|
| 114 |
We need about 60GB of free disk space (for saving weights); please check!
|
| 115 |
+
The video sizes that EasyAnimateV5-12B can generate with different amounts of GPU memory are as follows:
|
| 116 |
+
|
| 117 |
+
| GPU memory | 384x672x72 | 384x672x49 | 576x1008x25 | 576x1008x49 | 768x1344x25 | 768x1344x49 |
|
| 118 |
+
|------------|------------|------------|------------|------------|------------|------------|
|
| 119 |
+
| 16GB | 🧡 | 🧡 | ❌ | ❌ | ❌ | ❌ |
|
| 120 |
+
| 24GB | 🧡 | 🧡 | 🧡 | 🧡 | ❌ | ❌ |
|
| 121 |
+
| 40GB | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
|
| 122 |
+
| 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
|
| 123 |
+
|
| 124 |
+
✅ means it can run with "model_cpu_offload", 🧡 means it can run with "model_cpu_offload_and_qfloat8", ⭕️ means it can run with "sequential_cpu_offload", and ❌ means it cannot run. Note that running with sequential_cpu_offload is slower.
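To make the table easier to apply, here is a minimal lookup sketch. It only transcribes the EasyAnimateV5-12B memory table above; the names `MEMORY_TABLE` and `suggest_memory_mode` are hypothetical helpers for illustration and are not part of the EasyAnimate codebase. It assumes a GPU falls into the largest documented memory tier that does not exceed its memory.

```python
# Memory table transcribed from the README; helper names are hypothetical.
RESOLUTIONS = ["384x672x72", "384x672x49", "576x1008x25",
               "576x1008x49", "768x1344x25", "768x1344x49"]

OFFLOAD = "model_cpu_offload"
QFLOAT8 = "model_cpu_offload_and_qfloat8"

# None means the configuration cannot run at that memory tier.
MEMORY_TABLE = {
    16: [QFLOAT8, QFLOAT8, None, None, None, None],
    24: [QFLOAT8, QFLOAT8, QFLOAT8, QFLOAT8, None, None],
    40: [OFFLOAD, OFFLOAD, OFFLOAD, OFFLOAD, None, None],
    80: [OFFLOAD, OFFLOAD, OFFLOAD, OFFLOAD, OFFLOAD, OFFLOAD],
}


def suggest_memory_mode(gpu_memory_gb, resolution):
    """Return the memory mode the table lists, or None if it cannot run."""
    column = RESOLUTIONS.index(resolution)
    # Use the largest documented tier that does not exceed the available memory.
    tiers = [tier for tier in sorted(MEMORY_TABLE) if tier <= gpu_memory_gb]
    if not tiers:
        return None
    return MEMORY_TABLE[tiers[-1]][column]


print(suggest_memory_mode(24, "576x1008x49"))
print(suggest_memory_mode(40, "768x1344x25"))
```

A GPU with less than 16GB is below every documented tier, so the sketch returns `None` rather than guessing.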
|
| 125 |
+
|
| 126 |
+
Some GPUs do not support torch.bfloat16, such as the 2080ti and V100. On these cards, change weight_dtype in app.py and the predict files to torch.float16 in order to run.
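The rule above can be sketched as a small helper. This is an illustration only: `pick_weight_dtype` and `NO_BF16_CARDS` are hypothetical names, the card list contains just the two examples the README gives, and the function returns the dtype name as a string so the sketch runs without PyTorch installed. In a real script, PyTorch's `torch.cuda.is_bf16_supported()` is likely a more reliable check than matching card names.

```python
# Cards the README names as lacking torch.bfloat16 support (not exhaustive).
NO_BF16_CARDS = ("2080ti", "v100")


def pick_weight_dtype(gpu_name):
    """Return the dtype name to assign to weight_dtype for a given GPU name."""
    normalized = gpu_name.lower().replace(" ", "")
    if any(card in normalized for card in NO_BF16_CARDS):
        return "torch.float16"
    return "torch.bfloat16"


for name in ("Nvidia-V100 16G", "RTX 2080 Ti", "Nvidia-A100 80G"):
    print(name, "->", pick_weight_dtype(name))
```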
|
| 127 |
+
|
| 128 |
+
The generation time for EasyAnimateV5-12B using different GPUs over 25 steps is as follows:
|
| 129 |
+
|
| 130 |
+
| GPU | 384x672x72 | 384x672x49 | 576x1008x25 | 576x1008x49 | 768x1344x25 | 768x1344x49 |
|
| 131 |
+
|-----------|------------------|------------------|------------------|------------------|------------------|-----------------|
|
| 132 |
+
| A10 24GB | ~120s (4.8s/it) | ~240s (9.6s/it) | ~320s (12.7s/it) | ~750s (29.8s/it) | ❌ | ❌ |
|
| 133 |
+
| A100 80GB | ~45s (1.75s/it) | ~90s (3.7s/it) | ~120s (4.7s/it) | ~300s (11.4s/it) | ~265s (10.6s/it) | ~710s (28.3s/it) |
|
| 134 |
+
|
| 135 |
+
(⭕️) means it can run with low_gpu_memory_mode=True, but more slowly, and ❌ means it cannot run.
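The totals in the timing table are roughly the per-iteration latency multiplied by the 25 steps. The sketch below redoes that arithmetic from the s/it figures; the names `SEC_PER_IT` and `estimate_total_seconds` are hypothetical, and the result covers only the denoising loop implied by the s/it numbers, not model loading or decoding.

```python
# Seconds per iteration transcribed from the README table (None = cannot run).
STEPS = 25

SEC_PER_IT = {
    "A10 24GB": [4.8, 9.6, 12.7, 29.8, None, None],
    "A100 80GB": [1.75, 3.7, 4.7, 11.4, 10.6, 28.3],
}


def estimate_total_seconds(sec_per_it, steps=STEPS):
    """Total sampling time: per-step latency times the number of steps."""
    if sec_per_it is None:
        return None
    return sec_per_it * steps


for gpu, latencies in SEC_PER_IT.items():
    totals = [estimate_total_seconds(latency) for latency in latencies]
    print(gpu, totals)
```

For example, 12.7 s/it over 25 steps gives 317.5 s, which the table rounds to about 320 s.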
|
| 136 |
+
|
| 137 |
+
<details>
|
| 138 |
+
<summary>(Obsolete) EasyAnimateV3:</summary>
|
| 139 |
+
|
| 140 |
+
The video sizes that EasyAnimateV3 can generate with different amounts of GPU memory are as follows:
|
| 141 |
+
|
| 142 |
+
| GPU memory | 384x672x72 | 384x672x144 | 576x1008x72 | 576x1008x144 | 720x1280x72 | 720x1280x144 |
|
| 143 |
+
|------------|------------|-------------|-------------|--------------|-------------|--------------|
|
| 144 |
+
| 12GB | ⭕️ | ⭕️ | ⭕️ | ⭕️ | ❌ | ❌ |
|
| 145 |
+
| 16GB | ✅ | ✅ | ⭕️ | ⭕️ | ⭕️ | ❌ |
|
| 146 |
+
| 24GB | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
|
| 147 |
+
| 40GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
|
| 148 |
+
| 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
|
| 149 |
+
</details>
|
| 150 |
|
| 151 |
#### b. Weights
|
| 152 |
We recommend placing the [weights](#model-zoo) along the specified path:
|
|
|
|
| 166 |
|
| 167 |
### EasyAnimateV5-12b-zh-InP
|
| 168 |
|
| 169 |
+
#### I2V
|
|
|
|
| 170 |
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 171 |
<tr>
|
| 172 |
<td>
|
|
|
|
| 185 |
</table>
|
| 186 |
|
| 187 |
|
|
|
|
|
|
|
| 188 |
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 189 |
<tr>
|
| 190 |
<td>
|
|
|
|
| 202 |
</tr>
|
| 203 |
</table>
|
| 204 |
|
|
|
|
|
|
|
| 205 |
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 206 |
<tr>
|
| 207 |
<td>
|
|
|
|
| 219 |
</tr>
|
| 220 |
</table>
|
| 221 |
|
| 222 |
+
#### T2V
|
| 223 |
+
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 224 |
+
<tr>
|
| 225 |
+
<td>
|
| 226 |
+
<video src="https://github.com/user-attachments/assets/eccb0797-4feb-48e9-91d3-5769ce30142b" width="100%" controls autoplay loop></video>
|
| 227 |
+
</td>
|
| 228 |
+
<td>
|
| 229 |
+
<video src="https://github.com/user-attachments/assets/76b3db64-9c7a-4d38-8854-dba940240ceb" width="100%" controls autoplay loop></video>
|
| 230 |
+
</td>
|
| 231 |
+
<td>
|
| 232 |
+
<video src="https://github.com/user-attachments/assets/0b8fab66-8de7-44ff-bd43-8f701bad6bb7" width="100%" controls autoplay loop></video>
|
| 233 |
+
</td>
|
| 234 |
+
<td>
|
| 235 |
+
<video src="https://github.com/user-attachments/assets/9fbddf5f-7fcd-4cc6-9d7c-3bdf1d4ce59e" width="100%" controls autoplay loop></video>
|
| 236 |
+
</td>
|
| 237 |
+
</tr>
|
| 238 |
+
</table>
|
| 239 |
+
|
| 240 |
+
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
| 241 |
+
<tr>
|
| 242 |
+
<td>
|
| 243 |
+
<video src="https://github.com/user-attachments/assets/19c1742b-e417-45ac-97d6-8bf3a80d8e13" width="100%" controls autoplay loop></video>
|
| 244 |
+
</td>
|
| 245 |
+
<td>
|
| 246 |
+
<video src="https://github.com/user-attachments/assets/641e56c8-a3d9-489d-a3a6-42c50a9aeca1" width="100%" controls autoplay loop></video>
|
| 247 |
+
</td>
|
| 248 |
+
<td>
|
| 249 |
+
<video src="https://github.com/user-attachments/assets/2b16be76-518b-44c6-a69b-5c49d76df365" width="100%" controls autoplay loop></video>
|
| 250 |
+
</td>
|
| 251 |
+
<td>
|
| 252 |
+
<video src="https://github.com/user-attachments/assets/e7d9c0fc-136f-405c-9fab-629389e196be" width="100%" controls autoplay loop></video>
|
| 253 |
+
</td>
|
| 254 |
+
</tr>
|
| 255 |
+
</table>
|
| 256 |
+
|
| 257 |
### EasyAnimateV5-12b-zh-Control
|
| 258 |
|
| 259 |
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
|
|
|
|
| 400 |
|
| 401 |
EasyAnimateV5:
|
| 402 |
|
| 403 |
+
7B:
|
| 404 |
+
| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
|
| 405 |
+
|--|--|--|--|--|--|
|
| 406 |
+
| EasyAnimateV5-7b-zh-InP | EasyAnimateV5 | 22 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-7b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-7b-zh-InP) | Official 7B image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
|
| 407 |
+
| EasyAnimateV5-7b-zh | EasyAnimateV5 | 22 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-7b-zh) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-7b-zh) | Official 7B text-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
|
| 408 |
+
|
| 409 |
+
12B:
|
| 410 |
| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
|
| 411 |
|--|--|--|--|--|--|
|
| 412 |
| EasyAnimateV5-12b-zh-InP | EasyAnimateV5 | 34 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-12b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-12b-zh-InP) | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
|
|
|
|
| 416 |
<details>
|
| 417 |
<summary>(Obsolete) EasyAnimateV4:</summary>
|
| 418 |
|
| 419 |
+
| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
|
| 420 |
|--|--|--|--|--|--|
|
| 421 |
+
| EasyAnimateV4-XL-2-InP.tar.gz | EasyAnimateV4 | Before extraction: 8.9 GB \/ After extraction: 14.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV4-XL-2-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV4-XL-2-InP) | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280), trained with 144 frames at 24 frames per second. |
|
| 422 |
</details>
|
| 423 |
|
| 424 |
<details>
|
| 425 |
<summary>(Obsolete) EasyAnimateV3:</summary>
|
| 426 |
|
| 427 |
+
| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
|
| 428 |
|--|--|--|--|--|--|
|
| 429 |
+
| EasyAnimateV3-XL-2-InP-512x512.tar | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-512x512)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-512x512) | Official EasyAnimateV3 text-and-image-to-video weights at 512x512 resolution. Trained with 144 frames at 24 fps. |
|
| 430 |
+
| EasyAnimateV3-XL-2-InP-768x768.tar | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-768x768) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-768x768) | Official EasyAnimateV3 text-and-image-to-video weights at 768x768 resolution. Trained with 144 frames at 24 fps. |
|
| 431 |
+
| EasyAnimateV3-XL-2-InP-960x960.tar | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-960x960) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-960x960) | Official EasyAnimateV3 text-and-image-to-video weights at 960x960 resolution. Trained with 144 frames at 24 fps. |
|
| 432 |
</details>
|
| 433 |
|
| 434 |
<details>
|
| 435 |
<summary>(Obsolete) EasyAnimateV2:</summary>
|
| 436 |
+
|
| 437 |
+
| Name | Type | Storage Space | Url | Hugging Face | Model Scope | Description |
|
| 438 |
+
|--|--|--|--|--|--|--|
|
| 439 |
+
| EasyAnimateV2-XL-2-512x512.tar | EasyAnimateV2 | 16.2GB | - | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-512x512)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV2-XL-2-512x512)| Official EasyAnimateV2 weights at 512x512 resolution. Trained with 144 frames at 24 fps. |
|
| 440 |
+
| EasyAnimateV2-XL-2-768x768.tar | EasyAnimateV2 | 16.2GB | - | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-768x768) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV2-XL-2-768x768)| Official EasyAnimateV2 weights at 768x768 resolution. Trained with 144 frames at 24 fps. |
|
| 441 |
+
| easyanimatev2_minimalism_lora.safetensors | Lora of Pixart | 485.1MB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Personalized_Model/easyanimatev2_minimalism_lora.safetensors)| - | - | A LoRA trained on a specific type of images. The images can be downloaded from [Url](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v2/Minimalism.zip). |
|
| 442 |
</details>
|
| 443 |
|
| 444 |
<details>
|
|
|
|
| 470 |
|
| 471 |
|
| 472 |
# Reference
|
| 473 |
+
- CogVideo: https://github.com/THUDM/CogVideo/
|
| 474 |
+
- Flux: https://github.com/black-forest-labs/flux
|
| 475 |
- magvit: https://github.com/google-research/magvit
|
| 476 |
- PixArt: https://github.com/PixArt-alpha/PixArt-alpha
|
| 477 |
- Open-Sora-Plan: https://github.com/PKU-YuanGroup/Open-Sora-Plan
|
|
|
|
| 481 |
- HunYuan DiT: https://github.com/tencent/HunyuanDiT
|
| 482 |
|
| 483 |
# License
|
| 484 |
+
This project is licensed under the [Apache License (Version 2.0)](https://github.com/modelscope/modelscope/blob/master/LICENSE).
|