Commit f725526
Parent(s): a893898
Update README.md

README.md CHANGED
@@ -23,7 +23,7 @@ Kwok-Wai Hung,
 Chao Zhan,
 Yingjie He,
 Wenjiang Zhou
-(<sup>*</sup>Equal Contribution,
+(<sup>*</sup>Equal Contribution, <sup>†</sup>Corresponding Author, benbinwu@tencent.com)
 </br>
 Lyra Lab, Tencent Music Entertainment

@@ -352,14 +352,16 @@ please refer to [MuseV](https://github.com/TMElyralab/MuseV)

 # Acknowledgements

-1. MuseV has referred much to [TuneAVideo](https://github.com/showlab/Tune-A-Video), [diffusers](https://github.com/huggingface/diffusers), [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone/tree/master/src/pipelines), [animatediff](https://github.com/guoyww/AnimateDiff), [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter), [AnimateAnyone](https://arxiv.org/abs/2311.17117), [VideoFusion](https://arxiv.org/abs/2303.08320).
+1. MuseV has referred much to [TuneAVideo](https://github.com/showlab/Tune-A-Video), [diffusers](https://github.com/huggingface/diffusers), [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone/tree/master/src/pipelines), [animatediff](https://github.com/guoyww/AnimateDiff), [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter), [AnimateAnyone](https://arxiv.org/abs/2311.17117), [VideoFusion](https://arxiv.org/abs/2303.08320), [insightface](https://github.com/deepinsight/insightface).
 2. MuseV has been built on the `ucf101` and `webvid` datasets.

 Thanks for open-sourcing!

+
 # Limitation
 There are still many limitations, including

+1. Lack of generalization ability: some visual condition images perform well while others perform poorly, and some t2i pretrained models perform well while others perform poorly.
 1. Limited types of video generation and limited motion range, partly because of the limited types of training data. The released `MuseV` has been trained on approximately 60K human text-video pairs at resolution `512*320`. `MuseV` produces a greater motion range but lower video quality at lower resolutions, and tends to generate a smaller motion range at high video quality. Training on a larger, higher-resolution, higher-quality text-video dataset may make `MuseV` better.
 1. Watermarks may appear because of `webvid`. A cleaner dataset without watermarks may solve this issue.
 1. Limited types of long video generation. Visual Conditioned Parallel Denoise can solve the accumulated error of video generation, but the current method is only suitable for relatively fixed camera scenes.