# AnimateDiff ControlNet SDXL Example

This document is a step-by-step guide to setting up and running the `animatediff_controlnet_sdxl.py` script from the `svjack/diffusers-sdxl-controlnet` repository on Hugging Face. The script uses the `diffusers-sdxl-controlnet` library to generate animations with ControlNet and SDXL models.

## Prerequisites

Install the following dependencies before running the script.

### System Dependencies

```bash
sudo apt-get update && sudo apt-get install git-lfs cbm ffmpeg
```

### Python Dependencies

```bash
pip install git+https://huggingface.co/svjack/diffusers-sdxl-controlnet
pip install transformers peft sentencepiece moviepy==1.0.3 controlnet_aux
```
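
After installing, it can help to confirm that the packages are importable before loading any models. The check below is a minimal sketch, not part of the original script. The list of import names is an assumption: `torch`, `diffusers`, and `PIL` are not named in the `pip` commands above and are assumed to be pulled in as dependencies.

```python
import importlib.util

# Assumed import names for the packages installed above; adjust as needed.
REQUIRED_MODULES = [
    "torch",
    "diffusers",
    "transformers",
    "peft",
    "sentencepiece",
    "moviepy",
    "controlnet_aux",
    "PIL",
]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

`importlib.util.find_spec` only locates a module without importing it, so the check runs quickly even for heavy packages.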

### Clone the Repository

```bash
git clone https://huggingface.co/svjack/diffusers-sdxl-controlnet
cp diffusers-sdxl-controlnet/girl-pose.gif .
cp diffusers-sdxl-controlnet/girl_beach.mp4 .
```

## Script Modifications

The script needs a small modification before it will run: comment out the lines that reference the following LoRA attention processors.

```python
# Comment out these names wherever the script lists them:
# LoRAAttnProcessor2_0,
# LoRAXFormersAttnProcessor,
```
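
If you prefer not to edit the file by hand, the change can be sketched as a small text transformation. This is an illustrative helper, not part of the repository. It assumes each processor name sits on its own line (for example, as an entry in a parenthesized import list), and the `some_module` import in the example is a placeholder.

```python
LORA_NAMES = ("LoRAAttnProcessor2_0", "LoRAXFormersAttnProcessor")


def comment_out_lora_names(source: str) -> str:
    """Comment out lines that consist only of one of the LoRA processor names."""
    patched_lines = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.rstrip(",") in LORA_NAMES:
            # Preserve the original indentation in front of the comment marker
            indent = line[: len(line) - len(line.lstrip())]
            patched_lines.append(f"{indent}# {stripped}")
        else:
            patched_lines.append(line)
    return "\n".join(patched_lines)


# Placeholder input that mimics a multi-line import list
example = """from some_module import (
    AttnProcessor2_0,
    LoRAAttnProcessor2_0,
    LoRAXFormersAttnProcessor,
)"""
print(comment_out_lora_names(example))
```

To patch the real file, read `animatediff_controlnet_sdxl.py` into a string, pass it through the helper, and write the result back. Review the diff afterwards, since the one-name-per-line assumption may not hold everywhere.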

## GIF to Frames Conversion

The script includes a function that converts a GIF into individual frames. This is useful for preparing input data for the animation pipeline.

```python
import os

from PIL import Image, ImageSequence


def gif_to_frames(gif_path, output_folder):
    # Ensure the output folder exists
    os.makedirs(output_folder, exist_ok=True)

    frame_count = 0
    # Open the GIF and save each frame as a zero-padded PNG
    with Image.open(gif_path) as gif:
        for i, frame in enumerate(ImageSequence.Iterator(gif)):
            frame_path = os.path.join(output_folder, f"frame_{i:04d}.png")
            frame.copy().save(frame_path)
            frame_count = i + 1

    print(f"Successfully extracted {frame_count} frames to {output_folder}")


# Example call
gif_to_frames("girl-pose.gif", "girl_pose_frames")
```

### Pose Source Video (GIF)

The girl pose GIF below serves as the pose source.

![image/gif](https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/6oTdxQtI0nLGq2YB4KYTh.gif)

## Running the Script

To run the script, follow these steps:

1. **Add the Script Path to System Path**:

```python
import sys

sys.path.insert(0, "diffusers-sdxl-controlnet/examples/community/")
from animatediff_controlnet_sdxl import *
from controlnet_aux.processor import Processor
```

2. **Load Necessary Libraries and Models**:

```python
import torch
from diffusers.models import MotionAdapter
from diffusers import DDIMScheduler
from diffusers.utils import export_to_gif
from diffusers import AutoPipelineForText2Image, ControlNetModel
from diffusers.utils import load_image
from PIL import Image
```

3. **Load the MotionAdapter Model**:

```python
adapter = MotionAdapter.from_pretrained(
    "a-r-r-o-w/animatediff-motion-adapter-sdxl-beta",
    torch_dtype=torch.float16,
)
```

4. **Configure the Scheduler and ControlNet**:

```python
model_id = "svjack/GenshinImpact_XL_Base"
scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    clip_sample=False,
    timestep_spacing="linspace",
    beta_schedule="linear",
    steps_offset=1,
)

controlnet = ControlNetModel.from_pretrained(
    "thibaud/controlnet-openpose-sdxl-1.0",
    torch_dtype=torch.float16,
).to("cuda")
```

5. **Load the AnimateDiffSDXLControlnetPipeline**:

```python
pipe = AnimateDiffSDXLControlnetPipeline.from_pretrained(
    model_id,
    controlnet=controlnet,
    motion_adapter=adapter,
    scheduler=scheduler,
    torch_dtype=torch.float16,
).to("cuda")
```

6. **Enable Memory Saving Features**:

```python
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()
```

7. **Load Conditioning Frames**:

```python
import os

folder_path = "girl_pose_frames/"
# Zero-padded file names sort in frame order
frame_names = sorted(name for name in os.listdir(folder_path) if name.endswith(".png"))
# Keep the first 16 frames and resize each to 1024x1024
conditioning_frames = [
    Image.open(os.path.join(folder_path, name)).resize((1024, 1024))
    for name in frame_names[:16]
]
```

8. **Process Conditioning Frames**:

```python
# Extract an OpenPose map from every conditioning frame
p2 = Processor("openpose")
cn2 = [p2(frame) for frame in conditioning_frames]
```

9. **Define Prompts**:

```python
# Longer alternative prompt, kept for reference:
# solo,Xiangling\(genshin impact\),1girl,
# full body professional photograph of a stunning detailed, sharp focus, dramatic
# cinematic lighting, octane render unreal engine (film grain, blurry background

# Raw string so the backslashes before the parentheses are kept literally
prompt = r"solo,Xiangling\(genshin impact\),1girl,full body professional photograph of a stunning detailed"
negative_prompt = "bad quality, worst quality, jpeg artifacts, ugly"
```

10. **Generate Output** (using the Genshin Impact character Xiangling):

```python
# `prompt` and `negative_prompt` are defined in step 9.
# A shorter alternative prompt:
# prompt = r"solo,Xiangling\(genshin impact\),1girl"

generator = torch.Generator(device="cpu").manual_seed(0)
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    guidance_scale=20,
    controlnet_conditioning_scale=1.0,
    width=512,
    height=768,
    num_frames=16,
    conditioning_frames=cn2,
    generator=generator,
)
```

11. **Export Frames to GIF**:

```python
frames = output.frames[0]
export_to_gif(frames, "xiangling_animation.gif")
```

12. **Display the Result**:

```python
from IPython import display

display.Image("xiangling_animation.gif")
```
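
Step 7 keeps only the first 16 extracted frames, so a longer pose GIF is cut off rather than covered end to end. One possible alternative, sketched below, is to pick frame indices spread evenly across the whole clip. The helper `evenly_spaced_indices` is hypothetical and not part of the original script, and the file names in the example are generated only for illustration.

```python
def evenly_spaced_indices(total_frames: int, num_frames: int) -> list[int]:
    """Pick up to num_frames indices spread evenly over range(total_frames)."""
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if num_frames >= total_frames:
        # Not enough frames to subsample: keep them all
        return list(range(total_frames))
    step = total_frames / num_frames
    return [int(i * step) for i in range(num_frames)]


# Illustrative file names in the same zero-padded format used by gif_to_frames
names = [f"frame_{i:04d}.png" for i in range(40)]
selected = [names[i] for i in evenly_spaced_indices(len(names), 16)]
print(selected)
```

The selected names can then be opened and resized exactly as in step 7.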

### Target GIF

<div style="display: flex; justify-content: center; flex-wrap: nowrap;">
<div style="margin-right: 10px;">
<img src="xiangling_animation.gif" alt="Image 1" style="width: 512px; height: 768px;">
</div>
</div>

### Upscale with APISR (https://github.com/svjack/APISR)

<div style="display: flex; justify-content: center; flex-wrap: nowrap;">
<div style="margin-left: 10px;">
<img src="xiangling_animation_frames_4x.gif" alt="Image 2" style="width: 512px; height: 768px;">
</div>
</div>

### Run from the Command Line

- `animatediff_controlnet_sdxl_run_script.py`

```python
import sys

sys.path.insert(0, "diffusers-sdxl-controlnet/examples/community/")
from animatediff_controlnet_sdxl import *

import argparse
import os
import shutil

import torch
from moviepy.editor import VideoFileClip, ImageSequenceClip
from diffusers.models import MotionAdapter
from diffusers import DDIMScheduler, AutoPipelineForText2Image, ControlNetModel
from diffusers.utils import export_to_gif
from PIL import Image
from controlnet_aux.processor import Processor

# Initialize the MotionAdapter (the ControlNetModel is loaded in initialize_pipeline)
adapter = MotionAdapter.from_pretrained("a-r-r-o-w/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16)


def initialize_pipeline(model_id):
    scheduler = DDIMScheduler.from_pretrained(
        model_id,
        subfolder="scheduler",
        clip_sample=False,
        timestep_spacing="linspace",
        beta_schedule="linear",
        steps_offset=1,
    )
    controlnet = ControlNetModel.from_pretrained(
        "thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16
    ).to("cuda")

    # Initialize the AnimateDiffSDXLControlnetPipeline
    pipe = AnimateDiffSDXLControlnetPipeline.from_pretrained(
        model_id,
        controlnet=controlnet,
        motion_adapter=adapter,
        scheduler=scheduler,
        torch_dtype=torch.float16,
    ).to("cuda")
    pipe.enable_vae_slicing()
    pipe.enable_vae_tiling()
    return pipe


def split_video_into_frames(input_video_path, num_frames, temp_folder='temp_frames'):
    """
    Sample the video down to the given number of frames, keeping the original frame rate.

    :param input_video_path: path to the input video file
    :param num_frames: target number of frames
    :param temp_folder: path to the temporary folder
    """
    clip = VideoFileClip(input_video_path)
    original_duration = clip.duration
    segment_duration = original_duration / num_frames

    os.makedirs(temp_folder, exist_ok=True)

    # Save one frame from the start of each equal-length segment
    for i in range(num_frames):
        frame_time = i * segment_duration
        frame_path = os.path.join(temp_folder, f'frame_{i:04d}.png')
        clip.save_frame(frame_path, t=frame_time)

    frame_paths = [os.path.join(temp_folder, f'frame_{i:04d}.png') for i in range(num_frames)]
    final_clip = ImageSequenceClip(frame_paths, fps=clip.fps)
    final_clip.write_videofile("resampled_video.mp4", codec='libx264')

    print(f"New video saved to resampled_video.mp4 with {num_frames} frames at the original frame rate.")


def generate_video_with_prompt(input_video_path, prompt, model_id, gif_output_path, seed=0, num_frames=16,
                               keep_imgs=False, temp_folder='temp_frames', num_inference_steps=50,
                               guidance_scale=20, controlnet_conditioning_scale=1.0, width=512, height=768):
    """
    Generate an animation from a pose video and a text prompt.

    :param input_video_path: path to the input video file
    :param prompt: text prompt
    :param model_id: model ID
    :param gif_output_path: output path for the GIF
    :param seed: random seed
    :param num_frames: target number of frames
    :param keep_imgs: whether to keep the temporary images
    :param temp_folder: path to the temporary folder
    :param num_inference_steps: number of inference steps
    :param guidance_scale: guidance scale
    :param controlnet_conditioning_scale: ControlNet conditioning scale
    :param width: output width
    :param height: output height
    """
    split_video_into_frames(input_video_path, num_frames, temp_folder)

    # Load the sampled frames in order and resize them for pose extraction
    frame_names = sorted(name for name in os.listdir(temp_folder) if name.endswith(".png"))
    conditioning_frames = [
        Image.open(os.path.join(temp_folder, name)).resize((1024, 1024))
        for name in frame_names[:num_frames]
    ]

    p2 = Processor("openpose")
    cn2 = [p2(frame) for frame in conditioning_frames]

    negative_prompt = "bad quality, worst quality, jpeg artifacts, ugly"
    generator = torch.Generator(device="cuda").manual_seed(seed)

    pipe = initialize_pipeline(model_id)

    output = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        controlnet_conditioning_scale=controlnet_conditioning_scale,
        width=width,
        height=height,
        num_frames=num_frames,
        conditioning_frames=cn2,
        generator=generator,
    )

    frames = output.frames[0]
    export_to_gif(frames, gif_output_path)

    print(f"Generated GIF saved to {gif_output_path}")

    if not keep_imgs:
        # Delete the temporary folder
        shutil.rmtree(temp_folder)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate an animation from a pose video and a text prompt")
    parser.add_argument("input_video", help="path to the input video file")
    parser.add_argument("prompt", help="text prompt")
    parser.add_argument("model_id", help="model ID")
    parser.add_argument("gif_output_path", help="output path for the GIF")
    parser.add_argument("--seed", type=int, default=0, help="random seed")
    parser.add_argument("--num_frames", type=int, default=16, help="target number of frames")
    parser.add_argument("--keep_imgs", action="store_true", help="keep the temporary images")
    parser.add_argument("--temp_folder", default='temp_frames', help="path to the temporary folder")
    parser.add_argument("--num_inference_steps", type=int, default=50, help="number of inference steps")
    parser.add_argument("--guidance_scale", type=float, default=20.0, help="guidance scale")
    parser.add_argument("--controlnet_conditioning_scale", type=float, default=1.0, help="ControlNet conditioning scale")
    parser.add_argument("--width", type=int, default=512, help="output width")
    parser.add_argument("--height", type=int, default=768, help="output height")

    args = parser.parse_args()

    generate_video_with_prompt(
        args.input_video,
        args.prompt,
        args.model_id,
        args.gif_output_path,
        seed=args.seed,
        num_frames=args.num_frames,
        keep_imgs=args.keep_imgs,
        temp_folder=args.temp_folder,
        num_inference_steps=args.num_inference_steps,
        guidance_scale=args.guidance_scale,
        controlnet_conditioning_scale=args.controlnet_conditioning_scale,
        width=args.width,
        height=args.height,
    )
```
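
The frame sampling in `split_video_into_frames` comes down to simple arithmetic: frame `i` is grabbed at `i * duration / num_frames` seconds, and the resampled clip plays those frames back at the source frame rate, so it is usually much shorter than the input. The sketch below reproduces only that arithmetic. The function names and the 8-second, 24 fps clip are hypothetical values chosen for illustration.

```python
def sample_times(duration: float, num_frames: int) -> list[float]:
    """Timestamps in seconds at which frames are grabbed."""
    segment_duration = duration / num_frames
    return [i * segment_duration for i in range(num_frames)]


def resampled_length(num_frames: int, fps: float) -> float:
    """Length in seconds of a clip that plays num_frames frames at fps."""
    return num_frames / fps


# Hypothetical clip: 8 seconds long at 24 fps, sampled down to 16 frames
times = sample_times(duration=8.0, num_frames=16)
print(times[:4])  # first four timestamps, half a second apart
print(resampled_length(num_frames=16, fps=24.0))
```

Because sampling starts at `t = 0` and stops one segment before the end, the final stretch of the input video is never sampled.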

Run the script with an input video, a prompt, a model ID, and an output path:

```bash
python animatediff_controlnet_sdxl_run_script.py girl_beach.mp4 \
    "solo,Xiangling\(genshin impact\),1girl,full body professional photograph of a stunning detailed, drink tea use chinese cup" \
    "svjack/GenshinImpact_XL_Base" \
    xiangling_tea_animation.gif --num_frames 16 --temp_folder temp_frames
```

- Pose: girl_beach.mp4

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/pYx23VyLNkLk3YxAAqu5i.mp4"></video>

- Output: xiangling_tea_animation.gif

![image/gif](https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/qUZOvGs5rzxN8zaZ4Xp3s.gif)

- Upscaled:

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/uwUDYOPiZbHuq5v6jWADr.mp4"></video>

### Some Other Samples

- `produce_gif_script.py`

```bash
python produce_gif_script.py xiangling_video_seed.csv "svjack/GenshinImpact_XL_Base" xiangling_gif_dir \
    --num_frames 16 --temp_folder temp_frames --seed 0 --controlnet_conditioning_scale 0.3
```
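
This document shows neither `produce_gif_script.py` nor the layout of `xiangling_video_seed.csv`, so the following is only a sketch of how a batch driver of this kind might work. It is not the author's script. The CSV column names (`video`, `prompt`, `seed`) and the output file naming are assumptions, and the sketch only builds command lines for the single-video script shown earlier without running them.

```python
import csv
import io
import os
import shlex

RUN_SCRIPT = "animatediff_controlnet_sdxl_run_script.py"


def build_commands(csv_text, model_id, output_dir):
    """Build one command line per CSV row (assumed columns: video, prompt, seed)."""
    commands = []
    for row_number, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        # Hypothetical naming scheme for the generated GIFs
        gif_path = os.path.join(output_dir, f"sample_{row_number:04d}.gif")
        commands.append([
            "python", RUN_SCRIPT,
            row["video"], row["prompt"], model_id, gif_path,
            "--seed", str(int(row["seed"])),
        ])
    return commands


# Hypothetical CSV content, for illustration only
example_csv = 'video,prompt,seed\ngirl_beach.mp4,"solo,1girl",0\n'
for command in build_commands(example_csv, "svjack/GenshinImpact_XL_Base", "xiangling_gif_dir"):
    print(shlex.join(command))
```

Each command could then be executed with `subprocess.run(command, check=True)`. Keeping the commands as argument lists avoids shell quoting problems with prompts that contain commas, parentheses, or backslashes.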

![image/gif](https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/R2SpiNASjQj8k_wrZDJA5.gif)

![image/gif](https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/ssJZD1SXLLu4EdpSZKcP2.gif)

## Conclusion

This guide shows how to use the `diffusers-sdxl-controlnet` library to generate animations with ControlNet and SDXL models. By following the steps above, you can create and view your own animated sequences.