This is my second attempt in my quest for consistent animation, and this time I thought it was worth sharing. It uses depth frames computed directly from a 3D motion, which means clean depth maps and allows a quality character swap. This approach is different from the real-to-anime img2img chick videos: there is no video reference. The good thing is that it avoids the EBSynth hassle, and it needs VERY little manual aberration correction.

The workflow is a bit special since it uses the Koikatsu h-game studio. I guess Blender works too, but this "studio" is perfect for 3D character and pose/scene customization, with an awesome community and plugins (like depth). The truth is I have more skills in Koikatsu than in Blender (shhh~).

Here is the workflow, and I probably need some advice from you to optimize it:

KOIKATSU STUDIO

1. Once satisfied with the motion (can be MMD), extract the depth sequence at 15fps, 544x960

STABLE DIFFUSION

2. Use a consistent anime model and LoRA
3. t2i: Generate the reference picture with one of the first depth frames
4. i2i: Using Multi-ControlNet
   a. Batch depth with no pre-processor
   b. Reference with the reference pic generated in step 3
   c. TemporalKit starting with the reference pic generated in step 3

POST PROCESS

5. FILM interpolation (x2 frames)
6. Optional: Upscale x2 (Anime6B)
7. FFMPEG to build the video at 30fps (see the sketch at the end of the post)
8. Optional: Deflicker with Adobe

NB: Well-known animes are usually rendered at low fps, so I wouldn't overkill it at 60fps, to keep the same anime feeling (plus it would take ages to process each step, and 60fps is only randomly supported by social apps like TikTok).

- Short hair + tight clothes are our friends
- Good consistency even without Deflicker
- Depth is better than OpenPose to keep hair/clothes physics

TO IMPROVE:
- Hand gestures are still awful even with the TI negatives
- Background consistency, by processing the character separately and efficiently

Hope you enjoy it, all this gives me new ideas...
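For step 7, here is a minimal sketch of how I'd assemble the interpolated frames into the 30fps video. Folder names and the frame pattern (film_output, sequence, %05d.png) are assumptions, not part of the workflow above; adapt them to however FILM names its output on your side.

```python
# Hypothetical helper: copy interpolated frames into a clean numbered
# sequence, then assemble them into a 30fps video with FFmpeg (step 7).
import shutil
import subprocess
from pathlib import Path

SRC_DIR = Path("film_output")   # frames after FILM x2 interpolation (assumed folder)
SEQ_DIR = Path("sequence")      # clean numbered copies for FFmpeg
OUT_FILE = "animation_30fps.mp4"

SEQ_DIR.mkdir(exist_ok=True)

# Sort frames by filename and copy them as 00000.png, 00001.png, ...
frames = sorted(SRC_DIR.glob("*.png"))
for i, frame in enumerate(frames):
    shutil.copy(frame, SEQ_DIR / f"{i:05d}.png")

# Build the video at 30fps; yuv420p keeps it playable on most players and social apps.
subprocess.run([
    "ffmpeg", "-y",
    "-framerate", "30",
    "-i", str(SEQ_DIR / "%05d.png"),
    "-c:v", "libx264",
    "-pix_fmt", "yuv420p",
    "-crf", "17",
    OUT_FILE,
], check=True)
```

If you upscale (step 6) before this, just point SRC_DIR at the upscaled frames instead.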