Introducing MotionCLR: Interactive Motion Editing
Today, we release MotionCLR, our latest progress in interactive motion editing. In this blog post, I describe what MotionCLR can do from two angles: 1) applications and 2) research. I hope the post offers some new ideas to both communities.
To researchers: Bring your generative models into the human interaction loop!
>>> Human-in-the-loop
Why do we introduce humans into the loop of generation? Because the last generated result becomes the "in-context" reference for the next edit.
On one hand, hallucination is a notoriously tricky problem for generative models and is extremely hard to avoid. On the other hand, some information, like jumping height, is simply not included in the text. An extra "click" or "typing" interaction is therefore necessary to correct or refine a result, which makes human-in-the-loop motion generation really promising.
The following figure shows how MotionCLR closes the loop of human-computer interaction.
>>> Understanding attention in motion generation
This might be the most technical part of the blog. Explainability has been a dark cloud over deep learning for years. MotionCLR revisits the technical designs of previous motion generation works and finds that their architectures do not expose a clear correspondence between text and motion. We therefore introduce MotionCLR, which models a clear correspondence between each motion frame and each word. We found that the activations of both self-attention and cross-attention model when an action should be executed. This finding motivated us to develop a bag of editing methods. I believe the explainability offered by MotionCLR will motivate more exploration in the coming years.
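To make the frame-to-word correspondence concrete, here is a minimal sketch of a cross-attention map between motion frames and text tokens. It is not the MotionCLR implementation: the function names, the shapes, and the random features are assumptions chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_map(frame_queries, word_keys):
    # Scaled dot-product scores: one row per motion frame, one column per word.
    dim = frame_queries.shape[-1]
    scores = frame_queries @ word_keys.T / np.sqrt(dim)
    return softmax(scores, axis=-1)

# Hypothetical sizes and random features (assumptions, not real model outputs).
rng = np.random.default_rng(0)
num_frames, num_words, dim = 8, 3, 16
frame_queries = rng.normal(size=(num_frames, dim))
word_keys = rng.normal(size=(num_words, dim))

attn = cross_attention_map(frame_queries, word_keys)

# For each word, the frame that attends to it most strongly.
peak_frame_per_word = attn.argmax(axis=0)
```

Each row of `attn` sums to 1 over the words, so reading a column over time gives a rough picture of when a word is active in the motion.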
To users: Tell the model what you want again and again!
In the animation industry, producing a desired motion has always involved cumbersome processes such as motion capture and hand-crafted animation. With the rapid development of the text2motion task, artists can even produce a motion in a whisper [Dai et al., 2024]. However, a single-turn generation might be unsatisfactory. MotionCLR gives you a bag of tools for versatile motion editing. I only give a few examples here.
Motion emphasizing or de-emphasizing. Suppose you first generate a motion with the prompt "a man jumps." and the jump is higher than you want. You can lower the weight of the word "jumps" to de-emphasize the action, or raise it to emphasize the action.
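One way to picture "changing the weight of a word" is as a shift applied to that word's column in a frame-by-word attention map. The sketch below assumes exactly that. The helper `reweight_word`, the toy attention values, and the clipping to [0, 1] are illustrative assumptions, not the MotionCLR API.

```python
import numpy as np

def reweight_word(attn, word_index, delta):
    # Shift one word's attention weight in every frame, clipped to [0, 1].
    # Hypothetical helper for illustration only.
    edited = attn.copy()
    edited[:, word_index] = np.clip(edited[:, word_index] + delta, 0.0, 1.0)
    return edited

# Toy attention map for the prompt "a man jumps." (rows: frames, columns: words).
attn = np.array([
    [0.2, 0.1, 0.7],
    [0.3, 0.1, 0.6],
    [0.6, 0.2, 0.2],
])
jumps_index = 2  # column of the word "jumps"

de_emphasized = reweight_word(attn, jumps_index, delta=-0.3)
emphasized = reweight_word(attn, jumps_index, delta=0.3)
```

Only the "jumps" column changes and the original map is left untouched, so the same generation can be re-edited with a different `delta` in the next turn.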
Motion generation with an example. In the loop of interactive motion generation, you can first generate a motion such as "kicking" and then generate many more motions similar to it (with the same motion texture). For example, the original motion might kick with the left foot, while a new motion kicks with the right one.
In-place motion replacement. Suppose you would like to generate several motions that contain different actions in the same time span. You can synthesize one motion first, then edit it by revising words in the text directly.
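If attention decides when an action happens, then one plausible reading of in-place replacement is to keep the attention map from the first generation and swap only the features of the revised word. The sketch below illustrates that idea under this assumption. The names, shapes, and random features are hypothetical and do not come from the MotionCLR code.

```python
import numpy as np

def attend(attn, word_values):
    # Mix word features into each frame using a fixed attention map.
    return attn @ word_values

rng = np.random.default_rng(0)
num_frames, num_words, dim = 6, 3, 4

# Attention map kept from the first generation (rows sum to 1).
attn = rng.random((num_frames, num_words))
attn /= attn.sum(axis=1, keepdims=True)

# Hypothetical word features for the original prompt, e.g. "a man walks".
values_original = rng.normal(size=(num_words, dim))

# Revised prompt, e.g. "a man runs": only the last word's features change.
values_edited = values_original.copy()
values_edited[2] = rng.normal(size=dim)

out_original = attend(attn, values_original)
out_edited = attend(attn, values_edited)

# The edit affects each frame in proportion to how much it attended to the
# replaced word, so the timing of the action is preserved.
difference = out_edited - out_original
```

Because `attn` is shared between the two calls, frames that barely attended to the replaced word stay almost unchanged.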
For more details, you can watch our interactive demo video.
If you would like to know more, please read our paper. \(^_^)/