arxiv:2411.16781

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

Published on Nov 25

Submitted by LiyiGang on Nov 28


Authors: Yiheng Li, Ruibing Hou, Xilin Chen

Abstract

Human pose plays a crucial role in the digital age. While recent works have achieved impressive progress in understanding and generating human poses, they often support only a single modality of control signals and operate in isolation, limiting their application in real-world scenarios. This paper presents UniPose, a framework employing Large Language Models (LLMs) to comprehend, generate, and edit human poses across various modalities, including images, text, and 3D SMPL poses. Specifically, we apply a pose tokenizer to convert 3D poses into discrete pose tokens, enabling seamless integration into the LLM within a unified vocabulary. To further enhance its fine-grained pose perception capabilities, we equip UniPose with a mixture of visual encoders, among them a pose-specific visual encoder. Benefiting from a unified learning strategy, UniPose effectively transfers knowledge across different pose-relevant tasks, adapts to unseen tasks, and exhibits extended capabilities. This work serves as the first attempt at building a general-purpose framework for pose comprehension, generation, and editing. Extensive experiments highlight UniPose's competitive and even superior performance across various pose-relevant tasks.
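The idea of turning a continuous pose into discrete tokens that share a vocabulary with text can be illustrated with a small sketch. The abstract does not describe how the pose tokenizer works internally, so the following assumes a generic nearest-codebook lookup (vector quantization), which is one common way to discretize continuous features and is not necessarily what UniPose uses. The names `quantize`, `to_unified_ids`, and `TEXT_VOCAB_SIZE`, the toy codebook, and the 2-D features are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical size of the LLM's text vocabulary (an assumption, not from the paper).
TEXT_VOCAB_SIZE = 32000


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each continuous feature vector to the index of its nearest codebook entry."""
    indices = []
    for feature in features:
        distances = [squared_distance(feature, code) for code in codebook]
        indices.append(distances.index(min(distances)))
    return indices


def to_unified_ids(pose_indices: Sequence[int],
                   text_vocab_size: int = TEXT_VOCAB_SIZE) -> List[int]:
    """Shift pose indices past the text vocabulary so both share one ID space."""
    return [text_vocab_size + i for i in pose_indices]


# Toy data: a 4-entry codebook and three 2-D "pose features".
# A real system would use learned codes and encoder outputs instead.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pose_features = [[0.9, 0.1], [0.1, 0.8], [0.2, 0.1]]

pose_indices = quantize(pose_features, codebook)
pose_token_ids = to_unified_ids(pose_indices)
print(pose_indices, pose_token_ids)
```

Offsetting the pose indices by the text vocabulary size is the simplest way to keep the two ID ranges disjoint, so a single embedding table and output layer can cover both text and pose tokens.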


Community

LiyiGang (paper author, paper submitter), 20 days ago

This paper introduces UniPose, a unified framework that uses an LLM to comprehend, generate, and edit human poses across diverse modalities (images, text, and 3D SMPL poses). UniPose employs a pose tokenizer to convert 3D poses into discrete tokens, enabling seamless integration into the LLM’s vocabulary. It also incorporates a mixture of visual encoders, including a pose-specific encoder, to enhance fine-grained pose perception. Through a unified learning strategy, UniPose transfers knowledge across pose-related tasks and adapts to unseen tasks. As the first general-purpose framework for pose comprehension, generation, and editing, UniPose performs competitively across various pose-related tasks.