Priority-Centric Human Motion Generation in Discrete Latent Space

by Hanyang Kong, et al.

Text-to-motion generation is a formidable task, aiming to produce human motions that align with the input text while also adhering to human capabilities and physical laws. While there have been advancements in diffusion models, their application in discrete spaces remains underexplored. Current methods often overlook the varying significance of different motions, treating them uniformly. It is essential to recognize that not all motions hold the same relevance to a particular textual description. Some motions, being more salient and informative, should be given precedence during generation. In response, we introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM), which utilizes a Transformer-based VQ-VAE to derive a concise, discrete motion representation, incorporating a global self-attention mechanism and a regularization term to counteract code collapse. We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token within the entire motion sequence. This approach retains the most salient motions during the reverse diffusion process, leading to more semantically rich and varied motions. Additionally, we formulate two strategies to gauge the importance of motion tokens, drawing from both textual and visual indicators. Comprehensive experiments on the HumanML3D and KIT-ML datasets confirm that our model surpasses existing techniques in fidelity and diversity, particularly for intricate textual descriptions.
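The core idea of the priority-centric noise schedule can be sketched as follows. This is an illustrative reconstruction, not the paper's exact formulation: given a per-token importance score (which the paper derives from textual and visual indicators), less important tokens are corrupted earlier in the forward diffusion, so the most salient tokens survive longest and are the first to be fixed in the reverse process. The function names and the even spacing of masking events over steps are assumptions for this sketch.

```python
import numpy as np

def priority_noise_schedule(importance, num_steps):
    """Assign each motion token the forward-diffusion step at which it is masked.

    Illustrative sketch: tokens with lower importance are corrupted earlier,
    so the most salient tokens are retained longest and recovered first in
    the reverse diffusion process. The even spacing of masking events over
    the steps is an assumption, not the paper's exact schedule.
    """
    importance = np.asarray(importance, dtype=float)
    order = np.argsort(importance)            # least important token first
    num_tokens = len(importance)
    # Spread the masking events evenly across the diffusion steps.
    steps = np.floor(np.linspace(0, num_steps - 1, num_tokens)).astype(int)
    mask_step = np.empty(num_tokens, dtype=int)
    mask_step[order] = steps                  # least important -> earliest step
    return mask_step

def forward_mask(tokens, mask_step, t, mask_id=-1):
    """Corrupt (replace with a mask id) every token scheduled at or before step t."""
    tokens = np.asarray(tokens).copy()
    tokens[mask_step <= t] = mask_id
    return tokens
```

For example, with importance scores `[0.9, 0.1, 0.5, 0.3]` over four steps, the token with score 0.1 is masked at step 0 and the token with score 0.9 only at the final step, so the reverse process resolves the most informative motion tokens first.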

