Staged Contact-Aware Global Human Motion Forecasting

by Luca Scofano, et al.

Scene-aware global human motion forecasting is critical for many applications, including virtual reality, robotics, and sports. The task combines human trajectory and pose forecasting within a given scene context, which makes it a significant challenge. So far, only Mao et al. (NeurIPS'22) have addressed scene-aware global motion, cascading the prediction of future scene contact points and the estimation of global motion. They perform the latter as end-to-end forecasting of future trajectories and poses. However, end-to-end forecasting is at odds with the coarse-to-fine nature of the task and results in lower performance, as we demonstrate empirically. We propose STAGed contact-aware global human motion forecasting (STAG), a novel three-stage pipeline for predicting global human motion in a 3D environment. First, we model the scene and the human's interaction with it as contact points. Second, we forecast the human trajectory within the scene, predicting the coarse motion of the human body as a whole. Third, we match a plausible fine-grained human joint motion to the trajectory, taking the estimated contacts into account. Compared to the state-of-the-art (SoA), STAG achieves a 1.8 overall improvement in pose and trajectory prediction, respectively, on the scene-aware GTA-IM dataset. A comprehensive ablation study confirms the advantages of staged modeling over end-to-end approaches. Furthermore, we establish the significance of a newly proposed temporal counter called the "time-to-go", which indicates how much time remains before reaching scene contacts and endpoints. Notably, STAG generalizes to datasets lacking a scene and achieves new state-of-the-art performance on CMU-Mocap, without leveraging any social cues. Our code is released at:
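To make the "time-to-go" idea concrete, here is a minimal sketch of one way such a counter could be computed. It is not the authors' released code. The function name `time_to_go`, the use of frame indices as the time unit, and the choice to treat the last frame as the endpoint are all assumptions made for illustration; the abstract only says the counter tells how long it is before reaching scene contacts and endpoints.

```python
from bisect import bisect_left


def time_to_go(num_frames, contact_frames):
    """Hypothetical per-frame countdown to the next contact frame or the endpoint.

    num_frames: length of the forecast horizon, in frames (assumed unit).
    contact_frames: frame indices at which a scene contact is expected (assumed input).
    Returns a list where entry t is the number of frames from t to the next target.
    """
    # Assumption: the final frame always counts as a target (the "endpoint").
    targets = sorted(set(contact_frames) | {num_frames - 1})
    counters = []
    for t in range(num_frames):
        # First target at or after frame t; one always exists because
        # the last frame is included in the targets.
        next_target = targets[bisect_left(targets, t)]
        counters.append(next_target - t)
    return counters


print(time_to_go(6, [2]))  # [2, 1, 0, 2, 1, 0]
```

In this sketch the counter resets after each contact and then counts down again toward the next one, so a model conditioned on it would know at every frame how far away the next contact or the end of the sequence is.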




