ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search

05/21/2021
by Lumin Xu, et al.

Human pose estimation has achieved significant progress in recent years. However, most recent methods focus on improving accuracy with complicated models while ignoring real-time efficiency. To achieve a better trade-off between accuracy and efficiency, we propose a novel neural architecture search (NAS) method, termed ViPNAS, that searches networks at both the spatial and temporal levels for fast online video pose estimation. At the spatial level, we carefully design the search space over five dimensions: network depth, width, kernel size, group number, and attention modules. At the temporal level, we search over a series of temporal feature fusions to optimize overall accuracy and speed across multiple video frames. To the best of our knowledge, we are the first to search for temporal feature fusion and automatic computation allocation in videos. Extensive experiments demonstrate the effectiveness of our approach on the challenging COCO2017 and PoseTrack2018 datasets. Our discovered models, S-ViPNAS and T-ViPNAS, achieve significantly higher inference speed (real-time on CPU) without sacrificing accuracy compared to previous state-of-the-art methods.
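
Since the abstract names five spatial search dimensions, the following is a minimal, purely illustrative Python sketch of what a per-stage search space and a randomly sampled candidate could look like. All names, value ranges, the number of stages, and the optional SE-attention choice are assumptions made for illustration; they are not the paper's actual search space or code.

    # Illustrative sketch only: a hypothetical per-stage search space covering
    # the five dimensions named in the abstract (depth, width, kernel size,
    # group number, attention). Values and names are assumptions, not ViPNAS's
    # actual configuration.
    import random

    SPATIAL_SEARCH_SPACE = {
        "depth":       [2, 3, 4],          # number of blocks in the stage
        "width":       [16, 32, 48, 64],   # output channels of the stage
        "kernel_size": [3, 5, 7],          # convolution kernel size
        "groups":      [1, 2, 4],          # group-convolution factor
        "attention":   [None, "se"],       # optional attention module
    }

    def sample_spatial_arch(num_stages=4, rng=random):
        """Randomly sample one candidate architecture from the search space."""
        return [
            {dim: rng.choice(choices)
             for dim, choices in SPATIAL_SEARCH_SPACE.items()}
            for _ in range(num_stages)
        ]

    if __name__ == "__main__":
        for i, stage in enumerate(sample_spatial_arch()):
            print(f"stage {i}: {stage}")

In an actual NAS pipeline, candidates sampled this way would be evaluated (or weight-shared within a supernet) and selected under a joint accuracy/latency objective; the sketch only shows the shape of the search space, not the search algorithm.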

research | 12/13/2020
EfficientPose: Efficient Human Pose Estimation with Neural Architecture Search
Human pose estimation from image and video is a vital task in many multi...

research | 11/02/2020
PV-NAS: Practical Neural Architecture Search for Video Recognition
Recently, deep learning has been utilized to solve video recognition pro...

research | 09/16/2019
Pose Neural Fabrics Search
Neural Architecture Search (NAS) technologies have been successfully per...

research | 03/09/2019
Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search
Achieving good speed and accuracy trade-off on target platform is very i...

research | 03/30/2021
Differentiable Network Adaption with Elastic Search Space
In this paper we propose a novel network adaption method called Differen...

research | 08/30/2021
Searching for Two-Stream Models in Multivariate Space for Video Recognition
Conventional video models rely on a single stream to capture the complex...

research | 07/31/2020
DynaMiTe: A Dynamic Local Motion Model with Temporal Constraints for Robust Real-Time Feature Matching
Feature based visual odometry and SLAM methods require accurate and fast...