Understanding hand-object manipulation by modeling the contextual relationship between actions, grasp types and object attributes
This paper proposes a novel method for understanding daily hand-object manipulation by developing computer vision-based techniques. Specifically, we focus on recognizing hand grasp types, object attributes and manipulation actions within an unified framework by exploring their contextual relationships. Our hypothesis is that it is necessary to jointly model hands, objects and actions in order to accurately recognize multiple tasks that are correlated to each other in hand-object manipulation. In the proposed model, we explore various semantic relationships between actions, grasp types and object attributes, and show how the context can be used to boost the recognition of each component. We also explore the spatial relationship between the hand and object in order to detect the manipulated object from hand in cluttered environment. Experiment results on all three recognition tasks show that our proposed method outperforms traditional appearance-based methods which are not designed to take into account contextual relationships involved in hand-object manipulation. The visualization and generalizability study of the learned context further supports our hypothesis.
READ FULL TEXT