Action Anticipation for Collaborative Environments: The Impact of Contextual Information and Uncertainty-Based Prediction

To interact effectively with humans in collaborative environments, machines need to be able to anticipate future events in order to execute actions in a timely manner. However, observing the movements of human limbs may not be sufficient to anticipate actions unambiguously. In this work, we consider two additional sources of information (i.e., context) over time, gaze movements and object information, and study how these contextual cues improve action anticipation performance. We address action anticipation as a classification task, where the model takes the available information as input and predicts the most likely action. We propose to use the uncertainty about each prediction as an online decision-making criterion for action anticipation. Uncertainty is modeled as a stochastic process applied to a time-based neural network architecture, which improves upon the conventional class-likelihood (i.e., deterministic) criterion. The main contributions of this paper are three-fold: (i) we propose a deep architecture that outperforms previous results on the action anticipation task; (ii) we show that contextual information is important to disambiguate the interpretation of similar actions; and (iii) we propose the minimization of uncertainty as a more effective criterion for action anticipation than the maximization of class probability. Our results on the Acticipate dataset show the importance of contextual information and the uncertainty criterion for action anticipation. We achieve an average accuracy of 98.75% on the anticipation task using, on average, only 25% of the observations. Moreover, considering that a good anticipation model should also perform well on the action recognition task, we achieve an average accuracy of 100% for action recognition on the Acticipate dataset when the entire observation set is used.
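The abstract contrasts two online decision criteria: committing to a prediction when the class likelihood is high enough, versus committing when the predictive uncertainty is low enough. The following is a minimal sketch of that idea, not the paper's implementation: it assumes a set of stochastic forward passes (e.g., from Monte Carlo dropout) over the same partial observation, averages them into a predictive distribution, and compares a max-probability criterion against an entropy-based uncertainty criterion. The function name, thresholds, and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def anticipate(stochastic_logits, entropy_threshold=0.5, prob_threshold=0.9):
    """Decide whether to commit to an action prediction at the current step.

    stochastic_logits: (T, C) array of logits from T stochastic forward
    passes (e.g., MC dropout) over the same partial observation.
    Returns the predicted class and whether each criterion would commit.
    """
    probs = softmax(stochastic_logits)      # (T, C) per-pass probabilities
    mean_probs = probs.mean(axis=0)         # predictive distribution over C classes
    # Criterion 1: conventional class likelihood (deterministic).
    likelihood_ok = mean_probs.max() >= prob_threshold
    # Criterion 2: predictive entropy as an uncertainty measure;
    # commit only when entropy is low (the model is "sure").
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12))
    uncertainty_ok = entropy <= entropy_threshold
    return int(mean_probs.argmax()), likelihood_ok, uncertainty_ok
```

In an online setting, such a function would be called at every new time step of a partially observed action, and the model commits to (anticipates) the action as soon as the chosen criterion fires.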


