Covariance of Motion and Appearance Features for Spatio Temporal Recognition Tasks

by Subhabrata Bhattacharya, et al.

In this paper, we introduce an end-to-end framework for video analysis that targets practical scenarios and builds on theoretical foundations from sparse representation, including a novel descriptor for general-purpose video analysis. In our approach, we compute kinematic features from optical flow, and first- and second-order derivatives of intensities, to represent motion and appearance respectively. These features are then used to construct covariance matrices that capture the joint statistics of the low-level motion and appearance features extracted from a video. Using an over-complete dictionary of covariance-based descriptors built from labeled training samples, we formulate low-level event recognition as a sparse linear approximation problem. Within this formulation, we pose the sparse decomposition of a covariance matrix, which also conforms to the space of positive semi-definite matrices, as a determinant maximization problem. Because covariance matrices lie on non-linear Riemannian manifolds, we also compare this approach with a sparse linear approximation alternative that is suitable for equivalent vector-space representations of covariance matrices. This alternative searches for the best projection of the query data onto a dictionary using the Orthogonal Matching Pursuit algorithm. We show the applicability of our video descriptor in two application domains: low-level event recognition in unconstrained scenarios and gesture recognition using one-shot learning. Our experiments provide promising insights for large-scale video analysis.
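The pipeline described in the abstract (feature covariance, a vector-space equivalent of the covariance matrix, and Orthogonal Matching Pursuit against a dictionary) can be sketched roughly as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: the function names, the small ridge term `eps`, the matrix-logarithm mapping with sqrt(2) scaling of off-diagonal entries, and the requirement of unit-norm dictionary columns are all choices made here for the sake of the example.

```python
import numpy as np


def covariance_descriptor(features, eps=1e-6):
    """Covariance of per-sample feature vectors; features has shape (n, d).

    The ridge term eps is an assumption of this sketch. It keeps the
    matrix strictly positive definite so its logarithm is defined.
    """
    cov = np.cov(features, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])


def log_euclidean_vector(cov):
    """Map an SPD matrix to a Euclidean vector via the matrix logarithm.

    Assumed mapping: off-diagonal entries of the upper triangle are
    scaled by sqrt(2), so the vector norm equals the Frobenius norm of
    the log-matrix.
    """
    w, v = np.linalg.eigh(cov)
    log_cov = (v * np.log(w)) @ v.T
    rows, cols = np.triu_indices(cov.shape[0])
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return log_cov[rows, cols] * scale


def omp(dictionary, query, n_nonzero):
    """Orthogonal Matching Pursuit.

    dictionary: (m, k) array whose columns are unit-norm atoms.
    query: (m,) vector to approximate with at most n_nonzero atoms.
    """
    residual = query.astype(float).copy()
    support = []
    coef = np.zeros(dictionary.shape[1])
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(dictionary.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(dictionary[:, support], query, rcond=None)
        coef = np.zeros(dictionary.shape[1])
        coef[support] = sol
        residual = query - dictionary[:, support] @ sol
    return coef
```

In use, each training clip would contribute one vectorized descriptor as a dictionary column, and a query clip's sparse coefficients would indicate which training samples best explain it. How those coefficients are turned into a class label is not specified in the abstract and is left out of this sketch.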




Log-Euclidean Bag of Words for Human Action Recognition

Representing videos by densely extracted local space-time features has r...

Hierarchical sparse Cholesky decomposition with applications to high-dimensional spatio-temporal filtering

Spatial statistics often involves Cholesky decomposition of covariance m...

DeepKSPD: Learning Kernel-matrix-based SPD Representation for Fine-grained Image Recognition

Being symmetric positive-definite (SPD), covariance matrix has tradition...

Extrinsic Methods for Coding and Dictionary Learning on Grassmann Manifolds

Sparsity-based representations have recently led to notable results in v...

One-Shot-Learning Gesture Recognition using HOG-HOF Features

The purpose of this paper is to describe one-shot-learning gesture recog...

Simultaneous diagonalisation of the covariance and complementary covariance matrices in quaternion widely linear signal processing

Recent developments in quaternion-valued widely linear processing have e...

Characterizing Human Behaviours Using Statistical Motion Descriptor

Identifying human behaviors is a challenging research problem due to the...
