A Discriminative CNN Video Representation for Event Detection

11/14/2014
by   Zhongwen Xu, et al.
0

In this paper, we propose a discriminative video representation for event detection over a large scale video dataset when only limited hardware resources are available. The focus of this paper is to effectively leverage deep Convolutional Neural Networks (CNNs) to advance event detection, where only frame level static descriptors can be extracted by the existing CNN toolkit. This paper makes two contributions to the inference of CNN video representation. First, while average pooling and max pooling have long been the standard approaches to aggregating frame level static features, we show that performance can be significantly improved by taking advantage of an appropriate encoding method. Second, we propose using a set of latent concept descriptors as the frame descriptor, which enriches visual information while keeping it computationally affordable. The integration of the two contributions results in a new state-of-the-art performance in event detection over the largest video datasets. Compared to improved Dense Trajectories, which has been recognized as the best video representation for event detection, our new representation improves the Mean Average Precision (mAP) from 27.6 MEDTest 14 dataset and from 34.0 This work is the core part of the winning solution of our CMU-Informedia team in TRECVID MED 2014 competition.

READ FULL TEXT

page 4

page 6

page 8

research
05/28/2014

Detection Bank: An Object Detection Based Video Representation for Multimedia Event Recognition

While low-level image features have proven to be effective representatio...
research
03/07/2016

A novel learning-based frame pooling method for Event Detection

Detecting complex events in a large video collection crawled from video ...
research
09/13/2015

Vectors of Locally Aggregated Centers for Compact Video Representation

We propose a novel vector aggregation technique for compact video repres...
research
06/02/2019

Hierarchical Video Frame Sequence Representation with Deep Convolutional Graph Network

High accuracy video label prediction (classification) models are attribu...
research
05/17/2022

A CLIP-Hitchhiker's Guide to Long Video Retrieval

Our goal in this paper is the adaptation of image-text models for long v...
research
10/10/2015

TagBook: A Semantic Video Representation without Supervision for Event Detection

We consider the problem of event detection in video for scenarios where ...
research
11/01/2021

AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling

Pooling layers are essential building blocks of Convolutional Neural Net...

Please sign up or login with your details

Forgot password? Click here to reset