Learning without Prejudice: Avoiding Bias in Webly-Supervised Action Recognition

by Christian Rupprecht, et al.

Webly-supervised learning has recently emerged as an alternative paradigm to traditional supervised learning based on large-scale datasets with manual annotations. The key idea is that models such as CNNs can be learned from the noisy visual data available on the web. In this work we aim to exploit web data for video understanding tasks such as action recognition and detection. One of the main problems in webly-supervised learning is cleaning the noisily labeled data collected from the web. The state-of-the-art paradigm trains a first classifier on the noisy data and then uses it to clean the remaining dataset. Our key insight is that this procedure biases the second classifier towards samples that the first one already understands. Here we train two independent CNNs: an RGB network on web images and video frames, and a second network on temporal information from optical flow. We show that training the networks independently is vastly superior to using our RGB network to select the frames for the flow classifier. Moreover, we show the benefit of enriching the training set with data from heterogeneous public web databases. We demonstrate that our framework outperforms all other webly-supervised methods on two public benchmarks, UCF-101 and Thumos'14.
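The two data-selection strategies contrasted above can be sketched as follows. This is a minimal Python illustration, not the authors' code: it models only the selection step, not CNN training. The names `Sample`, `cascaded_selection`, `independent_selection`, the toy RGB classifier, and the 0.5 confidence threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    video_id: str
    web_label: str  # noisy label inherited from the web search query


# Hypothetical classifier interface: returns (predicted_label, confidence).
Classifier = Callable[[Sample], Tuple[str, float]]


def cascaded_selection(samples: List[Sample], rgb_classifier: Classifier,
                       threshold: float = 0.5) -> List[Sample]:
    """Cleaning paradigm: keep only samples the first (RGB) classifier agrees with.

    The flow network trained on the result only sees videos the RGB
    network already recognises, which is the bias the abstract describes.
    The 0.5 threshold is an assumed value for illustration.
    """
    kept = []
    for s in samples:
        pred, conf = rgb_classifier(s)
        if pred == s.web_label and conf >= threshold:
            kept.append(s)
    return kept


def independent_selection(samples: List[Sample]) -> List[Sample]:
    """Independent training: the flow network sees every web-labelled sample."""
    return list(samples)


# Toy data: four web videos with noisy labels (hypothetical).
samples = [Sample("v1", "jump"), Sample("v2", "jump"),
           Sample("v3", "swim"), Sample("v4", "swim")]


def toy_rgb(s: Sample) -> Tuple[str, float]:
    # Hypothetical RGB classifier that is confident only on v1 and v3.
    known = {"v1": ("jump", 0.9), "v3": ("swim", 0.8)}
    return known.get(s.video_id, ("unknown", 0.1))


cascaded = cascaded_selection(samples, toy_rgb)
independent = independent_selection(samples)
print([s.video_id for s in cascaded])     # ['v1', 'v3']
print([s.video_id for s in independent])  # ['v1', 'v2', 'v3', 'v4']
```

In this toy setting the cascaded strategy discards v2 and v4 even though their web labels may be correct, so any motion cues unique to those videos never reach the flow network; the independent strategy keeps them and leaves noise handling to each network separately.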




Do Less and Achieve More: Training CNNs for Action Recognition Utilizing Action Images from the Web

Recently, attempts have been made to collect millions of videos to train...

Ordered Pooling of Optical Flow Sequences for Action Recognition

Training of Convolutional Neural Networks (CNNs) on long video sequences...

Memory-augmented Dense Predictive Coding for Video Representation Learning

The objective of this paper is self-supervised learning from video, in p...

Is Appearance Free Action Recognition Possible?

Intuition might suggest that motion and dynamic information are key to v...

Learning to Learn from Noisy Web Videos

Understanding the simultaneously very diverse and intricately fine-grain...

Weakly-Supervised Action Detection Guided by Audio Narration

Videos are more well-organized curated data sources for visual concept l...

Webly Supervised Learning of Convolutional Networks

We present an approach to utilize large amounts of web data for learning...
