Temporal Dynamic Graph LSTM for Action-driven Video Object Detection

08/02/2017
by   Yuan Yuan, et al.
0

In this paper, we investigate a weakly-supervised object detection framework. Most existing frameworks focus on using static images to learn object detectors. However, these detectors often fail to generalize to videos because of the existing domain shift. Therefore, we investigate learning these detectors directly from boring videos of daily activities. Instead of using bounding boxes, we explore the use of action descriptions as supervision since they are relatively easy to gather. A common issue, however, is that objects of interest that are not involved in human actions are often absent in global action descriptions known as "missing label". To tackle this problem, we propose a novel temporal dynamic graph Long Short-Term Memory network (TD-Graph LSTM). TD-Graph LSTM enables global temporal reasoning by constructing a dynamic graph that is based on temporal correlations of object proposals and spans the entire video. The missing label issue for each individual frame can thus be significantly alleviated by transferring knowledge across correlated objects proposals in the whole video. Extensive evaluations on a large-scale daily-life action dataset (i.e., Charades) demonstrates the superiority of our proposed method. We also release object bounding-box annotations for more than 5,000 frames in Charades. We believe this annotated data can also benefit other research on video-based object recognition in the future.

READ FULL TEXT

page 2

page 6

page 7

page 8

page 11

page 12

page 13

research
10/04/2018

Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors from Images

Deep learning based object detectors require thousands of diversified bo...
research
02/21/2017

Object Detection in Videos with Tubelet Proposal Networks

Object detection in videos has drawn increasing attention recently with ...
research
04/02/2019

Activity Driven Weakly Supervised Object Detection

Weakly supervised object detection aims at reducing the amount of superv...
research
03/06/2021

Learning from Counting: Leveraging Temporal Classification for Weakly Supervised Object Localization and Detection

This paper reports a new solution of leveraging temporal classification ...
research
02/02/2017

YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video

We introduce a new large-scale data set of video URLs with densely-sampl...
research
12/14/2021

Revisiting 3D Object Detection From an Egocentric Perspective

3D object detection is a key module for safety-critical robotics applica...
research
08/29/2019

Great Ape Detection in Challenging Jungle Camera Trap Footage via Attention-Based Spatial and Temporal Feature Blending

We propose the first multi-frame video object detection framework traine...

Please sign up or login with your details

Forgot password? Click here to reset