EDF: Ensemble, Distill, and Fuse for Easy Video Labeling

12/10/2018
by   Giulio Zhou, et al.
0

We present a way to rapidly bootstrap object detection on unseen videos using minimal human annotations. We accomplish this by combining two complementary sources of knowledge (one generic and the other specific) using bounding box merging and model distillation. The first (generic) knowledge source is obtained from ensembling pre-trained object detectors using a novel bounding box merging and confidence reweighting scheme. We make the observation that model distillation with data augmentation can train a specialized detector that outperforms the noisy labels it was trained on, and train a Student Network on the ensemble detections that obtains higher mAP than the ensemble itself. The second (specialized) knowledge source comes from training a detector (which we call the Supervised Labeler) on a labeled subset of the video to generate detections on the unlabeled portion. We demonstrate on two popular vehicular datasets that these techniques work to emit bounding boxes for all vehicles in the frame with higher mean average precision (mAP) than any of the reference networks used, and that the combination of ensembled and human-labeled data produces object detections that outperform either alone.

READ FULL TEXT

page 6

page 9

research
11/18/2021

Towards Open Vocabulary Object Detection without Human-provided Bounding Boxes

Despite great progress in object detection, most existing methods are li...
research
12/01/2018

NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-Supervised Object Detection

The labeling cost of large number of bounding boxes is one of the main c...
research
11/26/2020

SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervision and Dynamic Self-Training

Although a polygon is a more accurate representation than an upright bou...
research
04/30/2021

Unsupervised data augmentation for object detection

Data augmentation has always been an effective way to overcome overfitti...
research
05/07/2021

Probabilistic Ranking-Aware Ensembles for Enhanced Object Detections

Model ensembles are becoming one of the most effective approaches for im...
research
12/10/2019

FootAndBall: Integrated player and ball detector

The paper describes a deep neural network-based detector dedicated for b...
research
02/02/2017

YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video

We introduce a new large-scale data set of video URLs with densely-sampl...

Please sign up or login with your details

Forgot password? Click here to reset