TAO: A Large-Scale Benchmark for Tracking Any Object

05/20/2020
by Achal Dave, et al.

For many years, multi-object tracking benchmarks have focused on a handful of categories. Motivated primarily by surveillance and self-driving applications, these datasets provide tracks for people, vehicles, and animals, ignoring the vast majority of objects in the world. By contrast, in the related field of object detection, the introduction of large-scale, diverse datasets (e.g., COCO) has fostered significant progress in developing highly robust solutions. To bridge this gap, we introduce a similarly diverse dataset for Tracking Any Object (TAO). It consists of 2,907 high-resolution videos, captured in diverse environments, which are half a minute long on average. Importantly, we adopt a bottom-up approach for discovering a large vocabulary of 833 categories, an order of magnitude more than prior tracking benchmarks. To this end, we ask annotators to label objects that move at any point in the video, and to name them post factum. Our vocabulary is both significantly larger than and qualitatively different from those of existing tracking datasets. To ensure scalability of annotation, we employ a federated approach that focuses manual effort on labeling tracks for the relevant objects in a video (e.g., those that move). We perform an extensive evaluation of state-of-the-art trackers and make a number of important discoveries regarding large-vocabulary tracking in an open world. In particular, we show that existing single- and multi-object trackers struggle when applied to this scenario in the wild, and that detection-based, multi-object trackers are in fact competitive with user-initialized ones. We hope that our dataset and analysis will boost further progress in the tracking community.

research
04/30/2022

AnimalTrack: A Large-scale Benchmark for Multi-Animal Tracking in the Wild

Multi-animal tracking (MAT), a multi-object tracking (MOT) problem, is c...
research
12/20/2022

Bridging Images and Videos: A Simple Learning Framework for Large Vocabulary Video Object Detection

Scaling object taxonomies is one of the important steps toward a robust ...
research
12/15/2021

Reliable Multi-Object Tracking in the Presence of Unreliable Detections

Recent multi-object tracking (MOT) systems have leveraged highly accurat...
research
01/18/2023

OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation

Recent advances in modeling 3D objects mostly rely on synthetic datasets...
research
08/07/2022

Robust Multi-Object Tracking by Marginal Inference

Multi-object tracking in videos requires to solve a fundamental problem ...
research
12/15/2022

Objaverse: A Universe of Annotated 3D Objects

Massive data corpora like WebText, Wikipedia, Conceptual Captions, WebIm...
research
08/04/2022

SOMPT22: A Surveillance Oriented Multi-Pedestrian Tracking Dataset

Multi-object tracking (MOT) has been dominated by the use of track by de...
