Semantic tracking: Single-target tracking with inter-supervised convolutional networks

by Jingjing Xiao, et al.

This article presents a semantic tracker that simultaneously tracks a single target and recognises its category. In general, it is hard to design a tracking model suitable for all object categories, e.g., a rigid tracker for a car is not suitable for a deformable gymnast. Category-based trackers usually achieve superior tracking performance on objects of their specific category, but are difficult to generalise. Therefore, we propose a novel unified robust tracking framework which explicitly encodes both generic features and category-based features. The tracker consists of a shared convolutional network (NetS), which feeds into two parallel networks, NetC for classification and NetT for tracking. NetS is pre-trained on ImageNet to serve as a generic feature extractor across the different object categories for NetC and NetT. NetC utilises those features within fully connected layers to classify the object category. NetT has multiple branches, corresponding to multiple categories, to distinguish the tracked object from the background. Since each branch in NetT is trained on videos of a specific category or a group of similar categories, NetT encodes category-based features for tracking. During online tracking, NetC and NetT jointly determine the target regions with the right category and foreground labels for target estimation. To improve robustness and precision, NetC and NetT inter-supervise each other and trigger network adaptation when their outputs are ambiguous for the same image regions (i.e., when the category label contradicts the foreground/background classification). We have compared the performance of our tracker to other state-of-the-art trackers on a large-scale tracking benchmark (100 sequences). The results demonstrate the effectiveness of the proposed tracker, which outperformed 38 other state-of-the-art tracking algorithms.
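The inter-supervision rule described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the `Candidate` fields, the `fg_threshold` value, the one-branch-per-category simplification, and the box-averaging step used for target estimation are all assumptions made for illustration. The sketch only shows how candidate regions could be split into those where the NetC and NetT outputs agree on "right category and foreground" and those where the two outputs contradict each other, which is the case the abstract says triggers network adaptation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Candidate:
    """One candidate image region with hypothetical network outputs."""
    box: Box
    category_probs: Sequence[float]     # assumed NetC output, one value per category
    foreground_scores: Sequence[float]  # assumed NetT output, one value per branch


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def partition_candidates(
    candidates: List[Candidate],
    target_category: int,
    fg_threshold: float = 0.5,  # assumed threshold, not taken from the paper
) -> Tuple[List[Candidate], List[Candidate]]:
    """Split candidates into (trusted, ambiguous).

    trusted:   category label is right AND the matching branch says foreground.
    ambiguous: exactly one of the two holds, i.e. the outputs contradict.
    Candidates where neither holds are treated as plain background.
    """
    trusted: List[Candidate] = []
    ambiguous: List[Candidate] = []
    for cand in candidates:
        right_category = argmax(cand.category_probs) == target_category
        is_foreground = cand.foreground_scores[target_category] >= fg_threshold
        if right_category and is_foreground:
            trusted.append(cand)
        elif right_category != is_foreground:
            ambiguous.append(cand)
    return trusted, ambiguous


def estimate_target(trusted: List[Candidate]) -> Optional[Box]:
    """Average the trusted boxes (an assumed, simplistic estimator)."""
    if not trusted:
        return None
    n = len(trusted)
    return tuple(sum(c.box[i] for c in trusted) / n for i in range(4))


def needs_adaptation(ambiguous: List[Candidate]) -> bool:
    """Signal that the networks disagree on at least one region."""
    return len(ambiguous) > 0
```

In a full tracker, the ambiguous regions would presumably serve as training samples when the networks are adapted; how that update is performed is not specified in the abstract and is left out here.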




