Knowledge Concentration: Learning 100K Object Classifiers in a Single CNN

by Jiyang Gao, et al.

Fine-grained image labels are desirable for many computer vision applications, such as visual search or mobile AI assistants. These applications rely on image classification models that can produce hundreds of thousands (e.g. 100K) of diversified fine-grained image labels on input images. However, training a network at this vocabulary scale is challenging: it suffers from an intolerably large model size and slow training speed, which lead to unsatisfactory classification performance. A straightforward solution would be to train separate expert networks (specialists), with each specialist focusing on learning one specific vertical (e.g. cars, birds). However, deploying dozens of expert networks in a practical system would significantly increase system complexity and inference latency, and consume large amounts of computational resources. To address these challenges, we propose a Knowledge Concentration method, which effectively transfers the knowledge from dozens of specialists (multiple teacher networks) into a single model (one student network) to classify 100K object categories. There are three salient aspects of our method: (1) a multi-teacher single-student knowledge distillation framework; (2) a self-paced learning mechanism that allows the student to learn from different teachers at various paces; (3) structurally connected layers that expand the student network capacity with limited extra parameters. We validate our method on OpenImage and a newly collected dataset, Entity-Foto-Tree (EFT), with 100K categories, and show that the proposed model performs significantly better than the baseline generalist model.
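
The multi-teacher single-student idea can be illustrated with a small sketch. The abstract does not give the loss function, so everything below is an assumption for illustration: each specialist is assumed to own a contiguous slice of the student's output vocabulary, and the student is trained to match the specialist's temperature-softened probabilities within that slice using a standard soft-target cross-entropy. The names `softmax`, `distillation_loss`, `teacher_slices`, and the toy logits are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, class_slice, temperature=2.0):
    """Soft-target cross-entropy between one teacher and the student.

    Assumption: the teacher covers only the classes in class_slice
    (start, stop), so the student's logits are restricted to that slice
    before both distributions are compared.
    """
    start, stop = class_slice
    student_probs = softmax(student_logits[start:stop], temperature)
    teacher_probs = softmax(teacher_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


# Hypothetical toy setup: two specialists sharing a 5-class student vocabulary.
teacher_slices = {"cars": (0, 3), "birds": (3, 5)}

student_logits = [2.0, 0.5, -1.0, 0.1, 0.3]   # student output over all 5 classes
cars_teacher_logits = [3.0, 0.0, -2.0]        # "cars" specialist output over its 3 classes

loss = distillation_loss(student_logits, cars_teacher_logits, teacher_slices["cars"])
print(f"distillation loss against the cars specialist: {loss:.4f}")
```

In this sketch an image from the "cars" vertical only contributes a loss term against the "cars" specialist. How the paper actually combines teachers, and how the self-paced mechanism weights them, is not specified in the abstract and is left out here.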


