Bag of Instances Aggregation Boosts Self-supervised Learning

by Haohang Xu, et al.

Self-supervised learning has made remarkable progress recently, especially through contrastive learning methods, which treat each image and its augmentations as an individual class and try to distinguish them from all other images. However, because of the large number of exemplars, this kind of pretext task intrinsically suffers from slow convergence and is hard to optimize. This is especially true for small-scale models, whose performance we find drops dramatically compared with that of their supervised counterparts. In this paper, we propose a simple but effective distillation strategy for unsupervised learning. The key insight is that the relationship among similar samples matters and can be seamlessly transferred to the student to boost its performance. Our method, termed BINGO, short for Bag of InstaNces aGgregatiOn, aims to transfer the relationship learned by the teacher to the student. Here, a bag of instances is a set of similar samples that the teacher identifies and groups together, and the goal of distillation is for the student to aggregate compact representations over the instances in a bag. Notably, BINGO achieves new state-of-the-art performance on small-scale models, i.e., 65.5 accuracies with linear evaluation on ImageNet, using ResNet-18 and ResNet-34 as backbone, respectively, surpassing baselines (52.5 by a significant margin. The code will be available at <>.
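The bag idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the nearest-neighbour bagging rule, the InfoNCE-style loss, the temperature value, and the function names `build_bags` and `bag_aggregation_loss` are all assumptions chosen for illustration. The sketch groups samples by teacher similarity, then scores how tightly a student embeds the members of each bag.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_bags(teacher_feats, k):
    """Assumed bagging rule: each anchor plus its k nearest neighbours
    under the teacher's cosine similarity."""
    feats = [l2_normalize(f) for f in teacher_feats]
    bags = []
    for i, anchor in enumerate(feats):
        others = [j for j in range(len(feats)) if j != i]
        others.sort(key=lambda j: dot(anchor, feats[j]), reverse=True)
        bags.append([i] + others[:k])
    return bags


def bag_aggregation_loss(student_feats, bags, temperature=0.2):
    """Assumed loss: InfoNCE-style, with every other member of the anchor's
    bag treated as a positive and all remaining samples as negatives."""
    feats = [l2_normalize(f) for f in student_feats]
    n = len(feats)
    total, count = 0.0, 0
    for i, bag in enumerate(bags):
        logits = {j: dot(feats[i], feats[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        for j in bag:
            if j == i:
                continue
            total += log_denom - logits[j]  # negative log-probability of the positive
            count += 1
    return total / count


# Toy data (hypothetical): two clusters of two samples each.
teacher = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
bags = build_bags(teacher, k=1)

# A student that mirrors the teacher's grouping vs. one that splits each bag.
aligned_student = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
misaligned_student = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

loss_aligned = bag_aggregation_loss(aligned_student, bags)
loss_misaligned = bag_aggregation_loss(misaligned_student, bags)
```

In this toy setup the student whose embeddings keep bag members close gets the lower loss, which is the behaviour the distillation objective is meant to reward.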


CompRess: Self-Supervised Learning by Compressing Representations

Self-supervised learning aims to learn good representations with unlabel...

Simple Distillation Baselines for Improving Small Self-supervised Models

While large self-supervised models have rivalled the performance of thei...

DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

While self-supervised representation learning (SSL) has received widespr...

ISD: Self-Supervised Learning by Iterative Similarity Distillation

Recently, contrastive learning has achieved great results in self-superv...

Bi-directional Weakly Supervised Knowledge Distillation for Whole Slide Image Classification

Computer-aided pathology diagnosis based on the classification of Whole ...

Online Bag-of-Visual-Words Generation for Unsupervised Representation Learning

Learning image representations without human supervision is an important...

Modulate Your Spectrum in Self-Supervised Learning

Whitening loss provides theoretical guarantee in avoiding feature collap...
