Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning

by Matthew Gwilliam, et al.

By leveraging contrastive learning, clustering, and other pretext tasks, unsupervised methods for learning image representations have reached impressive results on standard benchmarks. The result is a crowded field: many methods with substantially different implementations yield results that seem nearly identical on popular benchmarks, such as linear evaluation on ImageNet. However, a single result does not tell the whole story. In this paper, we compare methods using performance-based benchmarks such as linear evaluation, nearest neighbor classification, and clustering, across several different datasets, demonstrating the lack of a clear front-runner within the current state-of-the-art. In contrast to prior work, which performs only supervised vs. unsupervised comparisons, we compare several different unsupervised methods against each other. To enrich this comparison, we analyze embeddings with measurements such as uniformity, tolerance, and centered kernel alignment (CKA), and propose two new metrics of our own: nearest neighbor graph similarity and linear prediction overlap. Our analysis reveals that no single popular method, taken in isolation, should be treated as representative of the field as a whole, and that future work ought to consider how to leverage the complementary nature of these methods. We also use CKA to provide a framework for robustly quantifying augmentation invariance, and offer a reminder that certain types of invariance can be undesirable for downstream tasks.
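As background for readers unfamiliar with CKA: the linear variant, as commonly defined in the representation-similarity literature, compares two sets of embeddings of the same samples and is invariant to orthogonal transformations and isotropic scaling. The sketch below is not code from this paper; it is a minimal NumPy implementation of standard linear CKA, with `X` and `Y` assumed to be feature matrices of shape `(n_samples, dim)` extracted from two models on the same inputs.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two feature matrices.

    X: (n_samples, d1), Y: (n_samples, d2), rows aligned by sample.
    Returns a similarity in [0, 1]; 1 means the representations agree
    up to an orthogonal transform and isotropic scaling.
    """
    # Center each feature dimension over the samples.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)
```

To quantify augmentation invariance in the spirit described above, one would compute `linear_cka` between embeddings of a batch of images and embeddings of augmented versions of the same batch: a score near 1 indicates the representation is largely invariant to that augmentation.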




