i-MAE: Are Latent Representations in Masked Autoencoders Linearly Separable?

10/20/2022
by Kevin Zhang, et al.

Masked image modeling (MIM) has been recognized as a strong and popular self-supervised pre-training approach in the vision domain. However, the interpretability of the mechanism and the properties of the representations learned by such a scheme remain under-explored. In this work, through comprehensive experiments and empirical studies on Masked Autoencoders (MAE), we address two critical questions about the behavior of the learned representations: (i) Are the latent representations in Masked Autoencoders linearly separable if the input is a mixture of two images instead of one? This would provide concrete evidence for why MAE-learned representations perform so impressively on downstream tasks, as demonstrated throughout the literature. (ii) What degree of semantics is encoded in the latent feature space by Masked Autoencoders? To explore these two questions, we propose a simple yet effective Interpretable MAE (i-MAE) framework with a two-way image reconstruction and a latent feature reconstruction with a distillation loss, which helps us understand the behavior inside MAE's structure. Extensive experiments are conducted on the CIFAR-10/100, Tiny-ImageNet and ImageNet-1K datasets to verify our observations. Furthermore, in addition to qualitatively analyzing the characteristics of the latent representations, we examine the existence of linear separability and the degree of semantics in the latent space by proposing two novel metrics. The surprising and consistent results across the qualitative and quantitative experiments demonstrate that i-MAE is a superior framework design for interpretability research on MAE, while also achieving better representational ability. Code is available at https://github.com/vision-learning-acceleration-lab/i-mae.
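To make the abstract's description more concrete, here is a minimal, self-contained PyTorch sketch of the idea it outlines: a linear mixture of two images is passed through a masked-autoencoder-style encoder, the latent is split into two branches, each branch reconstructs its source image, and each branch is distilled toward a frozen vanilla-MAE-style teacher encoder applied to the corresponding un-mixed image. This is an illustrative toy, not the released i-MAE implementation; names such as PatchEncoder, iMAESketch, i_mae_step, and mix_ratio are assumptions made for this sketch.

```python
# Illustrative sketch only: two-way reconstruction + latent distillation on a
# mixture of two images, loosely following the abstract's description of i-MAE.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchEncoder(nn.Module):
    """Toy stand-in for an MAE ViT encoder: patchify, randomly mask, embed, pool."""

    def __init__(self, img_size=32, patch=4, dim=128, mask_ratio=0.75):
        super().__init__()
        self.patch, self.mask_ratio = patch, mask_ratio
        self.proj = nn.Linear(3 * patch * patch, dim)
        self.blocks = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def patchify(self, x):
        p = self.patch
        b, c, h, w = x.shape
        x = x.unfold(2, p, p).unfold(3, p, p)              # B, C, H/p, W/p, p, p
        return x.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)

    def forward(self, x):
        tokens = self.proj(self.patchify(x))               # B, N, D
        keep = int(tokens.shape[1] * (1 - self.mask_ratio))
        idx = torch.rand(tokens.shape[0], tokens.shape[1], device=x.device).argsort(1)[:, :keep]
        visible = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.shape[2]))
        return self.blocks(visible).mean(1)                # pooled latent, B, D


class iMAESketch(nn.Module):
    """Mixture input -> shared encoder -> two-way latent split -> two reconstructions."""

    def __init__(self, dim=128, patch=4, img_size=32):
        super().__init__()
        self.encoder = PatchEncoder(img_size, patch, dim)
        self.split = nn.ModuleList([nn.Linear(dim, dim) for _ in range(2)])
        self.decoder = nn.Linear(dim, 3 * img_size * img_size)

    def forward(self, x1, x2, mix_ratio=0.35):
        z_mix = self.encoder(mix_ratio * x1 + (1 - mix_ratio) * x2)
        z1, z2 = self.split[0](z_mix), self.split[1](z_mix)
        return (z1, z2), (self.decoder(z1).view_as(x1), self.decoder(z2).view_as(x2))


def i_mae_step(model, teacher, x1, x2):
    """One step: two-way image reconstruction loss + latent distillation loss."""
    (z1, z2), (r1, r2) = model(x1, x2)
    with torch.no_grad():                                  # teacher stands in for a frozen, pre-trained vanilla MAE
        t1, t2 = teacher(x1), teacher(x2)
    rec = F.mse_loss(r1, x1) + F.mse_loss(r2, x2)
    distill = F.mse_loss(z1, t1) + F.mse_loss(z2, t2)
    return rec + distill


if __name__ == "__main__":
    model, teacher = iMAESketch(), PatchEncoder()
    x1, x2 = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)
    loss = i_mae_step(model, teacher, x1, x2)
    loss.backward()
    print(float(loss))
```

If the two branch latents can each be driven back to their own source image and toward the teacher's features for that image, the mixed latent is, in effect, linearly separable into per-image components, which is exactly the property the paper's first question probes; the real work additionally quantifies this with two dedicated metrics.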


Related research

08/18/2023  GiGaMAE: Generalizable Graph Masked Autoencoder via Collaborative Latent Space Reconstruction
03/17/2023  Denoising Diffusion Autoencoders are Unified Self-supervised Learners
03/09/2020  Set-Structured Latent Representations
03/07/2019  Adversarial Mixup Resynthesizers
03/09/2023  Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking
08/31/2023  CL-MAE: Curriculum-Learned Masked Autoencoders
08/24/2023  Masked Autoencoders are Efficient Class Incremental Learners
