Deep ViT Features as Dense Visual Descriptors

12/10/2021
by Shir Amir, et al.

We leverage deep features extracted from a pre-trained Vision Transformer (ViT) as dense visual descriptors. We demonstrate that such features, when extracted from a self-supervised ViT model (DINO-ViT), exhibit several striking properties: (i) the features encode powerful high-level information at high spatial resolution – i.e., they capture semantic object parts at fine spatial granularity, and (ii) the encoded semantic information is shared across related, yet different, object categories (i.e., super-categories). These properties allow us to design powerful dense ViT descriptors that facilitate a variety of applications, including co-segmentation, part co-segmentation, and correspondences – all achieved by applying lightweight methodologies to deep ViT features (e.g., binning / clustering). We take these applications further into the realm of inter-class tasks – demonstrating how objects from related categories can be commonly segmented into semantic parts under significant pose and appearance changes. Our methods, extensively evaluated both qualitatively and quantitatively, achieve state-of-the-art part co-segmentation results and results competitive with recent supervised methods trained specifically for co-segmentation and correspondences.
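To make the abstract's "lightweight methodologies" concrete, the sketch below extracts DINO-ViT patch-token features and jointly clusters them across two images with k-means to obtain a crude common part segmentation. This is a minimal sketch, not the authors' released implementation: the choice of the ViT-S/8 backbone from the facebookresearch/dino torch.hub entry point, its get_intermediate_layers helper, the 224x224 resizing, the example file names, and the number of clusters are all assumptions made for illustration.

```python
# Minimal sketch (not the paper's official code): DINO-ViT patch tokens as
# dense descriptors, clustered jointly across images for a rough part
# co-segmentation. Assumes torch, torchvision, Pillow, and scikit-learn.
import torch
from PIL import Image
from torchvision import transforms
from sklearn.cluster import KMeans

# Self-supervised DINO ViT-S/8 backbone (hypothetical choice of variant).
model = torch.hub.load("facebookresearch/dino:main", "dino_vits8")
model.eval()

patch = 8
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

def dense_descriptors(paths):
    """Return per-image patch-token features of shape (28*28, D)."""
    feats = []
    for p in paths:
        x = preprocess(Image.open(p).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            # Tokens of the last block; drop the leading [CLS] token so only
            # the spatial patch tokens remain as dense descriptors.
            tokens = model.get_intermediate_layers(x, n=1)[0][0, 1:, :]
        feats.append(tokens)
    return feats

# Cluster descriptors of several images jointly: patches assigned to the same
# cluster across images yield a crude common part segmentation.
paths = ["cat1.jpg", "cat2.jpg"]          # hypothetical input images
descs = dense_descriptors(paths)
kmeans = KMeans(n_clusters=6, n_init=10).fit(torch.cat(descs).numpy())
labels = kmeans.labels_.reshape(len(paths), 224 // patch, 224 // patch)
print(labels.shape)  # one 28x28 cluster-label map per image
```

The paper's actual pipelines add further design choices (e.g., which layers and token facets to use, and how to bin or match descriptors); the sketch only illustrates the basic recipe of treating per-patch ViT features as dense descriptors and applying simple clustering on top.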


Related research

- Fully Self-Supervised Class Awareness in Dense Object Descriptors (10/05/2021): We address the problem of inferring self-supervised dense semantic corre...
- Splicing ViT Features for Semantic Appearance Transfer (01/02/2022): We present a method for semantically transferring the visual appearance ...
- One-Shot Transfer of Affordance Regions? AffCorrs! (09/15/2022): In this work, we tackle one-shot visual search of object parts. Given a ...
- Visual Categorization of Objects into Animal and Plant Classes Using Global Shape Descriptors (01/25/2019): How humans can distinguish between general categories of objects? Are th...
- Self-Supervised Learning of Object Parts for Semantic Segmentation (04/27/2022): Progress in self-supervised learning has brought strong general image re...
- Neural Congealing: Aligning Images to a Joint Semantic Atlas (02/08/2023): We present Neural Congealing – a zero-shot self-supervised framework for...
- Uncovering the Inner Workings of STEGO for Safe Unsupervised Semantic Segmentation (04/14/2023): Self-supervised pre-training strategies have recently shown impressive r...
