Differentiable Soft-Masked Attention

06/01/2022
by Ali Athar, et al.

Transformers have become prevalent in computer vision due to their performance and flexibility in modelling complex operations. Of particular significance is the 'cross-attention' operation, which allows a vector representation (e.g. of an object in an image) to be learned by attending to an arbitrarily sized set of input features. Recently, "Masked Attention" was proposed, in which a given object representation attends only to those image pixel features for which the segmentation mask of that object is active. This specialization of attention proved beneficial for various image and video segmentation tasks. In this paper, we propose another specialization of attention that enables attending over 'soft masks' (those with continuous mask probabilities instead of binary values) and is also differentiable through these mask probabilities, thus allowing the mask used for attention to be learned within the network without requiring direct loss supervision. This can be useful for several applications; here, we employ our "Differentiable Soft-Masked Attention" for the task of Weakly-Supervised Video Object Segmentation (VOS). We develop a transformer-based network for VOS that requires only a single annotated image frame for training, but can also benefit from cycle-consistency training on a video with just one annotated frame. Although there is no loss for masks in unlabeled frames, the network is still able to segment objects in those frames due to our novel attention formulation.
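To make the contrast between binary and soft masking concrete, the following is a minimal sketch in plain Python, not the formulation from the paper. It assumes, purely for illustration, that the soft mask enters the attention logits as an additive log(mask + eps) bias, so that attention weights are proportional to the mask probability times the usual exponentiated dot product. The function names, the eps and threshold parameters, and the toy inputs are all hypothetical. A finite-difference check then illustrates the property the abstract describes: the output changes smoothly with the mask probabilities, whereas a thresholded (binary) mask passes no gradient back to them.

```python
import math

def soft_masked_attention(query, keys, values, mask_probs, eps=1e-6):
    """Single-query cross-attention with a soft mask folded into the logits.

    Assumption (illustrative, not taken from the paper): the mask enters as
    an additive log(mask + eps) bias, so each weight is proportional to
    (mask + eps) * exp(q . k / sqrt(d)).
    """
    d = len(query)
    logits = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) + math.log(m + eps)
        for key, m in zip(keys, mask_probs)
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return out, weights

def hard_masked_attention(query, keys, values, mask_probs, threshold=0.5):
    """Binary masked attention: threshold the mask first, then attend."""
    binary = [1.0 if m > threshold else 0.0 for m in mask_probs]
    return soft_masked_attention(query, keys, values, binary)

def mask_gradient(attend, query, keys, values, mask_probs, index, step=1e-4):
    """Central finite-difference estimate of d(out[0]) / d(mask_probs[index])."""
    up = list(mask_probs)
    down = list(mask_probs)
    up[index] += step
    down[index] -= step
    out_up, _ = attend(query, keys, values, up)
    out_down, _ = attend(query, keys, values, down)
    return (out_up[0] - out_down[0]) / (2 * step)

# Toy inputs: one object query, three pixel features, scalar values.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0], [2.0], [3.0]]
mask_probs = [0.9, 0.2, 0.6]

output, weights = soft_masked_attention(query, keys, values, mask_probs)
soft_grad = mask_gradient(soft_masked_attention, query, keys, values, mask_probs, 1)
hard_grad = mask_gradient(hard_masked_attention, query, keys, values, mask_probs, 1)
print("attention weights:", weights)
print("gradient w.r.t. mask[1], soft:", soft_grad)
print("gradient w.r.t. mask[1], hard:", hard_grad)
```

In this sketch the pixel with the lowest mask probability receives the smallest attention weight, and the soft variant yields a non-zero sensitivity to that probability. In a network this sensitivity is what would let a loss applied elsewhere shape the mask itself, which is the behaviour the abstract attributes to the proposed attention.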


