Locate then Segment: A Strong Pipeline for Referring Image Segmentation

03/30/2021
by Ya Jing, et al.

Referring image segmentation aims to segment the objects referred to by a natural language expression. Previous methods usually focus on designing an implicit and recurrent feature interaction mechanism that fuses the visual-linguistic features and directly generates the final segmentation mask, without explicitly modeling the localization information of the referent instances. To address this limitation, we view the task from another perspective by decoupling it into a "Locate-Then-Segment" (LTS) scheme. Given a language expression, people generally first attend to the corresponding target image regions and then generate a fine segmentation mask of the object based on its context. LTS first extracts and fuses visual and textual features to obtain a cross-modal representation, then applies cross-modal interaction on the visual-textual features to locate the referred object with a position prior, and finally generates the segmentation result with a lightweight segmentation network. LTS is simple but surprisingly effective. On three popular benchmark datasets, it outperforms all previous state-of-the-art methods by a large margin (e.g., +3.2 and +3.4 …) … explicitly locating the object, which is also confirmed by visualization experiments. We believe this framework can serve as a strong baseline for referring image segmentation.
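The three-stage flow described above (fuse, then locate, then segment) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the function names (`fuse`, `locate`, `segment`), the element-wise-product fusion, the single-head self-attention used as the "cross-modal interaction", the sigmoid map standing in for the position prior, and the 1x1 projection used as the lightweight segmentation head are all illustrative assumptions, and every weight is random rather than trained.

```python
import numpy as np

# Hypothetical toy sketch of a locate-then-segment pipeline; not the LTS code.
rng = np.random.default_rng(0)


def l2_normalize(x, axis):
    """Scale vectors along `axis` to unit L2 norm (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-6)


def fuse(visual, text, w_v, w_t):
    """Stage 1 (assumed form): project both modalities to width D and fuse them
    by element-wise product, broadcasting the sentence vector over all positions.
    visual: (C_v, H, W) feature map; text: (C_t,) sentence embedding."""
    v = np.einsum("dc,chw->dhw", w_v, visual)  # 1x1 projection -> (D, H, W)
    t = w_t @ text                              # linear projection -> (D,)
    return l2_normalize(v, axis=0) * l2_normalize(t, axis=0)[:, None, None]


def locate(fused, w_q, w_k, w_score):
    """Stage 2 (assumed form): single-head self-attention over all spatial
    positions, then a per-position score squashed to (0, 1) as a position prior."""
    d, h, w = fused.shape
    tokens = fused.reshape(d, h * w).T          # (HW, D)
    q, k = tokens @ w_q, tokens @ w_k           # (HW, D) each
    logits = q @ k.T / np.sqrt(d)               # (HW, HW) attention logits
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)     # row-wise softmax
    context = attn @ tokens                     # (HW, D)
    score = context @ w_score                   # (HW,)
    return (1.0 / (1.0 + np.exp(-score))).reshape(h, w)


def segment(fused, prior, w_seg, threshold=0.5):
    """Stage 3 (assumed form): gate the fused features by the prior, then apply
    a 1x1 projection and a sigmoid to get a binary (H, W) mask."""
    gated = fused * prior[None, :, :]           # emphasize the located region
    logits = np.einsum("d,dhw->hw", w_seg, gated)
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs > threshold


# Usage with random inputs and random (untrained) weights.
C_v, C_t, D, H, W = 8, 6, 4, 5, 7
visual = rng.standard_normal((C_v, H, W))
text = rng.standard_normal(C_t)
fused = fuse(visual, text, rng.standard_normal((D, C_v)), rng.standard_normal((D, C_t)))
prior = locate(fused, rng.standard_normal((D, D)), rng.standard_normal((D, D)),
               rng.standard_normal(D))
mask = segment(fused, prior, rng.standard_normal(D))
print(fused.shape, prior.shape, mask.shape)  # (4, 5, 7) (5, 7) (5, 7)
```

Gating the fused features by the localization map is one simple way to let the "locate" stage steer the "segment" stage; a trained model would learn all weights and would typically upsample the mask to the input image resolution.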
