Multimodal Query-guided Object Localization

by Aditay Tripathi, et al.

Consider a scenario in one-shot query-guided object localization where neither an image of the object nor the object category name is available as a query. In such a scenario, a hand-drawn sketch of the object could serve as the query. However, crude hand-drawn sketches, when used alone as queries, can be ambiguous for object localization; e.g., a sketch of a laptop could be confused for a sofa. On the other hand, a linguistic definition of the category, e.g., "a small portable computer small enough to use in your lap", used along with the sketch query, gives better visual and semantic cues for object localization. In this work, we present a multimodal query-guided object localization approach under the challenging open-set setting. In particular, we use queries from two modalities, namely, a hand-drawn sketch and a description of the object (also known as a gloss), to perform object localization. Multimodal query-guided object localization is a challenging task for two reasons: a large domain gap exists between the queries and the natural images, and the complementary but minimal information present across the queries must be combined. For example, crude hand-drawn sketches contain abstract shape information about an object, while text descriptions often capture partial semantic information about a given object category. To address these challenges, we present a novel cross-modal attention scheme that guides the region proposal network to generate object proposals relevant to the input queries, and a novel orthogonal projection-based proposal scoring technique that scores each proposal with respect to the queries, thereby yielding the final localization results. ...
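The abstract names an orthogonal projection-based proposal scoring technique but does not give its formula. The following is a minimal sketch of one plausible reading, not the authors' implementation: each proposal feature is split into a component parallel to a query embedding and an orthogonal residual, and the proposal is scored by the parallel length minus the residual length, averaged over the sketch and gloss queries. The function names (`decompose`, `score_proposal`, `rank_proposals`), the scoring rule, and the averaging over the two queries are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decompose(proposal, query):
    """Split `proposal` into parts parallel and orthogonal to `query`.

    Returns (signed length along the query, length of the orthogonal residual).
    """
    q_norm = math.sqrt(dot(query, query))
    if q_norm == 0.0:
        raise ValueError("query embedding must be non-zero")
    parallel = dot(proposal, query) / q_norm
    residual = [p - parallel * q / q_norm for p, q in zip(proposal, query)]
    orthogonal = math.sqrt(dot(residual, residual))
    return parallel, orthogonal


def score_proposal(proposal, sketch_query, gloss_query):
    """Hypothetical score: reward the parallel part, penalise the residual.

    Averaging over the two queries is an assumption, not the paper's rule.
    """
    total = 0.0
    for query in (sketch_query, gloss_query):
        parallel, orthogonal = decompose(proposal, query)
        total += parallel - orthogonal
    return total / 2.0


def rank_proposals(proposals, sketch_query, gloss_query):
    """Return proposal indices sorted from highest to lowest score."""
    scores = [score_proposal(p, sketch_query, gloss_query) for p in proposals]
    return sorted(range(len(proposals)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy 2-D features; real features would come from a detector backbone.
    proposals = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(rank_proposals(proposals, [1.0, 0.0], [1.0, 0.0]))
```

In this toy setup a proposal aligned with both queries ranks first and one orthogonal to them ranks last; in practice the proposal and query embeddings would first need to be mapped into a shared feature space, which this sketch does not attempt.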




