Query-Adaptive R-CNN for Open-Vocabulary Object Detection and Retrieval

11/27/2017
by   Ryota Hinami, et al.
0

We address the problem of open-vocabulary object retrieval and localization, which is to retrieve and localize objects from a very large-scale image database immediately by a textual query (e.g., a word or phrase). We first propose Query-Adaptive R-CNN, a simple yet strong framework for open-vocabulary object detection. Query-Adaptive R-CNN is a simple extension of Faster R-CNN from closed-vocabulary to open-vocabulary object detection: instead of learning a class-specific classifier and regressor, we learn a detector generator that transforms a text into classifier and regressor weights. All of its components can be learned in an end-to-end manner. Even with its simple architecture, it outperforms all state-of-the-art methods in the Flickr30k Entities phrase localization task. In addition, we propose negative phrase augmentation, a generic approach for exploiting hard negatives in the training of open-vocabulary object detection that significantly improves the discriminative ability of the generated classifier. We show that our system can retrieve and localize objects specified by a textual query from one million images in only 0.5 seconds.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset