Enriched Deep Recurrent Visual Attention Model for Multiple Object Recognition

06/12/2017
by   Artsiom Ablavatski, et al.

We design an Enriched Deep Recurrent Visual Attention Model (EDRAM), an improved attention-based architecture for multiple object recognition. The proposed model is fully differentiable and can be optimized end-to-end using Stochastic Gradient Descent (SGD). A Spatial Transformer (ST) is employed as the visual attention mechanism, which allows the model to learn the geometric transformation of objects within images. By combining the Spatial Transformer with a powerful recurrent architecture, EDRAM can localize and recognize objects simultaneously. EDRAM has been evaluated on two publicly available datasets: MNIST Cluttered (with 70K cluttered digits) and SVHN (with up to 250K real-world images of house numbers). Experiments show that it obtains superior performance compared with state-of-the-art models.
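
To make the attention mechanism concrete, the following is a minimal sketch of how a Spatial Transformer style glimpse can be extracted from an image: a 2x3 affine matrix defines a sampling grid, and the image is read at those grid locations with bilinear interpolation. This is a generic NumPy illustration, not the authors' implementation. The function names (`affine_grid`, `bilinear_sample`, `glimpse`), the `[-1, 1]` coordinate convention, and zero padding outside the image are all assumptions made for this example. In a model such as the one described, the affine parameters would be predicted by the recurrent network at each step rather than fixed by hand.

```python
import numpy as np


def affine_grid(theta, out_h, out_w):
    """Map every output pixel to source coordinates in [-1, 1].

    theta is a (2, 3) affine matrix (hypothetical convention: it maps
    normalized output coordinates to normalized input coordinates).
    """
    ys = np.linspace(-1.0, 1.0, out_h)
    xs = np.linspace(-1.0, 1.0, out_w)
    gx, gy = np.meshgrid(xs, ys)                      # each (out_h, out_w)
    coords = np.stack([gx, gy, np.ones_like(gx)], axis=-1)
    return coords @ np.asarray(theta).T               # (out_h, out_w, 2)


def bilinear_sample(image, grid):
    """Sample a 2-D image at normalized (x, y) locations given by grid."""
    h, w = image.shape
    # Convert normalized coordinates to pixel coordinates.
    x = (grid[..., 0] + 1.0) * 0.5 * (w - 1)
    y = (grid[..., 1] + 1.0) * 0.5 * (h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0

    def gather(yy, xx):
        # Locations outside the image contribute zero (assumed padding).
        valid = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
        vals = image[np.clip(yy, 0, h - 1), np.clip(xx, 0, w - 1)]
        return np.where(valid, vals, 0.0)

    top = (1 - wx) * gather(y0, x0) + wx * gather(y0, x1)
    bottom = (1 - wx) * gather(y1, x0) + wx * gather(y1, x1)
    return (1 - wy) * top + wy * bottom


def glimpse(image, theta, out_h, out_w):
    """Extract an out_h x out_w glimpse defined by the affine matrix theta."""
    return bilinear_sample(image, affine_grid(theta, out_h, out_w))


image = np.arange(25, dtype=float).reshape(5, 5)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
zoom_center = np.array([[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]])

full_view = glimpse(image, identity, 5, 5)       # reproduces the image
center_crop = glimpse(image, zoom_center, 3, 3)  # central 3x3 region
```

Because every step is a composition of linear maps and interpolation weights, the output is differentiable with respect to `theta`, which is what lets this kind of attention be trained with gradient descent rather than with sampling-based estimators.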

research
12/24/2014

Multiple Object Recognition with Visual Attention

We present an attention-based model for recognizing multiple objects in ...
research
10/11/2021

Recurrent Attention Models with Object-centric Capsule Representation for Multi-object Recognition

The visual system processes a scene using a sequence of selective glimps...
research
02/21/2022

Guided Visual Attention Model Based on Interactions Between Top-down and Bottom-up Information for Robot Pose Prediction

Learning to control a robot commonly requires mapping between robot stat...
research
10/14/2016

Recurrent 3D Attentional Networks for End-to-End Active Object Recognition in Cluttered Scenes

Active vision is inherently attention-driven: The agent selects views of...
research
10/17/2022

A Saccaded Visual Transformer for General Object Spotting

This paper presents the novel combination of a visual transformer style ...
research
02/09/2019

Improving Deep Image Clustering With Spatial Transformer Layers

Image clustering is an important but challenging task in machine learnin...
research
05/04/2017

Recurrent Soft Attention Model for Common Object Recognition

We propose the Recurrent Soft Attention Model, which integrates the visu...
