A big part of achieving Artificial General Intelligence (AGI) is to build...
We develop and evaluate captioning models that allow control of caption
...
The core of our approach, Pixel Consensus Voting, is a framework for ins...
This paper presents a framework for the analysis of changes in visual
st...
In this work, we present a simple yet better variant of Self-Critical
Se...
We investigate the effect of different model architectures, training
obj...
We introduce DIODE, a dataset that contains thousands of diverse high
re...
We present a novel problem setting in zero-shot learning, zero-shot obje...
One property that remains lacking in image captions generated by contemp...
We consider generation and comprehension of natural language referring
e...