Most vision-and-language pretraining research focuses on English tasks. ...
Various efforts in the Natural Language Processing (NLP) community have ...
Pretrained language models have been shown to encode relational information ...
Pretrained vision-and-language BERTs aim to learn representations that combine ...
Approaches to Grounded Language Learning typically focus on a single task ...
Since first introduced, computer simulation has been an increasingly important ...
We present the results from the second shared task on multimodal machine translation ...
We introduce the Multi30K dataset to stimulate multilingual multimodal research ...
In this paper we present an approach to multi-language image description...
Compounding is a highly productive word-formation process in some languages ...