We present our work on the multimodal coreference resolution task of the...
Motivated by the increasing popularity of intelligent editing assistants,...
Visual reasoning tasks such as visual question answering (VQA) require a...
Missing sentence generation (or sentence infilling) fosters a wide range...