Batch Normalization (BN) has been a standard component in designing deep...
The last decade has witnessed remarkable progress in the image captionin...
Referring expression comprehension (REC) aims to localize a target objec...
High-resolution representations are essential for position-sensitive vis...
Visual Grounding (VG) aims to locate the most relevant region in an imag...