Learning visual representations from natural language supervision has
re...
Variational autoencoders (VAEs) are one of the powerful likelihood-based...
Diverse and accurate vision+language modeling is an important goal to re...
Automatically describing an image is an important capability for virtual...
Image captioning is an important but challenging task, applicable to vir...