In this work, we consider the problem of goodness-of-fit (GoF) testing f...
We introduce VisIT-Bench (Visual InsTruction Benchmark), a benchmark for...
We introduce OpenFlamingo, a family of autoregressive vision-language mo...
Stochastic Gradient Descent (SGD) is one of the simplest and most popula...
Incremental decision making in real-world environments is one of the mos...
Attaining a high degree of user controllability in visual generation oft...
The field of text-to-image (T2I) generation has garnered significant att...
Embodied agents have achieved prominent performance in following human i...
In recent years, pre-trained large language models have demonstrated rem...
Contrastive Language-Image Pretraining (CLIP) efficiently learns visual...
Recent advances in text-to-image synthesis make it possible to visualize...
Language planning aims to implement complex high-level goals by decompos...
Human brains integrate linguistic and perceptual information simultaneou...
Dense video captioning aims to identify the events of interest in an inp...
Automatic evaluations for natural language generation (NLG) conventional...
Vision-and-language navigation (VLN) is a multimodal task where an agent...
A major challenge in visually grounded language generation is to build r...
In the vision-and-language navigation (VLN) task, an agent follows natur...
The stochastic gradient descent (SGD) algorithm is widely used for parameter...
Recent years have seen remarkable progress of text generation in differe...
We introduce Texar, an open-source toolkit aiming to support the broad s...