Large language models are trained in two stages: (1) unsupervised pretra...
Large vision-language models are generally applicable to many downstream...
Current large language models can perform reasonably well on complex tas...
We present Masked Audio-Video Learners (MAViL) to train audio-visual rep...
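The name points at masked prediction across modalities. Below is a minimal, generic sketch of that idea: hide a random subset of input patches and score reconstruction only on the hidden ones. The mask ratio, the toy patch tensors, and the zero-output stand-in for a decoder are all illustrative assumptions; MAViL's actual encoders and losses are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_patches(n_patches, mask_ratio=0.75):
    """Split patch indices into visible and masked sets."""
    perm = rng.permutation(n_patches)
    n_masked = int(n_patches * mask_ratio)
    return perm[n_masked:], perm[:n_masked]  # visible, masked

def masked_mse(pred, target, masked_idx):
    """Mean-squared error computed only on the masked patches."""
    diff = pred[masked_idx] - target[masked_idx]
    return float(np.mean(diff ** 2))

# Toy audio and video "patch" sequences, shape (n_patches, dim).
audio = rng.normal(size=(16, 8))
video = rng.normal(size=(32, 8))
for name, patches in (("audio", audio), ("video", video)):
    visible, masked = mask_patches(len(patches))
    pred = np.zeros_like(patches)  # stand-in for a decoder's reconstruction
    print(name, masked_mse(pred, patches, masked))
```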
We introduce CM3, a family of causally masked generative models trained ...
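As we understand the causally-masked objective, spans are cut out of a document, replaced by a sentinel token, and re-appended at the tail, so an ordinary left-to-right model can still be trained to generate them. The sketch below shows that data transform for a single span; the sentinel name, the single-span case, and the uniform span sampling are illustrative assumptions, not CM3's exact preprocessing.

```python
import random

MASK = "<mask:0>"  # hypothetical sentinel token

def causally_mask(tokens, rng):
    """Cut one random span, leave a sentinel, re-append the span at the end."""
    i = rng.randrange(len(tokens))
    j = rng.randrange(i + 1, len(tokens) + 1)
    return tokens[:i] + [MASK] + tokens[j:] + [MASK] + tokens[i:j]

print(causally_mask("a b c d e f".split(), random.Random(0)))
# e.g. ['a', 'b', '<mask:0>', 'f', '<mask:0>', 'c', 'd', 'e']
```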
We present VideoCLIP, a contrastive approach to pre-train a unified mode...
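"Contrastive" here means pulling matched video-text pairs together and pushing mismatched ones apart. A minimal, generic InfoNCE-style version of such an objective is sketched below, with matched pairs on the diagonal of the similarity matrix; the temperature value and the symmetric averaging are common defaults, not VideoCLIP's specific loss or its positive/negative sampling.

```python
import numpy as np

def log_softmax(x):
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def info_nce(video_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings."""
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature  # pairwise similarities, (batch, batch)
    diag = np.arange(len(logits))   # matched pairs are the positives
    loss_v2t = -log_softmax(logits)[diag, diag].mean()
    loss_t2v = -log_softmax(logits.T)[diag, diag].mean()
    return (loss_v2t + loss_t2v) / 2

rng = np.random.default_rng(0)
v, t = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
print(info_nce(v, t))
```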
We introduce HTLM, a hyper-text language model trained on a large-scale ...
We present a simplified, task-agnostic multi-modal pre-training approach...
Retrieving relevant contexts from a large corpus is a crucial step for t...
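A minimal sketch of what that retrieval step usually looks like in a dense setup: embed the query and every passage, score by inner product, and keep the top-k. The hash-bucket "embedding" below is a toy stand-in for a learned encoder.

```python
import numpy as np

def embed(text, dim=64):
    """Toy deterministic embedding; real systems use a trained encoder."""
    vec = np.zeros(dim)
    for i, byte in enumerate(text.encode()):
        vec[(byte + i) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

def retrieve(query, corpus, k=2):
    """Return the k corpus indices with the highest inner-product score."""
    scores = np.stack([embed(p) for p in corpus]) @ embed(query)
    return np.argsort(-scores)[:k]

corpus = ["masked audio-video learners", "contrastive video-text model",
          "hyper-text language model", "retrieval-augmented generation"]
print(retrieve("video and text pre-training", corpus))
```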
We introduce MARGE, a pre-trained sequence-to-sequence model learned wit...
In web search, typically a candidate generation step selects a small set...
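A minimal sketch of that two-stage pattern: a cheap first pass narrows the corpus to a handful of candidates, and only those survivors are scored by a more expensive ranker. Both scoring functions here are toy stand-ins for whatever a production system would use.

```python
def generate_candidates(query, corpus, k=3):
    """Cheap first stage: rank every document by term overlap with the query."""
    q_terms = set(query.lower().split())
    def overlap(i):
        return len(q_terms & set(corpus[i].lower().split()))
    return sorted(range(len(corpus)), key=overlap, reverse=True)[:k]

def rerank(query, corpus, candidate_ids):
    """Pricier second stage, run only on the candidates: count query terms
    that appear in the document in order."""
    def score(i):
        words = corpus[i].lower().split()
        pos, matched = 0, 0
        for term in query.lower().split():
            if term in words[pos:]:
                pos = words.index(term, pos) + 1
                matched += 1
        return matched
    return sorted(candidate_ids, key=score, reverse=True)

corpus = ["pre-training language models at scale",
          "web search with neural rankers",
          "neural candidate generation for web search",
          "contrastive learning of visual features"]
query = "candidate generation web search"
print(rerank(query, corpus, generate_candidates(query, corpus)))
```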