Dense video captioning, a task of localizing meaningful moments and
gene...
The text retrieval task is mainly performed in two ways: the bi-encoder
...
Multi-hop retrieval is the task of retrieving a series of multiple docum...
Video-text retrieval has many real-world applications such as media
anal...