Vision-Language Pre-training (VLP) has achieved impressive performance o...
Due to the limitations of the model structure and pre-training objective...
This paper proposes an approach to Dense Video Captioning (DVC) without
...
Existing pre-training methods either focus on single-modal tasks or
multi...