Video Question Answering (VideoQA) has been significantly advanced from ...
How to efficiently transform large language models (LLMs) into instructi...
Masked Autoencoders (MAE) have been popular paradigms for large-scale vi...
Video recognition has been dominated by the end-to-end learning paradigm...
Capitalizing on large pre-trained models for various downstream tasks of...
This technical report introduces our winning solution to the spatio-temp...
Zero-shot artistic style transfer is an important image synthesis proble...