Few-shot semantic segmentation is the task of learning to locate each pi...
Video question answering (VideoQA) is an essential task in vision-langua...
Recent works on multi-modal emotion recognition move towards end-to-end ...
Data scarcity and data imbalance have attracted a lot of attention in ma...
Video moment retrieval aims at finding the start and end timestamps of a...
Personalized video highlight detection aims to shorten a long video to i...
Text-based image retrieval has seen considerable progress in recent year...
Text-based person search aims at retrieving a target person in an image ga...
Temporal receptive fields of models play an important role in action seg...
Greedy-NMS inherently raises a dilemma, where a lower NMS threshold will...
Aiming at improving performance of visual classification in a cost-effec...