Vision Transformers (ViTs) have become a dominant paradigm for visual
re...
Convolution has been arguably the most important feature transform for m...
We propose to address the problem of few-shot classification by meta-lea...
Spatio-temporal convolution often fails to learn motion dynamics in vide...
Motion plays a crucial role in understanding videos and most state-of-th...
Most current action recognition methods heavily rely on appearance
infor...