Logit-based knowledge distillation has received less attention in recent years s...
Previous work on action representation learning focused on global repres...
Despite the tremendous progress of Masked Autoencoders (MAE) in developi...
We present Sparrow, an information-seeking dialogue agent trained to be ...
Vision Transformer (ViT), as a powerful alternative to Convolutional Neu...
Monocular 3D object detection is one of the most challenging tasks in 3D...
It is well known that adversarial attacks can fool deep neural networks ...
To deploy a pre-trained deep CNN on resource-constrained mobile devices,...
Adversarial training is currently the most powerful defense against adve...
Most semantic segmentation models treat semantic segmentation as a pixel...
Introducing explicit constraints on the structural predictions has been ...