There has been a longstanding belief that generation can facilitate a tr...
We present Ego-Only, the first training pipeline that enables state-of-t...
This paper presents a simple and effective visual prompting method for a...
Video-language pre-training is crucial for learning powerful multi-modal...
This paper studies the potential of distilling knowledge from pre-traine...
The rise of transformers in vision tasks not only advances network backb...
We propose Clustering Mask Transformer (CMT-DeepLab), a transformer-base...
Data mixing (e.g., Mixup, Cutmix, ResizeMix) is an essential component f...
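As context for the data-mixing family this entry refers to, here is a minimal sketch of Mixup as commonly defined (the function name, `alpha` default, and batch layout are illustrative, not this paper's code):

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=np.random.default_rng()):
    """Mix a batch of inputs and one-hot labels: x_mix = lam*x_i + (1-lam)*x_j.

    x: array of shape (batch, ...); y: one-hot labels of shape (batch, classes).
    alpha controls the Beta(alpha, alpha) distribution of the mixing weight.
    """
    lam = rng.beta(alpha, alpha)            # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))          # random pairing within the batch
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y + (1.0 - lam) * y[perm]
    return x_mixed, y_mixed
```

Cutmix and ResizeMix replace the convex pixel blend with a pasted region while keeping the same label-mixing rule.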
We present TubeFormer-DeepLab, the first attempt to tackle multiple core...
Image pre-training, the current de-facto paradigm for a wide range of vi...
Recent advances in self-supervised contrastive learning yield good image...
The success of language Transformers is primarily attributed to the pret...
Recently, self-attention operators have shown superior performance as a ...
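For reference, the self-attention operator this entry builds on is standard scaled dot-product attention; a minimal NumPy sketch (names and shapes illustrative, and the paper's exact operator may differ):

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence of tokens.

    x: (n, d) token features; wq, wk, wv: (d, d_k) projection matrices.
    Returns (n, d_k) features where each token attends to all others.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (n, n) similarities
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)         # row-wise softmax
    return weights @ v
```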
DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a ...
We establish a new H^2 Korn's inequality and its discrete analog, which g...
Hyperspectral imaging (HSI) unlocks the huge potential to a wide variety...
We present MaX-DeepLab, the first end-to-end model for panoptic segmenta...
The Wide Residual Networks (Wide-ResNets), a shallow but wide model vari...
Contrastive learning has been adopted as a core method for unsupervised ...
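As a reference point for the contrastive objective mentioned here, a minimal sketch of the widely used InfoNCE loss between two augmented views (a generic formulation, not necessarily this paper's exact variant):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss for a batch of paired embeddings z1[i] <-> z2[i].

    z1, z2: (n, d) L2-normalized embeddings of two views of the same images.
    Each z1[i] treats z2[i] as its positive and all other z2[j] as negatives.
    """
    logits = z1 @ z2.T / temperature                  # (n, n) similarity matrix
    logits -= logits.max(axis=-1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -np.diag(log_prob).mean()                  # positives on the diagonal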
Convolution exploits locality for efficiency at a cost of missing long r...
In this paper, we study normalization methods for neural networks from t...
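The truncation cuts off which normalization methods are studied; as one concrete instance of the family, a minimal sketch of Group Normalization over NCHW feature maps (standard formulation, for illustration only):

```python
import numpy as np

def group_norm(x, gamma, beta, groups=32, eps=1e-5):
    """Group Normalization over an NCHW feature map.

    x: (n, c, h, w) with c divisible by `groups`; gamma, beta: (c,) scale/shift.
    Mean and variance are computed per sample within each channel group.
    """
    n, c, h, w = x.shape
    xg = x.reshape(n, groups, c // groups, h, w)
    mean = xg.mean(axis=(2, 3, 4), keepdims=True)
    var = xg.var(axis=(2, 3, 4), keepdims=True)
    xg = (xg - mean) / np.sqrt(var + eps)
    x = xg.reshape(n, c, h, w)
    return x * gamma.reshape(1, c, 1, 1) + beta.reshape(1, c, 1, 1)
```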
Compositional convolutional networks are generative compositional models...
Deep convolutional neural networks (DCNNs) are powerful models that yiel...
Sketch-based image retrieval (SBIR) is widely recognized as an important...
In this paper, we propose Weight Standardization (WS) to accelerate deep...
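Weight Standardization reparameterizes convolutional weights rather than activations; a minimal sketch of the standardization step (the eps placement and axis conventions follow common implementations and may differ in detail from the paper's):

```python
import numpy as np

def weight_standardize(w, eps=1e-5):
    """Standardize conv weights per output channel, as in Weight Standardization.

    w: (out_channels, in_channels, kh, kw). Each output filter is rescaled to
    zero mean and unit variance over its fan-in before convolution is applied.
    """
    mean = w.mean(axis=(1, 2, 3), keepdims=True)
    var = w.var(axis=(1, 2, 3), keepdims=True)
    return (w - mean) / np.sqrt(var + eps)
```

In practice WS is applied together with an activation normalizer such as Group Normalization, which is what makes it effective in the micro-batch regime.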
Scale variation has been a challenge from traditional to modern approach...