A common practice in metric learning is to train and test an embedding m...
User modeling, which learns to represent users into a low-dimensional
re...
Vision and Language (VL) models offer an effective method for aligning
r...
The neural dynamics underlying brain activity are critical to understand...
Our paper proposes a direct sparse visual odometry method that combines ...
Large-scale pre-trained Vision Language (VL) models have shown remar...
Label-efficient and reliable semantic segmentation is essential for many...
Building object detectors that are robust to domain shifts is critical f...
As a beloved sport worldwide, dancing is getting integrated into traditi...
While transformers have greatly boosted performance in semantic segmenta...
Computer vision models suffer from a phenomenon known as catastrophic
fo...
Vision and Language (VL) models have demonstrated remarkable zero-shot
p...
Recently, large-scale pre-trained Vision-and-Language (VL) foundation mo...
We present a dataset generator engine named Web-based Visual Corpus Buil...
Vision Transformers (ViTs) have recently become the state-of-the-art acr...
A robot guide dog has compelling advantages over animal guide dogs for i...
In this paper, we present a novel Riemannian Motion Policy (RMP)flow-bas...
Each utterance in multi-turn empathetic dialogues has features such as
e...
In this paper, we provide a deep analysis of temporal modeling for actio...
While pose estimation is an important computer vision task, it requires
...
Deep models must learn robust and transferable representations in order ...
Non-IID dataset and heterogeneous environment of the local clients are
r...
Today's robotic quadruped systems can robustly walk over a diverse range...
Quadrupedal landing is a complex process involving large impacts, elabor...
Learning transferable and domain adaptive feature representations from v...
Unsupervised domain adaptation (UDA) methods can dramatically improve
ge...
Understanding documents from their visual snapshots is an emerging probl...
Progress in machine learning is typically measured by training and testi...
Developing video understanding intelligence is quite challenging because...
Semi-supervised learning (SSL) is an effective means to leverage unlabel...
Enabling mobile robots for solving challenging and diverse shape, textur...
Demonstrating acrobatic behavior of a humanoid robot such as flips and
s...
In this paper, we present machine learning models based on random forest...
In this paper, we introduce and examine a variant of the game of Nim (Sh...
Many self-supervised learning (SSL) methods have been successful in lear...
Network embedding is an influential graph mining technique for represent...
Current multilingual vision-language models either require a large numbe...
Existing unsupervised domain adaptation methods aim to transfer knowledg...
Unsupervised domain adaptation methods traditionally assume that all sou...
Prior work in multi-task learning has mainly focused on predictions on a...
Nodes in a multiplex network are connected by multiple types of relation...
Dynamic legged locomotion is a challenging topic because of the lack of
...
Existing vision-language methods typically support two languages at a ti...
This paper describes the control, and evaluation of a new human-scaled b...
Deep models are state-of-the-art for many computer vision tasks includin...
Recently, matrix factorization-based recommendation methods have been
cr...
Many real-world tasks solved by heterogeneous network embedding methods ...
Contemporary domain adaptation methods are very effective at aligning fe...
The recent surge of text-based online counseling applications enables us...
Artist recognition is a task of modeling the artist's musical style. Thi...