In this paper, we focus on inferring whether the given user command is c...
In this work, we present MoLang (a Motion-Language connecting model) for...
In this paper, we focus on the problem of efficiently locating a target
...
360^∘ videos convey holistic views for the surroundings of a scene. It
p...
We present an approach named the Cycled Composition Network that can mea...
As humans, we understand events in the visual world contextually, perfor...
Large-scale datasets are the cornerstone of self-supervised representati...
The recent success of Transformers in the language domain has motivated
...
We present a novel data augmentation technique, CRA (Contextual Response...
We present an approach named CurlingNet that can measure the semantic
di...
We present an approach named JSFusion (Joint Sequence Fusion) that can
m...
We address the problem of story-based temporal summarization of long
360...
We address the problem of highlight detection from a 360 degree video by...
The attention mechanisms in deep neural networks are inspired by human's...
YouTube-8M is the largest video dataset for multi-label video classifica...
We propose a high-level concept word detector that can be integrated wit...