Recent research in representation learning has shown that hierarchical d...
This paper studies zero-shot cross-lingual transfer of vision-language
m...
The dominant paradigm for learning video-text representations – noise
co...
We propose an improved discriminative model prediction method for robust...
Facial image retrieval plays a significant role in forensic investigatio...
Unsupervised machine translation (MT) has recently achieved impressive
r...
An integral part of video analysis and surveillance is temporal activity...
There is a surging need across the world for protection against gun viol...
This paper studies the problem of predicting the distribution over multi...
With the aim of promoting and understanding the multilingual version of ...
Tremendous variation in the scale of people/head size is a critical prob...
The aim of crowd counting is to estimate the number of people in images ...
Contextual reasoning is essential to understand events in long untrimmed...
Bilingual lexicon induction, translating words from the source language ...
Every minute, hundreds of hours of video are uploaded to social media si...
Nowadays a huge number of user-generated videos are uploaded to social m...
The task of retrieving clips within videos based on a given natural lang...
Deciphering human behaviors to predict their future paths/trajectories a...
Inferring universal laws of the environment is an important ability of h...
We propose a traffic danger recognition model that works with arbitrary
...
This paper presents a novel dataset for traffic accidents analysis. Our ...
This paper presents a novel dataset for traffic accidents analysis.Our g...
Moments capture a huge part of our lives. Accurate recognition of these
...
In this work, we explore the cross-scale similarity in crowd counting
sc...
This notebook paper presents our system in the ActivityNet Dense Caption...
Recent insights on language and vision with neural networks have been
su...
A key problem in deep multi-attribute learning is to effectively discove...
We tackle the problem of learning concept classifiers from videos on the...
The topic diversity of open-domain videos leads to various vocabularies ...
This paper proposes a new task, MemexQA: given a collection of photos or...