Procedural activity understanding requires perceiving human actions in t...
In this paper we present an approach for localizing steps of procedural ...
The focus of this work is sign spotting - given a video of an isolated s...
In this paper, we consider the problem of audio-visual synchronisation a...
In this work, we introduce the BBC-Oxford British Sign Language (BOBSL) ...
In this paper, we consider the task of spotting spoken keywords in silen...
The goal of this paper is to learn strong lip reading models that can re...
The goal of this work is to temporally align asynchronous subtitles in s...
We tackle the problem of learning object detectors without supervision. ...
The objective of this work is to localize sound sources that are visible...
The objective of this work is to annotate sign instances across a broad ...
The focus of this work is sign spotting - given a video of an isolated s...
The goal of this work is to automatically determine whether and when a w...
Our objective is to transform a video into a set of discrete audio-visua...
Recent progress in fine-grained gesture and action classification, and m...
The goal of this paper is speaker diarisation of videos collected 'in th...
The goal of this work is to train strong models for visual speech recogn...
Our objective is an audio-visual model for separating a single speaker f...
The goal of this work is to recognise phrases and sentences being spoken...
This paper introduces a new multi-modal dataset for visual and audio-vis...
The goal of this paper is to develop state-of-the-art models for lip rea...
Our goal is to isolate individual speakers from multi-talker simultaneou...
Cooperative multi-agent systems can be naturally used to model many real...
Many real-world problems, such as network packet routing and urban traff...