WildQA: In-the-Wild Video Question Answering

by Santiago Castro, et al.
University of Michigan

Existing video understanding datasets mostly focus on human interactions, with little attention paid to "in the wild" settings, where the videos are recorded outdoors. We propose WildQA, a video understanding dataset of videos recorded in outdoor settings. In addition to video question answering (Video QA), we also introduce the new task of identifying visual support for a given question and answer (Video Evidence Selection). Through evaluations using a wide range of baseline models, we show that WildQA poses new challenges to the vision and language research communities. The dataset is available at https://lit.eecs.umich.edu/wildqa/.




