Youngjae Yu

research

∙ 06/17/2023

CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents

In this paper, we focus on inferring whether the given user command is c...

Jeongeun Park, et al.

research

∙ 10/27/2022

Learning Joint Representation of Human Motion and Language

In this work, we present MoLang (a Motion-Language connecting model) for...

Jihoon Kim, et al.

research

∙ 09/19/2022

Active Visual Search in the Wild

In this paper, we focus on the problem of efficiently locating a target ...

Jeongeun Park, et al.

research

∙ 10/11/2021

Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos

360° videos convey holistic views for the surroundings of a scene. It p...

Heeseung Yun, et al.

research

∙ 07/24/2021

Cycled Compositional Learning between Images and Text

We present an approach named the Cycled Composition Network that can mea...

Jongseok Kim, et al.

research

∙ 06/04/2021

MERLOT: Multimodal Neural Script Knowledge Models

As humans, we understand events in the visual world contextually, perfor...

Rowan Zellers, et al.

research

∙ 01/26/2021

Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning

Large-scale datasets are the cornerstone of self-supervised representati...

Sangho Lee, et al.

research

∙ 12/08/2020

Parameter Efficient Multimodal Transformers for Video Representation Learning

The recent success of Transformers in the language domain has motivated ...

Sangho Lee, et al.

research

∙ 06/11/2020

Augmenting Data for Sarcasm Detection with Unlabeled Conversation Context

We present a novel data augmentation technique, CRA (Contextual Response...

Hankyol Lee, et al.

research

∙ 03/27/2020

CurlingNet: Compositional Learning between Images and Text for Fashion IQ Data

We present an approach named CurlingNet that can measure the semantic di...

Youngjae Yu, et al.

research

∙ 08/07/2018

A Joint Sequence Fusion Model for Video Question Answering and Retrieval

We present an approach named JSFusion (Joint Sequence Fusion) that can m...

Youngjae Yu, et al.

research

∙ 05/08/2018

A Memory Network Approach for Story-based Temporal Summarization of 360° Videos

We address the problem of story-based temporal summarization of long 360...

Sangho Lee, et al.

research

∙ 01/31/2018

A Deep Ranking Model for Spatio-Temporal Highlight Detection from a 360 Video

We address the problem of highlight detection from a 360 degree video by...

Youngjae Yu, et al.

research

∙ 07/19/2017

Supervising Neural Attention Models for Video Captioning by Human Gaze Data

The attention mechanisms in deep neural networks are inspired by human's...

Youngjae Yu, et al.

research

∙ 06/24/2017

Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset

YouTube-8M is the largest video dataset for multi-label video classifica...

Seil Na, et al.

research

∙ 10/10/2016

End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering

We propose a high-level concept word detector that can be integrated wit...

Youngjae Yu, et al.
