Adaptive Video Highlight Detection by Learning from User History
Recently, there has been increasing interest in highlight detection research, where the goal is to create a short video from a longer one by extracting its interesting moments. However, most existing methods ignore the fact that the notion of a video highlight is highly subjective: different users may prefer different highlights for the same input video. In this paper, we propose a simple yet effective framework that learns to adapt highlight detection to a user by exploiting the user's history, in the form of highlights that the user has previously created. Our framework consists of two sub-networks: a fully temporal convolutional highlight detection network H that predicts highlights for an input video, and a history encoder network M for the user history. We introduce a newly designed temporal-adaptive instance normalization (T-AIN) layer in H through which the two sub-networks interact. T-AIN has affine parameters that are predicted from M based on the user history, and it provides the user-adaptive signal to H. Extensive experiments on a large-scale dataset show that our framework makes more accurate and user-specific highlight predictions.
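The core mechanism described above, conditioning the detection network on affine parameters predicted from the user history, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function name `t_ain`, the (channels x time) layout, and the toy gamma/beta values standing in for the history encoder's output are all illustrative assumptions. It shows only the AdaIN-style idea: normalize each channel over the temporal axis, then apply a user-conditioned scale and shift.

```python
import numpy as np

def t_ain(x, gamma, beta, eps=1e-5):
    """Sketch of a temporal-adaptive instance normalization layer.

    x:     (C, T) feature map for one video (channels x time)
    gamma: (C,) scale, in the paper predicted by the history encoder M
           (here just a placeholder array)
    beta:  (C,) shift, likewise a placeholder for M's prediction
    """
    mean = x.mean(axis=1, keepdims=True)   # per-channel mean over time
    std = x.std(axis=1, keepdims=True)     # per-channel std over time
    x_norm = (x - mean) / (std + eps)      # instance-normalize along the temporal axis
    # user-adaptive affine transform injects the history signal into H
    return gamma[:, None] * x_norm + beta[:, None]

# toy example: 4 channels, 8 time steps
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
gamma = np.full(4, 2.0)  # pretend these came from the history encoder M
beta = np.zeros(4)
y = t_ain(x, gamma, beta)
print(y.shape)  # (4, 8)
```

Because gamma and beta are produced per user, the same detection network H yields different highlight scores for different users without retraining its convolutional weights.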