Hierarchy-Dependent Cross-Platform Multi-View Feature Learning for Venue Category Prediction

by Shuqiang Jiang, et al.

In this work, we focus on visual venue category prediction, which can facilitate various applications in location-based services and personalization. Considering the complementarity of different media platforms, it is reasonable to leverage venue-relevant media data from different platforms to boost prediction performance. Intuitively, recognizing a venue category involves multiple semantic cues, especially objects and scenes, and these cues should contribute jointly to venue category prediction. In addition, venues can be organized in a natural hierarchical structure, which provides prior knowledge to guide venue category estimation. Taking these aspects into account, we propose a Hierarchy-dependent Cross-platform Multi-view Feature Learning (HCM-FL) framework for venue category prediction from videos by leveraging images from other platforms. HCM-FL includes two major components, namely Cross-Platform Transfer Deep Learning (CPTDL) and Multi-View Feature Learning with the Hierarchical Venue Structure (MVFL-HVS). CPTDL reinforces the deep network learned from videos using images from other platforms. Specifically, CPTDL first trains a deep network on videos. Images from other platforms are then filtered by the learned network, and the selected images are fed back into this network to enhance it. Two kinds of networks, pre-trained on the ImageNet and Places datasets, are employed. MVFL-HVS is then developed to enable multi-view feature fusion. It embeds the hierarchical structure ontology to support more discriminative joint feature learning. We conduct experiments on videos from Vine and images from Foursquare. The experimental results demonstrate the advantage of our proposed framework.
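
The image-filtering step of CPTDL can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so the rule used here is an assumption: an image is kept when the video-trained network's top prediction agrees with the image's venue label and the confidence reaches a threshold. The function name `select_images`, the `threshold` parameter, and the toy probabilities are all hypothetical and serve only to make the idea concrete.

```python
from typing import List, Sequence


def select_images(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float = 0.5,
) -> List[int]:
    """Return indices of images that pass the (assumed) filtering rule.

    probs[i]  : class probabilities the video-trained network assigns to image i
    labels[i] : venue category attached to image i on its source platform
    An image is kept only if the network's top class matches its label
    and the top probability is at least `threshold`.
    """
    kept = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        # Index of the most probable class for this image.
        pred = max(range(len(p)), key=p.__getitem__)
        if pred == y and p[pred] >= threshold:
            kept.append(i)
    return kept


# Toy data: four images, three venue categories (values are made up).
probs = [
    [0.80, 0.10, 0.10],  # confident and correct
    [0.40, 0.35, 0.25],  # correct but below the threshold
    [0.10, 0.20, 0.70],  # confident but disagrees with the label
    [0.05, 0.90, 0.05],  # confident and correct
]
labels = [0, 0, 1, 1]

selected = select_images(probs, labels, threshold=0.5)
print(selected)  # [0, 3]
```

In a full pipeline, the selected images would then be used to fine-tune the same network; that step depends on the training framework and is omitted here.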


