Category-Based Deep CCA for Fine-Grained Venue Discovery from Multimodal Data

05/08/2018
by Yi Yu, et al.

In this work, travel destinations and business locations are treated as venues. Discovering a venue from a photo is important for context-aware applications. Unfortunately, few efforts have addressed complicated real-world images such as user-generated venue photos. Our goal is fine-grained venue discovery from heterogeneous social multimodal data. To this end, we propose a novel deep learning model, Category-based Deep Canonical Correlation Analysis (C-DCCA). Given a photo as input, this model performs (i) exact venue search (finding the venue where the photo was taken) and (ii) group venue search (finding relevant venues in the same category as the photo), using the cross-modal correlation between the input photo and the textual descriptions of venues. In this model, data in different modalities are projected into a common space via deep networks. Pairwise correlation (between data of different modalities from the same venue) for exact venue search and category-based correlation (between data of different modalities from different venues in the same category) for group venue search are jointly optimized. Because a single photo cannot fully reflect the rich text description of a venue, the number of photos per venue is increased in the training phase to capture more aspects of each venue. We build a new venue-aware multimodal dataset by integrating Wikipedia featured articles and Foursquare venue photos. Experimental results on this dataset confirm the feasibility of the proposed method. Moreover, evaluation on another publicly available dataset confirms that the proposed method outperforms state-of-the-art methods for cross-modal retrieval between image and text.
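The two search modes described in the abstract can be illustrated with a minimal retrieval sketch. It assumes the photo and the venue text descriptions have already been projected into the shared space by trained networks, which are not shown here. The function names, the toy vectors, and the use of cosine similarity as the matching score are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_venues(photo_vec, venue_vecs):
    """Return venue names sorted by decreasing similarity to the photo.

    photo_vec: photo embedding in the shared space (assumed precomputed).
    venue_vecs: dict mapping venue name -> text embedding in the same space.
    """
    scores = {name: cosine(photo_vec, vec) for name, vec in venue_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def exact_venue_search(photo_vec, venue_vecs):
    """Exact venue search: the single best-matching venue."""
    return rank_venues(photo_vec, venue_vecs)[0]


def group_venue_search(photo_vec, venue_vecs, k=2):
    """Group venue search (assumed form): the k best-matching venues."""
    return rank_venues(photo_vec, venue_vecs)[:k]


# Hypothetical 2-D embeddings, chosen only to make the example readable.
toy_venues = {
    "cafe_a": [1.0, 0.0],
    "cafe_b": [0.9, 0.1],
    "museum": [0.0, 1.0],
}
toy_photo = [1.0, 0.05]

best_match = exact_venue_search(toy_photo, toy_venues)
related = group_venue_search(toy_photo, toy_venues, k=2)
```

In this sketch both modes share one ranking. Whether group search returns same-category venues depends entirely on how the shared space was trained, which is the role the abstract assigns to the category-based correlation term.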


