Aspect-based Sentiment Classification with Sequential Cross-modal Semantic Graph

by   Yufeng Huang, et al.

Multi-modal aspect-based sentiment classification (MABSC) is an emerging classification task that aims to classify the sentiment of a given target such as a mentioned entity in data with different modalities. In typical multi-modal data with text and image, previous approaches do not make full use of the fine-grained semantics of the image, especially in conjunction with the semantics of the text and do not fully consider modeling the relationship between fine-grained image information and target, which leads to insufficient use of image and inadequate to identify fine-grained aspects and opinions. To tackle these limitations, we propose a new framework SeqCSG including a method to construct sequential cross-modal semantic graphs and an encoder-decoder model. Specifically, we extract fine-grained information from the original image, image caption, and scene graph, and regard them as elements of the cross-modal semantic graph as well as tokens from texts. The cross-modal semantic graph is represented as a sequence with a multi-modal visible matrix indicating relationships between elements. In order to effectively utilize the cross-modal semantic graph, we propose an encoder-decoder method with a target prompt template. Experimental results show that our approach outperforms existing methods and achieves the state-of-the-art on two standard datasets MABSC. Further analysis demonstrates the effectiveness of each component and our model can implicitly learn the correlation between the target and fine-grained information of the image.


page 3

page 7


MACSA: A Multimodal Aspect-Category Sentiment Analysis Dataset with Multimodal Fine-grained Aligned Annotations

Multimodal fine-grained sentiment analysis has recently attracted increa...

Fine-Grained Chemical Entity Typing with Multimodal Knowledge Representation

Automated knowledge discovery from trending chemical literature is essen...

Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval

Our goal in this research is to study a more realistic environment in wh...

ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration

Vision-and-language pretraining (VLP) aims to learn generic multimodal r...

WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation

The top-down and bottom-up methods are two mainstreams of referring segm...

AnANet: Modeling Association and Alignment for Cross-modal Correlation Classification

The explosive increase of multimodal data makes a great demand in many c...

Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering

Multi-modal keyphrase generation aims to produce a set of keyphrases tha...

Please sign up or login with your details

Forgot password? Click here to reset