Cross-Lingual Task-Specific Representation Learning for Text Classification in Resource Poor Languages

06/10/2018
by   Nurendra Choudhary, et al.

Neural network models have shown promising results for text classification. However, these solutions are limited by their dependence on annotated data. The prospect of leveraging resource-rich languages to enhance text classification in resource-poor languages is appealing: if the constraints of resource availability can be offset, performance on resource-poor languages can improve significantly. To this end, we present a twin Bidirectional Long Short-Term Memory (Bi-LSTM) network with shared parameters, trained with a contrastive loss function based on a similarity metric. The model learns representations of resource-poor and resource-rich sentences in a common space, using the similarity between their assigned annotation tags: sentences with similar tags are projected closer together, and those with different tags farther apart. We evaluate our model on two classification tasks, sentiment analysis and emoji prediction, for the resource-poor languages Hindi and Telugu and the resource-rich languages English and Spanish. Our model significantly outperforms state-of-the-art approaches on both tasks across all metrics.
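The pull-together/push-apart behavior described above can be sketched with the standard margin-based contrastive loss. This is a minimal illustration, not the authors' code: the twin Bi-LSTM encoder is omitted and replaced by precomputed sentence embeddings, and the function name and default margin are assumptions for the example.

```python
import numpy as np

def contrastive_loss(e1, e2, same_tag, margin=1.0):
    """Margin-based contrastive loss over a pair of sentence embeddings.

    e1, e2   -- embeddings of the two sentences (e.g. outputs of a shared
                Bi-LSTM encoder in the paper's setup; plain vectors here)
    same_tag -- True if the sentences carry the same annotation tag
    margin   -- minimum distance enforced between different-tag pairs
    """
    d = np.linalg.norm(e1 - e2)  # Euclidean distance in the common space
    if same_tag:
        # Similar-tag pairs: penalize any distance, pulling them together.
        return 0.5 * d ** 2
    # Different-tag pairs: penalize only if closer than the margin,
    # pushing them apart until the margin is satisfied.
    return 0.5 * max(0.0, margin - d) ** 2
```

Identical same-tag embeddings incur zero loss, while identical different-tag embeddings incur the maximum penalty of `0.5 * margin**2`; gradients of this loss through a shared encoder are what shape the common representation space.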


research
04/03/2018

Emotions are Universal: Learning Sentiment Based Representations of Resource-Poor Languages using Siamese Networks

Machine learning approaches in sentiment analysis principally rely on th...
research
04/03/2018

Contrastive Learning of Emoji-based Representations for Resource-Poor Languages

The introduction of emojis (or emoticons) in social media platforms has ...
research
11/07/2016

AC-BLSTM: Asymmetric Convolutional Bidirectional LSTM Networks for Text Classification

Recently deep learning models have been shown to be capable of making rem...
research
05/21/2020

CHEER: Rich Model Helps Poor Model via Knowledge Infusion

There is a growing interest in applying deep learning (DL) to healthcare...
research
05/24/2021

Cross-lingual Text Classification with Heterogeneous Graph Neural Network

Cross-lingual text classification aims at training a classifier on the s...
research
02/20/2021

An Attention Ensemble Approach for Efficient Text Classification of Indian Languages

The recent surge of complex attention-based deep learning architectures ...
research
04/30/2020

Structure-Tags Improve Text Classification for Scholarly Document Quality Prediction

Training recurrent neural networks on long texts, in particular scholarl...
