tmn at SemEval-2023 Task 9: Multilingual Tweet Intimacy Detection using XLM-T, Google Translate, and Ensemble Learning

by Anna Glazkova, et al.

The paper describes a transformer-based system designed for SemEval-2023 Task 9: Multilingual Tweet Intimacy Analysis. The purpose of the task was to predict the intimacy of tweets on a scale from 1 (not intimate at all) to 5 (very intimate). The official training set for the competition consisted of tweets in six languages (English, Spanish, Italian, Portuguese, French, and Chinese). The test set included these six languages as well as external data in four languages not present in the training set (Hindi, Arabic, Dutch, and Korean). We presented a solution based on an ensemble of XLM-T, a multilingual RoBERTa model adapted to the Twitter domain. To improve performance on unseen languages, each tweet was supplemented with its English translation. We explored the effectiveness of translated data for the languages seen in fine-tuning compared to unseen languages, and evaluated strategies for using translated data in transformer-based models. Our solution ranked 4th on the leaderboard, achieving an overall Pearson's r of 0.599 on the test set. The proposed system improves on the score averaged across all 45 submissions by up to 0.088 Pearson's r.
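The three moving parts named in the abstract (pairing each tweet with its English translation, combining several fine-tuned XLM-T models into an ensemble, and scoring with Pearson's r) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the translation is attached to the tweet, so joining the two strings with XLM-R's `</s>` separator is an assumption, and a simple mean of member predictions is assumed as the ensembling rule. The helper names `build_input`, `ensemble_average`, and `pearson_r` are hypothetical.

```python
from math import sqrt
from statistics import fmean

# Assumption: the translation is appended after XLM-R's separator token string.
# A real pipeline would instead pass (tweet, translation) as a sentence pair to the
# tokenizer and let it insert the model's own special tokens.
SEP = " </s> "


def build_input(tweet: str, translation: str) -> str:
    """Hypothetical input format: original tweet followed by its English translation."""
    return tweet + SEP + translation


def ensemble_average(predictions: list[list[float]]) -> list[float]:
    """Average per-tweet intimacy scores across ensemble members (assumed simple mean).

    `predictions[m][i]` is model m's score for tweet i.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all ensemble members must score the same tweets")
    # zip(*...) transposes model-major lists into per-tweet tuples of scores.
    return [fmean(scores) for scores in zip(*predictions)]


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between predictions and gold scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

Because Pearson's r measures linear association rather than absolute error, an ensemble can score well even if its predictions are systematically shifted on the 1 to 5 scale; this is one reason averaging members with slightly different biases is a reasonable (though here assumed) combination rule.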
