Arabic Dialect Identification in the Wild

05/13/2020
by   Ahmed Abdelali, et al.

We present QADI, an automatically collected dataset of tweets covering a wide range of country-level Arabic dialects from 18 countries in the Middle East and North Africa (MENA) region. Our method for building this dataset applies multiple filters to identify users who belong to different countries based on their account descriptions, and to eliminate tweets that are either written in Modern Standard Arabic or contain inappropriate language. The resulting dataset contains 540k tweets from 2,525 users who are evenly distributed across 18 Arab countries. Using intrinsic evaluation, we show that the labels of a set of randomly selected tweets are 91.5% accurate. For extrinsic evaluation, we are able to build effective country-level dialect identification on tweets with a macro-averaged F1-score of 60.6% across 18 classes.
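The extrinsic result above is reported as a macro-averaged F1-score, which gives every country class equal weight regardless of how many tweets it has. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `macro_f1` and the toy country codes are assumptions made for illustration and are not taken from QADI.

```python
from collections import Counter

def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the classes that occur in gold."""
    labels = sorted(set(gold))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true class and was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); guard against an empty denominator.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels invented for illustration only (hypothetical, not QADI data).
gold = ["EG", "EG", "SA", "SA", "MA", "MA"]
pred = ["EG", "SA", "SA", "SA", "MA", "EG"]
score = macro_f1(gold, pred)
print(f"macro-F1 = {score:.4f}")
```

Because each class contributes equally to the average, a classifier that does well only on the most frequent dialects is penalized for poor performance on rarer ones, which is why the metric suits an 18-way country-level task.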

research
07/10/2020

Multi-Dialect Arabic BERT for Country-Level Dialect Identification

Arabic dialect identification is a complex problem for a number of inher...
research
03/04/2021

NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task

We present the findings and results of the Second Nuanced Arabic Dialect...
research
04/25/2016

Towards Real-Time, Country-Level Location Classification of Worldwide Tweets

In contrast to much previous work that has focused on location classific...
research
02/19/2021

Dialect Identification in Nuanced Arabic Tweets Using Farasa Segmentation and AraBERT

This paper presents our approach to address the EACL WANLP-2021 Shared T...
research
10/21/2020

NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task

We present the results and findings of the First Nuanced Arabic Dialect ...
research
07/27/2020

NAYEL at SemEval-2020 Task 12: TF/IDF-Based Approach for Automatic Offensive Language Detection in Arabic Tweets

In this paper, we present the system submitted to "SemEval-2020 Task 12"...
research
05/10/2021

Similarities between Arabic Dialects: Investigating Geographical Proximity

The automatic classification of Arabic dialects is an ongoing research c...
