Critical Survey of the Freely Available Arabic Corpora

02/25/2017
by Wajdi Zaghouani, et al.

The availability of corpora is a major factor in building natural language processing applications. However, the cost of acquiring corpora can prevent some researchers from pursuing their work. Easy access to freely available corpora is urgently needed in the NLP research community, especially for languages such as Arabic. Currently, there is no easy way to access a comprehensive and up-to-date list of freely available Arabic corpora. In this paper, we present the results of a recent survey conducted to identify the freely available Arabic corpora and language resources. Our preliminary results yielded an initial list of 66 sources. We present our findings for the various categories studied and provide direct links to the data when possible.

