Can Linguistic Distance help Language Classification? Assessing Hawrami-Zaza and Kurmanji-Sorani

by Hossein Hassani, et al.

Whether Hawrami and Zaza (Zazaki) are standalone languages or dialects of a language has long been debated among linguists who study Iranian languages. MacKenzie (1961) answered the question of whether they belong to the Kurdish language or are independent descendants of Iranian languages. However, a majority of the people who speak them reject that answer, and their disapproval seems to rest mainly on the sociological, cultural, and historical relationships among the speakers. While the case of Hawrami and Zaza has remained under-examined, there is almost unanimous agreement that Kurmanji and Sorani are Kurdish dialects. The studies that address these cases are primarily qualitative, whereas computational linguistics can approach the question quantitatively. In this research, we examine three questions from the perspective of linguistic distance. First, how similar or dissimilar are Hawrami and Zaza, given that the two do not share a geographical area? Second, how similar are Kurmanji and Sorani, which do overlap geographically? Finally, what is the distance between each pair of these dialects? We base our computation on phonetic representations of these dialects (languages) and calculate various linguistic distances for each pair. We then analyze the data and discuss the results.
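The abstract does not say which distance measures are used, so the following is only a minimal sketch of one common choice: length-normalized Levenshtein (edit) distance averaged over aligned word lists, computed for every pair of varieties. The function names, the variety labels `A`, `B`, `C`, and the toy strings are all hypothetical placeholders; they are not real Hawrami, Zaza, Kurmanji, or Sorani data and not the authors' implementation.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the longer length, giving a value in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_pairwise_distance(wordlists):
    """Average normalized distance for every pair of varieties.

    `wordlists` maps a variety name to a list of transcriptions; position i
    in each list is assumed to hold the same concept.
    """
    result = {}
    for x, y in combinations(sorted(wordlists), 2):
        pairs = list(zip(wordlists[x], wordlists[y]))
        total = sum(normalized_distance(a, b) for a, b in pairs)
        result[(x, y)] = total / len(pairs)
    return result


# Hypothetical toy strings, NOT real language data.
TOY = {
    "A": ["abc", "kitten"],
    "B": ["abd", "sitting"],
    "C": ["abc", "kitten"],
}

if __name__ == "__main__":
    for pair, dist in mean_pairwise_distance(TOY).items():
        print(pair, round(dist, 3))
```

A real study would replace the toy strings with phonetic transcriptions of a shared concept list and would probably compare several measures rather than rely on one.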




TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages

Confidently making progress on multilingual modeling requires challengin...

Quantitative methods for Phylogenetic Inference in Historical Linguistics: An experimental case study of South Central Dravidian

In this paper we examine the usefulness of two classes of algorithms Dis...

Revitalizing Endangered Languages: AI-powered language learning as a catalyst for language appreciation

According to UNESCO, there are nearly 7,000 languages spoken worldwide, ...

Linguistic Taboos and Euphemisms in Nepali

Languages across the world have words, phrases, and behaviors – the tabo...

Bhāṣācitra: Visualising the dialect geography of South Asia

We present Bhāṣācitra, a dialect mapping system for South Asia built...

Network Motifs Analysis of Croatian Literature

In this paper we analyse network motifs in the co-occurrence directed ne...

Linguistic Classification using Instance-Based Learning

Traditionally linguists have organized languages of the world as languag...
