Albanian Language Identification in Text Documents

01/14/2019
by Klesti Hoxha, et al.

In this work, we investigate the accuracy of standard and state-of-the-art language identification methods in identifying Albanian in written text documents. For this purpose, we constructed a dataset of news articles written in Albanian. We observed a considerable drop in accuracy on test documents that lack the Albanian alphabet letters "Ë" and "Ç". To address this, we created a custom training corpus that solved the problem, achieving an accuracy of more than 99%. The best-performing language identification methods for Albanian use a naïve Bayes classifier and n-gram-based classification features.
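The abstract names a naïve Bayes classifier over n-gram features but does not specify the n-gram order, smoothing, or how the custom corpus was built. The following is a minimal sketch, assuming character trigrams and add-one (Laplace) smoothing. The augmentation step, which adds a copy of each Albanian training text with "ë"/"ç" replaced by "e"/"c", is one hypothetical way to make the model robust to documents missing those letters. It is not necessarily the authors' corpus construction. All function and class names (`strip_albanian_diacritics`, `char_ngrams`, `NaiveBayesLangId`) and the toy sentences are illustrative assumptions, not data from the paper.

```python
import math
import unicodedata
from collections import Counter, defaultdict


def strip_albanian_diacritics(text):
    """Hypothetical helper: map ë/Ë -> e/E and ç/Ç -> c/C (after NFC normalization),
    imitating Albanian text typed without these letters."""
    return unicodedata.normalize("NFC", text).translate(str.maketrans("ëËçÇ", "eEcC"))


def char_ngrams(text, n=3):
    """Lowercased character n-grams, with one space of padding at each end."""
    padded = " " + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLangId:
    """Multinomial naive Bayes over character n-grams with add-alpha smoothing."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha
        self.gram_counts = defaultdict(Counter)  # language -> n-gram counts
        self.gram_totals = Counter()              # language -> total n-grams seen
        self.doc_counts = Counter()               # language -> number of training docs
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, labeled_docs):
        for text, lang in labeled_docs:
            grams = char_ngrams(text, self.n)
            self.gram_counts[lang].update(grams)
            self.gram_totals[lang] += len(grams)
            self.doc_counts[lang] += 1
            self.vocab.update(grams)
        return self

    def log_posteriors(self, text):
        """Unnormalized log P(lang | text) for every language seen in training."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        grams = char_ngrams(text, self.n)
        scores = {}
        for lang, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # class prior
            denom = self.gram_totals[lang] + self.alpha * vocab_size
            for g in grams:
                # Smoothed likelihood; unseen n-grams get alpha / denom instead of zero.
                score += math.log((self.gram_counts[lang][g] + self.alpha) / denom)
            scores[lang] = score
        return scores

    def predict(self, text):
        scores = self.log_posteriors(text)
        return max(scores, key=scores.get)


# Toy corpus for illustration only (not the paper's dataset).
albanian = [
    "Qeveria miratoi sot një projektligj të ri për arsimin.",
    "Çështja e energjisë u diskutua në mbledhjen e këshillit bashkiak.",
    "Lajmet e fundit nga Tirana dhe nga qytetet e tjera të vendit.",
]
english = [
    "The government approved a new education bill today.",
    "The energy issue was discussed at the city council meeting.",
    "The latest news from the capital and other cities of the country.",
]

# Hypothetical augmentation: also train on Albanian text with ë/ç stripped.
train = [(t, "sq") for t in albanian]
train += [(strip_albanian_diacritics(t), "sq") for t in albanian]
train += [(t, "en") for t in english]

model = NaiveBayesLangId(n=3).fit(train)
print(model.predict("Ceshtja e energjise u diskutua ne mbledhjen e keshillit bashkiak."))  # prints sq
```

Character n-grams are a natural fit here because a missing "ë" or "ç" changes only the few n-grams that overlap it, while the rest of the word's n-grams still carry the language signal. Adding stripped copies to the training data lets those altered n-grams also count as Albanian evidence.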
