Development of POS tagger for English-Bengali Code-Mixed data

07/29/2020
by   Tathagata Raha, et al.

Code-mixed texts are widespread nowadays due to the advent of social media. Since these texts combine two languages to formulate a sentence, they give rise to various research problems related to Natural Language Processing. In this paper, we explore one such problem, namely, Parts of Speech tagging of code-mixed texts. We have built a system that can POS tag English-Bengali code-mixed data in which the Bengali words are written in Roman script. Our approach begins with the collection and cleaning of English-Bengali code-mixed tweets. These tweets were used as a development dataset for building our system. The proposed system follows a modular approach: it first tags individual tokens with their respective languages and then passes them to different POS taggers designed for each language (English and Bengali, in our case). The tags given by the two taggers are then joined together, and the final result is mapped to a universal POS tag set. Our system was evaluated on 100 manually POS-tagged code-mixed sentences and achieved an accuracy of 75.29%.
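The modular pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy lexicons, the language-specific tag names, the mapping tables, and every function name (`identify_language`, `tag_english`, `tag_bengali`, `pos_tag_code_mixed`, `accuracy`) are hypothetical, and simple dictionary lookups stand in for the language identifier and the two POS taggers, which the abstract does not specify. The accuracy function assumes token-level scoring, which the abstract also does not state.

```python
from typing import Dict, List, Tuple

# Toy lexicons invented for illustration; a real system would use trained models.
ENGLISH_LEXICON: Dict[str, str] = {"this": "DT", "movie": "NN", "love": "VBP"}
BENGALI_LEXICON: Dict[str, str] = {"amar": "PR", "bhalo": "JJ", "lage": "VM"}

# Hypothetical mappings from language-specific tags to universal POS tags.
EN_TO_UNIVERSAL: Dict[str, str] = {"DT": "DET", "NN": "NOUN", "VBP": "VERB"}
BN_TO_UNIVERSAL: Dict[str, str] = {"PR": "PRON", "JJ": "ADJ", "VM": "VERB"}


def identify_language(token: str) -> str:
    """Step 1: label a token 'en', 'bn', or 'other' (lookup stand-in)."""
    word = token.lower()
    if word in ENGLISH_LEXICON:
        return "en"
    if word in BENGALI_LEXICON:
        return "bn"
    return "other"


def tag_english(tokens: List[str]) -> List[str]:
    """Stand-in for an English POS tagger."""
    return [ENGLISH_LEXICON.get(t.lower(), "UNK") for t in tokens]


def tag_bengali(tokens: List[str]) -> List[str]:
    """Stand-in for a POS tagger for Romanized Bengali."""
    return [BENGALI_LEXICON.get(t.lower(), "UNK") for t in tokens]


def pos_tag_code_mixed(tokens: List[str]) -> List[Tuple[str, str, str]]:
    """Route tokens by language, tag each group, merge, map to universal tags."""
    langs = [identify_language(t) for t in tokens]
    merged = ["X"] * len(tokens)  # 'X' marks tokens no tagger handled
    routes = (
        ("en", tag_english, EN_TO_UNIVERSAL),
        ("bn", tag_bengali, BN_TO_UNIVERSAL),
    )
    for lang, tagger, mapping in routes:
        # Step 2: send only this language's tokens to its tagger.
        positions = [i for i, l in enumerate(langs) if l == lang]
        tags = tagger([tokens[i] for i in positions])
        # Steps 3 and 4: put tags back in sentence order as universal tags.
        for i, tag in zip(positions, tags):
            merged[i] = mapping.get(tag, "X")
    return list(zip(tokens, langs, merged))


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of tokens whose predicted tag matches the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    for row in pos_tag_code_mixed(["amar", "this", "movie", "bhalo", "lage"]):
        print(row)
```

Keeping the language identifier and the per-language taggers as separate functions mirrors the modular design: either stand-in could be replaced by a trained model without changing the merging and mapping steps.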


