Labeling of Query Words using Conditional Random Field

07/29/2016
by Satanu Ghosh, et al.

This paper describes our approach to Query Word Labeling, our submission to the shared task on Mixed Script Information Retrieval at the Forum for Information Retrieval Evaluation (FIRE) 2015. The queries were written in Roman script, and the words were either English or transliterated from Indian regional languages. A total of eight Indian languages were present in addition to English. We also identified Named Entities and special symbols as part of the task. A CRF-based machine learning framework was used to label individual words with their corresponding language labels. We used a dictionary-based approach for language identification and also took into account the context of each word. Our system demonstrated an overall accuracy of 75.5%. The scores for the identification of token-level language labels for Bengali, English and Hindi were 0.7486, 0.892 and 0.7972 respectively. The overall weighted F-measure of our system was 0.7498.
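
To make the combination of dictionary lookups and word context concrete, the following is a minimal sketch of per-token feature extraction of the kind a CRF labeler could consume. It is not the authors' feature set: the toy word lists in `LEXICONS`, the feature names, and the functions `token_features` and `query_features` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy word lists (assumption); the dictionaries actually used
# by the system are not described in this abstract.
LEXICONS: Dict[str, Set[str]] = {
    "en": {"the", "song", "lyrics", "of"},
    "hi": {"tum", "hi", "ho", "pyaar"},
    "bn": {"ami", "tomake", "bhalobashi"},
}


def dictionary_flags(word: str, prefix: str) -> Dict[str, bool]:
    """One boolean per language: is the lowercased word in that word list?"""
    lower = word.lower()
    return {f"{prefix}in_dict_{lang}": lower in words
            for lang, words in LEXICONS.items()}


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Features for tokens[i]: surface form, dictionary hits, and neighbours."""
    word = tokens[i]
    lower = word.lower()
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": lower,
        "suffix3": lower[-3:],
        "is_title": word.istitle(),  # weak cue for named entities
        "has_digit": any(c.isdigit() for c in word),
        "is_symbol": not any(c.isalnum() for c in word),  # e.g. "!" or "?"
    }
    feats.update(dictionary_flags(word, prefix=""))

    # Context: dictionary hits of the previous and next word.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
        feats.update(dictionary_flags(tokens[i - 1], prefix="prev_"))
    else:
        feats["BOS"] = True  # beginning of query
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
        feats.update(dictionary_flags(tokens[i + 1], prefix="next_"))
    else:
        feats["EOS"] = True  # end of query
    return feats


def query_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dictionaries for every token of one query."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for tok, f in zip(["tum", "hi", "ho", "lyrics", "!"],
                      query_features(["tum", "hi", "ho", "lyrics", "!"])):
        print(tok, {k: v for k, v in f.items() if v is True})
```

Each query becomes a list of feature dictionaries, one per token, which is the input shape accepted by common CRF toolkits such as sklearn-crfsuite. The neighbour features let the model resolve an ambiguous word like "hi" by looking at the language of the words around it.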
