Khmer Word Search: Challenges, Solutions, and Semantic-Aware Search

12/16/2021
by Rina Buoy, et al.

Search is one of the key functionalities in digital platforms and applications such as electronic dictionaries, search engines, and e-commerce platforms. While search is straightforward in some languages, Khmer word search is challenging because of the complex Khmer writing system. Multiple possible character orders and different spelling realizations of the same word constrain Khmer word search functionality. Additionally, spelling mistakes are common since robust spellcheckers are not commonly available across input device platforms. These challenges hinder the use of the Khmer language in search-embedded applications. Moreover, due to the absence of WordNet-like lexical databases for the Khmer language, it is difficult to establish the semantic relations between words that would enable semantic search. In this paper, we propose a set of robust solutions to the above challenges associated with Khmer word search. The proposed solutions include character order normalization, grapheme-based and phoneme-based spellcheckers, and a Khmer word semantic model. The semantic model is based on a word embedding model that is trained on a 30-million-word corpus and is used to capture the semantic similarities between words.
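As a rough illustration of how a word embedding model can support semantic search, the sketch below ranks words by cosine similarity between their vectors. It is a minimal sketch and not the authors' implementation: the three-dimensional vectors, the English placeholder words, and the names `EMBEDDINGS`, `cosine_similarity`, and `most_similar` are all assumptions made for the example. A real model would supply vectors learned from a corpus.

```python
import math

# Hypothetical toy vectors standing in for trained word embeddings.
# The words and values are illustrative assumptions, not data from the paper.
EMBEDDINGS = {
    "river": [0.9, 0.1, 0.0],
    "stream": [0.8, 0.2, 0.1],
    "market": [0.0, 0.9, 0.3],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, top_k=2):
    """Rank every other word in the vocabulary by similarity to `word`."""
    query = embeddings[word]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for neighbor, score in most_similar("river", EMBEDDINGS):
        print(neighbor, round(score, 3))
```

In this toy vocabulary, "stream" ranks above "market" for the query "river" because its vector points in nearly the same direction. A search function could use such a ranking to return related entries when there is no exact match.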

research
07/15/2018

WordNet-Based Information Retrieval Using Common Hypernyms and Combined Features

Text search based on lexical matching of keywords is not satisfactory du...
research
06/06/2017

Synergistic Union of Word2Vec and Lexicon for Domain Specific Semantic Similarity

Semantic similarity measures are an important part in Natural Language P...
research
09/29/2021

Context based Roman-Urdu to Urdu Script Transliteration System

Now a day computer is necessary for human being and it is very useful in...
research
06/17/2020

On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms

Word Embeddings are used widely in multiple Natural Language Processing ...
research
11/30/2019

Latent Semantic Search and Information Extraction Architecture

The motivation, concept, design and implementation of latent semantic se...
research
08/11/2017

Semantic Word Clouds with Background Corpus Normalization and t-distributed Stochastic Neighbor Embedding

Many word clouds provide no semantics to the word placement, but use a r...
research
08/25/2023

Media of Langue

This paper aims to archive the materials behind "Media of Langue" by Gok...
