N-gram Statistical Stemmer for Bangla Corpus

12/25/2019
by Rabeya Sadia, et al.

Stemming reduces inflected words to their stem or root form. It improves retrieval effectiveness, especially in text search, by resolving the mismatch between query terms and the inflected forms found in documents. Previous research on Bangla stemming mostly relied on recursive rule-based procedures that strip multiple suffixes from a single word to recover its root. Our proposed system extends that work by implementing a statistical stemming algorithm called N-gram stemming. Using an association measure called the Dice coefficient, related words are clustered according to their character structure, and the shortest word in each cluster may be taken as the stem. We additionally analyzed the Affinity Propagation clustering algorithm with coefficient similarity as well as with median similarity. Our results indicate that N-gram stemming is effective in general; it gave us around 87 clusters.
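
The pipeline described in the abstract (character n-grams, Dice coefficient, clustering, shortest word as stem) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The bigram size, the 0.6 similarity threshold, the greedy single-link clustering step, and the function names are all assumptions chosen for illustration, and English words are used only to keep the example easy to check by hand.

```python
from typing import Dict, List, Set


def char_ngrams(word: str, n: int = 2) -> Set[str]:
    """Return the set of unique character n-grams of a word.

    Words shorter than n are treated as a single n-gram (an assumption
    made here so that very short words still get a non-empty set).
    """
    if len(word) < n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient 2|X & Y| / (|X| + |Y|) over unique n-gram sets."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    return 2 * len(x & y) / (len(x) + len(y))


def cluster_words(words: List[str], threshold: float = 0.6,
                  n: int = 2) -> List[List[str]]:
    """Greedy single-link clustering (illustrative, not from the paper).

    Each word joins the first existing cluster containing at least one
    member whose Dice similarity reaches the threshold; otherwise it
    starts a new cluster.
    """
    clusters: List[List[str]] = []
    for word in words:
        for cluster in clusters:
            if any(dice(word, member, n) >= threshold for member in cluster):
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def stem_map(words: List[str], threshold: float = 0.6,
             n: int = 2) -> Dict[str, str]:
    """Map every word to the shortest word of its cluster."""
    mapping: Dict[str, str] = {}
    for cluster in cluster_words(words, threshold, n):
        stem = min(cluster, key=len)
        for word in cluster:
            mapping[word] = stem
    return mapping


if __name__ == "__main__":
    sample = ["play", "plays", "played", "playing", "work", "works", "worked"]
    for word, stem in stem_map(sample).items():
        print(word, "->", stem)
```

With these assumed settings, "plays", "played", and "playing" fall into the same cluster as "play", while "work", "works", and "worked" form a second cluster. Python strings are sequences of Unicode code points, so the same functions accept Bangla text, although a real system would need to decide how to treat combining vowel signs when forming n-grams.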


research
01/27/2017

Bangla Word Clustering Based on Tri-gram, 4-gram and 5-gram Language Model

In this paper, we describe a research method that generates Bangla word ...
research
07/27/2020

Next word prediction based on the N-gram model for Kurdish Sorani and Kurmanji

Next word prediction is an input technology that simplifies the process ...
research
02/17/2021

Contextual Skipgram: Training Word Representation Using Context Information

The skip-gram (SG) model learns word representation by predicting the wo...
research
12/11/2019

Character 3-gram Mover's Distance: An Effective Method for Detecting Near-duplicate Japanese-language Recipes

In websites that collect user-generated recipes, recipes are often poste...
research
01/08/2017

Sentence-level dialects identification in the greater China region

Identifying the different varieties of the same language is more challen...
research
03/31/2017

N-gram Language Modeling using Recurrent Neural Network Estimation

We investigate the effective memory depth of RNN models by using them fo...
