XSTEM: An exemplar-based stemming algorithm

05/09/2022
by Kirk Baker, et al.

Stemming is the process of reducing related words to a standard form by removing affixes from them. Existing algorithms vary with respect to their complexity, configurability, handling of unknown words, and ability to avoid under- and over-stemming. This paper presents a fast, simple, configurable, high-precision, high-recall stemming algorithm that combines the simplicity and performance of word-based lookup tables with the strong generalizability of rule-based methods to avert problems with out-of-vocabulary words.
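
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it names: consult a word-based lookup table first, then fall back to suffix rules for out-of-vocabulary words. It is not the XSTEM algorithm itself. The names `EXEMPLARS`, `SUFFIX_RULES`, `MIN_STEM_LENGTH`, and `stem`, as well as the table entries and rules, are hypothetical and chosen purely for illustration.

```python
# Illustrative sketch only: NOT the XSTEM algorithm from the paper.
# All table entries, rules, and names below are assumptions.

# Hypothetical lookup table mapping known words to their stems.
EXEMPLARS = {
    "running": "run",
    "ran": "run",
    "studies": "study",
}

# Hypothetical suffix rules as (suffix, replacement), longest suffix first.
SUFFIX_RULES = [
    ("ies", "y"),
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]

# Assumed guard: reject rule results shorter than this to limit over-stemming.
MIN_STEM_LENGTH = 3


def stem(word: str) -> str:
    """Return a stem via table lookup, falling back to suffix rules."""
    token = word.lower()

    # 1. Known words are resolved directly from the lookup table.
    if token in EXEMPLARS:
        return EXEMPLARS[token]

    # 2. Unknown words fall through to the first suffix rule that applies
    #    and leaves a stem of acceptable length.
    for suffix, replacement in SUFFIX_RULES:
        if token.endswith(suffix):
            candidate = token[: len(token) - len(suffix)] + replacement
            if len(candidate) >= MIN_STEM_LENGTH:
                return candidate

    # 3. No rule applied: return the word unchanged.
    return token


if __name__ == "__main__":
    for w in ["running", "ponies", "jumped", "is"]:
        print(w, "->", stem(w))
```

In this sketch the table handles irregular forms such as "ran" that no suffix rule could recover, while the rules cover words the table has never seen; the length guard is one simple way to keep short words like "is" from being over-stemmed.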
