Enhancing Sequence-to-Sequence Neural Lemmatization with External Resources

01/28/2021
by Kirill Milintsevich, et al.

We propose a novel hybrid approach to lemmatization that enhances the seq2seq neural model with additional lemmas extracted from an external lexicon or a rule-based system. During training, the enhanced lemmatizer learns both to generate lemmas via a sequential decoder and to copy lemma characters from the external candidates supplied at run time. Our lemmatizer enhanced with candidates extracted from the Apertium morphological analyzer achieves statistically significant improvements over baseline models that do not use additional lemma information. It reaches an average accuracy of 97.25% on a set of 23 UD languages, which is 0.55% higher than that obtained with the Stanford Stanza model on the same set of languages. We also compare with other methods of integrating external data into lemmatization and show that our enhanced system performs considerably better than a simple lexicon extension method based on the Stanza system, and that its improvements are complementary to those of the data augmentation method.
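As a rough illustration of how generating and copying can be combined at a single decoding step, the sketch below mixes a decoder's character distribution with attention weights over the characters of an external candidate lemma, in the style of a pointer-generator mixture. This is an assumption made for illustration only and is not claimed to be the architecture used in the paper; the function name `mix_generate_and_copy`, the gate `p_gen`, and all toy probabilities are hypothetical.

```python
from collections import defaultdict


def mix_generate_and_copy(p_vocab, copy_attention, candidate_chars, p_gen):
    """Blend a generation distribution with a copy distribution (illustrative).

    p_vocab:         dict mapping character -> probability from the decoder
    copy_attention:  attention weights over the candidate's characters
    candidate_chars: characters of the external candidate lemma
    p_gen:           assumed gate in [0, 1]; weight given to generation
    """
    if len(copy_attention) != len(candidate_chars):
        raise ValueError("one attention weight is needed per candidate character")
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")

    final = defaultdict(float)
    # Generation path: scale the decoder's own distribution by p_gen.
    for ch, prob in p_vocab.items():
        final[ch] += p_gen * prob
    # Copy path: add attention mass onto the characters being attended to.
    for ch, weight in zip(candidate_chars, copy_attention):
        final[ch] += (1.0 - p_gen) * weight
    return dict(final)


# Toy example: the decoder alone never proposes "u", but the external
# candidate "run" contains it and receives most of the attention.
p_vocab = {"r": 0.5, "a": 0.3, "n": 0.2}
candidate = list("run")
attention = [0.1, 0.8, 0.1]

dist = mix_generate_and_copy(p_vocab, attention, candidate, p_gen=0.5)
best = max(dist, key=dist.get)
print(best, round(dist[best], 2))
```

In this toy setting the copy path lets the model emit a character that the generation path assigns no probability to, which is the intuition behind supplying external candidates at run time.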
