That Slepen Al the Nyght with Open Ye! Cross-era Sequence Segmentation with Switch-memory

09/07/2022
by Xuemei Tang, et al.

Language evolves gradually. Grammar, vocabulary, and lexical semantics shift over time, resulting in a diachronic linguistic gap. As a result, a considerable number of texts are written in the language of different eras, which creates obstacles for natural language processing tasks such as word segmentation and machine translation. Although the Chinese language has a long history, previous Chinese natural language processing research has primarily focused on tasks within a specific era. We therefore propose CROSSWISE, a cross-era learning framework for Chinese word segmentation (CWS) that uses a Switch-memory (SM) module to incorporate era-specific linguistic knowledge. Experiments on four corpora from different eras show that performance on each corpus improves significantly. Further analyses demonstrate that the SM module can effectively integrate era-specific knowledge into the neural network.

