Chinese Spelling Error Detection Using a Fusion Lattice LSTM

11/25/2019
by Hao Wang, et al.

Spelling error detection serves as a crucial preprocessing step in many natural language processing applications. Due to the characteristics of the Chinese language, Chinese spelling error detection is more challenging than error detection in English. Existing methods mainly follow a pipeline framework, which artificially divides the error detection process into two steps. As a result, these methods suffer from error propagation and cannot always work well given the complexity of the language environment. In addition, existing methods adopt only character or word information and ignore the positive effect of fusing character, word, and pinyin information together. We propose an LF-LSTM-CRF model, which extends the LSTM-CRF with word lattices and character-pinyin-fusion inputs. Our model takes advantage of the end-to-end framework to detect errors in a single process, and dynamically integrates character, word, and pinyin information. Experiments on the SIGHAN data show that our LF-LSTM-CRF consistently outperforms existing methods that use similar external resources, and confirm both the feasibility of adopting the end-to-end framework and the benefit of integrating character, word, and pinyin information.
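
To make the two kinds of input named in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a word lattice is built by matching every multi-character substring against a lexicon, as is common in lattice LSTM work, and that character-pinyin fusion starts from pairing each character with its pinyin. The abstract does not specify how the model actually combines these signals. The function names, the toy lexicon, and the toy pinyin table are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_word_lattice(
    chars: str, lexicon: Set[str], max_len: int = 4
) -> List[Tuple[int, int, str]]:
    """Return (start, end, word) for every multi-character lexicon word in chars.

    `end` is exclusive. Overlapping matches are all kept, which is what
    makes the result a lattice rather than a single segmentation.
    """
    spans = []
    for start in range(len(chars)):
        last_end = min(len(chars), start + max_len)
        for end in range(start + 2, last_end + 1):
            word = chars[start:end]
            if word in lexicon:
                spans.append((start, end, word))
    return spans


def fuse_char_pinyin(
    chars: str, pinyin_table: Dict[str, str], unk: str = "<unk>"
) -> List[Tuple[str, str]]:
    """Pair each character with its pinyin (assumed fusion input format)."""
    return [(c, pinyin_table.get(c, unk)) for c in chars]


# Hypothetical toy resources, for illustration only.
LEXICON = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
PINYIN = {"南": "nan", "京": "jing", "市": "shi", "长": "chang",
          "江": "jiang", "大": "da", "桥": "qiao"}

sentence = "南京市长江大桥"
lattice = build_word_lattice(sentence, LEXICON)
fused = fuse_char_pinyin(sentence, PINYIN)
```

In this toy sentence the lattice keeps both "南京市" and "市长" even though they overlap, so a downstream model can weigh competing word hypotheses instead of committing to one segmentation up front.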

research
05/05/2018

Chinese NER Using Lattice LSTM

We investigate a lattice-structured LSTM model for Chinese NER, which en...
research
09/03/2019

Aspect Detection using Word and Char Embeddings with (Bi)LSTM and CRF

We proposed a new accurate aspect extraction method that makes use of bo...
research
11/07/2019

Enhancing Pre-trained Chinese Character Representation with Word-aligned Attention

Most Chinese pre-trained encoders take a character as a basic unit and l...
research
10/30/2018

Subword Encoding in Lattice LSTM for Chinese Word Segmentation

We investigate a lattice LSTM network for Chinese word segmentation (CWS...
research
05/23/2018

Enhancing Chinese Intent Classification by Dynamically Integrating Character Features into Word Embeddings with Ensemble Techniques

Intent classification has been widely researched on English data with de...
research
02/27/2018

A Hybrid Word-Character Model for Abstractive Summarization

Abstractive summarization is the popular research topic nowadays. Due to...
research
08/14/2019

Raw-to-End Name Entity Recognition in Social Media

Taking word sequences as the input, typical named entity recognition (NE...
