Exploring and Adapting Chinese GPT to Pinyin Input Method

03/01/2022
by Minghuan Tan, et al.

While GPT has become the de facto method for text generation tasks, its application to the pinyin input method remains unexplored. In this work, we make the first attempt to leverage Chinese GPT for the pinyin input method. We find that a frozen GPT achieves state-of-the-art performance on perfect pinyin. However, performance drops dramatically when the input includes abbreviated pinyin. One reason is that an abbreviated pinyin can be mapped to many perfect pinyin, which in turn map to an even larger number of Chinese characters. We mitigate this issue with two strategies: enriching the context with pinyin, and optimizing the training process to help distinguish homophones. To further facilitate the evaluation of pinyin input methods, we create a dataset of 270K instances from 15 domains. Results show that our approach improves performance on abbreviated pinyin across all domains. Model analysis demonstrates that both strategies contribute to the performance boost.
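To make the ambiguity argument concrete, the following is a minimal sketch, assuming a tiny hypothetical lexicon (SYLLABLE_TO_CHARS) and a hypothetical stand-in for language-model scores (TOY_LM_SCORES). It is not the authors' method: it only shows how an abbreviated input such as "zh g" expands to many perfect pinyin syllables and then to many more character strings than the perfect input "zhong guo", and how a model score could be used to rank the resulting candidates. A real system would score candidates with a frozen GPT rather than a lookup table.

```python
from itertools import product

# Hypothetical toy lexicon: perfect pinyin syllable -> candidate characters.
# A real input method uses the full syllable inventory; these entries are illustrative only.
SYLLABLE_TO_CHARS = {
    "zhong": ["中", "钟", "种"],
    "zhuang": ["装", "庄"],
    "zhi": ["之", "知", "只"],
    "guo": ["国", "过"],
    "gong": ["工", "公"],
    "gu": ["古", "股"],
}

# Hypothetical stand-in for a frozen language model's log-probability of a string.
# Strings not listed receive a low default score.
TOY_LM_SCORES = {"中国": -1.0, "中古": -4.0}


def expand_abbreviation(prefix: str) -> list[str]:
    """Return every perfect pinyin syllable in the toy lexicon that starts with `prefix`."""
    return sorted(s for s in SYLLABLE_TO_CHARS if s.startswith(prefix))


def candidate_strings(pinyin: list[str]) -> list[str]:
    """Enumerate all character strings consistent with a sequence of (possibly abbreviated) pinyin."""
    per_position = []
    for prefix in pinyin:
        # Abbreviated pinyin -> many perfect syllables -> even more characters.
        chars = [c for syl in expand_abbreviation(prefix) for c in SYLLABLE_TO_CHARS[syl]]
        per_position.append(chars)
    return ["".join(combo) for combo in product(*per_position)]


def rank(candidates: list[str], top_k: int = 3) -> list[str]:
    """Rank candidates by the toy score (highest first) and keep the top_k."""
    return sorted(candidates, key=lambda s: TOY_LM_SCORES.get(s, -20.0), reverse=True)[:top_k]


if __name__ == "__main__":
    perfect = candidate_strings(["zhong", "guo"])
    abbreviated = candidate_strings(["zh", "g"])
    print(len(perfect), len(abbreviated))
    print(rank(abbreviated, top_k=2))
```

With this toy lexicon, the perfect input yields 6 candidate strings while the abbreviated input yields 48, since "zh" matches three syllables and "g" matches three. The scorer then has to separate the intended string from a much larger pool of homophone-like alternatives, which is the gap the two strategies in the abstract are meant to narrow.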


Related research

Polyphone Disambiguition in Mandarin Chinese with Semi-Supervised Learning (02/01/2021)
The majority of Chinese characters are monophonic, i.e., their pronunciati...

CalliGAN: Style and Structure-aware Chinese Calligraphy Character Generator (05/26/2020)
Chinese calligraphy is the writing of Chinese characters as an art form ...

Unsupervised Typography Transfer (02/07/2018)
Traditional methods in Chinese typography synthesis view characters as a...

Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training (05/30/2023)
We introduce CDBERT, a new learning paradigm that enhances the semantics...

CCLAP: Controllable Chinese Landscape Painting Generation via Latent Diffusion Model (04/09/2023)
With the development of deep generative models, recent years have seen g...

Generating Major Types of Chinese Classical Poetry in a Uniformed Framework (03/13/2020)
Poetry generation is an interesting research topic in the field of text ...

Optimizing the Learning Order of Chinese Characters Using a Novel Topological Sort Algorithm (02/28/2016)
We present a novel algorithm for optimizing the order in which Chinese c...