wubi2en: Character-level Chinese-English Translation through ASCII Encoding

05/09/2018
by Mi Xue Tan, et al.

Character-level Neural Machine Translation (NMT) models have recently achieved impressive results on many language pairs. They do particularly well for Indo-European language pairs, where the languages share the same writing system. However, for translating between Chinese and English, the gap between the two different writing systems poses a major challenge due to a lack of systematic correspondence between the individual linguistic units. In this paper, we enable character-level NMT for Chinese by breaking Chinese characters down into linguistic units similar to those of Indo-European languages, using the Wubi encoding scheme. We show promising results from training Wubi-based models at the subword and character levels with recurrent as well as convolutional models.
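To make the idea of an ASCII encoding concrete, the following is a minimal sketch of how Chinese text might be mapped to Wubi-style letter sequences before being split into character-level tokens. It is not the authors' implementation. The three-entry table `TOY_WUBI`, the function names, the space separator between codes, and the `_` boundary symbol are all assumptions made for illustration; the codes shown should be checked against a full Wubi table, and a real table would also need a way to disambiguate characters that share a code.

```python
# Toy lookup table (assumption): a real system would load a complete Wubi
# table. The codes below are illustrative and should be verified.
TOY_WUBI = {"中": "khk", "国": "lgyi", "人": "wwww"}


def encode(text, table=TOY_WUBI):
    """Replace each known Chinese character with its ASCII code.

    Codes are joined with single spaces; characters missing from the
    table are passed through unchanged.
    """
    return " ".join(table.get(ch, ch) for ch in text)


def decode(encoded, table=TOY_WUBI):
    """Invert encode(), assuming the table is one-to-one."""
    reverse = {code: ch for ch, code in table.items()}
    return "".join(reverse.get(tok, tok) for tok in encoded.split(" "))


def char_tokens(encoded, boundary="_"):
    """Split an encoded string into character-level tokens.

    Spaces between codes become an explicit boundary token so that the
    original character boundaries remain recoverable.
    """
    return [boundary if ch == " " else ch for ch in encoded]


if __name__ == "__main__":
    ascii_text = encode("中国人")
    print(ascii_text)
    print(char_tokens(ascii_text))
    print(decode(ascii_text))
```

Under this sketch, each Chinese character becomes a short run of Latin letters, so a character-level model sees source and target sequences drawn from the same small alphabet.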
