Exploiting Sentence Order in Document Alignment

04/30/2020

In this work, we exploit the simple idea that a document and its translation should contain approximately the same information, in approximately the same order. We propose methods for both document pair candidate generation and candidate re-scoring which incorporate high-level order information. Our method results in a 61% relative reduction in error compared to the best previously published result on the WMT16 document alignment shared task. We also apply our method to web-scraped Sinhala-English documents from ParaCrawl and find that it improves MT performance by 1.2 BLEU over the current ParaCrawl document alignment method.
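
To make the ordering idea concrete, the following is a minimal illustrative sketch, not the method proposed in this work. It assumes each sentence has already been mapped to a vector in a shared cross-lingual space (the toy vectors below are made up), and it scores a candidate document pair by the best order-preserving one-to-one matching of sentences. The names `cosine` and `order_aware_score` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def order_aware_score(src, tgt):
    """Score a document pair by its best monotonic sentence matching.

    src, tgt: lists of sentence vectors (assumed to be cross-lingual embeddings).
    Sentences may be skipped, but matched sentences must appear in the same
    order in both documents. The total similarity of the best matching is
    divided by the longer document's length, so skipped or reordered
    sentences lower the score.
    """
    n, m = len(src), len(tgt)
    if n == 0 or m == 0:
        return 0.0
    # best[i][j] = best total similarity using the first i source
    # sentences and the first j target sentences.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Negative similarities are clipped so a bad match never beats a skip.
            match = best[i - 1][j - 1] + max(cosine(src[i - 1], tgt[j - 1]), 0.0)
            best[i][j] = max(match, best[i - 1][j], best[i][j - 1])
    return best[n][m] / max(n, m)


# Toy example: three mutually orthogonal "sentence" vectors.
doc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
same_order = order_aware_score(doc, doc)
reversed_order = order_aware_score(doc, list(reversed(doc)))
print(same_order, reversed_order)
```

In this toy case the same-order pair scores higher than the reversed pair even though both contain identical sentences, which is the kind of signal that an order-agnostic bag-of-sentences comparison would miss.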
