Exploiting Sentence Order in Document Alignment

04/30/2020
by Brian Thompson, et al.

In this work, we exploit the simple idea that a document and its translation should contain approximately the same information, in approximately the same order. We propose methods for both document pair candidate generation and candidate re-scoring that incorporate high-level order information. Our method results in a 61% relative reduction in error compared to the best previously published result on the WMT16 document alignment shared task. We also apply our method to web-scraped Sinhala-English documents from ParaCrawl and find that it improves MT performance by 1.2 BLEU over the current ParaCrawl document alignment method.
