A Streaming Approach For Efficient Batched Beam Search

10/05/2020
by Kevin Yang, et al.

We propose an efficient batching strategy for variable-length decoding on GPU architectures. During decoding, when candidates terminate or are pruned according to heuristics, our streaming approach periodically "refills" the batch before proceeding with a selected subset of candidates. We apply our method to variable-width beam search on a state-of-the-art machine translation model. Our method decreases runtime by up to 71% compared to a fixed-width beam search baseline and 17% compared to a variable-width baseline, while matching baselines' BLEU. Finally, experiments show that our method can speed up decoding in other domains, such as semantic and syntactic parsing.
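To make the batch-"refill" idea concrete, here is a minimal sketch (not the authors' implementation) of a streaming batched decoding loop: a fixed-size batch of active candidates is kept full by pulling new inputs from a pending queue whenever candidates finish or are pruned, instead of letting the batch shrink. The functions `toy_step` and `is_finished` are hypothetical stand-ins for a real model's scoring step and termination/pruning check.

```python
from collections import deque
import random

BATCH_SIZE = 4
MAX_STEPS = 20


def toy_step(candidate):
    """Hypothetical decoder step: extend the candidate by one token."""
    candidate["tokens"].append(random.randint(0, 9))
    return candidate


def is_finished(candidate):
    """Hypothetical termination check (e.g. EOS reached or length cap)."""
    return len(candidate["tokens"]) >= candidate["target_len"]


def streaming_batched_decode(inputs):
    # Queue of sentences waiting to enter the batch.
    pending = deque({"id": i, "tokens": [], "target_len": random.randint(3, 8)}
                    for i in inputs)
    active, outputs = [], []

    # Fill the initial batch.
    while pending and len(active) < BATCH_SIZE:
        active.append(pending.popleft())

    for _ in range(MAX_STEPS * max(len(pending) + len(active), 1)):
        if not active:
            break
        # One batched decoder step over all active candidates.
        active = [toy_step(c) for c in active]

        # Retire finished candidates and refill from the queue, so the
        # GPU batch stays (nearly) full rather than shrinking over time.
        still_active = []
        for c in active:
            if is_finished(c):
                outputs.append(c)
            else:
                still_active.append(c)
        while pending and len(still_active) < BATCH_SIZE:
            still_active.append(pending.popleft())
        active = still_active

    return sorted(outputs, key=lambda c: c["id"])


if __name__ == "__main__":
    for out in streaming_batched_decode(range(10)):
        print(out["id"], out["tokens"])
```

In this sketch each "candidate" is a single hypothesis; in the paper's setting the same scheduling applies to beams of variable width, where pruning heuristics shrink beams at different rates and refilling keeps the GPU batch dimension saturated.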
