In this work, we develop and release Llama 2, a collection of pretrained...
Driven by the goal of eradicating language barriers on a global scale, m...
State-of-the-art encoder-decoder models (e.g. for machine translation (M...
We describe Facebook's multilingual model submission to the WMT2021 shar...
Document-level machine translation conditions on surrounding sentences t...
Pre-training models on vast quantities of unlabeled data has emerged as ...
Existing work in translation demonstrated the potential of massively mul...
Many semi- and weakly-supervised approaches have been investigated for o...
Open-domain question answering relies on efficient passage retrieval to ...
This paper demonstrates that multilingual denoising pre-training produce...
We show that margin-based bitext mining in a multilingual sentence space...
Supervised ASR models have reached unprecedented levels of accuracy, tha...
Back-translation is a widely used data augmentation technique which leve...
This paper describes Facebook FAIR's submission to the WMT19 shared news...
The lottery ticket hypothesis proposes that over-parameterization of dee...
fairseq is an open-source sequence modeling toolkit that allows research...
Pre-trained language model representations have been successful in a wid...
We present a new approach for pretraining a bi-directional transformer m...
An effective method to improve neural machine translation with monolingu...
Sequence to sequence learning models still require several days to reach...
There has been much recent work on training neural attention models at t...