The Trade-offs of Domain Adaptation for Neural Language Models
In this paper, we connect language model adaptation with concepts from machine learning theory. We consider a training setup with a large out-of-domain set and a small in-domain set. As a first contribution, we derive how the benefit of training a model on either set depends on the size of the sets and the distance between their underlying distributions. As a second contribution, we show how the most popular data selection techniques – importance sampling, intelligent data selection and influence functions – can be cast in a common framework which highlights their similarities as well as their subtle differences.
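For concreteness, here is a minimal sketch of one of these techniques: intelligent data selection is commonly instantiated as cross-entropy difference scoring in the style of Moore and Lewis (2010), which keeps the out-of-domain sentences that an in-domain language model scores well relative to an out-of-domain model. The unigram models, helper names, and toy data below are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter
import math

def train_unigram(corpus, vocab):
    """Fit an add-one-smoothed unigram LM over `vocab` from a list of token lists."""
    counts = Counter(tok for sent in corpus for tok in sent)
    total = sum(counts.values()) + len(vocab)
    return {tok: (counts[tok] + 1) / total for tok in vocab}

def cross_entropy(model, sent):
    """Average negative log2-probability of `sent` under `model`, in bits per token."""
    return -sum(math.log2(model[tok]) for tok in sent) / len(sent)

def select_in_domain(out_domain, in_domain, budget):
    """Rank out-of-domain sentences by H_in(s) - H_out(s) and keep the
    `budget` lowest-scoring ones, i.e. those that look most in-domain."""
    vocab = {tok for sent in out_domain + in_domain for tok in sent}
    p_in = train_unigram(in_domain, vocab)
    p_out = train_unigram(out_domain, vocab)
    return sorted(out_domain,
                  key=lambda s: cross_entropy(p_in, s) - cross_entropy(p_out, s))[:budget]

# Toy usage: the medical-sounding sentence is ranked as most in-domain.
in_domain = [["the", "patient", "shows", "mild", "symptoms"]]
out_domain = [["the", "market", "rallied", "sharply", "today"],
              ["the", "patient", "was", "discharged", "early"]]
print(select_in_domain(out_domain, in_domain, budget=1))
```

In practice the two language models would be far stronger than unigrams (e.g. n-gram or neural LMs), but the selection criterion is the same cross-entropy difference shown here.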