Differentiable Entailment for Parameter Efficient Few Shot Learning
Few-shot learning allows pre-trained language models to adapt to downstream tasks from a limited number of training examples. However, practical applications are limited when all model parameters must be optimized. In this work we apply a new technique for parameter-efficient few-shot learning while adopting a strict definition of parameter efficiency. Our training method combines 1) intermediate training by reformulating natural language tasks as entailment tasks <cit.> and 2) differentiable optimization of template and label tokens <cit.>. We quantify the tradeoff between parameter efficiency and performance in the few-shot regime and propose a simple, model-agnostic approach that can be extended to any task. By achieving competitive performance while optimizing only 3% of a model's parameters, and by allowing batched inference, our method enables more efficient practical deployment of models.
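To make the second component concrete, the sketch below shows one way to optimize a differentiable soft prompt on top of a frozen entailment-pretrained encoder, so that only the prompt parameters are trained. It is a minimal illustration under assumptions, not the paper's implementation: the checkpoint name, prompt length, and the sentiment example are hypothetical, and the label-token optimization described above is omitted for brevity.

```python
import torch
from torch import nn
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical choice of entailment-pretrained checkpoint; any NLI-style encoder could be substituted.
MODEL_NAME = "roberta-large-mnli"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

# Freeze every pretrained weight; only the soft prompt below receives gradients.
for p in model.parameters():
    p.requires_grad = False

embed = model.get_input_embeddings()        # frozen token-embedding table
n_prompt = 20                               # illustrative prompt length (assumption)
dim = embed.embedding_dim
soft_prompt = nn.Parameter(torch.randn(n_prompt, dim) * 0.02)

def entailment_logits(example_text: str, label_description: str) -> torch.Tensor:
    """Score an (example, label-description) pair as an entailment problem,
    with trainable prompt embeddings prepended to the frozen input embeddings."""
    enc = tokenizer(example_text, label_description, return_tensors="pt")
    tok_embeds = embed(enc["input_ids"])                      # (1, L, dim)
    prompt = soft_prompt.unsqueeze(0)                         # (1, P, dim)
    inputs_embeds = torch.cat([prompt, tok_embeds], dim=1)    # prepend soft prompt
    mask = torch.cat(
        [torch.ones(1, n_prompt, dtype=torch.long), enc["attention_mask"]], dim=1
    )
    return model(inputs_embeds=inputs_embeds, attention_mask=mask).logits

# Example: a sentiment task rewritten as entailment against a label description.
logits = entailment_logits(
    "The movie was a delight.", "This text expresses a positive sentiment."
)

# Only the soft prompt is passed to the optimizer, keeping the trainable
# parameter count a small fraction of the full model.
optimizer = torch.optim.AdamW([soft_prompt], lr=1e-3)
```

Because the backbone stays frozen, many tasks can share one deployed model and differ only in their small prompt tensors, which is what makes batched inference across tasks practical.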