Large language models, which are often trained for hundreds of thousands...
Mixture of Experts layers (MoEs) enable efficient scaling of language mo...
Large-scale autoregressive language models such as GPT-3 are few-shot le...
Scaling semantic parsing models for task-oriented dialog systems to new ...