'Rarely' a problem? Language models exhibit inverse scaling in their predictions following 'few'-type quantifiers

12/16/2022
by   James A. Michaelov, et al.

Language models appear to perform poorly on quantification, and we ask how badly. 'Few'-type quantifiers, as in 'few children like vegetables', might pose a particular challenge for language models: the sentence components without the quantifier are likely to co-occur, and 'few'-type quantifiers themselves are rare. We present 960 sentence stimuli from two human neurolinguistic experiments to 22 autoregressive transformer models of differing sizes. Not only do the models perform poorly on 'few'-type quantifiers, but overall, the larger the model, the worse its performance. We interpret this inverse scaling as suggesting that larger models increasingly reflect online rather than offline human processing, and argue that the decreasing performance of larger models may challenge the use of language models as the basis for natural language systems.
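The standard metric in studies of this kind is surprisal: the negative log-probability a model assigns to a word in context. A minimal sketch of the comparison follows; the probability values are purely illustrative assumptions, not taken from the paper:

```python
import math

def surprisal(p):
    """Surprisal in bits: -log2 of the probability the model assigns to a word."""
    return -math.log2(p)

# Hypothetical model probabilities for the final word "vegetables"
# after each quantifier (illustrative values only):
p_most = 0.20   # "Most children like vegetables" -- plausible continuation
p_few = 0.18    # "Few children like vegetables" -- model barely adjusts

# A model sensitive to the quantifier should assign a much lower
# probability (hence much higher surprisal) after 'few'; near-equal
# surprisals indicate the quantifier is being largely ignored.
print(f"most: {surprisal(p_most):.2f} bits")
print(f"few:  {surprisal(p_few):.2f} bits")
```

In practice the per-token probabilities would come from each autoregressive model's softmax output over the vocabulary, and the surprisals for 'few'-type and 'most'-type contexts would be compared across the stimulus set.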


Related research:

- Assessing Language Models with Scaling Properties (04/24/2018)
- Testing the limits of natural language models for predicting human language judgments (04/07/2022)
- Probing Quantifier Comprehension in Large Language Models (06/12/2023)
- Do language models make human-like predictions about the coreferents of Italian anaphoric zero pronouns? (08/30/2022)
- Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models (05/27/2023)
- Inverse Scaling: When Bigger Isn't Better (06/15/2023)
- Emergent inabilities? Inverse scaling over the course of pretraining (05/24/2023)
