Word Sense Induction with Knowledge Distillation from BERT

04/20/2023
by Anik Saha, et al.

Pre-trained contextual language models are ubiquitously employed for language understanding tasks, but they are often unsuitable for resource-constrained systems. Non-contextual word embeddings are an efficient alternative in these settings. Such methods typically use a single vector to encode the multiple meanings of a word and therefore incur errors due to polysemy. This paper proposes a two-stage method to distill multiple word senses from a pre-trained language model (BERT) by using attention over the senses of a word in a context and transferring this sense information to fit multi-sense embeddings in a skip-gram-like framework. We demonstrate an effective approach to training the sense-disambiguation mechanism in our model with a distribution over word senses extracted from the output-layer embeddings of BERT. Experiments on contextual word similarity and sense induction tasks show that this method is superior to or competitive with state-of-the-art multi-sense embeddings on multiple benchmark data sets, and experiments with an embedding-based topic model (ETM) demonstrate the benefits of using these multi-sense embeddings in a downstream application.
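The core mechanism the abstract describes, attention over a word's sense vectors given a context, can be sketched in a few lines. This is a minimal illustration with hypothetical random vectors, not the paper's actual implementation: the paper derives the target sense distribution from BERT's output-layer embeddings, whereas here the context vector is simply random.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_SENSES = 8, 3

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical multi-sense embedding for one word: one vector per sense.
sense_vectors = rng.normal(size=(N_SENSES, DIM))

# A stand-in context vector; in the paper, sense information distilled
# from BERT's output layer supervises this disambiguation step.
context = rng.normal(size=DIM)

# Attention over senses: a distribution over which sense fits the context.
attn = softmax(sense_vectors @ context)

# Expected sense vector, usable as the word representation in a
# skip-gram-like objective that predicts surrounding context words.
word_repr = attn @ sense_vectors

print(attn.shape, word_repr.shape)  # (3,) (8,)
```

In training, the attention distribution would be fit against the BERT-derived sense distribution (the distillation step), while the sense vectors themselves are updated through the skip-gram-like loss.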


Related research

- Sense representations for Portuguese: experiments with sense embeddings and deep neural language models (08/31/2021)
- Sense Embedding Learning for Word Sense Induction (06/17/2016)
- PolyLM: Learning about Polysemy through Language Modeling (01/25/2021)
- Efficient Graph-based Word Sense Induction by Distributional Inclusion Vector Embeddings (04/09/2018)
- Multi-Sense Language Modelling (12/10/2020)
- Russian word sense induction by clustering averaged word embeddings (05/06/2018)
- Probing for Understanding of English Verb Classes and Alternations in Large Pre-trained Language Models (09/11/2022)
