Towards better substitution-based word sense induction

05/29/2019
by Asaf Amrami et al.

Word sense induction (WSI) is the task of unsupervised clustering of word usages within a sentence to distinguish senses. Recent work obtains strong results by clustering lexical substitutes derived from pre-trained RNN language models (ELMo). Adapting the method to BERT improves the scores even further. We extend the previous method to support a dynamic, rather than fixed, number of clusters, as supported by other prominent methods, and propose a method for interpreting the resulting clusters by associating them with their most informative substitutes. We then perform extensive error analysis revealing the remaining sources of errors in the WSI task. Our code is available at https://github.com/asafamr/bertwsi.

Related research

10/11/2022 · Word Sense Induction with Hierarchical Clustering and Mutual Information Maximization
Word sense induction (WSI) is a difficult problem in natural language pr...

06/23/2020 · Combining Neural Language Models for Word Sense Induction
Word sense induction (WSI) is the problem of grouping occurrences of an ...

03/27/2022 · DeepDPM: Deep Clustering With an Unknown Number of Clusters
Deep Learning (DL) has shown great promise in the unsupervised task of c...

01/25/2021 · PolyLM: Learning about Polysemy through Language Modeling
To avoid the "meaning conflation deficiency" of word embeddings, a numbe...

08/26/2018 · Word Sense Induction with Neural biLM and Symmetric Patterns
An established method for Word Sense Induction (WSI) uses a language mod...

11/22/2018 · AutoSense Model for Word Sense Induction
Word sense induction (WSI), or the task of automatically discovering mul...

06/30/2022 · Masked Part-Of-Speech Model: Does Modeling Long Context Help Unsupervised POS-tagging?
Previous Part-Of-Speech (POS) induction models usually assume certain in...
