Deriving Contextualised Semantic Features from BERT (and Other Transformer Model) Embeddings

by   Jacob Turton, et al.

Models based on the transformer architecture, such as BERT, have marked a crucial step forward in the field of Natural Language Processing. Importantly, they allow the creation of word embeddings that capture important semantic information about words in context. However, as single entities, these embeddings are difficult to interpret and the models used to create them have been described as opaque. Binder and colleagues proposed an intuitive embedding space where each dimension is based on one of 65 core semantic features. Unfortunately, the space only exists for a small dataset of 535 words, limiting its uses. Previous work (Utsumi, 2018, 2020, Turton, Vinson Smith, 2020) has shown that Binder features can be derived from static embeddings and successfully extrapolated to a large new vocabulary. Taking the next step, this paper demonstrates that Binder features can be derived from the BERT embedding space. This provides contextualised Binder embeddings, which can aid in understanding semantic differences between words in context. It additionally provides insights into how semantic features are represented across the different layers of the BERT model.


page 5

page 15

page 16

page 17


What do you mean, BERT? Assessing BERT as a Distributional Semantics Model

Contextualized word embeddings, i.e. vector representations for words in...

Learning Semantic Representations for Novel Words: Leveraging Both Form and Context

Word embeddings are a key component of high-performing natural language ...

Context encoders as a simple but powerful extension of word2vec

With a simple architecture and the ability to learn meaningful word embe...

Analyzing Transformer Dynamics as Movement through Embedding Space

Transformer language models exhibit intelligent behaviors such as unders...

Variation and Instability in Dialect-Based Embedding Spaces

This paper measures variation in embedding spaces which have been traine...

Diagnosing BERT with Retrieval Heuristics

Word embeddings, made widely popular in 2013 with the release of word2ve...

An Accurate Model for Predicting the (Graded) Effect of Context in Word Similarity Based on Bert

Natural Language Processing (NLP) has been widely used in the semantic a...

Please sign up or login with your details

Forgot password? Click here to reset