Reliable Measures of Spread in High Dimensional Latent Spaces

12/15/2022
by Anna C. Marbut, et al.

Understanding geometric properties of natural language processing models' latent spaces allows the manipulation of these properties for improved performance on downstream tasks. One such property is the amount of data spread in a model's latent space, or how fully the available latent space is being used. In this work, we define data spread and demonstrate that the commonly used measures of data spread, Average Cosine Similarity and a partition function min/max ratio I(V), do not provide reliable metrics to compare the use of latent space across models. We propose and examine eight alternative measures of data spread, all but one of which improve over these current metrics when applied to seven synthetic data distributions. Of our proposed measures, we recommend one principal component-based measure and one entropy-based measure that provide reliable, relative measures of spread and can be used to compare models of different sizes and dimensionalities.
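As a concrete reference point for the first baseline named above, here is a minimal sketch of Average Cosine Similarity in plain Python. The helper names (`average_cosine_similarity`, `gaussian_cloud`) and the toy data are assumptions made for illustration; they are not taken from the paper, and the comparison is not one of its seven synthetic distributions. The two clouds have the same variance around their own means, and one is simply shifted away from the origin.

```python
import math
import random


def average_cosine_similarity(vectors):
    """Mean cosine similarity over all distinct pairs of vectors."""
    # Scale every vector to unit length (assumes no all-zero vectors).
    units = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        units.append([x / norm for x in v])

    total, pairs = 0.0, 0
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            # Dot product of two unit vectors is their cosine similarity.
            total += sum(a * b for a, b in zip(units[i], units[j]))
            pairs += 1
    return total / pairs


def gaussian_cloud(n, dim, offset, rng):
    """Hypothetical toy data: n standard-normal points shifted by `offset` on every axis."""
    return [[rng.gauss(0.0, 1.0) + offset for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
centered = gaussian_cloud(200, 16, 0.0, rng)
shifted = gaussian_cloud(200, 16, 5.0, rng)

acs_centered = average_cosine_similarity(centered)
acs_shifted = average_cosine_similarity(shifted)
print(f"centered cloud: {acs_centered:.3f}")
print(f"shifted cloud:  {acs_shifted:.3f}")
```

In this toy setup the centered cloud scores near zero while the shifted cloud scores close to one, even though both are equally spread around their means. That is one way a cosine-based score can conflate position with spread; whether this is the specific failure the authors analyze is not stated in the abstract.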

