Intrinsic dimension estimation for discrete metrics

07/20/2022
by Iuri Macocco, et al.

Real-world datasets characterized by discrete features are ubiquitous: from categorical surveys to clinical questionnaires, from unweighted networks to DNA sequences. Nevertheless, the most common unsupervised dimensionality reduction methods are designed for continuous spaces, and applying them to discrete spaces can lead to errors and biases. In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces. We demonstrate its accuracy on benchmark datasets, and we apply it to a metagenomic dataset for species fingerprinting, finding a surprisingly small ID, of order 2. This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of sequence space.
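
To illustrate why estimators built for continuous spaces can misbehave on discrete data, here is a minimal sketch. It is not the algorithm introduced in the letter. It assumes a Hamming metric on toy DNA strings, and the helper names `hamming` and `nearest_two` and the sequences are hypothetical. The point it shows is that discrete distances take integer values, so the first and second nearest-neighbor distances often tie, and any estimator that relies on their ratio degenerates.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def nearest_two(seqs: list[str], i: int) -> tuple[int, int]:
    """Distances from seqs[i] to its first and second nearest neighbors."""
    dists = sorted(hamming(seqs[i], s) for j, s in enumerate(seqs) if j != i)
    return dists[0], dists[1]


# Toy sequences, made up for illustration only.
seqs = ["ACGT", "ACGA", "ACGC", "TCGT", "AGGT"]

for i, s in enumerate(seqs):
    r1, r2 = nearest_two(seqs, i)
    # On continuous data r2 / r1 is almost surely > 1; here ties give exactly 1.
    print(s, r1, r2, r2 / r1)
```

In this toy set, several points have a ratio of exactly 1, which would carry no information in a ratio-based estimator designed for continuous distances.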

research
01/18/2019

Estimating the effective dimension of large biological datasets using Fisher separability analysis

Modern large-scale datasets are frequently said to be high-dimensional. ...
research
06/18/2019

Intrinsic dimension estimation for locally undersampled data

High-dimensional data are ubiquitous in contemporary science and finding...
research
10/11/2022

Intrinsic Dimension for Large-Scale Geometric Learning

The concept of dimension is essential to grasp the complexity of data. A...
research
04/18/2021

The Intrinsic Dimension of Images and Its Impact on Learning

It is widely believed that natural image data exhibits low-dimensional s...
research
04/16/2023

Autoencoders with Intrinsic Dimension Constraints for Learning Low Dimensional Image Representations

Autoencoders have achieved great success in various computer vision appl...
research
03/19/2018

Estimating the intrinsic dimension of datasets by a minimal neighborhood information

Analyzing large volumes of high-dimensional data is an issue of fundamen...
research
04/05/2023

Local Intrinsic Dimensional Entropy

Most entropy measures depend on the spread of the probability distributi...
