Time-inhomogeneous diffusion geometry and topology

03/28/2022
by Guillaume Huguet, et al.

Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process in which each step first computes a diffusion operator from the current data and then applies it to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show that diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic diffusion homology. We use this intrinsic topology as well as an ambient topology to study how the data changes over diffusion time. We demonstrate both homologies in well-understood toy examples. Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.
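
The "compute, then apply" step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the kernel or bandwidth, so the Gaussian kernel, the fixed bandwidth `epsilon`, the stopping tolerance `tol`, and the function names `diffusion_operator`, `diameter`, and `condense` are all assumptions made for illustration; the paper's actual construction may differ.

```python
import numpy as np


def diffusion_operator(X, epsilon):
    """Row-stochastic diffusion operator built from the current points.

    Assumption: a Gaussian kernel with fixed bandwidth `epsilon`.
    """
    diffs = X[:, None, :] - X[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    K = np.exp(-sq_dists / epsilon)
    # Normalize each row so it is a probability distribution.
    return K / K.sum(axis=1, keepdims=True)


def diameter(X):
    """Largest pairwise Euclidean distance among the rows of X."""
    diffs = X[:, None, :] - X[None, :, :]
    return float(np.sqrt(np.sum(diffs ** 2, axis=-1)).max())


def condense(X, epsilon=1.0, n_steps=50, tol=1e-6):
    """Run a time-inhomogeneous diffusion and return every intermediate state.

    At each step the operator is recomputed from the current points and then
    applied to them, so the operator changes from step to step.
    """
    X = np.asarray(X, dtype=float)
    history = [X.copy()]
    for _ in range(n_steps):
        P = diffusion_operator(X, epsilon)  # compute from current data
        X = P @ X                           # apply to current data
        history.append(X.copy())
        if diameter(X) < tol:               # points have merged (hypothetical stopping rule)
            break
    return history
```

Because every row of the operator is a probability vector, each updated point is a convex combination of the previous points, so the diameter of the point cloud cannot increase from one step to the next. Tracking `diameter` over the returned history is one simple way to observe the condensation.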

research
03/14/2022

Geometry of Data

Topological data analysis asks when balls in a metric space (X,d) inters...
research
04/03/2021

Joint Geometric and Topological Analysis of Hierarchical Datasets

In a world abundant with diverse data arising from complex acquisition t...
research
11/19/2015

Diffusion Representations

Diffusion Maps framework is a kernel based method for manifold learning ...
research
07/10/2019

Coarse Graining of Data via Inhomogeneous Diffusion Condensation

Big data often has emergent structure that exists at multiple levels of ...
research
07/25/2022

Orthogonalization of data via Gromov-Wasserstein type feedback for clustering and visualization

In this paper we propose an adaptive approach for clustering and visuali...
research
03/07/2020

Diffusion State Distances: Multitemporal Analysis, Fast Algorithms, and Applications to Biological Networks

Data-dependent metrics are powerful tools for learning the underlying st...
research
05/30/2019

Learning by Active Nonlinear Diffusion

This article proposes an active learning method for high dimensional dat...
