A comparison of different clustering approaches for high-dimensional presence-absence data

08/20/2021
by Gabriele d'Angella, et al.

Presence-absence data are defined by vectors or matrices of zeroes and ones, where the ones usually indicate a "presence" in a certain place. Presence-absence data occur, for example, when investigating geographical species distributions, genetic information, or the occurrence of certain terms in texts. There are many applications for clustering such data; one example is finding so-called biotic elements, i.e., groups of species that tend to occur together geographically. Presence-absence data can be clustered in various ways, namely with a latent class mixture approach with local independence, with distance-based hierarchical clustering using the Jaccard distance, or with clustering methods for continuous data applied to a multidimensional scaling representation of the distances. These methods are conceptually very different and therefore cannot easily be compared theoretically. We compare their performance with a comprehensive simulation study based on models for species distributions.
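To make one of these approaches concrete, here is a minimal sketch of distance-based hierarchical clustering with the Jaccard distance, assuming each row is a species and each column a geographic cell (1 = present). It is not the authors' implementation. The choice of average linkage, the rule that two all-zero vectors have distance 0, and the function names `jaccard_distance` and `average_linkage` are illustrative assumptions. The Jaccard distance ignores joint absences, which is why it is popular for presence-absence data: two species are not made "similar" merely by both being absent from many cells.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two 0/1 vectors: 1 - |A and B| / |A or B|."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        return 0.0  # assumption: two all-zero vectors are treated as identical
    return 1.0 - both / either


def average_linkage(data, k):
    """Hypothetical helper: agglomerative clustering with average linkage on
    Jaccard distances, merging until k clusters remain.
    Returns a sorted list of sorted index lists."""
    n = len(data)
    # Precompute all pairwise distances, keyed by (smaller index, larger index).
    d = {(i, j): jaccard_distance(data[i], data[j])
         for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            pairs = [(min(i, j), max(i, j)) for i in clusters[p] for j in clusters[q]]
            avg = sum(d[pair] for pair in pairs) / len(pairs)
            if best is None or avg < best[0]:
                best = (avg, p, q)
        _, p, q = best
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]  # q > p, so index p is unaffected by the deletion
    return sorted(clusters)


# Toy data: 6 species (rows) observed in 6 cells (columns).
species = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
print(average_linkage(species, 2))  # → [[0, 1, 2], [3, 4, 5]]
```

In this toy example the two recovered groups play the role of biotic elements: species that tend to co-occur in the same cells. Other linkage rules (single, complete) can give different groupings on real data, which is one reason the comparison in the study is done empirically.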
