Data Augmentation in High Dimensional Low Sample Size Setting Using a Geometry-Based Variational Autoencoder
In this paper, we propose a new method to reliably perform data augmentation in the High Dimensional Low Sample Size (HDLSS) setting using a geometry-based variational autoencoder (VAE). Our approach combines a proper modeling of the VAE latent space, seen as a Riemannian manifold, with a new generation scheme that produces more meaningful samples, especially in the context of small data sets. The proposed method is tested through a wide experimental study in which its robustness to data sets, classifiers and training sample size is stressed. It is also validated on a medical imaging classification task on the challenging ADNI database, where a small number of 3D brain MRIs are considered and augmented using the proposed VAE framework. In each case, the proposed method allows for a significant and reliable gain in the classification metrics. For instance, balanced accuracy jumps from 66.3 to 74.3 [...] normal (CN) and 50 Alzheimer disease (AD) patients, and from 77.7 [...] trained with 243 CN and 210 AD, while greatly improving sensitivity and specificity.
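The classification metrics named in the abstract (balanced accuracy, sensitivity, specificity) can be illustrated with a minimal sketch. It uses the standard binary definitions and is not the authors' evaluation code; the function name `binary_metrics`, the label encoding (1 for AD, 0 for CN) and the toy labels are assumptions made purely for illustration and are not data from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, balanced_accuracy) for binary labels.

    Hypothetical helper for illustration: `positive` marks the class treated
    as positive (here assumed to be AD = 1, with CN = 0).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    # Balanced accuracy is the mean of the two per-class recalls, so it is
    # not inflated by class imbalance the way plain accuracy can be.
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


# Toy labels (assumed, not taken from the paper): 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec, bacc = binary_metrics(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} balanced_accuracy={bacc:.3f}")
```

Because balanced accuracy averages the per-class recalls, it stays informative when the two classes have different sizes, as in the 243 CN versus 210 AD split mentioned above.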