Semblance: A Rank-Based Kernel on Probability Spaces for Niche Detection
Kernel methods provide a principled approach for detecting nonlinear relations using well understood linear algorithms. In exploratory data analyses when the underlying structure of the data's probability space is unclear, the choice of kernel is often arbitrary. Here, we present a novel kernel, Semblance, on a probability feature space. The advantage of Semblance lies in its distribution free formulation and its ability to detect niche features by placing greater emphasis on similarity between observation pairs that fall at the tail ends of a distribution, as opposed to those that fall towards the mean. We prove that Semblance is a valid Mercer kernel and illustrate its applicability through simulations and real world examples.
READ FULL TEXT