Predicting and Explaining Behavioral Data with Structured Feature Space Decomposition

by Peter G. Fennell, et al.

Modeling human behavioral data is challenging due to its scale, sparseness (few observations per individual), heterogeneity (differently behaving individuals), and class imbalance (few observations of the outcome of interest). An additional challenge is learning an interpretable model that not only predicts outcomes accurately, but also identifies important factors associated with a given behavior. To address these challenges, we describe a statistical approach to modeling behavioral data called the structured sum-of-squares decomposition (S3D). The algorithm, which is inspired by decision trees, selects important features that collectively explain the variation of the outcome, quantifies correlations between the features, and partitions the subspace of important features into smaller, more homogeneous blocks that correspond to similarly behaving subgroups within the population. This partitioned subspace allows us to predict and analyze the behavior of the outcome variable both statistically and visually, providing a means to examine the effect of various features and to create explainable predictions. We apply S3D to learn models of online activity from large-scale data collected from diverse sites, such as Stack Exchange, Khan Academy, Twitter, Duolingo, and Digg. We show that S3D creates parsimonious models that predict outcomes in held-out data at levels comparable to state-of-the-art approaches, while also producing interpretable models that provide insights into behaviors. This is important for informing strategies aimed at changing behavior and for designing social systems, as well as for explaining predictions, a critical step towards minimizing algorithmic bias.
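To make the idea of selecting features by the variation they explain more concrete, the following is a minimal sketch of a sum-of-squares split criterion in Python. It is not the authors' S3D implementation: it only scores each feature by the fraction of the outcome's total sum of squares explained by its best single binary split, and it leaves out the correlation analysis and the multi-block partitioning described above. The function names (`best_split`, `select_feature`) and the feature names and values in the example are hypothetical and chosen only for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def best_split(x, y):
    """Find the binary split of y on feature x that explains the most variation.

    Returns (threshold, explained_fraction), where explained_fraction is the
    between-group sum of squares divided by the total sum of squares.
    Returns (None, 0.0) if y is constant or x has a single distinct value.
    """
    overall = mean(y)
    total_ss = sum((v - overall) ** 2 for v in y)
    if total_ss == 0:
        return None, 0.0

    distinct = sorted(set(x))
    best_threshold, best_fraction = None, 0.0
    # Candidate thresholds are midpoints between consecutive distinct values,
    # so both sides of every split are guaranteed to be non-empty.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        between_ss = (len(left) * (mean(left) - overall) ** 2
                      + len(right) * (mean(right) - overall) ** 2)
        fraction = between_ss / total_ss
        if fraction > best_fraction:
            best_threshold, best_fraction = threshold, fraction
    return best_threshold, best_fraction


def select_feature(columns, y):
    """Pick the feature whose best split explains the largest share of variation.

    `columns` maps a feature name to its list of values (an assumed layout).
    """
    scores = {name: best_split(values, y)[1] for name, values in columns.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical toy data: feature_a separates the outcome, feature_b does not.
    columns = {
        "feature_a": [0, 1, 2, 3, 4, 5],
        "feature_b": [5, 1, 4, 2, 3, 0],
    }
    outcome = [0, 0, 0, 1, 1, 1]
    print(select_feature(columns, outcome))
```

A fuller treatment would repeat the selection on the variation left unexplained by earlier features and keep splitting each selected feature into several bins rather than two; this sketch stops at a single split per feature to keep the criterion itself visible.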


