Explicit and Implicit Pattern Relation Analysis for Discovering Actionable Negative Sequences
Real-life events, behaviors and interactions produce sequential data. An important but rarely explored problem is to analyze those nonoccurring (also called negative) yet important sequences, forming negative sequence analysis (NSA). A typical NSA area is to discover negative sequential patterns (NSPs) consisting of important non-occurring and occurring elements and patterns. The limited existing work on NSP mining relies on frequentist and downward closure property-based pattern selection, producing large and highly redundant NSPs, nonactionable for business decision-making. This work makes the first attempt for actionable NSP discovery. It builds an NSP graph representation, quantify both explicit occurrence and implicit non-occurrence-based element and pattern relations, and then discover significant, diverse and informative NSPs in the NSP graph to represent the entire NSP set for discovering actionable NSPs. A DPP-based NSP representation and actionable NSP discovery method EINSP introduces novel and significant contributions for NSA and sequence analysis: (1) it represents NSPs by a determinantal point process (DPP) based graph; (2) it quantifies actionable NSPs in terms of their statistical significance, diversity, and strength of explicit/implicit element/pattern relations; and (3) it models and measures both explicit and implicit element/pattern relations in the DPP-based NSP graph to represent direct and indirect couplings between NSP items, elements and patterns. We substantially analyze the effectiveness of EINSP in terms of various theoretical and empirical aspects including complexity, item/pattern coverage, pattern size and diversity, implicit pattern relation strength, and data factors.
READ FULL TEXT