A Guided FP-growth algorithm for fast mining of frequent itemsets from big data

by Lior Shabtay, et al.

In this paper we present the GFP-growth (Guided FP-growth) algorithm, a novel method for counting the occurrences of a given list of itemsets in large data. Unlike FP-growth, which mines every itemset above a minimum-support threshold, our algorithm focuses only on the specific itemsets of interest, and hence its time and memory costs are lower. We prove that the GFP-growth algorithm yields the exact frequency counts for the required itemsets. We show that for a number of different problems, a solution can be devised that exploits this efficient multi-targeted mining to boost performance. In particular, we study in detail the problem of generating minority-class rules from imbalanced data, a scenario that appears in many real-life domains such as medical applications, failure prediction, network and cyber security, and maintenance. We develop the Minority-Report Algorithm, which uses GFP-growth to boost performance. We prove some theoretical properties of the Minority-Report Algorithm and demonstrate its superior performance using simulations and real data.
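To make the targeted-counting problem concrete, here is a minimal Python sketch, assuming an in-memory list of transactions whose items are strings. It builds an ordinary FP-tree (no minimum-support pruning, so counts stay exact) and answers each target itemset by scanning the header-table nodes of that itemset's lowest-ranked item and checking their prefix paths. This is a naive lookup for illustration only, not the paper's GFP-growth procedure; the names `FPTree`, `support`, and `rule_confidence` are hypothetical. The last helper assumes a class label is stored as an ordinary item and shows how two targeted counts give the confidence of a rule X → label, the kind of quantity a minority-class rule miner needs.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Node:
    item: Optional[str]                      # None only for the root
    parent: Optional["Node"]
    count: int = 0                           # transactions sharing this prefix
    children: Dict[str, "Node"] = field(default_factory=dict)


class FPTree:
    """Plain FP-tree over all items (hypothetical helper, no support pruning)."""

    def __init__(self, transactions: Iterable[Iterable[str]]):
        txs = [set(t) for t in transactions]
        self.n = len(txs)
        freq = Counter(item for t in txs for item in t)
        # Global order: descending frequency, ties broken alphabetically.
        ordered = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
        self.rank = {item: r for r, (item, _) in enumerate(ordered)}
        self.root = Node(item=None, parent=None)
        self.header: Dict[str, List[Node]] = {}  # item -> all nodes holding it
        for t in txs:
            self._insert(sorted(t, key=self.rank.__getitem__))

    def _insert(self, items: List[str]) -> None:
        node = self.root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item=item, parent=node)
                node.children[item] = child
                self.header.setdefault(item, []).append(child)
            child.count += 1
            node = child

    def support(self, itemset: Iterable[str]) -> int:
        """Exact number of transactions containing every item in itemset."""
        items = set(itemset)
        if not items:
            return self.n
        if not all(i in self.rank for i in items):
            return 0                          # some item never occurs
        # The lowest-ranked target item sits deepest on any matching path.
        last = max(items, key=self.rank.__getitem__)
        rest = items - {last}
        total = 0
        for node in self.header[last]:
            ancestors = set()
            p = node.parent
            while p is not None and p.item is not None:
                ancestors.add(p.item)
                p = p.parent
            if rest <= ancestors:
                total += node.count
        return total


def rule_confidence(tree: FPTree, antecedent: Iterable[str], label: str) -> float:
    """Confidence of antecedent -> label, assuming label is stored as an item."""
    body = set(antecedent)
    body_count = tree.support(body)
    return tree.support(body | {label}) / body_count if body_count else 0.0


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
    tree = FPTree(data)
    for target in [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}, {"d"}]:
        print(sorted(target), tree.support(target))
```

The lookup is exact because every transaction containing the lowest-ranked target item passes through exactly one node of that item, and that node's ancestors are precisely the transaction's higher-ranked items; summing the counts of the nodes whose ancestors cover the rest of the itemset therefore counts each matching transaction once.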

