Empirical study of Machine Learning Classifier Evaluation Metrics behavior in Massively Imbalanced and Noisy data

by   Gayan K. Kulatilleke, et al.

With growing credit card transaction volumes, the fraud percentages are also rising, including overhead costs for institutions to combat and compensate victims. The use of machine learning into the financial sector permits more effective protection against fraud and other economic crime. Suitably trained machine learning classifiers help proactive fraud detection, improving stakeholder trust and robustness against illicit transactions. However, the design of machine learning based fraud detection algorithms has been challenging and slow due the massively unbalanced nature of fraud data and the challenges of identifying the frauds accurately and completely to create a gold standard ground truth. Furthermore, there are no benchmarks or standard classifier evaluation metrics to measure and identify better performing classifiers, thus keeping researchers in the dark. In this work, we develop a theoretical foundation to model human annotation errors and extreme imbalance typical in real world fraud detection data sets. By conducting empirical experiments on a hypothetical classifier, with a synthetic data distribution approximated to a popular real world credit card fraud data set, we simulate human annotation errors and extreme imbalance to observe the behavior of popular machine learning classifier evaluation matrices. We demonstrate that a combined F1 score and g-mean, in that specific order, is the best evaluation metric for typical imbalanced fraud detection model classification.


page 1

page 2

page 3

page 4


Challenges and Complexities in Machine Learning based Credit Card Fraud Detection

Credit cards play an exploding role in modern economies. Its popularity ...

Credit card fraud detection - Classifier selection strategy

Machine learning has opened up new tools for financial fraud detection. ...

Minimizing the Societal Cost of Credit Card Fraud with Limited and Imbalanced Data

Machine learning has automated much of financial fraud detection, notify...

GraphCleaner: Detecting Mislabelled Samples in Popular Graph Learning Benchmarks

Label errors have been found to be prevalent in popular text, vision, an...

Managers versus Machines: Do Algorithms Replicate Human Intuition in Credit Ratings?

We use machine learning techniques to investigate whether it is possible...

Fraud Detection Using Optimized Machine Learning Tools Under Imbalance Classes

Fraud detection is a challenging task due to the changing nature of frau...

Copying Machine Learning Classifiers

We study model-agnostic copies of machine learning classifiers. We devel...

Please sign up or login with your details

Forgot password? Click here to reset