Managing dataset shift by adversarial validation for credit scoring

by Hongyi Qian, et al.

Dataset shift is common in credit scoring: when the distribution of the training data differs from that of the data that actually needs to be predicted, model performance is likely to suffer. However, most current studies do not take this into account and directly mix data from different time periods when training models. This causes two problems. First, there is a risk of data leakage, i.e., using future data to predict the past, which can inflate offline validation results while giving unsatisfactory results in practical applications. Second, the macroeconomic environment and risk control strategies are likely to differ between time periods, and the behavior patterns of borrowers may also change, so a model trained on past data may not be applicable to the most recent period. We therefore propose a method based on adversarial validation to alleviate the dataset shift problem in credit scoring. In this method, adversarial validation selects the subset of training samples whose distribution is closest to that of the data to be predicted, and cross-validation is performed on this subset to ensure that the trained model generalizes to the samples to be predicted. In addition, through a simple splicing method, the training samples whose distribution is inconsistent with the test data are also included in the training portion of each cross-validation fold, which makes full use of all the data and further improves model performance. To verify the effectiveness of the proposed method, we conduct comparative experiments against several other data split methods on data provided by Lending Club. The experimental results demonstrate the importance of accounting for dataset shift in credit scoring and the superiority of the proposed method.
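
As a rough illustration of the idea described in the abstract, the following Python sketch trains a classifier to distinguish training rows from test rows, treats the training rows it scores as most test-like as the validation pool, and appends the remaining rows to the training side of every fold. Everything concrete here is an assumption rather than the authors' implementation: the use of scikit-learn's `GradientBoostingClassifier`, the function names `adversarial_scores` and `adversarial_cv_splits`, the `top_fraction` value of 0.3, and the synthetic data that stands in for Lending Club records.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import KFold, cross_val_predict


def adversarial_scores(X_train, X_test, seed=0):
    """Out-of-fold probability that each training row looks like test data."""
    X = np.vstack([X_train, X_test])
    # Label 0 = row comes from the training set, 1 = row comes from the test set.
    is_test = np.concatenate([
        np.zeros(len(X_train), dtype=int),
        np.ones(len(X_test), dtype=int),
    ])
    clf = GradientBoostingClassifier(random_state=seed)  # assumed classifier
    proba = cross_val_predict(clf, X, is_test, cv=5, method="predict_proba")
    return proba[: len(X_train), 1]


def adversarial_cv_splits(scores, top_fraction=0.3, n_splits=5, seed=0):
    """Yield (train_idx, valid_idx) pairs over the training set.

    Validation folds are drawn only from the most test-like rows; the
    remaining rows are spliced into the training side of every fold.
    """
    order = np.argsort(scores)[::-1]  # most test-like first
    n_close = max(n_splits, int(round(len(scores) * top_fraction)))
    close, far = order[:n_close], order[n_close:]
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for tr, va in kf.split(close):
        yield np.concatenate([close[tr], far]), close[va]


# Synthetic stand-in data: the test features are shifted relative to training.
rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(300, 4))
X_test = rng.normal(0.5, 1.0, size=(100, 4))

scores = adversarial_scores(X_train, X_test)
splits = list(adversarial_cv_splits(scores))
```

Each `(train_idx, valid_idx)` pair can then be used to fit and evaluate a credit scoring model in the usual cross-validation loop. The fraction of rows treated as test-like is a tuning choice in this sketch, not a value taken from the paper.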




