An Empirical Study of Deep Learning Models for Vulnerability Detection

by   Benjamin Steenhoek, et al.

Deep learning (DL) models of code have recently reported great progress for vulnerability detection. In some cases, DL-based models have outperformed static analysis tools. Although many great models have been proposed, we do not yet have a good understanding of these models. This limits the further advancement of model robustness, debugging, and deployment for the vulnerability detection. In this paper, we surveyed and reproduced 9 state-of-the-art (SOTA) deep learning models on 2 widely used vulnerability detection datasets: Devign and MSR. We investigated 6 research questions in three areas, namely model capabilities, training data, and model interpretation. We experimentally demonstrated the variability between different runs of a model and the low agreement among different models' outputs. We investigated models trained for specific types of vulnerabilities compared to a model that is trained on all the vulnerabilities at once. We explored the types of programs DL may consider "hard" to handle. We investigated the relations of training data sizes and training data composition with model performance. Finally, we studied model interpretations and analyzed important features that the models used to make predictions. We believe that our findings can help better understand model results, provide guidance on preparing training data, and improve the robustness of the models. All of our datasets, code, and results are available at


page 1

page 5

page 6

page 10


Deep Learning based Vulnerability Detection: Are We There Yet?

Automated detection of software vulnerabilities is a fundamental problem...

Limits of Machine Learning for Automatic Vulnerability Detection

Recent results of machine learning for automatic vulnerability detection...

The data synergy effects of time-series deep learning models in hydrology

When fitting statistical models to variables in geoscientific discipline...

Corpus-Based Paraphrase Detection Experiments and Review

Paraphrase detection is important for a number of applications, includin...

Efficient Training Data Generation for Phase-Based DOA Estimation

Deep learning (DL) based direction of arrival (DOA) estimation is an act...

Shallow or Deep? An Empirical Study on Detecting Vulnerabilities using Deep Learning

Deep learning (DL) techniques are on the rise in the software engineerin...

Backdoor Vulnerabilities in Normally Trained Deep Learning Models

We conduct a systematic study of backdoor vulnerabilities in normally tr...

Please sign up or login with your details

Forgot password? Click here to reset