VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements

by Yangruibo Ding, et al.

Automatically locating vulnerable statements in source code is crucial to assure software security and alleviate developers' debugging efforts. This becomes even more important in today's software ecosystem, where vulnerable code can flow easily and unwittingly within and across software repositories like GitHub. Across millions of lines of code, traditional static and dynamic approaches struggle to scale. Although existing machine-learning-based approaches look promising in such a setting, most work detects vulnerable code at a coarser granularity, at the method or file level. Thus, developers still need to inspect a significant amount of code to locate the vulnerable statement(s) that need to be fixed. This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements. Our model combines graph-based and sequence-based neural networks to capture the local and global context of a program graph and to understand code semantics and vulnerable patterns. To study VELVET's effectiveness, we use an off-the-shelf synthetic dataset and a recently published real-world dataset. In the static analysis setting, where vulnerable functions are not detected in advance, VELVET achieves 4.5x better performance than the baseline static analyzers on the real-world data. For the isolated vulnerability localization task, where we assume the vulnerability of a function is known while the specific vulnerable statement is unknown, we compare VELVET with several neural networks that also attend to local and global context of code. VELVET achieves 99.6% [...] synthetic data and real-world data, respectively, outperforming the baseline deep-learning models by 5.3-29.0%.
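
To make the idea of statement-level ensembling concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the two models' outputs are combined, so the plain averaging below is an assumption, and the function names (`ensemble_scores`, `rank_statements`, `top_k_hit`) and the example scores are hypothetical. The sketch assumes each model has already produced one vulnerability score per statement of a function.

```python
from typing import List, Sequence, Set


def ensemble_scores(graph_scores: Sequence[float],
                    sequence_scores: Sequence[float]) -> List[float]:
    """Combine per-statement scores from two models.

    Assumption: a plain unweighted average. The source does not
    specify the combination rule used by VELVET.
    """
    if len(graph_scores) != len(sequence_scores):
        raise ValueError("both models must score the same statements")
    return [(g + s) / 2.0 for g, s in zip(graph_scores, sequence_scores)]


def rank_statements(scores: Sequence[float]) -> List[int]:
    """Return statement indices ordered from most to least suspicious."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def top_k_hit(scores: Sequence[float], vulnerable: Set[int], k: int = 1) -> bool:
    """True if any of the k highest-scored statements is truly vulnerable."""
    return any(i in vulnerable for i in rank_statements(scores)[:k])


# Hypothetical scores for a four-statement function.
graph_scores = [0.10, 0.70, 0.20, 0.40]      # e.g. from a graph-based model
sequence_scores = [0.20, 0.50, 0.90, 0.10]   # e.g. from a sequence-based model

combined = ensemble_scores(graph_scores, sequence_scores)
ranking = rank_statements(combined)
```

In this made-up example the sequence-based scores alone would rank statement 2 first, while the averaged scores rank statement 1 first, which illustrates how an ensemble can change which statement a developer is pointed to. A learned or weighted combination would be an equally plausible design.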

