Learning to Quantize Vulnerability Patterns and Match to Locate Statement-Level Vulnerabilities

by   Michael Fu, et al.

Deep learning (DL) models have become increasingly popular in identifying software vulnerabilities. Prior studies found that vulnerabilities across different vulnerable programs may exhibit similar vulnerable scopes, implicitly forming discernible vulnerability patterns that can be learned by DL models through supervised training. However, vulnerable scopes still manifest in various spatial locations and formats within a program, posing challenges for models to accurately identify vulnerable statements. Despite this challenge, state-of-the-art vulnerability detection approaches fail to exploit the vulnerability patterns that arise in vulnerable programs. To take full advantage of vulnerability patterns and unleash the ability of DL models, we propose a novel vulnerability-matching approach in this paper, drawing inspiration from program analysis tools that locate vulnerabilities based on pre-defined patterns. Specifically, a vulnerability codebook is learned, which consists of quantized vectors representing various vulnerability patterns. During inference, the codebook is iterated to match all learned patterns and predict the presence of potential vulnerabilities within a given program. Our approach was extensively evaluated on a real-world dataset comprising more than 188,000 C/C++ functions. The evaluation results show that our approach achieves an F1-score of 94 the previous best) for function and statement-level vulnerability identification, respectively. These substantial enhancements highlight the effectiveness of our approach to identifying vulnerabilities. The training code and pre-trained models are available at https://github.com/optimatch/optimatch.


An Information-Theoretic and Contrastive Learning-based Approach for Identifying Code Statements Causing Software Vulnerability

Software vulnerabilities existing in a program or function of computer s...

LEOPARD: Identifying Vulnerable Code for Vulnerability Assessment through Program Metrics

Identifying potentially vulnerable locations in a code base is critical ...

VulSPG: Vulnerability detection based on slice property graph representation learning

Vulnerability detection is an important issue in software security. Alth...

DeFuzz: Deep Learning Guided Directed Fuzzing

Fuzzing is one of the most effective technique to identify potential sof...

Distinguishing Look-Alike Innocent and Vulnerable Code by Subtle Semantic Representation Learning and Explanation

Though many deep learning (DL)-based vulnerability detection approaches ...

Detecting software vulnerabilities using Language Models

Recently, deep learning techniques have garnered substantial attention f...

Talos: Neutralizing Vulnerabilities with Security Workarounds for Rapid Response

Considerable delays often exist between the discovery of a vulnerability...

Please sign up or login with your details

Forgot password? Click here to reset