AutoPruner: Transformer-Based Call Graph Pruning

by   Thanh Le-Cong, et al.

Constructing a static call graph requires trade-offs between soundness and precision. Program analysis techniques for constructing call graphs are unfortunately usually imprecise. To address this problem, researchers have recently proposed call graph pruning empowered by machine learning to post-process call graphs constructed by static analysis. A machine learning model is built to capture information from the call graph by extracting structural features for use in a random forest classifier. It then removes edges that are predicted to be false positives. Despite the improvements shown by machine learning models, they are still limited as they do not consider the source code semantics and thus often are not able to effectively distinguish true and false positives. In this paper, we present a novel call graph pruning technique, AutoPruner, for eliminating false positives in call graphs via both statistical semantic and structural analysis. Given a call graph constructed by traditional static analysis tools, AutoPruner takes a Transformer-based approach to capture the semantic relationships between the caller and callee functions associated with each edge in the call graph. To do so, AutoPruner fine-tunes a model of code that was pre-trained on a large corpus to represent source code based on descriptions of its semantics. Next, the model is used to extract semantic features from the functions related to each edge in the call graph. AutoPruner uses these semantic features together with the structural features extracted from the call graph to classify each edge via a feed-forward neural network. Our empirical evaluation on a benchmark dataset of real-world programs shows that AutoPruner outperforms the state-of-the-art baselines, improving on F-measure by up to 13 static call graph.


What does Transformer learn about source code?

In the field of source code processing, the transformer-based representa...

Solving the "false positives" problem in fraud prediction

In this paper, we present an automated feature engineering based approac...

An Unbiased Transformer Source Code Learning with Semantic Vulnerability Graph

Over the years, open-source software systems have become prey to threat ...

Automated Static Warning Identification via Path-based Semantic Representation

Despite their ability to aid developers in detecting potential defects e...

Test Suites as a Source of Training Data for Static Analysis Alert Classifiers

Flaw-finding static analysis tools typically generate large volumes of c...

Convolutional Neural Networks over Control Flow Graphs for Software Defect Prediction

Existing defects in software components is unavoidable and leads to not ...

Nowhere to Hide: Detecting Obfuscated Fingerprinting Scripts

As the web moves away from stateful tracking, browser fingerprinting is ...

Please sign up or login with your details

Forgot password? Click here to reset