xASTNN: Improved Code Representations for Industrial Practice

by   Zhiwei Xu, et al.

The application of deep learning techniques in software engineering becomes increasingly popular. One key problem is developing high-quality and easy-to-use source code representations for code-related tasks. The research community has acquired impressive results in recent years. However, due to the deployment difficulties and performance bottlenecks, seldom these approaches are applied to the industry. In this paper, we present xASTNN, an eXtreme Abstract Syntax Tree (AST)-based Neural Network for source code representation, aiming to push this technique to industrial practice. The proposed xASTNN has three advantages. First, xASTNN is completely based on widely-used ASTs and does not require complicated data pre-processing, making it applicable to various programming languages and practical scenarios. Second, three closely-related designs are proposed to guarantee the effectiveness of xASTNN, including statement subtree sequence for code naturalness, gated recursive unit for syntactical information, and gated recurrent unit for sequential information. Third, a dynamic batching algorithm is introduced to significantly reduce the time complexity of xASTNN. Two code comprehension downstream tasks, code classification and code clone detection, are adopted for evaluation. The results demonstrate that our xASTNN can improve the state-of-the-art while being faster than the baselines.


Cross-Language Binary-Source Code Matching with Intermediate Representations

Binary-source code matching plays an important role in many security and...

InferCode: Self-Supervised Learning of Code Representations by Predicting Subtrees

Building deep learning models on source code has found many successful s...

Unified Abstract Syntax Tree Representation Learning for Cross-Language Program Classification

Program classification can be regarded as a high-level abstraction of co...

Rethinking complexity for software code structures: A pioneering study on Linux kernel code repository

The recent progress of artificial intelligence(AI) has shown great poten...

CodeGRU: Context-aware Deep Learning with Gated Recurrent Unit for Source Code Modeling

Recently many NLP-based deep learning models have been applied to model ...

Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work?

In recent years, there has been a wide interest in designing deep neural...

Towards a Catalogue of Software Quality Metrics for Infrastructure Code

Infrastructure-as-code (IaC) is a practice to implement continuous deplo...

Please sign up or login with your details

Forgot password? Click here to reset