Structural Language Models of Code

09/30/2019
by Uri Alon, et al.

We address the problem of any-code completion: generating a missing piece of source code in a given program without any restriction on the vocabulary or structure. We introduce a new approach to any-code completion, structural language modeling (SLM), which leverages the strict syntax of programming languages to model a code snippet as a tree. SLM estimates the probability of the program's abstract syntax tree (AST) by decomposing it into a product of conditional probabilities over its nodes. We present a neural model that computes these conditional probabilities by considering all AST paths leading to a target node. Unlike previous techniques, which severely restricted the kinds of expressions that can be generated in this task, our approach can generate arbitrary code in any programming language. Our model significantly outperforms both seq2seq and a variety of structured approaches in generating Java and C# code. We make our code, datasets, and models publicly available.
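The decomposition above can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes a depth-first, left-to-right node order, so that log P(AST) = sum over t of log P(a_t | a_<t), and it summarizes the context a_<t as the AST paths from every already-generated leaf to the target node a_t, plus the root-to-a_t path (an assumption about how "all AST paths leading to a target node" are collected). The names `Node`, `ast_path`, `context_paths`, `log_prob`, the `?` placeholder for the not-yet-known target label, and the toy tree are hypothetical. The neural model is replaced by an arbitrary `scorer` function.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional

MASK = "?"  # hypothetical placeholder for the label being predicted


@dataclass(eq=False)  # eq=False keeps identity-based equality and hashing
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        # Wire up parent links so paths can walk upward.
        for child in self.children:
            child.parent = self


def dfs_order(root: Node) -> List[Node]:
    """Pre-order traversal: the generation order a_1, a_2, ... (assumed)."""
    order = [root]
    for child in root.children:
        order.extend(dfs_order(child))
    return order


def ancestors(node: Node) -> List[Node]:
    """The node itself followed by its ancestors, up to the root."""
    chain: List[Node] = []
    current: Optional[Node] = node
    while current is not None:
        chain.append(current)
        current = current.parent
    return chain


def label_of(node: Node, target: Node) -> str:
    # The target's own label is unknown at prediction time, so mask it.
    return MASK if node is target else node.label


def ast_path(src: Node, target: Node) -> List[str]:
    """Labels along src -> lowest common ancestor -> target."""
    up = ancestors(src)
    down = ancestors(target)
    down_set = set(down)
    lca_idx = next(i for i, n in enumerate(up) if n in down_set)
    lca = up[lca_idx]
    up_part = up[: lca_idx + 1]                 # src ... lca
    down_part = down[: down.index(lca)][::-1]   # below lca ... target
    return [label_of(n, target) for n in up_part + down_part]


def context_paths(order: List[Node], t: int) -> List[List[str]]:
    """Paths from each previously generated leaf, plus the root path, to a_t."""
    target = order[t]
    leaves = [n for n in order[:t] if not n.children]
    paths = [ast_path(leaf, target) for leaf in leaves]
    root_path = [label_of(n, target) for n in reversed(ancestors(target))]
    return paths + [root_path]


# Hypothetical stand-in for the neural model: (label, paths) -> probability.
Scorer = Callable[[str, List[List[str]]], float]


def log_prob(root: Node, scorer: Scorer) -> float:
    """log P(AST) as a sum of log-conditionals over nodes in DFS order."""
    order = dfs_order(root)
    return sum(
        math.log(scorer(node.label, context_paths(order, t)))
        for t, node in enumerate(order)
    )


if __name__ == "__main__":
    # Toy AST for `x = y + 1` (hypothetical node labels).
    tree = Node("Assign", [Node("x"), Node("Plus", [Node("y"), Node("1")])])
    order = dfs_order(tree)
    for path in context_paths(order, 4):  # paths leading to the node "1"
        print(" -> ".join(path))
    # x -> Assign -> Plus -> ?
    # y -> Plus -> ?
    # Assign -> Plus -> ?
    uniform = lambda label, paths: 1.0 / 5  # uniform over a 5-label vocabulary
    print(log_prob(tree, uniform))  # ≈ -8.047, i.e. 5 * log(1/5)
```

With a uniform scorer every conditional is 1/5, so the total is just five times log(1/5). A learned model would instead encode each path and use them to produce a distribution over the next node, which is where the context actually matters.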

09/30/2019

Structural Language Models for Any-Code Generation

We address the problem of Any-Code Generation (AnyGen) - generating code...
08/04/2018

code2seq: Generating Sequences from Structured Representations of Code

The ability to generate natural language sequences from source code snip...
08/16/2021

Autoencoders as Tools for Program Synthesis

Recently there have been many advances in research on language modeling ...
05/26/2021

TreeBERT: A Tree-Based Pre-Trained Model for Programming Language

Source code can be parsed into the abstract syntax tree (AST) based on d...
05/27/2020

A Structural Model for Contextual Code Changes

We address the problem of predicting edit completions based on a learned...
04/14/2020

Code Completion using Neural Attention and Byte Pair Encoding

In this paper, we aim to do code completion based on implementing a Neur...
06/01/2023

AI Chain on Large Language Model for Unsupervised Control Flow Graph Generation for Statically-Typed Partial Code

Control Flow Graphs (CFGs) are essential for visualizing, understanding ...
