PSIMiner: A Tool for Mining Rich Abstract Syntax Trees from Code

03/23/2021
by Egor Spirin, et al.

The application of machine learning algorithms to source code has grown in recent years. Since these algorithms are quite sensitive to input data, it is not surprising that researchers experiment with input representations. A popular starting point for representing code is the abstract syntax tree (AST). ASTs have long been used in various software engineering domains, and in IDEs in particular. The APIs of modern IDEs make it possible to manipulate and traverse ASTs, resolve references between code elements, and so on. Such algorithms can enrich ASTs with new data and may therefore be useful in ML-based code analysis. In this work, we present PSIMiner, a tool for processing PSI trees from the IntelliJ Platform. PSI trees contain code syntax trees as well as functions to work with them, and can therefore be used to enrich code representations with the static analysis algorithms of modern IDEs. To showcase this idea, we use our tool to infer the types of identifiers in Java ASTs and extend the code2seq model for the method name prediction problem.
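To make the idea of enriching an AST with inferred identifier types concrete, here is a minimal sketch in Python. It does not use PSIMiner or the IntelliJ PSI API: the `Node` class, the node kinds, the `enrich_with_types` and `leaf_labels` helpers, and the hand-written symbol table are all hypothetical and stand in for what an IDE's reference resolution might provide.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Node:
    """A toy AST node; `resolved_type` is the extra label an IDE could supply."""
    kind: str
    token: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    resolved_type: Optional[str] = None


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def enrich_with_types(root: Node, symbol_types: Dict[str, str]) -> None:
    """Attach a type to every identifier whose name appears in `symbol_types`.

    Assumption: `symbol_types` is a stand-in for real static analysis.
    """
    for node in walk(root):
        if node.kind == "IDENTIFIER" and node.token in symbol_types:
            node.resolved_type = symbol_types[node.token]


def leaf_labels(root: Node) -> List[str]:
    """Return leaf tokens, suffixed with the type when one was resolved."""
    labels = []
    for node in walk(root):
        if not node.children and node.token is not None:
            label = node.token
            if node.resolved_type is not None:
                label += ":" + node.resolved_type
            labels.append(label)
    return labels


# Toy tree for the Java statement `return count + name.length();`
tree = Node("RETURN", children=[
    Node("BINARY_PLUS", children=[
        Node("IDENTIFIER", "count"),
        Node("METHOD_CALL", children=[
            Node("IDENTIFIER", "name"),
            Node("IDENTIFIER", "length"),
        ]),
    ]),
])

enrich_with_types(tree, {"count": "int", "name": "String"})
print(leaf_labels(tree))  # ['count:int', 'name:String', 'length']
```

In this sketch the type is simply appended to the leaf token, which is one plausible way to expose the extra information to a model that consumes leaf labels; how the actual tool encodes types is not specified in the abstract.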

research
07/27/2023

CodeLens: An Interactive Tool for Visualizing Code Representations

Representing source code in a generic input format is crucial to automat...
research
02/01/2019

Concrete Syntax with Black Box Parsers

Context: Meta programming consists for a large part of matching, analyzi...
research
09/30/2019

Continuous Flow Analysis to Detect Security Problems

We introduce a tool that supports continuous flow analysis in order to d...
research
06/17/2022

Evaluating the Impact of Source Code Parsers on ML4SE Models

As researchers and practitioners apply Machine Learning to increasingly ...
research
07/13/2021

Mining Idioms in the Wild

Existing code repositories contain numerous instances of code patterns t...
research
08/30/2021

CAST: Enhancing Code Summarization with Hierarchical Splitting and Reconstruction of Abstract Syntax Trees

Code summarization aims to generate concise natural language description...
research
12/23/2020

Crowdsmelling: The use of collective knowledge in code smells detection

Code smells are seen as major source of technical debt and, as such, sho...
