PYInfer: Deep Learning Semantic Type Inference for Python Variables

06/27/2021
by Siwei Cui, et al.

Python type inference is challenging in practice. Because of the language's dynamic properties and its extensive dependencies on third-party libraries without type annotations, traditional static analysis techniques perform poorly. Although semantics in source code can reveal the intended usage of variables (and thus help infer their types), existing tools usually ignore them. In this paper, we propose PYInfer, an end-to-end learning-based type inference tool that automatically generates type annotations for Python variables. The key insight is that contextual code semantics is critical in inferring the type of a variable. For each use of a variable, we collect a few tokens within its contextual scope and design a neural network to predict its type. One challenge is that it is difficult to collect a high-quality human-labeled training dataset for this purpose. To address this issue, we apply an existing static analyzer to generate the ground truth for variables in source code. Our main contribution is a novel approach to statically infer variable types effectively and efficiently. By formulating type inference as a classification problem, we can handle user-defined types and predict type probabilities for each variable. Our model achieves 91.2% accuracy on classifying 11 basic types in Python and 81.2% accuracy on classifying the 500 most common types. These results substantially outperform the state-of-the-art type annotators. Moreover, PYInfer achieves 5.2X more code coverage and is 187X faster than a state-of-the-art learning-based tool. With similar time consumption, our model annotates 5X more variables than a state-of-the-art static analysis tool. Our model also outperforms a learning-based function-level annotator on annotating types for variables and function arguments. All our tools and datasets are publicly available to facilitate future research in this direction.
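To make the "few tokens within its contextual scope" idea concrete, the following is a minimal sketch and not the PYInfer implementation. It uses Python's standard `tokenize` module to gather a fixed-size window of tokens around each identifier occurrence. The function name `context_windows` and the `margin` parameter are hypothetical, and the sketch treats every non-keyword identifier (including builtins such as `len`) as a candidate, which a real tool would presumably filter.

```python
import io
import keyword
import tokenize

# Token kinds that carry no lexical content for this illustration.
_SKIP = {
    tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
    tokenize.COMMENT, tokenize.ENDMARKER, tokenize.ENCODING,
}


def context_windows(source, margin=3):
    """Yield (identifier, context_tokens) for each identifier occurrence.

    `margin` is a hypothetical parameter: the number of tokens kept on
    each side of the occurrence. The window includes the identifier itself.
    """
    readline = io.StringIO(source).readline
    toks = [t for t in tokenize.generate_tokens(readline)
            if t.type not in _SKIP]
    for i, tok in enumerate(toks):
        # Keywords are also NAME tokens, so exclude them explicitly.
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            lo = max(0, i - margin)
            yield tok.string, [t.string for t in toks[lo:i + margin + 1]]


if __name__ == "__main__":
    sample = "count = len(items) + 1\n"
    for name, window in context_windows(sample, margin=2):
        print(name, window)
```

In a pipeline like the one the abstract describes, each window would be paired with a type label produced by a static analyzer and fed to a classifier that outputs a probability per type. Those stages are outside the scope of this sketch.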

