Cross-Domain Evaluation of a Deep Learning-Based Type Inference System

08/19/2022
by Bernd Gruner, et al.

Optional type annotations enrich dynamic programming languages with static typing features such as better Integrated Development Environment (IDE) support, more precise program analysis, and early detection and prevention of type-related runtime errors. Machine learning-based type inference is a promising approach to automating this annotation task. However, the practical usefulness of such systems depends on their ability to generalize across domains, since they are often applied outside their training domain. In this work, we investigate Type4Py, a representative state-of-the-art deep learning-based type inference system, through extensive cross-domain experiments. In doing so, we address the following problems: class imbalance, out-of-vocabulary words, dataset shifts, and unknown classes. For these experiments, we use the datasets ManyTypes4Py and CrossDomainTypes4Py, the latter of which we introduce in this paper. CrossDomainTypes4Py enables the evaluation of type inference systems across different domains of software projects and contains over 1,000,000 type annotations mined from the platforms GitHub and Libraries. It covers two domains: web development and scientific calculation. Our experiments show that dataset shifts and the long-tailed distribution, with its many rare and unknown data types, drastically decrease the performance of the deep learning-based type inference system. In this context, we test unsupervised domain adaptation methods and fine-tuning to overcome these issues. Moreover, we investigate the impact of out-of-vocabulary words.
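The abstract names three symptoms of cross-domain shift: unknown classes (types never seen in training), out-of-vocabulary words, and a long-tailed type distribution. The sketch below shows one simple way to quantify each of them when moving from a source domain (e.g. web development) to a target domain (e.g. scientific calculation). It is a minimal illustration under stated assumptions, not Type4Py's or the authors' evaluation code. The function names `unknown_type_rate`, `oov_rate`, and `tail_share` and the toy type and token lists are hypothetical.

```python
from collections import Counter


def unknown_type_rate(train_types, test_types):
    """Fraction of target-domain annotations whose type label never occurs
    in the source-domain training data (the 'unknown classes' problem)."""
    if not test_types:
        return 0.0
    seen = set(train_types)
    unknown = sum(1 for t in test_types if t not in seen)
    return unknown / len(test_types)


def oov_rate(vocab, tokens):
    """Fraction of target-domain tokens missing from the training vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for tok in tokens if tok not in vocab) / len(tokens)


def tail_share(types, top_k):
    """Share of annotations NOT covered by the top_k most frequent types,
    a rough indicator of how long-tailed the label distribution is."""
    if not types:
        return 0.0
    counts = Counter(types)
    covered = sum(c for _, c in counts.most_common(top_k))
    return 1 - covered / len(types)


# Hypothetical toy data: source domain = web development,
# target domain = scientific calculation.
web_train = ["str", "int", "str", "HttpResponse", "dict", "str"]
sci_test = ["ndarray", "float", "int", "DataFrame", "float"]
web_vocab = {"request", "response", "user", "data"}
sci_tokens = ["data", "array", "matrix", "user"]

print(unknown_type_rate(web_train, sci_test))  # 4 of 5 test types unseen -> 0.8
print(oov_rate(web_vocab, sci_tokens))         # 2 of 4 tokens unseen -> 0.5
print(tail_share(web_train, 1))                # "str" covers 3 of 6 -> 0.5
```

A high unknown-type rate means a classifier over a fixed label set cannot be correct on those samples regardless of model quality, which is one reason such shifts can hurt performance so sharply.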
