RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring

by Hao Liu, et al.

Refactoring is an indispensable practice for improving the quality and maintainability of source code during software evolution. Rename refactoring, the most frequently performed kind of refactoring, suggests a new name for a poorly named identifier to enhance readability. However, most existing works only identify renaming activities between two versions of source code, and few address how to suggest a new name. In this paper, we study automatic rename refactoring on variable names, which is considered more challenging than other rename refactoring activities. We first point out the connections between rename refactoring and various prevalent learning paradigms, as well as the difference between rename refactoring and general text generation in natural language processing. Based on these observations, we propose RefBERT, a two-stage pre-trained framework for rename refactoring on variable names. RefBERT first predicts the number of sub-tokens in the new name and then generates the sub-tokens accordingly. Several techniques, including constrained masked language modeling, contrastive learning, and the bag-of-tokens loss, are incorporated into RefBERT to tailor it for automatic rename refactoring on variable names. Through extensive experiments on our constructed refactoring datasets, we show that the variable names generated by RefBERT are more accurate and meaningful than those produced by the existing method.
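
To make the two-stage idea concrete, the following is a minimal sketch of the input preparation only, not RefBERT itself. It assumes that sub-tokens are obtained by splitting identifiers on underscores and camelCase boundaries, and that a BERT-style `[MASK]` token marks each slot to be generated; the abstract does not specify either detail. The function names `split_subtokens` and `mask_variable` are hypothetical, and the sub-token count that stage one would predict is passed in by hand.

```python
import re

MASK = "[MASK]"  # assumed mask token, in the style of BERT-like models


def split_subtokens(name: str) -> list[str]:
    """Split an identifier into lowercase sub-tokens.

    Assumed scheme: split on underscores, then on camelCase boundaries,
    keeping acronyms (e.g. "HTTP") and digit runs as single sub-tokens.
    """
    parts: list[str] = []
    for chunk in name.split("_"):
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [p.lower() for p in parts]


def mask_variable(code: str, var: str, num_subtokens: int) -> str:
    """Replace every whole-word occurrence of `var` with mask slots.

    `num_subtokens` stands in for the output of the first stage
    (the predicted length of the new name); the second stage would
    then fill each slot with a generated sub-token.
    """
    slots = " ".join([MASK] * num_subtokens)
    return re.sub(rf"\b{re.escape(var)}\b", slots, code)


if __name__ == "__main__":
    print(split_subtokens("maxItemCount"))
    snippet = "int tmp = a + b; return tmp;"
    print(mask_variable(snippet, "tmp", num_subtokens=2))
```

Fixing the number of mask slots before generation is what distinguishes this setup from open-ended text generation: the generator only has to choose one sub-token per slot.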




