Recovering Variable Names for Minified Code with Usage Contexts

06/08/2019
by Hieu Tran, et al.

JavaScript (JS) code plays an important role in modern Web technology. To avoid exposing the original source code, the variable names in JS code deployed in the wild are often replaced by short, meaningless names, which makes the code extremely difficult to understand and analyze manually. This paper presents JSNeat, an information retrieval (IR)-based approach to recover the variable names in minified JS code. JSNeat follows a data-driven approach, recovering names by searching for them in a large corpus of open-source JS code. We use three types of context to match a variable in the given minified code against the corpus: the properties and roles of the variable, the relations between that variable and the other variables under recovery, and the task of the function to which the variable contributes. We performed several empirical experiments to evaluate JSNeat on a dataset of more than 322K JS files with 1M functions and 3.5M variables with 176K unique variable names. We found that JSNeat achieves a high accuracy of 69.1%, improving over the state-of-the-art approaches JSNice and JSNaughty. Recovering names for a file or for a variable with JSNeat is twice as fast as with JSNice and 4x as fast as with JSNaughty.
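
The retrieval idea in the abstract can be illustrated with a toy sketch. This is not JSNeat's actual algorithm: it assumes only the first kind of context (the properties and methods accessed on a variable), uses Jaccard similarity as a stand-in ranking function, and every name in it (`build_index`, `recover_name`, the tiny corpus) is hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_index(corpus):
    """Merge the usage contexts seen for each original variable name."""
    index = defaultdict(set)
    for name, context in corpus:
        index[name] |= set(context)
    return index


def recover_name(context, index):
    """Return the corpus name whose merged context best matches `context`."""
    # Iterating over sorted names makes tie-breaking deterministic.
    return max(sorted(index), key=lambda name: jaccard(set(context), index[name]))


# Hypothetical toy corpus: (original name, properties and methods used on it).
corpus = [
    ("request", {"url", "method", "headers"}),
    ("request", {"url", "send"}),
    ("element", {"appendChild", "style", "className"}),
    ("callback", {"call", "apply"}),
]
index = build_index(corpus)

# Context observed for a minified variable such as `t` in `t.url; t.send()`.
best_name = recover_name({"url", "send"}, index)
```

Here the minified variable's context overlaps only with the contexts recorded for `request`, so that name ranks first. A realistic system would also have to weigh the other two kinds of context the abstract lists and resolve the names of related variables jointly.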
