XtraLibD: Detecting Irrelevant Third-Party libraries in Java and Python Applications

02/22/2022
by Ritu Kapur, et al.

Software development relies on multiple Third-Party Libraries (TPLs). However, irrelevant libraries present in a software application's distributable often lead to excessive consumption of resources such as CPU cycles, memory, and mobile-device battery. Therefore, identifying and removing the unused TPLs present in an application is desirable. We present a rapid, storage-efficient, obfuscation-resilient method to detect the irrelevant TPLs in Java and Python applications. Our approach's novel aspects are: i) Computing a vector representation of a .class file using a model that we call Lib2Vec, which is trained using the Paragraph Vector Algorithm. ii) Converting a .class file to a normalized form via semantics-preserving transformations before using it to train the Lib2Vec models. iii) An eXtra Library Detector (XtraLibD), developed and tested with 27 different language-specific Lib2Vec models. These models were trained using different parameters and >30,000 .class and >478,000 .py files taken from >100 different Java libraries and 43,711 Python libraries available at MavenCentral.com and Pypi.com, respectively. XtraLibD achieves an accuracy of 99.48% [...] score of 0.968 and outperforms the existing tools, viz., LibScout, LiteRadar, and LibD, with an accuracy improvement of 74.5% [...], respectively. Compared with LibD, XtraLibD achieves a response time improvement of 61.37% [...] program artifacts are available at https://www.doi.org/10.5281/zenodo.5179747.
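The normalize-then-compare idea described above can be illustrated with a small sketch. This is not the authors' implementation: it works on Python source text rather than .class files, and it swaps the Lib2Vec (Paragraph Vector) embedding for a plain token-count vector so that it runs with the standard library alone. The function names, the normalization rules, and the 0.95 similarity threshold are all assumptions made for illustration.

```python
import keyword
import math
import re
from collections import Counter


def normalize(source: str) -> list[str]:
    """Hypothetical normalization: hide literals and identifier names."""
    # Replace string and numeric literals with placeholders.
    source = re.sub(r'"[^"\n]*"', " STR ", source)
    source = re.sub(r"'[^'\n]*'", " STR ", source)
    source = re.sub(r"\b\d+(?:\.\d+)?\b", " NUM ", source)
    tokens = re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_0-9]", source)
    normalized = []
    for tok in tokens:
        if keyword.iskeyword(tok) or tok in ("STR", "NUM"):
            normalized.append(tok)   # keep keywords and placeholders
        elif tok[0].isalpha() or tok[0] == "_":
            normalized.append("ID")  # renamed identifiers collapse to one token
        else:
            normalized.append(tok)   # punctuation and operators
    return normalized


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect_library(source: str, reference: dict[str, str], threshold: float = 0.95):
    """Return (library name or None, best similarity) for the given source.

    `reference` maps a library name to representative source text; the
    token-count vector stands in for a learned Lib2Vec embedding.
    """
    vec = Counter(normalize(source))
    best_name, best_score = None, 0.0
    for name, ref_source in reference.items():
        score = cosine(vec, Counter(normalize(ref_source)))
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score
```

With a reference entry such as `def add(a, b): return a + b`, a copy whose identifiers were renamed (`def f(x, y): return x + y`) normalizes to the same token sequence and is therefore matched. A count vector is a crude proxy, since frequent tokens such as `ID` dominate the similarity; a learned embedding such as the one the abstract describes is meant to capture more context than this.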

