Beryllium: Neural Search for Algorithm Implementations

05/25/2023
by Adithya Kulkarni, et al.

In this paper, we explore the feasibility of finding algorithm implementations in code. Successfully matching code with algorithms can help developers understand unknown code, provide reference implementations, and automatically collect data for learning-based program synthesis. To achieve this goal, we designed a new language, named p-language, to specify algorithms, and a static analyzer for the p-language that automatically extracts control flow, math, and natural language information from the algorithm descriptions. We embedded the output of the p-language (p-code) and source code in a common vector space using self-supervised machine learning methods, so that algorithms can be matched with code without any manual annotation. We developed a tool named Beryllium. It takes pseudo code as a query and returns a ranked list of code snippets that likely match the algorithm query. Our evaluation on the Stony Brook Algorithm Repository and popular GitHub projects shows that Beryllium significantly outperformed the state-of-the-art code search tools in both C and Java. Specifically, for 98.5 93.8 10, and 1 ranked list, respectively. Given 87 algorithm queries, we found implementations for 74 algorithms in GitHub projects whose algorithms we did not know beforehand.
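
The query-and-rank step described above can be illustrated with a minimal sketch. It is not Beryllium's implementation: the abstract describes self-supervised embeddings of p-code and source code, whereas this example swaps in a toy bag-of-tokens count vector so that it runs without a trained model. The names `embed`, `cosine`, `rank_snippets`, and the two-file corpus are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a learned encoder (assumption): a bag-of-tokens
    count vector over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[token] for token, count in u.items() if token in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_snippets(query, snippets, k=10):
    """Return the names of the k snippets most similar to the query,
    most similar first. Query and snippets share one vector space."""
    q = embed(query)
    scored = [(cosine(q, embed(code)), name) for name, code in snippets.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical two-snippet corpus standing in for indexed source code.
corpus = {
    "binary_search.c": (
        "int search(int *a, int n, int key) { int low = 0, high = n - 1; "
        "while (low <= high) { int mid = (low + high) / 2; "
        "if (a[mid] == key) return mid; "
        "if (a[mid] < key) low = mid + 1; else high = mid - 1; } return -1; }"
    ),
    "bubble_sort.c": (
        "void sort(int *a, int n) { for (int i = 0; i < n; i++) "
        "for (int j = 0; j + 1 < n - i; j++) "
        "if (a[j] > a[j + 1]) swap(a, j, j + 1); }"
    ),
}

# Pseudo code query, loosely in the spirit of an algorithm description.
query = "while low <= high: mid = (low + high) / 2; if a[mid] == key return mid"

print(rank_snippets(query, corpus, k=2))
```

In this sketch the binary search snippet ranks above the bubble sort snippet because it shares far more tokens with the query. A learned embedding would replace `embed`, while the ranking by similarity in a shared space would keep the same shape.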

07/12/2018

RACK: Code Search in the IDE using Crowdsourced Knowledge

Traditional code search engines often do not perform well with natural l...
04/06/2022

DiffSearch: A Scalable and Precise Search Engine for Code Changes

The source code of successful projects is evolving all the time, resulti...
06/01/2017

Function Assistant: A Tool for NL Querying of APIs

In this paper, we describe Function Assistant, a lightweight Python-base...
05/09/2019

When Deep Learning Met Code Search

There have been multiple recent proposals on using deep neural networks ...
06/16/2021

Cross-Language Code Search using Static and Dynamic Analyses

As code search permeates most activities in software development, code-to...
08/12/2020

OCoR: An Overlapping-Aware Code Retriever

Code retrieval helps developers reuse the code snippet in the open-sourc...
