Automatically Generating Dockerfiles via Deep Learning: Challenges and Promises

03/28/2023
by Giovanni Rosa, et al.

Containerization allows developers to define the execution environment in which their software needs to be installed. Docker is the leading platform in this field, and developers who use it are required to write a Dockerfile for their software. Writing Dockerfiles is far from trivial, especially when the system has unusual requirements for its execution environment. Although several tools exist to support developers in writing Dockerfiles, none of them can generate entire Dockerfiles from scratch given a high-level specification of the requirements of the execution environment. In this paper, we present a study in which we aim to understand to what extent Deep Learning (DL), which has proven successful for other coding tasks, can be used for this specific coding task. As a preliminary step, we defined a structured natural language specification for Dockerfile requirements and a methodology that we use to automatically infer the requirements from the largest dataset of Dockerfiles currently available. We used the resulting dataset, with 670,982 instances, to train and test a Text-to-Text Transfer Transformer (T5) model, following the current state-of-the-art procedure for coding tasks, to automatically generate Dockerfiles from the structured specifications. The results of our evaluation show that T5 performs similarly to the simpler Information Retrieval (IR)-based baselines we considered. We also report the open challenges associated with the application of deep learning in the context of Dockerfile generation.
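To make the idea of an IR-based baseline concrete, the following is a minimal sketch of one possible retrieval approach: given a requirements specification, return the Dockerfile whose paired specification is most similar. It is not the baseline used in the paper, whose details the abstract does not give. The specification wording, the `CORPUS` pairs, the function names, and the choice of Jaccard similarity over whitespace tokens are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_dockerfile(query_spec: str, corpus: List[Tuple[str, str]]) -> str:
    """Return the Dockerfile whose paired specification best matches the query."""
    query_tokens = tokenize(query_spec)
    _, best_dockerfile = max(
        corpus, key=lambda pair: jaccard(query_tokens, tokenize(pair[0]))
    )
    return best_dockerfile


# Hypothetical (specification, Dockerfile) pairs. The specification format is
# invented here and is NOT the structured format defined in the paper.
CORPUS = [
    (
        "os ubuntu 20.04 packages python3 pip",
        "FROM ubuntu:20.04\n"
        "RUN apt-get update && apt-get install -y python3 python3-pip\n",
    ),
    (
        "os alpine 3.17 packages nodejs npm",
        "FROM alpine:3.17\n"
        "RUN apk add --no-cache nodejs npm\n",
    ),
]

result = retrieve_dockerfile("os ubuntu 20.04 packages python3", CORPUS)
print(result)
```

A baseline of this kind can only return Dockerfiles that already exist in its corpus, so it cannot adapt to a specification it has never seen. That limitation is what makes a generative model such as T5 an interesting point of comparison.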

research
07/22/2021

An Empirical Study on Code Comment Completion

Code comments play a prominent role in program comprehension activities....
research
02/19/2021

"Do this! Do that!, And nothing will happen" Do specifications lead to securely stored passwords?

Does the act of writing a specification (how the code should behave) for...
research
01/06/2021

On the Requirements for Serious Games geared towards Software Developers in the Industry

Teaching industry staff on cybersecurity issues is a fundamental activit...
research
03/02/2023

Deep Learning Based Code Generation Methods: A Literature Review

Code Generation aims at generating relevant code fragments according to ...
research
07/10/2023

Can Large Language Models Write Good Property-Based Tests?

Property-based testing (PBT), while an established technique in the soft...
research
08/29/2017

Why feature dependencies challenge the requirements engineering of automotive systems: An empirical study

Functional dependencies and feature interactions in automotive software ...
research
04/06/2022

Data-Driven Approach for Log Instruction Quality Assessment

In the current IT world, developers write code while system operators ru...
