Prevalence, Contents and Automatic Detection of KL-SATD

08/12/2020
by   Leevi Rantala, et al.

When developers use keywords such as TODO and FIXME in source code comments to describe self-admitted technical debt (SATD), we refer to it as Keyword-Labeled SATD (KL-SATD). We study KL-SATD in 33 software repositories containing 13,588 KL-SATD comments. We find that the median percentage of KL-SATD comments among all comments is only 1.52%. KL-SATD comment contents include words expressing code changes and uncertainty, such as remove, fix, maybe and probably, which distinguishes them from other comments. KL-SATD comment contents are similar to the manually labeled SATD comments of prior work. Our machine learning classifier, based on logistic Lasso regression, performs well in detecting KL-SATD comments (AUC-ROC 0.88). Finally, we demonstrate that machine learning can identify comments that currently lack a SATD keyword but should have one. Automating the identification of SATD comments that lack keywords can save time and effort by replacing manual identification. KL-SATD therefore offers the potential to bootstrap a complete SATD detector.
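As a rough illustration of what "keyword-labeled" means in practice, the following minimal sketch flags comments that contain a SATD keyword and computes the share of such comments per repository, together with the median across repositories. It is not the authors' pipeline: the keyword list is limited to the two keywords named in the abstract, the case-insensitive whole-word matching rule is an assumption, and the repository names and comments are invented toy data.

```python
import re
from statistics import median

# Assumption: only the two keywords named in the abstract are used here.
KEYWORDS = ("TODO", "FIXME")

# Assumption: match keywords as whole words, ignoring case.
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def is_kl_satd(comment: str) -> bool:
    """Return True if the comment contains at least one SATD keyword."""
    return KEYWORD_RE.search(comment) is not None


def kl_satd_percentage(comments: list[str]) -> float:
    """Percentage of comments in one repository that are keyword-labeled."""
    if not comments:
        return 0.0
    hits = sum(1 for comment in comments if is_kl_satd(comment))
    return 100.0 * hits / len(comments)


# Hypothetical toy data: three made-up repositories with a few comments each.
repos = {
    "repo_a": [
        "// TODO: remove this hack",
        "// returns the user id",
        "// compute checksum",
        "// loop over items",
    ],
    "repo_b": [
        "# FIXME maybe wrong for negative input",
        "# parse the header",
    ],
    "repo_c": [
        "/* initialise buffer */",
        "/* close the file */",
    ],
}

# Per-repository percentages, then the median across repositories.
percentages = {name: kl_satd_percentage(cs) for name, cs in repos.items()}
median_percentage = median(list(percentages.values()))
```

Comments flagged this way could serve as positive training examples for a text classifier, with the remaining comments as negatives; that step is omitted here because the abstract does not describe the feature extraction or training setup.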
