Automated Identification of Toxic Code Reviews: How Far Can We Go?

02/26/2022
by Jaydeb Sarker, et al.

Toxic conversations during software development interactions may have serious repercussions on a Free and Open Source Software (FOSS) development project. For example, victims of toxic conversations may become afraid to express themselves, grow demotivated, and eventually leave the project. Automated filtering of toxic conversations may help a FOSS community maintain healthy interactions among its members. However, off-the-shelf toxicity detectors perform poorly on Software Engineering (SE) datasets, such as one curated from code review comments. To address this challenge, we present ToxiCR, a supervised learning-based toxicity identification tool for code review interactions. ToxiCR offers a choice of ten supervised learning algorithms, an option to select among text vectorization techniques, five mandatory and three optional SE domain-specific preprocessing steps, and a large-scale labeled dataset of 19,571 code review comments. Through a rigorous evaluation of the models with various combinations of preprocessing steps and vectorization techniques, we have identified the best combination for our dataset, which boosts performance to 95.8 […]. ToxiCR significantly outperforms existing toxicity detectors on our dataset. We have made our dataset, pretrained models, evaluation results, and source code publicly available at: https://github.com/WSU-SEAL/ToxiCR.
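The abstract describes a three-stage pipeline: SE domain-specific preprocessing, text vectorization, and a supervised classifier. The sketch below shows only the general shape of such a pipeline under stated assumptions; it is not ToxiCR's code. The preprocessing steps (URL removal, camelCase/snake_case identifier splitting, squeezing repeated characters), the bag-of-words vectorization, the hand-written multinomial Naive Bayes classifier, the function and class names, and the six toy training comments are all hypothetical choices made for illustration.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical SE-specific preprocessing, for illustration only (not ToxiCR's actual steps).
URL_RE = re.compile(r"https?://\S+")
CAMEL_RE = re.compile(r"(?<=[a-z])(?=[A-Z])")  # zero-width boundary inside camelCase


def preprocess(text: str) -> list[str]:
    text = URL_RE.sub(" ", text)                # drop URLs
    text = CAMEL_RE.sub(" ", text)              # split camelCase identifiers
    text = text.replace("_", " ")               # split snake_case identifiers
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # squeeze runs like "sooooo" -> "soo"
    return re.findall(r"[a-z']+", text.lower())  # lowercase word tokens


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing over bag-of-words counts."""

    def fit(self, docs: list[str], labels: list[int]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(preprocess(doc))  # vectorize as token counts
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: str) -> int:
        tokens = preprocess(doc)
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


# Toy, made-up training data: 1 = toxic, 0 = non-toxic.
train = [
    ("This patch is garbage, you idiot", 1),
    ("Shut up and fix your stupid code", 1),
    ("Please rename getUserName to fetchUserName", 0),
    ("Looks good to me, thanks for the fix", 0),
    ("Nice refactoring, see https://example.com/docs", 0),
    ("What a dumb idea, are you stupid?", 1),
]
model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
print(model.predict("stupid idiot code"))   # 1
print(model.predict("thanks, looks good"))  # 0
```

Identifier splitting runs before tokenization so that names such as `getUserName` contribute ordinary words rather than one opaque token; a real tool would swap in stronger vectorizers and classifiers, as the abstract's choice of ten algorithms suggests.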

research
09/20/2020

A Benchmark Study of the Contemporary Toxicity Detectors on Software Engineering Interactions

Automated filtering of toxic conversations may help an Open-source softw...
research
07/07/2023

ToxiSpanSE: An Explainable Toxicity Detection in Code Review Comments

Background: The existence of toxic conversations in open-source platform...
research
06/10/2023

Bootstrapping Code-Text Pretrained Language Model to Detect Inconsistency Between Code and Comment

Comments on source code serve as critical documentation for enabling dev...
research
08/23/2021

The "Shut the f**k up" Phenomenon: Characterizing Incivility in Open Source Code Review Discussions

Code review is an important quality assurance activity for software deve...
research
08/31/2023

DevGPT: Studying Developer-ChatGPT Conversations

The emergence of large language models (LLMs) such as ChatGPT has disrup...
research
08/14/2023

CupCleaner: A Data Cleaning Approach for Comment Updating

Recently, deep learning-based techniques have shown promising performanc...
research
07/07/2023

Towards Automated Classification of Code Review Feedback to Support Analytics

Background: As improving code review (CR) effectiveness is a priority fo...
