'Beach' to 'Bitch': Inadvertent Unsafe Transcription of Kids' Content on YouTube

02/17/2022
by Krithika Ramesh, et al.

Over the last few years, YouTube Kids has emerged as a highly competitive alternative to television for children's entertainment. Consequently, YouTube Kids' content should receive additional scrutiny to ensure children's safety. While research on detecting offensive or inappropriate content for kids is gaining momentum, little or no current work investigates to what extent AI applications can (accidentally) introduce content that is inappropriate for kids. In this paper, we present a novel (and troubling) finding: well-known automatic speech recognition (ASR) systems may produce text highly inappropriate for kids while transcribing YouTube Kids' videos. We dub this phenomenon inappropriate content hallucination. Our analyses suggest that such hallucinations are far from occasional, and the ASR systems often produce them with high confidence. We release a first-of-its-kind dataset of audio clips for which existing state-of-the-art ASR systems hallucinate content inappropriate for kids. In addition, we demonstrate that some of these errors can be fixed using language models.

Related research
11/04/2022

Did your child get disturbed by an inappropriate advertisement on YouTube?

YouTube is a popular video platform for sharing creative content and ide...
01/21/2019

Disturbed YouTube for Kids: Characterizing and Detecting Disturbing Content on YouTube

A considerable number of the most-subscribed YouTube channels feature co...
04/29/2019

A Comparison of Online Automatic Speech Recognition Systems and the Nonverbal Responses to Unintelligible Speech

Automatic Speech Recognition (ASR) systems have proliferated over the re...
11/08/2016

Automatic recognition of child speech for robotic applications in noisy environments

Automatic speech recognition (ASR) allows a natural and intuitive interf...
11/20/2020

Are Chess Discussions Racist? An Adversarial Hate Speech Data Set

On June 28, 2020, while presenting a chess podcast on Grandmaster Hikaru...
02/11/2023

ASDF: A Differential Testing Framework for Automatic Speech Recognition Systems

Recent years have witnessed wider adoption of Automated Speech Recogniti...
05/31/2022

Conspiracy Brokers: Understanding the Monetization of YouTube Conspiracy Theories

Conspiracy theories are increasingly a subject of research interest as s...