Human Detection of Political Deepfakes across Transcripts, Audio, and Video

by Matthew Groh, et al.

Recent advances in technology for hyper-realistic visual effects provoke the concern that deepfake videos of political speeches will soon be visually indistinguishable from authentic video recordings. Yet there exists little empirical research on how audio-visual information influences people's susceptibility to political misinformation. The conventional wisdom in the field of communication research predicts that people will fall for fake news more often when the same version of a story is presented as a video rather than as text. However, audio-visual manipulations often leave distortions that some but not all people may pick up on. Here, we evaluate how communication modalities influence people's ability to discern real political speeches from fabrications, based on a randomized experiment with 5,727 participants who provide 61,792 truth discernment judgments. We show participants soundbites from political speeches that are randomly assigned to appear using permutations of text, audio, and video modalities. We find that communication modalities mediate discernment accuracy: participants are more accurate on video with audio than on silent video, and more accurate on silent video than on text transcripts. Likewise, we find that participants rely more on how something is said (the audio-visual cues) than on what is said (the speech content itself). However, political speeches that do not match public perceptions of politicians' beliefs reduce participants' reliance on visual cues. In particular, we find that reflective reasoning moderates the degree to which participants consider visual information: low performance on the Cognitive Reflection Test is associated with an underreliance on visual cues and an overreliance on what is said.




