The Importance of Suppressing Domain Style in Authorship Analysis

05/29/2020
by Sebastian Bischoff, et al.

Many approaches to authorship analysis require a representation of writing style. Yet despite decades of research, it remains unclear to what extent commonly used and widely accepted representations, such as character trigram frequencies, actually capture an author's writing style rather than domain-specific style components or even topic. We address this shortcoming for the first time with a novel experimental setup in which the authors are fixed but the domains are swapped between training and testing. With this setup, we reveal that approaches using character trigram features are highly susceptible to favoring domain information when applied without attention to domains, suffering drops of up to 55.4 percentage points in classification accuracy under domain swapping. We further propose a new remedy based on domain-adversarial learning and compare it to remedies from the literature that are based on heuristic rules. Both can work well, reducing accuracy losses under domain swapping to 3.6
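As an illustration only (this is not the authors' code), the sketch below shows two ingredients the abstract refers to: a character trigram frequency profile of a text, and a swapped-domain split in which each author is tested on a domain other than the one they were trained on. The function names, the toy corpus, and the two-author, two-domain alternation are hypothetical assumptions; the abstract does not spell out the exact protocol.

```python
from __future__ import annotations

from collections import Counter


def char_trigram_frequencies(text: str) -> dict[str, float]:
    """Relative frequencies of overlapping character trigrams in `text`."""
    trigrams = [text[i:i + 3] for i in range(len(text) - 2)]
    if not trigrams:
        return {}  # texts shorter than 3 characters have no trigrams
    counts = Counter(trigrams)
    total = len(trigrams)
    return {gram: n / total for gram, n in counts.items()}


def swapped_domain_split(docs):
    """Split (author, domain, text) tuples into train and test sets.

    Hypothetical setup: exactly two domains. Authors are alternately
    assigned one domain for training and are tested on the other, so
    author and domain are confounded in training and the pairing is
    reversed at test time.
    """
    domains = sorted({d for _, d, _ in docs})
    if len(domains) != 2:
        raise ValueError("this sketch assumes exactly two domains")
    authors = sorted({a for a, _, _ in docs})
    train_domain = {a: domains[i % 2] for i, a in enumerate(authors)}
    train = [doc for doc in docs if doc[1] == train_domain[doc[0]]]
    test = [doc for doc in docs if doc[1] != train_domain[doc[0]]]
    return train, test


# Hypothetical toy corpus: two authors, each writing in two domains.
corpus = [
    ("alice", "email", "see you at the meeting"),
    ("alice", "forum", "has anyone tried this?"),
    ("bob", "email", "thanks for the update"),
    ("bob", "forum", "i think the answer is no"),
]
train, test = swapped_domain_split(corpus)
# train pairs alice with email and bob with forum; test swaps the pairing.
```

Under such a split, a classifier that keys on domain-specific trigrams (e.g., greeting formulas typical of email) would be rewarded during training but misled at test time, which is the kind of confound the swapped-domain setup is designed to expose.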

Related research

Can Authorship Representation Learning Capture Stylistic Features? (08/22/2023)
Automatically disentangling an author's style from the content of their ...

An Improved Topic Masking Technique for Authorship Analysis (05/02/2020)
Authorship verification (AV) is an important sub-area of digital text fo...

Same Author or Just Same Topic? Towards Content-Independent Style Representations (04/11/2022)
Linguistic style is an integral component of language. Recent advances i...

Chasing the Ghosts of Ibsen: A computational stylistic analysis of drama in translation (01/05/2015)
Research into the stylistic properties of translations is an issue which...

Style-aware Neural Model with Application in Authorship Attribution (09/12/2019)
Writing style is a combination of consistent decisions associated with a...

Text Analysis in Adversarial Settings: Does Deception Leave a Stylistic Trace? (02/24/2019)
Textual deception constitutes a major problem for online security. Many ...

A Machine Learning Pipeline for Automatic Extraction of Statistic Reports and Experimental Conditions from Scientific Papers (03/25/2021)
A common writing style for statistical results are the recommendations o...