Automatic Extraction of Personality from Text: Challenges and Opportunities

by Nazar Akrami, et al.

In this study, we examined the possibility of extracting personality traits from text. We created an extensive dataset by having experts annotate personality traits in a large number of texts from multiple online sources. From these annotated texts, we selected a sample and made further annotations, resulting in a large low-reliability dataset and a small high-reliability dataset. We then used the two datasets to train and test several machine learning models, including a language model, to extract personality from text. Lastly, we evaluated our best models in the wild, on datasets from different domains. Our results show that the models based on the small high-reliability dataset performed better (in terms of R^2) than the models based on the large low-reliability dataset. Also, the language model based on the small high-reliability dataset performed better than the random baseline. Finally, and more importantly, the results showed that our best model did not perform better than the random baseline when tested in the wild. Taken together, our results show that determining personality traits from text remains a challenge and that no firm conclusions can be drawn about model performance before testing in the wild.
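The abstract compares models by R^2 and against a random baseline, but it does not say how that baseline is built. As a hedged illustration only, the sketch below computes R^2 (1 - SS_res / SS_tot) for predicted trait scores and compares it with a permutation baseline in which the predictions are randomly shuffled, which breaks any link between a text and its score. The permutation baseline, the function names `r_squared`, `random_baseline_r2` and `beats_baseline`, and the `alpha`, `n_rounds` and `seed` parameters are all hypothetical choices, not the authors' procedure.

```python
import random
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true scores are equal")
    return 1.0 - ss_res / ss_tot


def random_baseline_r2(y_true, y_pred, n_rounds=1000, seed=0):
    """Hypothetical baseline: R^2 of the same predictions after shuffling
    them, so they no longer correspond to the texts they were made for."""
    rng = random.Random(seed)
    shuffled = list(y_pred)
    scores = []
    for _ in range(n_rounds):
        rng.shuffle(shuffled)
        scores.append(r_squared(y_true, shuffled))
    return scores


def beats_baseline(y_true, y_pred, alpha=0.05, **kwargs):
    """Return (observed R^2, one-sided permutation p-value, decision)."""
    observed = r_squared(y_true, y_pred)
    null_scores = random_baseline_r2(y_true, y_pred, **kwargs)
    # +1 smoothing so the p-value is never exactly zero
    p_value = (1 + sum(s >= observed for s in null_scores)) / (1 + len(null_scores))
    return observed, p_value, p_value < alpha
```

For example, `beats_baseline(true_scores, model_scores)` on a held-out set returns the model's R^2 together with how often shuffled predictions do at least as well. A constant predictor scores R^2 = 0 and can never beat its own shuffles, which mirrors the abstract's point that a model may look reasonable on one dataset yet fail to outperform a random baseline elsewhere.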
