Mapping Researcher Activity based on Publication Data by means of Transformers
Modern performance on several natural language processing (NLP) tasks has been enhanced thanks to the Transformer-based pre-trained language model BERT. We employ this concept to investigate a local publication database. Research papers are encoded and clustered to form a landscape view of the scientific topics, in which research is active. Authors working on similar topics can be identified by calculating the similarity between their papers. Based on this, we define a similarity metric between authors. Additionally we introduce the concept of self-similarity to indicate the topical variety of authors.
READ FULL TEXT