PDC – a probabilistic distributional clustering algorithm: a case study on suicide articles in PubMed

12/04/2019
by Rezarta Islamaj, et al.

Given the ever-increasing volume of information, it is crucial to organize large document collections in a way that facilitates human comprehension. In this work, we present PDC (probabilistic distributional clustering), a novel algorithm that, given a document collection, computes disjoint term sets representing the topics in that collection. The algorithm uses word co-occurrence probabilities to partition the set of terms appearing in the documents into disjoint groups of related terms. We also present an environment to visualize the computed topics in the term space and to retrieve the most related PubMed articles for each group of terms. We illustrate the algorithm by applying it to PubMed documents on the topic of suicide, a major public health problem and the tenth leading cause of death in the US. In this application, our goal is to provide a global view of the mental health literature on suicide and, through it, to help create a rich environment of multifaceted data that guides health care researchers in their efforts to better understand the breadth, depth, and scope of the problem. We demonstrate the usefulness of the proposed algorithm through a web portal that allows mental health researchers to peruse the suicide-related literature in PubMed.
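The abstract says only that PDC partitions terms into disjoint groups using word co-occurrence probabilities; it does not give the probability model or the partitioning rule. The snippet below is therefore a minimal sketch of that general idea, not the authors' algorithm. It assumes document-level estimates P(b | a) = df(a, b) / df(a), a hypothetical `threshold` parameter, and a swapped-in grouping rule: link two terms when both conditional probabilities reach the threshold, then take connected components (via union-find) as the disjoint term sets. The names `cooccurrence_probabilities` and `partition_terms` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_probabilities(docs):
    """Estimate P(b | a) = df(a, b) / df(a) from document-level co-occurrence.

    Assumption: each document counts a term or term pair at most once.
    """
    term_df = Counter()
    pair_df = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))  # pairs (a, b) with a < b
    probs = {}
    for (a, b), n_ab in pair_df.items():
        probs[(a, b)] = n_ab / term_df[a]  # P(b | a)
        probs[(b, a)] = n_ab / term_df[b]  # P(a | b)
    return term_df, probs


def partition_terms(docs, threshold=0.5):
    """Hypothetical grouping rule (not PDC's): link a and b when both
    P(b | a) and P(a | b) reach `threshold`, then return the connected
    components as disjoint term sets."""
    term_df, probs = cooccurrence_probabilities(docs)
    parent = {t: t for t in term_df}

    def find(t):
        # Union-find root lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for (a, b), p_b_given_a in probs.items():
        # Visit each unordered pair once and require both directions to pass.
        if a < b and p_b_given_a >= threshold and probs[(b, a)] >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for t in term_df:
        groups.setdefault(find(t), set()).add(t)
    # Sort groups by their sorted members so the output order is deterministic.
    return sorted(groups.values(), key=lambda g: sorted(g))


if __name__ == "__main__":
    # Toy, made-up documents (already tokenized), for illustration only.
    docs = [
        ["suicide", "depression", "risk"],
        ["suicide", "depression", "adolescent"],
        ["overdose", "opioid"],
        ["opioid", "overdose", "naloxone"],
    ]
    for group in partition_terms(docs, threshold=0.5):
        print(sorted(group))
```

On the toy input this yields two disjoint groups, `['adolescent', 'depression', 'risk', 'suicide']` and `['naloxone', 'opioid', 'overdose']`; raising `threshold` to 1.0 splits them into five smaller groups. Requiring both conditional probabilities to pass is one simple way to make the relation symmetric so that connected components give a true partition; the real PDC criterion may differ.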
