Neural Attention-Aware Hierarchical Topic Model

10/14/2021
by   Yuan Jin, et al.
0

Neural topic models (NTMs) apply deep neural networks to topic modelling. Despite their success, NTMs generally ignore two important aspects: (1) only document-level word count information is utilized for the training, while more fine-grained sentence-level information is ignored, and (2) external semantic knowledge regarding documents, sentences and words are not exploited for the training. To address these issues, we propose a variational autoencoder (VAE) NTM model that jointly reconstructs the sentence and document word counts using combinations of bag-of-words (BoW) topical embeddings and pre-trained semantic embeddings. The pre-trained embeddings are first transformed into a common latent topical space to align their semantics with the BoW embeddings. Our model also features hierarchical KL divergence to leverage embeddings of each document to regularize those of their sentences, thereby paying more attention to semantically relevant sentences. Both quantitative and qualitative experiments have shown the efficacy of our model in 1) lowering the reconstruction errors at both the sentence and document levels, and 2) discovering more coherent topics from real-world datasets.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/16/2023

BERTTM: Leveraging Contextualized Word Embeddings from Pre-trained Language Models for Neural Topic Modeling

With the development of neural topic models in recent years, topic model...
research
04/08/2020

Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

Topic models extract meaningful groups of words from documents, allowing...
research
04/07/2016

Sentence Level Recurrent Topic Model: Letting Topics Speak for Themselves

We propose Sentence Level Recurrent Topic Model (SLRTM), a new topic mod...
research
08/01/2017

SenGen: Sentence Generating Neural Variational Topic Model

We present a new topic model that generates documents by sampling a topi...
research
02/06/2023

Efficient and Flexible Topic Modeling using Pretrained Embeddings and Bag of Sentences

Pre-trained language models have led to a new state-of-the-art in many N...
research
12/12/2022

Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts

Instead of mining coherent topics from a given text corpus in a complete...
research
07/20/2016

An Adaptation of Topic Modeling to Sentences

Advances in topic modeling have yielded effective methods for characteri...

Please sign up or login with your details

Forgot password? Click here to reset