Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings

by Dongsheng Wang, et al.

A topic model is often formulated as a generative model that explains how each word of a document is generated given a set of topics and document-specific topic proportions. Because it focuses on capturing word co-occurrences within a document, it often performs poorly on short documents. In addition, its parameter estimation often relies on approximate posterior inference that is either not scalable or suffers from large approximation error. This paper introduces a new topic-modeling framework in which each document is viewed as a set of word embedding vectors and each topic is modeled as an embedding vector in the same embedding space. With words and topics embedded in the same vector space, we define a method to measure the semantic difference between the embedding vectors of a document's words and those of the topics, and optimize the topic embeddings to minimize the expected difference over all documents. Experiments on text analysis demonstrate that the proposed method, which is amenable to mini-batch stochastic gradient descent based optimization and hence scalable to big corpora, provides competitive performance in discovering more coherent and diverse topics and extracting better document representations.
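To make the core idea concrete, below is a minimal sketch of measuring the semantic difference between a document's word embeddings and a set of topic embeddings. This is an illustrative assumption, not the paper's exact objective: it uses cosine distance as the point-wise cost and a softmax over similarities as the soft word-to-topic assignment; the resulting per-document loss is what mini-batch SGD would minimize with respect to the (learnable) topic embeddings. All array shapes and names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: d-dimensional embeddings, K topics,
# one document containing 12 words.
d, K = 50, 4
doc_words = rng.normal(size=(12, d))  # word embedding vectors of one document
topics = rng.normal(size=(K, d))      # topic embedding vectors (learnable)

def normalize(x):
    """L2-normalize rows so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def doc_topic_cost(words, topics):
    """Expected semantic difference between a document's word embeddings
    and the topic embeddings: cosine distance weighted by a softmax
    word-to-topic assignment, averaged over the document's words."""
    w, t = normalize(words), normalize(topics)
    sim = w @ t.T                     # (n_words, K) cosine similarities
    cost = 1.0 - sim                  # cosine distance in [0, 2]
    # Soft assignment of each word to the topics nearest to it.
    assign = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
    return float((assign * cost).sum(axis=1).mean())

loss = doc_topic_cost(doc_words, topics)
print(f"per-document loss: {loss:.4f}")
```

Averaging this loss over a mini-batch of documents yields a differentiable objective, so the topic embeddings can be updated with standard stochastic gradient descent, which is what makes the approach scale to large corpora.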


Generative Topic Embedding: a Continuous Representation of Documents (Extended Version with Proofs)

Word embedding maps words into a low-dimensional continuous embedding sp...

Inductive Document Network Embedding with Topic-Word Attention

Document network embedding aims at learning representations for a struct...

Crosslingual Document Embedding as Reduced-Rank Ridge Regression

There has recently been much interest in extending vector-based word rep...

Integrating topic modeling and word embedding to characterize violent deaths

There is an escalating need for methods to identify latent patterns in t...

Towards Better Understanding with Uniformity and Explicit Regularization of Embeddings in Embedding-based Neural Topic Models

Embedding-based neural topic models could explicitly represent words and...

Few-shot Learning for Topic Modeling

Topic models have been successfully used for analyzing text documents. H...

Semantic Concept Spaces: Guided Topic Model Refinement using Word-Embedding Projections

We present a framework that allows users to incorporate the semantics of...
