Efficient Inference for Dynamic Topic Modeling with Large Vocabularies

Abstract

Dynamic topic modeling is a well established tool for capturing the temporal dynamics of the topics of a corpus. Currently, dynamic topic models can only consider a small set of frequent words because of their computational complexity and insufficient data for less frequent words. In this work, we de- velop a scalable dynamic topic model by utilizing the correlation among the words in the vocabulary. By correlating previously independent temporal processes for words, our new model allows us to reliably estimate the topic representations contain- ing less frequent words. We develop an amortised variational inference method with self-normalised importance sampling approximation to the word distribution that dramatically reduces the compu- tational complexity and the number of variational parameters in order to handle large vocabularies. With extensive experiments on text datasets, we show that our method significantly outperforms the previous works by modeling word correlations, and it is able to handle real world data with a large vocabulary (80K words) which could not be pro- cessed by previous continuous dynamic topic mod- els. With qualitative analyses, we show that our method can perform inference on infrequent but representative keywords much more reliably than previous methods.

Related

November 2023 | ACM TORS

Unbiased Identification of Broadly Appealing Content Using a Pure Exploration Infinitely-Armed Bandit Strategy

Maryam Aziz, Jesse Anderton, Kevin Jamieson, Alice Wang, Hugues Bouchard, Javed Aslam

October 2023 | CIKM

Graph Learning for Exploratory Query Suggestions in an Instant Search System

Enrico Palumbo, Andreas Damianou, Alice Wang, Alva Liu, Ghazal Fazelnia, Francesco Fabbri, Rui Ferreira, Fabrizio Silvestri, Hugues Bouchard, Claudia Hauff, Mounia Lalmas, Ben Carterette, Praveen Chandar, David Nyhan

September 2023 | CLEF

Cem Mil Podcasts: A Spoken Portuguese Document Corpus For Multi-modal, Multi-lingual and Multi-Dialect Information Access Research

Ekaterina Garmash, Edgar Tanaka, Ann Clifton, Joana Correia, Sharmistha Jat, Winstead Zhu, Rosie Jones, Jussi Karlgren