Latent tree models for hierarchical topic detection

Peixian CHEN, Nevin Lianwen ZHANG, Teng-Fei LIU, Kin Man POON, Zhourong CHEN, Farhan KHAWAR

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

We present a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree models (HLTMs). The variables at the bottom level of an HLTM are observed binary variables that represent the presence/absence of words in a document. The variables at other levels are binary latent variables that represent word co-occurrence patterns or co-occurrences of such patterns. Each latent variable gives a soft partition of the documents, and document clusters in the partitions are interpreted as topics. Latent variables at high levels of the hierarchy capture long-range word co-occurrence patterns and hence give thematically more general topics, while those at low levels of the hierarchy capture short-range word co-occurrence patterns and give thematically more specific topics. In comparison with LDA-based methods, a key advantage of the new method is that it represents co-occurrence patterns explicitly using model structures. Extensive empirical results show that the new method significantly outperforms the LDA-based methods in terms of model quality and meaningfulness of topics and topic hierarchies. Copyright © 2017 Published by Elsevier B.V.
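To make the abstract's description concrete, the sketch below builds a toy two-level latent tree by hand: observed binary word-presence variables at the bottom, two binary latent variables `Y1` and `Y2` above them, and one binary latent variable `Z` at the top. The vocabulary, the tree structure, and every probability are invented for illustration (a real HLTM would be learned from a document collection), and all names such as `binarize` and `posteriors` are hypothetical. The posterior P(latent = 1 | document) is each document's soft membership in that latent variable's "1" cluster, i.e. the soft partition the abstract interprets as a topic. Inference here is brute-force enumeration over all latent states, which is only feasible for a tiny tree.

```python
from itertools import product

# Hypothetical vocabulary; not taken from the paper.
VOCAB = ["neural", "network", "training", "protein", "gene", "cell"]

# Hypothetical structure: top latent Z -> latents Y1, Y2 -> observed word variables.
CHILDREN = {"Y1": ["neural", "network", "training"], "Y2": ["protein", "gene", "cell"]}

# Hand-set (not learned) parameters; every variable is binary.
P_Z = 0.5                        # P(Z = 1)
P_Y_GIVEN_Z = {0: 0.2, 1: 0.8}   # P(Y = 1 | Z = z), shared by Y1 and Y2 for brevity
P_W_GIVEN_Y = {0: 0.05, 1: 0.7}  # P(word present | parent Y = y), shared by all words


def binarize(doc, vocab=VOCAB):
    """Bottom level of the tree: presence (1) or absence (0) of each vocabulary word."""
    tokens = set(doc.lower().split())
    return {w: int(w in tokens) for w in vocab}


def bern(p1, x):
    """Probability of binary value x under a Bernoulli distribution with P(1) = p1."""
    return p1 if x == 1 else 1.0 - p1


def posteriors(doc):
    """Exact P(latent = 1 | doc) for Z, Y1, Y2 by enumerating all 2**3 latent states."""
    x = binarize(doc)
    joint = {}
    for z, y1, y2 in product((0, 1), repeat=3):
        p = bern(P_Z, z)
        for name, y in (("Y1", y1), ("Y2", y2)):
            p *= bern(P_Y_GIVEN_Z[z], y)          # latent child given latent parent
            for w in CHILDREN[name]:
                p *= bern(P_W_GIVEN_Y[y], x[w])   # observed word given its latent parent
        joint[(z, y1, y2)] = p
    total = sum(joint.values())  # P(doc), used to normalize
    return {
        "Z": sum(p for (z, _, _), p in joint.items() if z == 1) / total,
        "Y1": sum(p for (_, y1, _), p in joint.items() if y1 == 1) / total,
        "Y2": sum(p for (_, _, y2), p in joint.items() if y2 == 1) / total,
    }


if __name__ == "__main__":
    for d in ["Training a neural network", "gene expression in the cell"]:
        print(d, {k: round(v, 3) for k, v in posteriors(d).items()})
```

With these toy numbers, a document mentioning only the `Y1` words gets a high posterior for `Y1` and a low one for `Y2`, so each latent variable splits the documents softly. `Z` sits above both and responds to whether the two lower-level patterns co-occur, mirroring the abstract's "co-occurrences of such patterns".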
Original language: English
Pages (from-to): 105-124
Journal: Artificial Intelligence
Volume: 250
Early online date: Jun 2017
DOIs
Publication status: Published - Sep 2017

Bibliographical note

Chen, P., Zhang, N. L., Liu, T., Poon, L. K. M., Chen, Z., & Khawar, F. (2017). Latent tree models for hierarchical topic detection. Artificial Intelligence, 250, 105-124.

Keywords

  • Probabilistic graphical models
  • Text analysis
  • Hierarchical latent tree analysis
  • Hierarchical topic detection