UC-LTM: Unidimensional clustering using latent tree models for discrete data

Kin Man POON, April Hua LIU, Nevin Lianwen ZHANG

Research output: Contribution to journal › Article

Abstract

This paper is concerned with model-based clustering of discrete data. Latent class models (LCMs) are usually used for this task. An LCM consists of a latent variable and a number of attributes. It makes the overly restrictive assumption that the attributes are conditionally independent given the latent variable. We propose a novel method to relax this assumption. The key idea is to partition the attributes into groups such that correlations among the attributes in each group can be properly modeled by using a single latent variable. The latent variables for the attribute groups are then used to build a number of models, and one of them is chosen to produce the clustering results. The new method produces unidimensional clustering using latent tree models and is named UC-LTM. Extensive empirical studies were conducted to compare UC-LTM with several model-based and distance-based clustering methods. UC-LTM outperforms the alternative methods in most cases, and the differences are often large. Further, an analysis of real-world social capital data shows that UC-LTM improves on the results obtained with LCMs in a previous study. Copyright © 2017 Elsevier B.V. All rights reserved.
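
The conditional independence assumption of an LCM can be made concrete with a small sketch. Under that assumption the joint probability factorizes as P(x, Z = k) = P(Z = k) × Π_i P(x_i | Z = k), and a data case is assigned to the cluster with the highest posterior. The function name `lcm_posterior` and all parameter values below are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch shows a plain LCM, not UC-LTM itself.

```python
def lcm_posterior(prior, cond_probs, x):
    """Posterior P(Z | x) under a latent class model (illustrative sketch).

    prior:      list of K values, prior[k] = P(Z = k)
    cond_probs: one table per attribute; cond_probs[i][k][v] = P(X_i = v | Z = k)
    x:          observed state index of each attribute
    """
    joint = []
    for k, p_z in enumerate(prior):
        # Conditional independence: multiply the per-attribute likelihoods.
        p = p_z
        for table, value in zip(cond_probs, x):
            p *= table[k][value]
        joint.append(p)
    total = sum(joint)
    return [p / total for p in joint]


# Hypothetical parameters: 2 clusters, 3 binary attributes.
prior = [0.5, 0.5]
table = [[0.9, 0.1],   # P(X_i = 0 | Z = 0), P(X_i = 1 | Z = 0)
         [0.2, 0.8]]   # P(X_i = 0 | Z = 1), P(X_i = 1 | Z = 1)
cond_probs = [table, table, table]

posterior = lcm_posterior(prior, cond_probs, x=[0, 0, 1])
cluster = max(range(len(posterior)), key=lambda k: posterior[k])
```

Because each attribute contributes an independent factor, any correlation between attributes that is not explained by the single latent variable is ignored; this is the restriction the abstract describes UC-LTM as relaxing.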

Original language: English
Pages (from-to): 392-409
Journal: International Journal of Approximate Reasoning
Volume: 92
Early online date: Oct 2017
Publication status: Published - 2018

Fingerprint

Discrete Data
Latent Variables
Latent Class Model
Attribute
Clustering
Model-based Clustering
Model
Clustering Methods
Empirical Study
Partition
Model-based
Alternatives

Citation

Poon, L. K. M., Liu, A. H., & Zhang, N. L. (2018). UC-LTM: Unidimensional clustering using latent tree models for discrete data. International Journal of Approximate Reasoning, 92, 392-409.

Keywords

  • Unidimensional clustering
  • Latent tree models
  • Latent class models
  • Probabilistic graphical models
  • Unsupervised learning