Model-based clustering of high-dimensional data: Variable selection versus facet determination

Kin Man POON, Nevin L. ZHANG, Tengfei LIU, April H. LIU

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

Variable selection is an important problem for cluster analysis of high-dimensional data. It is also a difficult one. The difficulty originates not only from the lack of class information but also from the fact that high-dimensional data are often multifaceted and can be meaningfully clustered in multiple ways. In such a case, the effort to find one subset of attributes that presumably gives the "best" clustering may be misguided. It makes more sense to identify the various facets of a data set (each based on a subset of attributes), cluster the data along each one, and present the results to domain experts for appraisal and selection. In this paper, we propose a generalization of Gaussian mixture models and demonstrate its ability to automatically identify the natural facets of data and to cluster the data along each of those facets simultaneously. We present empirical results showing that facet determination usually leads to better clustering results than variable selection. Copyright © 2012 Elsevier Inc. All rights reserved.
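The contrast between "one clustering on a selected subset" and "one clustering per facet" can be illustrated with a small, hypothetical Python sketch. This is not the authors' model. According to the keywords, the paper's generalization is built on latent tree models and learns the facets from the data. Here, by contrast, the facets are fixed by hand and each facet is clustered independently with a diagonal-covariance Gaussian mixture fitted by EM. The function names (`fit_diag_gmm`, `cluster_by_facets`), the facet names, and the synthetic data are all illustrative assumptions.

```python
import math

# Hypothetical illustration only: facets are given by hand, and each facet gets
# its own independent diagonal Gaussian mixture (not the paper's learned model).

def _log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def _farthest_point_init(points, k):
    """Deterministic initial means: the first point, then repeatedly the point
    farthest (squared Euclidean) from the means chosen so far."""
    means = [list(points[0])]
    while len(means) < k:
        def dist(p):
            return min(sum((a - b) ** 2 for a, b in zip(p, m)) for m in means)
        means.append(list(max(points, key=dist)))
    return means

def fit_diag_gmm(points, k, iters=50, var_floor=1e-3):
    """Fit a k-component diagonal Gaussian mixture by EM; return hard labels."""
    n, d = len(points), len(points[0])
    means = _farthest_point_init(points, k)
    var = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, normalized in log space for stability.
        resp = []
        for x in points:
            logs = [math.log(weights[j]) + _log_gauss_diag(x, means[j], var[j])
                    for j in range(k)]
            top = max(logs)
            probs = [math.exp(l - top) for l in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate mixing weights, means and per-dimension variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[t] for r, x in zip(resp, points)) / nj
                        for t in range(d)]
            var[j] = [max(var_floor,
                          sum(r[j] * (x[t] - means[j][t]) ** 2
                              for r, x in zip(resp, points)) / nj)
                      for t in range(d)]
    return [max(range(k), key=lambda j: r[j]) for r in resp]

def cluster_by_facets(data, facets, k=2):
    """Cluster the rows of `data` separately along each facet, where a facet is
    a list of attribute indices. Returns one label list per facet name."""
    return {name: fit_diag_gmm([[row[t] for t in cols] for row in data], k)
            for name, cols in facets.items()}

def same_partition(x, y):
    """True if two labelings group the items identically (up to renaming)."""
    return all((x[i] == x[j]) == (y[i] == y[j])
               for i in range(len(x)) for j in range(len(x)))

def _noise(i, salt):
    # Small deterministic perturbation in [-0.2, 0.2].
    return (((i * 7 + salt * 3) % 5) - 2) * 0.1

# Synthetic data with two unrelated groupings: attributes 0-1 follow truth_a,
# attributes 2-3 follow truth_b.
data, truth_a, truth_b = [], [], []
for i in range(40):
    a, b = i % 2, (i // 2) % 2
    truth_a.append(a)
    truth_b.append(b)
    data.append([10 * a + _noise(i, 0), 10 * a + _noise(i, 1),
                 10 * b + _noise(i, 2), 10 * b + _noise(i, 3)])

labels = cluster_by_facets(data, {"A": [0, 1], "B": [2, 3]})
print(same_partition(labels["A"], truth_a), same_partition(labels["B"], truth_b))
# prints: True True
```

Each facet recovers its own grouping. Any single two-cluster partition of these rows can match at most one of the two groupings, because they disagree on which rows belong together. Diagonal covariances and deterministic farthest-point initialization are chosen only to keep the sketch short and reproducible.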
Original language: English
Pages (from-to): 196-215
Journal: International Journal of Approximate Reasoning
Volume: 54
Issue number: 1
Early online date: Aug 2012
DOIs: 10.1016/j.ijar.2012.08.001
Publication status: Published - Jan 2013

Citation

Poon, L. K. M., Zhang, N. L., Liu, T., & Liu, A. H. (2013). Model-based clustering of high-dimensional data: Variable selection versus facet determination. International Journal of Approximate Reasoning, 54(1), 196-215. doi: 10.1016/j.ijar.2012.08.001

Keywords

  • Model-based clustering
  • Facet determination
  • Variable selection
  • Latent tree models
  • Gaussian mixture models