Choosing the number of factors in factor analysis with incomplete data via a novel hierarchical Bayesian information criterion

Jianhua ZHAO, Changchun SHANG, Shulan LI, Ling XIN, Leung Ho Philip YU

Research output: Contribution to journal › Article › peer-review

Abstract

The Bayesian information criterion (BIC), defined as the observed data log likelihood minus a penalty term based on the sample size N, is a popular model selection criterion for factor analysis with complete data. This definition has also been suggested for incomplete data. However, the penalty term is based on the ‘complete’ sample size N and is therefore the same whether the data are complete or incomplete. For incomplete data, there are often only N_i < N observations for variable i, which means that using the ‘complete’ sample size N implausibly ignores the amount of missing information inherent in incomplete data. Given this observation, a novel hierarchical BIC (HBIC) criterion, denoted HBIC_inc, is proposed for factor analysis with incomplete data. The novelty is that HBIC_inc uses only the actual amounts of observed information, namely the N_i, in the penalty term. Theoretically, it is shown that HBIC_inc is a large sample approximation of the variational Bayesian (VB) lower bound, and that BIC is a further approximation of HBIC_inc, which means that HBIC_inc shares the theoretical consistency of BIC. Experiments on synthetic and real data sets are conducted to assess the finite sample performance of HBIC_inc, BIC, and related criteria under various missing rates. The results show that HBIC_inc and BIC perform similarly when the missing rate is small, but HBIC_inc is more accurate when the missing rate is not small. Copyright © 2024 Springer-Verlag GmbH Germany, part of Springer Nature.
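
The contrast between the two penalty terms can be illustrated with a minimal sketch. The abstract does not state the exact form of the hierarchical penalty, so the code below assumes one plausible reading: the parameters attached to variable i are charged log of that variable's observed sample size instead of log N. The function names, the parameter count of q + 2 per variable (q loadings, one mean, one noise variance), and the example numbers are all hypothetical and are not taken from the paper.

```python
import math

def bic_penalty(params_per_var, n_total):
    """Classical BIC-style penalty: every parameter is charged log(N)."""
    return 0.5 * sum(params_per_var) * math.log(n_total)

def hierarchical_penalty(params_per_var, n_obs_per_var):
    """Assumed hierarchical penalty: the parameters of variable i are
    charged log(N_i), the number of observed values for that variable."""
    return 0.5 * sum(
        d_i * math.log(n_i) for d_i, n_i in zip(params_per_var, n_obs_per_var)
    )

def criterion(loglik, penalty):
    """Log likelihood minus penalty, so larger values are preferred."""
    return loglik - penalty

# Hypothetical setting: 3 variables, q = 1 factor, q + 2 parameters each.
q = 1
params_per_var = [q + 2] * 3
n_total = 100
n_obs_per_var = [100, 80, 50]  # made-up per-variable observed counts

print("BIC-style penalty:   ", bic_penalty(params_per_var, n_total))
print("Hierarchical penalty:", hierarchical_penalty(params_per_var, n_obs_per_var))
```

Under this assumption the two penalties coincide when every variable is fully observed, and the hierarchical penalty is smaller whenever some variable has missing values, which matches the abstract's remark that the two criteria behave similarly at small missing rates.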

Original language: English
Journal: Advances in Data Analysis and Classification
Early online date: Mar 2024
DOIs: https://doi.org/10.1007/s11634-024-00582-w
Publication status: E-pub ahead of print - Mar 2024

Citation

Zhao, J., Shang, C., Li, S., Xin, L., & Yu, P. L. H. (2024). Choosing the number of factors in factor analysis with incomplete data via a novel hierarchical Bayesian information criterion. Advances in Data Analysis and Classification. Advance online publication. https://doi.org/10.1007/s11634-024-00582-w

Keywords

  • Factor analysis
  • BIC
  • Model selection
  • Maximum likelihood
  • Incomplete data
  • Variational Bayesian
