Variable selection for high-dimensional incomplete data

Lixing LIANG, Yipeng ZHUANG, Leung Ho Philip YU

Research output: Contribution to journal › Article › peer-review

Abstract

Regression analysis is often affected by high dimensionality, severe multicollinearity, and a large proportion of missing data. These problems can mask important relationships and even lead to biased conclusions. This paper proposes a novel, computationally efficient method that integrates data imputation and variable selection to address these issues. Specifically, the proposed method combines three components: a new multiple imputation algorithm based on matrix completion (Multiple Accelerated Inexact Soft-Impute), a new randomized lasso method that is more stable and accurate (Hybrid Random Lasso), and a consistent procedure for integrating a variable selection method with multiple imputation. Compared with existing methodologies, the proposed approach offers greater accuracy and consistency through mechanisms that enhance robustness to different missing-data patterns and to sampling variation. The method is applied to the Asian American minority subgroup in the 2017 National Youth Risk Behavior Survey to study key risk factors related to suicidal intention among Asian Americans. Through simulations and real-data analyses in various regression and classification settings, the proposed method demonstrates improved accuracy, consistency, and efficiency in both variable selection and prediction. Copyright © 2023 Elsevier B.V. All rights reserved.
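The abstract combines three ingredients: multiple imputation by matrix completion, a randomized lasso, and a rule for merging variable selections across imputed datasets. The paper's own algorithms (Multiple Accelerated Inexact Soft-Impute and Hybrid Random Lasso) are not specified on this page, so the following is only a generic sketch of that kind of pipeline under stated assumptions. It swaps in the basic Soft-Impute algorithm of Mazumder, Hastie and Tibshirani for the paper's imputation step, the Meinshausen and Bühlmann randomized lasso (random per-variable penalty weights) for Hybrid Random Lasso, and bootstrap resampling with a selection-frequency threshold for the paper's combination rule. The function names `soft_impute`, `lasso_cd` and `impute_and_select` and all tuning values (`lam`, `alpha`, `weakness`, `threshold`) are hypothetical, and only the predictors are assumed to contain missing values (the response `y` is fully observed).

```python
import numpy as np


def soft_impute(X, lam, n_iter=100, tol=1e-5):
    """Basic Soft-Impute (stand-in, not the paper's MAIS): repeatedly replace
    missing entries (NaN) with a soft-thresholded SVD reconstruction."""
    mask = ~np.isnan(X)
    Z = np.zeros_like(X, dtype=float)
    for _ in range(n_iter):
        filled = np.where(mask, X, Z)                    # observed entries stay fixed
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        Z_new = (U * np.maximum(s - lam, 0.0)) @ Vt      # shrink singular values by lam
        converged = np.linalg.norm(Z_new - Z) <= tol * max(np.linalg.norm(Z), 1.0)
        Z = Z_new
        if converged:
            break
    return np.where(mask, X, Z)


def lasso_cd(X, y, alpha, n_iter=500, tol=1e-8):
    """Coordinate descent for (1/2n) * ||y - X b||^2 + alpha * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()                           # residual for beta = 0
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            rho = X[:, j] @ r / n + col_sq[j] * old      # correlation with partial residual
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            if beta[j] != old:
                r -= X[:, j] * (beta[j] - old)
                max_step = max(max_step, abs(beta[j] - old))
        if max_step < tol:
            break
    return beta


def impute_and_select(X, y, n_boot=20, lam=5.0, alpha=0.3, weakness=0.5,
                      threshold=0.6, seed=0):
    """Hypothetical pipeline: bootstrap rows, impute each resample with
    Soft-Impute, fit a randomized lasso, and keep variables whose selection
    frequency across resamples reaches `threshold`."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)                 # bootstrap resample of rows
        Xb = soft_impute(X[idx], lam)
        Xb = (Xb - Xb.mean(axis=0)) / Xb.std(axis=0)     # standardize columns
        yb = y[idx] - y[idx].mean()
        W = rng.uniform(weakness, 1.0, size=p)           # random penalty weights
        beta = lasso_cd(Xb * W, yb, alpha)               # scaling columns = reweighting penalty
        counts += np.abs(beta) > 0
    freq = counts / n_boot
    return np.flatnonzero(freq >= threshold), freq


# Synthetic example: 3 true predictors out of 10, 10% of entries missing at random.
rng = np.random.default_rng(1)
n, p = 400, 10
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + 0.5 * rng.standard_normal(n)
X[rng.random((n, p)) < 0.1] = np.nan
selected, freq = impute_and_select(X, y)
print(selected, freq)
```

Because plain Soft-Impute is deterministic, the bootstrap resampling is what yields several distinct imputed datasets here. Counting how often each variable survives across resamples is one simple way to make the final selection less sensitive to a single imputation or sample, which is the kind of robustness the abstract describes, but it is not the authors' procedure.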

Original language: English
Article number: 107877
Journal: Computational Statistics and Data Analysis
Volume: 192
DOIs: https://doi.org/10.1016/j.csda.2023.107877
Publication status: Published - Apr 2024

Citation

Liang, L., Zhuang, Y., & Yu, P. L. H. (2024). Variable selection for high-dimensional incomplete data. Computational Statistics and Data Analysis, 192, Article 107877. https://doi.org/10.1016/j.csda.2023.107877

Keywords

  • High-dimensional
  • Missing data
  • Variable selection
  • Multiple imputation
  • Randomized lasso
