Multi-center validation of an artificial intelligence system for detection of COVID-19 on chest radiographs in symptomatic patients

  • Michael D. KUO
  • Keith W. H. CHIU
  • David S. WANG
  • Anna Rita LARICI
  • Dmytro POPLAVSKIY
  • Adele VALENTINI
  • Alessandro NAPOLI
  • Andrea BORGHESI
  • Guido LIGABUE
  • Xin Hao B. FANG
  • Hing Ki C. WONG
  • Sailong ZHANG
  • John R. HUNTER
  • Abeer MOUSA
  • Amato INFANTE
  • Lorenzo ELIA
  • Salvatore GOLEMI
  • Leung Ho Philip YU
  • Christopher K. M. HUI
  • Bradley J. ERICKSON

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

Objectives: While chest radiograph (CXR) is the first-line imaging investigation in patients with respiratory symptoms, differentiating COVID-19 from other respiratory infections on CXR remains challenging. We developed and validated an AI system for COVID-19 detection on presenting CXR. 

Methods: A deep learning model (RadGenX), trained on 168,850 CXRs, was validated on a large international test set of presenting CXRs of symptomatic patients from 9 study sites (US, Italy, and Hong Kong SAR) and 2 public datasets from the US and Europe. Performance was measured by area under the receiver operating characteristic curve (AUC). Bootstrapped simulations were performed to assess performance across a range of potential COVID-19 disease prevalence values (3.33% to 33.3%). Comparison against international radiologists was performed on an independent test set of 852 cases.
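
The AUC metric and the prevalence simulation described in the Methods can be illustrated with a minimal sketch. The abstract does not specify the resampling procedure, so the approach below (drawing positives and negatives with replacement to reach a target prevalence, then averaging the NPV over bootstrap replicates) is an assumption, not the authors' implementation. All function names, the sample sizes, and the decision threshold are hypothetical.

```python
import random

def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def resample_at_prevalence(pos, neg, prevalence, n_total, rng):
    """Bootstrap sample (with replacement) whose positive fraction
    is approximately `prevalence`."""
    n_pos = max(1, round(n_total * prevalence))
    n_neg = n_total - n_pos
    return rng.choices(pos, k=n_pos), rng.choices(neg, k=n_neg)

def npv_at_threshold(pos, neg, threshold):
    """NPV = TN / (TN + FN); scores below `threshold` are called negative."""
    tn = sum(s < threshold for s in neg)
    fn = sum(s < threshold for s in pos)
    return tn / (tn + fn) if (tn + fn) else float("nan")

def simulate_npv(pos, neg, prevalence, threshold,
                 n_total=1000, n_boot=200, seed=0):
    """Mean NPV over `n_boot` bootstrap replicates at a target prevalence."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        bp, bn = resample_at_prevalence(pos, neg, prevalence, n_total, rng)
        values.append(npv_at_threshold(bp, bn, threshold))
    return sum(values) / len(values)
```

Given lists of model scores for confirmed-positive and confirmed-negative cases, calling `simulate_npv` at several prevalence values between 0.0333 and 0.333 would trace how NPV changes as the disease becomes rarer, with sensitivity and specificity held fixed by the resampling.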

Results: RadGenX achieved an AUC of 0.89 on 4-fold cross-validation and an AUC of 0.79 (95% CI 0.78–0.80) on an independent test cohort of 5,894 patients. DeLong’s test showed statistically significant differences in model performance across patients from different regions (p < 0.01), disease severity (p < 0.001), gender (p < 0.001), and age (p = 0.03). Prevalence simulations showed that the negative predictive value (NPV) increases from 86.1% at 33.3% prevalence to greater than 98.5% at any prevalence below 4.5%. McNemar’s test showed that, compared with radiologists, the model has higher sensitivity (p < 0.001) but lower specificity (p < 0.001).
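
For readers unfamiliar with the paired comparison used against radiologists, the following is a minimal sketch of an exact two-sided McNemar test. The abstract does not state which variant of the test was used or report the underlying counts, so the exact binomial form and the function name here are assumptions for illustration only.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: cases the model classified correctly and the reader did not.
    c: cases the reader classified correctly and the model did not.
    Under the null hypothesis, each discordant pair is equally likely
    to fall in either group, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

To compare sensitivity, the counts would be restricted to COVID-19-positive cases; to compare specificity, to negative cases. Only the discordant pairs enter the test, which is why it suits a design where the model and the radiologists read the same images.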

Conclusion: An AI model that predicts COVID-19 infection on CXR in symptomatic patients was validated on a large international cohort, providing valuable context on testing and performance expectations for AI systems that perform COVID-19 prediction on CXR.

Key Points

• An AI model developed using CXRs to detect COVID-19 was validated in a large multi-center cohort of 5,894 patients from 9 prospectively recruited sites and 2 public datasets. 

• Differences in AI model performance were seen across region, disease severity, gender, and age. 

• Prevalence simulations on the international test set demonstrate the model’s NPV is greater than 98.5% at any prevalence below 4.5%.

Copyright © 2022 The Author(s), under exclusive licence to European Society of Radiology.

Original language: English
Pages (from-to): 23-33
Journal: European Radiology
Volume: 33
Early online date: Jul 2022
DOIs: 10.1007/s00330-022-08969-z
Publication status: Published - Jan 2023

Citation

Kuo, M. D., Chiu, K. W. H., Wang, D. S., Larici, A. R., Poplavskiy, D., Valentini, A., . . . Erickson, B. J. (2023). Multi-center validation of an artificial intelligence system for detection of COVID-19 on chest radiographs in symptomatic patients. European Radiology, 33, 23-33. doi: 10.1007/s00330-022-08969-z

Keywords

  • Artificial intelligence
  • COVID-19
  • Radiology
  • Thoracic
  • Public health
