Trans-Balance: Reducing demographic disparity for prediction models in the presence of class imbalance.

Pubmed ID: 38070817

Pubmed Central ID: PMC10850917

Journal: Journal of biomedical informatics

Publication Date: Jan. 1, 2024

MeSH Terms: Humans, Cohort Studies, Algorithms, Demography, Biomedical Research, Machine Learning

Grants: N01 HC025195, N01 HC095159, U01 NS041588, UL1 RR024156, N01 HC095167, N01 HC095161, N01 HC095164, N01 HC095166, N01 HC095160, N01 HC095169, N01 HC095165, N01 HC095168, N01 HC095163, N01 HC095162, HHSN268201700004I, HHSN268201500001I, HHSN268201500001C, HHSN268201700001I, HHSN268201700003I, HHSN268201700005I, HHSN268201700002I, HHSN268201700002C, HHSN268201700005C, HHSN268201700001C, HHSN268201700003C, HHSN268201700004C, T32 HL079896, R01 HL136666, R61 NS120246, R33 NS120246

Authors: Liu M, Hickey J, Henao R, Pencina M, Hong C, Wojdyla DM

Cite As: Hong C, Liu M, Wojdyla DM, Hickey J, Pencina M, Henao R. Trans-Balance: Reducing demographic disparity for prediction models in the presence of class imbalance. J Biomed Inform 2024 Jan;149:104532. Epub 2023 Dec 7.

Abstract

INTRODUCTION: Risk prediction, including early disease detection, prevention, and intervention, is essential to precision medicine. However, systematic bias in risk estimation caused by heterogeneity across demographic groups can lead to inappropriate or misinformed treatment decisions. In addition, low-incidence outcomes (class imbalance) degrade the classification performance of many standard learning algorithms, which further exacerbates racial disparities. It is therefore crucial to improve the performance of statistical and machine learning models in underrepresented populations in the presence of heavy class imbalance.

METHOD: To address demographic disparity in the presence of class imbalance, we develop a novel framework, Trans-Balance, that leverages recent advances in imbalance learning, transfer learning, and federated learning. We consider a practical setting in which data from multiple sites are stored locally under privacy constraints.

RESULTS: We show that the proposed Trans-Balance framework improves upon existing approaches by explicitly accounting for heterogeneity across demographic subgroups and cohorts. We demonstrate the feasibility and validity of our methods through numerical experiments and a real application to stroke risk prediction in a multi-cohort study, using data from participants of four large, NIH-funded cohorts.

CONCLUSION: Our findings indicate that the Trans-Balance approach significantly improves predictive performance, especially in scenarios marked by severe class imbalance and demographic disparity. Given its versatility and effectiveness, Trans-Balance offers a valuable contribution to enhancing risk prediction in biomedical research and related fields.