Predicting cardiovascular disease (CVD) risk in cancer survivors remains challenging. A multicenter analysis published in the Journal of the American Heart Association compared the predictive performance of regularized logistic regression with that of several machine learning models using longitudinal clinical data.
The study included 3,835 multiracial cancer survivors followed for up to 20 years. A total of 89 clinical, laboratory, and echocardiographic variables were used for model development. Training was performed using repeated random and time-split sampling, with external testing conducted in a separate cohort of 329 patients. Model performance was assessed using the area under the receiver operating characteristic curve (AUC).
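To make the evaluation setup concrete, the following is a minimal sketch of a time-split holdout and an AUC calculation. It is not the study's code: the toy records, the field names (`year`, `risk`, `event`), the cutoff year, and the helper names `time_split` and `roc_auc` are all hypothetical, and the split shown is only one plausible reading of "time-split sampling". AUC is computed here as the probability that a randomly chosen patient with an event receives a higher predicted risk than a randomly chosen patient without one, with ties counted as one half.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def time_split(records: List[dict], cutoff_year: int) -> Tuple[List[dict], List[dict]]:
    """Train on records dated before cutoff_year, test on the rest."""
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test


# Hypothetical toy data: enrollment year, predicted risk, observed outcome.
toy_records = [
    {"year": 2001, "risk": 0.10, "event": 0},
    {"year": 2003, "risk": 0.80, "event": 1},
    {"year": 2008, "risk": 0.20, "event": 0},
    {"year": 2010, "risk": 0.70, "event": 1},
    {"year": 2012, "risk": 0.40, "event": 0},
    {"year": 2015, "risk": 0.30, "event": 1},
]

# Evaluate only on patients enrolled at or after the cutoff.
_, held_out = time_split(toy_records, cutoff_year=2005)
auc = roc_auc([r["event"] for r in held_out], [r["risk"] for r in held_out])
print(f"held-out AUC: {auc:.3f}")  # prints "held-out AUC: 0.750"
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values should be read.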
Regularized logistic regression achieved an AUC of 0.845 for heart failure (HF), 0.783 for atrial fibrillation (AF), 0.792 for coronary artery disease (CAD), and 0.806 for composite CVD. For HF, this was comparable to the AUCs obtained with Bayesian additive regression trees (0.837) and random forest (0.848) models. For de novo composite CVD after cancer diagnosis, regularized logistic regression achieved an AUC of 0.826, compared with 0.735 for decision tree and 0.802 for random forest models.
These findings indicate that regularized logistic regression performs similarly to more complex machine learning approaches when predicting CVD risk in cancer survivors from longitudinal data.