Accurate forecasting of continuous glucose monitoring (CGM)-derived metrics may enable proactive diabetes management. A study published in Diabetes Technology and Obesity Medicine evaluated the relative performance of modern tabular learning models in predicting glycemic outcomes up to 6 weeks ahead in individuals with type 1 diabetes mellitus (T1DM) and type 2 diabetes mellitus (T2DM).
Four regression models (CatBoost, XGBoost, AutoGluon, and TabPFN) were trained and internally validated using 4622 case-weeks from two cohorts: 3389 from individuals with T1DM and 1233 from individuals with T2DM. The models predicted multiple CGM-derived metrics, including time-in-range (TIR), time-in-tight-range (TITR), time-above-range (TAR), time-below-range (TBR), coefficient of variation, and mean amplitude of glycemic excursions (MAGE), along with related quantiles. Model performance was evaluated using mean absolute error (MAE) and mean absolute relative difference (MARD), while quantile classification was assessed using confusion-matrix heatmaps.
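The two error metrics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code: the function names are hypothetical, the sample values are invented for demonstration, and skipping zero observed values in MARD is an assumed convention that the study may not have used.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute gap, in the metric's own units (e.g. percentage points of TIR)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_relative_difference(observed, predicted):
    """Average absolute gap expressed as a percentage of the observed value.

    Assumption: cases with an observed value of zero are skipped, because
    the relative difference is undefined there.
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if o != 0]
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in pairs) / len(pairs)


# Hypothetical weekly TIR values in percent (illustrative only, not study data).
observed_tir = [70.0, 55.0, 80.0]
predicted_tir = [63.0, 60.5, 84.0]

mae = mean_absolute_error(observed_tir, predicted_tir)
mard = mean_absolute_relative_difference(observed_tir, predicted_tir)
print(f"MAE = {mae:.2f} percentage points, MARD = {mard:.2f}%")
```

MAE stays in the units of the target, whereas MARD scales each error by the observed value, so the two can rank the same predictions differently.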
Across both cohorts, the models demonstrated broadly comparable performance for most glycemic targets. In T1DM, MARD values for TIR, TITR, TAR, and MAGE ranged from 8.5% to 16.5%. In contrast, TBR showed a much higher relative error, with a mean MARD of around 48%, despite a low MAE. AutoGluon and TabPFN showed lower MAE than XGBoost for several outcomes, including TITR (P < 0.01) and TAR/TBR (P < 0.05 to P < 0.01). In T2DM, MARD ranged from 7.8% to 23.9%, while the relative error for TBR was approximately 78%. TabPFN outperformed the other models for TIR (P < 0.01), and AutoGluon and TabPFN outperformed CatBoost and XGBoost for TAR (P < 0.05). Inference time per 1000 cases varied substantially across models: 0.04 seconds for CatBoost and XGBoost, 2.7 seconds for AutoGluon, and 699 seconds for TabPFN.
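A high relative error alongside a low absolute error is what one would expect when the observed value is small, as TBR typically is. The following sketch uses invented numbers (not study data) and a hypothetical helper, relative_error_pct, to show how the same one-percentage-point miss translates into very different relative errors.

```python
def relative_error_pct(observed, predicted):
    """Absolute error as a percentage of the observed value (observed must be non-zero)."""
    return 100.0 * abs(observed - predicted) / observed


# Hypothetical values: both predictions miss by exactly 1 percentage point.
tir_error = relative_error_pct(observed=70.0, predicted=69.0)  # large denominator
tbr_error = relative_error_pct(observed=2.0, predicted=1.0)    # small denominator

print(f"TIR relative error: {tir_error:.1f}%")
print(f"TBR relative error: {tbr_error:.1f}%")
```

Because the denominator for TBR is only a few percent of the week, even small absolute misses dominate the relative metric.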
Overall, week-ahead CGM metrics were predicted with reasonable accuracy using modern tabular models. However, prediction of low-prevalence hypoglycemia remained challenging in relative terms. The automated machine learning and foundation models (AutoGluon and TabPFN) provided modest improvements in accuracy but required substantially more computation at inference. External validation is needed before these approaches can be considered for clinical use.