Alterations in the gut microbiome have been increasingly associated with type 1 diabetes mellitus (T1DM), but the influence of taxonomic resolution and feature-selection methods on predictive modeling remains incompletely understood. A machine learning analysis published in BMC Microbiology evaluated microbiome-based prediction of T1DM using publicly available 16S ribosomal RNA sequencing datasets from two geographically distinct cohorts.
The analysis constructed microbial features at multiple taxonomic levels and along hierarchical taxonomic paths designed to preserve phylogenetic structure. Machine learning models were trained with stratified cross-validation and evaluated under cross-cohort validation frameworks. Binary Particle Swarm Optimization (BPSO) was used for feature selection, and differential abundance analysis was performed with the LinDA framework.
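To make the two feature representations concrete, the following is a minimal sketch of how counts might be collapsed to a single taxonomic level versus keyed by the full lineage path. It is one plausible reading of "hierarchical taxonomic paths", not the study's actual pipeline; the function names, the rank list, the lineages, and the counts are all illustrative assumptions.

```python
from collections import defaultdict

# Assumed rank order for a 16S lineage (illustrative; databases differ).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]

def aggregate(counts, level):
    """Sum read counts over all lineages sharing the same name at `level`."""
    idx = RANKS.index(level)
    out = defaultdict(int)
    for lineage, n in counts.items():
        out[lineage[idx]] += n
    return dict(out)

def path_features(counts, level):
    """Same aggregation, but each feature is named by its full lineage
    prefix down to `level`, so the parent taxa stay visible."""
    idx = RANKS.index(level)
    out = defaultdict(int)
    for lineage, n in counts.items():
        out[";".join(lineage[: idx + 1])] += n
    return dict(out)

# Made-up counts for one hypothetical sample.
sample = {
    ("Bacteria", "Bacteroidota", "Bacteroidia", "Bacteroidales",
     "Bacteroidaceae", "Bacteroides"): 120,
    ("Bacteria", "Firmicutes", "Clostridia", "Oscillospirales",
     "Ruminococcaceae", "Faecalibacterium"): 80,
    ("Bacteria", "Firmicutes", "Clostridia", "Oscillospirales",
     "Ruminococcaceae", "Ruminococcus"): 40,
}

family_level = aggregate(sample, "family")
family_paths = path_features(sample, "family")
```

In this toy sample, three genus-level features collapse to two family-level features, which illustrates why coarser levels yield more compact feature sets.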
Findings
- Random Forest and XGBoost demonstrated the strongest predictive performance across taxonomic feature representations.
- Family-level taxonomic features provided strong classification performance with relatively compact feature sets, whereas higher-resolution representations increased complexity without consistently improving performance.
- BPSO selected a consistent set of microbial taxa across validation frameworks, suggesting stable predictive signatures.
- Several selected taxa have previously been associated with inflammatory or metabolically altered gut environments.
- Cross-cohort validation yielded lower predictive performance than within-study validation, highlighting challenges in model generalizability across populations.
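The difference between within-study and cross-cohort evaluation comes down to how samples are split. The sketch below shows both splitting schemes in plain Python. The cohort labels "A" and "B", the fold count, and the function names are hypothetical, and the study's exact protocol may differ.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Within-study split: assign sample indices to k folds while
    keeping the case/control ratio roughly equal in each fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal indices round-robin per class
    return folds

def cross_cohort_splits(cohorts):
    """Cross-cohort split: train on every cohort except one and
    test on the held-out cohort."""
    for held_out in sorted(set(cohorts)):
        train = [i for i, c in enumerate(cohorts) if c != held_out]
        test = [i for i, c in enumerate(cohorts) if c == held_out]
        yield held_out, train, test

# Toy data: 12 samples, alternating control (0) / case (1), two cohorts.
labels = [0, 1] * 6
cohorts = ["A"] * 6 + ["B"] * 6

folds = stratified_folds(labels, k=3)
splits = list(cross_cohort_splits(cohorts))
```

In the cross-cohort scheme, the model never sees samples from the test population during training, so cohort-specific effects such as sequencing protocol or diet cannot be learned and reused.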
The findings suggest that combining machine learning with BPSO-based feature selection may improve identification and interpretation of microbial signatures associated with T1DM, although cross-cohort generalizability remains limited.
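For readers unfamiliar with BPSO, the following is a minimal sketch of the general technique, assuming the common sigmoid transfer-function variant. Each particle is a 0/1 mask over features, and velocities are converted to bit probabilities. The hyperparameters and the toy fitness function are assumptions for illustration. In a real pipeline the fitness would typically be a cross-validated classifier score, and the study's own configuration is not given here.

```python
import math
import random

def bpso(fitness, n_features, n_particles=12, n_iters=40,
         w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: maximize `fitness` over 0/1 feature masks."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best masks
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # global best mask

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                v = max(-6.0, min(6.0, v))      # clamp to avoid saturation
                vel[i][d] = v
                # Sigmoid maps velocity to the probability of the bit being 1.
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Toy fitness (assumption): features 0 and 3 are "informative";
# every selected feature costs 0.1, which rewards compact masks.
INFORMATIVE = (0, 3)

def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)

best_mask, best_fit = bpso(toy_fitness, n_features=8)
```

The per-feature penalty in the toy fitness mirrors the usual trade-off in wrapper-style selection between predictive score and feature-set size. Running the search under several validation splits and counting how often each taxon is selected is one way to assess selection stability.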