Front Immunol. 2026;17:1705156.
Lina Shan, Dengyong Xu, Jie Chen, Wenjia Liu, Ji Lin, Juhang Bao, Jianfei Huang, Hanqing Zhang, Hanchen Zhao, Wei Xue, Ziao Lin, Bingjun Bai.
Background: Early detection of colorectal cancer (CRC) is crucial for improving patient outcomes. Cell-free DNA (cfDNA) analysis has emerged as a promising non-invasive approach for cancer detection. This study aims to develop a machine learning algorithm leveraging cfDNA fragmentomic features to accurately detect CRC.
Methods: A total of 573 individuals were enrolled from Sir Run Run Shaw Hospital, two community healthcare centers, and three additional medical centers between April 1, 2023, and December 12, 2025. Participants were divided into training, internal validation, and external validation cohorts. A variety of cfDNA fragmentomic features were analyzed and incorporated into machine learning models. The models were evaluated using 10-fold cross-validation and assessed for accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). We also performed differential analysis of key genomic features, such as Alu elements and long terminal repeats (LTRs), between benign and malignant colorectal samples.
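The evaluation metrics named in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`kfold_indices`, `confusion_rates`, `roc_auc`), the 0.5 threshold, the random seed, and the toy labels and scores are all assumptions made for illustration, and the abstract does not say whether the folds were stratified.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds.

    Hypothetical helper: a plain (unstratified) split, assumed for illustration.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def confusion_rates(labels, scores, threshold=0.5):
    """Return (accuracy, sensitivity, specificity) at a score threshold.

    labels: 1 = malignant, 0 = benign; scores: predicted probabilities.
    """
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    accuracy = (tp + tn) / len(labels)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for this example only.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
folds = kfold_indices(573, k=10)
acc, sens, spec = confusion_rates(toy_labels, toy_scores)
auc = roc_auc(toy_labels, toy_scores)
```

In a cross-validation loop, each fold in turn would serve as the held-out set while a model is fitted on the remaining nine, and the metrics above would be computed on the held-out predictions.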
Results: Using a generalized linear model (GLM), the algorithm showed robust discriminative performance across all datasets, achieving AUC values of 0.959 (training set), 0.979 (internal validation cohort), and 0.959 (external validation cohort). Classification accuracy was particularly strong for advanced-stage CRC. Comparative cfDNA profiling revealed distinct molecular signatures: benign samples showed elevated frequencies of Alu elements and LTRs, whereas malignant samples showed significant enrichment of specific 4-mer end motifs. These features may therefore serve as potential biomarkers for malignancy detection.
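As a rough illustration of what a 4-mer end motif profile is, the sketch below tallies the first four bases of each fragment and converts the counts to frequencies. The abstract does not describe how motifs were extracted, so several things here are assumptions: reading the motif from the 5' end of the fragment sequence, skipping fragments with ambiguous bases, the helper name `end_motif_frequencies`, and the toy sequences.

```python
from collections import Counter


def end_motif_frequencies(fragments, k=4):
    """Return the relative frequency of each k-mer found at fragment 5' ends.

    Hypothetical helper: fragments shorter than k, or whose first k bases
    contain anything other than A, C, G, or T, are skipped.
    """
    counts = Counter()
    for seq in fragments:
        motif = seq[:k].upper()
        if len(motif) == k and set(motif) <= set("ACGT"):
            counts[motif] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {motif: count / total for motif, count in counts.items()}


# Toy fragment sequences, invented for this example only.
toy_fragments = ["CCCAGTT", "CCCATG", "ccagAA", "NNNNAC", "ACG"]
motif_freqs = end_motif_frequencies(toy_fragments)
```

Comparing such frequency profiles between benign and malignant samples, motif by motif, is one way an enrichment like the one reported could be tested.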
Conclusion: This study demonstrates that cfDNA fragmentomic profiling, particularly differential patterns of Alu and LTR elements, effectively discriminates benign from malignant colorectal lesions. These findings validate the clinical utility of repetitive element analysis and provide a foundation for developing improved non-invasive CRC diagnostics through machine learning approaches incorporating genomic features.
Keywords: Alu elements; cell-free DNA; colorectal cancer; early detection; machine learning