J Adv Res. 2025 Dec 09. pii: S2090-1232(25)00995-6. [Epub ahead of print]
Hengzhen Li, Meng Chen, Ying Liu, Haimeng Tang, Guangtao Jiao, Ling Qu, Yushuai Song, Dan Jiang, Chuanfeng Mo, Xiaona Fan, Yisheng Dai, Zhuo Chen, Haitao Wang, Ruowei Yang, Dongqin Zhu, Xiuxiu Xu, Hua Bao, Henan Zhou, Huaxing Wu, Wenhui Li, Huike Yang, Chao Liu, Zhiwei Li.
INTRODUCTION: Gastric cancer remains a major global health burden, with high mortality driven by late-stage diagnoses that limit treatment options and reduce survival. Current diagnostic methods such as endoscopy and biopsy are invasive, resource-intensive, and impractical for large-scale early detection.
OBJECTIVES: This study aimed to develop and validate an ensemble machine learning model that integrates four classes of cell-free DNA (cfDNA) fragmentomic features, derived from 5× whole-genome sequencing (WGS) data, to non-invasively distinguish gastric cancer from benign gastric lesions in high-risk or symptomatic patients.
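The abstract does not say how the four feature-class models are combined into one ensemble score. As a minimal sketch, assuming a simple soft-voting rule (one base classifier per feature class, with the final score taken as the mean of their cancer probabilities), the combination step could look like this. The class names `class_1` to `class_4` and the function `ensemble_score` are hypothetical placeholders, not the authors' implementation.

```python
from statistics import fmean

# Hypothetical labels for the four fragmentomic feature classes; the abstract
# does not name them or describe how the base models are combined.
FEATURE_CLASSES = ("class_1", "class_2", "class_3", "class_4")


def ensemble_score(base_scores: dict) -> float:
    """Soft-voting sketch: average the cancer probabilities produced by one
    base classifier per feature class (an assumed combination rule)."""
    missing = [c for c in FEATURE_CLASSES if c not in base_scores]
    if missing:
        raise ValueError(f"missing base scores for: {missing}")
    for c in FEATURE_CLASSES:
        if not 0.0 <= base_scores[c] <= 1.0:
            raise ValueError(f"score for {c} must lie in [0, 1]")
    return fmean(base_scores[c] for c in FEATURE_CLASSES)


if __name__ == "__main__":
    # Toy base-model outputs for a single plasma sample (not study data).
    sample = {"class_1": 0.62, "class_2": 0.35, "class_3": 0.48, "class_4": 0.51}
    print(round(ensemble_score(sample), 3))  # prints 0.49
```

Averaging is only one option; a stacked meta-classifier trained on the base-model outputs is another common design, and the abstract does not indicate which was used.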
METHODS: A total of 681 plasma samples were prospectively collected, comprising 329 from patients with gastric cancer or high-grade intraepithelial neoplasia (HGIN) and 352 from individuals with benign gastric conditions. The dataset was divided into a training cohort (n = 333) and a temporally independent validation cohort (n = 348). An external validation cohort of 305 participants was also included.
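The validation cohort is described as temporally independent, meaning its samples come from a later period than the training samples. A minimal sketch of such a split, assuming samples are partitioned at a single collection-date cutoff, is shown here; the `Sample` record, the `temporal_split` helper, and the cutoff date are hypothetical, since the abstract does not report the actual cutoff or splitting procedure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sample:
    sample_id: str
    collected: date
    is_cancer: bool  # True: gastric cancer or HGIN; False: benign gastric condition


def temporal_split(samples, cutoff: date):
    """Samples collected before `cutoff` go to training; the rest go to
    validation, so every validation sample is later in time than training."""
    train = [s for s in samples if s.collected < cutoff]
    valid = [s for s in samples if s.collected >= cutoff]
    return train, valid


if __name__ == "__main__":
    # Toy records; in the study the split gave 333 training + 348 validation = 681.
    samples = [
        Sample("s1", date(2021, 3, 1), True),
        Sample("s2", date(2021, 9, 15), False),
        Sample("s3", date(2022, 2, 10), True),
    ]
    train, valid = temporal_split(samples, date(2022, 1, 1))
    print([s.sample_id for s in train], [s.sample_id for s in valid])
```

A time-based split like this guards against the model exploiting batch or calendar effects that a random split would leak into both cohorts.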
RESULTS: The ensemble model achieved an AUROC of 0.920 in cross-validation on the training cohort, 0.912 in the independent validation cohort, and 0.896 (95% CI 0.860–0.932) in the external cohort. At a pre-specified prediction threshold of 0.402, the model achieved 93.3% sensitivity and 71.9% specificity in the validation cohort, with a PPV of 71.3% and an NPV of 93.5%. In the external cohort, sensitivity and specificity were 91.7% and 69.1%, respectively (PPV 75.7%, NPV 88.8%). Model scores correlated with clinical stage, tumor grade, and histopathological subtype. Approximately 71% of non-cancer patients could have been spared unnecessary endoscopy.
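The reported metrics follow from a confusion matrix at the threshold of 0.402, and AUROC equals the probability that a randomly chosen cancer/HGIN sample scores higher than a randomly chosen benign one (ties counted as one half). The sketch below assumes that scores at or above the threshold are called positive, which the abstract does not specify; the scores and labels are toy values, not study data. The helper names `confusion_metrics`, `auroc`, and `ppv_npv` are illustrative.

```python
from math import nan


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a metric is undefined (e.g. no positive calls).
    return num / den if den else nan


def confusion_metrics(scores, labels, threshold=0.402):
    """Labels: 1 = gastric cancer/HGIN, 0 = benign. A score >= threshold is
    treated as a positive call (an assumed convention)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        positive_call = score >= threshold
        if positive_call and label == 1:
            tp += 1
        elif positive_call:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        # Specificity is also the share of benign cases that a triage rule
        # would not refer to endoscopy.
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def auroc(scores, labels):
    """Mann-Whitney estimate: fraction of (positive, negative) pairs ranked
    correctly, counting ties as one half. O(n_pos * n_neg), fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def ppv_npv(sensitivity, specificity, prevalence):
    """Bayes' rule: predictive values for a given proportion of cancers."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    # Toy data only; not taken from the study.
    scores = [0.91, 0.75, 0.45, 0.30, 0.55, 0.20, 0.10, 0.38]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    print(confusion_metrics(scores, labels))  # all four metrics equal 0.75 here
    print(auroc(scores, labels))  # 0.8125
```

For fixed sensitivity and specificity, `ppv_npv` returns a higher PPV and a lower NPV as prevalence rises, which is one reason predictive values can differ between cohorts with different case mixes.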
CONCLUSIONS: The cfDNA fragmentomics-based ensemble model enables accurate, non-invasive differentiation between gastric cancer and benign gastric lesions in high-risk or symptomatic patients. This approach demonstrates strong potential as a pre-endoscopy triage tool, supporting earlier detection and more efficient use of diagnostic resources.
Keywords: Early detection; Fragmentomics; Gastric cancer; Machine learning; Stomach-related complications