Front Cell Dev Biol. 2025;13:1630231.
Background: Repetitive elements account for a large proportion of the human genome and undergo alterations during early tumorigenesis. However, the fragmentation patterns specific to DNA-derived cell-free repetitive elements (cfREs) remain unclear.
Methods: This study enrolled 32 healthy volunteers and 112 patients with five types of cancer. A novel repetitive fragmentomics approach was proposed to profile cfREs using low-pass whole genome sequencing (WGS). Five innovative repetitive fragmentomic features were designed: fragment ratio, fragment length, fragment distribution, fragment complexity, and fragment expansion. A machine learning-based multimodal model was developed using these features.
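As a rough illustration of what per-family repetitive fragmentomic features could look like, the sketch below computes a fragment ratio and a mean fragment length for each repeat family from toy data. The abstract does not define the five features, so the definitions used here (ratio = share of all fragments overlapping a family; length = mean length of those fragments), the function name `repeat_fragment_features`, and the coordinates are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict


def repeat_fragment_features(fragments, repeats):
    """Toy per-family features (assumed definitions, not the paper's).

    fragments: list of (chrom, start, end), half-open coordinates.
    repeats:   list of (chrom, start, end, family), half-open coordinates.
    Returns {family: {"ratio": float, "mean_length": float}}.
    """
    total = len(fragments)
    lengths = defaultdict(list)
    for chrom, start, end in fragments:
        # Collect each family once per fragment, even if it overlaps
        # several annotated copies of that family.
        hit_families = set()
        for r_chrom, r_start, r_end, family in repeats:
            if chrom == r_chrom and start < r_end and r_start < end:
                hit_families.add(family)
        for family in hit_families:
            lengths[family].append(end - start)
    return {
        family: {
            "ratio": len(values) / total,
            "mean_length": sum(values) / len(values),
        }
        for family, values in lengths.items()
    }


# Hypothetical fragments and repeat annotations, for illustration only.
fragments = [
    ("chr1", 100, 267),
    ("chr1", 300, 450),
    ("chr1", 1000, 1160),
    ("chr2", 50, 220),
]
repeats = [
    ("chr1", 250, 320, "Alu"),
    ("chr1", 1100, 1130, "STR"),
]
features = repeat_fragment_features(fragments, repeats)
print(features)
```

The nested loop is quadratic and only suitable for toy inputs; real low-pass WGS data would need an interval index or a dedicated genomics toolkit.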
Results: The multimodal model achieved high prediction performance for early tumor detection, even at ultra-low sequencing depths (0.1×, AUC = 0.9824). Alu elements and short tandem repeats (STRs) were identified as the primary cfREs after filtering out low-efficiency subfamilies. Characterization of cfREs within tumor-specific regulatory regions enabled accurate tissue-of-origin (TOO) prediction (0.1×, accuracy = 0.8286) and identified aberrantly transcribed tumor driver genes.
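For readers less familiar with the AUC metric reported above, it can be read as the probability that a randomly chosen cancer sample receives a higher model score than a randomly chosen healthy sample. The sketch below computes it with the pairwise (Mann-Whitney) formulation; the function name `roc_auc`, the labels, and the scores are made-up illustrations and are unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 1 (case) or 0 (control).
    scores: iterable of model scores, higher meaning more likely a case.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: three cases and two controls.
example_labels = [1, 1, 1, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.5, 0.1]
example_auc = roc_auc(example_labels, example_scores)
print(round(example_auc, 4))
```

In this toy example five of the six case-control pairs are ordered correctly, so the AUC is 5/6.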
Conclusion: This study highlights the abundance of repetitive DNA in plasma. The innovative fragmentomics approach provides a sensitive, robust, and cost-effective method for early tumor detection and localization.
Keywords: cell-free DNA; early tumor detection; low-pass whole genome sequencing; repetitive element; tissue of origin