Comput Biol Med. 2026 May 15. pii: S0010-4825(26)00320-3. [Epub ahead of print] 211: 111756
Cell-free deoxyribonucleic acid (cfDNA) fragmentation patterns are significant non-invasive diagnostic markers. This study introduces Semantic Inductive Graph-based Diagnostics (SIGD), a framework that unifies semantic encoding and graph topology by treating genomic fragmentation as a structured linguistic problem. SIGD combines a heterogeneous Graph Convolutional Network with Bidirectional Long Short-Term Memory (BiLSTM) semantic encoders. The architecture is centered on a multi-relational graph topology in which biological interactions are modeled explicitly. Inverse Document Frequency (IDF) weights quantify sequence-motif relevance, Pointwise Mutual Information (PMI) captures co-occurrence dependencies between motifs, and the Term Frequency and Category Relevancy Factor (TFCRF) weighting scheme formalizes the direct mapping between motifs and diagnostic labels, enabling the extraction of category-aware features. This integrative approach captures high-order patterns that often remain undetected by traditional models, improving diagnostic sensitivity and interpretability in cancer detection. The framework was evaluated on 2451 plasma samples across multiple sequencing modalities, where SIGD outperformed established baselines. On the test set, using 64 end-motifs, it attained a diagnostic accuracy of 91.43% and an area under the receiver operating characteristic curve (AUROC) of 0.967 for general cancer detection. For hepatocellular carcinoma (HCC)-specific classification, the model reached an accuracy of 99% and an AUROC of 0.998. Calibration analysis confirmed model reliability, and the inductive design supports real-time inference without retraining. The framework is characterized by high accuracy, interpretability, and computational efficiency.
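The IDF and PMI edge weights mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how motif occurrence is counted, so the sketch assumes each sample is reduced to the set of end-motifs observed in it, uses the textbook definitions IDF(m) = log(N / df(m)) and PMI(a, b) = log(p(a, b) / (p(a) p(b))), and keeps only positive-PMI edges (a common convention in text graph models, assumed here). The function names and toy motifs are hypothetical, and TFCRF is not attempted.

```python
import math
from collections import Counter
from itertools import combinations


def idf_weights(samples):
    """Return {motif: log(N / df)} where df counts samples containing the motif."""
    n = len(samples)
    df = Counter()
    for motifs in samples:
        df.update(set(motifs))  # count each motif once per sample
    return {m: math.log(n / df[m]) for m in df}


def pmi_edges(samples):
    """Return {(motif_a, motif_b): PMI} for motif pairs with positive PMI.

    Probabilities are estimated as sample-level co-occurrence frequencies
    (an assumption; a sliding window over reads would be another choice).
    """
    n = len(samples)
    single = Counter()
    pair = Counter()
    for motifs in samples:
        present = sorted(set(motifs))  # sorted so each pair has one canonical key
        single.update(present)
        pair.update(combinations(present, 2))
    edges = {}
    for (a, b), count in pair.items():
        pmi = math.log((count / n) / ((single[a] / n) * (single[b] / n)))
        if pmi > 0:  # assumed convention: drop non-positive associations
            edges[(a, b)] = pmi
    return edges


# Toy data: four hypothetical samples, each a list of observed end-motifs.
samples = [
    ["CCCA", "CCAG"],
    ["CCCA", "CCAG"],
    ["CCCA", "TGGT"],
    ["TGGT"],
]
idf = idf_weights(samples)
edges = pmi_edges(samples)
```

In this toy data, CCCA appears in three of four samples and so receives a lower IDF than the rarer CCAG, and only the CCAG/CCCA pair co-occurs more often than chance and survives as a motif-motif edge.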
Keywords: Cancer detection; Category-aware learning; Cell-free DNA; End-motif profiling; Graph convolutional networks; Inductive learning