bims-tumhet Biomed News
on Tumor heterogeneity
Issue of 2026–05–17
two papers selected by
Sergio Marchini, Humanitas Research



  1. iScience. 2026 Jun 19. 29(6): 115833
      Cell-free DNA (cfDNA) in plasma offers attractive opportunities for early cancer diagnosis. This study aimed to establish gastric cancer (GC) artificial intelligence algorithms (GC-AIAs) based on the cfDNA fragmentome for early detection and subtyping of GC. Whole-genome sequencing data were obtained from a training cohort of 404 participants, an internal testing cohort of 173 participants, and an independent validation cohort of 299 participants. Seven classes of cfDNA fragmentomic features were analyzed and used to build the GC-AIA with a stacked ensemble model. The model's AUC was 0.958 (95% confidence interval [CI]: 0.931-0.985) in the internal testing cohort and 0.951 (95% CI: 0.926-0.975) in the independent validation cohort. The GC-AIA performed well in detecting GC across stages and differentiation grades and in molecular subtyping. Stage-shift analysis showed a notable increase in the number of patients diagnosed at stage I. This cfDNA fragmentomics-based approach showed encouraging preliminary performance in the early detection and subtyping of GC.
    Keywords:  cancer; diagnostic procedure; machine learning
    DOI:  https://doi.org/10.1016/j.isci.2026.115833
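The AUCs reported for the GC-AIA measure how often the classifier scores a cancer case above a control; the abstract does not say how the 95% CIs were derived. Below is a minimal sketch of the standard computation, assuming a percentile bootstrap over participants (an assumption, not the authors' stated method). The function names `auc` and `bootstrap_auc_ci` and the toy scores are hypothetical illustrations, not study data or the authors' code.

```python
import random
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(random case outscores random control); ties count 1/2.

    labels: 1 = cancer case, 0 = control.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case (1) and one control (0)")
    wins = 0.0
    for sc in cases:
        for sn in controls:
            if sc > sn:
                wins += 1.0
            elif sc == sn:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bootstrap_auc_ci(scores: Sequence[float], labels: Sequence[int],
                     n_boot: int = 2000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI (assumed method): resample participants with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples missing one of the two classes
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi


if __name__ == "__main__":
    # Toy scores (hypothetical, not from the study).
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 0, 1, 0, 1, 0, 0]
    print(auc(scores, labels))  # 0.8125
    print(bootstrap_auc_ci(scores, labels))
```

The pairwise loop is quadratic and is kept for readability. For cohorts of a few hundred participants, as here, it is fast enough, and it makes the "probability of correct ranking" interpretation of AUC explicit.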
  2. Nucleic Acids Res. 2026 May 05. pii: gkag434. [Epub ahead of print] 54(9):
      High-throughput spatial transcriptomics (ST) now profiles hundreds of thousands of cells or locations per section, creating computational bottlenecks for routine analysis. Sketching, or intelligent subsampling, addresses scale by selecting small, representative subsets. Although effective for single-cell RNA sequencing data, existing sketching methods optimize coverage in expression space but ignore physical location, so they can introduce spatial bias when applied to ST data. To explore the impact of sketching on ST analysis, we systematically benchmarked uniform sampling, leverage-score sampling, Geosketch (minimax/Hausdorff), and scSampler (maximin) across multiple real ST datasets (mouse ovary, MERFISH brain, human breast cancer, lung) and simulations, using three input representations: Principal Component Analysis (PCA) embeddings, spatial coordinates, and spatially smoothed embeddings. We show that expression-only designs capture global transcriptomic heterogeneity but distort tissue architecture by over-sampling high-variability regions and under-sampling homogeneous areas. Coordinate-only sampling restores tissue coverage but misses transcriptional extremes. A simple spatially aware extension, which computes leverage scores from a randomized singular value decomposition (SVD) basis smoothed by a spatial weights matrix, strikes a favorable balance: it recovers rare cell states while maintaining uniform tissue coverage and avoiding edge effects. On robust Hausdorff distance, clustering stability (Adjusted Rand Index), PCA loading drift, and local cell-type mean squared error (MSE), spatially smoothed leverage scores match or outperform the alternatives. These results motivate joint spatial-transcriptomic sketching objectives to enable fast, unbiased analyses of increasingly large ST datasets.
    DOI:  https://doi.org/10.1093/nar/gkag434
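The spatially aware extension is described in a single sentence, so the sketch below fills in the details with labeled assumptions rather than the authors' implementation. It assumes that the SVD is computed on gene-centered expression with a randomized range finder in the style of Halko, Martinsson and Tropp, and that the spatial weights matrix W is a row-normalized k-nearest-neighbour graph on spot coordinates (self included). It also assumes that the smoothed basis W·U is re-orthonormalized by QR before leverage scores (squared row norms) are read off, and that cells are drawn without replacement with probability proportional to leverage. Smoothing the expression matrix before the SVD would be an equally plausible reading. All function names and parameters (`randomized_left_basis`, `knn_weights`, `spatial_leverage_sketch`, `n_neighbors`) are hypothetical.

```python
import numpy as np


def randomized_left_basis(X, k, oversample=10, n_iter=2, rng=None):
    """Approximate top-k left singular vectors of X (randomized range finder)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    ell = min(k + oversample, n, d)
    Q, _ = np.linalg.qr(X @ rng.standard_normal((d, ell)))
    for _ in range(n_iter):  # power iterations, re-orthonormalized for stability
        Q, _ = np.linalg.qr(X.T @ Q)
        Q, _ = np.linalg.qr(X @ Q)
    Ub, _, _ = np.linalg.svd(Q.T @ X, full_matrices=False)
    return Q @ Ub[:, :k]


def knn_weights(coords, n_neighbors=8):
    """Row-normalized kNN spatial weights matrix (assumed form; dense, sketch only)."""
    n = coords.shape[0]
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    nbrs = np.argsort(d2, axis=1)[:, : n_neighbors + 1]  # includes the spot itself
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), nbrs.shape[1]), nbrs.ravel()] = 1.0 / nbrs.shape[1]
    return W


def spatial_leverage_scores(X, coords, k=10, n_neighbors=8, seed=0):
    """Leverage scores of the spatially smoothed SVD basis; they sum to k."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                      # center genes, as for PCA
    U = randomized_left_basis(Xc, k, rng=rng)    # n x k expression basis
    Us = knn_weights(coords, n_neighbors) @ U    # smooth each basis vector in space
    Q, _ = np.linalg.qr(Us)                      # re-orthonormalize the smoothed basis
    return (Q ** 2).sum(axis=1)


def spatial_leverage_sketch(X, coords, m, k=10, n_neighbors=8, seed=0):
    """Sample m cells without replacement, with probability proportional to leverage."""
    lev = spatial_leverage_scores(X, coords, k, n_neighbors, seed)
    rng = np.random.default_rng(seed + 1)
    return rng.choice(X.shape[0], size=m, replace=False, p=lev / lev.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    coords = rng.uniform(size=(500, 2))                   # toy spot coordinates
    X = rng.poisson(2.0, size=(500, 50)).astype(float)    # toy counts, not real ST data
    idx = spatial_leverage_sketch(X, coords, m=50, k=10)
    print(idx.shape)  # (50,)
```

The dense n x n distance matrix keeps the example short but would not scale to the hundreds of thousands of cells mentioned above. At that size, a KD-tree neighbour search and a sparse W would be the natural substitutes.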