bims-gerecp Biomed News
on Gene regulatory networks of epithelial cell plasticity
Issue of 2024–05–12
twenty-two papers selected by
Xiao Qin, University of Oxford



  1. Nat Methods. 2024 May 09.
      Standard scATAC sequencing (scATAC-seq) analysis pipelines represent cells as sparse numeric vectors relative to an atlas of peaks or genomic tiles and consequently ignore genomic sequence information at accessible loci. Here we present CellSpace, an efficient and scalable sequence-informed embedding algorithm for scATAC-seq that learns a mapping of DNA k-mers and cells to the same space, to address this limitation. We show that CellSpace captures meaningful latent structure in scATAC-seq datasets, including cell subpopulations and developmental hierarchies, and can score transcription factor activities in single cells based on proximity to binding motifs embedded in the same space. Importantly, CellSpace implicitly mitigates batch effects arising from multiple samples, donors or assays, even when individual datasets are processed relative to different peak atlases. Thus, CellSpace provides a powerful tool for integrating and interpreting large-scale scATAC-seq compendia.
    DOI:  https://doi.org/10.1038/s41592-024-02274-x
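The idea of placing DNA k-mers and cells in one space, then scoring a transcription factor by how close its motif sits to each cell, can be illustrated with a toy sketch. This is not the CellSpace implementation: the embedding values, the names `kmer_embedding` and `cell_embedding`, and the use of cosine similarity as the proximity measure are all assumptions made for illustration.

```python
import math

def kmers(seq, k):
    """Return all overlapping k-mers of a DNA sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical 3-dimensional embeddings (made-up numbers, not learned).
kmer_embedding = {"GATA": [1.0, 0.0, 0.0], "ATAA": [0.8, 0.6, 0.0]}
cell_embedding = {"cell_1": [0.9, 0.1, 0.0], "cell_2": [0.0, 0.0, 1.0]}

def motif_vector(motif_seq, k):
    """Embed a motif as the mean of the vectors of its known k-mers."""
    vecs = [kmer_embedding[km] for km in kmers(motif_seq, k) if km in kmer_embedding]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# Score each cell by its similarity to the motif in the shared space.
motif = motif_vector("GATAA", k=4)
scores = {cell: cosine(vec, motif) for cell, vec in cell_embedding.items()}
print(scores)
```

In this toy setting `cell_1` scores higher than `cell_2` because its vector points in nearly the same direction as the motif vector.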
  2. Brief Bioinform. 2024 Mar 27. pii: bbae202. [Epub ahead of print]25(3):
      Inference of cell-cell communication (CCC) provides valuable information in understanding the mechanisms of many important life processes. With the rise of spatial transcriptomics in recent years, many methods have emerged to predict CCCs using spatial information of cells. However, most existing methods only describe CCCs based on ligand-receptor interactions, but lack the exploration of their upstream/downstream pathways. In this paper, we propose a new method to infer CCCs, called Intercellular Gene Association Network (IGAN). Specifically, for the first time, we can estimate the gene associations/network between two specific single spatially adjacent cells. By using the IGAN method, we can not only infer CCCs in an accurate manner, but also explore the upstream/downstream pathways of ligands/receptors from the network perspective, which are exhibited as a new panoramic cell-interaction-pathway graph, and thus provide extensive information for the regulatory mechanisms behind CCCs. In addition, IGAN can measure the CCC activity at single cell/spot resolution, and help to discover the CCC spatial heterogeneity. Interestingly, we found that CCC patterns from IGAN are highly consistent with the spatial microenvironment patterns for each cell type, which further indicated the accuracy of our method. Analyses on several public datasets validated the advantages of IGAN.
    Keywords:  cell–cell communication; intercellular gene association; ligand–receptor pathway; spatial microenvironment; spatial transcriptome
    DOI:  https://doi.org/10.1093/bib/bbae202
  3. Comput Struct Biotechnol J. 2024 Dec;23 1886-1896
      Recent advances in single-cell omics technology have transformed the landscape of cellular and molecular research, enriching the scope and intricacy of cellular characterisation. Perturbation modelling seeks to comprehensively grasp the effects of external influences like disease onset or molecular knock-outs or external stimulants on cellular physiology, specifically on transcription factors, signal transducers, biological pathways, and dynamic cell states. Machine and deep learning tools transform complex perturbational phenomena into algorithmically tractable tasks to formulate predictions based on various types of single-cell datasets. However, the recent surge in tools and datasets makes it challenging for experimental biologists and computational scientists to keep track of the recent advances in this rapidly expanding field of single-cell modelling. Here, we recapitulate the main objectives of perturbation modelling and summarise novel single-cell perturbation technologies based on genetic manipulation like CRISPR or compounds, spanning across omic modalities. We then concisely review a burgeoning group of computational methods extending from classical statistical inference methodologies to various machine and deep learning architectures like shallow models or autoencoders, to biologically informed approaches based on gene regulatory networks, and to combinatorial efforts reminiscent of ensemble learning. We also discuss the rising trend of large foundational models in single-cell perturbation modelling inspired by large language models. Lastly, we critically assess the challenges that underline single-cell perturbation modelling while pointing towards relevant future perspectives like perturbation atlases, multi-omics and spatial datasets, causal machine learning for interpretability, multi-task learning for performance and explainability as well as prospects for solving interoperability and benchmarking pitfalls.
    Keywords:  Deep learning; Machine learning; Perturbation; ScRNAseq; Single-cell RNA sequencing
    DOI:  https://doi.org/10.1016/j.csbj.2024.04.058
  4. bioRxiv. 2024 Apr 27. pii: 2024.04.25.591144. [Epub ahead of print]
      Several classification systems have been developed to define tumor subtypes in colorectal cancer (CRC). One system proposes that tumor heterogeneity derives in part from distinct cancer stem cell populations that co-exist as admixtures of varying proportions. However, the lack of single cell resolution has prohibited a definitive identification of these types of stem cells and therefore any understanding of how each influences tumor phenotypes. Here we report the isolation and characterization of two cancer stem cell subtypes from the SW480 CRC cell line. We find these cancer stem cells are oncogenic versions of the normal Crypt Base Columnar (CBC) and Regenerative Stem Cell (RSC) populations from intestinal crypts and that their gene signatures are consistent with the "Admixture" and other CRC classification systems. Using publicly available single cell RNA sequencing (scRNAseq) data from CRC patients, we determine that RSC and CBC cancer stem cells are commonly co-present in human CRC. To characterize influences on the tumor microenvironment, we develop subtype-specific xenograft models and we define their tumor microenvironments at high resolution via scRNAseq. RSCs create differentiated, inflammatory, slow growing tumors. CBCs create proliferative, undifferentiated, invasive tumors. With this enhanced resolution, we unify current CRC patient classification schema with TME phenotypes and organization.
    DOI:  https://doi.org/10.1101/2024.04.25.591144
  5. Brief Bioinform. 2024 Mar 27. pii: bbae192. [Epub ahead of print]25(3):
      Computational analysis of fluorescent timelapse microscopy images at the single-cell level is a powerful approach to study cellular changes that dictate important cell fate decisions. Core to this approach is the need to generate reliable cell segmentations and classifications necessary for accurate quantitative analysis. Deep learning-based convolutional neural networks (CNNs) have emerged as a promising solution to these challenges. However, current CNNs are prone to produce noisy cell segmentations and classifications, which is a significant barrier to constructing accurate single-cell lineages. To address this, we developed a novel algorithm called Single Cell Track (SC-Track), which employs a hierarchical probabilistic cache cascade model based on biological observations of cell division and movement dynamics. Our results show that SC-Track performs better than a panel of publicly available cell trackers on a diverse set of cell segmentation types. This cell-tracking performance was achieved without any parameter adjustments, making SC-Track an excellent generalized algorithm that can maintain robust cell-tracking performance in varying cell segmentation qualities, cell morphological appearances and imaging conditions. Furthermore, SC-Track is equipped with a cell class correction function to improve the accuracy of cell classifications in multiclass cell segmentation time series. These features together make SC-Track a robust cell-tracking algorithm that works well with noisy cell instance segmentation and classification predictions from CNNs to generate accurate single-cell lineages and classifications.
    Keywords:  cell cycle; cell division; convolutional neural networks; deep learning; single-cell tracking; timelapse microscopy imaging
    DOI:  https://doi.org/10.1093/bib/bbae192
  6. bioRxiv. 2024 Apr 28. pii: 2024.04.23.590827. [Epub ahead of print]
      Recent technological developments have made it possible to map the spatial organization of a tissue at the single-cell resolution. However, computational methods for analyzing spatially continuous variations in tissue microenvironment are still lacking. Here we present ONTraC as a strategy that constructs niche trajectories using a graph neural network-based modeling framework. Our benchmark analysis shows that ONTraC performs more favorably than existing methods for reconstructing spatial trajectories. Applications of ONTraC to public spatial transcriptomics datasets successfully recapitulated the underlying anatomical structure, and further enabled detection of tissue microenvironment-dependent changes in gene regulatory networks and cell-cell interaction activities during embryonic development. Taken together, ONTraC provides a useful and generally applicable tool for the systematic characterization of the structural and functional organization of tissue microenvironments.
    DOI:  https://doi.org/10.1101/2024.04.23.590827
  7. Nat Methods. 2024 May 09.
      The inability to scalably and precisely measure the activity of developmental cis-regulatory elements (CREs) in multicellular systems is a bottleneck in genomics. Here we develop a dual RNA cassette that decouples the detection and quantification tasks inherent to multiplex single-cell reporter assays. The resulting measurement of reporter expression is accurate over multiple orders of magnitude, with a precision approaching the limit set by Poisson counting noise. Together with RNA barcode stabilization via circularization, these scalable single-cell quantitative expression reporters provide high-contrast readouts, analogous to classic in situ assays but entirely from sequencing. Screening >200 regions of accessible chromatin in a multicellular in vitro model of early mammalian development, we identify 13 (8 previously uncharacterized) autonomous and cell-type-specific developmental CREs. We further demonstrate that chimeric CRE pairs generate cognate two-cell-type activity profiles and assess gain- and loss-of-function multicellular expression phenotypes from CRE variants with perturbed transcription factor binding sites. Single-cell quantitative expression reporters can be applied in developmental and multicellular systems to quantitatively characterize native, perturbed and synthetic CREs at scale, with high sensitivity and at single-cell resolution.
    DOI:  https://doi.org/10.1038/s41592-024-02260-3
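The "limit set by Poisson counting noise" mentioned above has a simple form: a Poisson count with mean N has standard deviation sqrt(N), so its coefficient of variation is 1/sqrt(N). The sketch below only illustrates that textbook relationship; the function name `poisson_cv` and the example counts are illustrative and are not taken from the paper.

```python
import math

def poisson_cv(mean_count):
    """Coefficient of variation of a Poisson count: sd / mean = 1 / sqrt(mean)."""
    if mean_count <= 0:
        raise ValueError("mean_count must be positive")
    return 1.0 / math.sqrt(mean_count)

# Relative noise shrinks as more molecules are counted.
for n in (1, 100, 10_000):
    print(n, poisson_cv(n))
```

A measurement that approaches this bound is limited by the number of molecules counted, not by the assay itself.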
  8. NPJ Syst Biol Appl. 2024 May 06. 10(1): 47
      Understanding and manipulating cell fate determination is pivotal in biology. Cell fate is determined by intricate and nonlinear interactions among molecules, making mathematical model-based quantitative analysis indispensable for its elucidation. Nevertheless, obtaining the essential dynamic experimental data for model development has been a significant obstacle. However, recent advancements in large-scale omics data technology are providing the necessary foundation for developing such models. Based on accumulated experimental evidence, we can postulate that cell fate is governed by a limited number of core regulatory circuits. Following this concept, we present a conceptual control framework that leverages single-cell RNA-seq data for dynamic molecular regulatory network modeling, aiming to identify and manipulate core regulatory circuits and their master regulators to drive desired cellular state transitions. We illustrate the proposed framework by applying it to the reversion of lung cancer cell states, although it is more broadly applicable to understanding and controlling a wide range of cell-fate determination processes.
    DOI:  https://doi.org/10.1038/s41540-024-00372-2
  9. Nat Aging. 2024 May 09.
      Age-related changes in DNA methylation (DNAm) form the basis of the most robust predictors of age (epigenetic clocks), but a clear mechanistic understanding of exactly which aspects of aging are quantified by these clocks is lacking. Here, to clarify the nature of epigenetic aging, we juxtapose the dynamics of tissue and single-cell DNAm in mice. We compare these changes during early development with those observed during adult aging in mice, and corroborate our analyses with a single-cell RNA sequencing analysis within the same multiomics dataset. We show that epigenetic aging involves co-regulated changes as well as a major stochastic component, and this is consistent with transcriptional patterns. We further support the finding of stochastic epigenetic aging by direct tissue and single-cell DNAm analyses and modeling of aging DNAm trajectories with a stochastic process akin to radiocarbon decay. Finally, we describe a single-cell algorithm for the identification of co-regulated and stochastic CpG clusters showing consistent transcriptomic coordination patterns. Together, our analyses increase our understanding of the basis of epigenetic clocks and highlight potential opportunities for targeting aging and evaluating longevity interventions.
    DOI:  https://doi.org/10.1038/s43587-024-00616-0
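The radiocarbon analogy can be made concrete with a minimal sketch, assuming that each initially methylated site loses its mark independently at a constant rate, so the expected methylated fraction decays exponentially and can be inverted to date a sample. The functional form, the rate value and all function names below are illustrative assumptions; the model used in the paper may differ.

```python
import math
import random

def expected_fraction(age, rate):
    """Expected fraction of sites still methylated after `age` time units."""
    return math.exp(-rate * age)

def estimate_age(fraction, rate):
    """Invert the decay curve: age = -ln(fraction) / rate."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return -math.log(fraction) / rate

def simulate_fraction(age, rate, n_sites, seed=0):
    """Each site independently stays methylated with probability exp(-rate * age)."""
    rng = random.Random(seed)
    p_keep = expected_fraction(age, rate)
    kept = sum(rng.random() < p_keep for _ in range(n_sites))
    return kept / n_sites

RATE = 0.05      # hypothetical loss rate per month
TRUE_AGE = 12.0  # months
observed = simulate_fraction(TRUE_AGE, RATE, n_sites=20_000)
print(round(estimate_age(observed, RATE), 2))
```

With many sites the estimate lands close to the true age even though every individual site changes at random, which is the sense in which a stochastic process can still act as a clock.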
  10. Nature. 2024 May 08.
      In somatic tissue differentiation, chromatin accessibility changes govern priming and precursor commitment towards cellular fates. Therefore, somatic mutations are likely to alter chromatin accessibility patterns, as they disrupt differentiation topologies leading to abnormal clonal outgrowth. However, defining the impact of somatic mutations on the epigenome in human samples is challenging due to admixed mutated and wild-type cells. Here, to chart how somatic mutations disrupt epigenetic landscapes in human clonal outgrowths, we developed genotyping of targeted loci with single-cell chromatin accessibility (GoT-ChA). This high-throughput platform links genotypes to chromatin accessibility at single-cell resolution across thousands of cells within a single assay. We applied GoT-ChA to CD34+ cells from patients with myeloproliferative neoplasms with JAK2V617F-mutated haematopoiesis. Differential accessibility analysis between wild-type and JAK2V617F-mutant progenitors revealed both cell-intrinsic and cell-state-specific shifts within mutant haematopoietic precursors, including cell-intrinsic pro-inflammatory signatures in haematopoietic stem cells, and a distinct profibrotic inflammatory chromatin landscape in megakaryocytic progenitors. Integration of mitochondrial genome profiling and cell-surface protein expression measurement allowed expansion of genotyping onto DOGMA-seq through imputation, enabling single-cell capture of genotypes, chromatin accessibility, RNA expression and cell-surface protein expression. Collectively, we show that the JAK2V617F mutation leads to epigenetic rewiring in a cell-intrinsic and cell type-specific manner, influencing inflammation states and differentiation trajectories. We envision that GoT-ChA will empower broad future investigations of the critical link between somatic mutations and epigenetic alterations across clonal populations in malignant and non-malignant contexts.
    DOI:  https://doi.org/10.1038/s41586-024-07388-y
  11. Sci Rep. 2024 May 09. 14(1): 10633
      Single-cell RNA sequencing (scRNA-seq) technology has been widely used to study the differences in gene expression at the single cell level, providing insights into the research of cell development, differentiation, and functional heterogeneity. Various pipelines and workflows of scRNA-seq analysis have been developed but few have considered multi-timepoint data specifically. In this study, we develop CASi, a comprehensive framework for analyzing multi-timepoint scRNA-seq data, which provides users with: (1) cross-timepoint cell annotation, (2) detection of potentially novel cell types that emerge over time, (3) visualization of cell population evolution, and (4) identification of temporal differentially expressed genes (tDEGs). Through comprehensive simulation studies and applications to a real multi-timepoint single cell dataset, we demonstrate the robust and favorable performance of the proposed framework versus existing methods serving similar purposes.
    DOI:  https://doi.org/10.1038/s41598-024-58566-x
  12. Curr Top Dev Biol. 2024 ;pii: S0070-2153(24)00001-2. [Epub ahead of print]159 406-437
      Transcriptional regulation plays a pivotal role in orchestrating the intricate genetic programs governing embryonic development. The expression of developmental genes relies on the combined activity of several cis-regulatory elements (CREs), such as enhancers and silencers, which can be located at long linear distances from the genes that they regulate and that interact with them through establishment of chromatin loops. Mutations affecting their activity or interaction with their target genes can lead to developmental disorders and are thought to have contributed substantially to the evolution of the animal body plan. The advent of next-generation sequencing approaches has allowed the identification of over a million sequences with putative regulatory potential in the human genome. Characterizing their function and establishing gene-CRE maps is essential to decode the logic governing developmental gene expression and is one of the major challenges of the post-genomic era. Chromatin 3D organization plays an essential role in determining how CREs specifically contact their target genes while avoiding deleterious off-target interactions. Our understanding of these aspects has greatly advanced with the advent of chromatin conformation capture techniques and fluorescence microscopy approaches to visualize the organization of DNA elements in the nucleus. Here we will summarize relevant aspects of how the interplay between CRE activity and chromatin 3D organization regulates developmental gene expression and how it relates to pathological conditions and the evolution of the animal body plan.
    Keywords:  Architectural proteins; Chromatin 3D organization; Cis-regulatory elements; Enhanceropathies and TADopathies; Gene regulation; Vertebrate evolution
    DOI:  https://doi.org/10.1016/bs.ctdb.2024.01.001
  13. Nat Genet. 2024 May 09.
      Chromatin modifications are linked with regulating patterns of gene expression, but their causal role and context-dependent impact on transcription remain unresolved. Here we develop a modular epigenome editing platform that programs nine key chromatin modifications, or combinations thereof, to precise loci in living cells. We couple this with single-cell readouts to systematically quantitate the magnitude and heterogeneity of transcriptional responses elicited by each specific chromatin modification. Among these, we show that installing histone H3 lysine 4 trimethylation (H3K4me3) at promoters can causally instruct transcription by hierarchically remodeling the chromatin landscape. We further dissect how DNA sequence motifs influence the transcriptional impact of chromatin marks, identifying switch-like and attenuative effects within distinct cis contexts. Finally, we examine the interplay of combinatorial modifications, revealing that co-targeted H3K27 trimethylation (H3K27me3) and H2AK119 monoubiquitination (H2AK119ub) maximizes silencing penetrance across single cells. Our precision-perturbation strategy unveils the causal principles of how chromatin modification(s) influence transcription and dissects how quantitative responses are calibrated by contextual interactions.
    DOI:  https://doi.org/10.1038/s41588-024-01706-w
  14. Nat Methods. 2024 May 08.
      The spatial distribution of cell surface proteins governs vital processes of the immune system such as intercellular communication and mobility. However, fluorescence microscopy has limited scalability in the multiplexing and throughput needed to drive spatial proteomics discoveries at subcellular level. We present Molecular Pixelation (MPX), an optics-free, DNA sequence-based method for spatial proteomics of single cells using antibody-oligonucleotide conjugates (AOCs) and DNA-based, nanometer-sized molecular pixels. The relative locations of AOCs are inferred by sequentially associating them into local neighborhoods using the sequence-unique DNA pixels, forming >1,000 spatially connected zones per cell in 3D. For each single cell, DNA-sequencing reads are computationally arranged into spatial proteomics networks for 76 proteins. By studying immune cell dynamics using spatial statistics on graph representations of the data, we identify known and new patterns of spatial organization of proteins on chemokine-stimulated T cells, highlighting the potential of MPX in defining cell states by the spatial arrangement of proteins.
    DOI:  https://doi.org/10.1038/s41592-024-02268-9
  15. FEBS Lett. 2024 May 09.
      The expression level of a gene can vary between genetically identical cells under the same environmental condition, a phenomenon referred to as gene expression noise. Several studies have now elucidated a central role of transcription factors in the generation of expression noise. Transcription factors, as the key components of gene regulatory networks, drive many important cellular decisions in response to cellular and environmental signals. Therefore, a very relevant question is how expression noise impacts gene regulation and influences cellular decision-making. In this Review, we summarize the current understanding of the molecular origins of expression noise, highlighting the role of transcription factors in this process, and discuss the ways in which noise can influence cellular decision-making. As advances in single-cell technologies open new avenues for studying expression noise as well as gene regulatory circuits, a better understanding of the influence of noise on cellular decisions will have important implications for many biological processes.
    Keywords:  bistability; cellular decisions; gene expression noise; gene regulatory network; transcription factors
    DOI:  https://doi.org/10.1002/1873-3468.14898
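Expression noise across a population of genetically identical cells is commonly summarised by the squared coefficient of variation (variance / mean^2) and the Fano factor (variance / mean). The abstract does not say which metric the review uses, so the sketch below is only an illustration of these two standard quantities; the function name `noise_metrics` and the example counts are made up.

```python
import statistics

def noise_metrics(counts):
    """Return CV^2 and Fano factor for per-cell expression counts of one gene."""
    mean = statistics.fmean(counts)
    if mean <= 0:
        raise ValueError("mean expression must be positive")
    variance = statistics.pvariance(counts)  # population variance across cells
    return {"cv2": variance / mean ** 2, "fano": variance / mean}

# Hypothetical mRNA counts for one gene in eight cells.
example_counts = [2, 4, 4, 4, 5, 5, 7, 9]
metrics = noise_metrics(example_counts)
print(metrics)
```

A Fano factor of 1 is what a simple Poisson production process would give, so values well above 1 are often read as a sign of bursty expression.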
  16. Nat Biotechnol. 2024 May 09.
      Single-cell chromatin accessibility sequencing (scATAC-seq) reconstructs developmental trajectory by phenotypic similarity. However, inferring the exact developmental trajectory is challenging. Previous studies showed age-associated DNA methylation (DNAm) changes in specific genomic regions, termed clock-like differential methylation loci (ClockDML). Age-associated DNAm could either result from or result in chromatin accessibility changes at ClockDML. As cells undergo mitosis, the heterogeneity of chromatin accessibility on clock-like loci is reduced, providing a measure of mitotic age. In this study, we developed a method, called EpiTrace, that counts the fraction of opened clock-like loci from scATAC-seq data to determine cell age and perform lineage tracing in various cell lineages and animal species. It shows concordance with known developmental hierarchies, correlates well with DNAm-based clocks and is complementary with mutation-based lineage tracing, RNA velocity and stemness predictions. Applying EpiTrace to scATAC-seq data reveals biological insights with clinically relevant implications, ranging from hematopoiesis, organ development, tumor biology and immunity to cortical gyrification.
    DOI:  https://doi.org/10.1038/s41587-024-02241-z
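The core quantity described above, the fraction of clock-like loci that are open in a cell, can be sketched in a few lines. This is a simplification and not the EpiTrace algorithm itself: the locus identifiers are invented, `opened_fraction` is a hypothetical helper, and the abstract does not specify how the fraction is converted into an age estimate, so that step is left out.

```python
def opened_fraction(accessible_loci, clock_loci):
    """Fraction of clock-like loci that overlap a cell's accessible loci."""
    clock = set(clock_loci)
    if not clock:
        raise ValueError("clock_loci must not be empty")
    return len(clock & set(accessible_loci)) / len(clock)

# Hypothetical reference set of clock-like loci and per-cell accessible loci.
CLOCK_LOCI = ["chr1:100", "chr1:200", "chr2:50", "chr3:10"]
cells = {
    "cell_a": ["chr1:100", "chr2:50", "chr9:1"],
    "cell_b": ["chr1:100", "chr1:200", "chr2:50", "chr3:10"],
}

fractions = {name: opened_fraction(loci, CLOCK_LOCI) for name, loci in cells.items()}
print(fractions)
```

Cells can then be ordered by this fraction; how that ordering relates to mitotic age is determined by the published method, not by this sketch.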
  17. bioRxiv. 2024 Apr 26. pii: 2024.04.26.590475. [Epub ahead of print]
      Current technologies for upregulation of endogenous genes use targeted artificial transcriptional activators but stable gene activation requires persistent expression of these synthetic factors. Although general "hit-and-run" strategies exist for inducing long-term silencing of endogenous genes using targeted artificial transcriptional repressors, to our knowledge no equivalent approach for gene activation has been described to date. Here we show stable gene activation can be achieved by harnessing endogenous transcription factors (EndoTFs) that are normally expressed in human cells. Specifically, EndoTFs can be recruited to activate endogenous human genes of interest by using CRISPR-based gene editing to introduce EndoTF DNA binding motifs into a target gene promoter. This Precision Editing of Regulatory Sequences to Induce Stable Transcription-On (PERSIST-On) approach results in stable long-term gene activation, which we show is durable for at least five months. Using a high-throughput CRISPR prime editing pooled screening method, we also show that the magnitude of gene activation can be finely tuned either by using binding sites for different EndoTFs or by introducing specific mutations within such sites. Our results delineate a generalizable framework for using PERSIST-On to induce heritable and fine-tunable gene activation in a hit-and-run fashion, thereby enabling a wide range of research and therapeutic applications that require long-term upregulation of a target gene.
    DOI:  https://doi.org/10.1101/2024.04.26.590475
  18. Nat Genet. 2024 May 09.
      Concurrent readout of sequence and base modifications from long unamplified DNA templates by Pacific Biosciences of California (PacBio) single-molecule sequencing requires large amounts of input material. Here we adapt Tn5 transposition to introduce hairpin oligonucleotides and fragment (tagment) limiting quantities of DNA for generating PacBio-compatible circular molecules. We developed two methods that implement tagmentation and use 90-99% less input than current protocols: (1) single-molecule real-time sequencing by tagmentation (SMRT-Tag), which allows detection of genetic variation and CpG methylation; and (2) single-molecule adenine-methylated oligonucleosome sequencing assay by tagmentation (SAMOSA-Tag), which uses exogenous adenine methylation to add a third channel for probing chromatin accessibility. SMRT-Tag of 40 ng or more human DNA (approximately 7,000 cell equivalents) yielded data comparable to gold standard whole-genome and bisulfite sequencing. SAMOSA-Tag of 30,000-50,000 nuclei resolved single-fiber chromatin structure, CTCF binding and DNA methylation in patient-derived prostate cancer xenografts and uncovered metastasis-associated global epigenome disorganization. Tagmentation thus promises to enable sensitive, scalable and multimodal single-molecule genomics for diverse basic and clinical applications.
    DOI:  https://doi.org/10.1038/s41588-024-01748-0
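The statement that 40 ng of human DNA corresponds to roughly 7,000 cell equivalents is a unit conversion that is easy to check. In the sketch below the per-cell DNA mass is an explicit assumption: a diploid human cell is commonly quoted at about 6 to 7 pg of DNA, and the function names are illustrative.

```python
PG_PER_NG = 1_000.0  # picograms per nanogram

def cell_equivalents(input_ng, pg_per_cell):
    """Number of cells whose genomic DNA would add up to `input_ng`."""
    return input_ng * PG_PER_NG / pg_per_cell

def implied_pg_per_cell(input_ng, n_cells):
    """Per-cell DNA mass implied by an input mass and a cell count."""
    return input_ng * PG_PER_NG / n_cells

# Figures from the abstract: 40 ng is described as about 7,000 cell equivalents.
print(round(implied_pg_per_cell(40, 7_000), 2))  # 5.71 pg per cell
# Assuming 6 pg per diploid cell (an assumed round number) gives a similar count.
print(round(cell_equivalents(40, 6.0)))          # 6667
```

The implied value of about 5.7 pg per cell sits close to the commonly quoted range, so the "approximately 7,000" figure is consistent with a simple mass-per-genome conversion.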
  19. Methods Cell Biol. 2024 ;pii: S0091-679X(24)00053-0. [Epub ahead of print]186 311-332
      Spectral flow cytometry has emerged as a significant player in the cytometry marketplace, with the potential for rapid growth. Despite a slow start, the technology has made significant strides in advancing various areas of single-cell analysis utilized by the scientific community. The integration of spectral cytometry into clinical laboratories and diagnostic processes is currently underway and is expected to garner a significant level of widespread acceptance in the near future. However, incorporating a new methodological approach into existing research programs can lead to misunderstandings or even misuse. This chapter offers an introductory yet comprehensive explanation of the scientific principles that form the foundation of spectral cytometry. Specifically, it delves into the unmixing processes that are utilized for data analysis. This overview is designed for those who are new to the field and seeking an informative guide to this exciting emerging technology.
    Keywords:  Compensation; Polychromatic cytometry; Spectral cytometry; Spectral unmixing
    DOI:  https://doi.org/10.1016/bs.mcb.2024.02.022
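Spectral unmixing is often introduced as a linear problem: the signal measured across detectors is modelled as a weighted sum of each fluorochrome's reference spectrum, and the weights are recovered by least squares. The sketch below shows ordinary least squares with NumPy as one common approach, not necessarily the one the chapter presents; the reference spectra and abundances are made-up numbers.

```python
import numpy as np

def unmix(reference_spectra, observed):
    """Solve observed ~= reference_spectra @ abundances by least squares.

    reference_spectra: array of shape (n_detectors, n_fluorochromes)
    observed: array of shape (n_detectors,)
    """
    reference_spectra = np.asarray(reference_spectra, dtype=float)
    observed = np.asarray(observed, dtype=float)
    abundances, *_ = np.linalg.lstsq(reference_spectra, observed, rcond=None)
    return abundances

# Hypothetical spectra: 3 detectors (rows) x 2 fluorochromes (columns).
REFERENCE = np.array([[0.7, 0.1],
                      [0.2, 0.3],
                      [0.1, 0.6]])
true_abundances = np.array([100.0, 50.0])

# Noise-free mixed signal for one cell, then recover the abundances.
observed = REFERENCE @ true_abundances
recovered = unmix(REFERENCE, observed)
print(recovered)
```

Having more detectors than fluorochromes makes the system overdetermined, which is what distinguishes unmixing from conventional compensation with one detector per dye.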
  20. BMC Genomics. 2024 May 06. 25(1): 444
       BACKGROUND: Normalization is a critical step in the analysis of single-cell RNA-sequencing (scRNA-seq) datasets. Its main goal is to make gene counts comparable within and between cells. To do so, normalization methods must account for technical and biological variability. Numerous normalization methods have been developed addressing different sources of dispersion and making specific assumptions about the count data.
    MAIN BODY: The selection of a normalization method has a direct impact on downstream analysis, for example differential gene expression and cluster identification. Thus, the objective of this review is to guide the reader in making an informed decision on the most appropriate normalization method to use. To this aim, we first give an overview of the different single cell sequencing platforms and methods commonly used including isolation and library preparation protocols. Next, we discuss the inherent sources of variability of scRNA-seq datasets. We describe the categories of normalization methods and include examples of each. We also delineate imputation and batch-effect correction methods. Furthermore, we describe data-driven metrics commonly used to evaluate the performance of normalization methods. We also discuss common scRNA-seq methods and toolkits used for integrated data analysis.
    CONCLUSIONS: According to the correction performed, normalization methods can be broadly classified as within-sample and between-sample algorithms. Moreover, with respect to the mathematical model used, normalization methods can further be classified into: global scaling methods, generalized linear models, mixed methods, and machine learning-based methods. Each of these methods has pros and cons and makes different statistical assumptions. However, no single normalization method performs best. Instead, metrics such as silhouette width, K-nearest neighbor batch-effect test, or Highly Variable Genes are recommended to assess the performance of normalization methods.
    Keywords:  Biological variability; Normalization; Single-cell sequencing; Technical variability; scRNA-seq
    DOI:  https://doi.org/10.1186/s12864-024-10364-5
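Of the categories listed above, global scaling is the simplest to show: each cell's counts are divided by that cell's total, multiplied by a fixed scale factor, and log-transformed. The sketch below is a generic illustration, not a method taken from the review; the scale factor of 10,000 is a common default in scRNA-seq toolkits and `log_normalize` is a hypothetical name.

```python
import math

def log_normalize(counts_by_cell, scale_factor=10_000):
    """Global-scaling normalization: log1p(count / cell_total * scale_factor)."""
    normalized = []
    for counts in counts_by_cell:
        total = sum(counts)
        if total == 0:
            # A cell with no counts stays all-zero instead of dividing by zero.
            normalized.append([0.0] * len(counts))
            continue
        normalized.append([math.log1p(c / total * scale_factor) for c in counts])
    return normalized

# Two hypothetical cells with the same composition but 10x different depth.
raw = [[1, 3], [10, 30]]
norm = log_normalize(raw)
print(norm)
```

After scaling, the two cells have identical profiles, which is the intended effect: differences in sequencing depth are removed while relative gene proportions are preserved.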