bims-gerecp Biomed News
on Gene regulatory networks of epithelial cell plasticity
Issue of 2024–04–07
eleven papers selected by
Xiao Qin, University of Oxford



  1. Genome Res. 2024 Apr 05.
      Transcriptional regulation controls cellular functions through interactions between transcription factors (TFs) and their chromosomal targets. However, understanding the fate conversion potential of multiple TFs in an inducible manner remains limited. Here, we introduce iTF-seq as a method for identifying individual TFs that can alter cell fate toward specific lineages at a single-cell level. iTF-seq enables time course monitoring of transcriptome changes, and with biotinylated individual TFs, it provides a multi-omics approach to understanding the mechanisms behind TF-mediated cell fate changes. Our iTF-seq study in mouse embryonic stem cells identified multiple TFs that trigger rapid transcriptome changes indicative of differentiation within a day of induction. Moreover, cells expressing these potent TFs often show a slower cell cycle and increased cell death. Further analysis using bioChIP-seq revealed that GCM1 and OTX2 act as pioneer factors and activators by increasing gene accessibility and activating the expression of lineage specification genes during cell fate conversion. iTF-seq has utility in both mapping cell fate conversion and understanding cell fate conversion mechanisms.
    DOI:  https://doi.org/10.1101/gr.277926.123
  2. bioRxiv. 2024 Mar 21. pii: 2024.03.19.585637. [Epub ahead of print]
      Single-cell RNA sequencing (scRNA-seq) has transformed our understanding of cell fate in developmental systems. However, identifying the molecular hallmarks of potency - the capacity of a cell to differentiate into other cell types - has remained challenging. Here, we introduce CytoTRACE 2, an interpretable deep learning framework for characterizing potency and differentiation states on an absolute scale from scRNA-seq data. Across 31 human and mouse scRNA-seq datasets encompassing 28 tissue types, CytoTRACE 2 outperformed existing methods for recovering experimentally determined potency levels and differentiation states covering the entire range of cellular ontogeny. Moreover, it reconstructed the temporal hierarchy of mouse embryogenesis across 62 timepoints; identified pan-tissue expression programs that discriminate major potency levels; and facilitated discovery of cellular phenotypes in cancer linked to survival and immunotherapy resistance. Our results illuminate a fundamental feature of cell biology and provide a broadly applicable platform for delineating single-cell differentiation landscapes in health and disease.
    DOI:  https://doi.org/10.1101/2024.03.19.585637
  3. Nat Biotechnol. 2024 Apr 05.
      Single-cell RNA sequencing has been widely used to investigate cell state transitions and gene dynamics of biological processes. Current strategies to infer the sequential dynamics of genes in a process typically rely on constructing cell pseudotime through cell trajectory inference. However, the presence of concurrent gene processes in the same group of cells and technical noise can obscure the true progression of the processes studied. To address this challenge, we present GeneTrajectory, an approach that identifies trajectories of genes rather than trajectories of cells. Specifically, optimal transport distances are calculated between gene distributions across the cell-cell graph to extract gene programs and define their gene pseudotemporal order. Here we demonstrate that GeneTrajectory accurately extracts progressive gene dynamics in myeloid lineage maturation. Moreover, we show that GeneTrajectory deconvolves key gene programs underlying mouse skin hair follicle dermal condensate differentiation that could not be resolved by cell trajectory approaches. GeneTrajectory facilitates the discovery of gene programs that control the changes and activities of biological processes.
    DOI:  https://doi.org/10.1038/s41587-024-02186-3
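The idea of comparing genes by optimal transport can be illustrated with a deliberately small sketch. It assumes the cell-cell graph is a simple chain with unit edge lengths, in which case the Wasserstein-1 distance between two gene distributions reduces to the summed absolute difference of their cumulative sums. The function names (`wasserstein_on_chain`, `order_genes`) and the toy data are hypothetical and are not taken from the GeneTrajectory software, which works on general cell graphs.

```python
from itertools import accumulate


def normalize(expr):
    """Turn one gene's per-cell expression into a probability distribution."""
    total = sum(expr)
    if total <= 0:
        raise ValueError("gene has no expression")
    return [x / total for x in expr]


def wasserstein_on_chain(expr_a, expr_b):
    """W1 distance between two gene distributions over cells on a chain graph.

    Assumption of this sketch: cell i is connected only to cell i + 1, and
    every edge has length 1, so W1 is the sum of absolute CDF differences
    taken over the edges of the chain.
    """
    cdf_a = list(accumulate(normalize(expr_a)))
    cdf_b = list(accumulate(normalize(expr_b)))
    return sum(abs(a - b) for a, b in zip(cdf_a[:-1], cdf_b[:-1]))


def order_genes(genes, start):
    """Order genes by their transport distance from a chosen starting gene."""
    return sorted(genes, key=lambda g: wasserstein_on_chain(genes[start], genes[g]))


# Toy data: three genes peaking at successive positions along five cells.
toy_genes = {
    "early": [1, 0, 0, 0, 0],
    "mid": [0, 0, 1, 0, 0],
    "late": [0, 0, 0, 0, 1],
}
gene_order = order_genes(toy_genes, "early")
```

Sorting by distance from a single starting gene is a stand-in for the pseudotemporal ordering of genes described in the abstract; it only conveys why genes that are active in nearby cells end up close together.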
  4. Nat Biotechnol. 2024 Apr 02.
      A key challenge of analyzing data from high-resolution spatial profiling technologies is to suitably represent the features of cellular neighborhoods or niches. Here we introduce the covariance environment (COVET), a representation that leverages the gene-gene covariate structure across cells in the niche to capture the multivariate nature of cellular interactions within it. We define a principled optimal transport-based distance metric between COVET niches that scales to millions of cells. Using COVET to encode spatial context, we developed environmental variational inference (ENVI), a conditional variational autoencoder that jointly embeds spatial and single-cell RNA sequencing data into a latent space. ENVI includes two decoders: one to impute gene expression across the spatial modality and a second to project spatial information onto single-cell data. ENVI can confer spatial context to genomics data from single dissociated cells and outperforms alternatives for imputing gene expression on diverse spatial datasets.
    DOI:  https://doi.org/10.1038/s41587-024-02193-4
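A niche-level covariance representation of the kind described above can be sketched in a few lines. The sketch assumes that a niche is the k spatially nearest cells (including the cell itself) and that expression is centred on the dataset-wide mean before the gene-gene covariance is taken; both choices, and the names `k_nearest` and `niche_covariance`, are assumptions of this illustration rather than the ENVI implementation. The optimal transport-based distance between niches is not shown.

```python
import math


def k_nearest(coords, i, k):
    """Indices of the k cells closest to cell i in space (cell i included)."""
    order = sorted(range(len(coords)), key=lambda j: math.dist(coords[i], coords[j]))
    return order[:k]


def niche_covariance(expr, coords, i, k):
    """Gene-gene covariance matrix over the spatial niche of cell i.

    expr   : list of per-cell expression vectors (cells x genes)
    coords : list of per-cell spatial coordinates
    Assumption: centring uses the mean over all cells, not the niche mean.
    """
    n_genes = len(expr[0])
    global_mean = [sum(cell[g] for cell in expr) / len(expr) for g in range(n_genes)]
    niche = k_nearest(coords, i, k)
    cov = [[0.0] * n_genes for _ in range(n_genes)]
    for j in niche:
        centred = [expr[j][g] - global_mean[g] for g in range(n_genes)]
        for a in range(n_genes):
            for b in range(n_genes):
                cov[a][b] += centred[a] * centred[b] / len(niche)
    return cov


# Toy data: two spatially separated pairs of cells, two genes.
toy_coords = [(0, 0), (0, 1), (10, 10), (10, 11)]
toy_expr = [[1, 0], [3, 0], [0, 2], [0, 4]]
covet_cell0 = niche_covariance(toy_expr, toy_coords, i=0, k=2)
```

Each cell is thus described by a genes x genes matrix rather than by a single averaged expression vector, which is what lets the representation capture co-variation among genes within the neighbourhood.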
  5. Elife. 2024 Apr 04. pii: RP92559. [Epub ahead of print]13
      Oncogenic mutations in KRAS are among the most common in cancer. Classical models suggest that loss of epithelial characteristics and the acquisition of mesenchymal traits are associated with cancer aggressiveness and therapy resistance. However, the mechanistic link between these phenotypes and mutant KRAS biology remains to be established. Here, we identify STAT3 as a genetic modifier of TGF-beta-induced epithelial to mesenchymal transition. Gene expression profiling of pancreatic cancer cells identifies more than 200 genes commonly regulated by STAT3 and oncogenic KRAS. Functional classification of the STAT3-responsive program reveals its major role in tumor maintenance and epithelial homeostasis. The signatures of STAT3-activated cell states can be projected onto human KRAS mutant tumors, suggesting that they faithfully reflect characteristics of human disease. These observations have implications for therapeutic intervention and tumor aggressiveness.
    Keywords:  KRAS; STAT3; cancer biology; cancer dependency; epithelial carcinogenesis; mouse
    DOI:  https://doi.org/10.7554/eLife.92559
  6. PeerJ. 2024;12: e17102
      The standard theory of evolution proposes that mutations cause heritable variations, which are naturally selected, leading to evolution. However, this mutation-led evolution (MLE) is being questioned by an alternative theory called plasticity-led evolution (PLE). PLE suggests that an environmental change induces adaptive phenotypes, which are later genetically accommodated. According to PLE, developmental systems should be able to respond to environmental changes adaptively. However, developmental systems are known to be robust against environmental and mutational perturbations. Thus, we expect a transition from a robust state to a plastic one. To test this hypothesis, we constructed a gene regulatory network (GRN) model that integrates developmental processes, hierarchical regulation, and environmental cues. We then simulated its evolution over different magnitudes of environmental changes. Our findings indicate that this GRN model exhibits PLE under large environmental changes and MLE under small environmental changes. Furthermore, we observed that the GRN model is susceptible to environmental or genetic fluctuations under large environmental changes but is robust under small environmental changes. This indicates a breakdown of robustness due to large environmental changes. Before the breakdown of robustness, the distribution of phenotypes is biased and aligned to the environmental changes, which would facilitate rapid adaptation should a large environmental change occur. These observations suggest that the evolutionary transition from mutation-led to plasticity-led evolution is due to a developmental transition from robust to susceptible regimes over increasing magnitudes of environmental change. Thus, the GRN model can reconcile these conflicting theories of evolution.
    Keywords:  Adaptability; Cryptic genetic variation; Developmental bias; Environmental change; EvoDevo; Genetic accommodation; Phenotypic plasticity; Robustness
    DOI:  https://doi.org/10.7717/peerj.17102
  7. bioRxiv. 2024 Mar 16. pii: 2024.03.14.585078. [Epub ahead of print]
      Single-cell transcriptomics, in conjunction with genetic and compound perturbations, offers a robust approach for exploring cellular behaviors in diverse contexts. Such experiments allow uncovering cell-state-specific responses to perturbations, a crucial aspect in unraveling the intricate molecular mechanisms governing cellular behavior and potentially discovering novel regulatory pathways and therapeutic targets. However, prevailing computational methods predominantly focus on predicting average cellular responses, disregarding the inherent response heterogeneity associated with cell state diversity. In this study, we present CellCap, a deep generative model designed for the end-to-end analysis of single-cell perturbation experiments. CellCap employs sparse dictionary learning in a latent space to deconstruct cell-state-specific perturbation responses into a set of transcriptional response programs. These programs are then utilized by each perturbation condition and each cell at varying degrees. The incorporation of specific model design choices, such as dot-product cross-attention between cell states and response programs, along with a linearly-decoded latent space, underlies the interpretation power of CellCap. We evaluate CellCap's model interpretability through multiple simulated scenarios and apply it to two real single-cell perturbation datasets. These datasets feature either heterogeneous cellular populations or a complex experimental setup. Our results demonstrate that CellCap successfully uncovers the relationship between cell state and perturbation response, unveiling novel insights overlooked in previous analyses. The model's interpretability, coupled with its effectiveness in capturing heterogeneous responses, positions CellCap as a valuable tool for advancing our understanding of cellular behaviors in the context of perturbation experiments.
    DOI:  https://doi.org/10.1101/2024.03.14.585078
  8. PLoS Comput Biol. 2024 Apr 05. 20(4): e1012006
      Single-cell RNA sequencing (scRNASeq) data plays a major role in advancing our understanding of developmental biology. An important current question is how to classify transcriptomic profiles obtained from scRNASeq experiments into the various cell types and identify the lineage relationship for individual cells. Because of the fast accumulation of datasets and the high dimensionality of the data, it has become challenging to explore and annotate single-cell transcriptomic profiles by hand. To overcome this challenge, automated classification methods are needed. Classical approaches rely on supervised training datasets. However, due to the difficulty of obtaining data annotated at single-cell resolution, we propose instead to take advantage of partial annotations. The partial label learning framework assumes that we can obtain a set of candidate labels containing the correct one for each data point, a simpler setting than requiring a fully supervised training dataset. We study and, when needed, extend state-of-the-art multi-class classification methods, such as SVM, kNN, prototype-based, logistic regression and ensemble methods, to the partial label learning framework. Moreover, we study the effect of incorporating the structure of the label set into the methods. We focus particularly on the hierarchical structure of the labels, as commonly observed in developmental processes. We show, on simulated and real datasets, that these extensions make it possible to learn from partially labeled data and to make predictions with high accuracy, particularly with a nonlinear prototype-based method. We demonstrate that our methods trained with partially annotated data reach the same performance as when trained with fully supervised data. Finally, we study the level of uncertainty present in the partially annotated data, and derive some prescriptive results on the effect of this uncertainty on the accuracy of the partial label learning methods. Overall, our findings show how hierarchical and non-hierarchical partial label learning strategies can help solve the problem of automated classification of single-cell transcriptomic profiles. Notably, these methods rely on a much less stringent type of annotated dataset than fully supervised learning methods.
    DOI:  https://doi.org/10.1371/journal.pcbi.1012006
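The partial label setting, in which each training point carries a set of candidate labels that contains the correct one, can be illustrated with a minimal prototype-based sketch. It assumes Euclidean distance, one prototype per label, and a simple alternating scheme that assigns each point to the nearest prototype among its candidates and then recomputes the prototypes. This is a generic linear illustration, not the nonlinear or hierarchical methods evaluated in the paper, and the names `fit_prototypes` and `predict` are hypothetical.

```python
import math


def _mean(rows):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(rows[0])
    return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]


def fit_prototypes(points, candidates, n_iter=10):
    """Learn one prototype per label from partially labeled points.

    points     : list of feature vectors
    candidates : list of sets; candidates[i] contains the true label of points[i]
    """
    labels = sorted({lab for cand in candidates for lab in cand})
    # Initialise each prototype from every point that lists the label as a candidate.
    protos = {
        lab: _mean([p for p, cand in zip(points, candidates) if lab in cand])
        for lab in labels
    }
    for _ in range(n_iter):
        assigned = {lab: [] for lab in labels}
        for p, cand in zip(points, candidates):
            # Disambiguate: pick the closest prototype among the candidates only.
            best = min(sorted(cand), key=lambda lab: math.dist(p, protos[lab]))
            assigned[best].append(p)
        # Keep the previous prototype if a label received no points this round.
        protos = {lab: _mean(rows) if rows else protos[lab] for lab, rows in assigned.items()}
    return protos


def predict(point, protos):
    """Label of the nearest prototype."""
    return min(sorted(protos), key=lambda lab: math.dist(point, protos[lab]))


# Toy data: two clusters; only some points have a single unambiguous label.
toy_points = [(0, 0), (0.2, 0.1), (5, 5), (5.1, 4.9), (0.1, 0.2), (4.9, 5.2)]
toy_candidates = [{"A"}, {"A", "B"}, {"B"}, {"A", "B"}, {"A", "B"}, {"B"}]
toy_protos = fit_prototypes(toy_points, toy_candidates)
```

In this toy case half of the points are ambiguous between the two labels, yet the prototypes settle near the two cluster centres, which is the behaviour the abstract refers to when it says that partial annotations can substitute for full supervision.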
  9. bioRxiv. 2024 Mar 19. pii: 2024.03.18.585595. [Epub ahead of print]
      Tumors are comprised of a mixture of distinct cell populations that differ in terms of genetic makeup and function. Such heterogeneity plays a role in the development of drug resistance and the ineffectiveness of targeted cancer therapies. Insight into this complexity can be obtained through the construction of a phylogenetic tree, which illustrates the evolutionary lineage of tumor cells as they acquire mutations over time. We propose Canopy2, a Bayesian framework that uses single nucleotide variants derived from bulk DNA and single-cell RNA sequencing to infer tumor phylogeny and conduct mutational profiling of tumor subpopulations. Canopy2 uses Markov chain Monte Carlo methods to sample from a joint probability distribution involving a mixture of binomial and beta-binomial distributions, specifically chosen to account for the sparsity and stochasticity of the single-cell data. Canopy2 demystifies the sources of zeros in the single-cell data and separates zeros categorized as non-cancerous (cells without mutations), stochastic (mutations not expressed due to bursting), and technical (expressed mutations not picked up by sequencing). Simulations demonstrate that Canopy2 consistently outperforms competing methods and reconstructs the clonal tree with high fidelity, even in situations involving low sequencing depth, poor single-cell yield, and highly-advanced and polyclonal tumors. We further assess the performance of Canopy2 through application to breast cancer and glioblastoma data, benchmarking against existing methods. Canopy2 is an open-source R package available at https://github.com/annweideman/canopy2 .
    DOI:  https://doi.org/10.1101/2024.03.18.585595
  10. bioRxiv. 2024 Mar 22. pii: 2024.03.20.585815. [Epub ahead of print]
      Genome-wide identification of chromatin organization and structure has been generally probed by measuring accessibility of the underlying DNA to nucleases or methyltransferases. These methods either only observe the positioning of a single nucleosome or rely on large enzymes to modify or cleave the DNA. We developed adduct sequencing (Add-seq), a method to probe chromatin accessibility by treating chromatin with the small molecule angelicin, which preferentially intercalates into DNA not bound to core nucleosomes. We show that Nanopore sequencing of the angelicin-modified DNA is possible and allows visualization and analysis of long single molecules with distinct chromatin structure. The angelicin modification can be detected from the Nanopore current signal data using a neural network model trained on unmodified and modified chromatin-free DNA. Applying Add-seq to Saccharomyces cerevisiae nuclei, we identified expected patterns of accessibility around annotated gene loci in yeast. We also identify individual clusters of single molecule reads displaying different chromatin structure at specific yeast loci, which demonstrates heterogeneity in the chromatin structure of the yeast population. Thus, using Add-seq, we are able to profile DNA accessibility in the yeast genome across long molecules.
    DOI:  https://doi.org/10.1101/2024.03.20.585815