bims-crepig Biomed News
on Chromatin regulation and epigenetics in cell fate and cancer
Issue of 2023–10–29
twelve papers selected by
Connor Rogerson, University of Cambridge



  1. Genome Res. 2023 Oct 26.
      Housekeeping genes are considered to be regulated by common enhancers across different tissues. Here we report that most of the commonly expressed mouse or human genes across different cell types, including more than half of the previously identified housekeeping genes, are associated with cell type-specific enhancers. Furthermore, the binding of most transcription factors (TFs) is cell type-specific. We reason that these cell type specificities are causally related to the collective TF recruitment at regulatory sites, as TFs tend to bind to regions associated with many other TFs and each cell type has a unique repertoire of expressed TFs. Based on binding profiles of hundreds of TFs from HepG2, K562, and GM12878 cells, we show that 80% of all TF peaks overlapping H3K27ac signals are in the top 20,000-23,000 most TF-enriched H3K27ac peak regions, and approximately 12,000-15,000 of these peaks are enhancers (nonpromoters). Those enhancers are mainly cell type-specific and include those linked to the majority of commonly expressed genes. Moreover, we show that the top 15,000 most TF-enriched regulatory sites in HepG2 cells, associated with about 200 TFs, can be predicted largely from the binding profile of as few as 30 TFs. Through motif analysis, we show that major enhancers harbor diverse and clustered motifs from a combination of available TFs uniquely present in each cell type. We propose a mechanism that explains how the highly focused TF binding at regulatory sites results in cell type specificity of enhancers for housekeeping and commonly expressed genes.
    DOI:  https://doi.org/10.1101/gr.278130.123
  2. Nucleic Acids Res. 2023 Oct 27. pii: gkad925. [Epub ahead of print]
      Enhancer RNAs (eRNAs) transcribed from distal active enhancers serve as key regulators in gene transcriptional regulation. The accumulation of eRNAs from multiple sequencing assays has led to an urgent need to comprehensively collect and process these data to illustrate the regulatory landscape of eRNAs. To address this need, we developed the eRNAbase (http://bio.liclab.net/eRNAbase/index.php) to store the massive available resources of human and mouse eRNAs and provide comprehensive annotation and analyses for eRNAs. The current version of eRNAbase catalogs 10 399 928 eRNAs from 1012 samples, including 858 human samples and 154 mouse samples. These eRNAs were first identified and uniformly processed from 14 eRNA-related experiment types manually collected from GEO/SRA and ENCODE. Importantly, the eRNAbase provides detailed and abundant (epi)genetic annotations in eRNA regions, such as super enhancers, enhancers, common single nucleotide polymorphisms, expression quantitative trait loci, transcription factor binding sites, CRISPR/Cas9 target sites, DNase I hypersensitivity sites, chromatin accessibility regions, methylation sites, chromatin interaction regions, topologically associating domains and RNA spatial interactions. Furthermore, the eRNAbase provides users with three novel analyses including eRNA-mediated pathway regulatory analysis, eRNA-based variation interpretation analysis and eRNA-mediated TF-target gene analysis. Hence, eRNAbase is a powerful platform to query, browse and visualize regulatory cues associated with eRNAs.
    DOI:  https://doi.org/10.1093/nar/gkad925
  3. Nucleic Acids Res. 2023 Oct 27. 51(19): 10261-10277
      Three-dimensional (3D) chromatin structure is linked to transcriptional regulation in multicellular eukaryotes including plants. Taking advantage of high-resolution Hi-C (high-throughput chromatin conformation capture), we detected a small structural unit with 3D chromatin architecture in the Arabidopsis genome, which lacks topologically associating domains, and also in the genomes of tomato, maize, and Marchantia polymorpha. The 3D folding domain unit was usually established around an individual gene and was dependent on chromatin accessibility at the transcription start site (TSS) and transcription end site (TES). We also observed larger contact domains containing two or more neighboring genes, which were dependent on accessible border regions. Binding of transcription factors to accessible TSS/TES regions formed these gene domains. We successfully simulated these Hi-C contact maps via computational modeling using chromatin accessibility as input. Our results demonstrate that gene domains establish basic 3D chromatin architecture units that likely contribute to higher-order 3D genome folding in plants.
    DOI:  https://doi.org/10.1093/nar/gkad710
  4. Nucleic Acids Res. 2023 Oct 27. pii: gkad872. [Epub ahead of print]
      Cooperative DNA-binding by transcription factor (TF) proteins is critical for eukaryotic gene regulation. In the human genome, many regulatory regions contain TF-binding sites in close proximity to each other, which can facilitate cooperative interactions. However, binding site proximity does not necessarily imply cooperative binding, as TFs can also bind independently to each of their neighboring target sites. Currently, the rules that drive cooperative TF binding are not well understood. In addition, it is oftentimes difficult to infer direct TF-TF cooperativity from existing DNA-binding data. Here, we show that in vitro binding assays using DNA libraries of a few thousand genomic sequences with putative cooperative TF-binding events can be used to develop accurate models of cooperativity and to gain insights into cooperative binding mechanisms. Using factors ETS1 and RUNX1 as our case study, we show that the distance and orientation between ETS1 sites are critical determinants of cooperative ETS1-ETS1 binding, while cooperative ETS1-RUNX1 interactions show more flexibility in distance and orientation and can be accurately predicted based on the affinity and sequence/shape features of the binding sites. The approach described here, combining custom experimental design with machine-learning modeling, can be easily applied to study the cooperative DNA-binding patterns of any TFs.
    DOI:  https://doi.org/10.1093/nar/gkad872
  5. Nat Commun. 2023 Oct 21. 14(1): 6678
      In mammals, insulators contribute to the regulation of loop extrusion to organize chromatin into topologically associating domains. In Drosophila the role of insulators in 3D genome organization is, however, under current debate. Here, we addressed this question by combining bioinformatics analysis and multiplexed chromatin imaging. We describe a class of Drosophila insulators enriched at regions forming preferential chromatin interactions genome-wide. Notably, most of these 3D interactions do not involve TAD borders. Multiplexed imaging shows that these interactions occur infrequently, and only rarely involve multiple genomic regions coalescing together in space in single cells. Finally, we show that non-border preferential 3D interactions enriched in this class of insulators are present before TADs and transcription during Drosophila development. Our results are inconsistent with insulators forming stable hubs in single cells, and instead suggest that they fine-tune existing 3D chromatin interactions, providing an additional regulatory layer for transcriptional regulation.
    DOI:  https://doi.org/10.1038/s41467-023-42485-y
  6. Development. 2023 Oct 26. pii: dev.202111. [Epub ahead of print]
      The node and notochord are important signaling centers organizing dorso-ventral patterning of cells arising from neuro-mesodermal progenitors forming the embryonic body anlage. Due to the scarcity of notochord progenitors and notochord cells, a comprehensive identification of regulatory elements driving notochord-specific gene expression has been lacking. Here we have used ATAC-seq analysis of FACS-purified notochord cells from TS12-13 mouse embryos to identify 8921 putative notochord enhancers. In addition, we established a new model for generating notochord-like cells in culture, and found 3728 of these enhancers occupied by the essential notochord control factors Brachyury (T) and/or Foxa2. We describe the regulatory landscape of the T locus comprising 10 putative enhancers occupied by these factors and confirmed the regulatory activity of 3 of these elements. Moreover, we characterized 7 new elements via knockout analysis in embryos and identified one new notochord enhancer, termed TNE2. TNE2 cooperates with TNE in the trunk notochord, and is essential for notochord differentiation in the tail. Our data emphasize the essential role of Foxa2 in directing T-expressing cells towards the notochord lineage.
    Keywords:  Brachyury; Development; Embryo; Enhancer; Mouse; Notochord
    DOI:  https://doi.org/10.1242/dev.202111
  7. Cell Genom. 2023 Oct 11. 3(10): 100411
      Intergenic transcription in normal and cancerous tissues is pervasive but incompletely understood. To investigate this, we constructed an atlas of over 180,000 consensus RNA polymerase II (RNAPII)-bound intergenic regions from 900 RNAPII chromatin immunoprecipitation sequencing (ChIP-seq) experiments in normal and cancer samples. Through unsupervised analysis, we identified 51 RNAPII consensus clusters, many of which mapped to specific biotypes and revealed tissue-specific regulatory signatures. We developed a meta-clustering methodology to integrate our RNAPII atlas with active transcription across 28,797 RNA sequencing (RNA-seq) samples from The Cancer Genome Atlas (TCGA), Genotype-Tissue Expression (GTEx), and Encyclopedia of DNA Elements (ENCODE). This analysis revealed strong tissue- and disease-specific interconnections between RNAPII occupancy and transcriptional activity. We demonstrate that intergenic transcription at RNAPII-bound regions is a novel per-cancer and pan-cancer biomarker. This biomarker displays genomic and clinically relevant characteristics, distinguishing cancer subtypes and linking to overall survival. Our results demonstrate the effectiveness of coherent data integration to uncover intergenic transcriptional activity in normal and cancer tissues.
    Keywords:  RNA Polymerase II; cancer genomics; enhancers; gene regulation; intergenic; non-coding DNA; non-coding transcription; regulatory genomics
    DOI:  https://doi.org/10.1016/j.xgen.2023.100411
  8. PLoS Biol. 2023 Oct;21(10): e3002354
      The N-terminal tails of eukaryotic histones are frequently posttranslationally modified. The role of these modifications in transcriptional regulation is well-documented. However, the extent to which the enzymatic processes of histone posttranslational modification might affect metabolic regulation is less clear. Here, we investigated how histone methylation might affect metabolism using metabolomics, proteomics, and RNA-seq data from cancer cell lines, primary tumour samples and healthy tissue samples. In cancer, the expression of histone methyltransferases (HMTs) was inversely correlated to the activity of NNMT, an enzyme previously characterised as a methyl sink that disposes of excess methyl groups carried by the universal methyl donor S-adenosyl methionine (SAM or AdoMet). In healthy tissues, histone methylation was inversely correlated to the levels of an alternative methyl sink, PEMT. These associations affected the levels of multiple histone marks on chromatin genome-wide but had no detectable impact on transcriptional regulation. We show that HMTs with a variety of different associations to transcription are co-regulated by the Retinoblastoma (Rb) tumour suppressor in human cells. Rb-mutant cancers show increased total HMT activity and down-regulation of NNMT. Together, our results suggest that the total activity of HMTs affects SAM metabolism, independent of transcriptional regulation.
    DOI:  https://doi.org/10.1371/journal.pbio.3002354
  9. Genome Biol. 2023 Oct 24. 24(1): 244
      BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) measures gene expression in single cells, while single-nucleus ATAC-sequencing (snATAC-seq) quantifies chromatin accessibility in single nuclei. These two data types provide complementary information for deciphering cell types and states. However, when analyzed individually, they sometimes produce conflicting results regarding cell type/state assignment, and analyzing them separately sacrifices statistical power, since the two modalities reflect the same underlying biology. Recently, it has become possible to measure both gene expression and chromatin accessibility from the same nucleus. Such paired data enable the direct modeling of the relationships between the two modalities. Given the availability of the vast amount of single-modality data, it is desirable to integrate the paired and unpaired single-modality datasets to gain a comprehensive view of the cellular complexity.
    RESULTS: We benchmark nine existing single-cell multi-omic data integration methods. Specifically, we evaluate to what extent the multiome data provide additional guidance for analyzing the existing single-modality data, and whether these methods uncover peak-gene associations from single-modality data. Our results indicate that multiome data are helpful for annotating single-modality data. However, we emphasize that the availability of an adequate number of nuclei in the multiome dataset is crucial for achieving accurate cell type annotation. Insufficient representation of nuclei may compromise the reliability of the annotations. Additionally, when generating a multiome dataset, the number of cells is more important than sequencing depth for cell type annotation.
    CONCLUSIONS: Seurat v4 is the best currently available platform for integrating scRNA-seq, snATAC-seq, and multiome data even in the presence of complex batch effects.
    DOI:  https://doi.org/10.1186/s13059-023-03073-x
  10. Science. 2023 Oct 27. 382(6669): 451-458
      Enteroendocrine cells (EECs) are hormone-producing cells residing in the epithelium of the stomach, small intestine (SI), and colon. EECs regulate aspects of metabolic activity, including insulin levels, satiety, gastrointestinal secretion, and motility. The generation of different EEC lineages is not completely understood. In this work, we report a CRISPR knockout screen of the entire repertoire of transcription factors (TFs) in adult human SI organoids to identify dominant TFs controlling EEC differentiation. We discovered ZNF800 as a master repressor for endocrine lineage commitment, which particularly restricts enterochromaffin cell differentiation by directly controlling an endocrine TF network centered on PAX4. Thus, organoid models allow unbiased functional CRISPR screens for genes that program cell fate.
    DOI:  https://doi.org/10.1126/science.adi2246
  11. Nucleic Acids Res. 2023 Oct 23. pii: gkad901. [Epub ahead of print]
      Annotating genetic variants to their target genes is of great importance in unraveling the causal variants and genetic mechanisms that underlie complex diseases. However, disease-associated genetic variants are often located in non-coding regions and manifest context-specific effects, making it challenging to accurately identify the target genes and regulatory mechanisms. Here, we present TargetGene (https://ngdc.cncb.ac.cn/targetgene/), a comprehensive database reporting target genes for human genetic variants from various aspects. Specifically, we collected a comprehensive catalog of multi-omics data at the single-cell and bulk levels and from various human tissues, cell types and developmental stages. To facilitate the identification of single-nucleotide polymorphism (SNP)-to-gene connections, we have implemented multiple analytical tools based on chromatin co-accessibility, 3D interaction, enhancer activities and quantitative trait loci, among others. We applied the pipeline to evaluate variants from nearly 1300 genome-wide association studies (GWAS) and assembled a comprehensive atlas of multiscale regulation of genetic variants. TargetGene is equipped with user-friendly web interfaces that enable intuitive searching, navigation and browsing through the results. Overall, TargetGene provides a unique resource to empower researchers to study the regulatory mechanisms of genetic variants in complex human traits.
    DOI:  https://doi.org/10.1093/nar/gkad901
  12. Plant Cell. 2023 Oct 25. pii: koad271. [Epub ahead of print]
      The nuclear pore complex (NPC) has multiple functions beyond the nucleo-cytoplasmic transport of large molecules. Sub-nuclear compartmentalization of chromatin is critical for gene expression in animals and yeast. However, the mechanism by which the NPC regulates gene expression is poorly understood in plants. Here we report that the Y-complex (Nup107-160 complex, a subcomplex of the NPC) self-maintains its nucleoporin homeostasis and modulates FLOWERING LOCUS C (FLC) transcription via changing histone modifications at this locus. We show that Y-complex nucleoporins are intimately associated with FLC chromatin through their interactions with histone H2A at the nuclear membrane. Fluorescence in situ hybridization assays revealed that Nup96, a Y-complex nucleoporin, enhances FLC positioning at the nuclear periphery. Nup96 interacted with HISTONE DEACETYLASE 6 (HDA6), a key repressor of FLC expression via histone modification, at the nuclear membrane to attenuate HDA6-catalyzed deposition at the FLC locus and change histone modifications. Moreover, we demonstrate that Y-complex nucleoporins interact with RNA polymerase II to increase its occupancy at the FLC locus, facilitating transcription. Collectively, our findings identify an attractive mechanism for the Y-complex in regulating FLC expression via tethering the locus at the nuclear periphery and altering its histone modification.
    DOI:  https://doi.org/10.1093/plcell/koad271