bims-crepig Biomed News
on Chromatin regulation and epigenetics in cell fate and cancer
Issue of 2021–02–07
29 papers selected by
Connor Rogerson, University of Cambridge, MRC Cancer Unit



  1. Nat Commun. 2021 01 27. 12(1): 626
      Master transcription factors reprogram cell fate in multicellular eukaryotes. Pioneer transcription factors have prominent roles in this process because of their ability to contact their cognate binding motifs in closed chromatin. Reprogramming is pervasive in plants, whose development is plastic and tuned by the environment, yet little is known about pioneer transcription factors in this kingdom. Here, we show that the master transcription factor LEAFY (LFY), which promotes floral fate through upregulation of the floral commitment factor APETALA1 (AP1), is a pioneer transcription factor. In vitro, LFY binds to the endogenous AP1 target locus DNA assembled into a nucleosome. In vivo, LFY associates with nucleosome-occupied binding sites at the majority of its target loci, including AP1. Upon binding, LFY 'unlocks' chromatin locally by displacing the H1 linker histone and by recruiting SWI/SNF chromatin remodelers, but broad changes in chromatin accessibility occur later. Our study provides a mechanistic framework for patterning of inflorescence architecture and uncovers striking similarities between LFY and animal pioneer transcription factors.
    DOI:  https://doi.org/10.1038/s41467-020-20883-w
  2. Mol Cell. 2021 Jan 26. pii: S1097-2765(21)00002-2. [Epub ahead of print]
      Gene transcription occurs via a cycle of linked events, including initiation, promoter-proximal pausing, and elongation of RNA polymerase II (Pol II). A key question is how transcriptional enhancers influence these events to control gene expression. Here, we present an approach that evaluates the level and change in promoter-proximal transcription (initiation and pausing) in the context of differential gene expression, genome-wide. This combinatorial approach shows that in primary cells, control of gene expression during differentiation is achieved predominantly via changes in transcription initiation rather than via release of Pol II pausing. Using genetically engineered mouse models, deleted for functionally validated enhancers of the α- and β-globin loci, we confirm that these elements regulate Pol II recruitment and/or initiation to modulate gene expression. Together, our data show that gene expression during differentiation is regulated predominantly at the level of initiation and that enhancers are key effectors of this process.
    Keywords:  Pol II recruitment; enhancers; gene regulation; promoter proximal pausing; transcription
    DOI:  https://doi.org/10.1016/j.molcel.2021.01.002
  3. BMC Bioinformatics. 2021 Jan 30. 22(1): 35
       BACKGROUND: Assigning chromatin states genome-wide (e.g. promoters, enhancers, etc.) is commonly performed to improve functional interpretation of these states. However, computational methods to assign chromatin state suffer from the following drawbacks: they typically require data from multiple assays, which may not be practically feasible to obtain, and they depend on peak calling algorithms, which require careful parameterization and often exclude the majority of the genome. To address these drawbacks, we propose a novel learning technique built upon the Self-Organizing Map (SOM), Self-Organizing Map with Variable Neighborhoods (SOM-VN), to learn a set of representative shapes from a single, genome-wide, chromatin accessibility dataset to associate with a chromatin state assignment in which a particular regulatory element (RE) is prevalent. These shapes can then be used to assign chromatin state using our workflow.
    RESULTS: We validate the performance of the SOM-VN workflow on 14 different samples of varying quality, namely one assay each of A549 and GM12878 cell lines and two each of H1 and HeLa cell lines, primary B-cells, and brain, heart, and stomach tissue. We show that SOM-VN learns shapes that are (1) non-random, (2) associated with known chromatin states, (3) generalizable across sets of chromosomes, and (4) associated with magnitude and multimodality. We compare the accuracy of SOM-VN chromatin states against the Clustering Aggregation Tool (CAGT), an unsupervised method that learns chromatin accessibility signal shapes but does not associate these shapes with REs, and we show that overall precision and recall are increased when learning shapes using SOM-VN as compared to CAGT. We further compare enhancer state assignments from SOM-VN in signals above a set threshold to enhancer state assignments from Predicting Enhancers from ATAC-seq Data (PEAS), a deep learning method that assigns enhancer chromatin states to peaks. We show that the precision-recall area under the curve for the assignment of enhancer states is comparable to PEAS.
    CONCLUSIONS: Our work shows that the SOM-VN workflow can learn relationships between REs and chromatin accessibility signal shape, which is an important step toward the goal of assigning and comparing enhancer state across multiple experiments and phenotypic states.
    Keywords:  ATAC-seq; Chromatin accessibility; Chromatin state assignment; DNase-seq; Enhancers; Machine learning; Promoters; RPKM signal shape; Regulatory elements; Self-organizing maps
    DOI:  https://doi.org/10.1186/s12859-021-03976-1
  4. Cell Rep. 2021 Feb 02. pii: S2211-1247(21)00016-4. [Epub ahead of print]34(5): 108703
      Using chromatin conformation capture, we show that an enhancer cluster in the STARD10 type 2 diabetes (T2D) locus forms a defined 3-dimensional (3D) chromatin domain. A 4.1-kb region within this locus, carrying 5 T2D-associated variants, physically interacts with CTCF-binding regions and with an enhancer possessing strong transcriptional activity. Analysis of human islet 3D chromatin interaction maps identifies the FCHSD2 gene as an additional target of the enhancer cluster. CRISPR-Cas9-mediated deletion of the variant region, or of the associated enhancer, from human pancreas-derived EndoC-βH1 cells impairs glucose-stimulated insulin secretion. Expression of both STARD10 and FCHSD2 is reduced in cells harboring CRISPR deletions, and lower expression of STARD10 and FCHSD2 is associated, the latter nominally, with the possession of risk variant alleles in human islets. Finally, CRISPR-Cas9-mediated loss of STARD10 or FCHSD2, but not ARAP1, impairs regulated insulin secretion. Thus, multiple genes at the STARD10 locus influence β cell function.
    Keywords:  FCHSD2; GWAS; STARD10; T2D; chromatin structure; enhancer cluster; gene regulation; genetic variant; insulin secretion; type 2 diabetes
    DOI:  https://doi.org/10.1016/j.celrep.2021.108703
  5. Sci Adv. 2021 Jan;pii: eabd4413. [Epub ahead of print]7(2):
      The chromatin-modifying histone deacetylases (HDACs) remove acetyl groups from acetyl-lysine residues in histone amino-terminal tails, thereby mediating transcriptional repression. Structural makeup and mechanisms by which multisubunit HDAC complexes recognize nucleosomes remain elusive. Our cryo-electron microscopy structures of the yeast class II HDAC ensembles show that the HDAC protomer comprises a triangle-shaped assembly of stoichiometry Hda1₂-Hda2-Hda3, in which the active sites of the Hda1 dimer are freely accessible. We also observe a tetramer of protomers, where the nucleosome binding modules are inaccessible. Structural analysis of the nucleosome-bound complexes indicates how positioning of Hda1 adjacent to histone H2B affords HDAC catalysis. Moreover, it reveals how an intricate network of multiple contacts between a dimer of protomers and the nucleosome creates a platform for expansion of the HDAC activities. Our study provides comprehensive insight into the structural plasticity of the HDAC complex and its functional mechanism of chromatin modification.
    DOI:  https://doi.org/10.1126/sciadv.abd4413
  6. Bioinformatics. 2021 Feb 03. pii: btab075. [Epub ahead of print]
      With the advance of genomic sequencing techniques, chromatin accessible regions, transcription factor binding sites and epigenetic modifications can be identified at genome-wide scale. Conventional analyses focus on gene regulation at proximal regions; distal regions usually receive less attention, largely due to the lack of reliable tools to link these regions to coding genes. In this study, we introduce RAD (Region Associated Differentially expressed genes), a user-friendly web tool to identify both proximal and distal region associated differentially expressed genes (DEGs). With DEGs and genomic regions of interest (gROI) as input, RAD maps the up- and down-regulated genes associated with any gROI and helps researchers to infer the regulatory function of these regions based on the distance of gROI to differentially expressed genes. RAD includes visualization of the results and statistical inference for significance.
    AVAILABILITY: RAD is implemented in Python 3.7 and runs on an Nginx server. RAD is freely available at http://labw.org/rad as an online web service.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btab075
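      The distance-based mapping described in this abstract can be illustrated with a minimal sketch. It is not RAD's code: the function name `degs_near_regions`, the tuple layouts, and the 100 kb default cutoff are all assumptions made for illustration, and no statistical inference is attempted.

```python
from collections import Counter


def degs_near_regions(regions, degs, max_dist=100_000):
    """Count up- and down-regulated genes whose TSS lies near any region.

    regions: iterable of (chrom, start, end) tuples (hypothetical layout).
    degs: iterable of (gene, chrom, tss, direction) tuples, where
          direction is "up" or "down" (hypothetical layout).
    max_dist: illustrative distance cutoff in base pairs.
    """
    counts = Counter()
    for _gene, chrom, tss, direction in degs:
        nearest = None
        for r_chrom, start, end in regions:
            if r_chrom != chrom:
                continue
            # Distance is 0 when the TSS falls inside the region.
            dist = max(start - tss, tss - end, 0)
            if nearest is None or dist < nearest:
                nearest = dist
        if nearest is not None and nearest <= max_dist:
            counts[direction] += 1
    return counts
```

      Comparing the up and down counts across several distance cutoffs is one simple way to form a hypothesis about whether a set of regions tends to activate or repress nearby genes.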
  7. Elife. 2021 Feb 02. pii: e65905. [Epub ahead of print]10
      Dysregulated gene expression contributes to most prevalent features in human cancers. Here, we show that most subtypes of acute myeloid leukemia (AML) depend on the aberrant assembly of MYB transcriptional co-activator complex. By rapid and selective peptidomimetic interference with the binding of CBP/P300 to MYB, but not CREB or MLL1, we find that the leukemic functions of MYB are mediated by CBP/P300 co-activation of a distinct set of transcription factor complexes. These MYB complexes assemble aberrantly with LYL1, E2A, C/EBP family members, LMO2 and SATB1. They are organized convergently in genetically diverse subtypes of AML, and are at least in part associated with inappropriate transcription factor co-expression. Peptidomimetic remodeling of oncogenic MYB complexes is accompanied by specific proteolysis and dynamic redistribution of CBP/P300 with alternative transcription factors such as RUNX1 to induce myeloid differentiation and apoptosis. Thus, aberrant assembly and sequestration of MYB:CBP/P300 complexes provide a unifying mechanism of oncogenic gene expression in AML. This work establishes a compelling strategy for their pharmacologic reprogramming and therapeutic targeting for diverse leukemias and possibly other human cancers caused by dysregulated gene control.
    Keywords:  cancer biology; human
    DOI:  https://doi.org/10.7554/eLife.65905
  8. Nat Commun. 2021 02 04. 12(1): 795
      Epigenetic modifications of DNA play important roles in many biological processes. Identifying readers of these epigenetic marks is a critical step towards understanding the underlying mechanisms. Here, we present an all-to-all approach, dubbed digital affinity profiling via proximity ligation (DAPPL), to simultaneously profile human TF-DNA interactions using mixtures of random DNA libraries carrying different epigenetic modifications (i.e., 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine, and 5-carboxylcytosine) on CpG dinucleotides. Many proteins that recognize consensus sequences carrying these modifications in symmetric and/or hemi-modified forms are identified. We further demonstrate that the modifications in different sequence contexts could either enhance or suppress TF binding activity. Moreover, many modifications can affect TF binding specificity. Furthermore, symmetric modifications show a stronger effect in either enhancing or suppressing TF-DNA interactions than hemi-modifications. Finally, in vivo evidence suggests that USF1 and USF2 might regulate transcription via hydroxymethylcytosine-binding activity in weak enhancers in human embryonic stem cells.
    DOI:  https://doi.org/10.1038/s41467-021-20950-w
  9. Nat Commun. 2021 02 02. 12(1): 734
      Driver genes with a mutually exclusive mutation pattern across tumor genomes are thought to have overlapping roles in tumorigenesis. In contrast, we show here that mutually exclusive prostate cancer driver alterations involving the ERG transcription factor and the ubiquitin ligase adaptor SPOP are synthetic sick. At the molecular level, the incompatible cancer pathways are driven by opposing functions in SPOP. ERG upregulates wild type SPOP to dampen androgen receptor (AR) signaling and sustain ERG activity through degradation of the bromodomain histone reader ZMYND11. Conversely, SPOP-mutant tumors stabilize ZMYND11 to repress ERG-function and enable oncogenic androgen receptor signaling. This dichotomy regulates the response to therapeutic interventions in the AR pathway. While mutant SPOP renders tumor cells susceptible to androgen deprivation therapies, ERG promotes sensitivity to high-dose androgen therapy and pharmacological inhibition of wild type SPOP. More generally, these results define a distinct class of antagonistic cancer drivers and a blueprint toward their therapeutic exploitation.
    DOI:  https://doi.org/10.1038/s41467-020-20820-x
  10. Science. 2021 Feb 05. pii: eabb4776. [Epub ahead of print]371(6529):
      During development, cells progress from a pluripotent state to a more restricted fate within a particular germ layer. However, cranial neural crest cells (CNCCs), a transient cell population that generates most of the craniofacial skeleton, have much broader differentiation potential than their ectodermal lineage of origin. Here, we identify a neuroepithelial precursor population characterized by expression of canonical pluripotency transcription factors that gives rise to CNCCs and is essential for craniofacial development. Pluripotency factor Oct4 is transiently reactivated in CNCCs and is required for the subsequent formation of ectomesenchyme. Furthermore, open chromatin landscapes of Oct4+ CNCC precursors resemble those of epiblast stem cells, with additional features suggestive of priming for mesenchymal programs. We propose that CNCCs expand their developmental potential through a transient reacquisition of molecular signatures of pluripotency.
    DOI:  https://doi.org/10.1126/science.abb4776
  11. Stem Cells. 2021 Feb 02.
      The LIF-JAK2-STAT3 pathway is the central signal transducer that maintains undifferentiated mouse ESCs (mESCs), which is achieved by the recruitment of activated STAT3 to the master pluripotency genes and activation of their transcription. It remains unclear, however, how the epigenetic status required for master gene transcription is built into LIF-treated mESC cultures. In this study, Jak2, but not Stat3, in the LIF canonical pathway, establishes an open epigenetic status in the pluripotency gene promoter regions. Upon LIF activation, cytosolic JAK2 was translocated into the nucleus of mESCs, and reduced DNA methylation (5mC levels) while increasing DNA hydroxymethylation (5hmC) in the pluripotency gene (Nanog/Pou5f1) promoter regions. In addition, the repressive histone codes H3K9me3/H3K27me3 were reduced by JAK2. Activated JAK2 directly interacted with the core epigenetic enzymes TET1 and JMJD2, modulating their activity and promoting DNA and histone demethylation, respectively. The JAK2 effects were attained by tyrosine phosphorylation of the epigenetic enzymes. The effects of JAK2 phosphorylation on the enzymes were diverse, but all converged on the epigenetic signatures associated with open DNA/chromatin structures. Taken together, these results reveal a previously unrecognized epigenetic regulatory role of JAK2 as an important mediator of mESC maintenance. SIGNIFICANCE STATEMENT: This study reveals underappreciated JAK2-mediated epigenetic control in maintaining mESC pluripotency. JAK2 activation by LIF induces JAK2 translocation to the nucleus, where it directly interacts with epigenetic regulator proteins, which ultimately affects the DNA and histone methylation of pluripotency genes. Briefly, JAK2 primed DNMT2 for degradation, while inducing activation of TET1 and JMJD2, which ultimately opens the epigenetic status in the pluripotency gene promoter regions.
    Keywords:  Embryonic Stem Cells (ESCs); Epigenetics; Janus kinase (JAK); LIF
    DOI:  https://doi.org/10.1002/stem.3345
  12. Am J Hum Genet. 2021 Feb 04. pii: S0002-9297(21)00009-4. [Epub ahead of print]108(2): 257-268
      Genome-wide chromatin conformation capture technologies such as Hi-C are commonly employed to study chromatin spatial organization. In particular, to identify statistically significant long-range chromatin interactions from Hi-C data, most existing methods such as Fit-Hi-C/FitHiC2 and HiCCUPS assume that all chromatin interactions are statistically independent. Such an independence assumption is reasonable at low resolution (e.g., 40 kb bin) but is invalid at high resolution (e.g., 5 or 10 kb bins) because spatial dependency of neighboring chromatin interactions is non-negligible at high resolution. Our previous hidden Markov random field-based methods accommodate spatial dependency but are computationally intensive. It is urgent to develop approaches that can model spatial dependence in a computationally efficient and scalable manner. Here, we develop HiC-ACT, an aggregated Cauchy test (ACT)-based approach, to improve the detection of chromatin interactions by post-processing results from methods assuming independence. To benchmark the performance of HiC-ACT, we re-analyzed deeply sequenced Hi-C data from a human lymphoblastoid cell line, GM12878, and mouse embryonic stem cells (mESCs). Our results demonstrate advantages of HiC-ACT in improving sensitivity with controlled type I error. By leveraging information from neighboring chromatin interactions, HiC-ACT enhances the power to detect interactions with lower signal-to-noise ratio and similar (if not stronger) epigenetic signatures that suggest regulatory roles. We further demonstrate that HiC-ACT peaks show higher overlap with known enhancers than Fit-Hi-C/FitHiC2 peaks in both GM12878 and mESCs. HiC-ACT, effectively a summary statistics-based approach, is computationally efficient (∼6 min and ∼2 GB memory to process 25,000 pairwise interactions).
    Keywords:  chromatin interactions, Hi-C, HiC-ACT, aggregated Cauchy test, summary statistics-based approach
    DOI:  https://doi.org/10.1016/j.ajhg.2021.01.009
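      The aggregated Cauchy test named in this abstract combines several p-values into one by mapping each to a standard Cauchy variate, averaging, and converting back. The sketch below shows only that generic combination step; the function name `cauchy_combine` and the equal default weights are assumptions, and how HiC-ACT actually chooses the neighboring interactions and their weights is not specified here.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine p-values with the Cauchy combination statistic.

    pvalues: sequence of p-values, each strictly between 0 and 1.
    weights: optional non-negative weights; equal weights by default
             (an illustrative choice, not taken from the paper).
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)
    total = sum(weights)
    # Under the null, tan((0.5 - p) * pi) follows a standard Cauchy law,
    # and so does a weighted average of such terms.
    stat = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, pvalues)
    )
    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(stat) / math.pi
```

      Because the combined statistic is dominated by the smallest p-values, a weak interaction surrounded by similarly supported neighbors can gain significance, while the closed-form tail probability keeps the computation cheap.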
  13. Aging Cell. 2021 Feb 04. e13315
      Chromatin organization and transcriptional profiles undergo tremendous reordering during senescence. However, uncovering the regulatory mechanisms between chromatin reconstruction and gene expression in senescence has been elusive. Here, we depicted the landscapes of both chromatin accessibility and gene expression to reveal gene regulatory networks in human umbilical vein endothelial cell (HUVEC) senescence and found that chromatin accessibilities are redistributed during senescence. Particularly, the intergenic chromatin was massively shifted with the increased accessibility regions (IARs) or decreased accessibility regions (DARs), which were mainly enhancer elements. We defined AP-1 transcription factor family as being responsible for driving chromatin accessibility reconstruction in IARs, where low DNA methylation improved binding affinity of AP-1 and further increased the chromatin accessibility. Among AP-1 transcription factors, we confirmed ATF3 was critical to reconstruct chromatin accessibility to promote cellular senescence. Our results described a dynamic landscape of chromatin accessibility whose remodeling contributes to the senescence program, and we identified that AP-1 was capable of reorganizing the chromatin accessibility profile to regulate senescence.
    Keywords:  AP-1; ATF3; DARs; DNA methylation; IARs; chromatin accessibility; heterochromatin; senescence
    DOI:  https://doi.org/10.1111/acel.13315
  14. Nucleic Acids Res. 2021 Feb 01. pii: gkab032. [Epub ahead of print]
      Glucocorticoid receptor (GR) is an essential transcription factor (TF), controlling metabolism, development and immune responses. SUMOylation regulates chromatin occupancy and target gene expression of GR in a locus-selective manner, but the mechanism of regulation has remained elusive. Here, we identify the protein network around chromatin-bound GR by using selective isolation of chromatin-associated proteins and show that the network is affected by receptor SUMOylation, with several nuclear receptor coregulators and chromatin modifiers preferring interaction with SUMOylation-deficient GR and proteins implicated in transcriptional repression preferring interaction with SUMOylation-competent GR. This difference is reflected in our chromatin binding, chromatin accessibility and gene expression data, showing that the SUMOylation-deficient GR is more potent in binding and opening chromatin at glucocorticoid-regulated enhancers and inducing expression of target loci. Blockage of SUMOylation by a SUMO-activating enzyme inhibitor (ML-792) phenocopied to a large extent the consequences of GR SUMOylation deficiency on chromatin binding and target gene expression. Our results thus show that SUMOylation modulates the specificity of GR by regulating its chromatin protein network and accessibility at GR-bound enhancers. We speculate that many other SUMOylated TFs utilize a similar regulatory mechanism.
    DOI:  https://doi.org/10.1093/nar/gkab032
  15. Genome Biol. 2021 Feb 02. 22(1): 55
      A bottleneck in high-throughput functional genomics experiments is identifying the most important genes and their relevant functions from a list of gene hits. Gene Ontology (GO) enrichment methods provide insight at the gene set level. Here, we introduce GeneWalk ( github.com/churchmanlab/genewalk ) that identifies individual genes and their relevant functions critical for the experimental setting under examination. After the automatic assembly of an experiment-specific gene regulatory network, GeneWalk uses representation learning to quantify the similarity between vector representations of each gene and its GO annotations, yielding annotation significance scores that reflect the experimental context. By performing gene- and condition-specific functional analysis, GeneWalk converts a list of genes into data-driven hypotheses.
    Keywords:  Differential expression; Functional analysis; GO enrichment; Gene set enrichment analysis; GeneWalk; INDRA (Integrated Network and Dynamical Reasoning Assembler); Machine learning; NET-seq; Network representation learning; Next-generation sequencing; Pathway Commons; RNA-seq
    DOI:  https://doi.org/10.1186/s13059-021-02264-8
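      The similarity step mentioned in this abstract can be sketched as follows, assuming cosine similarity between embedding vectors as the measure. The function names, the toy two-dimensional vectors, and the GO term labels in the usage note are hypothetical; this is not GeneWalk's code and it omits the network assembly, the representation learning, and the significance scoring.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def rank_annotations(gene_vec, go_vecs):
    """Return (similarity, term) pairs sorted from most to least similar.

    gene_vec: embedding vector of one gene.
    go_vecs: dict mapping a GO term label to its embedding vector.
    """
    scored = [(cosine_similarity(gene_vec, vec), term) for term, vec in go_vecs.items()]
    return sorted(scored, reverse=True)
```

      For example, `rank_annotations([1.0, 0.0], {"GO:a": [1.0, 0.0], "GO:b": [0.0, 1.0]})` places the made-up term `GO:a` first, since its vector points in the same direction as the gene's.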
  16. Nucleic Acids Res. 2021 Feb 03. pii: gkab053. [Epub ahead of print]
      The ubiquitous family of dimeric transcription factors AP-1 is made up of Fos and Jun family proteins. It has long been thought to operate principally at gene promoters, and how it controls transcription is still poorly understood. The Fos family protein Fra-1 is overexpressed in triple negative breast cancers (TNBCs) where it contributes to tumor aggressiveness. To address its transcriptional actions in TNBCs, we combined transcriptomics, ChIP-seqs, machine learning and NG Capture-C. Additionally, we studied its Fos family kin Fra-2, also expressed in TNBCs, albeit much less. Consistent with their pleiotropic effects, Fra-1 and Fra-2 up- and downregulate individually, together or redundantly many genes associated with a wide range of biological processes. Target gene regulation is principally due to binding of Fra-1 and Fra-2 at regulatory elements located distantly from cognate promoters where Fra-1 modulates the recruitment of the transcriptional co-regulator p300/CBP and where differences in AP-1 variant motif recognition can underlie preferential Fra-1 or Fra-2 binding. Our work also shows no major role for Fra-1 in chromatin architecture control at target gene loci, but suggests collaboration between Fra-1-bound and -unbound enhancers within chromatin hubs sometimes including promoters for other Fra-1-regulated genes. Our work impacts our view of AP-1.
    DOI:  https://doi.org/10.1093/nar/gkab053
  17. Nucleic Acids Res. 2021 Feb 03. pii: gkaa1287. [Epub ahead of print]
      Interferon regulatory factor 4 (IRF4) is a key transcription factor (TF) in the regulation of immune cells, including B and T cells. It acts by binding DNA as both a homodimer and, in conjunction with other TFs, as a heterodimer. The choice of homo- and heterodimeric DNA interactions is a critical aspect in the control of the transcriptional program and cell fate outcome. To characterize the nature of this interaction in the homodimeric complex, we have determined the crystal structure of the IRF4/ISRE homodimeric complex. We show that the complex formation is aided by a substantial DNA deformation with co-operative binding achieved exclusively through protein-DNA contact. This markedly contrasts with the heterodimeric form where DNA bound IRF4 is shown to physically interact with PU.1 TF to engage EICE1. We also show that the hotspot residues (Arg98, Cys99 and Asn102) contact both consensus and non-consensus sequences with the L1 loop exhibiting marked flexibility. Additionally, we identified that IRF4 L116R, a mutant associated with chronic lymphocytic leukemia, binds more robustly to DNA thereby providing a rationale for the observed gain of function. Together, we demonstrate key structural differences between IRF4 homo- and heterodimeric complexes, thereby providing molecular insights into IRF4-mediated transcriptional regulation.
    DOI:  https://doi.org/10.1093/nar/gkaa1287
  18. Nat Genet. 2021 Feb;53(2): 215-229
      Naive epiblast and embryonic stem cells (ESCs) give rise to all cells of adults. Such developmental plasticity is associated with genome hypomethylation. Here, we show that LIF-Stat3 signaling induces genomic hypomethylation via metabolic reconfiguration. Stat3-/- ESCs show decreased α-ketoglutarate production from glutamine, leading to increased Dnmt3a and Dnmt3b expression and DNA methylation. Notably, genome methylation is dynamically controlled through modulation of α-ketoglutarate availability or Stat3 activation in mitochondria. Alpha-ketoglutarate links metabolism to the epigenome by reducing the expression of Otx2 and its targets Dnmt3a and Dnmt3b. Genetic inactivation of Otx2 or Dnmt3a and Dnmt3b results in genomic hypomethylation even in the absence of active LIF-Stat3. Stat3-/- ESCs show increased methylation at imprinting control regions and altered expression of cognate transcripts. Single-cell analyses of Stat3-/- embryos confirmed the dysregulated expression of Otx2, Dnmt3a and Dnmt3b as well as imprinted genes. Several cancers display Stat3 overactivation and abnormal DNA methylation; therefore, the molecular module that we describe might be exploited under pathological conditions.
    DOI:  https://doi.org/10.1038/s41588-020-00770-2
  19. Nucleic Acids Res. 2021 Feb 01. pii: gkab038. [Epub ahead of print]
      Genome-wide localization of chromatin and transcription regulators can be detected by a variety of techniques. Here, we describe a novel method 'greenCUT&RUN' for genome-wide profiling of transcription regulators, which has a very high sensitivity, resolution, accuracy and reproducibility, whilst assuring specificity. Our strategy begins with tagging of the protein of interest with GFP and utilizes a GFP-specific nanobody fused to MNase to profile genome-wide binding events. By using a GFP-nanobody the greenCUT&RUN approach eliminates antibody dependency and variability. Robust genomic profiles were obtained with greenCUT&RUN, which are accurate and unbiased towards open chromatin. By integrating greenCUT&RUN with nanobody-based affinity purification mass spectrometry, 'piggy-back' DNA binding events can be identified on a genomic scale. The unique design of greenCUT&RUN grants target protein flexibility and yields high resolution footprints. In addition, greenCUT&RUN allows rapid profiling of mutants of chromatin and transcription proteins. In conclusion, greenCUT&RUN is a widely applicable and versatile genome-mapping technique.
    DOI:  https://doi.org/10.1093/nar/gkab038
  20. BMC Bioinformatics. 2021 Feb 01. 22(1): 38
       BACKGROUND: Due to the complexity of the biological systems, the prediction of the potential DNA binding sites for transcription factors remains a difficult problem in computational biology. Genomic DNA sequences and experimental results from parallel sequencing provide available information about the affinity and accessibility of genome and are commonly used features in binding sites prediction. The attention mechanism in deep learning has shown its capability to learn long-range dependencies from sequential data, such as sentences and voices. Until now, no study has applied this approach in binding site inference from massively parallel sequencing data. The successful applications of attention mechanism in similar input contexts motivate us to build and test new methods that can accurately determine the binding sites of transcription factors.
    RESULTS: In this study, we propose a novel tool (named DeepGRN) for transcription factor binding site prediction based on the combination of two components: single attention module and pairwise attention module. The performance of our methods is evaluated on the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge datasets. The results show that DeepGRN achieves higher unified scores in 6 of 13 targets than any of the top four methods in the DREAM challenge. We also demonstrate that the attention weights learned by the model are correlated with potential informative inputs, such as DNase-Seq coverage and motifs, which provide possible explanations for the predictive improvements in DeepGRN.
    CONCLUSIONS: DeepGRN can automatically and effectively predict transcription factor binding sites from DNA sequences and DNase-Seq coverage. Furthermore, the visualization techniques we developed for the attention modules help to interpret how critical patterns from different types of input features are recognized by our model.
    Keywords:  Attention mechanism; DNA binding site prediction; Transcription factor
    DOI:  https://doi.org/10.1186/s12859-020-03952-1
  21. Nature. 2021 Feb 03.
      Annotating the molecular basis of human disease remains an unsolved challenge, as 93% of disease loci are non-coding and gene-regulatory annotations are highly incomplete. Here we present EpiMap, a compendium comprising 10,000 epigenomic maps across 800 samples, which we used to define chromatin states, high-resolution enhancers, enhancer modules, upstream regulators and downstream target genes. We used this resource to annotate 30,000 genetic loci that were associated with 540 traits, predicting trait-relevant tissues, putative causal nucleotide variants in enriched tissue enhancers and candidate tissue-specific target genes for each. We partitioned multifactorial traits into tissue-specific contributing factors with distinct functional enrichments and disease comorbidity patterns, and revealed both single-factor monotropic and multifactor pleiotropic loci. Top-scoring loci frequently had multiple predicted driver variants, converging through multiple enhancers with a common target gene, multiple genes in common tissues, or multiple genes and multiple tissues, indicating extensive pleiotropy. Our results demonstrate the importance of dense, rich, high-resolution epigenomic annotations for the investigation of complex traits.
    DOI:  https://doi.org/10.1038/s41586-020-03145-z
  22. Proc Natl Acad Sci U S A. 2021 Feb 09. pii: e2016742118. [Epub ahead of print]118(6):
      Serotonylation of glutamine 5 on histone H3 (H3Q5ser) was recently identified as a permissive posttranslational modification that coexists with adjacent lysine 4 trimethylation (H3K4me3). While the resulting dual modification, H3K4me3Q5ser, is enriched at regions of active gene expression in serotonergic neurons, the molecular outcome underlying H3K4me3-H3Q5ser crosstalk remains largely unexplored. Herein, we examine the impact of H3Q5ser on the readers, writers, and erasers of H3K4me3. All tested H3K4me3 readers retain binding to the H3K4me3Q5ser dual modification. Of note, the PHD finger of TAF3 favors H3K4me3Q5ser, and this binding preference is dependent on the Q5ser modification regardless of H3K4 methylation states. While the activity of the H3K4 methyltransferase, MLL1, is unaffected by H3Q5ser, the corresponding H3K4me3/2 erasers, KDM5B/C and LSD1, are profoundly inhibited by the presence of the mark. Collectively, this work suggests that adjacent H3Q5ser potentiates H3K4me3 function by either stabilizing H3K4me3 from dynamic turnover or enhancing its physical readout by downstream effectors, thereby potentially providing a mechanism for fine-tuning critical gene expression programs.
    Keywords:  H3K4me3; H3Q5 serotonylation; designer chromatin; histone modification; modification crosstalk
    DOI:  https://doi.org/10.1073/pnas.2016742118
  23. Nat Commun. 2021 02 04. 12(1): 784
      In adult tissue, stem and progenitor cells must tightly regulate the balance between proliferation and differentiation to sustain homeostasis. How this exquisite balance is achieved is an area of active investigation. Here, we show that epidermal genes, including ~30% of induced differentiation genes, already contain stalled Pol II at their promoters in epidermal stem and progenitor cells, which is then released into productive transcription elongation upon differentiation. Central to this process are SPT6 and PAF1, which are necessary for the elongation of these differentiation genes. Upon SPT6 or PAF1 depletion, there is a loss of human skin differentiation and stratification. Unexpectedly, loss of SPT6 also causes the spontaneous transdifferentiation of epidermal cells into an intestinal-like phenotype due to the stalled transcription of the master regulator of epidermal fate, P63. Our findings suggest that control of transcription elongation through SPT6 plays a prominent role in adult somatic tissue differentiation and the inhibition of alternative cell fate choices.
    DOI:  https://doi.org/10.1038/s41467-021-21067-w
  24. Sci Adv. 2021 Jan;pii: eabd3568. [Epub ahead of print]7(1):
      Light-inducible gene switches represent a key strategy for the precise manipulation of cellular events in fundamental and applied research. However, the performance of widely used gene switches is limited due to low tissue penetrance and possible phototoxicity of the light stimulus. To overcome these limitations, we engineer optogenetic synthetic transcription factors to undergo liquid-liquid phase separation in close spatial proximity to promoters. Phase separation of constitutive and optogenetic synthetic transcription factors was achieved by incorporation of intrinsically disordered regions. Supported by a quantitative mathematical model, we demonstrate that engineered transcription factor droplets form at target promoters and increase gene expression up to fivefold. This increase in performance was observed in multiple mammalian cell lines as well as in mice following in situ transfection. The results of this work suggest that the introduction of intrinsically disordered domains is a simple yet effective means to boost synthetic transcription factor activity.
    DOI:  https://doi.org/10.1126/sciadv.abd3568
  25. Mol Syst Biol. 2021 Feb;17(2): e9866
      Core promoter types differ in the extent to which RNA polymerase II (Pol II) pauses after initiation, but how this affects their tissue-specific gene expression characteristics is not well understood. While promoters with Pol II pausing elements are active throughout development, TATA promoters are highly active in differentiated tissues. We therefore used a genomics approach on late-stage Drosophila embryos to analyze the properties of promoter types. Using tissue-specific Pol II ChIP-seq, we found that paused promoters have high levels of paused Pol II throughout the embryo, even in tissues where the gene is not expressed, while TATA promoters only show Pol II occupancy when the gene is active. The promoter types are associated with different chromatin accessibility in ATAC-seq data and have different expression characteristics in single-cell RNA-seq data. The two promoter types may therefore be optimized for different properties: paused promoters show more consistent expression when active, while TATA promoters have lower background expression when inactive. We propose that tissue-specific genes have evolved to use two different strategies for their differential expression across tissues.
    Keywords:  Pol II pausing; TATA promoter; effector genes; gene expression noise; scRNA-seq
    DOI:  https://doi.org/10.15252/msb.20209866
  26. Bioinformatics. 2021 Feb 03. pii: btab072. [Epub ahead of print]
      MOTIVATION: Genome-wide association studies (GWAS) have identified thousands of common trait-associated genetic variants, but interpretation of their function remains challenging. These genetic variants can overlap the binding sites of transcription factors (TFs) and therefore could alter gene expression. However, we currently lack a systematic understanding of how this mechanism contributes to phenotype.
    RESULTS: We present Motif-Raptor, a TF-centric computational tool that integrates sequence-based predictive models, chromatin accessibility, gene expression datasets and GWAS summary statistics to systematically investigate how TF function is affected by genetic variants. Given trait associated non-coding variants, Motif-Raptor can recover relevant cell types and critical TFs to drive hypotheses regarding their mechanism of action. We tested Motif-Raptor on complex traits such as rheumatoid arthritis and red blood cell count and demonstrated its ability to prioritize relevant cell types, potential regulatory TFs and non-coding SNPs which have been previously characterized and validated.
    AVAILABILITY: Motif-Raptor is freely available as a Python package at: https://github.com/pinellolab/MotifRaptor.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btab072
  27. Sci Adv. 2021 Jan;pii: eaaz8836. [Epub ahead of print]7(3):
      Monocytes and monocyte-derived macrophages originate through a multistep differentiation process. First, hematopoietic stem cells generate lineage-restricted progenitors that eventually develop into peripheral, postmitotic monocytes. Second, blood-circulating monocytes undergo differentiation into macrophages, which are specialized phagocytic cells capable of tissue infiltration. While monocytes mediate some level of inflammation and cell toxicity, macrophages boast the widest set of defense mechanisms against pathogens and elicit robust inflammatory responses. Here, we analyze the molecular determinants of monocytic and macrophagic commitment by profiling the EGR1 transcription factor. EGR1 is essential for monopoiesis and binds enhancers that regulate monocytic developmental genes such as CSF1R. However, differentiating macrophages present a very different EGR1 binding pattern. We identify novel binding sites of EGR1 at a large set of inflammatory enhancers, even in the absence of its binding motif. We show that EGR1 repressive activity results in suppression of inflammatory genes and is mediated by the NuRD corepressor complex.
    DOI:  https://doi.org/10.1126/sciadv.aaz8836
  28. Cancer Cell. 2021 Jan 29. pii: S1535-6108(21)00049-0. [Epub ahead of print]
      Diffuse intrinsic pontine glioma (DIPG) is an aggressive childhood tumor of the brainstem with currently no curative treatment available. The vast majority of DIPGs carry a histone H3 mutation leading to a lysine 27-to-methionine exchange (H3K27M). We engineered human induced pluripotent stem cells (iPSCs) to carry an inducible H3.3-K27M allele in the endogenous locus and studied the effects of the mutation in different disease-relevant neural cell types. H3.3-K27M upregulated bivalent promoter-associated developmental genes, producing diverse outcomes in different cell types. While being fatal for iPSCs, H3.3-K27M increased proliferation in neural stem cells (NSCs) and to a lesser extent in oligodendrocyte progenitor cells (OPCs). Only NSCs gave rise to tumors upon induction of H3.3-K27M and TP53 inactivation in an orthotopic xenograft model recapitulating human DIPGs. In NSCs, H3.3-K27M leads to maintained expression of stemness and proliferative genes and a premature activation of OPC programs that together may cause tumor initiation.
    Keywords:  DIPG; H3.3-K27M; H3K27me3; H3K4me3; NSC; OPC; bivalent chromatin; glioma; iPSC; orthotopic xenograft
    DOI:  https://doi.org/10.1016/j.ccell.2021.01.005
  29. Nature. 2021 Feb 03.
      Tissue damage increases the risk of cancer through poorly understood mechanisms. In mouse models of pancreatic cancer, pancreatitis associated with tissue injury collaborates with activating mutations in the Kras oncogene to markedly accelerate the formation of early neoplastic lesions and, ultimately, adenocarcinoma. Here, by integrating genomics, single-cell chromatin assays and spatiotemporally controlled functional perturbations in autochthonous mouse models, we show that the combination of Kras mutation and tissue damage promotes a unique chromatin state in the pancreatic epithelium that distinguishes neoplastic transformation from normal regeneration and is selected for throughout malignant evolution. This cancer-associated epigenetic state emerges within 48 hours of pancreatic injury, and involves an 'acinar-to-neoplasia' chromatin switch that contributes to the early dysregulation of genes that define human pancreatic cancer. Among the factors that are most rapidly activated after tissue damage in the pre-malignant pancreatic epithelium is the alarmin cytokine interleukin 33, which recapitulates the effects of injury in cooperating with mutant Kras to unleash the epigenetic remodelling program of early neoplasia and neoplastic transformation. Collectively, our study demonstrates how gene-environment interactions can rapidly produce gene-regulatory programs that dictate early neoplastic commitment, and provides a molecular framework for understanding the interplay between genetic and environmental cues in the initiation of cancer.
    DOI:  https://doi.org/10.1038/s41586-020-03147-x