bims-crepig Biomed News
on Chromatin regulation and epigenetics in cell fate and cancer
Issue of 2021–01–24
forty-one papers selected by
Connor Rogerson, University of Cambridge, MRC Cancer Unit



  1. Cell Rep. 2021 Jan 19. pii: S2211-1247(20)31627-2. [Epub ahead of print]34(3): 108638
      Histone acetylation levels are regulated by histone acetyltransferases (HATs) and histone deacetylases (HDACs) that antagonistically control the overall balance of this post-translational modification. HDAC inhibitors (HDACi) are potent agents that disrupt this balance and are used clinically to treat diseases including cancer. Despite their use, little is known about their effects on chromatin regulators, particularly those that signal through lysine acetylation. We apply quantitative genomic and proteomic approaches to demonstrate that HDACi robustly increases a low-abundance histone 4 polyacetylation state, which serves as a preferred binding substrate for several bromodomain-containing proteins, including BRD4. Increased H4 polyacetylation occurs in transcribed genes and correlates with the targeting of BRD4. Collectively, these results suggest that HDAC inhibition functions, at least in part, through expansion of a rare histone acetylation state, which then retargets lysine-acetyl readers associated with changes in gene expression, partially mimicking the effect of bromodomain inhibition.
    Keywords:  BRD4; HDAC; bromodomain; chromatin immunoprecipitation; histone acetylation; mass spectrometry; peptide microarray; superenhancer; transcription
    DOI:  https://doi.org/10.1016/j.celrep.2020.108638
  2. Dev Biol. 2021 Jan 13. pii: S0012-1606(21)00007-5. [Epub ahead of print]
      Correct vascular differentiation requires distinct patterns of gene expression in different subtypes of endothelial cells. Members of the ETS transcription factor family are essential for the transcriptional activation of arterial and angiogenesis-specific gene regulatory elements, leading to the hypothesis that they play lineage-defining roles in arterial and angiogenic differentiation directly downstream of VEGFA signalling. However, an alternative explanation is that ETS binding at enhancers and promoters is a general requirement for activation of many endothelial genes regardless of expression pattern, with subtype-specificity provided by additional factors. Here we use analysis of Ephb4 and Coup-TFII (Nr2f2) vein-specific enhancers to demonstrate that ETS factors are equally essential for vein, arterial and angiogenic-specific enhancer activity patterns. Further, we show that ETS factor binding at these vein-specific enhancers is enriched by VEGFA signalling, similar to that seen at arterial and angiogenic enhancers. However, while arterial and angiogenic enhancers can be activated by VEGFA in vivo, the Ephb4 and Coup-TFII venous enhancers are not, suggesting that the specificity of VEGFA-induced arterial and angiogenic enhancer activity occurs via non-ETS transcription factors. These results support a model in which ETS factors are not the primary regulators of specific patterns of gene expression in different endothelial subtypes.
    Keywords:  Arterio-venous differentiation; Arterio-venous specification; Artery; Blood vessels; ETS; Endothelial cell; Enhancer; Transcription; Vein
    DOI:  https://doi.org/10.1016/j.ydbio.2021.01.002
  3. Cell Rep. 2021 Jan 19. pii: S2211-1247(20)31632-6. [Epub ahead of print]34(3): 108643
      Transcription through noncoding regions of the genome is pervasive. How these transcription events regulate gene expression remains poorly understood. Here, we report that, in S. cerevisiae, the levels of transcription through a noncoding region, IRT2, located upstream in the promoter of the inducer of meiosis, IME1, regulate opposing chromatin and transcription states. At low levels, the act of IRT2 transcription promotes histone exchange, delivering acetylated histone H3 lysine 56 to chromatin locally. The subsequent open chromatin state directs transcription factor recruitment and induces downstream transcription to repress the IME1 promoter and meiotic entry. Conversely, increasing transcription turns IRT2 into a repressor by promoting transcription-coupled chromatin assembly. The two opposing functions of IRT2 transcription shape a regulatory circuit, which ensures a robust cell-type-specific control of IME1 expression and yeast meiosis. Our data illustrate how intergenic transcription levels are key to controlling local chromatin state, gene expression, and cell fate outcomes.
    Keywords:  H3K56ac; IME1; Rme1; Rtt109; cell fate; chromatin; lncRNA; meiosis; transcription; yeast
    DOI:  https://doi.org/10.1016/j.celrep.2020.108643
  4. J Biol Chem. 2020 Dec 18. pii: S0021-9258(17)50653-5. [Epub ahead of print]295(51): 17738-17751
      Distinct cell types emerge from embryonic stem cells through a precise and coordinated execution of gene expression programs during lineage commitment. This is established by the action of lineage-specific transcription factors along with chromatin complexes. Numerous studies have focused on epigenetic factors that affect embryonic stem cell (ESC) self-renewal and pluripotency. However, the contribution of chromatin to lineage decisions at the exit from pluripotency has not been as extensively studied. Using a pooled epigenetic shRNA screen strategy, we identified chromatin-related factors critical for differentiation toward mesodermal and endodermal lineages. Here we reveal a critical role for the chromatin protein, ARID4B. Arid4b-deficient mESCs are similar to WT mESCs in the expression of pluripotency factors and their self-renewal. However, ARID4B loss results in defects in up-regulation of the meso/endodermal gene expression program. It was previously shown that Arid4b resides in a complex with SIN3A and HDACs 1 and 2. We identified a physical and functional interaction of ARID4B with HDAC1 rather than HDAC2, suggesting functionally distinct Sin3a subcomplexes might regulate cell fate decisions. Finally, we observed that ARID4B deficiency leads to increased H3K27me3 and a reduced H3K27Ac level in key developmental gene loci, whereas a subset of genomic regions gain H3K27Ac marks. Our results demonstrate that epigenetic control through ARID4B plays a key role in the execution of lineage-specific gene expression programs at pluripotency exit.
    Keywords:  cell differentiation; chromatin modification embryonic stem cell; embryonic stem cell; epigenetics; gene expression
    DOI:  https://doi.org/10.1074/jbc.RA120.015534
  5. Mol Reprod Dev. 2021 Jan 20.
      BRDT, a member of the BET family of double bromodomain-containing proteins, is essential for spermatogenesis in the mouse and has been postulated to be a key regulator of transcription in meiotic and post-meiotic cells. To understand the function of BRDT in these processes, we first characterized the genome-wide distribution of the BRDT binding sites, in particular within gene units, by ChIP-Seq analysis of enriched fractions of pachytene spermatocytes and round spermatids. In both cell types, BRDT binding sites were mainly located in promoters, first exons, and introns of genes. BRDT binding sites in promoters overlapped with several histone modifications and histone variants associated with active transcription, and were enriched for consensus sequences for specific transcription factors, including MYB, RFX, ETS, and ELF1 in pachytene spermatocytes, and JunD, c-Jun, CRE, and RFX in round spermatids. Subsequent integration of the ChIP-seq data with available transcriptome data revealed that stage-specific gene expression programs are associated with BRDT binding to their gene promoters, with most of the BRDT-bound genes being upregulated. Gene Ontology analysis further identified unique sets of genes enriched in diverse biological processes essential for meiosis and spermiogenesis between the two cell types, suggesting distinct developmentally stage-specific functions for BRDT. Taken together, our data suggest that BRDT cooperates with different transcription factors at distinctive chromatin regions within gene units to regulate diverse downstream target genes that function in male meiosis and spermiogenesis.
    Keywords:  BRDT; male meiosis; spermiogenesis; transcription
    DOI:  https://doi.org/10.1002/mrd.23449
  6. Nat Commun. 2021 Jan 22. 12(1): 537
      Targeting chromatin regulators to specific genomic locations for gene control is emerging as a powerful method in basic research and synthetic biology. However, many chromatin regulators are large, making them difficult to deliver and combine in mammalian cells. Here, we develop a strategy for gene control using small nanobodies that bind and recruit endogenous chromatin regulators to a gene. We show that an antiGFP nanobody can be used to simultaneously visualize GFP-tagged chromatin regulators and control gene expression, and that nanobodies against HP1 and DNMT1 can silence a reporter gene. Moreover, combining nanobodies together or with other regulators, such as DNMT3A or KRAB, can enhance silencing speed and epigenetic memory. Finally, we use the slow silencing speed and high memory of antiDNMT1 to build a signal duration timer and recorder. These results set the basis for using nanobodies against chromatin regulators for controlling gene expression and epigenetic memory.
    DOI:  https://doi.org/10.1038/s41467-020-20757-1
  7. Nat Commun. 2021 Jan 18. 12(1): 410
      Active DNA demethylation is required for sexual reproduction in plants but the molecular determinants underlying this epigenetic control are not known. Here, we show in Arabidopsis thaliana that the DNA glycosylases DEMETER (DME) and REPRESSOR OF SILENCING 1 (ROS1) act semi-redundantly in the vegetative cell of pollen to demethylate DNA and ensure proper pollen tube progression. Moreover, we identify six pollen-specific genes with increased DNA methylation as well as reduced expression in dme and dme;ros1. We further show that for four of these genes, reinstalling their expression individually in mutant pollen is sufficient to improve male fertility. Our findings demonstrate an essential role of active DNA demethylation in regulating genes involved in pollen function.
    DOI:  https://doi.org/10.1038/s41467-020-20606-1
  8. PLoS One. 2021;16(1): e0245618
      Skeletal muscle gene expression is governed by the myogenic regulatory family (MRF) which includes MyoD (MYOD1) and myogenin (MYOG). MYOD1 and MYOG are known to regulate an overlapping set of muscle genes, but MYOD1 cannot compensate for the absence of MYOG in vivo. In vitro, late muscle genes have been shown to be bound by both factors, but require MYOG for activation. The molecular basis for this requirement was unclear. We show here that MYOG is required for the recruitment of TBP and RNAPII to muscle gene promoters, indicating that MYOG is essential in assembling the transcription machinery. Genes regulated by MYOD1 and MYOG include genes required for muscle fusion, myomaker and myomerger, and we show that myomaker is fully dependent on activation by MYOG. We also sought to determine the role of MYOD1 in MYOG-dependent gene activation and unexpectedly found that MYOG is required to maintain Myod1 expression. However, we also found that exogenous MYOD1 was unable to compensate for the loss of Myog and activate muscle gene expression. Thus, our results show that MYOD1 and MYOG act in a feed-forward loop to maintain each other's expression and also show that it is MYOG, and not MYOD1, that is required to load TBP and activate gene expression on late muscle gene promoters bound by both factors.
    DOI:  https://doi.org/10.1371/journal.pone.0245618
  9. Nat Commun. 2021 Jan 21. 12(1): 494
      Mast cells are critical effectors of allergic inflammation and protection against parasitic infections. We previously demonstrated that transcription factors GATA2 and MITF are the mast cell lineage-determining factors. However, it is unclear whether these lineage-determining factors regulate chromatin accessibility at mast cell enhancer regions. In this study, we demonstrate that GATA2 promotes chromatin accessibility at the super-enhancers of mast cell identity genes and primes both typical and super-enhancers at genes that respond to antigenic stimulation. We find that the number and densities of GATA2- but not MITF-bound sites at the super-enhancers are several-fold higher than those at the typical enhancers. Our studies reveal that GATA2 promotes robust gene transcription to maintain mast cell identity and respond to antigenic stimulation by binding to super-enhancer regions with dense GATA2 binding sites available at key mast cell genes.
    DOI:  https://doi.org/10.1038/s41467-020-20766-0
  10. Mol Cell. 2021 Jan 18. pii: S1097-2765(20)30989-8. [Epub ahead of print]
      Many genes are regulated by multiple enhancers that often simultaneously activate their target gene. However, how individual enhancers collaborate to activate transcription is not well understood. Here, we dissect the functions and interdependencies of five enhancer elements that together activate Fgf5 expression during exit from naive murine pluripotency. Four intergenic elements form a super-enhancer, and most of the elements contribute to Fgf5 induction at distinct time points. A fifth, poised enhancer located in the first intron contributes to Fgf5 expression at every time point by amplifying overall Fgf5 expression levels. Despite low individual enhancer activity, together these elements strongly induce Fgf5 expression in a super-additive fashion that involves strong accumulation of RNA polymerase II at the intronic enhancer. Finally, we observe a strong anti-correlation between RNA polymerase II levels at enhancers and their distance to the closest promoter, and we identify candidate elements with properties similar to the intronic enhancer.
    Keywords:  RNA Pol II; differentiation; enhancer; pluripotency; super-enhancer; transcription
    DOI:  https://doi.org/10.1016/j.molcel.2020.12.047
  11. Bioinformatics. 2021 Jan 18. pii: btab029. [Epub ahead of print]
      MOTIVATION: Single-cell DNA methylation sequencing detects methylation levels at single-cell resolution, and this technology is advancing our understanding of the regulation of gene expression through epigenetic modifications. However, almost all current technologies suffer from the inherent problem of low CpG coverage. Therefore, addressing the inherent sparsity of raw data is essential for quantitative analysis of the whole genome.
    RESULTS: Here, we report CaMelia, a CatBoost gradient boosting method for predicting the missing methylation states based on the locally paired similarity of intercellular methylation patterns. On real single-cell methylation data sets, CaMelia yielded significant imputation performance gains over previous methods. Furthermore, applying the imputed data to the downstream analysis of cell-type identification, we found that CaMelia helped to discover more intercellular differentially methylated loci that were masked by the sparsity in raw data, and the clustering results demonstrated that CaMelia could preserve cell-cell relationships and improve the identification of cell types and cell subpopulations.
    AVAILABILITY: Python code is available at https://github.com/JxTang-bioinformatics/CaMelia.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btab029
  12. Mol Cell. 2021 Jan 15. pii: S1097-2765(20)30956-4. [Epub ahead of print]
      The MYC oncoprotein globally affects the function of RNA polymerase II (RNAPII). The ability of MYC to promote transcription elongation depends on its ubiquitylation. Here, we show that MYC and PAF1c (polymerase II-associated factor 1 complex) interact directly and mutually enhance each other's association with active promoters. PAF1c is rapidly transferred from MYC onto RNAPII. This transfer is driven by the HUWE1 ubiquitin ligase and is required for MYC-dependent transcription elongation. MYC and HUWE1 promote histone H2B ubiquitylation, which alters chromatin structure both for transcription elongation and double-strand break repair. Consistently, MYC suppresses double-strand break accumulation in active genes in a strictly PAF1c-dependent manner. Depletion of PAF1c causes transcription-dependent accumulation of double-strand breaks, despite widespread repair-associated DNA synthesis. Our data show that the transfer of PAF1c from MYC onto RNAPII efficiently couples transcription elongation with double-strand break repair to maintain the genomic integrity of MYC-driven tumor cells.
    Keywords:  E3 ligase; HUWE1; MYC; PAF1c; RNAPII; double-strand break repair; histone H2B; ubiquitylation
    DOI:  https://doi.org/10.1016/j.molcel.2020.12.035
  13. Elife. 2021 Jan 22. pii: e62387. [Epub ahead of print]10
      Cranial neural crest (CNC) cells give rise to bone, cartilage, tendons, and ligaments of the vertebrate craniofacial musculoskeletal complex, as well as regulate mesoderm-derived craniofacial muscle development through cell-cell interactions. Using the mouse soft palate as a model, we performed an unbiased single-cell RNA-seq analysis to investigate the heterogeneity and lineage commitment of CNC derivatives during craniofacial muscle development. We show that Runx2, a known osteogenic regulator, is expressed in the CNC-derived perimysial and progenitor populations. Loss of Runx2 in CNC-derivatives results in reduced expression of perimysial markers (Aldh1a2 and Hic1) as well as soft palate muscle defects in Osr2-Cre;Runx2fl/fl mice. We further reveal that Runx2 maintains perimysial marker expression through suppressing Twist1, and that myogenesis is restored in Osr2-Cre;Runx2fl/fl;Twist1fl/+ mice. Collectively, our findings highlight the roles of Runx2, Twist1, and their interaction in regulating the fate of CNC-derived cells as they guide craniofacial muscle development through cell-cell interactions.
    Keywords:  Runx2; cell-cell interaction; cleft palate; cranial neural crest cells; developmental biology; mouse; muscle development
    DOI:  https://doi.org/10.7554/eLife.62387
  14. EMBO J. 2021 Jan 18. e106309
      The N6-methyladenosine (m6A) RNA modification serves crucial functions in RNA metabolism; however, the molecular mechanisms underlying the regulation of m6A are not well understood. Here, we establish arginine methylation of METTL14, a component of the m6A methyltransferase complex, as a novel pathway that controls m6A deposition in mammalian cells. Specifically, protein arginine methyltransferase 1 (PRMT1) interacts with and methylates the intrinsically disordered C terminus of METTL14, which promotes its interaction with RNA substrates, enhances its RNA methylation activity, and is crucial for its interaction with RNA polymerase II (RNAPII). Mouse embryonic stem cells (mESCs) expressing arginine methylation-deficient METTL14 exhibit significantly reduced global m6A levels. Transcriptome-wide m6A analysis identified 1,701 METTL14 arginine methylation-dependent m6A sites located in 1,290 genes involved in various cellular processes, including stem cell maintenance and DNA repair. These arginine methylation-dependent m6A sites are associated with enhanced translation of genes essential for the repair of DNA interstrand crosslinks; thus, METTL14 arginine methylation-deficient mESCs are hypersensitive to DNA crosslinking agents. Collectively, these findings reveal important aspects of m6A regulation and new functions of arginine methylation in RNA metabolism.
    Keywords:  DNA repair; PRMT1; RGG motif; RNA m6A methylation; arginine methylation
    DOI:  https://doi.org/10.15252/embj.2020106309
  15. Nat Commun. 2021 Jan 18. 12(1): 420
      Adult stem cell identity, plasticity, and homeostasis are precisely orchestrated by lineage-restricted epigenetic and transcriptional regulatory networks. Here, by integrating super-enhancer and chromatin accessibility landscapes, we delineate core transcription regulatory circuitries (CRCs) of limbal stem/progenitor cells (LSCs) and find that RUNX1 and SMAD3 are required for maintenance of corneal epithelial identity and homeostasis. RUNX1 or SMAD3 depletion inhibits PAX6 and induces LSCs to differentiate into epidermal-like epithelial cells. RUNX1, PAX6, and SMAD3 (RPS) interact with each other and synergistically establish a CRC to govern the lineage-specific cis-regulatory atlas. Moreover, RUNX1 shapes LSC chromatin architecture via modulating H3K27ac deposition. Disturbance of RPS cooperation results in cell identity switching and dysfunction of the corneal epithelium, which is strongly linked to various human corneal diseases. Our work highlights CRC TF cooperativity for establishment of stem cell identity and lineage commitment, and provides comprehensive regulatory principles for human stratified epithelial homeostasis and pathogenesis.
    DOI:  https://doi.org/10.1038/s41467-020-20713-z
  16. EMBO Rep. 2021 Jan 22. e49651
      Molecular switches are essential modules in signaling networks and transcriptional reprogramming. Here, we describe a role for small ubiquitin-related modifier SUMO as a molecular switch in epidermal growth factor receptor (EGFR) signaling. Using quantitative mass spectrometry, we compare the endogenous SUMO proteomes of HeLa cells before and after EGF stimulation. Thereby, we identify a small group of transcriptional coregulators including IRF2BP1, IRF2BP2, and IRF2BPL as novel players in EGFR signaling. Comparison of cells expressing wild type or SUMOylation-deficient IRF2BP1 indicates that transient deSUMOylation of IRF2BP proteins is important for appropriate expression of immediate early genes including dual specificity phosphatase 1 (DUSP1, MKP-1) and the transcription factor ATF3. We find that IRF2BP1 is a repressor, whose transient deSUMOylation on the DUSP1 promoter allows, and whose timely reSUMOylation restricts, DUSP1 transcription. Our work thus provides a paradigm for how comparative SUMO proteome analyses serve to reveal novel regulators in signal transduction and transcription.
    Keywords:  ATF3; DUSP1; EGFR; IRF2BP1; SUMO
    DOI:  https://doi.org/10.15252/embr.201949651
  17. Mol Cell Biol. 2021 Jan 19. pii: MCB.00515-20. [Epub ahead of print]
      Susceptibility to breast cancer is significantly increased in individuals with germ line mutations in RECQ1, a gene encoding a DNA helicase essential for genome maintenance. We previously reported that RECQ1 expression predicts clinical outcomes for sporadic breast cancer patients stratified by estrogen receptor (ER) status. Here, we utilized an unbiased integrative genomics approach to delineate a cross talk between RECQ1 and ERα, a known master regulatory transcription factor in breast cancer. We found that expression of ESR1, the gene encoding ERα, is directly activated by RECQ1. More than 35% of RECQ1 binding sites were co-bound by ERα genome-wide. Mechanistically, RECQ1 cooperates with FOXA1, the pioneer transcription factor for ERα, to enhance chromatin accessibility at the ESR1-regulatory regions in a helicase activity-dependent manner. In clinical ERα-positive breast cancers treated with endocrine therapy, high RECQ1 and high FOXA1 co-expressing tumors were associated with better survival. Collectively, these results identify RECQ1 as a novel cofactor for ERα and uncover a previously unknown mechanism by which RECQ1 regulates disease-driving gene expression in ER-positive breast cancer cells.
    DOI:  https://doi.org/10.1128/MCB.00515-20
  18. Bioinformatics. 2021 Jan 20. pii: btaa1039. [Epub ahead of print]
      SUMMARY: scATAC-seq is a powerful approach for characterizing cell-type-specific regulatory landscapes. However, it is difficult to benchmark the performance of various scATAC-seq analysis techniques (such as clustering and deconvolution) without having a priori a known set of gold-standard cell types. To simulate scATAC-seq experiments with known cell-type labels, we introduce an efficient and scalable scATAC-seq simulation method (SCAN-ATAC-Sim) that down-samples bulk ATAC-seq data (e.g., from representative cell lines or tissues). Our protocol uses a consistent but tunable signal-to-noise ratio across cell types in a scATAC-seq simulation for integrating bulk experiments with different levels of background noise, and it independently samples twice without replacement to account for the diploid genome. Because it uses an efficient weighted reservoir sampling algorithm and is highly parallelizable with OpenMP, our implementation in C++ allows millions of cells to be simulated in less than an hour on a laptop computer.
    AVAILABILITY: SCAN-ATAC-Sim is available at scan-atac-sim.gersteinlab.org.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btaa1039
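
    Editor's illustration for entry 18: the abstract names weighted reservoir sampling and two independent draws without replacement per cell. The sketch below shows one well-known form of that technique (the Efraimidis-Spirakis "A-Res" scheme) in Python. It is not the authors' C++ code; the function names, the peak labels and the use of bulk read counts as weights are assumptions made only for illustration.

```python
import heapq
import random


def weighted_reservoir_sample(items, weights, k, rng=None):
    """Draw up to k distinct items in one pass, favouring larger weights.

    A-Res scheme: each item gets the key u ** (1 / w) with u ~ Uniform(0, 1),
    and the k items with the largest keys are kept in a min-heap.
    """
    if rng is None:
        rng = random.Random()
    heap = []  # min-heap of (key, position, item); smallest key is evicted first
    for pos, (item, w) in enumerate(zip(items, weights)):
        if w <= 0:
            continue  # zero-weight items can never be selected
        key = rng.random() ** (1.0 / w)
        if len(heap) < k:
            heapq.heappush(heap, (key, pos, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, pos, item))
    return [item for _, _, item in heap]


def simulate_cell(peaks, weights, reads_per_copy, rng=None):
    """Hypothetical helper: two independent draws, one per genome copy."""
    if rng is None:
        rng = random.Random()
    copy_a = weighted_reservoir_sample(peaks, weights, reads_per_copy, rng)
    copy_b = weighted_reservoir_sample(peaks, weights, reads_per_copy, rng)
    return copy_a + copy_b  # a peak can appear at most twice (diploid)


if __name__ == "__main__":
    peaks = ["peak1", "peak2", "peak3", "peak4", "peak5"]
    bulk_counts = [5, 1, 0, 3, 2]  # assumed bulk ATAC-seq signal per peak
    print(simulate_cell(peaks, bulk_counts, 2, random.Random(0)))
```

    Because each draw is without replacement, a peak is picked at most once per genome copy and therefore at most twice per simulated cell, which is how this sketch interprets the diploid constraint described in the abstract.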
  19. Cell. 2021 Jan 15. pii: S0092-8674(20)31747-5. [Epub ahead of print]
      The most common genetic cause of amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) is a GGGGCC repeat expansion in the C9orf72 gene. We developed a platform to interrogate the chromatin accessibility landscape and transcriptional program within neurons during degeneration. We provide evidence that neurons expressing the dipeptide repeat protein poly(proline-arginine), translated from the C9orf72 repeat expansion, activate a highly specific transcriptional program, exemplified by a single transcription factor, p53. Ablating p53 in mice completely rescued neurons from degeneration and markedly increased survival in a C9orf72 mouse model. p53 reduction also rescued axonal degeneration caused by poly(glycine-arginine), increased survival of C9orf72 ALS/FTD-patient-induced pluripotent stem cell (iPSC)-derived motor neurons, and mitigated neurodegeneration in a C9orf72 fly model. We show that p53 activates a downstream transcriptional program, including Puma, which drives neurodegeneration. These data demonstrate a neurodegenerative mechanism dynamically regulated through transcription-factor-binding events and provide a framework to apply chromatin accessibility and transcription program profiles to neurodegeneration.
    Keywords:  ATAC-seq; C9orf72; TDP-43; amyotrophic lateral sclerosis; axonal degeneration; neurodegeneration; p53; puma
    DOI:  https://doi.org/10.1016/j.cell.2020.12.025
  20. Nucleic Acids Res. 2021 Jan 21. pii: gkab002. [Epub ahead of print]
      Super-enhancers (SEs) mediate high transcription levels of target genes. Previous studies have shown that SEs recruit transcription complexes and generate enhancer RNAs (eRNAs). We characterized transcription at the human and murine β-globin locus control region (LCR) SE. We found that the human LCR is capable of recruiting transcription complexes independently from linked globin genes in transgenic mice. Furthermore, LCR hypersensitive site 2 (HS2) initiates the formation of bidirectional transcripts in transgenic mice and in the endogenous β-globin gene locus in murine erythroleukemia (MEL) cells. HS2 3'eRNA is relatively unstable and remains in close proximity to the globin gene locus. Reducing the abundance of HS2 3'eRNA leads to a reduction in β-globin gene transcription and compromises RNA polymerase II (Pol II) recruitment at the promoter. The Integrator complex has been shown to terminate eRNA transcription. We demonstrate that Integrator interacts downstream of LCR HS2. Inducible ablation of Integrator function in MEL or differentiating primary human CD34+ cells causes a decrease in expression of the adult β-globin gene and accumulation of Pol II and eRNA at the LCR. The data suggest that transcription complexes are assembled at the LCR and transferred to the globin genes by mechanisms that involve Integrator mediated release of Pol II and eRNA from the LCR.
    DOI:  https://doi.org/10.1093/nar/gkab002
  21. J Biol Chem. 2021 Jan 13. pii: S0021-9258(21)00059-4. [Epub ahead of print] 100291
      Androglobin (ADGB) represents the latest addition to the globin superfamily in metazoans. The chimeric protein comprises a calpain domain and a unique circularly permutated globin domain. ADGB expression levels are most abundant in mammalian testis, but its cell type-specific expression, regulation and function have remained unexplored. Analyzing bulk and single-cell mRNA-Seq data from mammalian tissues, we found that, in addition to testes, ADGB is prominently expressed in the female reproductive tract, lungs and brain, specifically being associated with cell types forming motile cilia. Correlation analysis suggested co-regulation of ADGB with FOXJ1, a crucial transcription factor of ciliogenesis. Investigating the transcriptional regulation of the ADGB gene, we characterized its promoter using epigenomic datasets, exogenous promoter-dependent luciferase assays and CRISPR/dCas9-VPR-mediated activation approaches. Reporter gene assays revealed that FOXJ1 indeed substantially enhanced luciferase activity driven by the ADGB promoter. ChIP assays confirmed binding of FOXJ1 to the endogenous ADGB promoter region. We dissected the minimal sequence required for FOXJ1-dependent regulation and fine mapped the FOXJ1 binding site to two evolutionarily conserved regions within the ADGB promoter. FOXJ1 overexpression significantly increased endogenous ADGB mRNA levels in HEK293 and MCF-7 cells. Similar results were observed upon overexpression of RFX2, another key transcription factor in ciliogenesis. The complex transcriptional regulation of the ADGB locus was illustrated by identifying a distal enhancer, responsible for synergistic regulation by RFX2 and FOXJ1. Finally, cell culture studies indicated an ADGB-dependent increase in the number of ciliated cells upon overexpression of the full-length protein, confirming a ciliogenesis-associated role of ADGB in mammals.
    Keywords:  CRISPR/Cas; bioinformatics; cilia; gene expression; hemoglobin myoglobin; transcription enhancer; transcription factor; transcription regulation; transcriptomics
    DOI:  https://doi.org/10.1016/j.jbc.2021.100291
  22. Nat Protoc. 2021 Jan 18.
      Digested genome sequencing (Digenome-seq) is a highly sensitive, easy-to-carry-out, cell-free method for experimentally identifying genome-wide off-target sites of programmable nucleases and deaminases (also known as base editors). Genomic DNA is digested in vitro using clustered regularly interspaced short palindromic repeats ribonucleoproteins (RNPs; plus DNA-modifying enzymes to cleave both strands of DNA at sites containing deaminated base products, in the case of base editors) and subjected to whole-genome sequencing (WGS) with a typical sequencing depth of 30×. A web-based program is available to map in vitro cleavage sites corresponding to on- and off-target sites. Chromatin DNA, in parallel with histone-free genomic DNA, can also be used to account for the effects of chromatin structure on off-target nuclease activity. Digenome-seq is more sensitive and comprehensive than cell-based methods for identifying off-target sites. Unlike other cell-free methods, Digenome-seq does not involve enrichment of DNA ends through PCR amplification. Excluding WGS, which takes ~1-2 weeks, the entire process, including purification and preparation of RNPs, digestion of genomic DNA and bioinformatic analysis after WGS, takes several weeks.
    DOI:  https://doi.org/10.1038/s41596-020-00453-6
  23. Proc Natl Acad Sci U S A. 2021 Jan 26. pii: e2019554118. [Epub ahead of print]118(4):
      Chemical modifications of histones, such as lysine acetylation and ubiquitination, play pivotal roles in epigenetic regulation of gene expression. Methods to alter the epigenome thus hold promise as tools for elucidating epigenetic mechanisms and as therapeutics. However, an entirely chemical method to introduce histone modifications in living cells without genetic manipulation is unprecedented. Here, we developed a chemical catalyst, PEG-LANA-DSSMe 11, that binds to the nucleosome's acidic patch and promotes regioselective, synthetic histone acetylation at H2BK120 in living cells. The size of polyethylene glycol in the catalyst was a critical determinant for its in-cell metabolic stability, binding affinity to histones, and high activity. The synthetic acetylation promoted by 11 without genetic manipulation competed with and suppressed physiological H2B ubiquitination, a mark regulating chromatin functions, such as transcription and DNA damage response. Thus, the chemical catalyst will be a useful tool to manipulate the epigenome for unraveling epigenetic mechanisms in living cells.
    Keywords:  acetylation; catalyst; epigenome; histone; ubiquitination
    DOI:  https://doi.org/10.1073/pnas.2019554118
  24. Dev Biol. 2021 Jan 15. pii: S0012-1606(21)00002-6. [Epub ahead of print]
      Mice possess two types of teeth that differ in their cusp patterns; incisors have one cusp and molars have multiple cusps. The patterning of these two types of teeth relies on fine-tuning of the reciprocal molecular signaling between dental epithelial and mesenchymal tissues during embryonic development. The AP-2 transcription factors, particularly Tfap2a and Tfap2b, are essential components of such epithelial-mesenchymal signaling interactions that coordinate craniofacial development in mice and other vertebrates, but little is known about their roles in the regulation of tooth development and shape. Here we demonstrate that incisors and molars differ in their temporal and spatial expression of Tfap2a and Tfap2b. At the bud stage, Tfap2a is expressed in both the epithelium and mesenchyme of the incisors and molars, but Tfap2b expression is restricted to the molar mesenchyme, only later appearing in the incisor epithelium. Tissue-specific deletions show that loss of the epithelial domain of Tfap2a and Tfap2b affects the number and spatial arrangement of the incisors, notably resulting in duplicated lower incisors. In contrast, deletion of these two genes in the mesenchymal domain has little effect on tooth development. Collectively these results implicate epithelial expression of Tfap2a and Tfap2b in regulating the extent of the dental lamina associated with patterning the incisors and suggest that these genes contribute to morphological differences between anterior (incisor) and posterior (molar) teeth within the mammalian dentition.
    Keywords:  AP-2; Incisor; Molar; Odontogenesis; Tfap2
    DOI:  https://doi.org/10.1016/j.ydbio.2020.12.017
  25. Science. 2021 Jan 22. pii: eabc3393. [Epub ahead of print]371(6527):
      Polycomb repressive complexes 1 and 2 (PRC1 and PRC2) cooperate to determine cell identity by epigenetic gene expression regulation. However, the mechanism of PRC2 recruitment by means of recognition of PRC1-mediated H2AK119ub1 remains poorly understood. Our PRC2 cryo-electron microscopy structure with cofactors JARID2 and AEBP2 bound to a H2AK119ub1-containing nucleosome reveals a bridge helix in EZH2 that connects the SET domain, H3 tail, and nucleosomal DNA. JARID2 and AEBP2 each interact with one ubiquitin and the H2A-H2B surface. JARID2 stimulates PRC2 through interactions with both the polycomb protein EED and the H2AK119-ubiquitin, whereas AEBP2 has an additional scaffolding role. The presence of these cofactors partially overcomes the inhibitory effect that H3K4me3 and H3K36me3 exert on core PRC2 (in the absence of cofactors). Our results support a key role for JARID2 and AEBP2 in the cross-talk between histone modifications and PRC2 activity.
    DOI:  https://doi.org/10.1126/science.abc3393
  26. Nucleic Acids Res. 2021 Jan 21. pii: gkab014. [Epub ahead of print]
      Methylglyoxal (MG) is a byproduct of glycolysis that functions in diverse mammalian developmental processes and diseases and in plant responses to various stresses, including salt stress. However, it is unknown whether MG-regulated gene expression is associated with an epigenetic modification. Here we report that MG methylglyoxalates H3 including H3K4 and increases chromatin accessibility, consistent with the result that H3 methylglyoxalation positively correlates with gene expression. Salt stress also increases H3 methylglyoxalation at salt stress responsive genes correlated to their higher expression. Following exposure to salt stress, salt stress responsive genes were expressed at higher levels in the Arabidopsis glyI2 mutant than in wild-type plants, but at lower levels in 35S::GLYI2 35S::GLYII4 plants, consistent with the higher and lower MG accumulation and H3 methylglyoxalation of target genes in glyI2 and 35S::GLYI2 35S::GLYII4, respectively. Further, ABI3 and MYC2, regulators of salt stress responsive genes, affect the distribution of H3 methylglyoxalation at salt stress responsive genes. Thus, MG functions as a histone-modifying group associated with gene expression that links glucose metabolism and epigenetic regulation.
    DOI:  https://doi.org/10.1093/nar/gkab014
  27. Cell Rep. 2021 Jan 19. pii: S2211-1247(20)31629-6. [Epub ahead of print]34(3): 108640
      In multicellular eukaryotes, RNA polymerase (Pol) II pauses transcription ~30-50 bp after initiation. While the budding yeast Saccharomyces has its transcription mechanisms mostly conserved with other eukaryotes, it appears to lack this fundamental promoter-proximal pausing. However, we now report that nearly all yeast genes, including constitutive and inducible genes, manifest two distinct transcriptional stall sites that are brought on by acute environmental signaling (e.g., peroxide stress). Pol II first stalls at the pre-initiation stage before promoter clearance, but after DNA melting and factor acquisition, and may involve inhibited dephosphorylation. The second stall occurs at the +2 nucleosome. It acquires most, but not all, elongation factor interactions. Its regulation may include Bur1/Spt4/5. Our results suggest that a double Pol II stall is a mechanism to downregulate essentially all genes in concert.
    Keywords:  DSIF; Pol II pausing; peroxide stress; promoter-proximal pausing; transcription elongation
    DOI:  https://doi.org/10.1016/j.celrep.2020.108640
  28. Nat Metab. 2021 Jan;3(1): 75-89
      NADPH has long been recognized as a key cofactor for antioxidant defence and reductive biosynthesis. Here we report a metabolism-independent function of NADPH in modulating epigenetic status and transcription. We find that the reduction of cellular NADPH levels, achieved by silencing malic enzyme or glucose-6-phosphate dehydrogenase, impairs global histone acetylation and transcription in both adipocytes and tumour cells. These effects can be reversed by supplementation with exogenous NADPH or by inhibition of histone deacetylase 3 (HDAC3). Mechanistically, NADPH directly interacts with HDAC3 and interrupts the association between HDAC3 and its co-activator nuclear receptor corepressor 2 (Ncor2; SMRT) or Ncor1, thereby impairing HDAC3 activation. Interestingly, NADPH and the inositol tetraphosphate molecule Ins(1,4,5,6)P4 appear to bind to the same domains on HDAC3, with NADPH having a higher affinity towards HDAC3 than Ins(1,4,5,6)P4. Thus, while Ins(1,4,5,6)P4 promotes formation of the HDAC3-Ncor complex, NADPH inhibits it. Collectively, our findings uncover a previously unidentified and metabolism-independent role of NADPH in controlling epigenetic change and gene expression by acting as an endogenous inhibitor of HDAC3.
    DOI:  https://doi.org/10.1038/s42255-020-00330-2
  29. Nat Commun. 2021 Jan 21. 12(1): 499
      The human genome is partitioned into a collection of genomic features, inclusive of genes, transposable elements, lamina interacting regions, early replicating control elements and cis-regulatory elements, such as promoters, enhancers, and anchors of chromatin interactions. Uneven distribution of these features within chromosomes gives rise to clusters, such as topologically associating domains (TADs), lamina-associated domains, clusters of cis-regulatory elements or large organized chromatin lysine (K) domains (LOCKs). Here we show that LOCKs from diverse histone modifications discriminate primitive from differentiated cell types. Active LOCKs (H3K4me1, H3K4me3 and H3K27ac) cover a higher fraction of the genome in primitive compared to differentiated cell types while repressive LOCKs (H3K9me3, H3K27me3 and H3K36me3) do not. Active LOCKs in differentiated cells lie proximal to highly expressed genes while active LOCKs in primitive cells tend to be bivalent. Genes proximal to bivalent LOCKs are minimally expressed in primitive cells. Furthermore, bivalent LOCKs populate TAD boundaries and are preferentially bound by regulators of chromatin interactions, including CTCF, RAD21 and ZNF143. Together, our results argue that LOCKs discriminate primitive from differentiated cell populations.
    DOI:  https://doi.org/10.1038/s41467-020-20830-9
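The comparison in entry 29 rests on the fraction of the genome covered by a class of LOCKs. As a minimal sketch of how such a fraction could be computed, assuming domains are given as half-open `(start, end)` intervals on a single coordinate axis, overlapping domains can be merged before summing their lengths. The function name, the input format and the toy coordinates are illustrative assumptions and are not taken from the paper.

```python
def covered_fraction(domains, genome_length):
    """Return the fraction of `genome_length` covered by `domains`.

    `domains` is an iterable of half-open (start, end) intervals on one
    coordinate axis (hypothetical input format). Overlapping or touching
    intervals are merged so that no base is counted twice.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(domains):
        if current_end is None or start > current_end:
            # Close the previous merged interval and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap (or adjacency): extend the current merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Toy example: two overlapping domains and one separate domain on a
# 100-bp "genome" cover 20 + 10 = 30 bp.
print(covered_fraction([(0, 10), (5, 20), (30, 40)], 100))  # 0.3
```

Merging first matters because domains called from different histone marks can overlap, and simply summing their lengths would overstate coverage.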
  30. Nat Commun. 2021 Jan 20. 12(1): 490
      Short H2A (sH2A) histone variants are primarily expressed in the testes of placental mammals. Their incorporation into chromatin is associated with nucleosome destabilization and modulation of alternate splicing. Here, we show that sH2As innately possess features similar to recurrent oncohistone mutations associated with nucleosome instability. Through analyses of existing cancer genomics datasets, we find aberrant sH2A upregulation in a broad array of cancers, which manifest splicing patterns consistent with global nucleosome destabilization. We posit that short H2As are a class of "ready-made" oncohistones, whose inappropriate expression contributes to chromatin dysfunction in cancer.
    DOI:  https://doi.org/10.1038/s41467-020-20707-x
  31. Cell Death Dis. 2021 Jan 18. 12(1): 89
      Glioblastoma is the most lethal brain tumor and its pathogenesis remains incompletely understood. KDM4C is a histone H3K9 demethylase that contributes to epigenetic regulation of both oncogene and tumor suppressor genes and is often overexpressed in human tumors, including glioblastoma. However, KDM4C's roles in glioblastoma and the underlying molecular mechanisms remain unclear. Here, we show that KDM4C knockdown significantly represses proliferation and tumorigenesis of glioblastoma cells in vitro and in vivo that are rescued by overexpressing wild-type KDM4C but not a catalytic dead mutant. KDM4C protein expression is upregulated in glioblastoma, and its expression correlates with c-Myc expression. KDM4C also binds to the c-Myc promoter and induces c-Myc expression. Importantly, KDM4C suppresses the pro-apoptotic functions of p53 by demethylating p53K372me1, which is pivotal for the stability of chromatin-bound p53. Conversely, depletion or inhibition of KDM4C promotes p53 target gene expression and induces apoptosis in glioblastoma. KDM4C may serve as an oncogene through the dual functions of inactivation of p53 and activation of c-Myc in glioblastoma. Our study demonstrates KDM4C inhibition as a promising therapeutic strategy for targeting glioblastoma.
    DOI:  https://doi.org/10.1038/s41419-020-03380-2
  32. Nat Commun. 2021 Jan 20. 12(1): 481
      T helper type 17 (Th17) cells have important functions in the pathogenesis of inflammatory and autoimmune diseases. Retinoid-related orphan receptor-γt (RORγt) is necessary for Th17 cell differentiation and functions. However, the transcriptional regulation of RORγt expression, especially at the enhancer level, is still poorly understood. Here we identify a novel enhancer of the RORγt gene in Th17 cells, RORCE2. RORCE2 deficiency suppresses RORγt expression and Th17 differentiation, leading to reduced severity of experimental autoimmune encephalomyelitis. Mechanistically, RORCE2 is looped to the RORγt promoter through SRY-box transcription factor 5 (SOX-5) in Th17 cells, and loss of the SOX-5 binding site in RORCE2 abolishes RORCE2 function and affects the binding of signal transducer and activator of transcription 3 (STAT3) to the RORγt locus. Taken together, our data highlight a molecular mechanism for the regulation of Th17 differentiation and functions, which may represent a new point of intervention for Th17-related diseases.
    DOI:  https://doi.org/10.1038/s41467-020-20786-w
  33. Nat Commun. 2021 Jan 22. 12(1): 531
      Chromosome conformation capture (3C) provides an adaptable tool for studying diverse biological questions. Current 3C methods generally provide either low-resolution interaction profiles across the entire genome, or high-resolution interaction profiles at limited numbers of loci. Due to technical limitations, generation of reproducible high-resolution interaction profiles has not been achieved at genome-wide scale. Here, to overcome this barrier, we systematically test each step of 3C and report two improvements over current methods. We show that up to 30% of reporter events generated using the popular in situ 3C method arise from ligations between two individual nuclei, but this noise can be almost entirely eliminated by isolating intact nuclei after ligation. Using Nuclear-Titrated Capture-C, we generate reproducible high-resolution genome-wide 3C interaction profiles by targeting 8055 gene promoters in erythroid cells. By pairing high-resolution 3C interaction calls with nascent gene expression we interrogate the role of promoter hubs and super-enhancers in gene regulation.
    DOI:  https://doi.org/10.1038/s41467-020-20809-6
  34. Science. 2021 Jan 22. pii: eabc6663. [Epub ahead of print]371(6527):
      Dot1 (disruptor of telomeric silencing-1), the histone H3 lysine 79 (H3K79) methyltransferase, is conserved throughout evolution, and its deregulation is found in human leukemias. Here, we provide evidence that acetylation of histone H4 allosterically stimulates yeast Dot1 in a manner distinct from but coordinating with histone H2B ubiquitination (H2BUb). We further demonstrate that this stimulatory effect is specific to acetylation of lysine 16 (H4K16ac), a modification central to chromatin structure. We provide a mechanism of this histone cross-talk and show that H4K16ac and H2BUb play crucial roles in H3K79 di- and trimethylation in vitro and in vivo. These data reveal mechanisms that control H3K79 methylation and demonstrate how H4K16ac, H3K79me, and H2BUb function together to regulate gene transcription and gene silencing to ensure optimal maintenance and propagation of an epigenetic state.
    DOI:  https://doi.org/10.1126/science.abc6663
  35. Elife. 2021 Jan 20. pii: e62994. [Epub ahead of print]10
      Active DNA demethylation has emerged as an important regulatory process of plant and mammalian immunity. However, very little is known about the mechanisms by which active demethylation controls transcriptional immune reprogramming and disease resistance. Here, we first show that the Arabidopsis active demethylase ROS1 promotes basal resistance towards Pseudomonas syringae by antagonizing RNA-directed DNA methylation (RdDM). Furthermore, we find that ROS1 facilitates the flagellin-triggered induction of the disease resistance gene RMG1 by limiting RdDM at the 3' boundary of a remnant RC/Helitron transposable element (TE) embedded in its promoter. We further identify flagellin-responsive ROS1 putative primary targets, and show that at a subset of promoters, ROS1 erases methylation at discrete regions exhibiting WRKY transcription factors (TFs) binding. In particular, we demonstrate that ROS1 removes methylation at the orphan immune receptor RLP43 promoter, to ensure DNA binding of WRKY TFs. Finally, we show that ROS1-directed demethylation of the RMG1 and RLP43 promoters is causal for both flagellin responsiveness of these genes and for basal resistance. Overall, these findings significantly advance our understanding of how active demethylases shape transcriptional immune reprogramming to enable antibacterial resistance.
    Keywords:  A. thaliana; genetics; genomics; plant biology
    DOI:  https://doi.org/10.7554/eLife.62994
  36. Genome Biol. 2021 Jan 19. 22(1): 24
      Although genome-wide DNA methylomes have demonstrated their clinical value as reliable biomarkers for tumor detection, subtyping, and classification, their direct biological impacts at the individual gene level remain elusive. Here we present MethylationToActivity (M2A), a machine learning framework that uses convolutional neural networks to infer promoter activities based on H3K4me3 and H3K27ac enrichment, from DNA methylation patterns for individual genes. Using publicly available datasets in real-world test scenarios, we demonstrate that M2A is highly accurate and robust in revealing promoter activity landscapes in various pediatric and adult cancers, including both solid and hematologic malignant neoplasms.
    Keywords:  Convolutional neural network; DNA methylation; Histone modifications; Transfer learning
    DOI:  https://doi.org/10.1186/s13059-020-02220-y
  37. Cell Death Differ. 2021 Jan 18.
      Ubiquitination, by serving as a major degradation signal for proteins but also by controlling protein function and localization, plays critical roles in most key cellular processes. Here, we show that MITF, the master transcription factor in melanocytes, controls ubiquitination in melanoma cells. We identified FBXO32, a component of the SCF E3 ligase complex, as a new MITF target gene. FBXO32 favors melanoma cell migration, proliferation, and tumor development in vivo. Transcriptomic analysis shows that FBXO32 knockdown induces a global change in the melanoma gene expression profile. These changes include the inhibition of CDK6, in agreement with an inhibition of cell proliferation and invasion upon FBXO32 silencing. Furthermore, proteomic analysis identifies SMARCA4, a component of the chromatin remodeling complexes BAF/PBAF, as an FBXO32 partner. FBXO32 and SMARCA4 co-localize at loci regulated by FBXO32, such as CDK6, suggesting that FBXO32 controls transcription through the regulation of chromatin remodeling complex activity. FBXO32 and SMARCA4 are components of a molecular cascade linking MITF to epigenetics in melanoma cells.
    DOI:  https://doi.org/10.1038/s41418-020-00710-x
  38. Nat Commun. 2021 Jan 22. 12(1): 520
      The fusion oncogene RUNX1/RUNX1T1 encodes an aberrant transcription factor, which plays a key role in the initiation and maintenance of acute myeloid leukemia. Here we show that the RUNX1/RUNX1T1 oncogene is a regulator of alternative RNA splicing in leukemic cells. The comprehensive analysis of RUNX1/RUNX1T1-associated splicing events identifies two principal mechanisms that underlie the differential production of RNA isoforms: (i) RUNX1/RUNX1T1-mediated regulation of alternative transcription start site selection, and (ii) direct or indirect control of the expression of genes encoding splicing factors. The first mechanism leads to the expression of RNA isoforms with alternative structure of the 5'-UTR regions. The second mechanism generates alternative transcripts with new junctions between internal cassettes and constitutive exons. We also show that RUNX1/RUNX1T1-mediated differential splicing affects several functional groups of genes and produces proteins with unique conserved domain structures. In summary, this study reveals alternative splicing as an important component of transcriptome re-organization in leukemia by an aberrant transcriptional regulator.
    DOI:  https://doi.org/10.1038/s41467-020-20848-z
  39. Proc Natl Acad Sci U S A. 2021 Jan 26. pii: e2019655118. [Epub ahead of print]118(4):
      Runt domain-related (Runx) transcription factors are essential for early T cell development in mice from uncommitted to committed stages. Single and double Runx knockouts via Cas9 show that target genes responding to Runx activity are not solely controlled by the dominant factor, Runx1. Instead, Runx1 and Runx3 are coexpressed in single cells; bind to highly overlapping genomic sites; and have redundant, collaborative functions regulating genes pivotal for T cell development. Despite stable combined expression levels across pro-T cell development, Runx1 and Runx3 preferentially activate and repress genes that change expression dynamically during lineage commitment, mostly activating T-lineage genes and repressing multipotent progenitor genes. Furthermore, most Runx target genes are sensitive to Runx perturbation only at one stage and often respond to Runx more for expression transitions than for maintenance. Contributing to this highly stage-dependent gene regulation function, Runx1 and Runx3 extensively shift their binding sites during commitment. Functionally distinct Runx occupancy sites associated with stage-specific activation or repression are also distinguished by different patterns of partner factor cobinding. Finally, Runx occupancies change coordinately at numerous clustered sites around positively or negatively regulated targets during commitment. This multisite binding behavior may contribute to a developmental "ratchet" mechanism making commitment irreversible.
    Keywords:  DNA binding site choice; Runx transcription factors; early T lymphocyte development; functional genomics; transcriptional regulation
    DOI:  https://doi.org/10.1073/pnas.2019655118
  40. Nat Commun. 2021 Jan 20. 12(1): 484
      The tumor suppressor p53 integrates stress response pathways by selectively engaging one of several potential transcriptomes, thereby triggering cell fate decisions (e.g., cell cycle arrest, apoptosis). Foundational to this process is the binding of tetrameric p53 to 20-bp response elements (REs) in the genome (RRRCWWGYYY-N(0-13)-RRRCWWGYYY). In general, REs at cell cycle arrest targets (e.g., p21) are of higher affinity than those at apoptosis targets (e.g., BAX). However, the RE sequence code underlying selectivity remains undeciphered. Here, we identify molecular mechanisms mediating p53 binding to high- and low-affinity REs by showing that key determinants of the code are embedded in the DNA shape. We further demonstrate that differences in minor/major groove widths, encoded by G/C or A/T bp content at positions 3, 8, 13, and 18 in the RE, determine distinct p53 DNA-binding modes by inducing different Arg248 and Lys120 conformations and interactions. The predictive capacity of this code was confirmed in vivo using genome editing at the BAX RE to interconvert the DNA-binding modes, transcription pattern, and cell fate outcome.
    DOI:  https://doi.org/10.1038/s41467-020-20783-z
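The p53 response element consensus quoted in entry 40 is written in IUPAC nucleotide codes (R = A or G, W = A or T, Y = C or T, N = any base): two RRRCWWGYYY half-sites separated by a spacer of 0 to 13 bases. As a minimal sketch, assuming a plain forward-strand scan for exact consensus matches is enough for illustration, the motif can be translated into a regular expression. The function names and test sequences are hypothetical, and this is not the analysis used in the paper, which concerns DNA shape and binding affinity rather than simple pattern matching.

```python
import re

# IUPAC ambiguity codes used by the consensus in the abstract.
IUPAC = {"R": "[AG]", "W": "[AT]", "Y": "[CT]", "N": "[ACGT]"}


def iupac_to_regex(motif):
    """Translate an IUPAC motif such as 'RRRCWWGYYY' into a regex fragment."""
    return "".join(IUPAC.get(base, base) for base in motif)


HALF_SITE = iupac_to_regex("RRRCWWGYYY")

# Two half-sites separated by a spacer of 0-13 bases. The lookahead lets
# overlapping candidate sites be reported; the lazy spacer picks the
# shortest spacer available from each start position.
P53_RE = re.compile(rf"(?=({HALF_SITE})([ACGT]{{0,13}}?)({HALF_SITE}))")


def find_p53_response_elements(sequence):
    """Return (start, end, spacer_length) for each consensus match.

    Coordinates are 0-based and half-open; only the given strand is scanned.
    """
    sequence = sequence.upper()
    return [
        (m.start(1), m.end(3), len(m.group(2)))
        for m in P53_RE.finditer(sequence)
    ]


# Hypothetical test sequence: two perfect half-sites with no spacer,
# flanked by two bases on each side.
seq = "TT" + "GGGCATGCCC" + "AGACTTGTCT" + "AA"
print(find_p53_response_elements(seq))  # [(2, 22, 0)]
```

Real response elements often deviate from the consensus at one or more positions, so an exact-match scan like this would miss them; a position weight matrix or mismatch-tolerant search would be needed for genome-scale work.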
  41. PLoS Comput Biol. 2021 Jan 19. 17(1): e1007814
      DNA topoisomerase II-β (TOP2B) is fundamental for removing topological problems linked to DNA metabolism and 3D chromatin architecture, but its cut-and-reseal catalytic mechanism can accidentally cause DNA double-strand breaks (DSBs) that can seriously compromise genome integrity. Understanding the factors that determine the genome-wide distribution of TOP2B is therefore essential not only for a complete knowledge of genome dynamics and organization, but also for the implications of TOP2-induced DSBs in the origin of oncogenic translocations and other types of chromosomal rearrangements. Here, we apply a machine-learning approach to the prediction of TOP2B binding using publicly available sequencing data. We achieve highly accurate predictions, with accessible chromatin and architectural factors being the most informative features. Strikingly, TOP2B binding is sufficiently explained by only three features: DNase I hypersensitivity, CTCF and cohesin binding, for which genome-wide data are widely available. Based on this, we develop a predictive model for TOP2B genome-wide binding that can be used across cell lines and species, and generate virtual probability tracks that accurately mirror experimental ChIP-seq data. Our results deepen our knowledge of how the accessibility and 3D organization of chromatin determine TOP2B function, and constitute a proof of principle regarding the in silico prediction of sequence-independent chromatin-binding factors.
    DOI:  https://doi.org/10.1371/journal.pcbi.1007814