bims-crepig Biomed News
on Chromatin regulation and epigenetics in cell fate and cancer
Issue of 2021‒01‒03
twenty-nine papers selected by
Connor Rogerson
University of Cambridge, MRC Cancer Unit


  1. Sci Adv. 2020 Dec; pii: eaba9031. [Epub ahead of print] 6(51):
      Characterizing genome-wide binding profiles of transcription factors (TFs) is essential for understanding biological processes. Although techniques have been developed to assess binding profiles within a population of cells, determining them at a single-cell level remains elusive. Here, we report scFAN (single-cell factor analysis network), a deep learning model that predicts genome-wide TF binding profiles in individual cells. scFAN is pretrained on genome-wide bulk assay for transposase-accessible chromatin sequencing (ATAC-seq), DNA sequence, and chromatin immunoprecipitation sequencing (ChIP-seq) data and uses single-cell ATAC-seq to predict TF binding in individual cells. We demonstrate the efficacy of scFAN both by studying sequence motifs enriched within predicted binding peaks and by using predicted TFs to discover cell types. We develop a new metric, the "TF activity score", to characterize each cell and show that activity scores can reliably capture cell identities. scFAN allows us to discover and study cellular identities and heterogeneity based on chromatin accessibility profiles.
    DOI:  https://doi.org/10.1126/sciadv.aba9031
  2. Mol Cell. 2020 Dec 11. pii: S1097-2765(20)30887-X. [Epub ahead of print]
      Active DNA demethylation via ten-eleven translocation (TET) family enzymes is essential for epigenetic reprogramming in cell state transitions. TET enzymes catalyze up to three successive oxidations of 5-methylcytosine (5mC), generating 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), or 5-carboxycytosine (5caC). Although these bases are known to contribute to distinct demethylation pathways, the lack of tools to uncouple these sequential oxidative events has constrained our mechanistic understanding of the role of TETs in chromatin reprogramming. Here, we describe the first application of biochemically engineered TET mutants that unlink 5mC oxidation steps, examining their effects on somatic cell reprogramming. We show that only TET enzymes proficient for oxidation to 5fC/5caC can rescue the reprogramming potential of Tet2-deficient mouse embryonic fibroblasts. This effect correlated with rapid DNA demethylation at reprogramming enhancers and increased chromatin accessibility later in reprogramming. These experiments demonstrate that DNA demethylation through 5fC/5caC has roles distinct from 5hmC in somatic reprogramming to pluripotency.
    Keywords:  5-carboxycytosine; 5-formylcytosine; 5-hydroxymethylcytosine; 5caC; 5fC; 5hmC; DNA demethylation; TET; bACE-seq; epigenetics; iPSCs; induced pluripotent stem cells; reprogramming; ten-eleven translocation
    DOI:  https://doi.org/10.1016/j.molcel.2020.11.045
  3. Cancer Res. 2020 Dec 15. pii: canres.2588.2020. [Epub ahead of print]
      The BAF (mSWI/SNF) chromatin remodeling complex is important in development and has been linked to prostate oncogenesis. The oncogenic MUC1-C protein promotes lineage plasticity in the progression of neuroendocrine prostate cancer (NEPC); however, there is no known association between MUC1-C and BAF. We report here that MUC1-C binds directly to the E2F1 transcription factor and that the MUC1-C->E2F1 pathway induces expression of the embryonic stem cell esBAF components BRG1, ARID1A, BAF60a, BAF155, and BAF170 in castration-resistant prostate cancer (CRPC) and NEPC cells. In concert with this previously unrecognized pathway, MUC1 was associated with increased expression of E2F1 and esBAF components in NEPC tumors as compared to CRPC, supporting involvement of MUC1-C in activating the E2F1->esBAF pathway with progression to NEPC. MUC1-C formed a nuclear complex with BAF and activated cancer stem cell (CSC) gene signatures and the core pluripotency factor gene network. The MUC1-C->E2F1->BAF pathway was necessary for induction of both the NOTCH1 effector of CSC function and the NANOG pluripotency factor, and collectively, this network drove CSC self-renewal. These findings indicate that MUC1-C promotes NEPC progression by integrating activation of E2F1 and esBAF with induction of NOTCH1, NANOG, and stemness.
    DOI:  https://doi.org/10.1158/0008-5472.CAN-20-2588
  4. Bioinformatics. 2020 Dec 30. 36(Supplement_2): i659-i667
      MOTIVATION: Predictive models of DNA chromatin profile (i.e. epigenetic state), such as transcription factor binding, are essential for understanding regulatory processes and developing gene therapies. It is known that the 3D genome, or spatial structure of DNA, is highly influential in the chromatin profile. Deep neural networks have achieved state-of-the-art performance on chromatin profile prediction by using short windows of DNA sequences independently. These methods, however, ignore the long-range dependencies when predicting the chromatin profiles because modeling the 3D genome is challenging.
    RESULTS: In this work, we introduce ChromeGCN, a graph convolutional network for chromatin profile prediction by fusing both local sequence and long-range 3D genome information. By incorporating the 3D genome, we relax the independent and identically distributed assumption of local windows for a better representation of DNA. ChromeGCN explicitly incorporates known long-range interactions into the modeling, allowing us to identify and interpret those important long-range dependencies in influencing chromatin profiles. We show experimentally that by fusing sequential and 3D genome data using ChromeGCN, we get a significant improvement over the state-of-the-art deep learning methods as indicated by three metrics. Importantly, we show that ChromeGCN is particularly useful for identifying epigenetic effects in those DNA windows that have a high degree of interactions with other DNA windows.
    AVAILABILITY AND IMPLEMENTATION: https://github.com/QData/ChromeGCN.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btaa793
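ChromeGCN fuses each DNA window's local sequence features with those of the windows it contacts in 3D. The general idea can be illustrated with one generic graph-convolution step in the widely used symmetric-normalized form, ReLU(D^-1/2 (A + I) D^-1/2 H W). This is a sketch under stated assumptions, not ChromeGCN's own architecture: the contact matrix (e.g. derived from Hi-C), the features, and the weights are hypothetical toy values.

```python
import math


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj    -- n x n symmetric 0/1 matrix of window-window 3D contacts (hypothetical)
    feats  -- n x f matrix of per-window local features (e.g. from a sequence model)
    weight -- f x k weight matrix (fixed here; learned in a real model)
    """
    n = len(adj)
    # Add self-loops so every window keeps its own local signal.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate features from each window and its 3D neighbours (norm @ feats).
    agg = [[sum(norm[i][m] * feats[m][c] for m in range(n)) for c in range(len(feats[0]))]
           for i in range(n)]
    # Linear projection followed by ReLU (relu(agg @ weight)).
    return [[max(0.0, sum(agg[i][c] * weight[c][k] for c in range(len(weight))))
             for k in range(len(weight[0]))] for i in range(n)]


# Three windows: 0 and 2 are far apart in sequence but contact each other in 3D.
adj = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
feats = [[1.0], [0.0], [0.0]]  # only window 0 carries a local signal
weight = [[1.0]]
out = gcn_layer(adj, feats, weight)
```

After one step, window 2 inherits part of window 0's signal through the long-range contact, while window 1, which has no contact, stays at zero. This is the kind of dependency that sequence-only, window-independent models cannot capture.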
  5. Cell Stem Cell. 2020 Dec 15. pii: S1934-5909(20)30553-1. [Epub ahead of print]
      Regulation of hematopoiesis during human development remains poorly defined. Here we applied single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) to over 8,000 human immunophenotypic blood cells from fetal liver and bone marrow. We inferred their differentiation trajectory and identified three highly proliferative oligopotent progenitor populations downstream of hematopoietic stem cells (HSCs)/multipotent progenitors (MPPs). Along this trajectory, we observed opposing patterns of chromatin accessibility and differentiation that coincided with dynamic changes in the activity of distinct lineage-specific transcription factors. Integrative analysis of chromatin accessibility and gene expression revealed extensive epigenetic but not transcriptional priming of HSCs/MPPs prior to their lineage commitment. Finally, we refined and functionally validated the sorting strategy for the HSCs/MPPs and achieved around 90% enrichment. Our study provides a useful framework for future investigation of human developmental hematopoiesis in the context of blood pathologies and regenerative medicine.
    Keywords:  bone marrow; fetal hematopoiesis; fetal liver; hematopoietic stem cells; scATAC-seq; scRNA-seq
    DOI:  https://doi.org/10.1016/j.stem.2020.11.015
  6. Elife. 2020 Dec 23. pii: e59073. [Epub ahead of print] 9
      Small cell carcinoma of the ovary, hypercalcemic type (SCCOHT) is a rare and aggressive form of ovarian cancer. SCCOHT tumors have inactivating mutations in SMARCA4 (BRG1), one of the two mutually exclusive ATPases of the SWI/SNF chromatin remodeling complex. To address the role that BRG1 loss plays in SCCOHT tumorigenesis, we performed integrative multi-omic analyses in SCCOHT cell lines +/- BRG1 re-expression. BRG1 re-expression induced a gene and protein signature similar to that of an epithelial cell, and the chromatin accessibility sites it gained correlated with those of other TCGA tumors of epithelial origin. Gained chromatin accessibility and BRG1 recruited sites were strongly enriched for transcription factor binding motifs of AP-1 family members. Furthermore, AP-1 motifs were enriched at the promoters of highly upregulated epithelial genes. Using a dominant negative AP-1 cell line, we found that both AP-1 DNA binding activity and BRG1 re-expression are necessary for the gene and protein expression of epithelial genes. Our study demonstrates that BRG1 re-expression drives an epithelial-like gene and protein signature in SCCOHT cells that depends upon AP-1 activity.
    Keywords:  cancer biology; human
    DOI:  https://doi.org/10.7554/eLife.59073
  7. BMC Genomics. 2020 Dec 29. 21(Suppl 11): 802
      BACKGROUND: RNA-Seq, the high-throughput sequencing (HT-Seq) of mRNAs, has become an essential tool for characterizing gene expression differences between different cell types and conditions. Gene expression is regulated by several mechanisms, including epigenetically by post-translational histone modifications, which can be assessed by ChIP-Seq (Chromatin Immuno-Precipitation Sequencing). As more and more biological samples are analyzed by the combination of ChIP-Seq and RNA-Seq, integrated analysis of the corresponding data sets in principle offers a unique way to study gene regulation. Technically, however, such analyses are still in their infancy.
    RESULTS: Here we introduce intePareto, a computational tool for the integrative analysis of RNA-Seq and ChIP-Seq data. With intePareto we match RNA-Seq and ChIP-Seq data at the level of genes, perform differential expression analysis between biological conditions, and prioritize genes with consistent changes in RNA-Seq and ChIP-Seq data using Pareto optimization.
    CONCLUSION: intePareto facilitates comprehensive understanding of high dimensional transcriptomic and epigenomic data. Its superiority to a naive differential gene expression analysis with RNA-Seq and to an available integrative approach is demonstrated by analyzing a public dataset.
    Keywords:  ChIP-Seq; Integrative analysis; RNA-Seq
    DOI:  https://doi.org/10.1186/s12864-020-07205-6
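intePareto prioritizes genes using Pareto optimization over RNA-Seq and ChIP-Seq changes. A minimal sketch of Pareto ranking (peeling off successive non-dominated fronts) is shown below, assuming each gene already carries two scores where higher is better in both, for instance z-scores of log fold change in expression and in an activating histone mark. This is not intePareto's code, and the gene names and values are hypothetical.

```python
def dominates(a, b):
    """a dominates b if it is at least as good in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_ranks(scores):
    """Assign rank 1 to the first non-dominated front, rank 2 to the next, and so on.

    scores -- dict mapping gene name -> (rna_score, chip_score), higher is better.
    """
    remaining = dict(scores)
    ranks = {}
    rank = 1
    while remaining:
        # Genes not dominated by any other remaining gene form the current front.
        front = [g for g, s in remaining.items()
                 if not any(dominates(o, s) for h, o in remaining.items() if h != g)]
        for g in front:
            ranks[g] = rank
            del remaining[g]
        rank += 1
    return ranks


# Hypothetical (RNA-Seq, ChIP-Seq) scores.
scores = {
    "geneA": (2.5, 2.0),  # strong, concordant change in both layers
    "geneB": (1.0, 1.0),  # moderate, concordant change
    "geneC": (2.0, 0.5),  # expression change with weaker ChIP change
    "geneD": (0.3, 0.2),  # little change in either layer
}
ranks = pareto_ranks(scores)
```

Pareto ranking avoids choosing arbitrary weights to combine the two data types: a gene is only ranked below another if it is worse or equal on both.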
  8. EMBO Rep. 2020 Dec 29. e50967
      Lysine succinylation (Ksucc) is an evolutionarily conserved and widespread post-translational modification. Histone acetyltransferase 1 (HAT1) is a type B histone acetyltransferase, regulating the acetylation of both histone and non-histone proteins. However, the role of HAT1 in succinylation modulation remains unclear. Here, we employ a quantitative proteomics approach to study succinylation in HepG2 cancer cells and find that HAT1 modulates lysine succinylation on various proteins including histones and non-histones. HAT1 succinylates histone H3 on K122, contributing to epigenetic regulation and gene expression in cancer cells. Moreover, HAT1 catalyzes the succinylation of PGAM1 on K99, resulting in its increased enzymatic activity and the stimulation of glycolytic flux in cancer cells. Clinically, HAT1 is significantly elevated in liver cancer, pancreatic cancer, and cholangiocarcinoma tissues. Functionally, HAT1 succinyltransferase activity and the succinylation of PGAM1 by HAT1 play critical roles in promoting tumor progression in vitro and in vivo. Thus, we conclude that HAT1 is a succinyltransferase for histones and non-histones in tumorigenesis.
    Keywords:  HAT1; epigenetic regulation; glycolysis; succinylation; tumorigenesis
    DOI:  https://doi.org/10.15252/embr.202050967
  9. PLoS Genet. 2020 Dec;16(12): e1009252
      Growth and starvation are considered opposite ends of a spectrum. To sustain growth, cells use coordinated gene expression programs and manage biomolecule supply in order to match the demands of metabolism and translation. Global growth programs complement increased ribosomal biogenesis with sufficient carbon metabolism, amino acid and nucleotide biosynthesis. How these resources are collectively managed is a fundamental question. The role of the Gcn4/ATF4 transcription factor has been best studied in contexts where cells encounter amino acid starvation. However, high Gcn4 activity has been observed in contexts of rapid cell proliferation, and the roles of Gcn4 in such growth contexts are unclear. Here, using a methionine-induced growth program in yeast, we show that Gcn4/ATF4 is the fulcrum that maintains metabolic supply in order to sustain translation outputs. By integrating matched transcriptome and ChIP-Seq analysis, we decipher genome-wide direct and indirect roles for Gcn4 in this growth program. Genes that enable metabolic precursor biosynthesis indispensably require Gcn4; in contrast, ribosomal genes are partly repressed by Gcn4. Gcn4 directly binds promoter regions and induces transcription of a subset of metabolic genes, particularly driving lysine and arginine biosynthesis. Gcn4 also globally represses lysine- and arginine-enriched transcripts, which include genes encoding the translation machinery. The Gcn4-dependent lysine and arginine supply thereby maintains the synthesis of the translation machinery. This is required to maintain translation capacity. Gcn4 consequently enables metabolic-precursor supply to bolster protein synthesis, and drive a growth program. Thus, we illustrate how growth and starvation outcomes are both controlled using the same Gcn4 transcriptional outputs that function in distinct contexts.
    DOI:  https://doi.org/10.1371/journal.pgen.1009252
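The abstract refers to lysine- and arginine-enriched transcripts. One simple way to quantify such enrichment, assumed here for illustration and not necessarily the metric the authors used, is the fraction of K and R residues in each encoded protein. The sequences and the 0.15 cutoff below are hypothetical.

```python
def kr_fraction(protein_seq: str) -> float:
    """Fraction of lysine (K) and arginine (R) residues in a protein sequence."""
    seq = protein_seq.upper().rstrip("*")  # drop a trailing stop symbol if present
    if not seq:
        return 0.0
    return sum(aa in "KR" for aa in seq) / len(seq)


# Hypothetical sequences: one K/R-rich, one K/R-free.
proteins = {
    "kr_rich_example": "MKRKAGRKVRKLAK",
    "kr_poor_example": "MDEESLGAVTDE",
}
cutoff = 0.15  # hypothetical threshold for calling a protein "K/R-enriched"
enriched = {name for name, seq in proteins.items() if kr_fraction(seq) > cutoff}
```

In practice the cutoff would be set relative to the proteome-wide distribution of K/R content rather than fixed by hand.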
  10. Cell Rep. 2020 Dec 22. pii: S2211-1247(20)31506-0. [Epub ahead of print] 33(12): 108517
      The chromatin protein positive coactivator 4 (PC4) has multiple functions, including chromatin compaction. However, its role in immune cells is largely unknown. We show that PC4 orchestrates chromatin structure and gene expression in mature B cells. B-cell-specific PC4-deficient mice show impaired antibody production upon antigen stimulation. The PC4 complex purified from B cells contains the transcription factors (TFs) IKAROS and IRF4. IKAROS protein is reduced in PC4-deficient mature B cells, resulting in de-repression of their target genes in part by diminished interactions with gene-silencing components. Upon activation, the amount of IRF4 protein is not increased in PC4-deficient B cells, resulting in a reduction in plasma cells. Importantly, IRF4 reciprocally induces PC4 expression via a super-enhancer. PC4 knockdown in human B cell lymphoma and myeloma cells reduces IKAROS protein, as does the anticancer drug lenalidomide. Our findings establish PC4 as a chromatin regulator of B cells and a possible therapeutic target adjoining IKAROS in B cell malignancies.
    Keywords:  IKAROS; IRF4; PC4; cell survival; chromatin; complex purification; human B cell malignancy; mature B cell; plasma cell differentiation
    DOI:  https://doi.org/10.1016/j.celrep.2020.108517
  11. Cell Rep. 2020 Dec 29. pii: S2211-1247(20)31550-3. [Epub ahead of print] 33(13): 108561
      One key aspect of epigenetic inheritance is that chromatin structures can be stably inherited through generations after the removal of the signals that establish such structures. In fission yeast, the RNA interference (RNAi) pathway is critical for the targeting of histone methyltransferase Clr4 to pericentric repeats to establish heterochromatin. However, pericentric heterochromatin cannot be properly inherited in the absence of RNAi, suggesting the existence of mechanisms that counteract chromatin structure inheritance. Here, we show that mutations of components of the INO80 chromatin-remodeling complex allow pericentric heterochromatin inheritance in RNAi mutants. The ability of INO80 to counter heterochromatin inheritance is attributed to one subunit, Iec5, which promotes histone turnover at heterochromatin but has little effect on nucleosome positioning at heterochromatin, gene expression, or the DNA damage response. These analyses demonstrate the importance of the INO80 chromatin-remodeling complex in controlling heterochromatin inheritance and maintaining the proper heterochromatin landscape of the genome.
    Keywords:  Clr4; nucleosome; chromatin-remodeling complex; H3K9 methylation; INO80; RNAi; epigenetic inheritance; heterochromatin; histone turnover
    DOI:  https://doi.org/10.1016/j.celrep.2020.108561
  12. Bioinformatics. 2020 Dec 30. 36(Supplement_2): i692-i699
      MOTIVATION: Despite the fact that structural variants (SVs) play an important role in cancer, methods to predict their effect, especially for SVs in non-coding regions, are lacking, leaving them often overlooked in the clinic. Non-coding SVs may disrupt the boundaries of Topologically Associated Domains (TADs), thereby affecting interactions between genes and regulatory elements such as enhancers. However, it is not known when such alterations are pathogenic. Although machine learning techniques are a promising solution to answer this question, representing the large number of interactions that an SV can disrupt in a single feature matrix is not trivial.
    RESULTS: We introduce svMIL: a method to predict pathogenic TAD boundary-disrupting SV effects based on multiple instance learning, which circumvents the need for a traditional feature matrix by grouping SVs into bags that can contain any number of disruptions. We demonstrate that svMIL can predict SV pathogenicity, measured through same-sample gene expression aberration, for various cancer types. In addition, our approach reveals that somatic pathogenic SVs alter different regulatory interactions than somatic non-pathogenic SVs and germline SVs.
    AVAILABILITY AND IMPLEMENTATION: All code for svMIL is publicly available on GitHub: https://github.com/UMCUGenetics/svMIL.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btaa802
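svMIL represents each SV as a bag that can hold any number of disrupted regulatory interactions, which avoids forcing them into one fixed-width feature row. The sketch below illustrates the generic multiple-instance idea under the classic standard MI assumption (a bag is positive if at least one instance is), using max-pooling over per-instance scores. It does not reproduce svMIL's bag representation or classifier; the instance features, scorer, and threshold are hypothetical.

```python
from typing import Callable, Dict, List

Instance = Dict[str, float]  # hypothetical features of one disrupted gene-enhancer interaction


def bag_score(bag: List[Instance], instance_score: Callable[[Instance], float]) -> float:
    """Score a bag (one SV) as the maximum over its instances (standard MI assumption)."""
    if not bag:
        return 0.0  # an SV that disrupts nothing carries no evidence
    return max(instance_score(inst) for inst in bag)


def toy_instance_score(inst: Instance) -> float:
    # Hypothetical scorer: a lost enhancer contact matters more if the enhancer is strong.
    return inst["enhancer_lost"] * inst["enhancer_strength"]


# Bags of different sizes are handled naturally.
sv_bags = {
    "sv1": [{"enhancer_lost": 1, "enhancer_strength": 0.4},
            {"enhancer_lost": 1, "enhancer_strength": 3.1}],
    "sv2": [{"enhancer_lost": 0, "enhancer_strength": 2.0}],
    "sv3": [],
}
threshold = 2.0  # hypothetical decision threshold
calls = {sv: bag_score(b, toy_instance_score) > threshold for sv, b in sv_bags.items()}
```

Max-pooling is only one way to turn instances into a bag decision; embedding-based MIL methods instead map each bag to a fixed-length vector before classification.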
  13. Elife. 2020 Dec 29. pii: e64588. [Epub ahead of print] 9
      Changes in available nutrients are inevitable events for most living organisms. Upon nutritional stress, several signaling pathways cooperate to change the transcription program through chromatin regulation to rewire cellular metabolism. In budding yeast, histone H3 threonine 11 phosphorylation (H3pT11) acts as a marker of low glucose stress and regulates the transcription of nutritional stress responsive genes. Understanding how this histone modification 'senses' external glucose changes remains elusive. Here, we show that Tda1, the yeast orthologue of human Nuak1, is a direct kinase for H3pT11 upon low glucose stress. Yeast AMPK directly phosphorylates Tda1 to govern Tda1 activity, while CK2 regulates Tda1 nuclear localization. Collectively, AMPK and CK2 signaling converge on histone kinase Tda1 to link external low glucose stress to chromatin regulation.
    Keywords:  S. cerevisiae; chromosomes; gene expression
    DOI:  https://doi.org/10.7554/eLife.64588
  14. Nature. 2020 Dec 23.
      Histone methyltransferases of the nuclear receptor-binding SET domain protein (NSD) family, including NSD1, NSD2 and NSD3, have crucial roles in chromatin regulation and are implicated in oncogenesis [1,2]. NSD enzymes exhibit an autoinhibitory state that is relieved by binding to nucleosomes, enabling dimethylation of histone H3 at Lys36 (H3K36) [3-7]. However, the molecular basis that underlies this mechanism is largely unknown. Here we solve the cryo-electron microscopy structures of NSD2 and NSD3 bound to mononucleosomes. We find that binding of NSD2 and NSD3 to mononucleosomes causes DNA near the linker region to unwrap, which facilitates insertion of the catalytic core between the histone octamer and the unwrapped segment of DNA. A network of DNA- and histone-specific contacts between NSD2 or NSD3 and the nucleosome precisely defines the position of the enzyme on the nucleosome, explaining the specificity of methylation to H3K36. Intermolecular contacts between NSD proteins and nucleosomes are altered by several recurrent cancer-associated mutations in NSD2 and NSD3. NSDs that contain these mutations are catalytically hyperactive in vitro and in cells, and their ectopic expression promotes the proliferation of cancer cells and the growth of xenograft tumours. Together, our research provides molecular insights into the nucleosome-based recognition and histone-modification mechanisms of NSD2 and NSD3, which could lead to strategies for therapeutic targeting of proteins of the NSD family.
    DOI:  https://doi.org/10.1038/s41586-020-03069-8
  15. Cancer Res. 2020 Dec 21. pii: canres.1417.2020. [Epub ahead of print]
      Switch/sucrose-non-fermentable (SWI/SNF) chromatin remodeling complexes are critical regulators of chromatin dynamics during transcription, DNA replication, and DNA repair. A recently identified SWI/SNF subcomplex termed GLTSCR1/1L-BAF (GBAF; or "non-canonical BAF", ncBAF) uniquely contains bromodomain-containing protein BRD9 and glioma tumor suppressor candidate region 1 (GLTSCR1) or its paralog GLTSCR1-like (GLTSCR1L). Recent studies have identified a unique dependency on GBAF (ncBAF) complexes in synovial sarcoma and malignant rhabdoid tumors, both of which possess aberrations in canonical BAF (cBAF) and Polybromo-BAF (PBAF) complexes. Dependencies on GBAF in malignancies without SWI/SNF aberrations, however, are less defined. Here, we show that GBAF, particularly its BRD9 subunit, is required for the viability of prostate cancer cell lines in vitro and for optimal xenograft tumor growth in vivo. BRD9 interacts with androgen receptor (AR) and CCCTC-binding factor (CTCF), and modulates AR-dependent gene expression. The GBAF complex shares genome localization and transcriptional targets with bromodomain and extraterminal domain-containing (BET) proteins, which are established AR coregulators. Our results demonstrate that GBAF is critical for coordinating SWI/SNF-BET cooperation and uncover a new druggable target for AR-positive prostate cancers, including those resistant to androgen deprivation or antiandrogen therapies.
    DOI:  https://doi.org/10.1158/0008-5472.CAN-20-1417
  16. Mol Cell. 2020 Dec 23. pii: S1097-2765(20)30906-0. [Epub ahead of print]
      Termination of RNA polymerase II (RNAPII) transcription in metazoans relies largely on the cleavage and polyadenylation (CPA) and integrator (INT) complexes originally found to act at the ends of protein-coding and small nuclear RNA (snRNA) genes, respectively. Here, we monitor CPA- and INT-dependent termination activities genome-wide, including at thousands of previously unannotated transcription units (TUs), producing unstable RNA. We verify the global activity of CPA occurring at pA sites regardless of their position relative to the TU promoter. We also identify a global activity of INT, which is largely sequence-independent and restricted to a ~3-kb promoter-proximal region. Our analyses suggest two functions of genome-wide INT activity: it dampens transcriptional output from weak promoters, and it provides quality control of RNAPII complexes that are unfavorably configured for transcriptional elongation. We suggest that the function of INT in stable snRNA production is an exception from its general cellular role, the attenuation of non-productive transcription.
    Keywords:  cleavage and polyadenylation complex; genome-wide transcription termination; integrator complex; pervasive transcription
    DOI:  https://doi.org/10.1016/j.molcel.2020.12.014
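The abstract places genome-wide INT activity within a ~3-kb promoter-proximal region. Checking whether a mapped termination site falls inside such a window has to take strand into account, because "downstream" means decreasing coordinates for minus-strand units. The sketch below assumes TSS and site positions share one genomic coordinate system; the function name, the 3,000 bp default, and the example coordinates are illustrative.

```python
def in_promoter_proximal_window(tss: int, strand: str, site: int, window: int = 3000) -> bool:
    """True if `site` lies downstream of the TSS and no more than `window` bp from it.

    On the plus strand, downstream means increasing coordinates; on the minus
    strand it means decreasing coordinates.
    """
    if strand == "+":
        dist = site - tss
    elif strand == "-":
        dist = tss - site
    else:
        raise ValueError("strand must be '+' or '-'")
    return 0 <= dist <= window


# Hypothetical transcription units and termination sites.
print(in_promoter_proximal_window(10_000, "+", 12_500))
print(in_promoter_proximal_window(10_000, "-", 12_000))
```

The second call returns False because, for a minus-strand unit, position 12,000 lies upstream of the TSS.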
  17. Bone. 2020 Dec 28. pii: S8756-3282(20)30624-4. [Epub ahead of print] 115836
      Osteoclasts (OCs) are well known to be involved in the exacerbation of bone-related diseases. However, the role of metabolites in osteoclastogenesis has not been well characterized. Herein, we found that osteoclastogenesis was negatively regulated by α-ketoglutarate (αKG) in vitro and in vivo (C57BL/6 mice). Kinetic transcriptome analysis revealed that αKG treatment upregulated solute carrier family 7 member 11 (Slc7a11), a subunit of the cystine/glutamate antiporter, and downregulated typical OC marker genes. Given that Slc7a11 can control ROS levels through cystine import for glutathione synthesis, we measured intracellular ROS and found that αKG inhibited RANKL-induced ROS production. Notably, chromatin immunoprecipitation (ChIP) analysis indicated that αKG acts as an epigenetic cofactor at the Slc7a11 promoter, promoting the removal of repressive histone H3K9 methylation while increasing the binding of nuclear factor erythroid 2-related factor 2 (Nrf2), a critical transcription factor. Together, these findings suggest that αKG could be a therapeutic strategy for OC-activated diseases.
    Keywords:  Epigenetics; Nrf2; Osteoclast; Osteoporosis; Slc7a11; alpha-ketoglutarate
    DOI:  https://doi.org/10.1016/j.bone.2020.115836
  18. Cancer Res. 2020 Dec 23. pii: canres.1323.2020. [Epub ahead of print]
      Targeting epigenetics in cancer has emerged as a promising anticancer strategy. p300/CBP is a central regulator of epigenetics and plays an important role in hepatocellular carcinoma (HCC) progression. Tumor-associated metabolic alterations contribute to the establishment and maintenance of the tumorigenic state. In this study, we used a novel p300 inhibitor, B029-2, to investigate the effect of targeting p300/CBP in HCC and tumor metabolism. p300/CBP-mediated acetylation of H3K18 and H3K27 increased in HCC tissues compared to surrounding noncancerous tissues. Conversely, treatment with B029-2 specifically decreased H3K18Ac and H3K27Ac and displayed significant antitumor effects in HCC cells in vitro and in vivo. Importantly, ATAC-seq and RNA-seq integrated analysis revealed that B029-2 disturbed metabolic reprogramming in HCC cells. Moreover, B029-2 decreased glycolytic function and nucleotide synthesis in Huh-7 cells by reducing H3K18Ac and H3K27Ac levels at the promoter regions of amino acid metabolism and nucleotide synthesis enzyme genes, including PSPH, PSAT1, ALDH18A1, TALDO1, ATIC, and DTYMK. Overexpression of PSPH and DTYMK partially reversed the inhibitory effect of B029-2 on HCC cells. These findings suggested that p300/CBP epigenetically regulates the expression of glycolysis-related metabolic enzymes through modulation of histone acetylation in HCC and highlight the value of targeting the histone acetyltransferase activity of p300/CBP for HCC therapy.
    DOI:  https://doi.org/10.1158/0008-5472.CAN-20-1323
  19. Genome Res. 2020 Dec 21.
      Transposable elements (TEs) are an integral part of the host transcriptome. TE-containing noncoding RNAs (ncRNAs) show considerable tissue specificity and play important roles during development, including stem cell maintenance and cell differentiation. Recent advances in single-cell RNA-seq (scRNA-seq) have revolutionized cell type-specific gene expression analysis. However, effective scRNA-seq quantification tools tailored for TEs are lacking, limiting our ability to dissect TE expression dynamics at single-cell resolution. To address this issue, we established a TE expression quantification pipeline that is compatible with scRNA-seq data generated across multiple technology platforms. We constructed TE-containing ncRNA references using bulk RNA-seq data and showed that quantifying TE expression at the transcript level effectively reduces noise. As proof of principle, we applied this strategy to mouse embryonic stem cells and successfully captured the expression profile of endogenous retroviruses in single cells. We further expanded our analysis to scRNA-seq data from early stages of mouse embryogenesis. Our results illustrated the dynamic TE expression at preimplantation stages and revealed 146 TE-containing ncRNA transcripts with substantial tissue specificity during gastrulation and early organogenesis.
    DOI:  https://doi.org/10.1101/gr.265173.120
  20. iScience. 2021 Jan 22. 24(1): 101913
      Cell type annotation is a fundamental task in the analysis of single-cell RNA-sequencing data. In this work, we present CellO, a machine learning-based tool for annotating human RNA-seq data with the Cell Ontology. CellO enables accurate and standardized cell type classification of cell clusters by considering the rich hierarchical structure of known cell types. Furthermore, CellO comes pre-trained on a comprehensive data set of human, healthy, untreated primary samples in the Sequence Read Archive. CellO's comprehensive training set enables it to run out of the box on diverse cell types and achieves competitive or even superior performance when compared to existing state-of-the-art methods. Lastly, CellO's linear models are easily interpreted, thereby enabling exploration of cell-type-specific expression signatures across the ontology. To this end, we also present the CellO Viewer: a web application for exploring CellO's models across the ontology.
    Keywords:  Classification of Bioinformatical Subject; Genomic Analysis; Genomics
    DOI:  https://doi.org/10.1016/j.isci.2020.101913
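CellO classifies cells while taking the hierarchical structure of the Cell Ontology into account. One basic property of such a label space is that assigning a term also implies all of its is_a ancestors. The sketch below closes a set of predicted terms upward through a parent map; the ontology fragment is simplified and hypothetical, and this is not CellO's API.

```python
def ancestors(term, parents):
    """All ancestors of `term` in an is_a DAG given as {term: [parent, ...]}."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen


def consistent_labels(predicted, parents):
    """Close a set of predicted terms upward so the labelling respects the hierarchy."""
    closed = set(predicted)
    for term in predicted:
        closed |= ancestors(term, parents)
    return closed


# Simplified, hypothetical fragment of a cell-type hierarchy.
parents = {
    "T cell": ["lymphocyte"],
    "B cell": ["lymphocyte"],
    "lymphocyte": ["leukocyte"],
    "leukocyte": ["cell"],
}
labels = consistent_labels({"T cell"}, parents)
```

Using an explicit stack and a `seen` set keeps the traversal correct when a term is reachable through more than one parent, which is common in ontology DAGs.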
  21. Proc Natl Acad Sci U S A. 2021 Jan 12. pii: e2021171118. [Epub ahead of print] 118(2):
      A transcription factor (TF) is a sequence-specific DNA-binding protein that modulates the transcription of a set of particular genes, and thus regulates gene expression in the cell. TFs have commonly been predicted by analyzing sequence homology with the DNA-binding domains of TFs already characterized. Thus, TFs that do not show homologies with the reported ones are difficult to predict. Here we report the development of a deep learning-based tool, DeepTFactor, that predicts whether a protein in question is a TF. DeepTFactor uses a convolutional neural network to extract features of a protein. It showed high performance in predicting TFs of both eukaryotic and prokaryotic origins, resulting in F1 scores of 0.8154 and 0.8000, respectively. Analysis of the gradients of prediction score with respect to input suggested that DeepTFactor detects DNA-binding domains and other latent features for TF prediction. DeepTFactor predicted 332 candidate TFs in Escherichia coli K-12 MG1655. Among them, 84 candidate TFs belong to the y-ome, which is a collection of genes that lack experimental evidence of function. We experimentally validated the results of DeepTFactor prediction by further characterizing genome-wide binding sites of three predicted TFs, YqhC, YiaU, and YahB. Furthermore, we made available the list of 4,674,808 TFs predicted from 73,873,012 protein sequences in 48,346 genomes. DeepTFactor will serve as a useful tool for predicting TFs, which is necessary for understanding the regulatory systems of organisms of interest. We provide DeepTFactor as a stand-alone program, available at https://bitbucket.org/kaistsystemsbiology/deeptfactor.
    Keywords:  ChIP-exo; deep learning; transcription factor; transcription regulation; y-ome
    DOI:  https://doi.org/10.1073/pnas.2021171118
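DeepTFactor's performance is reported as F1 scores (0.8154 for eukaryotic and 0.8000 for prokaryotic TFs). F1 is the harmonic mean of precision and recall, which is equivalent to 2TP / (2TP + FP + FN). The confusion counts below are hypothetical and are not the paper's data.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), with precision P = TP/(TP+FP) and recall R = TP/(TP+FN)."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 80 true TFs found, 20 false alarms, 20 true TFs missed.
score = f1_score(tp=80, fp=20, fn=20)
```

Because F1 ignores true negatives, it suits tasks like TF prediction, where non-TFs vastly outnumber TFs and accuracy alone would look deceptively high.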
  22. Genome Res. 2020 Dec 18. pii: gr.266239.120. [Epub ahead of print]
      TSA-seq mapping suggests that gene distance to nuclear speckles is more deterministic and predictive of gene expression levels than gene radial positioning. Gene expression correlates inversely with distance to nuclear speckles, with chromosome regions of unusually high expression located at the apex of chromosome loops protruding from the nuclear periphery into the interior. Genomic distances to the nearest lamina-associated domain are larger for loop apexes mapping closest to nuclear speckles, suggesting the possibility of conservation of speckle-associated regions. To facilitate comparison of genome organization by TSA-seq, we reduced the number of cells required for TSA-seq 10- to 20-fold by deliberately saturating protein labeling while preserving distance mapping through the still-unsaturated DNA labeling. Only ~10% of the genome shows statistically significant shifts in relative nuclear speckle distances in pair-wise comparisons between human cell lines (H1, HFF, HCT116, K562); however, these moderate shifts in nuclear speckle distances tightly correlate with changes in cell type-specific gene expression. Similarly, half of heat-shock-induced gene loci are already prepositioned very close to nuclear speckles, while the remaining loci are positioned near or at an intermediate distance (HSPH1) from nuclear speckles but shift even closer upon transcriptional induction. Speckle association together with chromatin decondensation correlates with expression amplification upon HSPH1 activation. Our results demonstrate a largely "hardwired" genome organization with specific genes moving small mean distances relative to speckles during cell differentiation or physiological transition, suggesting an important role of nuclear speckles in gene expression regulation.
    DOI:  https://doi.org/10.1101/gr.266239.120
  23. Genome Res. 2020 Dec 18.
      Adenosine (A) to inosine (I) RNA editing contributes to transcript diversity and modulates gene expression in a dynamic, cell type-specific manner. During mammalian brain development, editing of specific adenosines increases, whereas the expression of A-to-I editing enzymes remains unchanged, suggesting that molecular mechanisms mediating spatiotemporal regulation of RNA editing exist. Herein, by using a combination of biochemical and genomic approaches, we uncover a molecular mechanism that regulates RNA editing in a neural- and development-specific manner. Comparing editomes during development led to the identification of neural transcripts that were edited in only one life stage. The stage-specific editing is largely regulated by differential gene expression during neural development. Proper expression of nearly one-third of the neurodevelopmentally regulated genes is dependent on adr-2, the sole A-to-I editing enzyme in C. elegans. However, we also identified a subset of neural transcripts that are edited and expressed throughout development. Despite a neural-specific down-regulation of adr-2 during development, the majority of these sites show increased editing in adult neural cells. Biochemical data suggest that ADR-1, a deaminase-deficient member of the adenosine deaminase acting on RNA (ADAR) family, competes with ADR-2 for binding to specific transcripts early in development. Our data suggest a model in which, during neural development, ADR-2 levels overcome ADR-1 repression, resulting in increased ADR-2 binding and editing of specific transcripts. Together, our findings reveal tissue- and development-specific regulation of RNA editing and identify a molecular mechanism that regulates ADAR substrate recognition and editing efficiency.
    DOI:  https://doi.org/10.1101/gr.267575.120
  24. Genome Res. 2020 Dec 23. pii: gr.266213.120. [Epub ahead of print]
      RNA sequencing is widely used to measure gene expression across a vast range of animal and plant tissues and conditions. Most studies of computational methods for gene expression analysis use simulated data to evaluate the accuracy of these methods. These simulations typically include reads generated from known genes at varying levels of expression. Until now, simulations did not include reads from noisy transcripts, including erroneous transcription, erroneous splicing, and other processes that affect transcription in living cells. Here we examine the effects of realistic amounts of transcriptional noise on the ability of leading computational methods to assemble and quantify the genes and transcripts in an RNA-sequencing experiment. We demonstrate that the inclusion of noise leads to systematic errors in the ability of these programs to measure expression, including systematic underestimates of transcript abundance levels and large increases in the number of false positive genes and transcripts. Our results also suggest that alignment-free computational methods sometimes fail to detect transcripts expressed at relatively low levels.
    DOI:  https://doi.org/10.1101/gr.266213.120
  25. Mol Cell. 2020 Dec 21. pii: S1097-2765(20)30897-2. [Epub ahead of print]
      Transcription factors regulate gene networks controlling normal hematopoiesis and are frequently deregulated in acute myeloid leukemia (AML). Critical to our understanding of the mechanism of cellular transformation by oncogenic transcription factors is the ability to define their direct gene targets. However, gene network cascades can change within minutes to hours, making it difficult to distinguish direct from secondary or compensatory transcriptional changes by traditional methodologies. To overcome this limitation, we devised cell models in which the AML1-ETO protein could be quickly degraded upon addition of a small molecule. The rapid kinetics of AML1-ETO removal, when combined with analysis of transcriptional output by nascent transcript analysis and genome-wide AML1-ETO binding by CUT&RUN, enabled the identification of direct gene targets that constitute a core AML1-ETO regulatory network. Moreover, derepression of this gene network was associated with RUNX1 DNA binding and triggered a transcription cascade ultimately resulting in myeloid differentiation.
    Keywords:  AML1-ETO; PRO-seq; PROTAC; RUNX1; RUNX1T1; degron tag; myeloid leukemia; nascent transcription
    DOI:  https://doi.org/10.1016/j.molcel.2020.12.005
  26. Nature. 2020 Dec 23.
      Focal chromosomal amplification contributes to the initiation of cancer by mediating overexpression of oncogenes [1-3], and to the development of cancer therapy resistance by increasing the expression of genes whose action diminishes the efficacy of anti-cancer drugs. Here we used whole-genome sequencing of clonal cell isolates that developed chemotherapeutic resistance to show that chromothripsis is a major driver of circular extrachromosomal DNA (ecDNA) amplification (also known as double minutes) through mechanisms that depend on poly(ADP-ribose) polymerases (PARP) and the catalytic subunit of DNA-dependent protein kinase (DNA-PKcs). Longitudinal analyses revealed that a further increase in drug tolerance is achieved by structural evolution of ecDNAs through additional rounds of chromothripsis. In situ Hi-C sequencing showed that ecDNAs preferentially tether near chromosome ends, where they re-integrate when DNA damage is present. Intrachromosomal amplifications that formed initially under low-level drug selection underwent continuing breakage-fusion-bridge cycles, generating amplicons more than 100 megabases in length that became trapped within interphase bridges and then shattered, thereby producing micronuclei whose encapsulated ecDNAs are substrates for chromothripsis. We identified similar genome rearrangement profiles linked to localized gene amplification in human cancers with acquired drug resistance or oncogene amplifications. We propose that chromothripsis is a primary mechanism that accelerates genomic DNA rearrangement and amplification into ecDNA and enables rapid acquisition of tolerance to altered growth conditions.
    DOI:  https://doi.org/10.1038/s41586-020-03064-z
  27. PLoS Biol. 2020 Dec;18(12): e3001001
      Histone variants expand chromatin functions in eukaryotic genomes. H2A.B genes are testis-expressed short histone H2A variants that arose in placental mammals. Their biological functions remain largely unknown. To investigate their function, we generated a knockout (KO) model that disrupts all three H2A.B genes in mice. We show that H2A.B KO males have globally altered chromatin structure in postmeiotic germ cells. Yet, they do not show impaired spermatogenesis or testis function. Instead, we find that H2A.B plays a crucial role postfertilization. Crosses between H2A.B KO males and females yield embryos with lower viability and reduced size. Using a series of genetic crosses that separate parental and zygotic contributions, we show that the H2A.B status of both the father and mother, but not of the zygote, affects embryonic viability and growth during gestation. We conclude that H2A.B is a novel parental-effect gene, establishing a role for short H2A histone variants in mammalian development. We posit that parental antagonism over embryonic growth drove the origin and ongoing diversification of short histone H2A variants in placental mammals.
    DOI:  https://doi.org/10.1371/journal.pbio.3001001
  28. BMC Med Genomics. 2020 Dec 28. 13(Suppl 11): 190
      BACKGROUND: Renal cell carcinoma (RCC) is a complex disease comprising several histological subtypes, the most frequent of which are clear cell renal cell carcinoma (ccRCC), papillary renal cell carcinoma (PRCC) and chromophobe renal cell carcinoma (ChRCC). Although many studies have investigated the molecular characteristics of the different RCC subtypes, our knowledge of the underlying mechanisms is still incomplete. Because molecular alterations are ultimately reflected at the pathway level, where they execute specific biological functions, characterizing pathway perturbations is crucial for understanding the tumorigenesis and development of RCC.
    METHODS: In this study, we investigated the pathway perturbations of various RCC subtypes relative to normal tissue, based on the differentially expressed genes within a given pathway. We explored the potential upstream regulators of subtype-specific pathways with Ingenuity Pathway Analysis (IPA). We also evaluated the relationships between subtype-specific pathways and clinical outcome with survival analysis.
    RESULTS: We carried out a pathway-based analysis to explore the mechanisms of the various RCC subtypes using TCGA RNA-seq data. Both commonly altered pathways and subtype-specific pathways were detected. To identify the distinctive characteristics of each subtype, we focused on subtype-specific perturbed pathways. Specifically, we observed that some of the altered pathways were regulated by several recurrent upstream regulators that showed different expression patterns among distinct RCC subtypes. We also noticed that a large number of perturbed pathways were controlled by subtype-specific upstream regulators. Moreover, we evaluated the relationships between perturbed pathways and clinical outcome. Prognostic pathways were identified, and their roles in tumor development and progression were inferred.
    CONCLUSIONS: In summary, we evaluated the relationships among pathway perturbations, upstream regulators and clinical outcome for different RCC subtypes. We hypothesized that alterations of common upstream regulators, together with those of subtype-specific upstream regulators, act in concert to affect downstream pathway perturbations, drive cancer initiation and influence prognosis. Our findings not only increase our understanding of the mechanisms of the various RCC subtypes but also provide targets for personalized therapeutic intervention.
    Keywords:  ChRCC; PRCC; Pathway perturbation; Prognostic pathway; RCC; Upstream regulator; ccRCC
    DOI:  https://doi.org/10.1186/s12920-020-00827-5
  29. Bioinformatics. 2020 Dec 26. pii: btaa1075. [Epub ahead of print]
      MOTIVATION: Histone post-translational modifications (PTMs) are involved in a variety of essential regulatory processes in the cell, including transcription control. Recent studies have shown that histone PTMs can be accurately predicted from transcription factor binding or DNase hypersensitivity data. Similarly, it has been shown that PTMs can be predicted from the underlying primary DNA sequence.
    RESULTS: In this study, we introduce a deep learning architecture called DeepPTM for predicting histone PTMs from transcription factor binding data and the primary DNA sequence. Extensive experimental results show that our deep learning model achieves higher prediction accuracy than the model proposed in Benveniste et al. (PNAS 2014) and DeepHistone (BMC Genomics 2019). The competitive advantage of our framework lies in the synergistic use of deep learning combined with an effective pre-processing step. Our classification framework has also enabled the discovery that knowledge of a small subset of transcription factors (which are histone-PTM- and cell-type-specific) can provide almost the same prediction accuracy as that obtained using data from all transcription factors.
    AVAILABILITY: https://github.com/dDipankar/DeepPTM.
    DOI:  https://doi.org/10.1093/bioinformatics/btaa1075