bims-ectoca Biomed News
on Epigenetic control of tolerance in cancer
Issue of 2021–07–18
63 papers selected by
Ankita Daiya, Birla Institute of Technology and Science



  1. Bioinformatics. 2021 07 12. 37(Suppl_1): i349-i357
       MOTIVATION: Recent advances in single-cell RNA-sequencing (scRNA-seq) technologies promise to enable the study of gene regulatory associations at unprecedented resolution in diverse cellular contexts. However, identifying unique regulatory associations observed only in specific cell types or conditions remains a key challenge; this is particularly so for rare transcriptional states whose sample sizes are too small for existing gene regulatory network inference methods to be effective.
    RESULTS: We present ShareNet, a Bayesian framework for boosting the accuracy of cell type-specific gene regulatory networks by propagating information across related cell types via an information sharing structure that is adaptively optimized for a given single-cell dataset. The techniques we introduce can be used with a range of general network inference algorithms to enhance the output for each cell type. We demonstrate the enhanced accuracy of our approach on three benchmark scRNA-seq datasets. We find that our inferred cell type-specific networks also uncover key changes in gene associations that underpin the complex rewiring of regulatory networks across cell types, tissues and dynamic biological processes. Our work presents a path toward extracting deeper insights about cell type-specific gene regulation in the rapidly growing compendium of scRNA-seq datasets.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    AVAILABILITY AND IMPLEMENTATION: The code for ShareNet is available at http://sharenet.csail.mit.edu and https://github.com/alexw16/sharenet.
    DOI:  https://doi.org/10.1093/bioinformatics/btab269
  2. Methods Mol Biol. 2021 ;2328 171-182
      With the advent of recent next-generation sequencing (NGS) technologies in genomics, transcriptomics, and epigenomics, profiling by single-cell sequencing has become possible. Single-cell RNA sequencing (scRNA-seq) is widely used to characterize diverse cell populations and ascertain cell type-specific regulatory mechanisms. The gene regulatory network (GRN) mainly consists of genes and their regulators, the transcription factors (TFs). Here, we describe the lightning-fast Python implementation of the SCENIC (Single-Cell rEgulatory Network Inference and Clustering) pipeline called pySCENIC. Using single-cell RNA-seq data, it maps TFs onto gene regulatory networks and integrates various cell types to infer cell-specific GRNs. There are two fast and efficient GRN inference algorithms, GRNBoost2 and GENIE3, optionally available with pySCENIC. The pipeline has three steps: (1) identification of potential TF targets based on co-expression; (2) TF-motif enrichment analysis to identify the direct targets (regulons); and (3) scoring the activity of regulons (or other gene sets) on single cell types.
    Keywords:  Gene co-expression network; Gene regulatory network; RNA-Seq count data; scRNA-seq
    DOI:  https://doi.org/10.1007/978-1-0716-1534-8_10
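      Step (3) of the pipeline above, scoring regulon activity per cell, can be illustrated with a deliberately simplified stand-in. The sketch below does not call pySCENIC and does not reproduce its scoring method; it assumes a hypothetical helper, regulon_activity, that reports the fraction of a regulon's genes found among each cell's top_n most highly expressed genes. The toy matrix and gene indices are invented for illustration.

```python
import numpy as np

def regulon_activity(expr, regulon_idx, top_n):
    """Fraction of regulon genes found among each cell's top_n expressed genes.

    expr: (cells x genes) array; regulon_idx: gene column indices in the regulon.
    This is a simplified rank-based score, not pySCENIC's own scoring.
    """
    expr = np.asarray(expr, dtype=float)
    # Column indices of the top_n most highly expressed genes in every cell
    top = np.argsort(-expr, axis=1, kind="stable")[:, :top_n]
    # Mark which of those top genes belong to the regulon
    in_regulon = np.isin(top, regulon_idx)
    return in_regulon.sum(axis=1) / len(regulon_idx)

# Toy data: 2 cells x 5 genes; the hypothetical regulon is genes 0 and 1
expr = np.array([
    [9.0, 8.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 7.0, 0.0, 6.0],
])
scores = regulon_activity(expr, regulon_idx=[0, 1], top_n=2)
print(scores)
```

      In this toy case the first cell expresses both regulon genes most highly and the second expresses neither, so the two cells sit at opposite ends of the score range.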
  3. Biocell. 2021 ;45(5): 1167-1170
      Single-cell sequencing data has transformed the understanding of biological heterogeneity. While many flavors of single-cell sequencing have been developed, single-cell RNA sequencing (scRNA-seq) is currently the most prolific form in published literature. Bioinformatic analysis of differential biology within the population of cells studied relies on inferences and grouping of cells due to the spotty nature of data within individual cell scRNA-seq gene counts. One biologically relevant variable is readily inferred from scRNA-seq gene count tables regardless of individual gene representation within single cells: aneuploidy. Since hundreds of genes are present on chromosome arms, high-quality inferences of aneuploidy can be made from scRNA-seq datasets. This viewpoint summarizes how utilization of these bioinformatic pipelines can benefit scRNA-seq studies, particularly in oncology wherein aneuploidy is both rampant and a hallmark of the studied disease. Awareness and use of these analytical pipelines will improve each field's ability to understand the studied diseases. Authors are encouraged to attempt these aneuploid analyses when reporting scRNA-seq data, much like copy-number variants are commonly reported in bulk genome sequencing data.
    Keywords:  Aneuploidy; Cancer; Copy-number alterations; scRNA-seq
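      The idea that arm-level copy-number changes can be read from gene counts, because many genes are averaged per chromosome arm, can be sketched as follows. This is a minimal illustration under stated assumptions, not any published pipeline: the function arm_scores, the log2(count + 1) transform, the use of designated reference cells as a baseline, and the toy data are all hypothetical choices. Real tools add normalization and smoothing steps not shown here.

```python
import numpy as np

def arm_scores(expr, arm_of_gene, reference_rows):
    """Mean log2 expression per chromosome arm, relative to reference cells.

    expr: (cells x genes) counts; arm_of_gene: arm label for each gene column;
    reference_rows: row indices of cells assumed to be diploid.
    """
    expr = np.asarray(expr, dtype=float)
    log_expr = np.log2(expr + 1.0)
    # Per-gene baseline taken from the assumed-normal reference cells
    baseline = log_expr[reference_rows].mean(axis=0)
    rel = log_expr - baseline
    labels = np.asarray(arm_of_gene)
    arms = sorted(set(arm_of_gene))
    # Average the relative expression over all genes on each arm
    scores = np.column_stack([rel[:, labels == a].mean(axis=1) for a in arms])
    return arms, scores

# Toy data: 3 cells x 6 genes; cell 2 has elevated expression across arm 1q
expr = np.array([
    [3, 3, 3, 3, 3, 3],
    [3, 3, 3, 3, 3, 3],
    [3, 3, 3, 7, 7, 7],
])
arm_of_gene = ["1p", "1p", "1p", "1q", "1q", "1q"]
arms, scores = arm_scores(expr, arm_of_gene, reference_rows=[0, 1])
print(arms)
print(scores)
```

      A positive score on an arm suggests a relative gain and a negative score a relative loss; averaging over many genes is what makes the signal usable despite sparse per-gene counts.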
  4. Nucleic Acids Res. 2021 Jul 09. pii: gkab581. [Epub ahead of print]
      Though single cell RNA sequencing (scRNA-seq) technologies have been well developed, the acquisition of large-scale single cell expression data may still lead to high costs. Single cell expression profiles are inherently sparse, which makes them compressible and thus provides opportunities for solutions. Here, by computational simulation as well as an experiment on 54 single cells, we propose that expression profiles can be compressed along the sample dimension by assigning each cell to multiple overlapping pools. We also prove that expression profiles can be inferred from these pooled expression data with an overlapped pooling design and a compressed sensing strategy. We further show that, by combining this approach with plate-based scRNA-seq measurement, it can maintain its advantages in gene detection sensitivity and individual identity and recover the expression profile with high precision, while saving about half of the library cost. This method can inspire novel ideas for improving the measurement, storage or computation of other compressible signals in many biological areas.
    DOI:  https://doi.org/10.1093/nar/gkab581
  5. World J Stem Cells. 2021 Jun 26. 13(6): 542-567
      Aberrant epigenetic alterations play a decisive role in cancer initiation and propagation via the regulation of key tumor suppressor genes and oncogenes or by modulation of essential signaling pathways. Autophagy is a highly regulated mechanism required for the recycling and degradation of surplus and damaged cytoplasmic constituents in a lysosome dependent manner. In cancer, autophagy has a divergent role. For instance, autophagy elicits tumor promoting functions by facilitating metabolic adaption and plasticity in cancer stem cells (CSCs) and cancer cells. Moreover, autophagy exerts pro-survival mechanisms to these cancerous cells by influencing survival, dormancy, immunosurveillance, invasion, metastasis, and resistance to anti-cancer therapies. In addition, recent studies have demonstrated that various tumor suppressor genes and oncogenes involved in autophagy, are tightly regulated via different epigenetic modifications, such as DNA methylation, histone modifications and non-coding RNAs. The impact of epigenetic regulation of autophagy in cancer cells and CSCs is not well-understood. Therefore, uncovering the complex mechanism of epigenetic regulation of autophagy provides an opportunity to improve and discover novel cancer therapeutics. Subsequently, this would aid in improving clinical outcome for cancer patients. In this review, we provide a comprehensive overview of the existing knowledge available on epigenetic regulation of autophagy and its importance in the maintenance and homeostasis of CSCs and cancer cells.
    Keywords:  Autophagy; Cancer cells; Cancer stem cells; DNA methylation; Epigenetics; Histone remodeling; Non-coding RNA
    DOI:  https://doi.org/10.4252/wjsc.v13.i6.542
  6. Bioinformatics. 2021 07 12. 37(Suppl_1): i51-i58
       MOTIVATION: Single-cell RNA sequencing (scRNA-seq) technology has been widely applied to capture the heterogeneity of different cell types within complex tissues. An essential step in scRNA-seq data analysis is the annotation of cell types. Traditional cell-type annotation is mainly clustering the cells first, and then using the aggregated cluster-level expression profiles and the marker genes to label each cluster. Such methods are greatly dependent on the clustering results, which are insufficient for accurate annotation.
    RESULTS: In this article, we propose a semi-supervised learning method for cell-type annotation called CALLR. It combines unsupervised learning represented by the graph Laplacian matrix constructed from all the cells and supervised learning using sparse logistic regression. By alternately updating the cell clusters and annotation labels, high annotation accuracy can be achieved. The model is formulated as an optimization problem, and a computationally efficient algorithm is developed to solve it. Experiments on 10 real datasets show that CALLR outperforms the compared (semi-)supervised learning methods, and the popular clustering methods.
    AVAILABILITY AND IMPLEMENTATION: The implementation of CALLR is available at https://github.com/MathSZhang/CALLR.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btab286
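      The unsupervised half of the method above rests on a graph Laplacian built from all cells. The sketch below shows only the standard unnormalized Laplacian L = D - W and the smoothness penalty it induces on a label vector; how CALLR builds its affinity matrix and combines the penalty with sparse logistic regression is not stated in the abstract, so the affinity values and the function name graph_laplacian are assumptions for illustration.

```python
import numpy as np

def graph_laplacian(affinity):
    """Unnormalized graph Laplacian L = D - W for a cell-cell affinity matrix."""
    w = np.asarray(affinity, dtype=float)
    w = (w + w.T) / 2.0          # enforce symmetry
    np.fill_diagonal(w, 0.0)     # ignore self-similarity
    degree = np.diag(w.sum(axis=1))
    return degree - w

# Toy affinity between 3 cells (hypothetical values)
W = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 2.0],
    [0.0, 2.0, 0.0],
])
L = graph_laplacian(W)

# Smoothness penalty f^T L f = sum over edges of w_ij * (f_i - f_j)^2
f = np.array([0.0, 0.0, 1.0])
penalty = float(f @ L @ f)
print(L)
print(penalty)
```

      A labelling that separates strongly connected cells pays a large penalty, which is why a Laplacian term pushes similar cells toward the same annotation.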
  7. Bioinformatics. 2021 07 12. 37(Suppl_1): i358-i366
       MOTIVATION: Single-cell RNA sequencing (scRNA-seq) captures whole transcriptome information of individual cells. While scRNA-seq measures thousands of genes, researchers are often interested in only dozens to hundreds of genes for a closer study. Then, a question is how to select those informative genes from scRNA-seq data. Moreover, single-cell targeted gene profiling technologies are gaining popularity for their low costs, high sensitivity and extra (e.g. spatial) information; however, they typically can only measure up to a few hundred genes. Then another challenging question is how to select genes for targeted gene profiling based on existing scRNA-seq data.
    RESULTS: Here, we develop the single-cell Projective Non-negative Matrix Factorization (scPNMF) method to select informative genes from scRNA-seq data in an unsupervised way. Compared with existing gene selection methods, scPNMF has two advantages. First, its selected informative genes can better distinguish cell types. Second, it enables the alignment of new targeted gene profiling data with reference data in a low-dimensional space to facilitate the prediction of cell types in the new data. Technically, scPNMF modifies the PNMF algorithm for gene selection by changing the initialization and adding a basis selection step, which selects informative bases to distinguish cell types. We demonstrate that scPNMF outperforms the state-of-the-art gene selection methods on diverse scRNA-seq datasets. Moreover, we show that scPNMF can guide the design of targeted gene profiling experiments and the cell-type annotation on targeted gene profiling data.
    AVAILABILITY AND IMPLEMENTATION: The R package is open-access and available at https://github.com/JSB-UCLA/scPNMF. The data used in this work are available at Zenodo: https://doi.org/10.5281/zenodo.4797997.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btab273
  8. Lab Invest. 2021 Jul 09.
      Single-cell RNA sequencing (scRNA-seq) data has been widely used to profile cellular heterogeneities with a high-resolution picture. Clustering analysis is a crucial step of scRNA-seq data analysis because it provides a chance to identify and uncover undiscovered cell types. Most methods for clustering scRNA-seq data use an unsupervised learning strategy. Since the clustering step is separated from the cell annotation and labeling step, it is not uncommon for a totally exotic clustering with poor biological interpretability to be generated, a result generally undesired by biologists. To solve this problem, we proposed an active learning (AL) framework for clustering scRNA-seq data. The AL model employed a learning algorithm that can actively query biologists for labels, and this manual labeling is expected to be applied to only a subset of cells. To develop an optimal active learning approach, we explored several key parameters of the AL model in the experiments with four real scRNA-seq datasets. We demonstrate that the proposed AL model outperformed state-of-the-art unsupervised clustering methods with fewer than 1000 labeled cells. Therefore, we conclude that the AL model is a promising tool for clustering scRNA-seq data that allows us to achieve superior performance effectively and efficiently.
    DOI:  https://doi.org/10.1038/s41374-021-00639-w
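      The query step of an active learning loop, in which the model chooses which cells to send to a biologist for labeling, can be sketched as below. The abstract does not say which query strategy the authors use; least-confidence sampling is assumed here as one common choice, and the function name and probability values are hypothetical.

```python
import numpy as np

def least_confident(probabilities, n_queries):
    """Return indices of the n_queries cells whose top class probability is lowest."""
    probabilities = np.asarray(probabilities, dtype=float)
    # Confidence = probability of the most likely cell type for each cell
    confidence = probabilities.max(axis=1)
    order = np.argsort(confidence, kind="stable")
    return order[:n_queries]

# Toy predicted cell-type probabilities for 4 cells and 3 types
probs = np.array([
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],
])
to_label = least_confident(probs, n_queries=2)
print(to_label)
```

      In a full loop the returned cells would be labeled by an expert, added to the training set, and the classifier refitted before the next round of queries.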
  9. Nucleic Acids Res. 2021 Jul 09. pii: gkab457. [Epub ahead of print]
      Dynamic regulation of gene expression is often governed by progression through transient cell states. Bulk RNA-seq analysis can only detect average change in expression levels and is unable to identify these dynamics. Single cell RNA-seq presents an unprecedented opportunity that helps in placing the cells on a hypothetical time trajectory that reflects gradual transition of their transcriptomes. This continuum trajectory, or 'pseudotime', may reveal the developmental pathway and provide us with information on dynamic transcriptomic changes and other biological processes. Existing approaches to build pseudotime depend heavily on reducing a very high-dimensional space to extremely low-dimensional subspaces and may lead to loss of information. We propose PseudoGA, a genetic algorithm based approach to order cells assuming that gene expressions vary according to a smooth curve along the pseudotime trajectory. We observe superior accuracy of our method in simulated as well as real benchmarking datasets. The generality of the assumption behind PseudoGA and its independence from any dimensionality reduction technique make it a robust choice for pseudotime estimation from single cell transcriptome data. PseudoGA is also time efficient when applied to large single cell RNA-seq data and adaptable to parallel computing. R code for PseudoGA is freely available at https://github.com/indranillab/pseudoga.
    DOI:  https://doi.org/10.1093/nar/gkab457
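      The smoothness assumption above can be made concrete as a cost on candidate cell orderings, which a genetic algorithm could then minimize by searching over permutations. The cost used below, the summed absolute change in expression between consecutive cells, is an illustrative assumption and is not taken from PseudoGA; only the scoring of two fixed orderings is shown, not the search itself.

```python
import numpy as np

def roughness(expr, order):
    """Total absolute expression change between consecutive cells in an ordering.

    expr: (cells x genes) array; order: a permutation of the cell indices.
    Lower values mean expression varies more smoothly along the ordering.
    """
    ordered = np.asarray(expr, dtype=float)[np.asarray(order)]
    return float(np.abs(np.diff(ordered, axis=0)).sum())

# Toy data: 4 cells x 2 genes, one gene rising and one falling over time
expr = np.array([
    [0.0, 5.0],
    [2.0, 3.0],
    [1.0, 4.0],
    [3.0, 2.0],
])
smooth = roughness(expr, [0, 2, 1, 3])    # cells placed in their true order
shuffled = roughness(expr, [0, 3, 2, 1])  # an arbitrary alternative ordering
print(smooth, shuffled)
```

      An ordering-based cost of this kind works directly on the full expression matrix, which matches the abstract's point that no dimensionality reduction is required.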
  10. Bioinformatics. 2021 07 12. 37(Suppl_1): i214-i221
       MOTIVATION: While single-cell DNA sequencing (scDNA-seq) has enabled the study of intratumor heterogeneity at an unprecedented resolution, current technologies are error-prone and often result in doublets where two or more cells are mistaken for a single cell. Not only do doublets confound downstream analyses, but the increase in doublet rate is also a major bottleneck preventing higher throughput with current single-cell technologies. Although doublet detection and removal are standard practice in scRNA-seq data analysis, options for scDNA-seq data are limited. Current methods attempt to detect doublets while also performing complex downstream analyses tasks, leading to decreased efficiency and/or performance.
    RESULTS: We present doubletD, the first standalone method for detecting doublets in scDNA-seq data. Underlying our method is a simple maximum likelihood approach with a closed-form solution. We demonstrate the performance of doubletD on simulated data as well as real datasets, outperforming current methods for downstream analysis of scDNA-seq data that jointly infer doublets as well as standalone approaches for doublet detection in scRNA-seq data. Incorporating doubletD in scDNA-seq analysis pipelines will reduce complexity and lead to more accurate results.
    AVAILABILITY AND IMPLEMENTATION: https://github.com/elkebir-group/doubletD.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btab266
  11. Brief Bioinform. 2021 Jul 13. pii: bbab267. [Epub ahead of print]
      Epigenetic aberrations have played a significant role in affecting the pathophysiological state of colorectal cancer, and global DNA hypomethylation mainly occurs in partial methylation domains (PMDs). However, the distribution of PMDs in individual cells and the heterogeneity between cells are still unclear. In this study, the DNA methylation profiles of colorectal cancer detected by WGBS and scBS-seq were used to depict PMDs in individual cells for the first time. We found that more than half of the entire genome is covered by PMDs. Three subclasses of PMDs have distinct characteristics, and Gain-PMDs cover a higher proportion of protein coding genes. Gain-PMDs have extensive epigenetic heterogeneity between different cells of the same tumor, and the DNA methylation in cells is affected by the tumor microenvironment. In addition, abnormally elevated promoter methylation in Gain-PMDs may further promote the growth, proliferation and metastasis of tumor cells through silent transcription. The PMDs detected in this study have potential as epigenetic biomarkers and provide a new insight for colorectal cancer research based on single-cell methylation data.
    Keywords:  heterogeneity; partial methylation domains; single cell
    DOI:  https://doi.org/10.1093/bib/bbab267
  12. Methods Mol Biol. 2021 ;2328 153-170
      Single-cell RNAseq is an emerging technology that allows the quantification of gene expression in individual cells. In plants, single-cell sequencing technology has been applied to generate root cell expression maps under many experimental conditions. DAP-seq and ATAC-seq have also been used to generate genome-scale maps of protein-DNA interactions and open chromatin regions in plants. In this protocol, we describe a multistep computational pipeline for the integration of single-cell RNAseq data with DAP-seq and ATAC-seq data to predict regulatory networks and key regulatory genes. Our approach utilizes machine learning methods including feature selection and stability selection to identify candidate regulatory genes. The network generated by this pipeline can be used to provide a putative annotation of gene regulatory modules and to identify candidate transcription factors that could play a key role in specific cell types.
    Keywords:  ATAC-seq; DAP-seq; Machine learning; Single-cell RNAseq
    DOI:  https://doi.org/10.1007/978-1-0716-1534-8_9
  13. Front Genet. 2021 ;12 689406
      Pigs are a valuable human biomedical model and an important protein source supporting global food security. The transcriptomes of peripheral blood immune cells in pigs were defined at the bulk cell-type and single cell levels. First, eight cell types were isolated in bulk from peripheral blood mononuclear cells (PBMCs) by cell sorting, representing Myeloid, NK cells and specific populations of T and B-cells. Transcriptomes for each bulk population of cells were generated by RNA-seq with 10,974 expressed genes detected. Pairwise comparisons between cell types revealed specific expression, while enrichment analysis identified 1,885 to 3,591 significantly enriched genes across all 8 cell types. Gene Ontology analysis for the top 25% of significantly enriched genes (SEG) showed high enrichment of biological processes related to the nature of each cell type. Comparison of gene expression indicated highly significant correlations between pig cells and corresponding human PBMC bulk RNA-seq data available in Haemopedia. Second, higher resolution of distinct cell populations was obtained by single-cell RNA-sequencing (scRNA-seq) of PBMC. Seven PBMC samples were partitioned and sequenced that produced 28,810 single cell transcriptomes distributed across 36 clusters and classified into 13 general cell types including plasmacytoid dendritic cells (DC), conventional DCs, monocytes, B-cell, conventional CD4 and CD8 αβ T-cells, NK cells, and γδ T-cells. Signature gene sets from the human Haemopedia data were assessed for relative enrichment in genes expressed in pig cells and integration of pig scRNA-seq with a public human scRNA-seq dataset provided further validation for similarity between human and pig data. The sorted porcine bulk RNAseq dataset informed classification of scRNA-seq PBMC populations; specifically, an integration of the datasets showed that the pig bulk RNAseq data helped define the CD4CD8 double-positive T-cell populations in the scRNA-seq data. Overall, the data provides deep and well-validated transcriptomic data from sorted PBMC populations and the first single-cell transcriptomic data for porcine PBMCs. This resource will be invaluable for annotation of pig genes controlling immunogenetic traits as part of the porcine Functional Annotation of Animal Genomes (FAANG) project, as well as further study of, and development of new reagents for, porcine immunology.
    Keywords:  FAANG; bulkRNA-seq; immune cells; pig; single-cell RNA-seq; transcriptome
    DOI:  https://doi.org/10.3389/fgene.2021.689406
  14. Pathol Oncol Res. 2021 ;27 604228
      Teratoma is a type of germ cell tumor that originates from totipotential germ cells that are present in gonads, which can differentiate into any of the cell types found in adult tissues. Ovarian teratomas are usually mature cystic teratomas (OMCTs, also known as dermoid cysts). Chromosome studies in OMCTs show that the chromosomes are uniformly homozygous with karyotype of 46, XX, indicating that they may be parthenogenic tumors that arise from a single ovum after the first meiotic division. However, the tissues in OMCTs have been known to be morphologically and immunophenotypically identical to the orthotopic tissues. Currently, expression profiles of tissue components in OMCTs are not known. To identify whether OMCT tissues are expressionally similar to or different from the orthotopic tissues, we adopted single-cell RNA-sequencing (scRNA-seq), and analyzed transcriptomes of individual cells in heterogeneous tissues of two OMCTs. We found that transcriptome profiles of the OMCTs at single cell level were not significantly different from those of normal cells in orthotopic locations. The present data suggest that parthenogenetically altered OMCTs may not alter expression profiles of individual tissue components in OMCTs.
    Keywords:  expression profile; ovarian tumor; single cell; teratoma; transcriptome
    DOI:  https://doi.org/10.3389/pore.2021.604228
  15. Inflamm Regen. 2021 Jul 16. 41(1): 22
      Even within a single type of cancer, cells of various types exist and play interrelated roles. Each of the individual cells resides in a distinct microenvironment and behaves differently. Such heterogeneity is the most cumbersome nature of cancers, which is occasionally uncountable when effective prevention or total elimination of cancers is attempted. To understand the heterogeneous nature of each cell, the use of conventional methods for the analysis of "bulk" cells is insufficient. Although some methods are high-throughput and comprehensive regarding the genes being detected, the obtained data would be from the cell mass, and the average of a large number of the component cells would no longer be measured. Single-cell analysis, which has developed rapidly in recent years, is causing a drastic change. Genome, transcriptome, and epigenome analyses at single-cell resolution currently target cancer cells, cancer-associated fibroblasts, endothelial cells of vessels, and circulating and infiltrating immune cells. In fact, surprisingly diverse features of clonal evolution of cancer cells, during the development of cancer or acquisition of drug resistance, accompanied by corresponding gene expression changes in the circumstantial stromal cells, appeared in recent single-cell analyses. Based on the obtained novel insights, better optimal drug selection and new drug administration sequences were started. Even a remaining concern of the single cell analyses is being addressed. Until very recently, it was impossible to obtain positional information of cells in cancer via single-cell analysis because such information is lost during preparation of single-cell suspensions. A new method, collectively called spatial transcriptome (ST) analysis, has been developed and rapidly applied to various clinical specimens. In this review, we first outline the recent achievements of single-cell cancer analysis in analyzing the molecular basis underlying the acquisition of drug resistance, particularly focusing on the latest anti-epidermal growth factor receptor tyrosine kinase inhibitor, osimertinib. Further, we review the currently available ST analysis methods and introduce our recent attempts regarding the respective topics.
    Keywords:  Anticancer drug resistance; Single-cell RNA-seq; Single-cell multiome analysis; Spatial transcriptome analysis
    DOI:  https://doi.org/10.1186/s41232-021-00170-x
  16. Nat Rev Cardiol. 2021 Jul 15.
      Inflammation is intimately involved at all stages of atherosclerosis and remains a substantial residual cardiovascular risk factor in optimally treated patients. The proof of concept that targeting inflammation reduces cardiovascular events in patients with a history of myocardial infarction has highlighted the urgent need to identify new immunotherapies to treat patients with atherosclerotic cardiovascular disease. Importantly, emerging data from new clinical trials show that successful immunotherapies for atherosclerosis need to be tailored to the specific immune alterations in distinct groups of patients. In this Review, we discuss how single-cell technologies - such as single-cell mass cytometry, single-cell RNA sequencing and cellular indexing of transcriptomes and epitopes by sequencing - are ideal for mapping the cellular and molecular composition of human atherosclerotic plaques and how these data can aid in the discovery of new precise immunotherapies. We also argue that single-cell data from studies in humans need to be rigorously validated in relevant experimental models, including rapidly emerging single-cell CRISPR screening technologies and mouse models of atherosclerosis. Finally, we discuss the importance of implementing single-cell immune monitoring tools in early phases of drug development to aid in the precise selection of the target patient population for data-driven translation into randomized clinical trials and the successful translation of new immunotherapies into the clinic.
    DOI:  https://doi.org/10.1038/s41569-021-00589-2
  17. Genomics Proteomics Bioinformatics. 2021 Jul 09. pii: S1672-0229(21)00145-5. [Epub ahead of print]
      A system-level understanding of the regulation and coordination mechanisms of gene expression is essential to studying the complexity of biological processes in health and disease. With the rapid development of single-cell RNA sequencing technologies, it is now possible to investigate gene interactions in a cell-type-specific manner. Here we propose the scLink method, which uses statistical network modeling to understand the co-expression relationships among genes and construct sparse gene co-expression networks from single-cell gene expression data. We use both simulation and real data studies to demonstrate the advantages of scLink and its ability to improve single-cell gene network analysis. The scLink R package is available at https://github.com/Vivianstats/scLink.
    Keywords:  Gene co-expression networks; Network modeling; Robust correlation; Single cell RNA sequencing
    DOI:  https://doi.org/10.1016/j.gpb.2020.11.006
  18. Bioinformatics. 2021 07 12. 37(Suppl_1): i317-i326
       MOTIVATION: Single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) provides new opportunities to dissect epigenomic heterogeneity and elucidate transcriptional regulatory mechanisms. However, computational modeling of scATAC-seq data is challenging due to its high dimension, extreme sparsity, complex dependencies and high sensitivity to confounding factors from various sources.
    RESULTS: Here, we propose a new deep generative model framework, named SAILER, for analyzing scATAC-seq data. SAILER aims to learn a low-dimensional nonlinear latent representation of each cell that defines its intrinsic chromatin state, invariant to extrinsic confounding factors like read depth and batch effects. SAILER adopts the conventional encoder-decoder framework to learn the latent representation but imposes additional constraints to ensure the independence of the learned representations from the confounding factors. Experimental results on both simulated and real scATAC-seq datasets demonstrate that SAILER learns better and biologically more meaningful representations of cells than other methods. Its noise-free cell embeddings bring in significant benefits in downstream analyses: clustering and imputation based on SAILER result in 6.9% and 18.5% improvements over existing methods, respectively. Moreover, because no matrix factorization is involved, SAILER can easily scale to process millions of cells. We implemented SAILER into a software package, freely available to all for large-scale scATAC-seq data analysis.
    AVAILABILITY AND IMPLEMENTATION: The software is publicly available at https://github.com/uci-cbcl/SAILER.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btab303
  19. Nat Protoc. 2021 Jul 09.
      The integration of DNA methylation and transcriptional state within single cells is of broad interest. Several single-cell dual- and multi-omics approaches have been reported that enable further investigation into cellular heterogeneity, including the discovery and in-depth study of rare cell populations. Such analyses will continue to provide important mechanistic insights into the regulatory consequences of epigenetic modifications. We recently reported a new method for profiling the DNA methylome and transcriptome from the same single cells in a cancer research study. Here, we present details of the protocol and provide guidance on its utility. Our Smart-RRBS (reduced representation bisulfite sequencing) protocol combines Smart-seq2 and RRBS and entails physically separating mRNA from the genomic DNA. It generates paired epigenetic promoter and RNA-expression measurements for ~24% of protein-coding genes in a typical single cell. It also works for micro-dissected tissue samples comprising hundreds of cells. The protocol, excluding flow sorting of cells and sequencing, takes ~3 d to process up to 192 samples manually. It requires basic molecular biology expertise and laboratory equipment, including a PCR workstation with UV sterilization, a DNA fluorometer and a microfluidic electrophoresis system.
    DOI:  https://doi.org/10.1038/s41596-021-00571-9
  20. Bioinformatics. 2021 07 12. 37(Suppl_1): i299-i307
       MOTIVATION: Single-cell RNA sequencing (scRNA-seq) techniques have revolutionized the investigation of transcriptomic landscape in individual cells. Recent advancements in spatial transcriptomic technologies further enable gene expression profiling and spatial organization mapping of cells simultaneously. Among the technologies, imaging-based methods can offer higher spatial resolutions, while they are limited by either the small number of genes imaged or the low gene detection sensitivity. Although several methods have been proposed for enhancing spatially resolved transcriptomics, inadequate accuracy of gene expression prediction and insufficient ability of cell-population identification still impede the applications of these methods.
    RESULTS: We propose stPlus, a reference-based method that leverages information in scRNA-seq data to enhance spatial transcriptomics. Based on an auto-encoder with a carefully tailored loss function, stPlus performs joint embedding and predicts spatial gene expression via a weighted k-nearest-neighbor. stPlus outperforms baseline methods with higher gene-wise and cell-wise Spearman correlation coefficients. We also introduce a clustering-based approach to assess the enhancement performance systematically. Using the data enhanced by stPlus, cell populations can be better identified than using the measured data. The predicted expression of genes unique to scRNA-seq data can also well characterize spatial cell heterogeneity. Besides, stPlus is robust and scalable to datasets of diverse gene detection sensitivity levels, sample sizes and number of spatially measured genes. We anticipate stPlus will facilitate the analysis of spatial transcriptomics.
    AVAILABILITY AND IMPLEMENTATION: stPlus with detailed documents is freely accessible at http://health.tsinghua.edu.cn/software/stPlus/ and the source code is openly available on https://github.com/xy-chen16/stPlus.
    DOI:  https://doi.org/10.1093/bioinformatics/btab298
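The weighted k-nearest-neighbor prediction step mentioned in the stPlus abstract can be illustrated with a minimal sketch. It assumes the cells already sit in a joint embedding (here, hand-written 2-D coordinates) and uses inverse-distance weights. The function name `knn_predict`, the weighting scheme and the toy data are illustrative assumptions, not the stPlus implementation.

```python
import math

def knn_predict(spatial_cell, ref_embed, ref_expr, k=2, eps=1e-9):
    """Predict one gene's expression for a spatial cell as the
    inverse-distance-weighted mean over its k nearest reference cells."""
    # Euclidean distance from the spatial cell to every reference cell
    dists = [math.dist(spatial_cell, r) for r in ref_embed]
    # Indices of the k closest reference cells
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    # Closer neighbors get larger weights; eps avoids division by zero
    weights = [1.0 / (dists[i] + eps) for i in nearest]
    total = sum(weights)
    return sum(w * ref_expr[i] for w, i in zip(weights, nearest)) / total

# Hypothetical joint embedding of three scRNA-seq reference cells
ref_embed = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
# Hypothetical expression of one gene that was not measured spatially
ref_expr = [1.0, 3.0, 100.0]

# A spatial cell halfway between the first two reference cells
pred = knn_predict((1.0, 0.0), ref_embed, ref_expr, k=2)
print(round(pred, 3))
```

With k=2 the distant third reference cell is ignored, so the prediction is driven only by the two neighbors that are close in the embedding.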
  21. Biotechniques. 2021 Jul 16.
      Single-cell RNA sequencing (scRNA-seq) of the bronchial epithelium enables examination of cellular subtypes and their responses to viral infections. Here, an optimized method for the isolation of virally infected primary bronchial epithelial cells using a commercially available microfluidic device is presented. Using this method single cells can be rapidly isolated with minimal equipment available in most laboratories. Isolation can be carried out inside biological safety cabinets, permitting the use of virally infected cells. Both cell-line and primary cells isolated using the device retained sufficient RNA integrity for the generation of short-read sequencing-compatible cDNA libraries to facilitate scRNA-seq.
    Keywords:  bronchial epithelium; microfluidic device; scRNA-seq; single cell isolation; virus
    DOI:  https://doi.org/10.2144/btn-2021-0020
  22. Nat Commun. 2021 07 14. 12(1): 4316
      Molecular single cell analyses provide insights into physiological and pathological processes. Here, in a stepwise approach, we first evaluate 19 protocols for single cell small RNA sequencing on MCF7 cells spiked with 1 pg of 1,006 miRNAs. Second, we analyze MCF7 single cell equivalents of the eight best protocols. Third, we sequence single cells from eight different cell lines and 67 circulating tumor cells (CTCs) from seven SCLC patients. Altogether, we analyze 244 different samples. We observe high reproducibility within protocols and reads covered a broad spectrum of RNAs. For the 67 CTCs, we detect a median of 68 miRNAs, with 10 miRNAs being expressed in 90% of tested cells. Enrichment analysis suggested the lung as the most likely organ of origin and enrichment of cancer-related categories. Even the identification of non-annotated candidate miRNAs was feasible, underlining the potential of single cell small RNA sequencing.
    DOI:  https://doi.org/10.1038/s41467-021-24611-w
  23. EMBO Mol Med. 2021 Jul 13. e13189
      Advances in sequencing technology have enabled the genomic and transcriptomic characterization of human malignancies with unprecedented detail. However, this wealth of information has been slow to translate into clinically meaningful outcomes. Different models to study human cancers have been established and extensively characterized. Using these models, functional genomic screens and pre-clinical drug screening platforms have identified genetic dependencies that can be exploited with drug therapy. These genetic dependencies can also be used as biomarkers to predict response to treatment. For many cancers, the identification of such biomarkers remains elusive. In this review, we discuss the development and characterization of models used to study human cancers, RNA interference and CRISPR screens to identify genetic dependencies, large-scale pharmacogenomics studies and drug screening approaches to improve pre-clinical drug screening and biomarker discovery.
    Keywords:  biomarker discovery; cancer models; drug screening; pharmacogenomics; single-cell sequencing
    DOI:  https://doi.org/10.15252/emmm.202013189
  24. Cancer Res. 2021 Jul 09. pii: canres.2811.2020. [Epub ahead of print]
      Tumor heterogeneity underlies resistance to tyrosine kinase inhibitors (TKI) in lung cancers harboring epidermal growth factor receptor (EGFR) mutations. Previous evidence suggested that subsets of preexisting resistant cells are selected by EGFR-TKI treatment, or alternatively, that diverse acquired resistance mechanisms emerge from drug-tolerant persister (DTP) cells. Many studies have used bulk tumor specimens or subcloned resistant cell lines to identify resistance mechanisms. However, intratumoral heterogeneity can result in divergent responses to therapies, requiring additional approaches to reveal the complete spectrum of resistance mechanisms. Using EGFR-TKI-resistant cell models and clinical specimens, we performed single-cell RNA-seq and single-cell ATAC-seq analyses to define the transcriptional and epigenetic landscape of parental cells, DTPs, and tumor cells in a fully resistant state. In addition to AURKA, VIM, and AXL, which are all known to induce EGFR-TKI resistance, CD74 was identified as a novel gene that plays a critical role in the drug-tolerant state. In vitro and in vivo experiments demonstrated that CD74 upregulation confers resistance to the EGFR-TKI osimertinib and blocks apoptosis, enabling tumor regrowth. Overall, this study provides new insight into the mechanisms underlying resistance to EGFR-TKIs.
    DOI:  https://doi.org/10.1158/0008-5472.CAN-20-2811
  25. Bioinformatics. 2021 Jul 13. pii: btab499. [Epub ahead of print]
       MOTIVATION: The emergence of single-cell RNA sequencing (scRNA-seq) has led to an explosion in novel methods to study biological variation among individual cells, and to classify cells into functional and biologically meaningful categories.
    RESULTS: Here, we present a new cell type projection tool, HieRFIT (Hierarchical Random Forest for Information Transfer), based on hierarchical random forests. HieRFIT uses a priori information about cell type relationships to improve classification accuracy, taking as input a hierarchical tree structure representing the class relationships, along with the reference data. We use an ensemble approach combining multiple random forest models, organized in a hierarchical decision tree structure. We show that our hierarchical classification approach improves accuracy and reduces incorrect predictions especially for inter-dataset tasks which reflect real life applications. We use a scoring scheme that adjusts probability distributions for candidate class labels and resolves uncertainties while avoiding the assignment of cells to incorrect types by labeling cells at internal nodes of the hierarchy when necessary.
    AVAILABILITY: HieRFIT is implemented as an R package, and it is available at https://github.com/yasinkaymaz/HieRFIT/releases/tag/v1.0.0.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btab499
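The idea of labeling a cell at an internal node of the hierarchy when the leaf-level call is uncertain can be sketched as follows. This is a simplified illustration, assuming per-leaf class probabilities are already available. The cell type tree, the threshold and the function `assign_label` are hypothetical, and the scoring is much simpler than the scheme HieRFIT actually uses.

```python
def assign_label(leaf_probs, parent, threshold=0.6):
    """Return the deepest node whose accumulated probability reaches the threshold."""
    # Propagate each leaf probability up to all of its ancestors
    node_prob = {}
    for leaf, p in leaf_probs.items():
        node = leaf
        while node is not None:
            node_prob[node] = node_prob.get(node, 0.0) + p
            node = parent.get(node)

    def depth(node):
        # Number of edges between the node and the root
        d = 0
        while parent.get(node) is not None:
            node = parent[node]
            d += 1
        return d

    # Among sufficiently supported nodes, prefer the most specific one
    candidates = [n for n, p in node_prob.items() if p >= threshold]
    return max(candidates, key=lambda n: (depth(n), node_prob[n]))

# Hypothetical cell type hierarchy, stored as child -> parent
parent = {
    "CD4 T": "T cell",
    "CD8 T": "T cell",
    "T cell": "Immune",
    "B cell": "Immune",
    "Immune": None,
}

confident = assign_label({"CD4 T": 0.80, "CD8 T": 0.15, "B cell": 0.05}, parent)
ambiguous = assign_label({"CD4 T": 0.45, "CD8 T": 0.45, "B cell": 0.10}, parent)
print(confident, "|", ambiguous)
```

In the ambiguous case neither T-cell subtype reaches the threshold on its own, so the cell is labeled at the internal "T cell" node rather than being forced into a possibly incorrect leaf.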
  26. Nat Commun. 2021 07 09. 12(1): 4208
      The transcriptional regulators underlying induction and differentiation of dense connective tissues such as tendon and related fibrocartilaginous tissues (meniscus and annulus fibrosus) remain largely unknown. Using an iterative approach informed by developmental cues and single cell RNA sequencing (scRNA-seq), we establish directed differentiation models to generate tendon and fibrocartilage cells from mouse embryonic stem cells (mESCs) by activation of TGFβ and hedgehog pathways, achieving 90% induction efficiency. Transcriptional signatures of the mESC-derived cells recapitulate embryonic tendon and fibrocartilage signatures from the mouse tail. scRNA-seq further identifies retinoic acid signaling as a critical regulator of cell fate switch between TGFβ-induced tendon and fibrocartilage lineages. Trajectory analysis by RNA sequencing defines transcriptional modules underlying tendon and fibrocartilage fate induction and identifies molecules associated with lineage-specific differentiation. Finally, we successfully generate 3-dimensional engineered tissues using these differentiation protocols and show activation of mechanotransduction markers with dynamic tensile loading. These findings provide a serum-free approach to generate tendon and fibrocartilage cells and tissues at high efficiency for modeling development and disease.
    DOI:  https://doi.org/10.1038/s41467-021-24535-5
  27. Sci Data. 2021 Jul 15. 8(1): 177
      Bovine mammary function at molecular level is often studied using mammary tissue or primary bovine mammary epithelial cells (pbMECs). However, bulk tissue and primary cells are heterogeneous with respect to cell populations, adding further transcriptional variation in addition to genetic background. Thus, understanding of the variation in gene expression profiles of cell populations and their effect on function are limited. To investigate the mononuclear cell composition in bovine milk, we analyzed a single-cell suspension from a milk sample. Additionally, we harvested cultured pbMECs to characterize gene expression in a homogeneous cell population. Using the Drop-seq technology, we generated single-cell RNA datasets of somatic milk cells and pbMECs. The final datasets after quality control filtering contained 7,119 and 10,549 cells, respectively. The pbMECs formed 14 indefinite clusters displaying intrapopulation heterogeneity, whereas the milk cells formed 14 more distinct clusters. Our datasets constitute a molecular cell atlas that provides a basis for future studies of milk cell composition and gene expression, and could serve as reference datasets for milk cell analysis.
    DOI:  https://doi.org/10.1038/s41597-021-00972-1
  28. Ann Appl Stat. 2021 Jun;15(2): 925-951
      There are distinguishing features or "hallmarks" of cancer that are found across tumors, individuals, and types of cancer, and these hallmarks can be driven by specific genetic mutations. Yet, within a single tumor there is often extensive genetic heterogeneity as evidenced by single-cell and bulk DNA sequencing data. The goal of this work is to jointly infer the underlying genotypes of tumor subpopulations and the distribution of those subpopulations in individual tumors by integrating single-cell and bulk sequencing data. Understanding the genetic composition of the tumor at the time of treatment is important in the personalized design of targeted therapeutic combinations and monitoring for possible recurrence after treatment. We propose a hierarchical Dirichlet process mixture model that incorporates the correlation structure induced by a structured sampling arrangement and we show that this model improves the quality of inference. We develop a representation of the hierarchical Dirichlet process prior as a Gamma-Poisson hierarchy and we use this representation to derive a fast Gibbs sampling inference algorithm using the augment-and-marginalize method. Experiments with simulation data show that our model outperforms standard numerical and statistical methods for decomposing admixed count data. Analysis of a real acute lymphoblastic leukemia sequencing dataset shows that our model improves upon state-of-the-art bioinformatic methods. An interpretation of the results of our model on this real dataset reveals co-mutated loci across samples.
    Keywords:  Bayesian nonparametric; DNA sequencing; Dirichlet process mixture; augment-and-marginalize; tumor heterogeneity
    DOI:  https://doi.org/10.1214/20-aoas1434
  29. Am J Cancer Res. 2021 ;11(6): 2893-2910
      Mitochondria play leading roles in initiation and progression of colorectal cancer (CRC). Proteogenomic analyses of mitochondria of CRC tumor cells would likely enhance our understanding of CRC pathogenesis and reveal new independent prognostic factors and treatment targets. However, comprehensive investigations focused on mitochondria of CRC patients are lacking. Here, we investigated global profiles of structural variants, DNA methylation, chromatin accessibility, transcriptome, proteome, and phosphoproteome in human CRC. Proteomic investigations uncovered greatly diminished mitochondrial proteome size in CRC relative to that found in adjacent healthy tissues. Integrated with analysis of RNA-Seq datasets obtained from the public database containing mRNA data of 538 CRC patients, the proteomic analysis indicated that proteins encoded by 45.5% of identified prognostic CRC genes were located within mitochondria, highlighting the association between altered mitochondrial function and CRC. Subsequently, we compared structural variants, DNA methylation, and chromatin accessibility of differentially expressed genes and found that chromatin accessibility was an important factor underlying mitochondrial gene expression. Furthermore, phosphoproteomic profiling demonstrated decreased phosphorylation of most mitochondria-related kinases within CRC versus adjacent healthy tissues, while also highlighting MKK3/p38 as an essential mitochondrial regulatory pathway. Meanwhile, systems-based analyses revealed identities of key kinases, transcriptional factors, and their interconnections. This research uncovers a close relationship between mitochondrial dysfunction and poor CRC prognosis, improves our understanding of the molecular mechanisms linking mitochondria to human CRC, and facilitates identification of clinically relevant CRC prognostic factors and drug targets.
    Keywords:  Mitochondria; colorectal cancer; drug targets; multi-omics; prognosis
  30. Aging (Albany NY). 2021 Jul 11.
    Keywords:  DNA methylation; cytochrome P450; histones; liver; xenobiotic metabolism
    DOI:  https://doi.org/10.18632/aging.203312
  31. RNA. 2021 Jul 15. pii: rna.078872.121. [Epub ahead of print]
      XRN1 is a highly conserved exoribonuclease which degrades uncapped RNAs in a 5'-3' direction. Degradation of RNAs by XRN1 is important in many cellular and developmental processes and is relevant to human disease. Studies in D. melanogaster demonstrate that XRN1 can target specific RNAs, which have important consequences for developmental pathways. Osteosarcoma is a malignancy of the bone and accounts for 2% of all paediatric cancers worldwide. Five-year survival of patients has remained static since the 1970s; furthering our molecular understanding of this disease is therefore crucial. Previous work has shown a downregulation of XRN1 in osteosarcoma cells; however, the transcripts regulated by XRN1 which might promote osteosarcoma remain elusive. Here, we confirm reduced levels of XRN1 in osteosarcoma cell lines and patient samples and identify XRN1-sensitive transcripts in human osteosarcoma cells. Using RNA-seq in XRN1-knockdown SAOS-2 cells, we show that 1178 genes are differentially regulated. Using a novel bioinformatic approach, we demonstrate that 134 transcripts show characteristics of direct post-transcriptional regulation by XRN1. Long non-coding RNAs (lncRNAs) are enriched in this group, suggesting that XRN1 normally plays an important role in controlling lncRNA expression in these cells. Among potential lncRNAs targeted by XRN1 is HOTAIR, which is known to be upregulated in osteosarcoma and contribute to disease progression. We have also identified G-rich and GU motifs in post-transcriptionally regulated transcripts which appear to sensitise them to XRN1 degradation. Our results therefore provide significant insights into the specificity of XRN1 in human cells which is relevant to disease.
    Keywords:  Ewing Sarcoma; RNA degradation; RNA-seq; XRN1; lncRNAs
    DOI:  https://doi.org/10.1261/rna.078872.121
  32. Ann Transl Med. 2021 May;9(9): 810
      Deregulation of many homeobox genes has been observed in various cancers and has caused functional implications in the tumor progression. In this review, we will focus on the roles of the human muscle segment homeobox (MSX) transcription factor family in the process of tumorigenesis. The MSX transcription factors, through complex downstream regulation mechanisms, are promoters or inhibitors of diverse cancers by participating in cell proliferation, cell invasion, cell metastasis, cell apoptosis, cell differentiation, drug resistance of tumors, maintenance of tumor stemness, and tumor angiogenesis. Moreover, their upstream regulatory mechanisms in cancers may include: gene mutation and chromosome aberration; DNA methylation and chromatin modification; regulation by non-coding RNAs; regulation by other transcription factors and post-translational modification. These mechanisms may provide a better understanding of why MSX transcription factors are abnormally expressed in tumors. Notably, intermolecular interactions and post-translational modification can regulate the transcriptional activity of MSX transcription factors. It is also crucial to know what affects the transcriptional activity of MSX transcription factors in tumors for possible interventions in them in the future. This systematic summary of the regulatory patterns of the MSX transcription factor family may help to further understand the mechanisms involved in transcriptional regulation and also provide new therapeutic approaches for tumor progression.
    Keywords:  Muscle segment homeobox 1 (MSX1); cancer progression; muscle segment homeobox 2 (MSX2); transcription factor; transcriptional regulation
    DOI:  https://doi.org/10.21037/atm-21-220
  33. Front Cell Dev Biol. 2021 ;9 708038
      Src is an important oncogene that plays key roles in multiple signal transduction pathways. Csk-homologous kinase (CHK) is a kinase whose molecular roles are largely uncharacterized. We previously reported expression of CHK in normal human colon cells, and that decreased levels of CHK protein in colon cancer cells lead to the activation of Src (Zhu et al., 2008). However, how CHK protein expression is downregulated in colon cancer cells has been unknown. We report herein that CHK mRNA was decreased in colon cancer cells as compared to normal colon cells, and similarly in human tissues of normal colon and colon cancer. Increased levels of DNA methylation at promoter CpG islands of the CHK gene were observed in colon cancer cells and human colon cancer tissues as compared to their normal healthy counterparts. Increased levels of DNA methyltransferases (DNMTs) were also observed in colon cancer cells and tissues. DNA methylation and decreased expression of CHK mRNA were inhibited by DNMT inhibitor 5-Aza-CdR. Cell proliferation, colony growth, wound healing, and Matrigel invasion were all decreased in the presence of 5-Aza-CdR. These results suggest that increased levels of DNA methylation, possibly induced by enhanced levels of DNMT, lead to decreased expression of CHK mRNA and CHK protein, promoting increased oncogenic properties in colon cancer cells.
    Keywords:  CHK; DNA methylation; FHC; colon cancer; drug resistance
    DOI:  https://doi.org/10.3389/fcell.2021.708038
  34. Front Oncol. 2021 ;11 663961
      Cutaneous T-cell lymphomas (CTCLs) comprise a group of heterogeneous diseases involving malignant T cells. The pathogenesis and etiology of CTCL are still unclear, although a large number of genetic and epidemiological studies on CTCL have been conducted. Most CTCLs have an indolent course, making early diagnosis difficult. Once large-cell transformation occurs, CTCL progresses to more aggressive types, resulting in an overall survival of less than five years. Epigenetic drugs, which have shown certain curative effects, have been selected as third-line drugs in patients with relapsing and refractory CTCL. Many studies have also identified epigenetic biomarkers from tissues and peripheral blood of patients with CTCL and suggested that epigenetic changes play a role in malignant transformation and histone deacetylase inhibitor (HDACi) resistance in CTCL. Single-cell sequencing has been applied in CTCL studies, revealing heterogeneity in CTCL malignant T cells. The mechanisms of HDACi resistance have also been described, further facilitating the discovery of novel HDACi targets. Despite the heterogeneity of CTCL disease and its obscure pathogenesis, more epigenetic abnormalities have been gradually discovered recently, which not only enables us to understand CTCL disease further but also improves our understanding of the specific role of epigenetics in the pathogenesis and treatment. In this review, we discuss the recent discoveries concerning the pathological roles of epigenetics and epigenetic therapy in CTCL.
    Keywords:  HDACi resistance; cutaneous T-cell lymphoma; epigenetic biomarkers; epigenetic therapy; epigenetics; histone deacetylase inhibitor; histone modification
    DOI:  https://doi.org/10.3389/fonc.2021.663961
  35. Nat Commun. 2021 Jul 16. 12(1): 4369
      There is a strong demand for methods that can efficiently reconstruct valid super-resolution intact genome 3D structures from sparse and noisy single-cell Hi-C data. Here, we develop Single-Cell Chromosome Conformation Calculator (Si-C) within the Bayesian theory framework and apply this approach to reconstruct intact genome 3D structures from single-cell Hi-C data of eight G1-phase haploid mouse ES cells. The inferred 100-kb and 10-kb structures consistently reproduce the known conserved features of chromatin organization revealed by independent imaging experiments. The analysis of the 10-kb resolution 3D structures reveals cell-to-cell varying domain structures in individual cells and hyperfine structures in domains, such as loops. An average of 0.2 contact reads per divided bin is sufficient for Si-C to obtain reliable structures. The valid super-resolution structures constructed by Si-C demonstrate the potential for visualizing and investigating interactions between all chromatin loci at the genome scale in individual cells.
    DOI:  https://doi.org/10.1038/s41467-021-24662-z
  36. Epigenetics. 2021 Jul 13. 1-8
      Saliva and buccal samples are popular for epigenome wide association studies (EWAS) due to their ease of collection and their ability to sample a different cell lineage compared to blood. As these samples contain a mix of white blood cells and buccal epithelial cells that can vary within a population, this cellular heterogeneity may confound EWAS. This has been addressed by including cellular heterogeneity obtained through cytology at the time of collection or by using cellular deconvolution algorithms built on epigenetic data from specific cell types. However, to our knowledge, the two methods have not yet been compared. Here we show that the two methods are highly correlated in saliva and buccal samples (R = 0.84, P < 0.0001) by comparing data generated from cytological staining and Infinium MethylationEPIC arrays and the EpiDISH deconvolution algorithm from buccal and saliva samples collected from twenty adults. In addition, by using an expanded dataset from both sample types, we confirmed our previous finding that age has a strong, non-linear negative correlation with epithelial cell proportion in both sample types. However, children and adults showed a large within-population variation in cellular heterogeneity. Our results validate the use of the EpiDISH algorithm in estimating the effect of cellular heterogeneity in EWAS and show that DNA methylation generally underestimates the epithelial cell content obtained from cytology.
    Keywords:  Buccal; DNA methylation; EWAS; cell-type heterogeneity; cytology; epithelial cell; saliva
    DOI:  https://doi.org/10.1080/15592294.2021.1950977
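A comparison of this kind, cytology-derived epithelial proportions against deconvolution estimates, is typically summarized by a Pearson correlation and a mean difference. The sketch below shows the arithmetic on made-up proportions; the numbers are placeholders for illustration and are not data from the study, and the helper `pearson_r` is written out by hand rather than taken from any package.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical epithelial cell proportions for five samples
cytology = [0.90, 0.75, 0.60, 0.40, 0.20]   # from cytological staining
deconv = [0.80, 0.70, 0.50, 0.35, 0.10]     # from methylation-based deconvolution

r = pearson_r(cytology, deconv)
# A negative mean difference means deconvolution gives lower estimates than cytology
mean_diff = sum(d - c for c, d in zip(cytology, deconv)) / len(cytology)
print(f"r = {r:.3f}, mean difference = {mean_diff:.3f}")
```

A high correlation together with a consistently negative difference is the pattern one would expect if the two methods agree on ranking while the methylation-based estimate runs lower.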
  37. Methods Mol Biol. 2021 ;2328 25-46
      Chromatin accessibility is directly linked with transcription in eukaryotes. Accessible regions associated with regulatory proteins are highly sensitive to DNase I digestion and are termed DNase I hypersensitive sites (DHSs). DHSs can be identified by DNase I digestion, followed by high-throughput DNA sequencing (DNase-seq). The single-base-pair resolution digestion patterns from DNase-seq allow the identification of transcription factor (TF) footprints of local DNA protection that predict TF-DNA binding. The identification of differential footprinting between two conditions allows mapping relevant TF regulatory interactions. Here, we provide step-by-step instructions to build gene regulatory networks from DNase-seq data. Our pipeline includes steps for DHSs calling, identification of differential TF footprints between treatment and control conditions, and construction of gene regulatory networks. Even though the data we used in this example were obtained from Arabidopsis thaliana, the workflow developed in this guide can be adapted to work with DNase-seq data from any organism with a sequenced genome.
    Keywords:  Chromatin; DNase-seq; Gene Regulatory Networks; Genomic Footprinting; Transcription
    DOI:  https://doi.org/10.1007/978-1-0716-1534-8_3
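The last step of such a pipeline, turning differential TF footprints into network edges, can be sketched in a few lines. The sketch assumes each footprint has already been scored in both conditions and assigned to a target gene by upstream tools. The record layout, the TF and gene names, the scores and the `min_delta` cutoff are all hypothetical and do not come from the chapter.

```python
from collections import defaultdict

def build_grn(footprints, min_delta=1.0):
    """Build a TF -> {target gene: direction} map from differential footprints.

    Each footprint is (tf, target_gene, score_treatment, score_control).
    """
    grn = defaultdict(dict)
    for tf, gene, treatment, control in footprints:
        delta = treatment - control
        # Keep only footprints whose score changes enough between conditions
        if abs(delta) >= min_delta:
            grn[tf][gene] = "gained" if delta > 0 else "lost"
    return dict(grn)

# Hypothetical footprint scores (placeholder names, not real loci)
footprints = [
    ("TF_A", "gene1", 5.0, 1.0),   # stronger protection in treatment
    ("TF_A", "gene2", 2.0, 1.8),   # below the cutoff, ignored
    ("TF_B", "gene1", 0.5, 3.0),   # weaker protection in treatment
]

grn = build_grn(footprints)
for tf, targets in grn.items():
    for gene, direction in targets.items():
        print(f"{tf} -> {gene} ({direction})")
```

Storing the direction of change on each edge keeps treatment-specific gains and losses of TF binding distinguishable in the resulting network.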
  38. Development. 2021 Jul 16. pii: dev.199711. [Epub ahead of print]
      The spinal cord receives input from peripheral sensory neurons and controls motor output by regulating muscle innervating motor neurons. These functions are carried out by neural circuits comprising molecularly distinct neuronal subtypes generated in a characteristic spatial-temporal arrangement from progenitors in the embryonic neural tube. To gain insight into the diversity and complexity of cells in the developing human neural tube we used single cell mRNA sequencing to profile cervical and thoracic regions in four human embryos of Carnegie Stages (CS) CS12, CS14, CS17 and CS19 from Gestational Weeks 4-7. Analysis of progenitor and neuronal populations from the neural tube and dorsal root ganglia identified dozens of distinct cell types and facilitated the reconstruction of the differentiation pathways of specific neuronal subtypes. Comparison with mouse revealed the overall similarity of mammalian neural tube development while highlighting human specific features. These data provide a catalogue of gene expression and cell type identity in the human neural tube that will support future studies of sensory and motor control systems. The data can be explored at https://shiny.crick.ac.uk/scviewer/neuraltube/.
    Keywords:  Developmental patterning; Human; Neuronal subtype identity; Single cell transcriptome; Spinal cord
    DOI:  https://doi.org/10.1242/dev.199711
  39. Osteoarthritis Cartilage. 2021 Jul 06. pii: S1063-4584(21)00831-1. [Epub ahead of print]
       OBJECTIVE: Nucleus pulposus (NP) plays a central role in disc degeneration pathogenesis. However, NP is a heterogeneous tissue, and its cell subsets and their corresponding biological processes in intervertebral disc degeneration (IVDD) are unreported.
    METHOD: NP tissues were isolated from normal control and IVDD samples and then subjected to single-cell RNA sequencing (scRNA-seq). Cells were clustered without supervision based on their gene expression profiles using the Seurat package, and the result was passed to tSNE for visualization. A rat model of disc degeneration was built to validate the pathways identified by scRNA-Seq.
    RESULTS: Seven chondrocyte subsets were revealed in NP based on differential gene expression, among which 4 subsets (C1-C4) were reported for the first time. Furthermore, GO and KEGG analyses discovered that ferroptosis pathways were enriched. A rat model of disc degeneration was built (n=6/group, control vs. model) to validate the pathways identified by scRNA-Seq. Iron levels of NP were significantly higher in model group than control group (means 0.712 vs. 0.248, respectively, mg/gpro, p=0.0026), and the levels of Heme Oxygenase 1 (HO-1) were also elevated in model group (means 14.33 vs. 5.16 IOD, respectively, p=0.0002). However, the levels of ferritin light chain (FTL) were significantly decreased in model group compared to control group (means 26.17 vs. 9.00 FTL+ cell number, respectively, p=0.0011).
    CONCLUSIONS: Novel chondrocyte subsets in nucleus pulposus were discovered through scRNA-Seq, which provides novel insight into the pathological changes during the development of IVDD. Ferroptosis participates in disc degeneration pathogenesis and might serve as a new target for intervention in IVDD.
    Keywords:  chondrocyte subsets; ferroptosis; intervertebral disc degeneration; single-cell RNA sequencing
    DOI:  https://doi.org/10.1016/j.joca.2021.06.010
  40. Elife. 2021 Jul 14. pii: e67436. [Epub ahead of print]10
      The ventricular-subventricular zone (V-SVZ), on the walls of the lateral ventricles, harbors the largest neurogenic niche in the adult mouse brain. Previous work has shown that neural stem/progenitor cells (NSPCs) in different locations within the V-SVZ produce different subtypes of new neurons for the olfactory bulb. The molecular signatures that underlie this regional heterogeneity remain largely unknown. Here we present a single-cell RNA-sequencing dataset of the adult mouse V-SVZ revealing two populations of NSPCs that reside in largely non-overlapping domains in either the dorsal or ventral V-SVZ. These regional differences in gene expression were further validated using a single-nucleus RNA-sequencing reference dataset of regionally microdissected domains of the V-SVZ and by immunocytochemistry and RNAscope localization. We also identify two subpopulations of young neurons that have gene expression profiles consistent with a dorsal or ventral origin. Interestingly, a subset of genes are dynamically expressed, but maintained, in the ventral or dorsal lineages. The study provides novel markers and territories to understand the region-specific regulation of adult neurogenesis.
    Keywords:  mouse; neuroscience; regenerative medicine; stem cells
    DOI:  https://doi.org/10.7554/eLife.67436
  41. Proc Natl Acad Sci U S A. 2021 Jul 20. pii: e2100473118. [Epub ahead of print]118(29):
      Most high-dimensional datasets are thought to be inherently low-dimensional; that is, data points are constrained to lie on a low-dimensional manifold embedded in a high-dimensional ambient space. Here, we study the viability of two approaches from differential geometry to estimate the Riemannian curvature of these low-dimensional manifolds. The intrinsic approach relates curvature to the Laplace-Beltrami operator using the heat-trace expansion and is agnostic to how a manifold is embedded in a high-dimensional space. The extrinsic approach relates the ambient coordinates of a manifold's embedding to its curvature using the Second Fundamental Form and the Gauss-Codazzi equation. We found that the intrinsic approach fails to accurately estimate the curvature of even a two-dimensional constant-curvature manifold, whereas the extrinsic approach was able to handle more complex toy models, even when confounded by practical constraints like small sample sizes and measurement noise. To test the applicability of the extrinsic approach to real-world data, we computed the curvature of a well-studied manifold of image patches and recapitulated its topological classification as a Klein bottle. Lastly, we applied the extrinsic approach to study single-cell transcriptomic sequencing (scRNAseq) datasets of blood, gastrulation, and brain cells to quantify the Riemannian curvature of scRNAseq manifolds.
    Keywords:  Laplace-Beltrami; Riemannian curvature; data manifold; differential geometry; single-cell transcriptomics
    DOI:  https://doi.org/10.1073/pnas.2100473118
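The extrinsic idea, reading curvature off the way a manifold bends away from its tangent plane, can be illustrated on a toy surface. The sketch below samples a sphere of radius 2 near its north pole, fits the height over the tangent plane with a quadratic form h ≈ ½(a·x² + 2b·xy + c·y²), and takes the Gaussian curvature as K = ac − b² (for a surface, the scalar curvature is 2K). The grid, the radius and the plain least-squares fit are assumptions chosen for illustration; this is not the estimator developed in the paper.

```python
import math

def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(a[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = (a[i][3] - tail) / a[i][i]
    return x

def gaussian_curvature(points):
    """Estimate K from (x, y, h) samples, with h the height over the tangent plane."""
    # Design rows for the model h = 0.5*a*x^2 + b*x*y + 0.5*c*y^2
    rows = [(0.5 * x * x, x * y, 0.5 * y * y) for x, y, _ in points]
    heights = [h for _, _, h in points]
    # Normal equations (A^T A) coef = A^T h of the least-squares fit
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    ath = [sum(r[i] * h for r, h in zip(rows, heights)) for i in range(3)]
    a, b, c = solve3(ata, ath)
    # Determinant of the fitted second fundamental form at the base point
    return a * c - b * b

radius = 2.0
points = []
for i in range(-3, 4):
    for j in range(-3, 4):
        x, y = 0.02 * i, 0.02 * j
        # Height of the sphere relative to the tangent plane at the north pole
        h = math.sqrt(radius ** 2 - x * x - y * y) - radius
        points.append((x, y, h))

k = gaussian_curvature(points)
print(f"estimated K = {k:.4f}, exact 1/r^2 = {1 / radius ** 2:.4f}")
```

Because the neighborhood is small relative to the radius, the quadratic fit is a good local approximation and the estimate lands close to the exact value 1/r².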
  42. Mol Cell Biochem. 2021 Jul 17.
      Gliomas are tumors of the central nervous system (CNS) and are among the most severe human malignancies; glioblastoma (GBM) is the most malignant type. There are multiple therapeutic strategies for GBM, of which chemotherapy is often the first-line treatment. Still, various cellular processes, such as uncontrolled proliferation, invasion and metastasis, may reduce treatment efficacy. Drug resistance is another such process and can also cause undesirable effects. Identifying the mechanisms involved in the development of drug resistance can therefore be very helpful in GBM management. The discovery of exosomal non-coding RNAs (ncRNAs), RNA molecules that can be transferred between cells and tissues via exosomes, was a milestone in this regard. It has been revealed that the key exosomal ncRNAs, including circular RNAs, microRNAs, and long ncRNAs, are able to modulate GBM drug resistance through different signaling pathways or by affecting regulatory proteins and their corresponding genes. Nowadays, researchers are trying to overcome the limitations of chemotherapy by targeting these RNA molecules. Accordingly, this review aims to clarify the substantial roles of exosomal ncRNAs in GBM drug resistance and the mechanisms involved.
    Keywords:  Drug resistance; Exosomes; Glioblastoma; Noncoding RNAs
    DOI:  https://doi.org/10.1007/s11010-021-04221-2
  43. Int Rev Cell Mol Biol. 2021 ;pii: S1937-6448(21)00062-9. [Epub ahead of print]362 111-140
      Hematopoiesis is based on the existence of hematopoietic stem cells (HSC) with the capacity to self-proliferate and self-renew or to differentiate into specialized cells. The hematopoietic niche is the essential microenvironment where stem cells reside and integrate various stimuli to determine their fate. Recent studies have identified niches containing high levels of calcium (Ca2+), suggesting that HSCs are sensitive to Ca2+. This is a highly versatile and ubiquitous second messenger that regulates a wide variety of cellular functions. Advanced methods for measuring its concentrations, genetic experiments, cell fate tracing data, single-cell imaging, and transcriptomics studies provide insight into its specific roles in integrating signaling into an array of mechanisms that determine HSC identity, lineage potential, maintenance, and self-renewal. Accumulating and contrasting evidence is revealing Ca2+ as a previously unacknowledged feature of HSC, involved in functional maintenance by regulating multiple actors including transcription and epigenetic factors, Ca2+-dependent kinases and mitochondrial physiology. Mitochondria are significant participants in HSC functions and their responsiveness to cellular demands is controlled to a significant extent via Ca2+ signals. Recent reports indicate that mitochondrial Ca2+ uptake also controls HSC fate. These observations reveal a physiological feature of hematopoietic stem cells that can be harnessed to improve HSC-related disease. In this review, we discuss the current knowledge of Ca2+ in hematopoietic stem cells, focusing on its potential involvement in proliferation, self-renewal and maintenance of HSC, and discuss future research directions.
    Keywords:  AML; Ca(2+); Hematopoietic stem cell; MDS; Mitochondria; Preleukemia; Self-renewal
    DOI:  https://doi.org/10.1016/bs.ircmb.2021.05.003
  44. Bioinformatics. 2021 Jul 09. pii: btab507. [Epub ahead of print]
       MOTIVATION: Genome-wide profiling of transcription factor binding and chromatin states is a widely-used approach for mechanistic understanding of gene regulation. Recent technology development has enabled such profiling at single-cell resolution. However, an end-to-end computational pipeline for analyzing such data is still lacking.
    RESULTS: Here, we have developed a flexible pipeline for analysis and visualization of single-cell CUT&Tag and CUT&RUN data, which provides functions for sequence alignment, quality control, dimensionality reduction, cell clustering, data aggregation, and visualization. Furthermore, it is also seamlessly integrated with the functions in original CUT&RUNTools for population-level analyses. As such, this provides a valuable toolbox for the community.
    AVAILABILITY AND IMPLEMENTATION: https://github.com/fl-yu/CUT-RUNTools-2.0.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btab507
  45. Cell Biol Toxicol. 2021 Jul 17.
      Similar to epigenetic DNA and histone modifications, epitranscriptomic modifications (RNA modifications) have emerged as crucial regulators in temporal and spatial gene expression during eukaryotic development. To date, over 170 diverse types of chemical modifications have been identified upon RNA nucleobases. Some of these post-synthesized modifications can be reversibly installed, removed, and decoded by their specific cellular components and play critical roles in different biological processes. Accordingly, dysregulation of RNA modification effectors is tightly orchestrated with developmental processes. Here, we particularly focus on three well-studied RNA modifications, including N6-methyladenosine (m6A), 5-methylcytosine (m5C), and N1-methyladenosine (m1A), and summarize recent knowledge of underlying mechanisms and critical roles of these RNA modifications in stem cell fate determination, embryonic development, and cancer progression, providing a better understanding of the whole association between epitranscriptomic regulation and mammalian development.
    Keywords:  5-methylcytosine; Cancer progression; Embryonic development; N1-methyladenosine; N6-methyladenosine; RNA metabolism; Stem cell fate determination
    DOI:  https://doi.org/10.1007/s10565-021-09627-8
  46. Bioinformatics. 2021 07 12. 37(Suppl_1): i289-i298
     MOTIVATION: Circular RNA (circRNA) is a novel class of long non-coding RNAs that have been broadly discovered in the eukaryotic transcriptome. The circular structure arises from a non-canonical splicing process, in which the donor site is backspliced to an upstream acceptor site. These circRNA sequences are conserved across species. More importantly, growing evidence suggests their vital roles in gene regulation and association with diseases. As a fundamental effort toward elucidating their functions and mechanisms, several computational methods have been proposed to predict the circular structure from the primary sequence. Recently, advanced computational methods have leveraged deep learning to capture the relevant patterns from RNA sequences and model their interactions to facilitate the prediction. However, these methods fail to fully explore positional information of splice junctions and their deep interaction.
    RESULTS: We present a robust end-to-end framework, Junction Encoder with Deep Interaction (JEDI), for circRNA prediction using only nucleotide sequences. JEDI first leverages the attention mechanism to encode each junction site based on deep bidirectional recurrent neural networks and then presents the novel cross-attention layer to model deep interaction among these sites for backsplicing. Finally, JEDI can not only predict circRNAs but also interpret relationships among splice sites to discover backsplicing hotspots within a gene region. Experiments demonstrate that JEDI significantly outperforms state-of-the-art approaches in circRNA prediction at both the isoform level and the gene level. Moreover, JEDI also shows promising results on zero-shot backsplicing discovery, which none of the existing approaches can achieve.
    AVAILABILITY AND IMPLEMENTATION: The implementation of our framework is available at https://github.com/hallogameboy/JEDI.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btab288
  47. Genomics Inform. 2021 Jun;19(2): e17
      Breast cancer is one of the leading causes of cancer in women all over the world and accounts for ~25% of newly observed cancers in women. Epigenetic modifications influence differential expression of genes through non-coding RNA and play a crucial role in cancer regulation. In the present study, epigenetic regulation of gene expression by in-silico analysis of histone modifications using chromatin immunoprecipitation sequencing (ChIP-Seq) has been carried out. Histone modification data of H3K4me3 from one normal-like and four breast cancer cell lines were used to predict miRNA expression at the promoter level. Predicted miRNA promoters (based on ChIP-Seq) were used as a probe to identify gene targets. Five triple-negative breast cancer (TNBC)-specific miRNAs (miR153-1, miR4767, miR4487, miR6720, and miR-LET7I) were identified and corresponding 13 gene targets were predicted. Eight miRNA promoter peaks were predicted to be differentially expressed in at least three breast cancer cell lines (miR4512, miR6791, miR330, miR3180-3, miR6080, miR5787, miR6733, and miR3613). A total of 44 gene targets were identified based on the 3'-untranslated regions of downregulated mRNA genes that contain putative binding targets to these eight miRNAs. These include 17 and 15 genes in luminal-A type and TNBC respectively, that have been reported to be associated with breast cancer regulation. Of the remaining 12 genes, seven (A4GALT, C2ORF74, HRCT1, ZC4H2, ZNF512, ZNF655, and ZNF608) show similar relative expression profiles in large patient samples and other breast cancer cell lines thereby giving insight into predicted role of H3K4me3 mediated gene regulation via the miRNA-mRNA axis.
    Keywords:  ChIP-Seq; RNA-Seq; breast neoplasms; luminal-A/triple-negative; miRNA
    DOI:  https://doi.org/10.5808/gi.21020
  48. Methods Mol Biol. 2021 ;2328 115-138
      With the popularity of high-throughput transcriptomic techniques like RNAseq, models of gene regulatory networks have been important tools for understanding how genes are regulated. These transcriptomic datasets are usually assumed to reflect their associated proteins. This assumption, however, ignores post-transcriptional, translational, and post-translational regulatory mechanisms that regulate protein abundance but not transcript abundance. Here we describe a method to model cross-regulatory influences between the transcripts and proteins of a set of genes using abundance data collected from a series of transgenic experiments. The developed model can capture the effects of regulation that impacts transcription as well as regulatory mechanisms occurring after transcription. This approach uses a sparse maximum likelihood algorithm to determine relationships that influence transcript and protein abundance. An example of how to explore the network topology of this type of model is also presented. This model can be used to predict how the transcript and protein abundances will change in novel transgenic modification strategies.
    Keywords:  Cross-regulation; Multiscale modeling; Protein regulation; Transcript regulation
    DOI:  https://doi.org/10.1007/978-1-0716-1534-8_7
  49. Methods Mol Biol. 2021 ;2328 13-23
      Gene coexpression networks (GCNs) are useful tools for inferring gene functions and understanding biological processes when properly constructed. Traditional microarray analysis is being more frequently replaced by bulk-based RNA-sequencing as a method for quantifying gene expression. This new technology requires improved statistical methods for generating GCNs. This chapter explores several popular methods for constructing GCNs using bulk-based RNA-Seq data, such as distribution-based methods and normalization techniques, implemented using the statistical programming language R.
    Keywords:  Bulk-based RNA-Seq; Correlation coefficient; Count data; Gene coexpression network; Gene regulatory network
    DOI:  https://doi.org/10.1007/978-1-0716-1534-8_2
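To make the idea of a correlation-based gene coexpression network concrete, the following is a minimal sketch in Python (the chapter itself works in R, so this is not its code). It assumes a simple log2(CPM + 1) normalization and a Pearson correlation cutoff of 0.8; the function names, the toy counts, and the threshold are all illustrative assumptions, not values taken from the chapter.

```python
import math
from itertools import combinations


def log_cpm(counts):
    """Normalize raw counts (gene -> list of per-sample counts) to log2(CPM + 1)."""
    n_samples = len(next(iter(counts.values())))
    # Library size = total counts per sample across all genes.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [math.log2(c / lib_sizes[j] * 1e6 + 1) for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(counts, threshold=0.8):
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold."""
    expr = log_cpm(counts)
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical toy data: three genes measured in four samples.
toy_counts = {
    "geneA": [10, 20, 30, 40],
    "geneB": [20, 40, 60, 80],
    "geneC": [40, 30, 20, 10],
}
edges = coexpression_edges(toy_counts)
```

A hard threshold keeps the sketch short; count-aware normalization and distribution-based methods of the kind the chapter surveys would replace the `log_cpm` step in practice.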
  50. Cancer Res. 2021 Jul 15. 81(14): 3749-3750
      
  51. RNA. 2021 Jul 08. pii: rna.078858.121. [Epub ahead of print]
      Stress granules (SGs) are membraneless organelles composed of mRNAs and RNA binding proteins which undergo assembly in response to stress-induced inactivation of translation initiation. In general, SG recruitment is limited to a subpopulation of a given mRNA species and RNA-seq analyses of purified SGs revealed that signal sequence-encoding (i.e. endoplasmic reticulum (ER)-targeted) transcripts are significantly under-represented, consistent with prior reports that ER localization can protect mRNAs from SG recruitment. Using translational profiling, cell fractionation, and single molecule mRNA imaging, we examined SG biogenesis following activation of the unfolded protein response (UPR) by 1,4-dithiothreitol (DTT) and report that gene-specific subsets of cytosolic and ER-targeted mRNAs can be recruited into SGs. Furthermore, we demonstrate that SGs form in close proximity to or directly associated with the ER membrane. ER-associated SG assembly was also observed during arsenite stress, suggesting broad roles for the ER in SG biogenesis. Recruitment of a given mRNA into SGs required stress-induced translational repression, though translational inhibition was not solely predictive of an mRNA's propensity for SG recruitment. SG formation was prevented by the transcriptional inhibitors actinomycin D or triptolide, suggesting a functional link between gene transcriptional state and SG biogenesis. Collectively these data demonstrate that ER-targeted and cytosolic mRNAs can be recruited into ER-associated SGs and this recruitment is sensitive to transcriptional inhibition. We propose that newly transcribed mRNAs exported under conditions of suppressed translation initiation are primary SG substrates, with the ER serving as the central subcellular site of SG formation.
    Keywords:  endoplasmic reticulum; mRNA; stress granule; translational regulation; unfolded protein response
    DOI:  https://doi.org/10.1261/rna.078858.121
  52. Adv Exp Med Biol. 2021 ;1208 289-309
      Autophagy is a catabolic process that removes aggregated proteins and damaged organelles via lysosomal degradation. Increasing evidence suggests that dysfunction of autophagy is associated with a variety of human pathologies, including aging, cancer, neurodegenerative diseases, heart diseases, diabetes, and other metabolic diseases. Current research suggests that the regulation of autophagy may be a novel target for the treatment of these diseases. For this purpose, it is essential to have a deep understanding of the molecular details of autophagy and its regulatory network in each of the disease contexts. Over the years, a variety of chemical autophagy inducers and inhibitors have been developed. The application of these autophagy regulators can assist us in the exploration of the mechanism and therapeutic potential of autophagy regulation. In this chapter, we summarize the recent advances in chemical autophagy regulators to provide methodological support for autophagy research.
    DOI:  https://doi.org/10.1007/978-981-16-2830-6_13
  53. Bioinformatics. 2021 07 12. 37(Suppl_1): i327-i333
     MOTIVATION: While promoter methylation is associated with reinforcing fundamental tissue identities, the methylation status of distant enhancers was shown by genome-wide association studies to be a powerful determinant of cell-state and cancer. With the recent availability of long reads that report on the methylation status of enhancer-promoter pairs on the same molecule, we hypothesized that probing these pairs at the single-molecule level may serve as the basis for detection of rare cancerous transformations in a given cell population. We explore various analysis approaches for deconvolving cell-type mixtures based on their genome-wide enhancer-promoter methylation profiles.
    RESULTS: To evaluate our hypothesis we examine long-read optical methylome data for the GM12878 cell line and myoblast cell lines from two donors. We identified over 100 000 enhancer-promoter pairs that co-exist on at least 30 individual DNA molecules. We developed a detailed methodology for mixture deconvolution and applied it to estimate the proportional cell compositions in synthetic mixtures. Analysis of promoter methylation, as well as enhancer-promoter pairwise methylation, resulted in very accurate estimates. In addition, we show that pairwise methylation analysis can be generalized from deconvolving different cell types to subtle scenarios where one wishes to resolve different cell populations of the same cell-type.
    AVAILABILITY AND IMPLEMENTATION: The code used in this work to analyze single-molecule Bionano Genomics optical maps is available via the GitHub repository https://github.com/ebensteinLab/Single_molecule_methylation_in_EP.
    DOI:  https://doi.org/10.1093/bioinformatics/btab306
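As an illustration of what mixture deconvolution means here, the sketch below estimates the fraction of one cell type in a two-component mixture from per-feature methylation levels. It is a generic least-squares formulation under stated assumptions (two known reference profiles, a linear mixing model, made-up numbers) and is not the methodology developed in the paper; the function name `estimate_fraction` is hypothetical.

```python
def estimate_fraction(mixture, type_a, type_b):
    """Least-squares estimate of p in: mixture = p * type_a + (1 - p) * type_b.

    Each argument is a list of methylation levels (0..1), one per feature,
    e.g. one per promoter or enhancer-promoter pair. The estimate is clipped
    to the valid range [0, 1].
    """
    num = sum((m - b) * (a - b) for m, a, b in zip(mixture, type_a, type_b))
    den = sum((a - b) ** 2 for a, b in zip(type_a, type_b))
    if den == 0:
        raise ValueError("reference profiles are identical; fraction is not identifiable")
    return min(1.0, max(0.0, num / den))


# Hypothetical reference profiles for two cell types over three features.
ref_a = [0.9, 0.1, 0.5]
ref_b = [0.2, 0.8, 0.5]

# Synthetic mixture containing 30% of type A.
true_p = 0.3
mixed = [true_p * a + (1 - true_p) * b for a, b in zip(ref_a, ref_b)]
estimated_p = estimate_fraction(mixed, ref_a, ref_b)
```

Features on which the two references agree (the third feature above) contribute nothing to the estimate, which is one reason a large set of informative features matters for this kind of analysis.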
  54. Methods Mol Biol. 2021 ;2328 67-97
      Diverse cellular phenotypes are determined by groups of transcription factors (TFs) and other regulators that influence each other's gene expression, forming transcriptional gene regulatory networks (GRNs). In many biological contexts, especially in development and associated diseases, the expression of the genes in GRNs is not static but evolves in time. Modeling the dynamics of GRN state is an important approach for understanding diverse cellular phenomena such as cell-fate specification, pluripotency and cell-fate reprogramming, oncogenesis, and tissue regeneration. In this protocol, we describe how to model GRNs using a data-driven dynamic modeling methodology, gene circuits. Gene circuits do not require knowledge of the GRN topology and connectivity but instead learn them from training data, making them very general and applicable to diverse biological contexts. We utilize the MATLAB-based gene circuit modeling software Fast Inference of Gene Regulation (FIGR) for training the model on quantitative gene expression data and simulating the GRN. We describe all the steps in the modeling life cycle, from formulating the model, training the model using FIGR, simulating the GRN, to analyzing and interpreting the model output. This protocol highlights these steps with the example of a dynamical model of the gap gene GRN involved in Drosophila segmentation and includes example MATLAB statements for each step.
    Keywords:  Binary classification; Cell fate; Development; Differential equations; Differentiation; Dynamical modeling; Gene regulatory networks; Parameter inference; Pattern formation; Transcriptional networks
    DOI:  https://doi.org/10.1007/978-1-0716-1534-8_5
  55. Bioinformatics. 2021 07 12. 37(Suppl_1): i222-i230
       MOTIVATION: Increasing evidence suggests that post-transcriptional ribonucleic acid (RNA) modifications regulate essential biomolecular functions and are related to the pathogenesis of various diseases. Precise identification of RNA modification sites is essential for understanding the regulatory mechanisms of RNAs. To date, many computational approaches for predicting RNA modifications have been developed, most of which were based on strong supervision enabled by base-resolution epitranscriptome data. However, high-resolution data may not be available.
    RESULTS: We propose WeakRM, the first weakly supervised learning framework for predicting RNA modifications from low-resolution epitranscriptome datasets, such as those generated from acRIP-seq and hMeRIP-seq. Evaluations on three independent datasets (corresponding to three different RNA modification types and their respective sequencing technologies) demonstrated the effectiveness of our approach in predicting RNA modifications from low-resolution data. WeakRM outperformed state-of-the-art multi-instance learning methods for genomic sequences, such as WSCNN, which was originally designed for transcription factor binding site prediction. Additionally, our approach captured motifs that are consistent with existing knowledge, and visualization of the predicted modification-containing regions unveiled the potential of detecting RNA modifications with improved resolution.
    AVAILABILITY AND IMPLEMENTATION: The source code for the WeakRM algorithm, along with the datasets used, is freely accessible at: https://github.com/daiyun02211/WeakRM.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btab278
  56. J Cancer Immunol (Wilmington). 2021 ;3(1): 47-59
      Immunogenic cell death (ICD) plays a major role in providing long lasting protective antitumor immunity by the chronic exposure of damage associated molecular patterns (DAMPs) in the tumor microenvironment (TME). DAMPs are essential for attracting immunogenic cells to the TME, maturation of DCs, and proper presentation of tumor antigens to the T cells so they can kill more cancer cells. Thus for the proper release of DAMPs, a controlled mechanism of cell death is necessary. Drug induced tumor cell killing occurs by apoptosis, wherein autophagy may act as a shield protecting the tumor cells and sometimes providing multi-drug resistance to chemotherapeutics. However, autophagy is required for the release of ATP as it remains one of the key DAMPs for the induction of ICD. In this review, we discuss the intricate balance between autophagy and apoptosis and the various strategies that we can apply to make these immunologically silent processes immunogenic. There are several steps of autophagy and apoptosis that can be regulated to generate an immune response. The genes involved in the processes can be regulated by drugs or inhibitors to amplify the effects of ICD and therefore serve as potential therapeutic targets.
    Keywords:  ATP; Apoptosis; Autophagy; Caspase; Immunogenic cell death; Multi-drug resistance
    DOI:  https://doi.org/10.33696/cancerimmunol.3.041
  57. FEBS J. 2021 Jul 10.
      Complex, multi-step biochemical reactions that routinely take place in our cells require high concentrations of enzymes, substrates and other structural components to proceed efficiently, and typically require chemical environments that can inhibit other reactions in their immediate vicinity. Eukaryotic cells solve these problems by restricting such reactions into diffusion-restricted compartments within the cell called organelles that can be separated from their environment by a lipid membrane, or into membrane-less compartments that form through liquid-liquid phase separation (LLPS). One of the most easily noticeable, and the earliest discovered, organelles is the nucleus, which harbors the genetic material in cells and where transcription by RNA polymerases produces most of the messenger RNAs and a plethora of noncoding RNAs, which in turn are required for translation of mRNAs in the cytoplasm. The interior of the nucleus is not a uniform soup of biomolecules, but rather consists of a variety of membraneless bodies, such as the nucleolus, nuclear speckles (NS), paraspeckles, Cajal bodies, histone locus bodies and more. In this review, we will focus on NS with an emphasis on recent developments including our own findings about the formation of NS by two large IDR-rich proteins SON and SRRM2.
    Keywords:  Nuclear speckles; Phase separation; SON; SRRM2; Splicing; Transcription
    DOI:  https://doi.org/10.1111/febs.16117
  58. J Bone Miner Metab. 2021 Jul 11.
      Osteoporosis is a common form of metabolic bone disease that is costly to treat and is primarily diagnosed on the basis of bone mineral density. As the influences of genetic lesions and environmental factors are increasingly studied in the pathological development of osteoporosis, epigenetic regulation is emerging as an important pathogenic mechanism in osteoporosis. Recently, osteoporosis genome-wide association studies and multi-omics technologies have revealed that susceptibility loci and the misregulation of epigenetic modifiers are key factors in osteoporosis. Over the past decade, extensive studies have demonstrated epigenetic mechanisms, such as DNA methylation, histone/chromatin modifications, and non-coding RNAs, as potential contributing factors in osteoporosis that affect disease initiation and progression. Herein, we review recent advances in epigenetics in osteoporosis, with a focus on exploring the underlying mechanisms and potential diagnostic/prognostic biomarker applications for osteoporosis.
    Keywords:  DNA methylation; Epigenetics; Histone modification; NcRNA; Osteoporosis
    DOI:  https://doi.org/10.1007/s00774-021-01249-8
  59. Front Oncol. 2021 ;11 678333
      Cancer stem cells (CSCs) are a minority subset of cancer cells that can drive tumor initiation, promote tumor progression, and induce drug resistance. CSCs are difficult to eliminate by conventional therapies and eventually mediate tumor relapse and metastasis. Moreover, recent studies have shown that CSCs display plasticity that enables them to alter their phenotype and function. Consequently, the varied phenotypes result in varied tumorigenesis, dissemination, and drug-resistance potential, thereby adding to the complexity of tumor heterogeneity and further challenging clinical management of cancers. In recent years, the tumor microenvironment (TME) has become a hotspot in cancer research owing to its successful application in clinical tumor immunotherapy. Notably, emerging evidence shows that the TME is involved in regulating CSC plasticity. The TME can activate stemness pathways and promote immune escape through cytokines and exosomes secreted by immune cells or stromal cells, thereby inducing non-CSCs to acquire CSC properties and increasing CSC plasticity. However, the relationship between the TME and the plasticity of CSCs remains poorly understood. In this review, we discuss the emerging investigations on the TME and CSC plasticity to illustrate the underlying mechanisms and potential implications in suppressing cancer progression and drug resistance. We consider that this review can help develop novel therapeutic strategies by taking into account the interlink between the TME and CSC plasticity.
    Keywords:  cancer progression; cancer stem cell; plasticity; resistance; tumor microenvironment
    DOI:  https://doi.org/10.3389/fonc.2021.678333
  60. Adv Exp Med Biol. 2021 ;1208 387-453
      Autophagy is an important and dynamic biological process, and provides an ideal application scenario for bioinformatics to develop new data resources, algorithms, tools and computational or mathematical models for a better understanding of complex regulatory mechanisms in cells. In the past decade, considerable effort has been devoted to the development of numerous bioinformatics technologies in autophagy research, and a comprehensive summary of these important studies will provide a timely reference for both biologists and bioinformaticians who are working in the field of autophagy. In this book chapter, we first introduce bioinformatics technologies that allow sequence analysis of autophagy genes. We briefly summarize the mainstream algorithms in sequence alignment for the identification of homologous autophagy genes and emphasize the computational identification of potential orthologs and paralogs, as well as the evolutionary analysis of autophagy gene families. Three methods for the recognition of autophagy-related sequence motifs are introduced: regular expression, position-specific scoring matrix (PSSM) and group-based prediction system (GPS). Second, we carefully summarize recent progress in the analysis of autophagy-related omics data. We discuss how two major types of computational methods, enrichment analysis and network analysis, can be used to analyze omics data, including transcriptomics, non-coding RNAomics, epigenomics, proteomics, phosphoproteomics and protein lysine modification (PLM) omics data. Finally, we summarize several important autophagy-related data resources, including both autophagy gene databases and autophagy-related RNA databases. We anticipate that more useful bioinformatics technologies will be developed and play an ever-more-important role in the analysis of autophagy.
    DOI:  https://doi.org/10.1007/978-981-16-2830-6_18
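Two of the motif-recognition methods named in this abstract, regular expressions and position-specific scoring matrices, can be sketched in a few lines. The example below is a hedged illustration rather than code from the chapter: the pattern is the commonly cited core LC3-interacting region motif [W/F/Y]-x-x-[L/I/V], chosen here as an assumed example, and the aligned training peptides, the uniform background, and the pseudocount are made up for demonstration.

```python
import math
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed example pattern: aromatic residue, two arbitrary residues, then L/I/V.
# The lookahead lets overlapping matches be reported.
LIR_PATTERN = re.compile(r"(?=([WFY]..[LIV]))")


def regex_hits(seq):
    """Return (start, matched_text) for every, possibly overlapping, motif match."""
    return [(m.start(), m.group(1)) for m in LIR_PATTERN.finditer(seq)]


def build_pssm(aligned, pseudocount=1.0):
    """Build a log2-odds PSSM from equal-length peptides, assuming a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for i in range(len(aligned[0])):
        column = [peptide[i] for peptide in aligned]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(seq, pssm):
    """Slide the PSSM along seq and return (start, score) of the top-scoring window."""
    width = len(pssm)
    scored = [
        (sum(pssm[i][seq[start + i]] for i in range(width)), start)
        for start in range(len(seq) - width + 1)
    ]
    score, start = max(scored)
    return start, score


# Hypothetical aligned training peptides and a query sequence.
training = ["WEEL", "FDDI", "WEEV", "YEEL"]
pssm = build_pssm(training)
start, score = best_window("GGSWEELGG", pssm)
hits = regex_hits("GGSWEELGG")
```

The regular expression gives a yes/no answer per position, whereas the PSSM returns a graded score, so it can rank candidate sites that a fixed pattern would treat as equivalent.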
  61. Bioinformatics. 2021 07 12. 37(Suppl_1): i34-i41
       MOTIVATION: Metatranscriptomics (MTX) has become an increasingly practical way to profile the functional activity of microbial communities in situ. However, MTX remains underutilized due to experimental and computational limitations. The latter are complicated by non-independent changes in both RNA transcript levels and their underlying genomic DNA copies (as microbes simultaneously change their overall abundance in the population and regulate individual transcripts), genetic plasticity (as whole loci are frequently gained and lost in microbial lineages) and measurement compositionality and zero-inflation. Here, we present a systematic evaluation of and recommendations for differential expression (DE) analysis in MTX.
    RESULTS: We designed and assessed six statistical models for DE discovery in MTX that incorporate different combinations of DNA and RNA normalization and assumptions about the underlying changes of gene copies or species abundance within communities. We evaluated these models on multiple simulated and real multi-omic datasets. Models adjusting transcripts relative to their encoding gene copies as a covariate were significantly more accurate in identifying DE from MTX in both simulated and real datasets. Moreover, we show that when paired DNA measurements (metagenomic data) are not available, models normalizing MTX measurements within-species while also adjusting for total-species RNA balance sensitivity, specificity and interpretability of DE detection, as does filtering likely technical zeros. The efficiency and accuracy of these models pave the way for more effective MTX-based DE discovery in microbial communities.
    AVAILABILITY AND IMPLEMENTATION: The analysis code and synthetic datasets used in this evaluation are available online at http://huttenhower.sph.harvard.edu/mtx2021.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btab327
  62. APL Bioeng. 2021 Sep;5(3): 030401
      Cells are exposed to a variety of mechanical forces in their daily lives, especially endothelial cells, which are stretched by vessel distention and are exposed to hemodynamic shear stress from blood flow. Exposure to excessive forces can induce disease, but the molecular details of how these cells perceive forces, transduce them into biochemical signals and genetic events, i.e., mechanotransduction, and integrate them into physiological or pathological changes remain unclear. However, seminal studies in endothelial cells over the past several decades have begun to elucidate some of these signals. These studies have been highlighted in APL Bioengineering and elsewhere, describing a complex temporal pattern where forces are sensed immediately by ion channels and force-dependent conformational changes in surface proteins, followed by biochemical cascades, cytoskeletal contraction, and nuclear remodeling that can affect long-term changes in endothelial morphology and fate. Key examples from the endothelial literature that have established these pathways include showing that integrins and Flk-1 or VE-cadherin act as shear stress transducers, activating downstream proteins such as Cbl and Nckβ or Src, respectively. In this Editorial, we summarize recent literature highlighting these accomplishments, noting the engineering tools and analysis methods used in these discoveries while also highlighting unanswered questions.
    DOI:  https://doi.org/10.1063/5.0058611