bims-gerecp 2024-04-14 papers

Issue of 2024–04–14
five papers selected by
Xiao Qin, University of Oxford

Nat Biotechnol. 2024 Apr 12.

Inferring gene regulatory networks from single-cell multiome data using atlas-scale external data.

Existing methods for gene regulatory network (GRN) inference rely on gene expression data alone or on lower-resolution bulk data. Despite the recent integration of chromatin accessibility and RNA sequencing data, learning complex mechanisms from limited independent data points still presents a daunting challenge. Here we present LINGER (Lifelong neural network for gene regulation), a machine-learning method to infer GRNs from single-cell paired gene expression and chromatin accessibility data. LINGER incorporates atlas-scale external bulk data across diverse cellular contexts and prior knowledge of transcription factor motifs as a manifold regularization. LINGER achieves a fourfold to sevenfold relative increase in accuracy over existing methods and reveals a complex regulatory landscape of genome-wide association studies, enabling enhanced interpretation of disease-associated variants and genes. Following the GRN inference from reference single-cell multiome data, LINGER enables the estimation of transcription factor activity solely from bulk or single-cell gene expression data, leveraging the abundance of available gene expression data to identify driver regulators from case-control studies.

DOI: https://doi.org/10.1038/s41587-024-02182-7

Genome Biol. 2024 Apr 08. 25(1): 90

scifi-ATAC-seq: massive-scale single-cell chromatin accessibility sequencing using combinatorial fluidic indexing.

Xuan Zhang, Alexandre P Marand, Haidong Yan, Robert J Schmitz.

Single-cell ATAC-seq has emerged as a powerful approach for revealing candidate cis-regulatory elements genome-wide at cell-type resolution. However, current single-cell methods suffer from limited throughput and high costs. Here, we present a novel technique called scifi-ATAC-seq, single-cell combinatorial fluidic indexing ATAC-sequencing, which combines a barcoded Tn5 pre-indexing step with droplet-based single-cell ATAC-seq using the 10X Genomics platform. With scifi-ATAC-seq, up to 200,000 nuclei across multiple samples can be indexed in a single emulsion reaction, representing an approximately 20-fold increase in throughput compared to the standard 10X Genomics workflow.

Keywords: ATAC-seq; Chromatin accessibility; Combinatorial fluidic indexing; Massive-scale; Single-cell

DOI: https://doi.org/10.1186/s13059-024-03235-5
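
The throughput claim in the scifi-ATAC-seq abstract can be illustrated with a rough calculation. The sketch below is not the authors' analysis: it assumes nuclei are assigned uniformly at random to droplets and to Tn5 pre-indexing barcodes, and the droplet count and the number of Tn5 barcodes are hypothetical values chosen only for illustration. It shows why pre-indexing lets many more nuclei share one emulsion: two nuclei collide only if they share both the droplet barcode and the Tn5 barcode.

```python
def collision_fraction(n_nuclei: int, n_droplets: int, n_tn5_barcodes: int = 1) -> float:
    """Expected fraction of nuclei sharing BOTH droplet and Tn5 barcode with another nucleus.

    Simplifying assumption: every nucleus lands in one of
    n_droplets * n_tn5_barcodes compartments uniformly at random.
    """
    compartments = n_droplets * n_tn5_barcodes
    # Probability that none of the other nuclei falls into the same compartment.
    p_alone = (1.0 - 1.0 / compartments) ** (n_nuclei - 1)
    return 1.0 - p_alone


# Loading taken from the abstract: up to 200,000 nuclei, about 20-fold the standard workflow.
scifi_load = 200_000
standard_load = scifi_load // 20  # implied standard loading per reaction

N_DROPLETS = 100_000  # hypothetical number of barcoded droplets per reaction
N_TN5 = 96            # hypothetical number of Tn5 pre-indexing barcodes

for label, nuclei, tn5 in [
    ("standard loading, no pre-index", standard_load, 1),
    ("scifi loading, no pre-index", scifi_load, 1),
    ("scifi loading, Tn5 pre-index", scifi_load, N_TN5),
]:
    print(f"{label}: {collision_fraction(nuclei, N_DROPLETS, tn5):.3f}")
```

Under these assumptions, overloading the droplets without pre-indexing makes most nuclei indistinguishable from a droplet-mate, while the extra Tn5 barcode layer keeps the collision fraction low at the same loading.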

Cancer Res. 2024 Apr 08.

Transfer learning reveals cancer-associated fibroblasts are associated with epithelial-mesenchymal transition and inflammation in cancer cells in pancreatic ductal adenocarcinoma.

Pancreatic ductal adenocarcinoma (PDAC) is an aggressive malignancy characterized by an immunosuppressive tumor microenvironment enriched with cancer-associated fibroblasts (CAFs). This study utilized a convergence approach to identify tumor cell and CAF interactions through the integration of single-cell data from human tumors with human organoid co-culture experiments. Analysis of a comprehensive atlas of PDAC single-cell RNA sequencing (scRNA-seq) data indicated that CAF density is associated with increased inflammation and epithelial-mesenchymal transition (EMT) in epithelial cells. Transfer learning using transcriptional data from patient-derived organoid and CAF co-cultures provided in silico validation of CAF induction of inflammatory and EMT epithelial cell states. Further experimental validation in co-cultures demonstrated integrin beta 1 (ITGB1) and vascular endothelial growth factor A (VEGF-A) interactions with neuropilin-1 (NRP1) mediating CAF-epithelial cell crosstalk. Together, this study introduces transfer learning from human single-cell data to organoid co-culture analyses for experimental validation of discoveries of cell-cell crosstalk and identifies fibroblast-mediated regulation of EMT and inflammation.

DOI: https://doi.org/10.1158/0008-5472.CAN-23-1660

STAR Protoc. 2024 Apr 10. pii: S2666-1667(24)00171-0. [Epub ahead of print]5(2): 103006

Protocol for unsupervised inference of cell-cell communication using matrix decomposition.

Yi Liu, Xiao Chang, Xiaoping Liu.

Exploring cell-cell communication is pivotal for understanding biological processes in multicellular life forms. Here, we present a protocol that details the use of matrix decomposition to infer cell-cell communication (MDIC3) for unsupervised cell-cell communication inference. We describe steps for using the MDIC3 Python scripts to deduce cell-cell communication and identify key ligand-receptor pairs between a specific cell type pair from a single-cell gene expression dataset. This protocol has potential application in cell-cell communication inference in any species. For complete details on the use and execution of this protocol, please refer to Liu et al. (ref. 1).

Keywords: Bioinformatics; Cell Biology; Gene Expression; Single Cell

DOI: https://doi.org/10.1016/j.xpro.2024.103006

BMC Genomics. 2024 Apr 12. 25(1): 361

Split Pool Ligation-based Single-cell Transcriptome sequencing (SPLiT-seq) data processing pipeline comparison.

Lucas Kuijpers, Bastian Hornung, Mirjam C G N van den Hout-van Vroonhoven, Wilfred F J van IJcken, Frank Grosveld, Eskeatnaf Mulugeta.

BACKGROUND: Single-cell sequencing techniques are revolutionizing every field of biology by providing the ability to measure the abundance of biological molecules at a single-cell resolution. Although single-cell sequencing approaches have been developed for several molecular modalities, single-cell transcriptome sequencing is the most prevalent and widely applied technique. SPLiT-seq (split-pool ligation-based transcriptome sequencing) is one of these single-cell transcriptome techniques that applies a unique combinatorial-barcoding approach by splitting and pooling cells into multi-well plates containing barcodes. This unique approach required the development of dedicated computational tools to preprocess the data and extract the count matrices. Here we compare eight bioinformatic pipelines (alevin-fry splitp, LR-splitpipe, SCSit, splitpipe, splitpipeline, SPLiTseq-demultiplex, STARsolo and zUMIs) that have been developed to process SPLiT-seq data. We provide an overview of the tools, their computational performance, functionality and impact on downstream processing of the single-cell data, which vary greatly depending on the tool used.
RESULTS: We show that STARsolo, splitpipe and alevin-fry splitp can all handle large amounts of data within a reasonable time. In contrast, the other five pipelines are slow when handling large datasets. On a smaller dataset, cell barcode results are similar across pipelines, with the exception of SPLiTseq-demultiplex and splitpipeline. LR-splitpipe, which was originally designed for processing long-read sequencing data, is the slowest of all pipelines. Alevin-fry produced different downstream results that are difficult to interpret. STARsolo functions nearly identically to splitpipe, and the two produce highly similar results. However, STARsolo lacks a function to collapse random hexamer reads, for which some additional coding is required.
CONCLUSION: Our comprehensive comparative analysis aids users in selecting the most suitable analysis tool for efficient SPLiT-seq data processing, while also detailing the specific prerequisites for each of these pipelines. Of the available pipelines, we recommend splitpipe or STARsolo for SPLiT-seq data analysis.

Keywords: Combinatorial barcoding; Data-preprocessing; SPLiT-seq; Single cell RNA sequencing; Split-pool barcoding

DOI: https://doi.org/10.1186/s12864-024-10285-3
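
The "additional coding" that the SPLiT-seq comparison mentions for collapsing random hexamer reads might look roughly like the sketch below. It is an illustration and not code from any of the compared pipelines. It assumes that, within one round-1 well, reads primed by oligo-dT and by random hexamers carry different round-1 barcodes, so both barcodes must be mapped to the same well before cells are counted. The barcode sequences, well names, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical round-1 barcode table: two barcodes per well,
# one for the oligo-dT primer and one for the random hexamer primer.
ROUND1_BARCODE_TO_WELL = {
    "AACGTGAT": "A1",  # oligo-dT primer, well A1
    "CTGTAGCC": "A1",  # random hexamer primer, well A1
    "AAACATCG": "A2",  # oligo-dT primer, well A2
    "GCCAAGAC": "A2",  # random hexamer primer, well A2
}


def collapse_barcode(bc1: str, bc2: str, bc3: str) -> tuple:
    """Replace the round-1 barcode with its well so both primer types map to one cell."""
    return (ROUND1_BARCODE_TO_WELL[bc1], bc2, bc3)


def reads_per_cell(reads) -> Counter:
    """Count reads per collapsed cell barcode, skipping unknown round-1 barcodes."""
    counts = Counter()
    for bc1, bc2, bc3 in reads:
        if bc1 not in ROUND1_BARCODE_TO_WELL:
            continue  # barcode is not in the table; a real pipeline might error-correct it
        counts[collapse_barcode(bc1, bc2, bc3)] += 1
    return counts


# Each read is (round-1, round-2, round-3) barcode; values are made up.
example_reads = [
    ("AACGTGAT", "ACTGACTG", "TTGGCCAA"),  # oligo-dT read, well A1
    ("CTGTAGCC", "ACTGACTG", "TTGGCCAA"),  # hexamer read, same cell as above
    ("AAACATCG", "ACTGACTG", "TTGGCCAA"),  # oligo-dT read, well A2: a different cell
    ("NNNNNNNN", "ACTGACTG", "TTGGCCAA"),  # unknown round-1 barcode, skipped
]
cell_counts = reads_per_cell(example_reads)
print(cell_counts)
```

Without the collapsing step, the first two reads would be counted as two separate cells, which would inflate the cell number and split each cell's counts in two.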