bims-strubi Biomed News
on Advances in structural biology
Issue of 2021‒08‒29
thirty-six papers selected by
Alessandro Grinzato
European Synchrotron Radiation Facility

  1. Int J Mol Sci. 2021 Aug 19. pii: 8940. [Epub ahead of print]22(16):
      Cryo-electron microscopy (Cryo-EM) has become a routine technology for resolving the structure of biological macromolecules due to the resolution revolution in recent years. The specimens are typically prepared in a very thin layer of vitrified ice suspended in the holes of a perforated amorphous carbon film. However, samples applied directly to conventional support membranes may suffer from partial or complete denaturation caused by sticking to the air-water interface (AWI). Building on its use in materials applications, graphene has recently also been used to improve frozen sample preparation in place of a conventional suspended thin amorphous carbon film. It has been shown that graphene or graphene oxide, together with various chemical modifications of its surface, can effectively prevent particles from adsorbing to the AWI, which improves the dispersion, adsorbed number, and orientation preference of frozen particles in the ice layer. Their excellent properties and lower thickness can significantly reduce the background noise, allowing high-resolution three-dimensional reconstructions from a minimal data set.
    Keywords:  chemical modification; cryo-EM; graphene; graphene oxide; sample preparation; single-particle
  2. PLoS Biol. 2021 Aug;19(8): e3001318
      Subtomogram averaging (STA) is a powerful image processing technique in electron tomography used to determine the 3D structure of macromolecular complexes in their native environments. It is a fast growing technique with increasing importance in structural biology. The computational aspect of STA is very complex and depends on a large number of variables. We noticed a lack of detailed guides for STA processing. Also, current publications in this field often lack a documentation that is practical enough to reproduce the results with reasonable effort, which is necessary for the scientific community to grow. We therefore provide a complete, detailed, and fully reproducible processing protocol that covers all aspects of particle picking and particle alignment in STA. The command line-based workflow is fully based on the popular Dynamo software for STA. Within this workflow, we also demonstrate how large parts of the processing pipeline can be streamlined and automatized for increased throughput. This protocol is aimed at users on all levels. It can be used for training purposes, or it can serve as basis to design user-specific projects by taking advantage of the flexibility of Dynamo by modifying and expanding the given pipeline. The protocol is successfully validated using the Electron Microscopy Public Image Archive (EMPIAR) database entry 10164 from immature HIV-1 virus-like particles (VLPs) that describe a geometry often seen in electron tomography.
  3. J Vis Exp. 2021 Aug 05.
      Presented here is a protocol for preparing cryo-lamellae from plunge-frozen grids of Plasmodium falciparum-infected human erythrocytes, which could easily be adapted for other biological samples. The basic principles for preparing samples, milling, and viewing lamellae are common to all instruments and the protocol can be followed as a general guide to on-grid cryo-lamella preparation for cryo-electron microscopy (cryoEM) and cryo-electron tomography (cryoET). Electron microscopy grids supporting the cells are plunge-frozen into liquid nitrogen-cooled liquid ethane using a manual or automated plunge freezer, then screened on a light microscope equipped with a cryo-stage. Frozen grids are transferred into a cryo-scanning electron microscope equipped with a focused ion beam (cryoFIB-SEM). Grids are routinely sputter coated prior to milling, which aids dispersal of charge build-up during milling. Alternatively, an e-beam rotary coater can be used to apply a layer of carbon-platinum to the grids, the exact thickness of which can be more precisely controlled. Once inside the cryoFIB-SEM an additional coating of an organoplatinum compound is applied to the surface of the grid via a gas injection system (GIS). This layer protects the front edge of the lamella as it is milled, the integrity of which is critical for achieving uniformly thin lamellae. Regions of interest are identified via SEM and milling is carried out in a step-wise fashion, reducing the current of the ion beam as the lamella reaches electron transparency, in order to avoid excessive heat generation. A grid with multiple lamellae is then transferred to a transmission electron microscope (TEM) under cryogenic conditions for tilt-series acquisition. A robust and contamination-free workflow for lamella preparation is an essential step for downstream techniques, including cellular cryoEM, cryoET, and sub-tomogram averaging. 
      Development of these techniques, especially for lift-out and milling of high-pressure frozen samples, is a high priority in the field.
  4. Proteins. 2021 Aug 27.
      CASP (Critical Assessment of Structure prediction) conducts community experiments to determine the state of the art in computing protein structure from amino acid sequence. The process relies on the experimental community providing information about not yet public or about to be solved structures, for use as targets. For some targets, the experimental structure is not solved in time for use in CASP. Calculated structure accuracy improved dramatically in this round, implying that models should now be much more useful for resolving many sorts of experimental difficulties. To test this, selected models for seven unsolved targets were provided to the experimental groups. These models were from the AlphaFold2 group, who overall submitted the most accurate predictions in CASP14. Four targets were solved with the aid of the models, and, additionally, the structure of an already solved target was improved. An a posteriori analysis showed that in some cases models from other groups would also be effective. This paper provides accounts of the successful application of models to structure determination, including molecular replacement for X-ray crystallography, backbone tracing and sequence positioning in a Cryo-EM structure, and correction of local features. The results suggest that in future there will be greatly increased synergy between computational and experimental approaches to structure determination.
    Keywords:  CASP, Protein Structure Prediction; X-ray crystallography; cryo-EM
  5. Science. 2021 Aug 27. pii: eaba0954. [Epub ahead of print]373(6558):
      Conformational changes within biological macromolecules control a vast array of chemical reactions in living cells. Time-resolved crystallography can reveal time-dependent structural changes that occur within protein crystals, yielding chemical insights in unparalleled detail. Serial crystallography approaches developed at x-ray free-electron lasers are now routinely used for time-resolved diffraction studies of macromolecules. These techniques are increasingly being applied at synchrotron radiation sources and to a growing diversity of macromolecules. Here, we review recent progress in the field, including visualizing ultrafast structural changes that guide the initial trajectories of light-driven reactions as well as capturing biologically important conformational changes on slower time scales, for which bacteriorhodopsin and photosystem II are presented as illustrative case studies.
  6. Hum Gene Ther. 2021 Aug 27.
      Gene therapy has evolved over the past decade into a promising therapeutic class for treating many intractable diseases. Recombinant adeno-associated virus (AAV) is the most commonly used viral vector for delivering therapeutic genes. Independent of the manufacturing process for AAVs, the clinical materials are inherently heterogeneous and contain both empty and full capsids. Empty capsids can impact the safety and efficacy of AAV products and therefore their level needs to be controlled. Several analytical methods have been reported for this purpose. However, some of these methods have an insufficient assay range, or rely on instruments that cannot be readily implemented in a QC environment. Here, we describe a fast size exclusion chromatography (SEC) assay with dual-wavelength detection (SEC-DW) to directly determine the percent full capsids of AAV samples based on their peak area (PA) ratios. The two detection wavelengths selected to represent encapsidated transgenes and capsid proteins are 260 nm and 230 nm, respectively instead of the conventionally used 260 nm and 280 nm. The use of 230 nm instead of 280 nm to monitor the contribution of the capsid protein results in a linear relationship between the PA260/PA230 ratio and the percent full capsids, unlike the non-linear relationship observed when the PA260/PA280 ratio is used. As a result, the method exhibits a significantly extended assay range (up to 91% full capsids). The accuracy of the SEC-DW method was confirmed by comparing the results obtained against results from orthogonal high-resolution methods such as analytical ultracentrifugation (AUC) and cryo-electron microscopy (Cryo-EM) and excellent agreement was obtained when common samples were analyzed using the different methods. 
      The SEC-DW method runs on a readily accessible HPLC instrument platform, provides much higher assay throughput compared to AUC and electron microscopy (EM), and can be implemented as a release method in a QC environment or used as a rapid screening tool to support process development and product understanding.
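The linear relationship between the PA260/PA230 ratio and the percent full capsids reported in this abstract lends itself to a simple calibration line. The following is a minimal sketch only: the calibration standards, the function names, and the resulting slope and intercept are invented for illustration and are not taken from the paper.

```python
import numpy as np

def fit_calibration(ratios, percent_full):
    """Fit percent_full = slope * ratio + intercept by least squares."""
    slope, intercept = np.polyfit(ratios, percent_full, 1)
    return slope, intercept

def percent_full_from_ratio(ratio, slope, intercept):
    """Convert a measured PA260/PA230 ratio into percent full capsids."""
    return slope * ratio + intercept

# Hypothetical calibration standards (NOT values from the paper):
# peak-area ratios measured for samples of known percent full capsids.
cal_ratios = np.array([0.20, 0.35, 0.50, 0.65])
cal_percent = np.array([10.0, 35.0, 60.0, 85.0])

slope, intercept = fit_calibration(cal_ratios, cal_percent)

# Estimate an unknown sample from its measured ratio.
unknown = percent_full_from_ratio(0.44, slope, intercept)
print(f"estimated percent full: {unknown:.1f}")
```

In practice the calibration would be built from reference material characterised by an orthogonal method such as AUC, and it should only be applied within the validated assay range.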
  7. PLoS Biol. 2021 Aug;19(8): e3001319
      Cryo-electron tomography (cryo-ET) and subtomogram averaging (STA) are increasingly used for macromolecular structure determination in situ. Here, we introduce a set of computational tools and resources designed to enable flexible approaches to STA through increased automation and simplified metadata handling. We create a bidirectional interface between the Dynamo software package and the Warp-Relion-M pipeline, providing a framework for ab initio and geometrical approaches to multiparticle refinement in M. We illustrate the power of working within this framework by applying it to EMPIAR-10164, a publicly available dataset containing immature HIV-1 virus-like particles (VLPs), and a challenging in situ dataset containing chemosensory arrays in bacterial minicells. Additionally, we provide a comprehensive, step-by-step guide to obtaining a 3.4-Å reconstruction from EMPIAR-10164. The guide is hosted on, a collaborative online platform we establish for sharing knowledge about cryo-ET.
  8. J Appl Crystallogr. 2021 Aug 01. 54(Pt 4): 1034-1046
      A novel capillary-based microfluidic strategy to accelerate the process of small-molecule-compound screening by room-temperature X-ray crystallography using protein crystals is reported. The ultra-thin microfluidic devices are composed of a UV-curable polymer, patterned by cleanroom photolithography, and have nine capillary channels per chip. The chip was designed for ease of sample manipulation, sample stability and minimal X-ray background. 3D-printed frames and cassettes conforming to SBS standards are used to house the capillary chips, providing additional mechanical stability and compatibility with automated liquid- and sample-handling robotics. These devices enable an innovative in situ crystal-soaking screening workflow, akin to high-throughput compound screening, such that quantitative electron density maps sufficient to determine weak binding events are efficiently obtained. This work paves the way for adopting a room-temperature microfluidics-based sample delivery method at synchrotron sources to facilitate high-throughput protein-crystallography-based screening of compounds at high concentration with the aim of discovering novel binding events in an automated manner.
    Keywords:  X-ray diffraction; compound screening; microfluidics; protein crystallography; structural biology
  9. Biotechnol Bioeng. 2021 Aug 26.
      In this work, we show that maltose-binding protein (MBP) is capable of facilitating stable gold nanoparticle synthesis, and a structure of MBP in the presence of gold ions was determined by X-ray crystallography. Using this high-resolution structure of gold ion bound MBP, a peptide (AT1) was selected and synthesized, and was shown to also aid in the synthesis of stable gold nanoparticles under similar experimental conditions to those used for protein facilitated synthesis. This structure-based approach represents a new potential method for the selection of peptides capable of facilitating stable nanoparticle synthesis.
    Keywords:  Gold nanoparticles; biomineralization; nanobiotechnology; protein crystallography
  10. Int J Mol Sci. 2021 Aug 23. pii: 9081. [Epub ahead of print]22(16):
      Protein homo-oligomerization is a very common phenomenon, and approximately half of proteins form homo-oligomeric assemblies composed of identical subunits. The vast majority of such assemblies possess internal symmetry which can be either exploited to help or poses challenges during structure determination. Moreover, aspects of symmetry are critical in the modeling of protein homo-oligomers either by docking or by homology-based approaches. Here, we first provide a brief overview of the nature of protein homo-oligomerization. Next, we describe how the symmetry of homo-oligomers is addressed by crystallographic and non-crystallographic symmetry operations, and how biologically relevant intermolecular interactions can be deciphered from the ordered array of molecules within protein crystals. Additionally, we describe the most important aspects of protein homo-oligomerization in structure determination by NMR. Finally, we give an overview of approaches aimed at modeling homo-oligomers using computational methods that specifically address their internal symmetry and allow the incorporation of other experimental data as spatial restraints to achieve higher model reliability.
    Keywords:  homo-oligomers; modeling; structure determination
  11. Sci Rep. 2021 Aug 23. 11(1): 17038
      Over the last decades the phase problem in macromolecular x-ray crystallography has become more controllable as methods and approaches have diversified and improved. However, solving the phase problem is still one of the biggest obstacles to successfully determining a crystal structure. To overcome this obstacle, we have utilized the anomalous scattering properties of the heavy alkali metal cesium. We investigated the introduction of cesium in the form of cesium chloride during the three major steps of protein treatment in crystallography: purification, crystallization, and cryo-protection. We derived a step-wise procedure encompassing a "quick-soak"-only approach and a combined approach of CsCl supplementation during purification and cryo-protection. This procedure was successfully applied to two different proteins: (i) lysozyme and (ii) as a proof of principle, a construct consisting of the PH domain of the TFIIH subunit p62 from Chaetomium thermophilum for de novo structure determination. The use of CsCl thus provides a versatile, general, easy-to-use, and low-cost phasing strategy.
  12. Anal Chem. 2021 Aug 25.
      An effective intensity-based reference is a cornerstone for quantitative nuclear magnetic resonance (NMR) studies, as the molecular concentration is encoded in its signal. In theory, NMR is well suited for the measurement of competitive protein adsorption onto nanoparticle (NP) surfaces, but current referencing systems are not optimized for multidimensional experiments. Presented herein is a simple and novel referencing system using 15N tryptophan (Trp) as an external reference for 1H-15N 2D NMR experiments. The referencing system is validated by the determination of the binding capacity of a single protein onto gold NPs. Then, the Trp reference is applied to protein mixtures, and signals from each protein are accurately quantified. All results are consistent with previous studies, but with substantially higher precision, indicating that the Trp reference can accurately calibrate the residue peak intensities and reduce systematic errors. Finally, the proposed Trp reference is used to kinetically monitor in situ and in real time the competitive adsorption of different proteins. As a challenging test case, we successfully apply our approach to a mixture of protein variants differing by only a single residue. Our results show that the binding of one protein will affect the binding of the other, leading to an altered NP corona composition. This work therefore highlights the importance of studying protein-NP interactions in protein mixtures in situ, and the referencing system developed here enables the quantification of binding kinetics and thermodynamics of multiple proteins using various 1H-15N 2D NMR techniques.
  13. Biophys Chem. 2021 Aug 13. pii: S0301-4622(21)00148-4. [Epub ahead of print]278 106666
      Protein-protein interactions play an important role in life activities. A more fine-grained analysis, at the level of residues and atoms, helps us better understand the mechanisms of inter-protein interaction and supports drug design. The development of efficient computational methods that reduce trial and error and assist experimental researchers in determining complex structures is an ongoing effort in the field. Trimeric protein interfaces, especially those of homotrimers, have rarely been studied. In this paper, we propose an interpretable machine learning method for predicting interface residue pairs in homo-trimeric proteins. Structure, sequence, and physicochemical information are integrated as input features for model training. A graph model is used to represent intra-protein spatial information. Matrix factorization captures the interactions between different features. A kernel function is designed to automatically acquire the neighborhood information of the target residue pairs. The accuracy reaches 54.5% on an independent test set. Sequence and structure alignments exhibit the model's capacity for self-study. Our model indicates the biological significance between sequence and structure, and could help reduce trial and error in protein complex determination and protein-protein docking. SIGNIFICANCE: Protein complex structures are significant for understanding protein function and promising for functional protein design. With increasing data, some computational tools have been developed for protein complex residue contact prediction, which is one of the most significant steps in complex structure prediction. For homo-trimeric proteins, however, sequence-based deep learning predictors are infeasible for homologous sequences, and the black-box nature of the algorithms prevents us from understanding each step of the operation.
      We therefore propose an interpretable machine learning method for homo-trimeric protein interface residue-residue interaction prediction, and the predictor shows good performance. Our work provides a computational aid for determining the interface residue pairs of homo-trimeric proteins, which will be further verified by wet-lab experiments, and supports downstream work such as protein-protein docking, protein complex structure prediction, and drug design.
    Keywords:  Graph model; Homotrimer; Interpretable machine learning; Matrix factorization
  14. Viruses. 2021 Aug 06. pii: 1555. [Epub ahead of print]13(8):
      Three-dimensional RNA domain reconstruction is important for the assembly, disassembly and delivery functionalities of a packed proteinaceous capsid. However, to date, the self-association of RNA molecules is still an open problem. Recent chemical probing reports provide, with high reliability, the secondary structure of diverse RNA ensembles, such as those of viral genomes. Here, we present a method for reconstructing the complete 3D structure of RNA genomes, which combines a coarse-grained model with a subdomain composition scheme to obtain the entire genome inside proteinaceous capsids based on secondary structures from experimental techniques. Despite the amount of sampling involved in the folded and also unfolded RNA molecules, advanced microscope techniques can provide points of anchoring, which enhance our model to include interactions between capsid pentamers and RNA subdomains. To test our method, we tackle the satellite tobacco mosaic virus (STMV) genome, which has been widely studied by both experimental and computational communities. We provide not only a methodology to structurally analyze the tertiary conformations of the RNA genome inside capsids, but also a flexible platform that allows the easy implementation of features/descriptors coming from both theoretical and experimental approaches.
    Keywords:  RNA genome; RNA secondary structure; RNA tertiary structure; STMV
  15. Science. 2021 Aug 27. 373(6558): 1047-1051
      RNA molecules adopt three-dimensional structures that are critical to their function and of interest in drug discovery. Few RNA structures are known, however, and predicting them computationally has proven challenging. We introduce a machine learning approach that enables identification of accurate structural models without assumptions about their defining characteristics, despite being trained with only 18 known RNA structures. The resulting scoring function, the Atomic Rotationally Equivariant Scorer (ARES), substantially outperforms previous methods and consistently produces the best results in community-wide blind RNA structure prediction challenges. By learning effectively even from a small amount of data, our approach overcomes a major limitation of standard deep neural networks. Because it uses only atomic coordinates as inputs and incorporates no RNA-specific information, this approach is applicable to diverse problems in structural biology, chemistry, materials science, and beyond.
  16. IEEE Trans Vis Comput Graph. 2021 Aug 26. PP
      DNA nanostructures offer promising applications, particularly in the biomedical domain, as they can be used for targeted drug delivery, construction of nanorobots, or as a basis for molecular motors. One of the most prominent techniques for assembling these structures is DNA origami. Nowadays, desktop applications are used for the in silico design of such structures. However, as such structures are often spatially complex, their assembly and analysis are complicated. Since virtual reality (VR) was proven to be advantageous for such spatial-related tasks and there are no existing VR solutions focused on this domain, we propose Vivern, a VR application that allows domain experts to design and visually examine DNA origami nanostructures. Our approach presents different abstracted visual representations of the nanostructures, various color schemes, and an ability to place several DNA nanostructures and proteins in one environment, thus allowing for the detailed analysis of complex assemblies. We also present two novel examination tools, the Magic Scale Lens and the DNA Untwister, that allow the experts to visually embed different representations into local regions to preserve the context and support detailed investigation. To showcase the capabilities of our solution, prototypes of novel nanodevices conceptualized by our collaborating experts, such as DNA-protein hybrid structures and DNA origami superstructures, are presented. Finally, the results of two rounds of evaluations are summarized. They demonstrate the advantages of our solution, especially for scenarios where current desktop tools are very limited, while also presenting possible future research directions.
  17. Ultramicroscopy. 2021 Aug 18. pii: S0304-3991(21)00163-7. [Epub ahead of print]230 113383
      The effect of chromatic aberration (CC) on the spatial resolution in transmission electron microscopy (TEM) was studied in thick specimens in which the sample becomes the limiting factor in the resolution. The sample influences the energy spread of the electron beam, allows only a limited electron dose, and modulates electron scattering events. The experimental set-up consisted of a thin silicon nitride membrane and a silicon wedge containing gold nanoparticles. The resolution was measured as a function of electron dose and sample thickness for different sample configurations and for different microscopy modalities including regular TEM, energy filtered TEM (EFTEM) and CC-corrected TEM. Comparison with an analytical model aided the understanding of the experimental data applied over varied conditions. The general trend for all microscopy modalities was a transition from a noise-limited resolution at low electron dose to a CC-limited resolution at high-dose in the absence of beam blurring. EFTEM required an accurate energy slit offset and an optimal energy spread to energy-slit width ratio to surpass regular TEM. The key advantage of CC correction appeared to be the best possible resolution for larger sample thickness at low electron dose outperforming EFTEM by about fifty percent. Several hypothetical sample configurations relevant to liquid phase electron microscopy were evaluated as well to demonstrate the capabilities of the analytical model and to determine the most optimal microscopy modality for this type of experiment. The analytical model included an automated optimization of the EFTEM settings and may aid in optimizing the sample-limited resolution for experimental analysis and planning.
    Keywords:  Aberration correction; Chromatic aberration; EFTEM; Liquid cell TEM; TEM; electron scattering
  18. PLoS One. 2021 ;16(8): e0256691
      Rational protein design aims at the targeted modification of existing proteins. To reach this goal, software suites like Rosetta propose sequences to introduce the desired properties. Challenging design problems necessitate the representation of a protein by means of a structural ensemble. Thus, Rosetta multi-state design (MSD) protocols have been developed wherein each state represents one protein conformation. Computational demands of MSD protocols are high, because for each of the candidate sequences a costly three-dimensional (3D) model has to be created and assessed for all states. Each of these scores contributes one data point to a complex, design-specific energy landscape. As neural networks (NN) proved well-suited to learn such solution spaces, we integrated one into the framework Rosetta:MSF instead of the so far used genetic algorithm with the aim to reduce computational costs. As its predecessor, Rosetta:MSF:NN administers a set of candidate sequences and their scores and scans sequence space iteratively. During each iteration, the union of all candidate sequences and their Rosetta scores are used to re-train NNs that possess a design-specific architecture. The enormous speed of the NNs allows an extensive assessment of alternative sequences, which are ranked on the scores predicted by the NN. Costly 3D models are computed only for a small fraction of best-scoring sequences; these and the corresponding 3D-based scores replace half of the candidate sequences during each iteration. The analysis of two sets of candidate sequences generated for a specific design problem by means of a genetic algorithm confirmed that the NN predicted 3D-based scores quite well; the Pearson correlation coefficient was at least 0.95. Applying Rosetta:MSF:NN:enzdes to a benchmark consisting of 16 ligand-binding problems showed that this protocol converges ten-times faster than the genetic algorithm and finds sequences with comparable scores.
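The iterative scheme described in this abstract (retrain a fast predictor on all scored candidates, rank many alternative sequences with it, then compute costly scores only for the best-ranked fraction and swap them into half of the pool) can be sketched generically. The code below is an illustration under loudly stated assumptions, not the Rosetta:MSF:NN implementation: a ridge-regression surrogate stands in for the neural network, `expensive_score` is a toy stand-in for 3D modelling and scoring, and replacing the worse-scoring half of the pool is one possible reading of "replace half of the candidate sequences".

```python
import numpy as np

def expensive_score(seq):
    """Toy stand-in for a costly 3D-model-based score (lower is better)."""
    target = np.arange(seq.size) % 4  # hypothetical optimum sequence
    return float(np.sum(seq != target))

def one_hot(seqs, n_letters):
    """Encode integer-coded sequences as flat one-hot feature vectors."""
    return np.eye(n_letters)[seqs].reshape(len(seqs), -1)

def fit_surrogate(X, y, ridge=1e-3):
    """Ridge regression used here in place of a neural network."""
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def design(n_iter=10, pool=20, length=12, n_letters=4, n_trial=500, seed=0):
    rng = np.random.default_rng(seed)
    seqs = rng.integers(0, n_letters, size=(pool, length))
    scores = np.array([expensive_score(s) for s in seqs])
    half = pool // 2
    for _ in range(n_iter):
        # Retrain the cheap surrogate on every candidate scored so far in the pool.
        w = fit_surrogate(one_hot(seqs, n_letters), scores)
        # Cheap step: mutate pool members at one position and rank by predicted score.
        trial = seqs[rng.integers(0, pool, size=n_trial)].copy()
        pos = rng.integers(0, length, size=n_trial)
        trial[np.arange(n_trial), pos] = rng.integers(0, n_letters, size=n_trial)
        predicted = one_hot(trial, n_letters) @ w
        best = trial[np.argsort(predicted)[:half]]
        # Costly step: true scores only for the best-ranked fraction.
        best_scores = np.array([expensive_score(s) for s in best])
        # Assumption: the new sequences replace the worse-scoring half of the pool.
        keep = np.argsort(scores)[: pool - half]
        seqs = np.vstack([seqs[keep], best])
        scores = np.concatenate([scores[keep], best_scores])
    return seqs, scores

final_seqs, final_scores = design()
print("best score in final pool:", final_scores.min())
```

Because the best-scoring half of the pool is always retained, the best score in the pool can never get worse from one iteration to the next; how quickly it improves depends entirely on how well the surrogate ranks the trial sequences.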
  19. Methods Mol Biol. 2021 ;2365 43-58
      Proteins are essential molecules with a diverse range of functions; elucidating their biological and biochemical characteristics can be difficult and time consuming using in vitro and/or in vivo methods. Additionally, in vivo protein-ligand binding site elucidation is unable to keep pace with current growth in sequencing, leaving the majority of new protein sequences without known functions. Therefore, the development of new methods, which aim to predict the protein-ligand interactions and ligand-binding site residues directly from amino acid sequences, is becoming increasingly important. In silico prediction can utilise either sequence information, structural information or a combination of both. In this chapter, we will discuss the broad range of methods for ligand-binding site prediction from protein structure and we will describe our method, FunFOLD3, for the prediction of protein-ligand interactions and ligand-binding sites based on template-based modelling. Additionally, we will describe the step-by-step instructions using the FunFOLD3 downloadable application along with examples from the Critical Assessment of Techniques for Protein Structure Prediction (CASP) where FunFOLD3 has been used to aid ligand and ligand-binding site prediction. Finally, we will introduce our newer method, FunFOLD3-D, a version of FunFOLD3 which aims to improve template-based protein-ligand binding site prediction through the integration of docking, using AutoDock Vina.
    Keywords:  Critical Assessment of Techniques for Protein Structure Prediction (CASP); Docking; Ligand-binding site prediction; Protein–ligand interactions; Template-based modelling
  20. J Chem Theory Comput. 2021 Aug 27.
      Time-lagged independent component analysis (tICA) is a widely used dimension reduction method for the analysis of molecular dynamics (MD) trajectories and has proven particularly useful for the construction of protein dynamics Markov models. It identifies those "slow" collective degrees of freedom onto which the projections of a given trajectory show maximal autocorrelation for a given lag time. Here we ask how much information on the actual protein dynamics and, in particular, the free energy landscape that governs these dynamics the tICA-projections of MD-trajectories contain, as opposed to noise due to the inherently stochastic nature of each trajectory. To answer this question, we have analyzed the tICA-projections of high dimensional random walks using a combination of analytical and numerical methods. We find that the projections resemble cosine functions and strongly depend on the lag time, exhibiting strikingly complex behavior. In particular, and contrary to previous studies of principal component projections, the projections change noncontinuously with increasing lag time. The tICA-projections of selected 1 μs protein trajectories and those of random walks are strikingly similar, particularly for larger proteins, suggesting that these trajectories contain only little information on the energy landscape that governs the actual protein dynamics. Further the tICA-projections of random walks show clusters very similar to those observed for the protein trajectories, suggesting that clusters in the tICA-projections of protein trajectories do not necessarily reflect local minima in the free energy landscape. We also conclude that, in addition to the previous finding that certain ensemble properties of nonconverged protein trajectories resemble those of random walks; this is also true for their time correlations.
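For readers unfamiliar with the method, the core of tICA as summarised in this abstract (finding the linear projections with maximal autocorrelation at a given lag time) can be written in a few lines. The sketch below is a generic NumPy implementation using a symmetrised covariance estimator, applied to a synthetic high-dimensional random walk; it does not reproduce the authors' analysis, and the function name, trajectory size, and lag time are arbitrary illustrative choices.

```python
import numpy as np

def tica(X, lag, n_components=2):
    """Return the leading tICA eigenvalues and projections of trajectory X.

    X has shape (n_frames, n_features); lag is given in frames.
    """
    X = X - X.mean(axis=0)
    A, B = X[:-lag], X[lag:]
    n = len(A)
    # Symmetrised instantaneous and time-lagged covariance matrices.
    C0 = (A.T @ A + B.T @ B) / (2 * n)
    Ct = (A.T @ B + B.T @ A) / (2 * n)
    # Whiten with C0, discarding numerically degenerate directions.
    s, U = np.linalg.eigh(C0)
    keep = s > 1e-10 * s.max()
    W = U[:, keep] / np.sqrt(s[keep])
    # Eigenvectors of the whitened lagged covariance maximise autocorrelation.
    lam, V = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(lam)[::-1][:n_components]
    components = W @ V[:, order]
    return lam[order], X @ components

# Synthetic 20-dimensional random walk: pure noise, no energy landscape.
rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal((2000, 20)), axis=0)

eigvals, proj = tica(walk, lag=50)
print("leading autocorrelations:", eigvals)
```

Plotting the columns of `proj` against frame index for different values of `lag` is a quick way to inspect how the projections of a structureless random walk behave.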
  21. Comput Biol Med. 2021 Aug 18. pii: S0010-4825(21)00566-7. [Epub ahead of print]137 104772
      The prediction of interactions in protein networks is critical to understanding various biological processes. In recent years, scientists have focused on computational approaches to predict the interactions of proteins. In protein-protein interaction (PPI) networks, each protein is accompanied by various features, including amino acid sequence, subcellular location, and protein domains. Embedding-based methods have been widely applied for many network analysis tasks, such as link prediction. The Deepwalk algorithm is one of the most popular graph embedding methods and captures the network structure using pure random walks. In this paper, we treat the protein-protein interaction prediction problem as link prediction in attributed networks, and we use an attributed embedding approach to predict the interactions between proteins in the PPI network. In particular, the paper presents a modified version of Deepwalk based on feature selection for solving link prediction in protein-protein interaction networks, which benefits from both network structure and protein features. More specifically, the feature selection step consists of two distinct parts. First, a set of relevant features is selected from the original feature set, such that the dimensionality of the features is reduced. Second, each feature in the selected set is assigned a weight based on its significance, so that the contribution of each feature is distinguished from the others. In this method, a new random walk model for link prediction is introduced by integrating network structure and protein features, based on the assumption that two nodes will be linked if they are nearby in the network. To justify the proposal, the authors carry out many experiments on protein-protein interaction networks for comparison with state-of-the-art network embedding methods. The experimental results indicate that the proposed approach outperforms other link prediction approaches and increases the accuracy of prediction.
    Keywords:  Feature selection; Graph embedding; Link prediction; Protein-protein interaction network; Random walk
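The feature-weighted random walk described in item 21 can be illustrated with a minimal sketch. The abstract does not give the transition rule, so everything here is an assumption made for illustration: the toy graph, the binary feature vectors, the feature weights, the weighted-overlap similarity, and the helper names `weighted_similarity` and `biased_walk`. In a Deepwalk-style pipeline the resulting walks would then be passed to a skip-gram model, which is not shown.

```python
import random

def weighted_similarity(x, y, weights):
    """Weighted overlap of two binary feature vectors (an assumed similarity)."""
    return sum(w * a * b for w, a, b in zip(weights, x, y))

def biased_walk(adj, feats, weights, start, length, rng, eps=1e-6):
    """One random walk whose steps favour neighbours with similar features."""
    walk = [start]
    for _ in range(length - 1):
        cur = walk[-1]
        nbrs = adj[cur]
        if not nbrs:
            break
        # eps keeps every edge reachable even when the similarity is zero
        probs = [weighted_similarity(feats[cur], feats[n], weights) + eps
                 for n in nbrs]
        walk.append(rng.choices(nbrs, weights=probs, k=1)[0])
    return walk

# Hypothetical toy network: four proteins with three selected features each
adj = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
feats = {"A": [1, 0, 1], "B": [1, 1, 0], "C": [0, 1, 1], "D": [0, 0, 1]}
weights = [0.5, 0.2, 0.3]  # assumed per-feature significance weights

rng = random.Random(0)
walks = [biased_walk(adj, feats, weights, node, 5, rng)
         for node in adj for _ in range(3)]
```

Setting all weights to zero makes every step uniform over neighbours, which recovers the unbiased walk of plain Deepwalk.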
  22. J Mol Biol. 2021 Aug 18. pii: S0022-2836(21)00441-1. [Epub ahead of print] 167208
      Accurate predictions of the three-dimensional structures of proteins from their amino acid sequences have come of age. AlphaFold, a deep learning-based approach to protein structure prediction, shows remarkable success in independent assessments of prediction accuracy. A significant epoch in structural bioinformatics was the structural annotation of over 98% of protein sequences in the human proteome. Interestingly, many predictions feature regions of very low confidence, and these regions largely overlap with intrinsically disordered regions (IDRs). That over 30% of regions within the proteome are disordered is congruent with estimates that have been made over the past two decades, as intense efforts have been undertaken to generalize the structure-function paradigm to include the importance of conformational heterogeneity and dynamics. With structural annotations from AlphaFold in hand, there is the temptation to draw inferences regarding the "structures" of IDRs and their interactomes. Here, we offer a cautionary note regarding the misinterpretations that might ensue and highlight efforts that provide concrete understanding of sequence-ensemble-function relationships of IDRs. This perspective is intended to emphasize the importance of IDRs in sequence-function relationships (SERs) and to highlight how one might go about extracting quantitative SERs to make sense of how IDRs function.
    Keywords:  AlphaFold; Cautionary Notes; intrinsically disordered proteins
  23. Chem Rev. 2021 Aug 27.
      Small molecule drug discovery has been propelled by the continual development of novel scientific methodologies to occasion therapeutic advances. Although established biophysical methods can be used to obtain information regarding the molecular mechanisms underlying drug action, these approaches are often inefficient, low throughput, and ineffective in the analysis of heterogeneous systems including dynamic oligomeric assemblies and proteins that have undergone extensive post-translational modification. Native mass spectrometry can be used to probe protein-small molecule interactions with unprecedented speed and sensitivity, providing unique insights into polydisperse biomolecular systems that are commonly encountered during the drug discovery process. In this review, we describe potential and proven applications of native MS in the study of interactions between small, drug-like molecules and proteins, including large multiprotein complexes and membrane proteins. Approaches to quantify the thermodynamic and kinetic properties of ligand binding are discussed, alongside a summary of gas-phase ion activation techniques that have been used to interrogate the structure of protein-small molecule complexes. We additionally highlight some of the key areas in modern drug design for which native mass spectrometry has elicited significant advances. Future developments and applications of native mass spectrometry in drug discovery workflows are identified, including potential pathways toward studying protein-small molecule interactions on a whole-proteome scale.
  24. J Chem Inf Model. 2021 Aug 26.
      Small-molecule docking remains one of the most valuable computational techniques for the structure prediction of protein-small-molecule complexes. It allows us to study the interactions between compounds and the protein receptors they target at atomic detail in a timely and efficient manner. Here, we present a new protocol in HADDOCK (High Ambiguity Driven DOCKing), our integrative modeling platform, which incorporates homology information for both receptor and compounds. It makes use of HADDOCK's unique ability to integrate information in the simulation to drive it toward conformations, which agree with the provided data. The focal point is the use of shape restraints derived from homologous compounds bound to the target receptors. We have developed two protocols: in the first, the shape is composed of dummy atom beads based on the position of the heavy atoms of the homologous template compound, whereas in the second, the shape is additionally annotated with pharmacophore data for some or all beads. For both protocols, ambiguous distance restraints are subsequently defined between those beads and the heavy atoms of the ligand to be docked. We have benchmarked the performance of these protocols with a fully unbound version of the widely used DUD-E (Database of Useful Decoys-Enhanced) dataset. In this unbound docking scenario, our template/shape-based docking protocol reaches an overall success rate of 81% when a reliable template can be identified (which was the case for 99 out of 102 complexes in the DUD-E dataset), which is close to the best results reported for bound docking on the DUD-E dataset.
  25. Biochem Soc Trans. 2021 Aug 27. 49(4): 1555-1565
      Many receptors are able to undergo heteromerisation, leading to the formation of receptor complexes that may have pharmacological profiles distinct from those of the individual receptors. As a consequence of this, receptor heteromers can be classed as new drug targets, with the potential for achieving greater specificity and selectivity over targeting their constituent receptors. We have developed the Receptor-Heteromer Investigation Technology (Receptor-HIT), which enables the detection of receptor heteromers using a proximity-based reporter system such as bioluminescence resonance energy transfer (BRET). Receptor-HIT detects heteromers in live cells and in real time, by utilising ligand-induced signals that arise from altered interactions with specific biomolecules, such as ligands or proteins. Furthermore, monitoring the interaction between the receptors and the specific biomolecules generates functional information about the heteromer that can be pharmacologically quantified. This review will discuss various applications of Receptor-HIT, including its use with different classes of receptors (e.g. G protein-coupled receptors (GPCRs), receptor tyrosine kinases (RTKs) and others), its use to monitor receptor interactions both intracellularly and extracellularly, and also its use with genome-edited endogenous proteins.
    Keywords:  BRET; G-protein-coupled receptors; intracellular signaling; receptors
  26. Microsc Res Tech. 2021 Aug 27.
      Transmission electron microscopy (TEM) is an important analysis technique to visualize (bio)macromolecules and their assemblies, including collagen fibers. Many protocols for TEM sample preparation of collagen involve one or more washing steps to remove excess salts from the dispersion that could hamper analysis when dried on a TEM grid. Such protocols are not standardized and washing times as well as washing solvents vary from procedure to procedure, with each research group typically having their own protocol. Here, we investigate the influence of washing with water, ethanol, but also methanol and 2-propanol, for both mineralized and unmineralized collagen samples via a protocol based on centrifugation. Washing with water maintains the hydrated collagen structure and the characteristic banding pattern can be clearly observed. Conversely, washing with ethanol results in dehydration of the fibrils, often leading to aggregation of the fibers and a less obvious banding pattern, already within 1 min of ethanol exposure. As we show, this process is fully reversible. Similar observations were made for methanol and propanol. Based on these results, a standardized washing protocol for collagenous samples is proposed.
    Keywords:  collagen mineralization; sample preparation; transmission electron microscopy
  27. Mol Biol Cell. 2021 Aug 25. mbcE21050257
      The elucidation of a protein's interaction/association network is important for defining its biological function. Mass spectrometry-based proteomic approaches have emerged as powerful tools for identifying protein-protein interactions (PPIs) and protein-protein associations (PPAs). However, interactome/association experiments are difficult to interpret considering the complexity and abundance of data that is generated. Although tools have been developed to quantitatively identify protein interactions/associations, there is still a pressing need for easy-to-use tools that allow users to contextualize their results. To address this, we developed CANVS, a computational pipeline that cleans, analyzes, and visualizes mass spectrometry-based interactome/association data. CANVS is wrapped as an interactive Shiny dashboard, allowing users to easily interface with the pipeline. With simple requirements, users can analyze complex experimental data and create PPI/A networks. The application integrates systems biology databases like BioGRID and CORUM to contextualize the results. Furthermore, CANVS features a Gene Ontology tool that allows users to identify relevant GO terms in their results and create visual networks with proteins associated with relevant GO terms. Overall, CANVS is an easy-to-use application that benefits all researchers, especially those who lack an established bioinformatic pipeline and are interested in studying interactome/association data.
  28. Int J Infect Dis. 2021 Aug 24. pii: S1201-9712(21)00687-1. [Epub ahead of print]
      Placental malaria is a public health burden, particularly in Africa, as it causes severe symptoms and results in stillbirths or maternal deaths. The Plasmodium falciparum protein VAR2CSA drives placental malaria (PM) in pregnant women by adhering to chondroitin sulfate A (CSA) on the placenta. VAR2CSA is a primary vaccine candidate for PM, with two vaccines based on it already in clinical trials. The first cryo-EM three-dimensional structure of the Pf CSA-VAR2CSA complex revealed crucial interacting residues considered to be highly conserved across P. falciparum strains. In the current study, we have conducted a global sequence analysis of 1,114 VAR2CSA field isolate sequences from more than nine countries across three continents, revealing numerous mutations in CSA-binding residues. Further, structural mapping has revealed significant polymorphisms in the ligand binding surfaces. The variants from this limited set of 1,114 sequences highlight concerns that are vital in current considerations for the development of vaccines based on VAR2CSA for placental malaria.
    Keywords:  Placental malaria; Placental malaria vaccine; VAR2CSA; field isolates; sequence analysis; structural mapping
  29. Nanomaterials (Basel). 2021 Aug 08. pii: 2024. [Epub ahead of print]11(8):
      Although it has been exploited since the late 1900s to study hybrid perovskite materials, nuclear magnetic resonance (NMR) spectroscopy has only recently received extraordinary research attention in this field. This very powerful technique allows the study of the physico-chemical and structural properties of molecules by observing the quantum mechanical magnetic properties of an atomic nucleus, in solution as well as in solid state. Its versatility makes it a promising technique either for the atomic and molecular characterization of perovskite precursors in colloidal solution or for the study of the geometry and phase transitions of the obtained perovskite crystals, commonly used as a reference material compared with thin films prepared for applications in optoelectronic devices. This review will explore beyond the current focus on the stability of perovskites (3D in bulk and nanocrystals) investigated via NMR spectroscopy, in order to highlight the chemical flexibility of perovskites and the role of interactions for thermodynamic and moisture stabilization. The exceptional potential of the vast NMR tool set in perovskite structural characterization will be discussed, aimed at choosing the most stable material for optoelectronic applications. The concept of a double-sided characterization in solution and in solid state, in which the organic and inorganic structural components provide unique interactions with each other and with the external components (solvents, additives, etc.), for material solutions processed in thin films, denotes a significant contemporary target.
    Keywords:  bulk; cation; characterization; dynamics; films; halide; interactions; ligands; nanocrystals; nuclear magnetic resonance; perovskite; solutions; stability; structure
  30. RSC Med Chem. 2021 Aug 18. 12(8): 1325-1351
      Peptides are a growing therapeutic class due to their unique spatial characteristics that can target traditionally "undruggable" protein-protein interactions and surfaces. Despite their advantages, peptides must overcome several key shortcomings to be considered as drug leads, including their high conformational flexibility and susceptibility to proteolytic cleavage. As a general approach for overcoming these challenges, macrocyclization of a linear peptide can usually improve these characteristics. Their synthetic accessibility makes peptide macrocycles very attractive, though traditional synthetic methods for macrocyclization can be challenging for peptides, especially for head-to-tail cyclization. This review provides an updated summary of the available macrocyclization chemistries, such as traditional lactam formation, azide-alkyne cycloadditions, ring-closing metathesis as well as unconventional cyclization reactions, and it is structured according to the obtained functional groups. Keeping peptide chemistry and screening in mind, the focus is given to reactions applicable in solution, on solid supports, and compatible with contemporary screening methods.
  31. BMC Bioinformatics. 2021 Aug 24. 22(Suppl 3): 415
      BACKGROUND: Plant long non-coding RNAs (lncRNAs) play vital roles in many biological processes mainly through interactions with RNA-binding protein (RBP). To understand the function of lncRNAs, a fundamental method is to identify which types of proteins interact with the lncRNAs. However, the models or rules of interactions are a major challenge when calculating and estimating the types of RBP. RESULTS: In this study, we propose an ensemble deep learning model to predict plant lncRNA-protein interactions using stacked denoising autoencoder and convolutional neural network based on sequence and structural information, named PRPI-SC. PRPI-SC predicts interactions between lncRNAs and proteins based on the k-mer features of RNAs and proteins. Experiments proved good results on Arabidopsis thaliana and Zea mays datasets (ATH948 and ZEA22133). The accuracy rates of ATH948 and ZEA22133 datasets were 88.9% and 82.6%, respectively. PRPI-SC also performed well on some public RNA protein interaction datasets. CONCLUSIONS: PRPI-SC accurately predicts the interaction between plant lncRNA and protein, which plays a guiding role in studying the function and expression of plant lncRNA. At the same time, PRPI-SC has a strong generalization ability and good prediction effect for non-plant data.
    Keywords:  Convolutional neural network; Stacked denoising autoencoder; k-Mer; lncRNA-protein
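The k-mer features mentioned in item 31 can be sketched as a fixed-length frequency vector computed per sequence. This is a generic illustration, not the PRPI-SC encoding: the function name `kmer_frequencies`, the choice k = 2, the four-letter RNA alphabet, and the example sequence are all assumptions.

```python
from itertools import product

def kmer_frequencies(seq, k, alphabet):
    """Normalised frequency of every k-mer over `alphabet`.

    The vector follows itertools.product order, so it has
    len(alphabet) ** k entries regardless of the sequence length.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with unexpected symbols
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]

# Hypothetical RNA fragment encoded as a 16-dimensional 2-mer vector
rna_vec = kmer_frequencies("AUGGCUAUGG", 2, "ACGU")
```

Because the vector length depends only on k and the alphabet size, sequences of different lengths map to inputs of the same shape, which is what a downstream autoencoder or convolutional network would need.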
  32. Membranes (Basel). 2021 Jul 21. pii: 549. [Epub ahead of print]11(8):
      Protein crystallization still remains mostly an empirical science, as the production of crystals with the required quality for X-ray analysis is dependent on the intensive screening of the best protein crystallization and crystal's derivatization conditions. Herein, this demanding step was addressed by the development of a high-throughput and low-budget microfluidic platform consisting of an ion exchange membrane (117 Nafion® membrane) sandwiched between a channel layer (stripping phase compartment) and a wells layer (feed phase compartment) forming 75 independent micro-contactors. This microfluidic device allows for a simultaneous and independent screening of multiple protein crystallization and crystal derivatization conditions, using Hen Egg White Lysozyme (HEWL) as the model protein and Hg2+ as the derivatizing agent. This microdevice offers well-regulated crystallization and subsequent crystal derivatization processes based on the controlled transport of water and ions provided by the 117 Nafion® membrane. Diffusion coefficients of water and the derivatizing agent (Hg2+) were evaluated, showing the positive influence of the protein drop volume on the number of crystals and crystal size. This microfluidic system allowed for crystals with good structural stability and high X-ray diffraction quality and, thus, it is regarded as an efficient tool that may contribute to the enhancement of the proteins' crystals structural resolution.
    Keywords:  Nafion® membrane; membrane contactors; protein crystallization; protein structure; solute diffusion
  33. Comput Struct Biotechnol J. 2021 Aug 19.
      Safer and more-effective drugs are urgently needed to counter infections with the highly pathogenic SARS-CoV-2, cause of the COVID-19 pandemic. Identification of efficient inhibitors to treat and prevent SARS-CoV-2 infection is a predominant focus. Encouragingly, using X-ray crystal structures of therapeutically relevant drug targets (PLpro, Mpro, RdRp, and S glycoprotein) offers a valuable direction for anti-SARS-CoV-2 drug discovery and lead optimization through direct visualization of interactions. Computational analyses based primarily on MMPBSA calculations have also been proposed for assessing the binding stability of biomolecular structures involving the ligand and receptor. In this study, we focused on state-of-the-art X-ray co-crystal structures of the abovementioned targets complexed with newly identified small-molecule inhibitors (natural products, FDA-approved drugs, candidate drugs, and their analogues) with the assistance of computational analyses to support the precision design and screening of anti-SARS-CoV-2 drugs.
    Keywords:  3CLpro, 3C-Like protease; ACE2, angiotensin-converting enzyme 2; COVID-19, coronavirus disease 2019; Candidate drugs; Co-crystal structures; DyKAT, dynamic kinetic asymmetric transformation; EBOV, Ebola virus; EC50, half maximal effective concentration; EMD, Electron Microscopy Data; FDA, U.S. Food and Drug Administration; FDA-approved drugs; HCoV-229E, human coronavirus 229E; HPLC, high-performance liquid chromatography; IC50, half maximal inhibitory concentration; MD, molecular dynamics; MERS-CoV, Middle East respiratory syndrome coronavirus; MMPBSA, molecular mechanics Poisson-Boltzmann surface area; MTase, methyltransferase; Mpro, main protease; Natural products; Nsp, nonstructural protein; PDB, Protein Data Bank; PLpro, papain-like protease; RTP, ribonucleoside triphosphate; RdRp, RNA-dependent RNA polymerase; SAM, S-adenosylmethionine; SARS-CoV, severe acute respiratory syndrome coronavirus; SARS-CoV-2; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2; SI, selectivity index; Ugi-4CR, Ugi four-component reaction; cryo-EM, cryo-electron microscopy
  34. Phys Biol. 2021 Aug 25.
      In this work we use a Discrete Markov Chain (DMC) approach combined with network centrality measures to identify and predict the location of active sites in globular proteins. To accomplish this, we use a three-dimensional network of protein Cα atoms as nodes connected through weighted edges which represent the varying interaction degree between the protein's atoms. We compute the mean first passage time matrix H = {H_ji} for this Markov chain and evaluate the averaged number of steps ⟨H_j⟩ to reach a single node n_j in order to identify those residues that, on average, are the least distant from every other node. We also carry out a graph theory analysis to evaluate closeness centrality C_c, betweenness centrality C_b and eigenvector centrality C_e measures, which provide relevant information about the connectivity structure and topology of the Cα protein networks. Finally, we also performed an analysis of equivalent random and regular networks of the same size N in terms of the average path length L and the average clustering coefficient ⟨C⟩, comparing these with the corresponding values for Cα protein networks. Our results show that the mean first passage time matrix H and its related quantity ⟨H_j⟩, together with C_c, C_b and C_e, can not only predict with relatively high accuracy the location of active sites in globular proteins but also show promise for predicting new regions in a protein's structure with potential binding or catalytic activity or, in some cases, the presence of new allosteric pathways.
    Keywords:  Hitting time matrix; active sites; discrete Markov chains; globular proteins; network centrality
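The averaged first passage time used in item 34 can be sketched for a small weighted network. This is a minimal illustration under stated assumptions, not the authors' implementation: the transition probabilities are taken as row-normalised edge weights, the graph is assumed connected, and the helper names `solve` and `mean_hitting_times` and the three-node example are hypothetical. For a target node j, the expected number of steps h_i from each other node i satisfies h_i = 1 + Σ_{k≠j} P_ik h_k, which is solved as a linear system.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def mean_hitting_times(w):
    """Average number of steps needed to reach each node j from the others."""
    n = len(w)
    # Row-normalise the edge weights into transition probabilities
    # (assumes every node has at least one edge)
    p = [[x / sum(row) for x in row] for row in w]
    means = []
    for j in range(n):
        others = [i for i in range(n) if i != j]
        # (I - Q) h = 1, where Q is P restricted to the non-target nodes
        a = [[(1.0 if i == k else 0.0) - p[i][k] for k in others]
             for i in others]
        h = solve(a, [1.0] * len(others))
        means.append(sum(h) / len(h))
    return means

# Hypothetical three-node chain 0 - 1 - 2 with unit edge weights
w = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
means = mean_hitting_times(w)
```

In this toy chain the middle node has the smallest averaged hitting time, matching the idea in the abstract that the residues of interest are those least distant, on average, from every other node.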
  35. J Am Chem Soc. 2021 Aug 25.
      Cyanine (Cy) dyes are among the most useful organic fluorophores that have found a wide range of applications in single-molecule and super-resolution imaging as well as in other biophysical studies. However, recent observations that blueshifted derivatives of Cy dyes are formed via photoconversion have raised concerns as to the potential artifacts in multicolor imaging. Here, we report the mechanism for the photoconversion of Cy5 to Cy3 that occurs upon photoexcitation during fluorescent imaging. Our studies show that the formal C2H2 excision from Cy5 occurs mainly through an intermolecular pathway involving a combination of bond cleavage and reconstitution while unambiguously confirming the identity of the fluorescent photoproduct of Cy5 to be Cy3 using various spectroscopic tools. The carbonyl products generated from singlet oxygen-mediated photooxidation of Cy5 undergo a sequence of carbon-carbon bond-breaking and -forming events to bring about the novel dye-to-dye transformation. We also show that the deletion of a two-methine unit from the polymethine chain, which results in the formation of blueshifted products, commonly occurs in other cyanine dyes, such as Alexa Fluor 647 (AF647) and Cyanine5.5. The formation of a blueshifted congener dye can obscure the multicolor fluorescence imaging, leading to misinterpretation of the data. We demonstrate that the potentially deleterious photoconversion, however, can be exploited to develop a new photoactivation method for high-density single-particle tracking in a living cell without using UV illumination and cell-toxic additives.
  36. Acc Chem Res. 2021 Aug 24.
      Conspectus: In the past two decades, a DNA-encoded chemical library (DEL or DECL) has emerged and has become a major technology platform for ligand discovery in drug discovery as well as in chemical biology research. Although based on a simple concept, i.e., encoding each compound with a unique DNA tag in a combinatorial chemical library, DEL has been proven to be a powerful tool for interrogating biological targets by accessing vast chemical space at a fraction of the cost of traditional high-throughput screening (HTS). Moreover, the recent technological advances and rapid developments of DEL-compatible reactions have greatly enhanced the chemical diversity of DELs. Today, DELs have been adopted by nearly all major pharmaceutical companies and are also gaining momentum in academia. However, this field is heavily biased toward library encoding and synthesis, and an underexplored aspect of DEL research is the selection methods. Generally, DEL selection is considered to be a massive binding assay conducted over an immobilized protein to identify the physical binders using the typical bind-wash-elute procedure. In recent years, we and other research groups have developed new approaches that can perform DEL selections in the solution phase, which has enabled the selection against complex biological targets beyond purified proteins. On the one hand, these methods have significantly widened the target scope of DELs; on the other hand, they have enabled the functional and potentially phenotypic assays of DELs beyond simple binding. An overview of these methods is provided in this Account. Our laboratory has been using DNA-programmed affinity labeling (DPAL) as the main strategy to develop new DEL selection methods. DPAL is based on DNA-templated synthesis; by using a known ligand to guide the target binding, DPAL is able to specifically establish a stable linkage between the target protein and the ligand. The DNA tag of the target-ligand conjugates serves as a programmable handle for protein characterization or hit compound decoding in the case of DEL selections. DPAL also takes advantage of the fast reaction kinetics of photo-cross-linking to achieve high labeling specificity and fidelity, especially in the selection of DNA-encoded dynamic libraries (DEDLs). DPAL has enabled DEL selections not only in buffer and cell lysates but also with complex biological systems, such as large protein complexes and live cells. Moreover, this strategy has also been employed in other biological applications, such as site-specific protein labeling, protein detection, protein profiling, and target identification. In the Account, we describe these methods, highlight their underlying principles, and conclude with perspectives of the development of the DEL technology.