bims-strubi Biomed News
on Advances in structural biology
Issue of 2021‒09‒26
eighteen papers selected by
Alessandro Grinzato
European Synchrotron Radiation Facility


  1. Proteins. 2021 Sep 24.
      NMR studies can provide unique information about protein conformations in solution. In CASP14, three reference structures provided by solution NMR methods were available (T1027, T1029, and T1055), as well as a fourth data set of NMR-derived contacts for an integral membrane protein (T1088). For the three targets with NMR-based structures, the best prediction results ranged from very good (GDT_TS = 0.90, for T1055) to poor (GDT_TS = 0.47, for T1029). We explored the basis of these results by comparing all CASP14 prediction models against experimental NMR data. For T1027, NMR data reveal extensive internal dynamics, presenting a unique challenge for protein structure prediction methods. The analysis of T1029 motivated exploration of a novel method of "inverse structure determination", in which an AlphaFold2 model was used to guide NMR data analysis. NMR data provided to CASP predictor groups for target T1088, a 238-residue integral membrane porin, were also used to assess several NMR-assisted prediction methods. Most groups involved in this exercise generated similar beta-barrel models, with good agreement with the experimental data. However, as was also observed in CASP13, some pure prediction groups that did not use any NMR data generated models for T1088 that better fit the NMR data than the models generated using these experimental data. These results demonstrate the remarkable power of modern methods to predict structures of proteins with accuracies rivaling solution NMR structures, and that it is now possible to reliably use prediction models to guide and complement experimental NMR data analysis.
    Keywords:  MipA; Protein structure prediction; integral membrane proteins; inverse structure determination; machine learning; protein dynamics; solution NMR
    DOI:  https://doi.org/10.1002/prot.26246
  2. Bioinformatics. 2021 Sep 20. pii: btab666. [Epub ahead of print]
      MOTIVATION: An accurate estimation of the quality of protein model structures is a cornerstone of protein structure prediction. Despite the recent groundbreaking success in the field of protein structure prediction, there is room to improve model quality estimation at multiple stages of protein structure prediction and thus to further push prediction accuracy. Here, a novel approach, named ProFitFun, for assessing the quality of protein models is proposed by harnessing the sequence and structural features of experimental protein structures in terms of the preferences of backbone dihedral angles and relative surface accessibility of their amino acid residues at the tripeptide level. The proposed approach leverages the backbone dihedral angle and surface accessibility preferences of each residue by accounting for its N-terminal and C-terminal neighbors in the protein structure. These preferences are employed to evaluate protein structures through a machine learning approach and tested on an extensive dataset of diverse proteins. RESULTS: The approach was extensively validated on a large test dataset (n = 25,005) of protein structures, comprising 23,661 models of 82 non-homologous proteins and 1,344 non-homologous experimental structures. Additionally, an external dataset of 40,000 models of 200 non-homologous proteins was also used for the validation of the proposed method. Both datasets were further employed for benchmarking the proposed method against four different state-of-the-art methods for protein structure quality assessment. In the benchmarking, the proposed method outperformed some state-of-the-art methods in terms of Spearman's and Pearson's correlation coefficients, average GDT-TS loss, sum of z-scores, and average absolute difference of predictions over corresponding observed values. The high accuracy of the proposed approach promises a potential use of the sequence and structural features in computational protein design.
    AVAILABILITY: http://github.com/KYZ-LSB/ProTerS-FitFun.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btab666
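The benchmarking metrics named in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of GDT-TS loss (the true GDT-TS of the best available model minus the true GDT-TS of the model the method ranks first); the abstract itself does not spell out its definition, and the function names below are hypothetical, not part of ProFitFun.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def gdt_ts_loss(predicted, observed):
    """Assumed definition: best true score minus true score of the top-ranked model."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(observed) - observed[top]
```

For one target, `predicted` would hold the estimated quality of each model and `observed` the true GDT-TS of the same models; averaging `gdt_ts_loss` over targets gives the average loss.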
  3. Appl Microsc. 2021 Sep 25. 51(1): 13
      The novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has caused a global pandemic of respiratory disease that can present as acute respiratory distress syndrome (ARDS). However, there is no targeted therapeutic agent yet, and with infections and deaths rising, the discovery of a possible drug is urgent. In general, the search for therapeutic agents against SARS-CoV-2 is largely focused on large-scale screening with fragment-based drug discovery (FBDD). With recent advances, cryo-electron microscopy (cryo-EM) has become one of the most widely used tools in structural biology. It is effective in determining the structures of numerous proteins at high resolution and has also had a strong influence on drug discovery, both in determining the binding and regulation of known drugs and in guiding the design and development of new drug candidates. Here, we review the application of cryo-EM in structure-based drug design (SBDD) and in silico screening of recently acquired FBDD results for SARS-CoV-2. Such insights will help improve understanding in the search for an effective treatment for this pandemic.
    Keywords:  Cryo-electron microscopy; Fragment-based drug discovery; Severe acute respiratory syndrome coronavirus 2; Structure-based drug design; Transmission electron microscopy
    DOI:  https://doi.org/10.1186/s42649-021-00062-x
  4. Bioinformatics. 2021 Sep 21. pii: btab667. [Epub ahead of print]
      MOTIVATION: Accurately identifying protein-ATP binding poses is valuable for both basic structural biology and drug discovery. Although many docking methods have been designed, most of them require a user-defined binding site and struggle to achieve a high-quality protein-ATP docking result. It is critical to develop a protein-ATP-specific blind docking method that does not require user-defined binding sites. RESULTS: Here, we present ATPdock, a template-based method for docking ATP into proteins. For each query protein, if no pocket site is given, ATPdock first identifies its most likely pocket using ATPbind, an ATP-binding site predictor; second, the template pocket most similar to the given or identified pocket is searched from the database of pocket-ligand structures using APoc, a pocket structural alignment tool; third, the rough docking pose of ATP (rdATP) is generated using LS-align, a ligand structural alignment tool, to align the initial ATP pose to the template ligand corresponding to the template pocket; finally, a Metropolis Monte Carlo simulation is used to fine-tune the rdATP under the guidance of the AutoDock Vina energy function. Benchmark tests show that ATPdock significantly outperforms other state-of-the-art methods in docking accuracy.
    AVAILABILITY: https://jun-csbio.github.io/atpdock/.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btab667
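The final refinement step described in this abstract, a Metropolis Monte Carlo search guided by an energy function, can be sketched generically as follows. This is not ATPdock's implementation: `toy_energy` is a hypothetical stand-in for the AutoDock Vina scoring function, the pose is reduced to a 3D position (a real ligand pose would also carry rotation and torsion angles), and the step size, temperature, and step count are arbitrary assumptions.

```python
import math
import random


def toy_energy(pose):
    """Hypothetical stand-in for a docking score: squared distance to a fixed optimum."""
    optimum = (1.0, -2.0, 0.5)
    return sum((p - o) ** 2 for p, o in zip(pose, optimum))


def metropolis_accept(delta_e, kT, rng):
    """Metropolis criterion: always accept downhill moves, uphill with exp(-dE/kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def refine(pose, energy, steps=2000, step_size=0.2, kT=0.1, seed=0):
    """Randomly perturb the pose and keep the lowest-energy pose visited."""
    rng = random.Random(seed)
    current = tuple(pose)
    e_current = energy(current)
    best, e_best = current, e_current
    for _ in range(steps):
        trial = tuple(p + rng.uniform(-step_size, step_size) for p in current)
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e_current, kT, rng):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best
```

Calling `refine((5.0, 5.0, 5.0), toy_energy)` starts from a rough pose and returns the best pose found together with its energy. Accepting occasional uphill moves is what lets the search escape shallow local minima that a purely greedy refinement would get stuck in.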
  5. Methods Mol Biol. 2022;2364: 363-425
      Proteomic analyses have become an essential part of the toolkit of the molecular biologist, given the widespread availability of genomic data and open source or freely accessible bioinformatics software. Tools are available for detecting homologous sequences, recognizing functional domains, and modeling the three-dimensional structure for any given protein sequence, as well as for predicting interactions with other proteins or macromolecules. Although a wealth of structural and functional information is available for many cytoskeletal proteins, with representatives spanning all of the major subfamilies, the majority of cytoskeletal proteins remain partially or totally uncharacterized. Moreover, bioinformatics tools provide a means for studying the effects of synthetic mutations or naturally occurring variants of these cytoskeletal proteins. This chapter discusses various freely available proteomic analysis tools, with a focus on in silico prediction of protein structure and function. The selected tools are notable for providing an easily accessible interface for the novice while retaining advanced functionality for more experienced computational biologists.
    Keywords:  Comparative modeling; Docking analysis; Homology modeling; Multiple sequence alignment; Protein domains; Protein-protein interactions; Proteomics; Secondary structure prediction; Sequence similarity; Structure analysis; Threading
    DOI:  https://doi.org/10.1007/978-1-0716-1661-1_19
  6. Adv Theory Simul. 2020 Jan;3(1): 1900194
      Over the past two decades, the use of fragment-based lead generation has become a common, mature approach to identify tractable starting points in chemical space for the drug discovery process. This approach naturally involves the study of the binding properties of highly heterogeneous ligands. Such datasets challenge computational techniques to provide comparable binding free energy estimates from different binding modes. The performance of a range of statistically robust ensemble-based binding free energy calculation protocols, called ESMACS (enhanced sampling of molecular dynamics with approximation of continuum solvent), is evaluated. Ligands that vary in size, charge, and binding mode, designed to target two binding pockets in the target protein lactate dehydrogenase, are studied. When compared to experimental results, excellent statistical rankings are obtained across this highly diverse set of ligands. In addition, three approaches to account for entropic contributions are investigated: 1) normal mode analysis, 2) weighted solvent accessible surface area (WSAS), and 3) variational entropy. Normal mode analysis and WSAS correlate strongly with each other (although the latter is computationally far cheaper) but do not improve rankings. Variational entropy corrects exaggerated discrimination of ligands bound in different pockets but creates three outliers which reduce the quality of the overall ranking.
    Keywords:  binding free energy calculations; fragment‐based drug design; molecular dynamics; molecular mechanics Poisson–Boltzmann surface area (MMPBSA)
    DOI:  https://doi.org/10.1002/adts.201900194
  7. Ultramicroscopy. 2021 Aug 28. pii: S0304-3991(21)00156-X. [Epub ahead of print] 230: 113376
      Crystal diffraction is a well-established technique for high-resolution structural analysis of material science and biological samples. However, the recovered structure is a result of averaging over all the unit cells in the crystal, which smears out the imperfections, atomic defects, or asymmetries and chiral properties of the individual molecules. We propose Bragg holography, where a nano-crystal is imaged at a defocus distance allowing separation of the diffracted beams, without turning them into peaks. The presence of a reference wave gives rise to a Bragg hologram, which can be reconstructed by conventional holographic reconstruction algorithms. The recovered complex-valued wavefront contains the complete information about the atomic distribution in the crystal, including defects. Bragg holography is demonstrated for gold nano-crystals, and its feasibility for biological nano-crystals is shown.
    Keywords:  Bacteriorhodopsin; Electron diffraction; Electron holography; Holography; Nano-crystals; Protein structure; Structural biology; TEM; Transmission electron microscopy; Twin image
    DOI:  https://doi.org/10.1016/j.ultramic.2021.113376
  8. Faraday Discuss. 2021 Sep 20.
      Peripheral membrane proteins play a major role in numerous biological processes by transiently associating with cellular membranes, often with extreme membrane specificity. Because of the short-lived nature of these interactions, molecular dynamics (MD) simulations have emerged as an appealing tool to characterize at the structural level the molecular details of the protein-membrane interface. Transferable coarse-grained (CG) MD simulations, in particular, offer the possibility to investigate the spontaneous association of peripheral proteins with lipid bilayers of different compositions at limited computational cost, but they are hampered by the lack of a reliable a priori estimation of their accuracy and thus typically require a posteriori experimental validation. In this article, we investigate the ability of the MARTINI CG force field, specifically the 3 open-beta version, to reproduce known experimental observations regarding the membrane binding behavior of 12 peripheral membrane proteins and peptides. Based on observations of multiple binding and unbinding events in several independent replicas, we found that, despite the presence of false positives and false negatives, this model is mostly able to correctly characterize the membrane binding behavior of peripheral proteins, and to identify key residues found to disrupt membrane binding in mutagenesis experiments. While preliminary, our investigations suggest that transferable chemical-specific CG force fields have enormous potential in the characterization of the membrane binding process by peripheral proteins, and that the identification of negative results could help drive future force field development efforts.
    DOI:  https://doi.org/10.1039/d0fd00058b
  9. IEEE/ACM Trans Comput Biol Bioinform. 2021 Sep 24. PP
      Much of the recent success in protein structure prediction has been a result of accurate protein contact prediction, a binary classification problem. As an alternative, we recently proposed real-valued distance predictions, formulating the problem as a regression problem. The nuances of protein 3D structures make this formulation appropriate, allowing predictions to reflect inter-residue distances in nature. Despite these promises, the accurate prediction of real-valued distances remains relatively unexplored. To investigate if regression methods can be designed to predict real-valued distances as precisely as binary contacts, here we propose multiple novel methods of input label engineering with the goal of optimizing the distribution of distances to cater to the loss function of the deep-learning model. Our results demonstrate, for the first time, that deep learning methods for real-valued protein distance prediction can deliver distances as precise as binary classification methods. When using an optimal distance transformation function on the standard PSICOV dataset consisting of 150 representative proteins, the precision of 'top-all' long-range contacts improves from 60.9% to 61.4% when predicting real-valued distances instead of contacts. When building three-dimensional models we observed an average TM-score increase from 0.61 to 0.72, highlighting the advantage of predicting real-valued distances.
    DOI:  https://doi.org/10.1109/TCBB.2021.3115053
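To make the comparison between distance regression and contact classification concrete, the sketch below scores predicted real-valued distances as if they were contact predictions. It assumes the conventional definitions of a contact (true distance under 8 Å) and of long range (sequence separation of at least 24 residues); the abstract does not state the thresholds it uses, and `top_k_precision` is a hypothetical helper, not the authors' evaluation code.

```python
def long_range_pairs(n_residues, min_separation=24):
    """All residue pairs (i, j) with j - i >= min_separation."""
    return [
        (i, j)
        for i in range(n_residues)
        for j in range(i + min_separation, n_residues)
    ]


def top_k_precision(pred_dist, true_dist, k, threshold=8.0, min_separation=24):
    """Fraction of the k shortest predicted long-range distances that are true contacts."""
    n = len(true_dist)
    pairs = long_range_pairs(n, min_separation)
    # Smaller predicted distance means higher confidence that the pair is in contact.
    ranked = sorted(pairs, key=lambda ij: pred_dist[ij[0]][ij[1]])[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for i, j in ranked if true_dist[i][j] < threshold)
    return hits / len(ranked)
```

Both arguments are square matrices (lists of lists) of distances in Å. Ranking pairs by predicted distance lets a regression model be evaluated with the same precision metric as a binary contact predictor.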
  10. Curr Opin Chem Biol. 2021 Sep 18. pii: S1367-5931(21)00112-5. [Epub ahead of print] 65: 136-144
      Since the first revelation of proteins functioning as macromolecular machines through their three dimensional structures, researchers have been intrigued by the marvelous ways the biochemical processes are carried out by proteins. The aspiration to understand protein structures has fueled extensive efforts across different scientific disciplines. In recent years, it has been demonstrated that proteins with new functionality or shapes can be designed via structure-based modeling methods, and the design strategies have combined all available information - but largely piece-by-piece - from sequence derived statistics to the detailed atomic-level modeling of chemical interactions. Despite the significant progress, incorporating data-derived approaches through the use of deep learning methods can be a game changer. In this review, we summarize current progress, compare the arc of developing the deep learning approaches with the conventional methods, and describe the motivation and concepts behind current strategies that may lead to potential future opportunities.
    Keywords:  Deep learning; Neural networks; Protein design; Protein sequence design; Protein structure; Protein structure design
    DOI:  https://doi.org/10.1016/j.cbpa.2021.08.004
  11. Drug Discov Today. 2021 Sep 21. pii: S1359-6446(21)00397-4. [Epub ahead of print]
      Artificial intelligence (AI) is often presented as a new Industrial Revolution. Many domains use AI, including molecular simulation for drug discovery. In this review, we provide an overview of ligand-protein molecular docking and how machine learning (ML), especially deep learning (DL), a subset of ML, is transforming the field by tackling the associated challenges.
    Keywords:  data representation; deep learning; machine learning; molecular docking; sampling; scoring
    DOI:  https://doi.org/10.1016/j.drudis.2021.09.007
  12. Curr Opin Struct Biol. 2021 Sep 15. pii: S0959-440X(21)00127-5. [Epub ahead of print] 72: 63-70
      Liquid-liquid phase separation drives the formation of biological condensates that play essential roles in transcriptional regulation and signal sensing. Computational modeling could provide high-resolution structural characterizations of these condensates and help uncover physicochemical interactions that dictate their stability. However, many protein molecules involved in phase separation often contain multiple ordered domains connected with flexible, structureless linkers. Simulating such proteins necessitates force fields with consistent accuracy for both folded and disordered proteins. We provide a critical review of existing coarse-grained force fields for disordered proteins and highlight the challenges in their application to folded proteins. After discussing existing algorithms for force field parameterization, we propose an optimization strategy that should lead to computer models with improved transferability across protein types.
    Keywords:  Coarse graining; Disordered proteins; Force field parameterization; Liquid–liquid phase separation; Protein folding
    DOI:  https://doi.org/10.1016/j.sbi.2021.08.006
  13. Bioinformatics. 2021 Sep 21. pii: btab660. [Epub ahead of print]
      MOTIVATION: Antibodies are one of the most important classes of pharmaceuticals, with over 80 approved molecules currently in use against a wide variety of diseases. The drug discovery process for antibody therapeutic candidates, however, is time- and cost-intensive and heavily reliant on in vivo and in vitro high-throughput screens. Here, we introduce a framework for structure-based deep learning for antibodies (DLAB) which can virtually screen putative binding antibodies against antigen targets of interest. DLAB is built to be able to predict antibody-antigen binding for antigens with no known antibody binders. RESULTS: We demonstrate that DLAB can be used both to improve antibody-antigen docking and structure-based virtual screening of antibody drug candidates. DLAB enables improved pose ranking for antibody docking experiments as well as selection of antibody-antigen pairings for which accurate poses are generated and correctly ranked. We also show that DLAB can identify binding antibodies against specific antigens in a case study. Our results demonstrate the promise of deep learning methods for structure-based virtual screening of antibodies.
    AVAILABILITY: The DLAB source code and pre-trained models are available at https://github.com/oxpig/dlab-public.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btab660
  14. Curr Res Struct Biol. 2021;3: 206-215
      Acetylcholinesterase (AChE) catalyzes hydrolysis of acetylcholine thereby terminating cholinergic nerve impulses for efficient neurotransmission. Human AChE (hAChE) is a target of nerve agent and pesticide organophosphorus compounds that covalently attach to the catalytic Ser203 residue. Reactivation of inhibited hAChE can be achieved with nucleophilic antidotes, such as oximes. Understanding structural and electrostatic (i.e. protonation states) determinants of the catalytic and reactivation processes is crucial to improve design of oxime reactivators. Here we report X-ray structures of hAChE conjugated with a reversible covalent inhibitor 4K-TMA (4K-TMA:hAChE) at 2.8 Å resolution and of 4K-TMA:hAChE conjugate with oxime reactivator methoxime, MMB4 (4K-TMA:hAChE:MMB4) at 2.6 Å resolution, both at physiologically relevant room temperature, as well as cryo-crystallographic structure of 4K-TMA:hAChE at 2.4 Å resolution. 4K-TMA acts as a substrate analogue reacting with the hydroxyl of Ser203 and generating a reversible tetrahedral hemiketal intermediate that closely resembles the first tetrahedral intermediate state during hAChE-catalyzed acetylcholine hydrolysis. Structural comparisons of room temperature with cryo-crystallographic structures of 4K-TMA:hAChE and published mAChE complexes with 4K-TMA, as well as the effect of MMB4 binding to the peripheral anionic site (PAS) of the 4K-TMA:hAChE complex, revealed only discrete, minor differences. The active center geometry of AChE, already highly evolved for the efficient catalysis, was thus indicative of only minor conformational adjustments to accommodate the tetrahedral intermediate in the hydrolysis of the neurotransmitter acetylcholine (ACh). To map protonation states in the hAChE active site gorge we collected 3.5 Å neutron diffraction data paving the way for obtaining higher resolution datasets that will be needed to determine locations of individual hydrogen atoms.
    Keywords:  4K-TMA; Neutron diffraction; Reversible covalent inhibitor; Room temperature; X-ray diffraction; hAChE
    DOI:  https://doi.org/10.1016/j.crstbi.2021.08.003
  15. J Phys Chem B. 2021 Sep 21.
      Engineering proteins to have desired properties by mutating amino acids at specific sites is commonplace. Such engineered proteins must be stable to function. Experimental methods used to determine stability at throughputs required to scan the protein sequence space thoroughly are laborious. To this end, many machine learning based methods have been developed to predict thermodynamic stability changes upon mutation. These methods have been evaluated for symmetric consistency by testing with hypothetical reverse mutations. In this work, we propose transitive data augmentation, evaluating transitive consistency with our new Stransitive data set, and a new machine learning based method, the first of its kind, that incorporates both symmetric and transitive properties into the architecture. Our method, called SCONES, is an interpretable neural network that predicts small relative protein stability changes for missense mutations that do not significantly alter the structure. It estimates a residue's contributions toward protein stability (ΔG) in its local structural environment, and the difference between independently predicted contributions of the reference and mutant residues is reported as ΔΔG. We show that this self-consistent machine learning architecture is immune to many common biases in data sets, relies less on data than existing methods, is robust to overfitting, and can explain a substantial portion of the variance in experimental data.
    DOI:  https://doi.org/10.1021/acs.jpcb.1c04913
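The self-consistency properties mentioned in this abstract follow directly from reporting ΔΔG as a difference of independently predicted per-residue contributions. The sketch below illustrates that algebra only; it is not the SCONES network. The contribution values are placeholder numbers for a single hypothetical site, and the sign convention (mutant minus reference) is an assumption, since the abstract does not state it.

```python
# Hypothetical per-residue stability contributions (kcal/mol) at one site.
# Placeholder numbers for illustration only; not taken from the paper.
CONTRIBUTION = {"A": -0.40, "V": -0.90, "L": -1.30}


def ddg(reference, mutant, contribution=CONTRIBUTION):
    """Assumed sign convention: mutant contribution minus reference contribution."""
    return contribution[mutant] - contribution[reference]


def is_symmetric(a, b, tol=1e-9):
    """Symmetric consistency: the reverse mutation has the opposite ΔΔG."""
    return abs(ddg(a, b) + ddg(b, a)) < tol


def is_transitive(a, b, c, tol=1e-9):
    """Transitive consistency: A->C equals A->B followed by B->C."""
    return abs(ddg(a, c) - (ddg(a, b) + ddg(b, c))) < tol
```

Because each ΔΔG is a difference of two terms that depend on only one residue each, `is_symmetric` and `is_transitive` hold for any values in the table, which is why a model built this way cannot violate either property regardless of its training data.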
  16. J Chem Theory Comput. 2021 Sep 22.
      We present a methodology for defining and optimizing a general force field for classical molecular simulations, and we describe its use to derive the Open Force Field 1.0.0 small-molecule force field, codenamed Parsley. Rather than using traditional atom typing, our approach is built on the SMIRKS-native Open Force Field (SMIRNOFF) parameter assignment formalism, which handles increases in the diversity and specificity of the force field definition without needlessly increasing the complexity of the specification. Parameters are optimized with the ForceBalance tool, based on reference quantum chemical data that include torsion potential energy profiles, optimized gas-phase structures, and vibrational frequencies. These quantum reference data are computed and are maintained with QCArchive, an open-source and freely available distributed computing and database software ecosystem. In this initial application of the method, we present essentially a full optimization of all valence parameters and report tests of the resulting force field against compounds and data types outside the training set. These tests show improvements in optimized geometries and conformational energetics and demonstrate that Parsley's accuracy for liquid properties is similar to that of other general force fields, as is accuracy on binding free energies. We find that this initial Parsley force field affords accuracy similar to that of other general force fields when used to calculate relative binding free energies spanning 199 protein-ligand systems. Additionally, the resulting infrastructure allows us to rapidly optimize an entirely new force field with minimal human intervention.
    DOI:  https://doi.org/10.1021/acs.jctc.1c00571
  17. Brief Bioinform. 2021 Sep 22. pii: bbab384. [Epub ahead of print]
      MOTIVATION: The Estimation of Model Accuracy problem is a cornerstone problem in the field of Bioinformatics. As of CASP14, there are 79 global QA methods, and a minority of 39 residue-level QA methods with very few of them working on protein complexes. Here, we introduce ZoomQA, a novel, single-model method for assessing the accuracy of a tertiary protein structure/complex prediction at residue level, which has many applications such as drug discovery. ZoomQA differs from others by considering the change in chemical and physical features of a fragment structure (a portion of a protein within a radius r of the target amino acid) as the radius of contact increases. Fourteen physical and chemical properties of amino acids are used to build a comprehensive representation of every residue within a protein and grade their placement within the protein as a whole. Moreover, we have shown the potential of ZoomQA to identify problematic regions of the SARS-CoV-2 protein complex. RESULTS: We benchmark ZoomQA on CASP14, and it outperforms other state-of-the-art local QA methods and rivals state-of-the-art QA methods in global prediction metrics. Our experiment shows the efficacy of these new features and shows that our method is able to match the performance of other state-of-the-art methods without the use of homology searching against databases or PSSM matrices.
    AVAILABILITY: http://zoomQA.renzhitech.com.
    DOI:  https://doi.org/10.1093/bib/bbab384
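The idea of tracking how a fragment's features change as the contact radius r grows can be sketched as follows. This is an illustrative assumption of how such a profile might be computed, not ZoomQA's feature pipeline: residues are represented by a single coordinate each (for example Cα), only one scalar property is averaged instead of the fourteen mentioned in the abstract, and the function names are hypothetical.

```python
from math import dist


def fragment(coords, center, radius):
    """Indices of residues whose coordinate lies within `radius` of the target residue."""
    return [i for i, c in enumerate(coords) if dist(c, coords[center]) <= radius]


def zoom_profile(coords, prop, center, radii):
    """Mean property value of the fragment around `center` at each radius."""
    profile = []
    for radius in radii:
        members = fragment(coords, center, radius)
        # The target residue is always a member, so the fragment is never empty.
        profile.append(sum(prop[i] for i in members) / len(members))
    return profile
```

For a target residue, `zoom_profile` returns one value per radius; the sequence of values describes how the local environment changes as the fragment expands, and such a sequence could serve as an input feature for a per-residue quality model.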
  18. Methods. 2021 Sep 21. pii: S1046-2023(21)00222-X. [Epub ahead of print]
      Protein adenosine diphosphate-ribosylation (ADPr) is caused by the covalent binding of one or more ADP-ribose moieties to a target protein and regulates the biological functions of the target protein. To fully understand the regulatory mechanism of ADP-ribosylation, the essential step is the identification of the ADPr sites from the proteome. As the experimental approaches are costly and time-consuming, it is necessary to develop a computational tool to predict ADPr sites. Recently, serine has been found to be the major residue type for ADP-ribosylation but no predictor is available. In this study, we collected thousands of experimentally validated human ADPr sites on serine residues and constructed several different machine-learning classifiers. We found that the hybrid model, dubbed DeepSADPr, which integrated the one-dimensional convolutional neural network (CNN) with the One-Hot encoding approach and the word-embedding approach, compared favourably to other models in terms of both ten-fold cross-validation and an independent test. Its AUC values reached 0.935 for ten-fold cross-validation. Its values of sensitivity, accuracy and Matthews's correlation coefficient reached 0.933, 0.867 and 0.740, respectively, with the fixed specificity value of 0.80. Overall, DeepSADPr is the first classifier for predicting serine ADPr sites, which is available at http://www.bioinfogo.org/DeepSADPr.
    Keywords:  ADP-ribosylation; Post-translational modification; convolutional neural network; deep learning
    DOI:  https://doi.org/10.1016/j.ymeth.2021.09.008
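The relationship between the reported sensitivity, specificity, accuracy, and Matthews correlation coefficient can be checked with a short calculation. The sketch assumes equal numbers of positive and negative sites in the evaluation set, which the abstract does not state; under a different class ratio the derived accuracy and MCC would change.

```python
from math import sqrt


def accuracy_and_mcc(sensitivity, specificity, n_pos=1.0, n_neg=1.0):
    """Derive accuracy and MCC from sensitivity and specificity for given class sizes."""
    tp = sensitivity * n_pos   # true positives
    fn = n_pos - tp            # false negatives
    tn = specificity * n_neg   # true negatives
    fp = n_neg - tn            # false positives
    accuracy = (tp + tn) / (n_pos + n_neg)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc


# Values from the abstract: sensitivity 0.933 at a fixed specificity of 0.80.
accuracy, mcc = accuracy_and_mcc(0.933, 0.80)
```

Under the balanced-class assumption the derived accuracy and MCC come out at approximately 0.87 and 0.74, in line with the values quoted in the abstract.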