bims-strubi Biomed News
on Advances in structural biology
Issue of 2021–09–19
nineteen papers selected by
Alessandro Grinzato, European Synchrotron Radiation Facility



  1. Proteins. 2021 Sep 17.
      CASP is a community experiment to advance methods of computing three-dimensional protein structure from amino acid sequence. Core components are rigorous blind testing of methods and evaluation of the results by independent assessors. In the most recent experiment (CASP14) deep learning methods from one research group consistently delivered computed structures rivalling the corresponding experimental ones in accuracy. In this sense, the results represent a solution to the classical protein folding problem, at least for single proteins. The models have already been shown to be capable of providing solutions for problematic crystal structures, and there are broad implications for the rest of structural biology. Other research groups also substantially improved performance. Here we describe these results and outline some of the many implications. Other related areas of CASP, including modeling of protein complexes, structure refinement, estimation of model accuracy, and prediction of inter-residue contacts and distances, are also described.
    Keywords:  CASP; Community Wide Experiment; Protein Structure Prediction
    DOI:  https://doi.org/10.1002/prot.26237
  2. J Chem Theory Comput. 2021 Sep 15.
      It has been challenging to obtain reliable free energies for protein conformational changes from all-atom molecular dynamics simulations, despite the availability of many enhanced sampling techniques. To alleviate the difficulties associated with the enormous complexity of the conformational space, here we propose a few practical strategies for such calculations, including (1) a stringent method to examine convergence by comparing independent simulations starting from different initial coordinates, (2) adoption of multistep schemes in which the complete conformational change consists of multiple transition steps, each sampled using a distinct reaction coordinate, and (3) application of boundary restraints to simplify the conformational space. We demonstrate these strategies on the conformational changes between the outward-facing and outward-occluded states of the Mhp1 membrane transporter, obtaining the equilibrium thermodynamics of the relevant metastable states, the kinetic rates between these states, and the reactive trajectories that reveal the atomic details of spontaneous transitions. Our approaches thus promise convergent and reliable calculations to examine intuition-based hypotheses and to eventually elucidate the underlying molecular mechanisms of reversible conformational changes in complex protein systems.
    DOI:  https://doi.org/10.1021/acs.jctc.1c00585
  3. J Struct Biol. 2021 Sep 11. pii: S1047-8477(21)00096-4. [Epub ahead of print] 107791
      Cryo-electron tomography is the highest resolution tool available for structural analysis of macromolecular complexes within their native cellular environments. At present, data acquisition suffers from low throughput, in part due to the low probability of positioning a cell such that the subcellular structure of interest is on a region of the electron microscopy (EM) grid that is suitable for imaging. Here, we photo-micropatterned EM grids to optimally position endothelial cells so as to enable high-throughput imaging of cell-cell contacts. Lattice micropatterned grids increased the average distance between intercellular contacts and the thicker cell nuclei such that the regions of interest were sufficiently thin for direct imaging. We observed a diverse array of membranous and cytoskeletal structures at intercellular contacts, demonstrating the utility of this technique in enhancing the rate of data acquisition for cellular cryo-electron tomography studies.
    Keywords:  bioengineering; cryo-EM; cryo-ET; micropatterning
    DOI:  https://doi.org/10.1016/j.jsb.2021.107791
  4. BMC Bioinformatics. 2021 Sep 15. 22(1): 439
       BACKGROUND: Accurate prediction of protein tertiary structures is highly desired as the knowledge of protein structures provides invaluable insights into protein functions. We have designed two approaches to protein structure prediction, including a template-based modeling approach (called ProALIGN) and an ab initio prediction approach (called ProFOLD). Briefly speaking, ProALIGN aligns a target protein with templates through exploiting the patterns of context-specific alignment motifs and then builds the final structure with reference to the homologous templates. In contrast, ProFOLD uses an end-to-end neural network to estimate inter-residue distances of target proteins and builds structures that satisfy these distance constraints. These two approaches emphasize different characteristics of target proteins: ProALIGN exploits structure information of homologous templates of target proteins while ProFOLD exploits the co-evolutionary information carried by homologous protein sequences. Recent progress has shown that the combination of template-based modeling and ab initio approaches is promising.
    RESULTS: In the study, we present FALCON2, a web server that integrates ProALIGN and ProFOLD to provide high-quality protein structure prediction service. For a target protein, FALCON2 executes ProALIGN and ProFOLD simultaneously to predict possible structures and selects the most likely one as the final prediction result. We evaluated FALCON2 on widely-used benchmarks, including 104 CASP13 (the 13th Critical Assessment of protein Structure Prediction) targets and 91 CASP14 targets. In-depth examination suggests that when high-quality templates are available, ProALIGN is superior to ProFOLD and in other cases, ProFOLD shows better performance. By integrating these two approaches with different emphasis, FALCON2 server outperforms the two individual approaches and also achieves state-of-the-art performance compared with existing approaches.
    CONCLUSIONS: By integrating template-based modeling and ab initio approaches, FALCON2 provides an easy-to-use and high-quality protein structure prediction service for the community and we expect it to enable insights into a deep understanding of protein functions.
    Keywords:  Ab initio prediction; Protein structure prediction; Protein structure prediction web service; Template-based modeling
    DOI:  https://doi.org/10.1186/s12859-021-04353-8
  5. Nat Commun. 2021 Sep 15. 12(1): 5465
      Peptide-protein interactions are involved in various fundamental cellular functions and their identification is crucial for designing efficacious peptide therapeutics. Recently, a number of computational methods have been developed to predict peptide-protein interactions. However, most of the existing prediction approaches heavily depend on high-resolution structure data. Here, we present a deep learning framework for multi-level peptide-protein interaction prediction, called CAMP, including binary peptide-protein interaction prediction and corresponding peptide binding residue identification. Comprehensive evaluation demonstrated that CAMP can successfully capture the binary interactions between peptides and proteins and identify the binding residues along the peptides involved in the interactions. In addition, CAMP outperformed other state-of-the-art methods on binary peptide-protein interaction prediction. CAMP can serve as a useful tool in peptide-protein interaction prediction and identification of important binding residues in the peptides, which can thus facilitate the peptide drug discovery process.
    DOI:  https://doi.org/10.1038/s41467-021-25772-4
  6. Proteins. 2021 Sep 13.
      The potential of deep learning has been recognized in the protein structure prediction community for some time, and became indisputable after CASP13. In CASP14, deep learning has boosted the field to unanticipated levels reaching near-experimental accuracy. This success comes from advances transferred from other machine learning areas, as well as methods specifically designed to deal with protein sequences and structures, and their abstractions. Novel emerging approaches include (i) geometric learning, i.e. learning on representations such as graphs, 3D Voronoi tessellations, and point clouds; (ii) pre-trained protein language models leveraging attention; (iii) equivariant architectures preserving the symmetry of 3D space; (iv) use of large meta-genome databases; (v) combinations of protein representations; (vi) and finally truly end-to-end architectures, i.e. differentiable models starting from a sequence and returning a 3D structure. Here, we provide an overview and our opinion of the novel deep learning approaches developed in the last two years and widely used in CASP14.
    Keywords:  CASP14; deep learning; end-to-end architectures; equivariance; geometric learning; protein language models; protein structure prediction
    DOI:  https://doi.org/10.1002/prot.26235
  7. Prog Biophys Mol Biol. 2021 Sep 11. pii: S0079-6107(21)00105-X. [Epub ahead of print]
      Although determination of structures of biological molecules became a real possibility after the first X-ray analyses of crystals by William Henry Bragg and his son Lawrence in 1913, the crystal structure determination of globular proteins became a possibility only in 1934 with the demonstration of X-ray diffraction from pepsin by J D Bernal and Dorothy Crowfoot, later Hodgkin, who had realised the importance of maintaining an aqueous environment for proteins in crystals. After a further 20 years of hard work by Max Perutz, John Kendrew and others the structures of haemoglobin and myoglobin emerged. Further innovation resulted in a revolution in X-ray diffraction studies in the 1960s, which focused first on polypeptides with alpha helix, beta strand and collagen polyproline helix structures, described in a review by David Davies in 1965 in the journal Progress in Biophysics, later to become Progress in Biophysics and Molecular Biology. It was followed in 1969 by a further detailed review by Tony North and David Phillips in the same journal on crystal structure analyses of globular proteins that successfully emerged soon after that of myoglobin. These included the structure of the first enzyme, lysozyme, followed by structures of chymotrypsin, trypsin, carboxypeptidase and many others. This first resolution revolution in X-ray analysis described in the two reviews is the subject of this retrospective analysis just over five decades later.
    Keywords:  Abbreviations; Polypeptides; Proteins; X-ray diffraction
    DOI:  https://doi.org/10.1016/j.pbiomolbio.2021.09.002
  8. Protein J. 2021 Sep 12.
      Protein Structure Prediction (PSP) is considered to be a complicated problem in computational biology. In spite of the remarkable progress made by co-evolution-based methods in PSP, it is still a challenging and unresolved problem. Recently, along with co-evolutionary relationships, deep learning approaches have been introduced in PSP that have led to significant progress. In this paper, a novel methodology using a deep ResNet architecture for predicting inter-residue distances and dihedral angles is proposed, which aims to generate 125 homologous sequences on average from a customized sequence database. These sequences are used to generate input features. As an outcome of the neural networks, a pool of structures is generated from which the lowest-potential structure is chosen as the final predicted 3-D protein structure. The proposed method is trained using 6521 protein sequences extracted from the Protein Data Bank (PDB). For testing, 48 protein sequences shorter than 400 residues are chosen from the 13th Critical Assessment of protein Structure Prediction (CASP13) dataset. The model is compared with Alphafold, Zhang, and RaptorX. The template modeling (TM) score is used to evaluate the accuracy of the estimated structure. The proposed method produces better performance for 52% of the target sequences, while the corresponding figures for Alphafold, Zhang, and RaptorX were 10%, 22.9%, and 6% respectively. Additionally, for 37.5% of the target sequences, the proposed method was able to achieve an accuracy greater than or equal to 0.80. The TM scores obtained for the sequences under consideration were 0.69, 0.67, 0.65, and 0.58 respectively for the proposed method, Alphafold, Zhang, and RaptorX.
    Keywords:  3-D protein structure prediction; CASP; Deep ResNet Architecture; Distance prediction; Experimental and Computational techniques; Protein
    DOI:  https://doi.org/10.1007/s10930-021-10016-7
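      For readers less familiar with the TM score quoted in the entry above, the following is a minimal sketch of the standard TM-score formula, not code from the paper. It assumes the per-residue distances between aligned model and native positions are already available for one fixed superposition (the full TM score takes the maximum over superpositions), and the 0.5 Å floor on d0 for very short chains is an assumption made here to keep the scale positive. The function names are illustrative.

```python
def tm_score_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (angstroms) of the TM-score formula."""
    if target_length <= 15:
        # Assumption: floor for very short chains, where the formula breaks down.
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length: int) -> float:
    """TM score for ONE fixed superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the target (native) structure,
    which may exceed len(distances) if some residues are unaligned.
    """
    d0 = tm_score_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

      A perfect model (all distances zero, all residues aligned) scores 1.0; each residue displaced by exactly d0 contributes one half of a perfectly placed residue.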
  9. J Struct Biol. 2021 Sep 14. pii: S1047-8477(21)00103-9. [Epub ahead of print] 107798
      A rapid assay is described, based upon the Marangoni effect, which detects the formation of a denatured-protein film at the air-water interface (AWI) of aqueous samples. This assay requires no more than a 20 µL aliquot of sample, at a protein concentration of no more than 1 mg/ml, and it can be performed with any buffer that is used to prepare grids for electron cryo-microscopy (cryo-EM). In addition, this assay provides an easy way to estimate the rate at which a given protein forms such a film at the AWI. Use of this assay is suggested as a way to pre-screen the effect of various additives and chemical modifications that one might use to optimize the preparation of grids, although the final proof of optimization still requires further screening of grids in the electron microscope. In those cases when the assay establishes that a given protein does form a sacrificial, denatured-protein monolayer, it is suggested that subsequent optimization strategies might focus on discovering how to improve the adsorption of native proteins onto that monolayer, rather than to prevent its formation. A second alternative might be to bind such proteins to the surface of rationally designed affinity grids, in order to prevent their diffusion to, and unwanted interaction with, the AWI.
    DOI:  https://doi.org/10.1016/j.jsb.2021.107798
  10. J Chem Inf Model. 2021 Sep 15.
      Alchemical free energy methods, such as free energy perturbation (FEP) and thermodynamic integration (TI), have become increasingly popular and crucial for drug design and discovery. However, the system preparation of alchemical free energy simulation is an error-prone, time-consuming, and tedious process for a large number of ligands. To address this issue, we have recently presented CHARMM-GUI Free Energy Calculator that can provide input and postprocessing scripts for NAMD and GENESIS FEP molecular dynamics systems. In this work, we extended three submodules of Free Energy Calculator to work with the full suite of GPU-accelerated alchemical free energy methods and tools in AMBER, including input and postprocessing scripts. The BACE1 (β-secretase 1) benchmark set was used to validate the AMBER-TI simulation systems and scripts generated by Free Energy Calculator. The overall results of relatively large and diverse systems are almost equivalent with different protocols (unified and split) and with different timesteps (1, 2, and 4 fs), with R² > 0.9. More importantly, the average free energy differences between two protocols are small and reliable with four independent runs, with a mean unsigned error (MUE) below 0.4 kcal/mol. Running at least four independent runs for each pair with AMBER20 (and FF19SB/GAFF2.1/OPC force fields), we obtained a MUE of 0.99 kcal/mol and root-mean-square error of 1.31 kcal/mol for 58 alchemical transformations in comparison with experimental data. In addition, a set of ligands for T4-lysozyme was used to further validate our free energy calculation protocol whose results are close to experimental data (within 1 kcal/mol). In summary, Free Energy Calculator provides a user-friendly web-based tool to generate the AMBER-TI system and input files for high-throughput binding free energy calculations with access to the full set of GPU-accelerated alchemical free energy, enhanced sampling, and analysis methods in AMBER.
    DOI:  https://doi.org/10.1021/acs.jcim.1c00747
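      The two error metrics quoted in the entry above, mean unsigned error (MUE) and root-mean-square error (RMSE), can be computed as in the sketch below. This is a generic illustration and not the authors' postprocessing script; the function names are assumptions, and the inputs are simply paired lists of predicted and experimental free energy differences in the same units.

```python
import math


def mean_unsigned_error(predicted, experimental):
    """Average absolute deviation between paired predictions and measurements."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


def root_mean_square_error(predicted, experimental):
    """Square root of the mean squared deviation; penalizes outliers more than MUE."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_sq = sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / len(predicted)
    return math.sqrt(mean_sq)
```

      Because squaring weights large deviations more heavily, the RMSE is never smaller than the MUE for the same data, which is consistent with the pair of values reported in the abstract.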
  11. J Chem Inf Model. 2021 Sep 15.
      Cryo-electron microscopy (cryo-EM) single-particle image analysis is a powerful technique to resolve structures of biomacromolecules, while the challenge is that the cryo-EM image is of a low signal-to-noise ratio. For both two-dimensional image analysis and three-dimensional density map analysis, image alignment is an important step to improve the precision of the image distance calculation. In this paper, we introduce a new algorithm for performing two-dimensional pairwise alignment for cryo-EM particle images, which is based on the Fourier transform and power spectrum analysis. Compared to the existing heuristic iterative alignment methods, our method utilizes the signal distribution and signal feature on images' power spectrum to directly compute the alignment parameters. It does not require iterative computations and is robust against the cryo-EM image noise. Both theoretical analysis and experimental results suggest that our power-spectrum-feature-based alignment method is highly computationally efficient and is capable of offering effective alignment results. This new alignment algorithm is publicly available at: www.csbio.sjtu.edu.cn/bioinf/EMAF/ for academic use.
    DOI:  https://doi.org/10.1021/acs.jcim.1c00745
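      One general property that makes power spectra attractive for image alignment is that the power spectrum of an image does not change when the image is (circularly) translated, because a shift only alters the phases of the Fourier coefficients. The sketch below demonstrates that property on a tiny image with a naive pure-Python transform. It is not the algorithm from the paper above, whose details are not given in the abstract; all function names are illustrative assumptions.

```python
import cmath


def dft2(image):
    """Naive 2D discrete Fourier transform of a small square image (list of lists)."""
    n = len(image)
    return [
        [
            sum(
                image[y][x] * cmath.exp(-2j * cmath.pi * (u * x + v * y) / n)
                for y in range(n)
                for x in range(n)
            )
            for u in range(n)
        ]
        for v in range(n)
    ]


def power_spectrum(image):
    """Squared magnitude of every Fourier coefficient."""
    return [[abs(c) ** 2 for c in row] for row in dft2(image)]


def circular_shift(image, dx, dy):
    """Translate the image by (dx, dy) pixels with wrap-around at the edges."""
    n = len(image)
    return [[image[(y - dy) % n][(x - dx) % n] for x in range(n)] for y in range(n)]


def max_abs_difference(a, b):
    """Largest element-wise difference between two equally sized 2D lists."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
```

      Comparing `power_spectrum(image)` with `power_spectrum(circular_shift(image, 1, 2))` via `max_abs_difference` gives a value at the level of floating-point rounding. The naive transform costs O(n^4) and is only meant for toy inputs; a real implementation would use a fast Fourier transform.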
  12. J Chem Theory Comput. 2021 Sep 13.
      Alchemical free energy methods have become indispensable in computational drug discovery for their ability to calculate highly accurate estimates of protein-ligand affinities. Expanded ensemble (EE) methods, which involve single simulations visiting all of the alchemical intermediates, have some key advantages for alchemical free energy calculation. However, there have been relatively few examples published in the literature of using expanded ensemble simulations for free energies of protein-ligand binding. In this paper, as a test of expanded ensemble methods, we compute relative binding free energies using the Open Force Field Initiative force field (codename "Parsley") for 24 pairs of Tyk2 inhibitors derived from a congeneric series of 16 compounds. The EE predictions agree well with the experimental values (root-mean-square error (RMSE) of 0.94 ± 0.13 kcal mol⁻¹ and mean unsigned error (MUE) of 0.75 ± 0.12 kcal mol⁻¹). We find that while increasing the number of alchemical intermediates can improve the phase space overlap, faster convergence can be obtained with fewer intermediates, as long as acceptance rates are sufficient. We also find that convergence can be improved using more aggressive updating of biases, and that estimates can be improved by performing multiple independent EE calculations. This work demonstrates that EE is a viable option for alchemical free energy calculation. We discuss the implications of these findings for rational drug design, as well as future directions for improvement.
    DOI:  https://doi.org/10.1021/acs.jctc.1c00513
  13. Adv Mater. 2021 Sep 12. e2102991
      Cryogenic-electron microscopy (cryo-EM) is the preferred method to determine 3D structures of proteins and to study diverse material systems that intrinsically have radiation or air sensitivity. Current cryo-EM sample preparation methods provide limited control over the sample quality, which limits the efficiency and high throughput of 3D structure analysis. This is partly because it is difficult to control the thickness of the vitreous ice that embeds specimens, in the range of nanoscale, depending on the size and type of materials of interest. Thus, there is a need for fine regulation of the thickness of vitreous ice to deliver consistent high signal-to-noise ratios for low-contrast biological specimens. Herein, an advanced silicon-chip-based device is developed which has a regular array of micropatterned holes with a graphene oxide (GO) window on freestanding silicon nitride (SixNy). Accurately regulated depths of micropatterned holes enable precise control of vitreous ice thickness. Furthermore, GO window with affinity for biomolecules can facilitate concentration of the sample molecules at a higher level. Incorporation of micropatterned chips with a GO window enhances cryo-EM imaging for various nanoscale biological samples including human immunodeficiency viral particles, GroEL tetradecamers, apoferritin octahedral, aldolase homotetramer complexes, and tau filaments, as well as inorganic materials.
    Keywords:  cryogenic-electron microscope; graphene oxide; microelectromechanical systems; nanomaterials; vitreous ice thickness
    DOI:  https://doi.org/10.1002/adma.202102991
  14. J Chem Theory Comput. 2021 Sep 13.
      The nonpolarizable CHARMM force field is one of the most widely used energy functions for all-atom biomolecular simulations. Chloride is the only halide ion included in the latest version, CHARMM36m, and is used widely in simulation studies, often as an electrolyte ion but also as the biological substrate of transport proteins and enzymes. Here, we find that existing parameters systematically underestimate the interaction of Cl⁻ with proteins and lipids. Accordingly, when examined in solution, little to no Cl⁻ association can be observed with most components of the protein, including backbone, polar side chains and aromatic rings. The strength of the interaction with cationic side chains and with alkali ions is also incongruent with experimental measurements, specifically osmotic coefficients of concentrated solutions. Consistent with these findings, a 4-μs trajectory of the Cl⁻-specific transport protein CLC-ec1 shows irreversible Cl⁻ dissociation from the so-called Scen binding site, even in a 150 mM NaCl buffer. To correct for these deficiencies, we formulate a series of pair-specific Lennard-Jones parameters that override those resulting from the conventional Lorentz-Berthelot combination rules. These parameters, referred to as NBFIX, are systematically calibrated against available experimental data as well as ab initio geometry optimizations and energy evaluations, for a wide set of binary and ternary Cl⁻ complexes with protein and lipid analogs and alkali cations. Analogously, we also formulate parameter sets for the other three biological halide ions, namely, fluoride, bromide, and iodide. The resulting parameters are used to calculate the potential of mean force defining the interaction of each anion and each of the protein and lipid analogues in bulk water, revealing association free energies in the range of -0.3 to -3.3 kcal/mol, with the F⁻ complexes being the least stable. The NBFIX corrections also preserve the Cl⁻ occupancy of CLC-ec1 in a second 4-μs trajectory. We posit that these optimized molecular-mechanics models provide a more realistic foundation for all-atom simulation studies of processes entailing changes in hydration, recognition, or transport of halide anions.
    DOI:  https://doi.org/10.1021/acs.jctc.1c00550
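      The idea of pair-specific Lennard-Jones parameters overriding the Lorentz-Berthelot combination rules, as described in the entry above, can be sketched as follows. This is an illustration only: the atom-type names and all numerical values used with it are hypothetical placeholders, not parameters from the paper or from CHARMM36m, and well depths are written here as positive numbers.

```python
import math


def lorentz_berthelot(rmin_half_i, eps_i, rmin_half_j, eps_j):
    """Default combination rule: sum the half-radii, geometric mean of well depths."""
    rmin_ij = rmin_half_i + rmin_half_j
    eps_ij = math.sqrt(eps_i * eps_j)
    return rmin_ij, eps_ij


def pair_parameters(type_i, type_j, atom_types, pair_overrides):
    """Return (rmin, eps) for a pair of atom types.

    atom_types: {type name: (rmin_half, eps)} per-atom parameters.
    pair_overrides: {frozenset of two type names: (rmin, eps)}; an entry here
    takes precedence over the combination rule (the NBFIX-style override).
    """
    key = frozenset((type_i, type_j))
    if key in pair_overrides:
        return pair_overrides[key]
    return lorentz_berthelot(*atom_types[type_i], *atom_types[type_j])


def lennard_jones(r, rmin, eps):
    """12-6 energy in the Rmin form: minimum of -eps at r == rmin."""
    ratio = rmin / r
    return eps * (ratio ** 12 - 2.0 * ratio ** 6)
```

      With hypothetical inputs such as `atom_types = {"ION": (2.0, 0.10), "POLAR_H": (0.5, 0.04)}` and `pair_overrides = {frozenset(("ION", "POLAR_H")): (2.4, 0.08)}`, the override returns a shorter, deeper minimum for that one pair while every other pair still follows the combination rule. This is the sense in which a pair-specific correction can strengthen a particular interaction without touching the per-atom parameters.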
  15. J Biomol NMR. 2021 Sep 15.
      Protein-ligand interaction is one of the highlights of molecular recognition. The most popular application of this type of interaction is drug development, which requires high throughput screening for a ligand that binds to the target protein. Our goal was to find a binding ligand with a simple detection method, and once this type of ligand was found, other methods could then be used to measure the detailed kinetic or thermodynamic parameters. We started with the idea that the ligand NMR signal would disappear if it was bound to the non-tumbling mass. In order to create the non-tumbling mass, we tried the aggregates of a target protein, which was fused to the elastin-like polypeptide. We chose the maltose binding protein as a test case, and we tried it with several sugars, which included maltose, glucose, sucrose, lactose, galactose, maltotriose, and β-cyclodextrin. The maltose signal in the ¹H NMR spectrum disappeared completely as hoped around the protein to ligand ratio of 1:3 at 298 K where the proteins aggregated. The protein signals also disappeared upon aggregation except for the fast-moving part, which resulted in a cleaner background than the monomeric form. Since we only needed to look for a disappearing signal amongst those from the mixture, it should be useful in high throughput screening. Other types of sugars except for the maltotriose and β-cyclodextrin, which are siblings of the maltose, did not seem to bind at all. We believe that our system would be especially effective when dealing with a smaller target protein, so both the protein and the bound ligand would lose their signals only when the aggregates formed. We hope that our proposed method would contribute to accelerating the development of potent drug candidates by simultaneously identifying several binders directly from a mixture.
    Keywords:  Ligand screening; NMR; Protein aggregates
    DOI:  https://doi.org/10.1007/s10858-021-00381-x
  16. Brief Bioinform. 2021 Sep 17. pii: bbab376. [Epub ahead of print]
      Protein post-translational modification (PTM) is an important regulatory mechanism that plays a key role in both normal and disease states. Acetylation on lysine residues is one of the most potent PTMs owing to its critical role in cellular metabolism and regulatory processes. Identifying protein lysine acetylation (Kace) sites is a challenging task in bioinformatics. To date, several machine learning-based methods for the in silico identification of Kace sites have been developed. Of those, a few are prokaryotic species-specific. Despite their attractive advantages and performances, these methods have certain limitations. Therefore, this study proposes a novel predictor STALLION (STacking-based Predictor for ProkAryotic Lysine AcetyLatION), containing six prokaryotic species-specific models to identify Kace sites accurately. To extract crucial patterns around Kace sites, we employed 11 different encodings representing three different characteristics. Subsequently, a systematic and rigorous feature selection approach was employed to identify the optimal feature set independently for five tree-based ensemble algorithms and built their respective baseline model for each species. Finally, the predicted values from baseline models were utilized and trained with an appropriate classifier using the stacking strategy to develop STALLION. Comparative benchmarking experiments showed that STALLION significantly outperformed existing predictors on independent tests. To expedite direct accessibility to the STALLION models, a user-friendly online predictor was implemented, which is available at: http://thegleelab.org/STALLION.
    Keywords:  bioinformatics; feature optimization; lysine acetylation sites; machine learning; performance assessment; stacking strategy
    DOI:  https://doi.org/10.1093/bib/bbab376
  17. Biophys J. 2021 Sep 09. pii: S0006-3495(21)00749-9. [Epub ahead of print]
      This article bemoans the demise of truly modular open-source image processing systems, such as SPIDER, in recent years' development of tools for three-dimensional reconstruction in cryo-electron microscopy. Instead, today's users have to rely on the functionality of software systems that have little or no transparency. As a consequence, users of such packages no longer gain a conceptual understanding and intuitive grasp of the analytical routes leading from the stream of input data to the final density map. Possible remedies of this situation with free software are discussed.
    Keywords:  3D reconstruction; electron microscopy; image processing; molecular structure; teaching tools
    DOI:  https://doi.org/10.1016/j.bpj.2021.09.015
  18. J Chem Inf Model. 2021 Sep 14.
      Relative binding free energy calculations in drug design are becoming a useful tool in facilitating lead binding affinity optimization in a cost- and time-efficient manner. However, they have been limited by technical challenges such as the manual creation of large numbers of input files to set up, run, and analyze free energy simulations. In this Application Note, we describe FEPrepare, a novel web-based tool, which automates the setup procedure for relative binding FEP calculations for the dual-topology scheme of NAMD, one of the major MD engines, using OPLS-AA force field topology and parameter files. FEPrepare provides the user with all necessary files needed to run a FEP/MD simulation with NAMD. FEPrepare can be accessed and used at https://feprepare.vi-seem.eu/.
    DOI:  https://doi.org/10.1021/acs.jcim.1c00215