bims-lances Biomed News
on Landscapes from Cryo-EM and Simulations
Issue of 2024–04–07
seven papers selected by
James M. Krieger, National Centre for Biotechnology



  1. bioRxiv. 2024 Mar 19. pii: 2024.03.18.585544. [Epub ahead of print]
      Molecules are essential building blocks of life and their different conformations (i.e., shapes) crucially determine the functional role that they play in living organisms. Cryogenic Electron Microscopy (cryo-EM) allows for acquisition of large image datasets of individual molecules. Recent advances in computational cryo-EM have made it possible to learn latent variable models of conformation landscapes. However, interpreting these latent spaces remains a challenge as their individual dimensions are often arbitrary. The key message of our work is that this interpretation challenge can be viewed as an Independent Component Analysis (ICA) problem where we seek models that have the property of identifiability. That means, they have an essentially unique solution, representing a conformational latent space that separates the different degrees of freedom a molecule is equipped with in nature. Thus, we aim to advance the computational field of cryo-EM beyond visualizations as we connect it with the theoretical framework of (nonlinear) ICA and discuss the need for identifiable models, improved metrics, and benchmarks. Moving forward, we propose future directions for enhancing the disentanglement of latent spaces in cryo-EM, refining evaluation metrics and exploring techniques that leverage physics-based decoders of biomolecular systems. Moreover, we discuss how future technological developments in time-resolved single particle imaging may enable the application of nonlinear ICA models that can discover the true conformation changes of molecules in nature. The pursuit of interpretable conformational latent spaces will empower researchers to unravel complex biological processes and facilitate targeted interventions. This has significant implications for drug discovery and structural biology more broadly. More generally, latent variable models are deployed widely across many scientific disciplines. Thus, the argument we present in this work has much broader applications in AI for science if we want to move from impressive nonlinear neural network models to mathematically grounded methods that can help us learn something new about nature.
    DOI:  https://doi.org/10.1101/2024.03.18.585544
  2. bioRxiv. 2024 Mar 14. pii: 2024.03.13.584744. [Epub ahead of print]
      During formation of the transcription-competent open complex (RPo) by bacterial RNA polymerases (RNAP), transient intermediates pile up before overcoming a rate-limiting step. Structural descriptions of these interconversions in real time are unavailable. To address this gap, time-resolved cryo-electron microscopy (cryo-EM) was used to capture four intermediates populated 120 or 500 milliseconds (ms) after mixing Escherichia coli σ70-RNAP and the λPR promoter. Cryo-EM snapshots revealed the upstream edge of the transcription bubble unpairs rapidly, followed by stepwise insertion of two conserved nontemplate strand (nt-strand) bases into RNAP pockets. As nt-strand "read-out" extends, the RNAP clamp closes, expelling an inhibitory σ70 domain from the active-site cleft. The template strand is fully unpaired by 120 ms but remains dynamic, indicating yet unknown conformational changes load it in subsequent steps. Because these events likely describe DNA opening at many bacterial promoters, this study provides needed insights into how DNA sequence regulates steps of RPo formation.
    DOI:  https://doi.org/10.1101/2024.03.13.584744
  3. Brief Bioinform. 2024 Mar 27. pii: bbae137. [Epub ahead of print]25(3):
      The dynamics and variability of protein conformations are directly linked to their functions. Many comparative studies of X-ray protein structures have been conducted to elucidate the relevant conformational changes, dynamics and heterogeneity. The rapid increase in the number of experimentally determined structures has made comparison an effective tool for investigating protein structures. For example, it is now possible to compare structural ensembles formed by enzyme species, variants or the type of ligands bound to them. In this study, the author developed a multilevel model for estimating two covariance matrices that represent inter- and intra-ensemble variability in the Cartesian coordinate space. Principal component analysis using the two estimated covariance matrices identified the inter-/intra-enzyme variabilities, which seemed to be important for the enzyme functions, with the illustrative examples of cytochrome P450 family 2 enzymes and class A β-lactamases. In P450, in which each enzyme has its own active site of a distinct size, an active-site motion shared universally between the enzymes was captured as the first principal mode of the intra-enzyme covariance matrix. In this case, the method was useful for understanding the conformational variability after adjusting for the differences between enzyme sizes. The developed method is advantageous in small ensemble-size problems and hence promising for use in comparative studies on experimentally determined structures where ensemble sizes are smaller than those generated, for example, by molecular dynamics simulations.
    Keywords:  EM algorithm; covariance matrix; principal component analysis; random effects model; structural superposition
    DOI:  https://doi.org/10.1093/bib/bbae137
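      The split into inter- and intra-ensemble covariance described in this abstract can be illustrated with a minimal sketch. It is not the author's multilevel (EM-based) estimator: it uses plain moment estimates (covariance of ensemble means, and pooled within-ensemble covariance) on synthetic data, and it assumes the structures are already superposed and flattened to coordinate vectors. The function names `inter_intra_covariance` and `principal_modes`, and all numbers in the toy data, are hypothetical.

```python
import numpy as np


def inter_intra_covariance(ensembles):
    """Moment-based inter- and intra-ensemble covariance matrices.

    ensembles: list of arrays, each (n_structures, n_coords), assumed to be
    already superposed onto a common reference frame.
    """
    means = np.array([e.mean(axis=0) for e in ensembles])
    # Inter-ensemble: scatter of the ensemble means around their grand mean.
    dm = means - means.mean(axis=0)
    inter = dm.T @ dm / (len(ensembles) - 1)
    # Intra-ensemble: pooled covariance of deviations from each ensemble mean.
    resid = np.vstack([e - e.mean(axis=0) for e in ensembles])
    intra = resid.T @ resid / (resid.shape[0] - len(ensembles))
    return inter, intra


def principal_modes(cov, n_modes=1):
    """Return the largest eigenvalues and eigenvectors of a covariance matrix."""
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:n_modes]
    return vals[order], vecs[:, order]


# Synthetic toy data (hypothetical): four "enzymes" that differ along v
# but share one internal motion along u.
rng = np.random.default_rng(0)
n_coords = 6
u = np.eye(n_coords)[0]  # shared intra-ensemble motion
v = np.eye(n_coords)[1]  # direction separating the ensembles
offsets = [-3.0, -1.0, 1.0, 3.0]
ensembles = [
    off * v
    + 2.0 * rng.normal(size=(30, 1)) * u
    + 0.1 * rng.normal(size=(30, n_coords))
    for off in offsets
]

inter, intra = inter_intra_covariance(ensembles)
inter_vals, inter_vecs = principal_modes(inter)
intra_vals, intra_vecs = principal_modes(intra)
print("overlap of first intra mode with u:", abs(intra_vecs[:, 0] @ u))
print("overlap of first inter mode with v:", abs(inter_vecs[:, 0] @ v))
```

      In this toy setting the first intra-ensemble mode recovers the shared motion even though the ensembles sit at different mean positions, which is the kind of separation the abstract describes for the P450 active site. With very small ensembles the moment estimates become noisy, which is the regime the multilevel model in the paper targets.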
  4. J Phys Chem Lett. 2024 Apr 03. 3938-3945
      Biased enhanced sampling methods that utilize collective variables (CVs) are powerful tools for sampling conformational ensembles. Due to their large intrinsic dimensions, efficiently generating conformational ensembles for complex systems requires enhanced sampling on high-dimensional free energy surfaces. While temperature-accelerated molecular dynamics (TAMD) can trivially adopt many CVs in a simulation, unbiasing the simulation to generate unbiased conformational ensembles requires accurate modeling of a high-dimensional CV probability distribution, which is challenging for traditional density estimation techniques. Here we propose an unbiasing method based on the score-based diffusion model, a deep generative learning method that excels in density estimation across complex data landscapes. We demonstrate that this unbiasing approach, tested on multiple TAMD simulations, significantly outperforms traditional unbiasing methods and can generate accurate unbiased conformational ensembles. With the proposed approach, TAMD can adopt CVs that focus on improving sampling efficiency and the proposed unbiasing method enables accurate evaluation of ensemble averages of important chemical features.
    DOI:  https://doi.org/10.1021/acs.jpclett.3c03515
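      The role of density estimation in unbiasing TAMD can be sketched on a one-dimensional toy problem. The sketch rests on two stated assumptions: first, that in the adiabatic limit the CVs sample p(z) proportional to exp(-F(z)/kT_cv), so that the physical distribution is proportional to p(z) raised to the power T_cv/T; second, that a simple histogram stands in for the density estimator. The histogram is a deliberate substitution for the score-based diffusion model of the paper and would not scale to many CVs. The function name `tamd_unbiasing_weights` and the harmonic free energy are hypothetical.

```python
import numpy as np


def tamd_unbiasing_weights(cv_samples, temp_physical, temp_cv, bins=60):
    """Per-sample reweighting factors for a 1D CV sampled at temp_cv.

    Assumes p_biased(z) ~ exp(-F(z) / (k * temp_cv)), so the target
    exp(-F(z) / (k * temp_physical)) equals p_biased(z) ** (temp_cv / temp_physical).
    """
    # Histogram estimate of the biased CV density (stand-in for a learned model).
    density, edges = np.histogram(cv_samples, bins=bins, density=True)
    idx = np.clip(np.digitize(cv_samples, edges) - 1, 0, bins - 1)
    p_biased = density[idx]
    # weight = target / biased = p_biased ** (temp_cv / temp_physical - 1)
    weights = p_biased ** (temp_cv / temp_physical - 1.0)
    return weights / weights.sum()


# Toy check: harmonic free energy F(z) = 0.5 * k * z**2 in reduced units.
rng = np.random.default_rng(1)
k_spring, kT, kT_cv = 1.0, 1.0, 2.0
z = rng.normal(0.0, np.sqrt(kT_cv / k_spring), size=200_000)  # biased samples

w = tamd_unbiasing_weights(z, temp_physical=kT, temp_cv=kT_cv)
var_biased = z.var()
var_unbiased = np.sum(w * z**2) - np.sum(w * z) ** 2
print("biased variance  :", var_biased)    # close to kT_cv / k_spring
print("unbiased variance:", var_unbiased)  # close to kT / k_spring
```

      The weights depend only on the estimated density, so the accuracy of every reweighted ensemble average is limited by the density model. That is why a histogram or kernel estimate breaks down once many CVs are biased at the same time, and why the abstract turns to a generative model for this step.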
  5. J Chem Inf Model. 2024 Apr 02.
      Understanding the conformational dynamics of proteins, such as the inward-facing (IF) and outward-facing (OF) transition observed in transporters, is vital for elucidating their functional mechanisms. Despite significant advances in protein structure prediction (PSP) over the past three decades, most efforts have been focused on single-state prediction, leaving multistate or alternative conformation prediction (ACP) relatively unexplored. This discrepancy has led to the development of highly accurate PSP methods such as AlphaFold, yet their capabilities for ACP remain limited. To investigate the performance of current PSP methods in ACP, we curated a data set, named IOMemP, consisting of 32 experimentally determined high-resolution IF and OF structures of 16 membrane proteins with substantial conformational changes. We benchmarked 12 representative PSP methods, along with two recent multistate methods based on AlphaFold, against this data set. Our findings reveal a remarkably consistent preference for specific states across various PSP methods. We elucidated how coevolution information in MSAs influences state preference. Moreover, we showed that AlphaFold, when excluding coevolution information, estimated similar energies between the experimental IF and OF conformations, indicating that the energy model learned by AlphaFold is not biased toward any particular state. Our IOMemP data set and benchmark results are anticipated to advance the development of robust ACP methods.
    DOI:  https://doi.org/10.1021/acs.jcim.3c01936
  6. Biophys J. 2024 Apr 02. pii: S0006-3495(24)00247-9. [Epub ahead of print]
      The conformational dynamics of RNA play a crucial role in a variety of cellular functions, from acting as regulators of gene expression to serving as molecular scaffolds and sensors. The liquid-liquid phase separation of RNAs and the formation of stress granules partly rely on RNA's conformational plasticity and its ability to engage in multivalent interactions. Recent experiments with homopolymeric and low-complexity RNAs have revealed significant differences in phase separation due to differences in the base chemistry of RNA units. We hypothesize that differences in RNA phase-transition dynamics stem from differences in the conformational dynamics of single RNA chains. To test this hypothesis, we have employed atomistic simulations and deep dimensionality reduction techniques to map temperature-dependent conformational free energy landscapes for homopolymeric RNA. The temperature-dependent conformational energy landscapes of RNAs reveal a plethora of metastable states, the populations of which are highly base dependent. Through detailed analysis of base, phosphate and sugar interactions, we show that experimentally observed temperature-driven shifts in metastable state populations align with experimental phase diagrams for homopolymer RNAs. Specifically, we find that the thermodynamics of folding of homopolymeric RNA follows the Poly(G) > Poly(A) > Poly(C) > Poly(U) order of stability, which mirrors the propensities for homotypic RNA phase separation. Thus, the work establishes a microscopic framework to reason about base-specific RNA propensity for phase separation by analyzing single-chain conformational energy landscapes.
    DOI:  https://doi.org/10.1016/j.bpj.2024.04.003
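      Mapping a conformational free energy landscape from sampled populations, as mentioned in this abstract, usually comes down to the Boltzmann relation F = -kT ln p on a low-dimensional embedding. The sketch below assumes a two-dimensional embedding is already available (the deep dimensionality reduction step is not reproduced) and uses synthetic two-state data. The function name `free_energy_surface`, the bin count and the 80/20 populations are hypothetical.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def free_energy_surface(embedding, temperature, bins=40):
    """Free energy F = -kT ln p on a 2D histogram of embedded conformations.

    embedding: array of shape (n_frames, 2). Empty bins get F = +inf.
    Returns F (shifted so the global minimum is zero) and the bin edges.
    """
    hist, xedges, yedges = np.histogram2d(embedding[:, 0], embedding[:, 1], bins=bins)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        free_energy = -K_B * temperature * np.log(prob)
    free_energy -= free_energy[np.isfinite(free_energy)].min()
    return free_energy, xedges, yedges


# Synthetic two-state data: 80% of frames in a basin at x = -2, 20% at x = +2.
rng = np.random.default_rng(2)
n_frames = 200_000
in_major = rng.random(n_frames) < 0.8
centers = np.where(in_major[:, None], [-2.0, 0.0], [2.0, 0.0])
embedding = centers + 0.3 * rng.normal(size=(n_frames, 2))

temperature = 300.0
fes, xedges, yedges = free_energy_surface(embedding, temperature)

# Compare the depth of the two basins (first axis of fes is the x direction).
x_centers = 0.5 * (xedges[:-1] + xedges[1:])
f_major = fes[x_centers < 0].min()
f_minor = fes[x_centers >= 0].min()
print("basin free energy difference (kcal/mol):", f_minor - f_major)
print("expected from populations, kT ln 4     :", K_B * temperature * np.log(4.0))
```

      Repeating the calculation at several temperatures gives the temperature dependence of the metastable state populations. Comparing basin depths by their lowest bin is only meaningful when the basins have similar widths, as they do in this synthetic example.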
  7. Phys Rev Lett. 2024 Mar 22. 132(12): 128001
      The computer simulation of many molecular processes is complicated by long timescales caused by rare transitions between long-lived states. Here, we propose a new approach to simulate such rare events, which combines transition path sampling with enhanced exploration of configuration space. The method relies on exchange moves between configuration and trajectory space, carried out based on a generalized ensemble. This scheme substantially enhances the efficiency of the transition path sampling simulations, particularly for systems with multiple transition channels, and yields information on thermodynamics, kinetics and reaction coordinates of molecular processes without distorting their dynamics. The method is illustrated using the isomerization of proline in the KPTP tetrapeptide.
    DOI:  https://doi.org/10.1103/PhysRevLett.132.128001