bims-lances 2024-10-27 papers

Nat Methods. 2024 Oct 21.

DiffModeler: large macromolecular structure modeling for cryo-EM maps using a diffusion model.

Xiao Wang, Han Zhu, Genki Terashi, Manav Taluja, Daisuke Kihara.

Cryogenic electron microscopy (cryo-EM) has now been widely used for determining multichain protein complexes. However, modeling a large complex structure, such as those with more than ten chains, is challenging, particularly when the map resolution decreases. Here we present DiffModeler, a fully automated method for modeling large protein complex structures. DiffModeler employs a diffusion model for backbone tracing and integrates AlphaFold2-predicted single-chain structures for structure fitting. DiffModeler showed an average template modeling score of 0.88 and 0.91 for two datasets of cryo-EM maps of 0-5 Å resolution and 0.92 for intermediate resolution maps (5-10 Å), substantially outperforming existing methodologies. Further benchmarking at low resolutions (10-20 Å) confirms its versatility, demonstrating plausible performance.

DOI: https://doi.org/10.1038/s41592-024-02479-0
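
The template modeling score (TM-score) quoted in the abstract above can be illustrated with a minimal sketch. It uses the standard TM-score formula evaluated for one fixed superposition; the full score is the maximum over superpositions, which is omitted here. The function name `tm_score`, its arguments, and the lower clamp on `d0` are illustrative assumptions and are not taken from the DiffModeler code.

```python
import numpy as np

def tm_score(distances, target_length):
    """TM-score for ONE fixed superposition (no search over superpositions).

    distances: per-residue distances in angstroms between aligned residue pairs.
    target_length: number of residues in the reference (target) structure.
    """
    distances = np.asarray(distances, dtype=float)
    # Length-dependent distance scale from the standard TM-score definition.
    d0 = 1.24 * np.cbrt(target_length - 15) - 1.8
    # Assumption: clamp d0 from below so very short chains stay well defined.
    d0 = max(d0, 0.5)
    # Unaligned residues contribute nothing, so the sum is divided by the
    # target length rather than by the number of aligned pairs.
    return float(np.sum(1.0 / (1.0 + (distances / d0) ** 2)) / target_length)
```

A perfect match over the whole target gives 1.0, and aligning only half of the target perfectly gives 0.5, which is why scores near 0.9 indicate models that cover and match most of the reference structure.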

Bioinformatics. 2024 Oct 21. pii: btae627. [Epub ahead of print]

Weighted families of contact maps to characterize conformational ensembles of (highly-)flexible proteins.

Javier González-Delgado, Pau Bernadó, Pierre Neuvial, Juan Cortés.

MOTIVATION: Characterizing the structure of flexible proteins, particularly within the realm of intrinsic disorder, presents a formidable challenge due to their high conformational variability. Currently, their structural representation relies on (possibly large) conformational ensembles derived from a combination of experimental and computational methods. The detailed structural analysis of these ensembles is a difficult task, for which existing tools have limited effectiveness.
RESULTS: This study proposes an innovative extension of the concept of contact maps to the ensemble framework, incorporating the intrinsic probabilistic nature of disordered proteins. Within this framework, a conformational ensemble is characterized through a weighted family of contact maps. To achieve this, conformations are first described using a refined definition of contact that appropriately accounts for the geometry of the inter-residue interactions and the sequence context. Representative structural features of the ensemble naturally emerge from the subsequent clustering of the resulting contact-based descriptors. Importantly, transiently-populated structural features are readily identified within large ensembles. The performance of the method is illustrated by several use cases and compared with other existing approaches, highlighting its superiority in capturing relevant structural features of highly flexible proteins.
AVAILABILITY AND IMPLEMENTATION: An open-source implementation of the method is provided together with an easy-to-use Jupyter notebook, available at https://gitlab.laas.fr/moma/WARIO.
SUPPLEMENTARY INFORMATION: Implementation details and additional results are provided in (ADD LINK TO SUPP. INFO. FILE).

Keywords: Clustering; Conformational ensembles; Contact maps; Intrinsically disordered regions; Protein flexibility

DOI: https://doi.org/10.1093/bioinformatics/btae627
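
The idea of describing an ensemble by a weighted family of contact maps can be sketched roughly as follows. This is not the WARIO implementation: it swaps the refined contact definition of the abstract for a plain distance cutoff, and it assumes cluster labels are already available from some clustering step. The names `contact_maps` and `weighted_family`, the 8 Å cutoff, and the minimum sequence separation are all hypothetical choices for illustration.

```python
import numpy as np

def contact_maps(coords, cutoff=8.0, min_seq_sep=3):
    """Binary contact map for each conformation.

    coords: array of shape (n_conf, n_res, 3), one point per residue.
    Assumption: a contact is a pair closer than `cutoff` angstroms and at
    least `min_seq_sep` positions apart in sequence.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)            # (n_conf, n_res, n_res)
    idx = np.arange(coords.shape[1])
    far_in_sequence = np.abs(idx[:, None] - idx[None, :]) >= min_seq_sep
    return (dist < cutoff) & far_in_sequence

def weighted_family(maps, labels):
    """One (weight, mean contact map) pair per cluster label.

    The weight is the fraction of conformations in the cluster; the mean map
    holds per-pair contact frequencies within that cluster.
    """
    labels = np.asarray(labels)
    family = []
    for label in np.unique(labels):
        members = maps[labels == label]
        family.append((len(members) / len(maps), members.mean(axis=0)))
    return family
```

In this simplified form, a rarely visited compact state shows up as a low-weight entry whose map contains contacts, rather than being averaged away in a single ensemble-wide contact frequency map.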

Biol Psychol. 2024 Oct 19. pii: S0301-0511(24)00151-0. [Epub ahead of print] 193: 108891

Supervised structure learning.

Karl J Friston, Lancelot Da Costa, Alexander Tschantz, Alex Kiefer, Tommaso Salvatori, Victorita Neacsu, Magnus Koudahl, Conor Heins, Noor Sajid, Dimitrije Markovic, Thomas Parr, Tim Verbelen, Christopher L Buckley.

This paper concerns structure learning or discovery of discrete generative models. It focuses on Bayesian model selection and the assimilation of training data or content, with a special emphasis on the order in which data are ingested. A key move, in the ensuing schemes, is to place priors on the selection of models, based upon expected free energy. In this setting, expected free energy reduces to a constrained mutual information, where the constraints inherit from priors over outcomes (i.e., preferred outcomes). The resulting scheme is first used to perform image classification on the MNIST dataset to illustrate the basic idea, and then tested on a more challenging problem of discovering models with dynamics, using a simple sprite-based visual disentanglement paradigm and the Tower of Hanoi (cf., blocks world) problem. In these examples, generative models are constructed autodidactically to recover (i.e., disentangle) the factorial structure of latent states, and their characteristic paths or dynamics.

Keywords: Active inference; Active learning; Bayesian model selection; Disentanglement; Expected free energy; Planning as inference; Structure learning

DOI: https://doi.org/10.1016/j.biopsycho.2024.108891
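
The statement that expected free energy reduces to a constrained mutual information can be made concrete with a small discrete sketch. It uses the usual active-inference decomposition, in which the negative expected free energy equals the mutual information between latent states and outcomes plus the expected log prior over outcomes. The paper's exact formulation may differ, and the function name, argument names, and the small `eps` constant are assumptions made for illustration.

```python
import numpy as np

def expected_free_energy(likelihood, state_prior, outcome_prior, eps=1e-16):
    """Return (G, mutual_information) for a discrete model.

    likelihood[o, s] = P(o | s), each column summing to 1.
    state_prior[s]   = Q(s), predictive distribution over latent states.
    outcome_prior[o] = P(o), prior preferences over outcomes.
    """
    A = np.asarray(likelihood, dtype=float)
    qs = np.asarray(state_prior, dtype=float)
    C = np.asarray(outcome_prior, dtype=float)

    qo = A @ qs                      # predicted outcome distribution Q(o)
    joint = A * qs                   # joint[o, s] = P(o | s) Q(s)
    # Mutual information between states and outcomes (in nats).
    mutual_info = np.sum(joint * (np.log(A + eps) - np.log(qo + eps)[:, None]))
    # The constraint: expected log prior preference over outcomes.
    expected_log_pref = qo @ np.log(C + eps)
    G = -(mutual_info + expected_log_pref)
    return float(G), float(mutual_info)
```

Under this reading, a model whose likelihood mapping is more informative about its latent states has lower expected free energy for the same preferences, so a prior based on it favours such models during selection.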