Brief Bioinform. 2026 May 04. pii: bbag247. [Epub ahead of print] 27(3):
Liquid chromatography-mass spectrometry (LC-MS) and nuclear magnetic resonance (NMR) spectroscopy are complementary analytical techniques widely used in proteomics, metabolomics, and structural biology. Both generate high-dimensional, noisy spectra in which overlapping peaks complicate interpretation. LC-MS relies on retention time (RT) separation before mass analysis, while multidimensional NMR spreads information across chemical-shift axes to reduce congestion. However, comparative or replicate experiments often introduce RT shifts in LC-MS or frequency shifts in NMR, which hinder accurate matching of corresponding features. In some experiments, such as variable-temperature NMR, the shifts are induced intentionally, and tracking the frequencies provides important information. In either case, robust and scalable alignment across runs is critical for reliable compound identification, quantification, and structural analysis. We propose a truncated Wasserstein distance-based algorithm for aligning LC-MS and NMR spectra. By constraining the maximum transport distance and formulating alignment as a minimum-cost flow problem solved via the Network Simplex algorithm, our method accelerates computation, suppresses spurious matches, and improves robustness to noise. On benchmark LC-MS datasets, it achieved 0.97 precision, 0.96 recall, and a 0.6-s runtime, outperforming the OpenMS and DeepRTAlign tools. For NMR data, the algorithm proved effective in 2D, 4D, and even 7D analyses. The algorithm is implemented in wnetalign, with supporting modules wnet and pylmcf, available on PyPI and GitHub under permissive licenses: https://github.com/michalsta/pylmcf, https://github.com/michalsta/wnet, https://github.com/michalsta/wnetalign.
Keywords: Wasserstein distance; liquid chromatography–mass spectrometry; network simplex algorithm; nuclear magnetic resonance spectroscopy; optimal transport; spectra alignment
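
The idea of alignment as a distance-truncated minimum-cost flow can be illustrated with a small sketch. It does not use the wnetalign, wnet, or pylmcf APIs, and it is not the authors' implementation. Several things here are assumptions made for illustration: the function name align_truncated, the "trash" node that absorbs unmatched intensity, the per-unit penalty (defaulting to max_dist), one-dimensional peak positions, and integer intensities. For self-containment, the flow is solved with a plain successive-shortest-paths routine (Bellman-Ford) rather than the Network Simplex algorithm named in the abstract; both return a minimum-cost flow, but the sketch makes no claim about speed.

```python
import math


def align_truncated(spec_a, spec_b, max_dist, penalty=None):
    """Match peaks of spec_a to spec_b; each spectrum is a list of
    (position, integer_intensity). Returns ({(i, j): mass}, total_cost)."""
    if penalty is None:
        penalty = max_dist  # assumed cost per unit of unmatched intensity
    n_a, n_b = len(spec_a), len(spec_b)
    src, a0, b0 = 0, 1, 1 + n_a          # node ids: source, A peaks, B peaks
    trash, sink = b0 + n_b, b0 + n_b + 1
    n_nodes = sink + 1
    edges = []                            # [target, residual capacity, cost]
    adj = [[] for _ in range(n_nodes)]

    def add_edge(u, v, cap, cost):
        # Forward edge at even index e, its reverse edge at e ^ 1.
        adj[u].append(len(edges)); edges.append([v, cap, cost])
        adj[v].append(len(edges)); edges.append([u, 0, -cost])

    total = sum(mass for _, mass in spec_a)
    pair_edge = {}
    for i, (x, mass) in enumerate(spec_a):
        add_edge(src, a0 + i, mass, 0.0)
        add_edge(a0 + i, trash, mass, penalty)
        for j, (y, _) in enumerate(spec_b):
            if abs(x - y) <= max_dist:    # truncation: no long-range edges
                pair_edge[(i, j)] = len(edges)
                add_edge(a0 + i, b0 + j, mass, abs(x - y))
    for j, (_, mass) in enumerate(spec_b):
        add_edge(b0 + j, sink, mass, 0.0)
    add_edge(trash, sink, total, 0.0)

    flow, total_cost = 0, 0.0
    while flow < total:
        # Bellman-Ford shortest path in the residual graph.
        dist = [math.inf] * n_nodes
        prev = [-1] * n_nodes
        dist[src] = 0.0
        for _ in range(n_nodes - 1):
            changed = False
            for u in range(n_nodes):
                if dist[u] == math.inf:
                    continue
                for e in adj[u]:
                    v, cap, cost = edges[e]
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], prev[v], changed = dist[u] + cost, e, True
            if not changed:
                break
        # Find the bottleneck along the path, then push flow through it.
        push, v = total - flow, sink
        while v != src:
            push = min(push, edges[prev[v]][1])
            v = edges[prev[v] ^ 1][0]
        v = sink
        while v != src:
            edges[prev[v]][1] -= push
            edges[prev[v] ^ 1][1] += push
            v = edges[prev[v] ^ 1][0]
        flow += push
        total_cost += push * dist[sink]

    # Flow on a forward edge equals the residual capacity of its reverse edge.
    matches = {ij: edges[e ^ 1][1] for ij, e in pair_edge.items()
               if edges[e ^ 1][1] > 0}
    return matches, total_cost


# Toy retention times with integer intensities (made-up values).
run_a = [(10.0, 5), (20.0, 3), (50.0, 2)]
run_b = [(10.5, 5), (21.0, 3), (80.0, 2)]
print(align_truncated(run_a, run_b, max_dist=2.0))
```

With these toy inputs, the first two peaks of each run are paired, while the peak at 50.0 has no partner within max_dist and its intensity is routed to the trash node, giving a total cost of 9.5. Omitting edges longer than max_dist keeps the graph sparse and prevents distant, spurious pairings, which is the intuition behind the truncation described in the abstract.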