Metabolomics. 2025 Dec 01. 22(1): 7
INTRODUCTION: Metabolite identification remains a bottleneck in untargeted liquid chromatography-tandem mass spectrometry (LC-MS/MS) metabolomics studies, particularly when the underlying metabolite is absent from tandem mass spectrometry (MS/MS) databases.
OBJECTIVE: A new approach, formula subset analysis (FSA), was developed to effectively prescreen and rank the chemical formula candidates for an MS/MS spectrum.
METHODS: This approach first computes mother-daughter relationships (MDRs) among the possible formulas of the fragments and the precursor under a given mass tolerance, and then determines the characteristic fragments (CFs), which present only one MDR with the precursor and the other fragments. Subsequently, the precursor formula candidates are ranked by scores derived from the number of MDRs.
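The general idea of the steps above can be illustrated with a minimal Python sketch. It is not the published FSA implementation: the element set (C, H, N, O only), the atom-count limits in MAX_COUNT, the use of neutral monoisotopic masses without adduct correction, the 0.005 Da tolerance, and the scoring rule (one point per fragment formula candidate that is a subset of the precursor candidate) are all assumptions made for illustration, and the characteristic-fragment step is omitted. All function names are hypothetical.

```python
from itertools import product

# Monoisotopic masses (Da) of the most abundant isotopes.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
ELEMENTS = ("C", "H", "N", "O")
# Assumed upper bounds on atom counts, chosen only to keep the search small.
MAX_COUNT = {"C": 10, "H": 20, "N": 4, "O": 6}


def formula_mass(formula):
    """Mass of a formula given as a tuple of counts ordered like ELEMENTS."""
    return sum(MASS[e] * n for e, n in zip(ELEMENTS, formula))


def candidates(target_mass, tol=0.005):
    """Brute-force all formulas whose mass lies within tol of target_mass."""
    ranges = [range(MAX_COUNT[e] + 1) for e in ELEMENTS]
    return [f for f in product(*ranges)
            if abs(formula_mass(f) - target_mass) <= tol]


def is_mdr(mother, daughter):
    """True if daughter is a proper sub-formula of mother (element-wise <=)."""
    return daughter != mother and all(d <= m for d, m in zip(daughter, mother))


def rank_precursors(precursor_mass, fragment_masses, tol=0.005):
    """Rank precursor candidates by the number of MDRs with fragment candidates.

    Assumed scoring: each fragment formula candidate that is a sub-formula
    of the precursor candidate contributes one point.
    """
    frag_cands = [candidates(m, tol) for m in fragment_masses]
    scored = []
    for p in candidates(precursor_mass, tol):
        score = sum(is_mdr(p, f) for cands in frag_cands for f in cands)
        scored.append((score, p))
    return sorted(scored, key=lambda item: -item[0])
```

As a usage example, calling rank_precursors(89.04768, [45.05785, 71.03711]) with the neutral mass of alanine (C3H7NO2) and two hypothetical fragment masses corresponding to C2H7N and C3H5NO places (3, 7, 1, 2), that is C3H7NO2, first, ahead of other formulas that match the precursor mass but explain fewer fragments.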
RESULTS: A numerical study using eight large datasets totaling 30,690 MS/MS spectra from 6,792 metabolites consisting of C, H, O, N, S, and P showed that FSA ranked the correct chemical formula as the top-1 candidate for a metabolite in 85.28% of the cases and among the top-5 candidates in 97.35% of the cases. The average processing time for each spectrum was 0.024 s. Moreover, FSA does not require training data, does not rely on MS/MS databases, can be applied to a wide mass range, and can be quickly expanded with more chemical elements and formulas to identify different chemical species.
CONCLUSIONS: FSA does not yet use structural information, and therefore its accuracy may not be competitive with some of the state-of-the-art identification tools. However, its advantages in speed, expandability, and applicability make it suitable for prescreening candidates in untargeted LC-MS/MS metabolomics studies.
Keywords: Formula ranking; LC-MS/MS; Metabolomics; Mother-daughter relationship