J Proteomics. 2025 Apr 21. pii: S1874-3919(25)00067-3. [Epub ahead of print] 105440
Intensity-based absolute quantification (iBAQ) is essential in proteomics as it allows for the assessment of a protein's absolute abundance in various samples or conditions. However, the computation of these values for increasingly large-scale and high-throughput experiments, such as those using DIA, TMT, or LFQ workflows, poses significant challenges in scalability and reproducibility. Here, we present ibaqpy (https://github.com/bigbio/ibaqpy), a Python package designed to compute iBAQ values efficiently for experiments of any scale. Ibaqpy leverages the Sample and Data Relationship Format (SDRF) metadata standard to incorporate experimental metadata into the quantification workflow. This allows for automatic normalization and batch correction while accounting for key aspects of the experimental design, such as technical and biological replicates, fractionation strategies, and sample conditions. Designed for large-scale proteomics datasets, ibaqpy can also recompute iBAQ values for existing experiments when an SDRF is available. We showcased ibaqpy's capabilities by reanalyzing 17 public proteomics datasets from ProteomeXchange, covering HeLa cell lines with 4921 samples and 5766 MS runs, quantifying a total of 11,014 proteins. In our reanalysis, ibaqpy is a key component in automating reproducible quantification, reducing manual effort and making quantitative proteomics more accessible while supporting FAIR principles for data reuse.
SIGNIFICANCE: Proteomics studies often rely on intensity-based absolute quantification (iBAQ) to assess protein abundance across various biological conditions. Despite its widespread use, computing iBAQ values at scale remains challenging due to the increasing complexity and volume of proteomics experiments. Existing tools frequently lack metadata integration, limiting their ability to handle experimental design intricacies such as replicates, fractions, and batch effects. Our work introduces ibaqpy, a scalable Python package that leverages the Sample and Data Relationship Format (SDRF) to compute iBAQ values efficiently while incorporating critical experimental metadata. By enabling automated normalization and batch correction, ibaqpy ensures reproducible and comparable quantification across large-scale datasets. We validated the utility of ibaqpy through the reanalysis of 17 public HeLa datasets, comprising over 200 million peptide features and quantifying approximately 11,000 proteins across thousands of samples. This comprehensive reanalysis highlights the robustness and scalability of ibaqpy, making it an essential tool for researchers conducting large-scale proteomics experiments. Moreover, by promoting FAIR principles for data reuse and interoperability, ibaqpy offers a transformative approach to baseline protein quantification, supporting reproducible research and data integration within the proteomics community.
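For readers unfamiliar with the quantity itself, an iBAQ value is commonly defined as the sum of a protein's observed peptide intensities divided by the number of its theoretically observable peptides. The following is a minimal sketch of that definition only. It is not ibaqpy's API: the function names `tryptic_peptides` and `ibaq` are hypothetical, and the trypsin cleavage rule (after K or R, not before P) and the 6 to 30 residue length window are assumptions chosen for illustration.

```python
import re


def tryptic_peptides(sequence, min_len=6, max_len=30):
    """In-silico digest (hypothetical helper, assumed trypsin rule).

    Cleaves after K or R unless the next residue is P, then keeps only
    peptides whose length falls inside the assumed observable window.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if min_len <= len(p) <= max_len]


def ibaq(peptide_intensities, sequence):
    """Summed peptide intensity divided by the theoretical peptide count.

    Returns None when the protein has no theoretically observable peptide,
    since the ratio is undefined in that case.
    """
    n_theoretical = len(tryptic_peptides(sequence))
    if n_theoretical == 0:
        return None
    return sum(peptide_intensities) / n_theoretical


# Toy protein: digests to MAAAAAK, GGGGGGRPLLLLLLK and TTK; TTK is too short
# to count, so the denominator is 2.
toy_sequence = "MAAAAAKGGGGGGRPLLLLLLKTTK"
toy_value = ibaq([100.0, 300.0], toy_sequence)
```

The normalization, batch correction, and SDRF-driven handling of replicates and fractions described above sit on top of this basic ratio and are not reproduced in the sketch.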
Keywords: Big data; Bioinformatics; Data integration; Proteomics; Quantification