J Proteome Res. 2020 Aug 19.
Small open reading frame encoded proteins (SEPs) have gained increasing interest in recent years because of their broad range of important functions in both prokaryotes and eukaryotes. In bacteria, SEPs have been associated with signalling, virulence and the regulation of enzyme activities. Nonetheless, the number of SEPs detected in large-scale proteome studies is often low, as classical methods are biased towards the identification of larger proteins. Here, we present a workflow that enhances the identification of small proteins compared to traditional protocols. To this end, the steps of small protein enrichment, proteolytic digestion and database search were reviewed and adjusted to the specific requirements of SEPs. Enrichment using small-pore-sized solid-phase material increased the number of identified SEPs by a factor of two, and the use of proteases other than trypsin reduced spectral counts for larger proteins. Applying the optimised protocol to the Gram-positive model organism Bacillus subtilis allowed the detection of 210 already annotated proteins of up to 100 amino acids in length, including 16 proteins below 51 amino acids. Moreover, 12% of all identified proteins were up to 100 amino acids long, which is a significantly larger fraction than reported in studies using traditional proteomics workflows. Finally, the application of an integrated proteogenomics search database and extensive subsequent validation resulted in the confident identification of three novel, not yet annotated SEPs, which are 21, 26 and 42 amino acids long, respectively.
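
The effect of protease choice on a small protein can be illustrated with a minimal in silico digest. The sketch below is not the workflow's own code. It assumes the common textbook cleavage rules (trypsin cuts after K or R unless followed by P; Lys-C cuts after K), uses Lys-C only as an assumed stand-in for the unnamed "alternative proteases", and takes a hypothetical 7 to 30 residue window as a rough proxy for peptides suited to mass spectrometric identification. The sequence `TOY_SEP` is invented for illustration and is not a B. subtilis protein.

```python
import re

# Zero-width regular expressions marking where each protease cuts.
# Only trypsin is named in the abstract; Lys-C is an assumption made here
# purely to have a second protease to compare against.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # after K or R, but not before P
    "lys-c": r"(?<=K)",            # after K
}

# Hypothetical 23-residue sequence, invented for this example.
TOY_SEP = "MKAAAAAAAKPGGGGGGGRDDDK"


def digest(sequence, protease="trypsin"):
    """Return peptides from a complete digest (no missed cleavages)."""
    rule = CLEAVAGE_RULES[protease]
    cuts = [m.start() for m in re.finditer(rule, sequence)]
    bounds = [0] + cuts + [len(sequence)]
    # Skip empty fragments, e.g. when the sequence ends on a cleavage site.
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


def in_length_window(peptides, min_len=7, max_len=30):
    """Keep peptides inside an assumed (hypothetical) detectable length window."""
    return [p for p in peptides if min_len <= len(p) <= max_len]


for name in CLEAVAGE_RULES:
    peptides = digest(TOY_SEP, name)
    print(name, peptides, "->", len(in_length_window(peptides)), "in window")
```

For a protein this short, each protease leaves only one or two peptides inside the window, so whether the protein can be identified at all may depend on which protease is used. The length window and the rule set are the main knobs to adjust when adapting the sketch.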