Mol Cell. 2026 May 19. pii: S1097-2765(26)00275-3. [Epub ahead of print]
Yabo Guo, Ti Qin, Jiancheng Luo, Qiannan Pan, Runguo Shu, Ruiyang Guo, Jiajia Qian, Chenyang Xu, Jiawei Wang, Ziyi Wang, Nanxiang Zheng, Hao Li, Xiaogang Guo, Xiongwen Cao, Yong Wang, Shan Zhang.
Thousands of non-canonical open reading frames (ORFs) in the human transcriptome are translated into microproteins, many with ribosome occupancy comparable to that of canonical proteins. Intriguingly, most microproteins fail to accumulate as stable proteins; instead, their derived peptides are widely presented by human leukocyte antigen class I (HLA-I) molecules and show emerging immunomodulatory roles. To understand the underlying biology, we explored the folding and stability landscape of a large microprotein cohort, revealing a fundamental rule that connects the genetic code, protein folding, and stability. Structural modeling and parallel profiling revealed that most microproteins are intrinsically disordered and rapidly degraded. Mechanistically, the high GC content of microprotein-coding sequences, which facilitates non-canonical translation, enriches for residues encoded by multiple GC-rich codons (primarily glycine, arginine, alanine, and proline), thereby promoting structural disorder and terminal-residue motif-mediated, Cullin-RING E3 ubiquitin ligase (CRL)-dependent proteasomal degradation. Together, our findings establish a concise, quantitative rule by which high GC content constrains protein evolvability, revealing how surveillance machinery differentially targets microproteins versus canonical proteins.
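The link between GC content and residue composition described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`gc_content`, `translate`, `grap_fraction`, `mean_codon_gc`) and the two toy coding sequences are hypothetical and chosen only for illustration. The sketch assumes the standard genetic code and measures how GC-rich the codons of each amino acid are, and what fraction of a translated sequence consists of glycine, arginine, alanine, and proline.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gc_content(seq: str) -> float:
    """Fraction of G or C bases in a nucleotide sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def translate(cds: str) -> str:
    """Translate a coding sequence in frame 0, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def grap_fraction(protein: str) -> float:
    """Fraction of residues that are Gly (G), Arg (R), Ala (A), or Pro (P)."""
    return sum(residue in "GRAP" for residue in protein) / len(protein)


def mean_codon_gc(residue: str) -> float:
    """Mean GC fraction across all codons that encode one amino acid."""
    codons = [codon for codon, aa in CODON_TABLE.items() if aa == residue]
    return sum(gc_content(codon) for codon in codons) / len(codons)


# Hypothetical toy sequences (not taken from the study).
gc_rich_orf = "ATGGGCCGCGCCCCGGGCGCGTGA"
at_rich_orf = "ATGAAATTTATTGAATATAATTAA"

for name, orf in [("GC-rich", gc_rich_orf), ("AT-rich", at_rich_orf)]:
    protein = translate(orf)
    print(name, protein, round(gc_content(orf), 2), round(grap_fraction(protein), 2))

for residue in "GRAPK":
    print(residue, round(mean_codon_gc(residue), 2))
```

In this toy case the GC-rich sequence translates almost entirely into Gly, Arg, Ala, and Pro, while the AT-rich one contains none of them. Real analyses would also need to account for reading frame, alternative start codons, and sequence length, which this sketch ignores.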
Keywords: GC content; genetic code; lncORF; microprotein; non-canonical translation; protein stability; protein structure; uORF