Nat Rev Genet. 2026 Feb 17.
Genome annotation captures the essence of a genome by cataloguing its genes, transcripts, proteins and other functional elements of the DNA sequence. Accurate annotation serves as the foundation for a wide range of downstream analyses and discoveries, ranging from basic biology to an understanding of the linkage between genes and disease. Over the past two decades, advances in high-throughput sequencing techniques have enabled faster and more accurate capture of diverse genomic features, generating data at an unprecedented scale. Concurrently, computational methods for translating these data into evidence for genome annotation have steadily improved, leading to better automated genome annotation systems. As such, the growing number of sequenced genomes provides a positive feedback loop, in which database searches become more effective and shared sequence patterns emerge more clearly. These advances are promising steps towards annotating the functions of many poorly understood genes, particularly non-coding RNA genes, for which more research is needed.