bioRxiv. 2025 Dec 11. pii: 2025.12.09.693231. [Epub ahead of print]
Human Genome Structural Variation Consortium
Centromeres are essential for accurate chromosome segregation during cell division, yet their highly repetitive sequence has historically hindered their complete assembly and characterization. Consequently, the full spectrum of centromere diversity across individuals, populations, and evolutionary contexts remains largely unexplored. Here, we address this gap by assembling and characterizing 2,110 complete human centromeres from a diverse cohort of individuals representing 5 continental and 28 population groups. Using a new suite of bioinformatic tools tailored to centromeric regions, we uncover previously unknown variation within centromeres, including 226 novel centromere haplotypes and 1,870 new α-satellite higher-order repeat (HOR) variants. We find that mobile element insertions are present in 30% of centromeres, and that Alu elements occur within the kinetochore site of chromosome 16 centromeres at 11-fold the expected frequency. While most centromeres have a single kinetochore site, 6% have di-kinetochores and well under 1% have tri-kinetochores, configurations we confirm with long-read CENP-A CUT&RUN, DiMeLo-seq, and multi-generational inheritance. We further show that the position of the kinetochore is not random but is closely associated with the underlying sequence and structure of the centromere. To examine how centromeres evolve, we compared our 2,110 complete human centromeres with 5,747 complete centromeres recently assembled by the Human Pangenome Reference Consortium. Mutation rates vary more than 50-fold across centromeres, with the fastest-mutating centromeres on chromosome 1 and the slowest-mutating on chromosome Y. Additionally, a subset of centromeres show evidence of introgression from archaic hominins, which has shaped their sequence, structure, and evolutionary history.
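An "11-fold higher than expected" frequency is an observed-to-expected ratio. As a minimal sketch, assuming a null model in which insertions land uniformly along the α-satellite HOR array (so the expected count in the kinetochore site scales with the site's share of array length), the arithmetic looks like this. The function name, the uniform null, and every input number are hypothetical; this is not the study's method or data.

```python
def fold_enrichment(observed_in_site: int, total_insertions: int,
                    site_bp: int, array_bp: int) -> float:
    """Observed/expected insertion count in the kinetochore site.

    Hypothetical helper: assumes insertions are placed uniformly along
    the HOR array, so expected = total * (site length / array length).
    """
    if not 0 < site_bp <= array_bp:
        raise ValueError("site_bp must be positive and no larger than array_bp")
    expected = total_insertions * site_bp / array_bp
    if expected == 0:
        raise ValueError("expected count is zero; enrichment is undefined")
    return observed_in_site / expected


# Hypothetical inputs: 20 insertions in a 2 Mb array, 11 of them
# inside a 100 kb kinetochore site (expected under the null: 1).
print(fold_enrichment(observed_in_site=11, total_insertions=20,
                      site_bp=100_000, array_bp=2_000_000))  # 11.0
```

A real analysis would also need a significance test and a null that accounts for where insertions can plausibly occur; the ratio alone only expresses the size of the effect.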
We validate these centromere mutation rates in a four-generation family comprising 28 members and 483 accurately assembled centromeres, and show that the kinetochore site is the most rapidly mutating region of the centromere, carrying on average twice as many single-nucleotide variants as the rest of the centromeric α-satellite HOR array. We propose an 'arms race' model between centromeric sequence and proteins, in which frequent mutations within the kinetochore site alter the genetic and epigenetic landscape and ultimately drive the rapid evolution of these critically important regions.
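Assuming the "twofold" comparison is made per base pair (the kinetochore site is typically much shorter than the rest of the HOR array, so raw counts would not be comparable), a minimal sketch of the density ratio follows. The function name and all numbers are hypothetical and are not values from the family data.

```python
def snv_density_ratio(snvs_site: int, site_bp: int,
                      snvs_rest: int, rest_bp: int) -> float:
    """Ratio of SNVs per bp in the kinetochore site vs. the rest of the array.

    Hypothetical helper: normalizes each SNV count by the length of the
    region it was counted in before comparing.
    """
    if site_bp <= 0 or rest_bp <= 0:
        raise ValueError("region lengths must be positive")
    if snvs_rest == 0:
        raise ValueError("no SNVs outside the site; ratio is undefined")
    site_density = snvs_site / site_bp
    rest_density = snvs_rest / rest_bp
    return site_density / rest_density


# Hypothetical inputs: 8 SNVs in a 200 kb site, 36 SNVs in the
# remaining 1.8 Mb of the array.
ratio = snv_density_ratio(8, 200_000, 36, 1_800_000)
print(f"{ratio:.1f}")  # 2.0
```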