bims-rednas Biomed News
on Repetitive DNA sequences
Issue of 2025–03–23
twelve papers selected by
Anna Zawada, International Centre for Translational Eye Research 



  1. Genome Res. 2025 Mar 20.
      Short tandem repeats (STRs) are common variations in human genomes that frequently expand or contract, causing genetic disorders, mainly when expanded. Traditional diagnostic methods for identifying these expansions, such as repeat-primed PCR and Southern blotting, are often labor-intensive and locus-specific, and cannot precisely size long repeat expansions. Sequencing-based methods, although capable of genome-wide detection, are limited by inaccuracy (short-read technologies) and high associated costs (long-read technologies). This study evaluated optical genome mapping (OGM) as an efficient, accurate approach for measuring STR lengths and assessing somatic stability in 85 samples with known pathogenic repeat expansions in DMPK, CNBP, and RFC1, causing myotonic dystrophy types 1 and 2 and cerebellar ataxia, neuropathy, and vestibular areflexia syndrome (CANVAS), respectively. Three workflows (manual de novo assembly, local guided assembly [local-GA], and a molecule distance script) were applied, of which the latter two were developed as part of this study to assess the repeat sizes and somatic repeat stability. OGM successfully identified 84 of 85 (98.8%) pathogenic expansions, distinguishing between wild-type and expanded alleles or between two expanded alleles in recessive cases, with greater accuracy than standard of care (SOC) for long repeats and no apparent upper size limit. Notably, OGM detected somatic instability in a subset of DMPK, CNBP, and RFC1 samples. These findings suggest OGM could advance diagnostic accuracy for large repeat expansions, providing a more comprehensive genome-wide assay for repeat expansion disorders by measuring exact repeat lengths and somatic instability across multiple loci simultaneously.
    DOI:  https://doi.org/10.1101/gr.279491.124
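      As a rough illustration of how a label-to-label distance could be turned into a repeat size, the sketch below converts the extra distance measured on single molecules into repeat units and summarizes the spread across molecules. It is not the study's molecule distance script: the function names, the input values, and the use of the standard deviation as an instability proxy are all assumptions made for illustration. Only the motif lengths (CTG in DMPK, CCTG in CNBP, AAGGG in RFC1) are established facts.

```python
from statistics import median, pstdev

# Repeat unit lengths in bp for the three loci named in the abstract
# (CTG, CCTG, AAGGG). Everything else in this sketch is hypothetical.
MOTIF_BP = {"DMPK": 3, "CNBP": 4, "RFC1": 5}


def repeat_units_gained(observed_bp, reference_bp, locus):
    """Convert extra label-to-label distance into repeat units gained
    relative to the reference distance."""
    return (observed_bp - reference_bp) / MOTIF_BP[locus]


def summarize_molecules(distances_bp, reference_bp, locus):
    """Median expansion and spread across the single molecules of one allele.

    A wide spread is used here as a crude proxy for somatic instability;
    this is an assumption, not the metric used in the paper.
    """
    units = [repeat_units_gained(d, reference_bp, locus) for d in distances_bp]
    return {"median_units": median(units), "sd_units": pstdev(units)}


if __name__ == "__main__":
    # Hypothetical distances (bp) between two labels flanking the DMPK repeat.
    molecules = [10300, 10300, 10600]
    print(summarize_molecules(molecules, reference_bp=10000, locus="DMPK"))
```

      Real OGM distances carry measurement error of hundreds of base pairs, so a practical version would need to separate that noise from genuine length mosaicism before interpreting the spread.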
  2. Comput Struct Biotechnol J. 2025;27:705-716
      Tandem repeat sequences (TRs), a class of repetitive genomic elements, are broadly distributed in both coding and non-coding regions. Investigating the relationship between sequences and function is essential for understanding the genome. Saccharomyces cerevisiae serves as a vital model organism and is widely used as an engineered strain. Although the transcriptional regulatory functions of TRs in the promoters of S. cerevisiae have been elucidated, our understanding of their roles within coding sequences (CDS) remains limited. In this study, we integrate RNA-seq, ChIP-seq, ATAC-seq, Hi-C, and Micro-C data from S. cerevisiae to analyze the types and distribution of TRs, and their impact on gene expression. Our results indicate that genes containing short tandem repeats (STRs) in their CDS exhibit lower expression levels. Epigenetic analysis reveals that these regions are characterized by high levels of repressive histone modifications and low levels of activating marks, with reduced chromatin accessibility and fewer chromatin interactions. Furthermore, trinucleotide and hexanucleotide STR motifs are enriched primarily in genes encoding transcriptional regulatory proteins. This study provides new insights into the functions and characteristics of STRs in the CDS of S. cerevisiae. The identification of key STR motifs offers potential targets for the design of transcriptional regulatory elements.
    Keywords:  Coding sequences; Epigenetic characteristic; Saccharomyces cerevisiae; Short tandem repeats (STRs); Transcriptional regulation
    DOI:  https://doi.org/10.1016/j.csbj.2025.02.003
  3. J Genet Genomics. 2025 Mar 13. pii: S1673-8527(25)00078-5. [Epub ahead of print]
      Short tandem repeats (STRs) modulate gene expression and contribute to trait variation. However, a systematic evaluation of the genomic characteristics of STRs has not been conducted, and their influence on gene expression in rice remains unclear. Here, we construct a map of 137,629 polymorphic STRs in the rice (Oryza sativa L.) genome using a population-scale resequencing dataset. A genome-wide survey encompassing 4,726 accessions shows that the occurrence frequency, mutational patterns, chromosomal distribution, and functional properties of STRs are correlated with the sequences and lengths of repeat motifs. Leveraging a transcriptome dataset from 127 rice accessions, we identify 44,672 expression STRs (eSTRs) by modeling gene expression in response to the length variation of STRs. These eSTRs are notably enriched in the regulatory regions of genes with active transcriptional signatures. Population analysis identifies numerous STRs that have undergone genetic divergence among different rice groups and 1,726 tagged STRs that may be associated with agronomic traits. By editing the (ACT)7 STR in the OsFD1 promoter, we further experimentally validate its role in regulating gene expression and phenotype. Our study highlights the contribution of STRs to transcriptional regulation in plants and establishes the foundation for their potential use as alternative targets for genetic improvement.
    Keywords:  Gene expression; Genomic variation; Rice; STR; Short tandem repeat; Transcriptional regulation
    DOI:  https://doi.org/10.1016/j.jgg.2025.03.005
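      The idea of an expression STR, a repeat whose length tracks the expression of a nearby gene across accessions, can be sketched as a simple linear fit. The abstract does not state which model the authors used, so ordinary least squares is an assumption here, and the function name and the input values are hypothetical.

```python
def slope_and_r(lengths, expression):
    """Least-squares slope and Pearson r of expression against STR length."""
    n = len(lengths)
    if n != len(expression) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(lengths) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in STR length or in expression")
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    # Hypothetical data: repeat units per accession and normalized expression.
    repeat_units = [7, 7, 8, 9, 10]
    expr = [1.0, 1.0, 2.0, 3.0, 4.0]
    slope, r = slope_and_r(repeat_units, expr)
    print(f"slope={slope:.3f} per repeat unit, r={r:.3f}")
```

      A genome-wide scan would repeat this test for every STR and gene pair, so it would also need covariates such as population structure and a multiple-testing correction, both omitted from this sketch.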
  4. Cold Spring Harb Perspect Biol. 2025 Mar 17. pii: a041694. [Epub ahead of print]
      Telomeric repeats recruit the shelterin complex to prevent activation of the double-strand break response at chromosome ends. Thousands of TTAGGG repeats are present at each chromosome end to ensure telomere function. This abundance of G-rich repeats comes with the propensity to generate unusual DNA structures. The telomere loop (t-loop) structure, generated by strand invasion of the 3' overhang in the internal repeats, contributes to telomere function. G4-DNA is promoted by the stretches of G-rich repeats in a single-stranded form and may affect telomere replication and elongation by telomerase. The intramolecular homology can lead to the formation of internal loops (i-loops) via intramolecular recombination at sites of telomeric damage, which can promote the excision of telomeric repeats as extrachromosomal circular DNA. Shelterin promotes t-loops, counteracting the accumulation of pathological structures either directly or via the recruitment of specialized helicases. Here, we will discuss the current evidence for the formation of unusual DNA structures at telomeres and possible implications for telomere function.
    DOI:  https://doi.org/10.1101/cshperspect.a041694
  5. Gigascience. 2025 Jan 06. pii: giaf013. [Epub ahead of print]14
      Oxford Nanopore Technology (ONT) sequencing is a third-generation sequencing technology that enables cost-effective long-read sequencing, with broad applications in biological research. However, its high sequencing error rate in low-complexity regions hampers its applications in short tandem repeat (STR)-related research. To address this, we generated a comprehensive STR error profile of ONT by analyzing publicly available Nanopore sequencing datasets. We show that the sequencing error rate is influenced not only by STR length but also by the repeat unit and the flanking sequences of STR regions. Interestingly, certain flanking sequences were associated with higher sequencing accuracy, suggesting that certain STR loci are more suitable for Nanopore sequencing than others. While base quality scores of substitution errors within the STR regions were lower than those of correctly sequenced bases, such patterns were not observed for indel errors. Furthermore, choosing the most recent basecaller version and using the super accuracy model significantly improved STR sequencing accuracy. Finally, we present NanoMnT, a lightweight Python tool that corrects STR sequencing errors in sequencing data and estimates STR allele sizes. NanoMnT leverages the characteristics of ONT when estimating STR allele size and exhibits superior results for 1-bp and 2-bp repeat STRs compared to existing tools. By integrating our findings, we improved STR allele estimation accuracy for Ax10 repeats from 55% to 78% and up to 85% when excluding loci with unfavorable flanking sequences. Using NanoMnT, we demonstrate the utility of our findings by identifying microsatellite instability status in cancer sequencing data. NanoMnT is publicly available at https://github.com/18parkky/NanoMnT.
    Keywords:  Oxford Nanopore; bioinformatics; error profile; long-read sequencing; microsatellite; short tandem repeats
    DOI:  https://doi.org/10.1093/gigascience/giaf013
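      To make the allele-size estimation problem concrete, the sketch below measures the repeat run between two flanking sequences in each read and reports the most frequent length. This is a naive majority vote written for illustration, not NanoMnT's algorithm or API; the function names, the flanking sequences, and the reads are all hypothetical.

```python
import re
from collections import Counter


def repeat_count(read, left_flank, unit, right_flank):
    """Number of repeat units between the two flanks, or None if the
    flanks and an uninterrupted repeat run are not found in the read."""
    pattern = (
        re.escape(left_flank)
        + "((?:" + re.escape(unit) + ")+)"
        + re.escape(right_flank)
    )
    match = re.search(pattern, read)
    if match is None:
        return None
    return len(match.group(1)) // len(unit)


def call_allele(reads, left_flank, unit, right_flank):
    """Most frequent repeat count across reads (naive majority vote)."""
    counts = Counter(
        c
        for c in (repeat_count(r, left_flank, unit, right_flank) for r in reads)
        if c is not None
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical reads covering an A-homopolymer locus; one read carries
    # a 1-bp deletion of the kind that is common in nanopore data.
    reads = [
        "GGCT" + "A" * 10 + "CGTT",
        "GGCT" + "A" * 10 + "CGTT",
        "GGCT" + "A" * 9 + "CGTT",
    ]
    print(call_allele(reads, "GCT", "A", "CGT"))
```

      Because indel errors in homopolymers are systematic rather than random, a majority vote of this kind can be biased toward shorter alleles, which is the sort of effect an error profile is meant to correct.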
  6. bioRxiv. 2025 Mar 06. pii: 2025.02.28.640809. [Epub ahead of print]
      Double-strand break (DSB) repair is highly mutagenic compared to normal replication. In budding yeast, an HO endonuclease-induced DSB at MATα can be repaired by using a transcriptionally silent HMR::Kl-URA3 donor. During repair, -1 deletions in homonucleotide runs are strongly favored over +1 insertions, whereas during replication, spontaneous +1 and -1 events are equal. Microhomology-bounded, repair-associated intragenic deletions (IDs) are recovered 12 times more frequently than tandem duplications (TDs). IDs have a mean length of 56 bp, while TDs average 22 bp. These data suggest a picture of the structure of the repair replication fork: IDs and TDs occur within the open structure of a migrating D-loop, where the 3' end of a partly copied new DNA strand can dissociate and anneal with a single-stranded region of microhomology that lies either ∼80 bp ahead or ∼40 bp behind the 3' end. Another major class of repair-associated mutations (∼10%) are interchromosomal template switches (ICTS), even though the K. lactis URA3 sequence in HMR is only 72% identical (homeologous) with S. cerevisiae ura3-52. ICTS events begin and end at regions of short (∼7 bp) microhomology; however, ICTS events are constrained to the middle of the copied sequence. Whereas microhomology usage in intragenic deletions is not influenced by adjacent homeology, we show that extensive pairing of adjacent homeology plays a critical role in ICTS. Thus, although by convention, structural variants are characterized by the precise base pairs at their junction, microhomology-mediated template switching actually requires alignment of extensive adjacent homeology.
    Significance statement: DNA synthesis during repair of a double-strand chromosome break by homologous recombination exhibits a high rate of mutation compared to normal replication. Using a genetic system in budding yeast, we isolated thousands of mutations occurring during repair. We conclude that the repair replication fork appears to have the two DNA strands open ∼80 bp ahead of the DNA polymerase, but the strands re-anneal rapidly behind the polymerase. Additionally, we analyzed interchromosomal template switching, in which the partially copied DNA strand dissociates and pairs with a new template at a short stretch of perfectly matching bases (microhomology), and resumes copying. We show that these apparent microhomology-mediated template switching events in fact require the pairing of ∼200 bp of imperfectly matching bases (homeology).
    DOI:  https://doi.org/10.1101/2025.02.28.640809
  7. Genome Biol. 2025 Mar 18. 26(1): 63
    BACKGROUND: The Drosophila genus is ideal for studying genome evolution due to its relatively simple chromosome structure and small genome size, with rearrangements mainly restricted to within chromosome arms, such as Muller elements. However, work on the rapidly evolving repetitive genomic regions, composed of transposons and tandem repeats, has been hampered by the lack of genus-wide chromosome-level assemblies.
    RESULTS: Integrating long-read genomic sequencing and chromosome capture technology, here we produce and annotate 30 chromosome-level genome assemblies within the Drosophila genus. Based on this dataset, we reveal the evolutionary dynamics of genome rearrangements across the Drosophila phylogeny, including the identification of genomic regions that show comparatively high structural stability throughout evolution. Moreover, within the ananassae subgroup, we uncover the emergence of new chromosome conformations and the rapid expansion of novel satellite DNA sequence families, which form large and continuous pericentromeric domains with higher-order repeat structures that are reminiscent of those observed in the human and Arabidopsis genomes.
    CONCLUSIONS: These chromosome-level genome assemblies present a valuable resource for future research, the power of which is demonstrated by our analysis of genome rearrangements and chromosome evolution. In addition, based on our findings, we propose the ananassae subgroup as an ideal model system for studying the evolution of centromere structure.
    DOI:  https://doi.org/10.1186/s13059-025-03527-4
  8. Genome Res. 2025 Mar 20.
      Structural variants (SVs) are omnipresent in human DNA, yet their genotype and methylation statuses are rarely characterized due to previous limitations in genome assembly and detection of modified nucleotides. Also, the extent to which SVs act as methylation quantitative trait loci (SV-mQTLs) is largely unknown. Here, we generated a pangenome graph summarizing SVs in 782 de novo assemblies obtained from Genomic Answers for Kids, capturing 14.6 million CpG dinucleotides that are absent from the CHM13v2 reference (SV-CpGs), thus expanding their number by 43.6%. Using 435 methylomes, we genotyped 4.06 million SV-CpGs, of which 3.93 million (96.8%) are methylated at least once. Nonrepeat sequences contribute 1.59 × 10^6 novel SV-CpGs, followed by centromeric satellites (6.57 × 10^5), simple repeats (5.40 × 10^5), Alu elements (5.07 × 10^5), satellites (2.17 × 10^5), LINE-1s (1.83 × 10^5), and SVA (SINE-VNTR-Alu) elements (1.50 × 10^5). Centromeric satellites, simple repeats, and SVAs are overrepresented in SV-CpGs versus reference CpGs. Similarly, methylation levels in SV-CpGs are more variable than in reference CpGs. To explore if SVs are potentially causal for functional variation, we measured SV-mQTLs. This revealed over 230,464 methylation bins where the methylation is associated with common SVs within 100 kbp. Finally, we identified 65,659 methylation bins (28.5%) where the leading QTL variant is an SV. In conclusion, we demonstrate that graph pangenomes provide full SV structures and the associated methylation variation, and reveal tens of thousands of SV-mQTLs, underscoring the importance of assembly-based analyses of human traits.
    DOI:  https://doi.org/10.1101/gr.279240.124
  9. Cell Genom. 2025 Mar 14. pii: S2666-979X(25)00067-9. [Epub ahead of print] 100811
      Expanding tandem gene arrays facilitates adaptation through dosage effects and gene family formation via sequence diversification. However, experimental induction of such expansions remains challenging. Here, we introduce a method termed break-induced replication (BIR)-mediated tandem repeat expansion (BITREx) to address this challenge. BITREx places Cas9 nickase adjacent to a tandem gene array to break the replication fork that has just replicated the array, forming a single-ended double-strand break. This break is subsequently end-resected to become single stranded. Since there is no repeat unit downstream of the break, the single-stranded DNA often invades an upstream unit to initiate ectopic BIR, resulting in array expansion. BITREx has successfully expanded gene arrays in budding yeast, with the CUP1 array reaching ∼1 Mb. Furthermore, appropriate splint DNAs allow BITREx to generate tandem arrays de novo from single-copy genes. We have also demonstrated BITREx in mammalian cells. Therefore, BITREx will find various unique applications in genome engineering.
    Keywords:  genome editing; nCas9; replication fork; structural variation
    DOI:  https://doi.org/10.1016/j.xgen.2025.100811
  10. Chem Sci. 2025 Mar 10.
      Z-DNA is a non-canonical, left-handed helical structure that plays crucial roles in various cellular processes. DNA mismatches, which involve the incorporation of incorrect Watson-Crick base pairs, are present in all living organisms and contribute to the mechanism of Z-DNA formation. However, the impact of mismatches on Z-DNA formation remains poorly understood. Moreover, the combined effect of DNA mismatches and bending, a common biological phenomenon observed in vivo, has not yet been explored due to technological limitations. Here, using single-molecule FRET, we show that a mismatch inside the Z-DNA region, i.e., the CG repeat region, hinders Z-DNA formation. In stark contrast, however, a mismatch in the B-Z junction facilitates Z-DNA formation. When a bending force is applied to double-stranded DNA, a mismatch in the B-Z junction releases the bending stress more effectively than one in the CG repeat region. These findings provide mechanical insights into the role of DNA mismatches and bending forces in regulating Z-DNA formation, whether promoting or inhibiting it in biological environments.
    DOI:  https://doi.org/10.1039/d5sc00749f
  11. Mol Cell Probes. 2025 Mar 14. pii: S0890-8508(25)00019-2. [Epub ahead of print]81 102026
      Huntington's disease (HD) arises from the abnormal expansion of a CAG repeat in the HTT gene. The mutant CAG repeat triggers aberrant RNA-protein interactions and translates into toxic aggregate-prone polyglutamine protein. These aberrant RNA-protein interactions also seed the formation of cytoplasmic liquid-like granules, such as stress granules. Emerging evidence demonstrates that granules formed via liquid-liquid phase separation can mature into gel-like inclusions that persist within the cell and may act as precursors to the aggregates that occur in patients' tissue. Thus, deregulation of RNA granules is an important component of neurodegeneration. Interestingly, both the formation of intracellular membrane-less organelles like stress granules and the secretion of small extracellular vesicles (sEVs) increase upon stress and under disease conditions. sEVs are lipid membrane-bound particles that are secreted from all cell types and may participate in the spreading of misfolded proteins and aberrant RNA-protein complexes across the central nervous system in neurodegenerative diseases like HD. In this study, we performed a comparative transcriptomic analysis of sEVs and RNA granules in an HD model. RNA granules and sEVs were isolated from an inducible HD cell model. Both sEVs and RNA granules were isolated from induced (HD) and non-induced (control) cells and analyzed by RNA sequencing. Our comparative analysis between the transcriptomics data of HD RNA granules and sEVs showed that: (I) intracellular RNA granules and extracellular RNA vesicles share content, (II) several non-coding RNAs translocate to RNA granules, and (III) the composition of RNA granules and sEVs is affected in HD cells. Our data showing common transcripts in intracellular RNA granules and extracellular sEVs suggest that formation of RNA granules and sEV loading may be related. Moreover, we found a high abundance of lncRNAs in both control and HD samples, with several transcripts under REST regulation, highlighting their potential role in HD pathogenesis and selective incorporation into sEVs. The transcriptome cargo of RNA granules or sEVs may serve as a source for diagnostic strategies. For example, disease-specific RNA signatures of sEVs can serve as biomarkers of central nervous system diseases. Therefore, we compared our dataset to transcriptomic data from HD patient sEVs in blood. However, our data suggest that the cell-type-specific signature of sEV-secreted RNAs as well as their high variability may make it difficult to detect these biomarkers in blood.
    Keywords:  Extracellular vesicles; Huntington's disease; Neurodegeneration; RNA binding proteins; RNA granules; Stress granules
    DOI:  https://doi.org/10.1016/j.mcp.2025.102026
  12. Forensic Sci Int Genet. 2025 Mar 15. pii: S1872-4973(25)00052-3. [Epub ahead of print]78 103272
      Mixture deconvolution remains one of the major challenges in the field of forensic science. Genetic markers currently used and studied in forensic genetics include short tandem repeat (STR), insertion/deletion polymorphism (InDel), single nucleotide polymorphism (SNP), InDel closely linked to STR (DIP-STR), SNP closely linked to STR (SNP-STR), InDel closely linked to SNP (DIP-SNP) and microhaplotype (MH); all have been studied for DNA mixture analysis and have their own advantages and disadvantages. Mini-haplotype (MiniHap), a novel haplotype genetic marker, contains 5 or more SNPs. A previous study has substantiated its high polymorphism, and it is expected to have potential applications in individual identification, paternity testing, ancestry inference, and mixture deconvolution. In this study, we first screened 22 MiniHaps with high polymorphism potential and constructed a panel based on the QNome nanopore sequencing device. Subsequently, we tested 100 unrelated Chinese Han individuals to evaluate the sequencing performance, allele (haplotype) frequencies, effective number of alleles (Ae) and forensic parameters of the 22 MiniHap markers included in this novel assay. Next, a series of simulated mixtures (two- or three-person mixtures with mixing ratios of 1:1 to 1:99 or 1:1:1 to 1:8:1) based on three standard materials (9947A, 9948 and 2800M) were analyzed with this MiniHap panel to explore its potential for DNA mixture deconvolution. The average Ae value was 10.9574, and 52.38% of MiniHap loci had Ae values greater than 12.0000. The mean values of genetic diversity (GD) and power of discrimination (PD) were 0.8717 and 0.9457, respectively. Notably, most MiniHaps (85.71%) had PD values exceeding 0.9000. The combined match probability (CMP) and combined power of exclusion (CPE) of this MiniHap panel were 4.4505 × 10^-31 and 0.999999999999999996653, respectively. Moreover, the results of mixture analysis demonstrated that this MiniHap panel allowed detection of the minor contributor(s) even in imbalanced mixture samples, with detection limits of 1:39 and 1:8:1 for two- and three-person mixtures, respectively. In summary, MiniHap markers have remarkable application potential in mixture deconvolution, and in-depth research on MiniHap markers for mixture analysis is needed in the future.
    Keywords:  Forensic genetics; MiniHaps; Mixture deconvolution; Nanopore sequencing; QNome
    DOI:  https://doi.org/10.1016/j.fsigen.2025.103272
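      The forensic parameters quoted above follow from allele (haplotype) frequencies through standard population-genetics formulas, which the sketch below works through for a single locus and for a panel. It assumes Hardy-Weinberg proportions when deriving genotype frequencies, whereas published values are often computed from observed genotype counts, so results can differ slightly. The function names and the example frequencies are hypothetical and are not taken from the study.

```python
from itertools import combinations
from math import prod


def effective_alleles(freqs):
    """Effective number of alleles, Ae = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


def gene_diversity(freqs, n_chromosomes):
    """Unbiased gene diversity, GD = n / (n - 1) * (1 - sum(p_i^2))."""
    n = n_chromosomes
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def match_probability(freqs):
    """Sum of squared genotype frequencies, assuming Hardy-Weinberg
    proportions: p_i^2 for homozygotes, 2 * p_i * p_j for heterozygotes."""
    homozygotes = sum((p * p) ** 2 for p in freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes


def power_of_discrimination(freqs):
    """PD = 1 - match probability."""
    return 1.0 - match_probability(freqs)


def combined_match_probability(loci):
    """CMP = product of per-locus match probabilities (independent loci)."""
    return prod(match_probability(freqs) for freqs in loci)


if __name__ == "__main__":
    # Hypothetical locus with four haplotypes at unequal frequencies.
    locus = [0.4, 0.3, 0.2, 0.1]
    print("Ae :", round(effective_alleles(locus), 4))
    print("GD :", round(gene_diversity(locus, n_chromosomes=200), 4))
    print("PD :", round(power_of_discrimination(locus), 4))
    print("CMP:", combined_match_probability([locus] * 3))
```

      Multiplying per-locus match probabilities is valid only if the loci are independent, which is why panels of this kind are normally checked for linkage disequilibrium before a combined value is reported.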