bims-rednas Biomed News
on Repetitive DNA sequences
Issue of 2025–05–04
thirty-one papers selected by
Anna Zawada, International Centre for Translational Eye Research



  1. bioRxiv. 2025 Apr 09. pii: 2024.11.02.621562. [Epub ahead of print]
      Tandem repeats (TRs) - highly polymorphic, repetitive sequences dispersed across the human genome - are crucial regulators of gene expression and diverse biological processes, but have remained underexplored relative to other classes of genetic variation due to historical challenges in their accurate calling and analysis. Here, we leverage whole genome and single-cell RNA sequencing from over 5.4 million blood-derived cells from 1,925 individuals to explore the impact of variation in over 1.7 million polymorphic TR loci on blood cell type-specific gene expression. We identify over 62,000 single-cell expression quantitative trait TR loci (sc-eTRs), 16.6% of which are specific to one of 28 distinct immune cell types. Further fine-mapping uncovers 4,283 sc-eTRs as candidate causal drivers of gene expression in 13.6% of genes tested genome-wide. We show through colocalization that TRs are likely mediators of genetic associations with immune-mediated and hematological traits in over 700 genes, and further identify novel TRs warranting investigation in rare disease cohorts. TRs are critical, yet long-overlooked, contributors to cell type-specific gene expression, with implications for understanding rare disease pathogenesis and the genetic architecture of complex traits.
    DOI:  https://doi.org/10.1101/2024.11.02.621562
  2. Nucleic Acids Res. 2025 Apr 22. pii: gkaf352. [Epub ahead of print] 53(8):
      G-quadruplexes (G4s) are functional elements of the human genome, some of which inhibit DNA replication. We investigated replication of G4s within highly abundant microsatellite (GGGA, GGGT) and transposable element (L1 and SVA) sequences. We found that genome-wide, numerous motifs are located preferentially on the replication leading strand and the transcribed strand templates. We directly tested replicative polymerase ϵ and δ holoenzyme inhibition at these G4s, compared to low abundant motifs. For all G4s, DNA synthesis inhibition was higher on the G-rich than C-rich strand or control sequence. No single G4 was an absolute block for either holoenzyme; however, the inhibitory potential varied over an order of magnitude. Biophysical analyses showed the motifs form varying topologies, but replicative polymerase inhibition did not correlate with a specific G4 structure. Addition of the G4 stabilizer pyridostatin severely inhibited forward polymerase synthesis specifically on the G-rich strand, enhancing G/C strand asynchrony. Our results reveal that replicative polymerase inhibition at every G4 examined is distinct, causing complementary strand synthesis to become asynchronous, which could contribute to slowed fork elongation. Altogether, we provide critical information regarding how replicative eukaryotic holoenzymes navigate synthesis through G4s naturally occurring thousands of times in functional regions of the human genome.
    DOI:  https://doi.org/10.1093/nar/gkaf352
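      As a rough illustration of how candidate G4 motifs such as the (GGGA)n and (GGGT)n microsatellites in item 2 can be located in a sequence, here is a minimal sketch. It assumes the commonly used putative-quadruplex heuristic (four runs of at least three G separated by loops of 1 to 7 nt); this pattern, the function name `find_g4_motifs`, and the example sequences are illustrative assumptions and are not taken from the paper.

```python
import re

# Heuristic putative-G4 pattern: four G-runs (>= 3 G) separated by 1-7 nt loops.
# Assumption: a generic motif definition, not the one used by the authors.
G_RICH = re.compile(r"G{3,}(?:[ACGT]{1,7}?G{3,}){3}")
# The same motif on the opposite strand shows up as C-runs on the given strand.
C_RICH = re.compile(r"C{3,}(?:[ACGT]{1,7}?C{3,}){3}")

def find_g4_motifs(seq):
    """Return (strand, start, end) tuples for non-overlapping candidate G4s."""
    seq = seq.upper()
    hits = [("+", m.start(), m.end()) for m in G_RICH.finditer(seq)]
    hits += [("-", m.start(), m.end()) for m in C_RICH.finditer(seq)]
    return sorted(hits, key=lambda h: h[1])

# Hypothetical example: a (GGGA)4 microsatellite flanked by two T on each side.
example_hits = find_g4_motifs("TT" + "GGGA" * 4 + "TT")
```

      A "+" hit means the G-rich run lies on the given strand, which the abstract reports as the more strongly inhibited template; a motif match alone says nothing about whether a G4 actually folds.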
  3. J Appl Genet. 2025 Apr 29.
      Dominantly inherited GAA repeat expansions in the FGF14 gene have recently been identified as the cause of spinocerebellar ataxia 27B (SCA27B). Our study focused on a Polish patient case along with asymptomatic family members. Moreover, we systematically reviewed available case reports to better understand the SCA27B phenotype. Genetic tests for SCA27B were performed on genomic DNA isolated from blood. Long-range polymerase chain reaction (LR-PCR) followed by Nanopore sequencing was conducted to establish the number of GAA repeats. The available literature was systematically reviewed per the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-analyses. The patient's genetic studies identified pure expansions of (GAA)420/94 repeats in FGF14, confirming the SCA27B diagnosis. A systematic review of 815 cases provides further insight into the typical clinical presentation, with gait ataxia (95.96%) being the most prevalent symptom, followed by abnormal saccadic pursuits (80.69%), nystagmus (71.15%), diplopia (54.05%), and dysarthria (51.22%). Notably, 41.87% of cases exhibited episodic symptoms. The correlation between GAA repeat expansions and the pathogenesis of SCA27B requires further studies. The unique course of the disease with episodic symptoms may cause diagnostic difficulties. Due to its high prevalence in the European population, SCA27B should be considered when diagnosing the causes of late-onset cerebellar ataxia.
    Keywords:  FGF14; Downbeat nystagmus; GAA repeats; LOCA; SCA27B
    DOI:  https://doi.org/10.1007/s13353-025-00967-3
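      The repeat sizing described in item 3 (counting GAA units in a sequenced amplicon) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes a single read already oriented on the GAA strand and ignores sequencing errors, and the function name `longest_pure_repeat` is hypothetical.

```python
import re

def longest_pure_repeat(read, unit="GAA"):
    """Return the unit count of the longest uninterrupted run of `unit`."""
    # Find every maximal run of back-to-back copies of the unit.
    runs = re.findall(f"(?:{re.escape(unit)})+", read.upper())
    # Convert run length in bases to a number of repeat units.
    return max((len(run) // len(unit) for run in runs), default=0)

# Hypothetical read: 94 pure GAA units between short flanking sequences.
units = longest_pure_repeat("TTT" + "GAA" * 94 + "CC")
```

      Because only uninterrupted runs are counted, a single interruption splits an allele into two shorter runs, which is one reason real assays treat pure and interrupted expansions separately.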
  4. Cerebellum. 2025 Apr 28. 24(4): 89
      Spinocerebellar ataxias (SCAs) are autosomal dominant genetic disorders characterized by progressive cerebellar degeneration and phenotypic variability. MJD/SCA3, the most prevalent form around the world and in Latin America, is also likely the most common hereditary ataxia in Uruguay. Despite its relevance, Uruguay lacks comprehensive epidemiological studies, and molecular diagnostics remain inaccessible in public health systems. This review provides a phenotypic description of genetically confirmed patients with MJD/SCA3 as a first step towards generating knowledge on this matter in our country. A retrospective review of 37 Uruguayan patients with suspected SCA was conducted. Sixteen patients with a confirmed molecular diagnosis of MJD/SCA3 were included in this review. Data collected encompassed demographic information, genetic testing results, clinical manifestations, and imaging findings. Patients were evaluated at the Ataxias Polyclinic, Hospital de Clínicas, between 2019 and 2024 by the authors. Statistical analyses were performed using SPSS version 29.0. The mean age of symptom onset was 41.75 years, with gait ataxia as the initial symptom in 87.5% of cases. Clinical findings included appendicular ataxia (100%), dysarthria (90%), and oculomotor alterations (90%), with diverse deep sensitivity impairment in 62.5%. Genetic testing revealed an average of 72.9 CAG repeats in the ATXN3 gene. Cerebellar atrophy was observed in 75% of patients with MRI. Most had a diagnostic delay of 6.5 years and an autosomal dominant family history. Findings align with international descriptions of MJD/SCA3 while highlighting regional characteristics, including a potential genetic link with southern Brazil. The absence of dysautonomia, typically prevalent in MJD/SCA3, suggests underdiagnosis or insufficient evaluation. This study underscores the need for systematic clinical and molecular evaluations in Uruguay and serves as a foundation to understand hereditary ataxias at a national level. Further research is essential for improving diagnosis and management of this complex pathology.
    Keywords:  Hereditary ataxia; Machado-Joseph disease; Natural history; Spinocerebellar ataxia; Uruguay
    DOI:  https://doi.org/10.1007/s12311-025-01839-6
  5. ACS Omega. 2025 Apr 22. 10(15): 14980-14993
      Huntington's disease (HD) and Spinocerebellar Ataxia (SCA) are debilitating neurological disorders triggered by the expansion of CAG sequences within specific genes (HTT and ATXN, respectively). These are characterized as polyglutamine (polyQ) disorders, which are marked by widespread neurodegeneration and metabolic irregularities across systemic, cellular, and intracellular levels. This study aimed to identify small molecules that specifically interact with and target the toxic CAG repeat RNA. Here, we investigated the neuroprotective effects of Oseltamivir, an antiviral drug, against the HD and SCA-causing CAG repeats, through biophysical, cellular, and Drosophila model-based studies. Using a multidimensional approach encompassing biophysical techniques, cellular assays, and a Drosophila model, we explored Oseltamivir's interaction with toxic CAG repeat RNA. Our comprehensive analyses, including circular dichroism (CD), isothermal titration calorimetry (ITC), electrophoretic mobility shift assay (EMSA), and nuclear magnetic resonance (NMR) spectroscopy, demonstrated Oseltamivir's specific binding affinity for AA mismatches and its potential to mitigate the toxicity associated with polyQ aggregation. Moreover, the identified U.S. FDA-approved drug effectively mitigated polyQ-induced toxicity in both HD cells and the Drosophila model of the disease. The results obtained from this drug repurposing approach are indicative of the neuro-shielding role of Oseltamivir in HD and several SCAs, paving the way for its translation into clinical practice to benefit patients afflicted with these devastating diseases.
    DOI:  https://doi.org/10.1021/acsomega.4c10338
  6. Eur Neurol. 2025 Apr 29. 1-16
       INTRODUCTION: Huntington's disease (HD) is an autosomal dominant neurodegenerative disorder characterized by involuntary movements, psychiatric symptoms and cognitive decline. Its prevalence is highest in individuals of European descent. However, a previous 2007 study in Iceland showed an unusually low incidence and prevalence.
    OBJECTIVES: The aim of this study was to investigate the incidence and prevalence of HD in Iceland between 2008-2022 as well as age, sex, symptoms, number of CAG repeats, treatment and prognosis.
    MATERIALS AND METHODS: A retrospective epidemiological study was conducted with clinical information obtained from medical records of individuals diagnosed with Huntington's disease 2008-2022. Information was also obtained from the Department of Genetics at University Hospital of Iceland and neurologists managing HD patients.
    RESULTS: Among the 22 diagnosed individuals (11 men) identified, the point prevalence on December 31, 2022, was 4.38 per 100,000 inhabitants, with an average annual incidence rate of 0.314 per 100,000 person-years. Average age at symptom onset was 46.3 years. 21 out of 22 individuals had confirmed HD through genetic testing, with an average CAG repeat length of 42.3 (range 40-45). Five individuals died during the study-period with the most common cause of death being aspiration pneumonia. The average age at death was 70.4 years.
    CONCLUSION: The prevalence and incidence of HD in Iceland have increased compared to the 2007 study but remain lower than in other European populations. Results showed a lower number of CAG repeats in the Icelandic HD population, potentially explaining the higher age at symptom onset and death compared to global averages.
    DOI:  https://doi.org/10.1159/000546150
  7. Med Sci (Paris). 2025 Apr;41(4): 394-397
      A beautiful piece of work using extensive single-cell studies illuminates the mechanism of Huntington's disease: the somatic expansion of the (CAG)n tract, very slow at first but accelerating once a critical repeat length is reached, drives extensive changes in gene expression in striatal neurons and eventually leads to cell death and atrophy of the striatum. This explains many puzzling features of the disease and may have important implications for possible therapy and for the understanding of other triplet repeat disorders.
    DOI:  https://doi.org/10.1051/medsci/2025059
  8. Biochemistry. 2025 Apr 27.
      Huntington's disease (HD) is a neurological condition caused by an excessive expansion of CAG repeats in the Huntingtin (HTT) gene. Although experiments have shown an altered epigenetic landscape and chromatin architecture upon HD development, the structural consequences on the HTT gene remain elusive. Structural data are only available for model nucleosome systems and yeast systems with human nucleosomes. Here, we use our experimentally validated nucleosome-resolution mesoscale chromatin model to investigate folding changes of the HTT gene associated with HD. We investigate how the histone fold domain of the variant macroH2A1, a biomarker of HD, affects the genome structure by modeling HD-like systems that contain (i) 100% canonical, (ii) 100% macroH2A1, (iii) 50% canonical and 50% macroH2A1, and (iv) 100% hybrid cores (one canonical H2A and one macroH2A1 per nucleosome). Then, we model the mouse HTT gene in healthy and HD conditions by incorporating the CAG expansion and macroH2A1 cores, reducing the linker histone density and tail acetylation levels, and incorporating genomic contacts. Overall, our results show that the histone fold domain of macroH2A1 affects chromatin compaction in a fiber-dependent manner (i.e., nucleosome distribution dependent) and can thus both enhance or repress HTT gene expression. Our modeling of the HTT gene shows that HTT is less compact in the diseased condition, which could accelerate the production of the mutated protein. By suggesting the structural biophysical consequences of the HTT gene under HD conditions, our findings may help in the development of diagnostic and therapeutic treatments for HD.
    DOI:  https://doi.org/10.1021/acs.biochem.5c00029
  9. Epigenetics Chromatin. 2025 Apr 28. 18(1): 24
       BACKGROUND: Repeat-induced epigenetic changes are observed in many repeat expansion disorders (REDs). These changes result in transcriptional deficits and/or silencing of the associated gene. MSH2, a mismatch repair protein that is required for repeat expansion in the REDs, has been implicated in the maintenance of DNA methylation seen in the region upstream of the expanded CTG repeats at the DMPK locus in myotonic dystrophy type 1 (DM1). Here, we investigated the role of MSH2 in aberrant DNA methylation in two additional REDs, fragile X syndrome (FXS) that is caused by a CGG repeat expansion in the 5' untranslated region (UTR) of the Fragile X Messenger Ribonucleoprotein 1 (FMR1) gene, and Friedreich's ataxia (FRDA) that is caused by a GAA repeat expansion in intron 1 of the frataxin (FXN) gene.
    RESULTS: In contrast to what is seen at the DMPK locus in DM1, loss of MSH2 did not decrease DNA methylation at the FMR1 promoter in FXS embryonic stem cells (ESCs) or increase FMR1 transcription. This difference was not due to the differences in the CpG density of the two loci, as a decrease in DNA methylation was also not observed in a less CpG-dense region upstream of the expanded GAA repeats in the FXN gene in MSH2 null induced pluripotent stem cells (iPSCs) derived from FRDA patient fibroblasts. Surprisingly, given previous reports, we found that FMR1 reactivation was associated with a high frequency of MSH2-independent CGG-repeat contractions that resulted in a permanent loss of DNA methylation. MSH2-independent GAA-repeat contractions were also seen in FRDA cells.
    CONCLUSIONS: Our results suggest that there are mechanistic differences in the way that DNA methylation is maintained in the region upstream of expanded repeats among different REDs even though they share a similar mechanism of repeat expansion. The high frequency of transcription-induced MSH2-dependent and MSH2-independent contractions we have observed may contribute to the mosaicism that is frequently seen in carriers of FMR1 alleles with expanded CGG-repeat tracts. These contractions may reflect the underlying problems associated with transcription through the repeat. Given the recent interest in the therapeutic use of transcription-driven repeat contractions, our data may have interesting mechanistic, prognostic, and therapeutic implications.
    Keywords:  DNA damage; DNA methylation; Fragile X syndrome; Friedreich’s ataxia; MSH2; Mismatch repair; Repeat expansion
    DOI:  https://doi.org/10.1186/s13072-025-00588-4
  10. Biomedicines. 2025 Mar 27. pii: 805. [Epub ahead of print] 13(4):
      Fragile X syndrome (FXS) is the most common inherited cause of intellectual disability and a major genetic contributor to autism spectrum disorder. It is caused by a CGG trinucleotide repeat expansion in the FMR1 gene, resulting in gene silencing and the loss of FMRP, an RNA-binding protein essential for synaptic plasticity. This review covers over 80 years of FXS research, highlighting key milestones, clinical features, genetic and molecular mechanisms, the FXS mouse model, disrupted molecular pathways, and current therapeutic strategies. Additionally, we discuss recent advances including AI-driven combination therapies, CRISPR-based gene editing, and antisense oligonucleotide (ASO) therapies. Despite these scientific breakthroughs, translating preclinical findings into effective clinical treatments remains challenging. Clinical trials have faced several difficulties, including patient heterogeneity, inconsistent outcome measures, and variable therapeutic responses. Standardized preclinical testing protocols and refined clinical trial designs are required to overcome these challenges. The development of FXS-specific biomarkers could also improve the precision of treatment assessments. Ultimately, future therapies will need to combine pharmacological and behavioral interventions tailored to individual needs. While significant challenges remain, ongoing research continues to offer hope for transformative breakthroughs that could significantly improve the quality of life for individuals with FXS and their families.
    Keywords:  fragile X messenger ribonucleoprotein (FMRP); fragile X messenger ribonucleoprotein 1 (FMR1); fragile X syndrome (FXS); trinucleotide repeat expansion (CGG repeat)
    DOI:  https://doi.org/10.3390/biomedicines13040805
  11. J Neurodev Disord. 2025 Apr 26. 17(1): 22
       BACKGROUND: Fragile X syndrome (FXS) is a neurodevelopmental disorder caused by the expansion of a CGG repeat in the 5'UTR of the FMR1 (fragile X messenger ribonucleoprotein 1) gene. Healthy individuals possess a repeat 30-55 CGG units in length. Once the CGG repeat exceeds 200 copies it triggers methylation at the locus. This methylation covers the FMR1 promoter region and silences expression of the gene and the production of FMRP (fragile X messenger ribonucleoprotein). The loss of FMRP is responsible for a number of pathologies including neurodevelopmental delay and autism spectrum disorder. Methylation of the expanded repeat in the FMR1 locus is the causal factor for FXS; however, it is not known why the expanded repeat triggers this epigenetic change or how exactly DNA methylation is established. Intriguingly, expanded CGG repeats of over 300 copies engineered into the FMR1 locus in mice remain unmethylated. In humans too, in very rare cases, individuals can have an FMR1 CGG expansion > 200 copies while the locus remains unmethylated. These unmethylated full mutation (UFM) individuals give us a rare opportunity to investigate the mechanism of FMR1 promoter methylation.
    METHODS: Fibroblasts were obtained from a healthy control, an FXS patient and two unmethylated full expansion carriers. RNA was extracted and comparative transcriptomic analysis was performed on all samples. Whole genome sequencing was carried out on DNA from the two UFM carriers and the results analysed to investigate DNA variants that could explain the observed differences in gene expression.
    RESULTS: Our analyses focused on genes involved in epigenetic modification. We show that Tet methylcytosine dioxygenase 3 (TET3), a gene involved in DNA methylation, is significantly downregulated in UFM carriers compared to healthy controls or FXS patient derived cells. Genomic analyses reveal a number of rare variants present in the TET3 locus in UFM carriers when compared to the reference genome. However, no clear modifying TET3 variants were identified.
    CONCLUSION: Our results suggest that TET3 is a candidate factor responsible for the lack of methylation of the expanded FMR1 locus. Further analyses are needed to elucidate this relationship; however, given its potential to directly interact with CGG repeats and its ambiguous role in 5-hydroxy-methylation of CG-containing sequences, TET3 is a strong candidate for further exploration.
    DOI:  https://doi.org/10.1186/s11689-025-09609-5
  12. J Mol Diagn. 2025 Apr 23. pii: S1525-1578(25)00090-X. [Epub ahead of print]
      Fragile X syndrome (FXS) is the leading cause of monogenic autism spectrum disorder and inherited intellectual disabilities. Although the value of population-based FXS carrier screening has been acknowledged, appropriate screening methods are urgently required to establish and implement screening programs. We developed a nanopore sequencing-based assay that includes data analysis software to identify FXS carriers. Reference and clinical samples were used to evaluate the performance of the nanopore sequencing assay. Triplet-primed PCR and PacBio sequencing assays were used for comparisons. Nanopore sequencing identified reference carrier samples with a full range of premutation alleles in single-, 10-, and 100-plex assays, and identified AGG interruptions in an allele-specific manner. Moreover, nanopore sequencing revealed no size preference for amplicons containing CGG repeat regions of different lengths. Finally, nanopore sequencing successfully identified three carriers among ten clinical samples for preliminary clinical validation. The observed variation in CGG repeat region size resulted from the base-calling process of nanopore sequencing. In conclusion, the nanopore sequencing assay is rapid, high-capacity, inexpensive, and easy to perform, thus providing a promising tool and paving the way for population-based FXS carrier screening.
    Keywords:  AGG interruption; CGG repeat; carrier screening; fragile X syndrome; nanopore sequencing
    DOI:  https://doi.org/10.1016/j.jmoldx.2025.03.008
  13. Sci Rep. 2025 Apr 26. 15(1): 14665
      Fuchs endothelial corneal dystrophy (FECD) remains a leading cause of corneal blindness globally, with corneal transplantation being the primary treatment. FECD is characterized by the formation of guttae, extracellular matrix (ECM) deposits beneath the corneal endothelium, and progressive endothelial cell loss. These pathological changes cause visual deterioration through light scattering by guttae and corneal edema due to endothelial cell loss. However, limitations such as donor shortage and graft failure necessitate alternative therapeutic approaches. We employed computational drug screening using three platforms (L1000FWD, L1000CDS2, and SigCom LINCS) to identify compounds capable of normalizing FECD-associated differentially expressed genes (DEGs). Analysis of transcriptome data from FECD patients with TCF4 trinucleotide repeat expansion identified 706 upregulated and 962 downregulated genes. The screening platforms identified 200, 35, and 76 compounds through L1000FWD, L1000CDS2, and SigCom LINCS, respectively, with five compounds commonly predicted across all platforms. Among these, LDN193189 and cercosporin were selected for further evaluation based on availability and lack of cytotoxicity. Both compounds significantly decreased the expression of ECM-related genes (FN1, MATN3, BGN, and LTBP2) in FECD cell models and suppressed TGF-β-induced fibronectin expression. Additionally, both compounds reduced aggresome formation to normal control levels, suggesting protection against endoplasmic reticulum stress-induced cell death. This study demonstrates the feasibility of computational drug screening for identifying therapeutic candidates for FECD, with LDN193189 and cercosporin showing promise in normalizing FECD-associated pathological changes.
    DOI:  https://doi.org/10.1038/s41598-025-95003-z
  14. Sci Rep. 2025 Apr 26. 15(1): 14664
      Trinucleotide repeat (TNR) expansion in the transcription factor 4 (TCF4) gene represents the most prevalent genetic risk factor for Fuchs endothelial corneal dystrophy (FECD) and may cause dysfunction of splicing regulators. We investigated differential alternative splicing (DAS) events in corneal endothelial cells (CECs) from FECD patients with and without TCF4 TNR expansion through RNA-Seq analysis. We identified distinct splicing profiles among control subjects, FECD patients with TNR expansion, and FECD patients without TNR expansion. Skipped Exon events constituted approximately 50% of all DAS events across all comparisons, with the remaining events distributed among alternative 3' splice site, alternative 5' splice site, mutually exclusive exon, and retained intron categories. Motif analysis in FECD patients with TNR expansion revealed several RNA-binding proteins, including MBNL1, as potential regulators of these splicing alterations. Computational analysis demonstrated that 34% of Skipped Exon events in the TNR expansion group significantly impacted protein structure. This comprehensive analysis revealed distinct alternative splicing signatures in FECD, particularly in cases with TNR expansion, suggesting a crucial role for aberrant splicing in FECD pathogenesis.
    DOI:  https://doi.org/10.1038/s41598-025-92119-0
  15. medRxiv. 2025 Apr 23. pii: 2025.04.18.25325809. [Epub ahead of print]
    PNRR Study Group
       Objective: Biallelic intronic AAGGG repeat expansions in RFC1 cause Cerebellar Ataxia with Neuropathy and Vestibular Areflexia Syndrome and may also contribute to isolated sensory neuropathy. The clinical significance of both heterozygous and biallelic RFC1 expansions in more diverse patient populations remains unclear, partly due to the absence of accurate, user-friendly computational tools specifically tailored for tandem repeat analysis.
    Methods: To discern the relationship between RFC1 expansions and idiopathic peripheral neuropathy (iPN), we performed whole-genome sequencing (WGS) followed by PCR-based confirmation in a large, well-characterized U.S. cohort consisting of 788 iPN patients (369 pure small fiber neuropathy (SFN), 266 sensorimotor, 144 pure sensory, and 9 pure motor). We developed an integrative pipeline combining ExpansionHunter Denovo and ExpansionHunter coupled with unsupervised clustering to reliably detect and genotype RFC1 expansions from short-read WGS data, achieving 98.2% concordance with repeat-primed PCR-based validation.
    Results: Biallelic RFC1 expansions were absent in 879 controls but present in 2.8% of iPN patients (Fisher's exact p = 5.9×10⁻⁸), including 6.2% of pure sensory, 2.2% of SFN, and 1.5% of sensorimotor neuropathy, indicating that motor nerve involvement should not exclude patients from RFC1 repeat screening. We also observed a markedly increased frequency of monoallelic expansions in iPN compared to controls (13.2% versus 2.5%; Fisher's exact p = 3.4×10⁻¹⁷), without evidence of secondary mutations or expansions on the other allele.
    Interpretation: Our approach provides a robust, cost-effective method for detecting RFC1 expansions from WGS data. Our findings indicate that both heterozygous and homozygous AAGGG repeat expansions in RFC1 can contribute to development of iPN.
    DOI:  https://doi.org/10.1101/2025.04.18.25325809
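      The case-control comparison in item 15 rests on Fisher's exact test, which can be sketched with the standard library alone. The carrier count below is an assumption: the abstract reports only that biallelic expansions occurred in 2.8% of 788 patients and in none of 879 controls, so 22 carriers is inferred by rounding. The function name `fisher_exact_two_sided` is illustrative and the authors' actual software is not stated.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))

# Assumed counts: 22 of 788 patients (about 2.8%) versus 0 of 879 controls.
p_value = fisher_exact_two_sided(22, 788 - 22, 0, 879)
```

      Exact integer binomials avoid the underflow that a naive factorial-based implementation would hit at these sample sizes; a statistics package would normally be used instead of hand-rolled code.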
  16. Genes (Basel). 2025 Mar 30. pii: 406. [Epub ahead of print] 16(4):
      Short tandem repeat (STR) sequences are highly variable DNA segments that significantly contribute to human neurodegenerative disorders, highlighting their crucial role in neuropsychiatric conditions. This article examines the pathogenicity of abnormal STRs and classifies tandem repeat expansion disorders (TREDs), emphasizing their genetic characteristics, mechanisms of action, detection methods, and associated animal models. STR expansions exhibit complex genetic patterns that affect the age of onset and symptom severity. These expansions disrupt gene function through mechanisms such as gene silencing, toxic gain-of-function mutations leading to RNA and protein toxicity, and the generation of toxic peptides via repeat-associated non-AUG (RAN) translation. Advances in sequencing technologies, from traditional PCR and Southern blotting to next-generation and long-read sequencing, have enhanced the accuracy of STR variation detection. Research utilizing these technologies has linked STR expansions to a range of neuropsychiatric disorders, including autism spectrum disorders and schizophrenia, highlighting their contribution to disease risk and phenotypic expression through effects on genes involved in neurodevelopment, synaptic function, and neuronal signaling. Therefore, further investigation is essential to elucidate the intricate interplay between STRs and neuropsychiatric diseases, paving the way for improved diagnostic and therapeutic strategies.
    Keywords:  STRs; autism; genetic mechanisms; neuropsychiatric disorders; repeat expansions; schizophrenia; sequencing technologies
    DOI:  https://doi.org/10.3390/genes16040406
  17. Nat Rev Immunol. 2025 Apr 29.
      Transposable elements (TEs) are mobile repetitive nucleic acid sequences that have been incorporated into the genome through spontaneous integration, accounting for almost 50% of human DNA. Even though most TEs are no longer mobile today, studies have demonstrated that they have important roles in different biological processes, such as ageing, embryonic development, and cancer. TEs influence these processes through various mechanisms, including active transposition of TEs contributing to ongoing evolution, transposon transcription generating RNA or protein, and by influencing gene regulation as enhancers. However, how TEs interact with the immune system remains a largely unexplored field. In this Perspective, we describe how TEs might influence different aspects of the immune system, such as innate immune responses, T cell activation and differentiation, and tissue adaptation. Furthermore, TEs can serve as a source of neoantigens for T cells in antitumour immunity. We suggest that TE biology is an important emerging field of immunology and discuss the potential to harness the TE network therapeutically, for example, to improve immunotherapies for cancer and autoimmune and inflammatory diseases.
    DOI:  https://doi.org/10.1038/s41577-025-01172-3
  18. Dev Biol. 2025 Apr 23. pii: S0012-1606(25)00109-5. [Epub ahead of print] 523 111-114
      In germ cells, repressing transposable elements (TEs) is important to maintain genomic integrity. TE expression and transposition are repressed by PIWI-interacting RNAs (piRNAs). Although many genes for piRNA synthesis have been described, few transcription factors activating their expression have been identified. We previously reported that a transcription factor, maternal Ovo (Ovo-B) protein activates germline-specific gene expression in progenitors of germ cells. In this study, we found that maternal Ovo also activates several genes, including aubergine (aub), for TE silencing. Knocking down maternal Ovo de-repressed TEs in adult ovaries. In addition, embryonic knockdown of aub caused de-repression of TEs in adult Drosophila ovaries. Surprisingly, embryonic knockdown of maternal Ovo affected neither expression of ovo nor its downstream TE-silencing genes in adult ovaries after growth. These results strongly suggest that maternal Ovo is required for TE silencing in ovaries, via transcriptional activation of genes for piRNA synthesis in embryos.
    Keywords:  Drosophila; Germline; Ovo; Transposable elements
    DOI:  https://doi.org/10.1016/j.ydbio.2025.04.014
  19. G3 (Bethesda). 2025 Apr 30. pii: jkaf092. [Epub ahead of print]
      Genomic structural variations (SVs) and transposable elements (TEs) can be significant contributors to genome evolution, gene expression alterations, and genetic disease risk. Recent advancements in long-read sequencing have greatly improved the quality of de novo genome assemblies and enhanced the detection of larger and highly repetitive sequence variants at the scale of hundreds or thousands of bases. Comparisons between two diverged wild isolates of Caenorhabditis elegans, the Bristol and Hawaiian strains, have been widely utilized in the analysis of small genetic variations. To comprehensively detect SVs and TEs, we generated de novo genome assemblies and annotations for the N2 Bristol and CB4856 Hawaiian C. elegans strains from our lab collection using both long- and short-read sequencing. Within our lab assemblies, we annotate over 3.1 Mb of sequence divergence between the Bristol and Hawaiian isolates: 246,298 homozygous SNPs, 73,789 homozygous small insertion-deletions (<50 bp), and 4,334 structural variations (>50 bp). We also define the location and movement of specific TEs between N2 Bristol and CB4856 Hawaiian wild type isolates. Specifically, we find the N2 Bristol genome has 20.6% more TEs from the Tc1/mariner family than the CB4856 Hawaiian genome. Moreover, we identified Zator elements as the most abundant and mobile TE family in the genome. Using specific TE sequences with unique SNPs, we also identify 9 TEs that moved intrachromosomally and 8 TEs that moved to new chromosomes between the N2 Bristol and CB4856 Hawaiian genomes. Further, we show an enrichment of genomic variation in transposon sequences and silenced heterochromatic regions of chromosomes in the germline. Taken together, our studies demonstrate how specific regions of the genome, including large scale repetitive regions, are more susceptible to accumulation of genetic variation and changes to genome structure.
    Keywords:  C. elegans; WormBase; genetic drift; genome assembly; genome stability; reference genomes; sequence variation; transposons; whole genome sequencing
    DOI:  https://doi.org/10.1093/g3journal/jkaf092
  20. Genes (Basel). 2025 Mar 29. pii: 397. [Epub ahead of print]16(4):
       BACKGROUND: The African hedgehog (Atelerix albiventris) exhibits specialized skin differentiation leading to spine formation, yet its regulatory mechanisms remain unclear. Transposable elements (TEs), particularly LINEs (long interspersed nuclear elements) and SINEs (short interspersed nuclear elements), are known to influence genome organization and gene regulation.
    OBJECTIVES: Given the high proportion of SINEs in the hedgehog genome, this study aims to characterize the distribution, evolutionary dynamics, and potential regulatory roles of LINEs and SINEs, focusing on their associations with chromatin architecture, DNA methylation, and gene expression.
    METHODS: We analyzed LINE and SINE distribution using HiFi sequencing and classified TE families through phylogenetic reconstruction. Hi-C data were used to explore TE interactions with chromatin architecture, while whole-genome 5mCpG methylation was inferred from PacBio HiFi reads of muscle tissue using a deep-learning-based approach. RNA-seq data from skin tissues were analyzed to assess TE expression and potential associations with genes linked to spine development.
    RESULTS: SINEs form distinct genomic blocks in GC-rich and highly methylated regions, whereas LINEs are enriched in AT-rich, hypomethylated regions. LINEs and SINEs are associated differently with A/B compartments, with SINEs in euchromatin and LINEs in heterochromatin. Methylation analysis suggests that younger TEs tend to have higher methylation levels, and expression analysis indicates that some differentially expressed TEs may be linked to genes involved in epidermal and skeletal development.
    CONCLUSIONS: This study provides a genome-wide perspective on LINE and SINE distribution, methylation patterns, and potential regulatory roles in A. albiventris. While not establishing a direct causal link, the findings suggest that TEs may influence gene expression associated with spine development, offering a basis for future functional studies.
    Keywords:  Atelerix albiventris; LINEs (long interspersed nuclear elements); SINEs (short interspersed nuclear elements); genome structure; repetitive sequences
    DOI:  https://doi.org/10.3390/genes16040397
  21. Genome Biol. 2025 Apr 28. 26(1): 107
       BACKGROUND: The interplay between 3D genomic structure and transposable elements (TEs) in regulating cell state-specific gene expression programs is largely unknown. Here, we explore the utilization of TE-derived enhancers in naïve and expanded pluripotent states by integrative analysis of genome-wide Hi-C-defined enhancer interactions, H3K27ac HiChIP profiling and CRISPR-guided TE proteomics landscape.
    RESULTS: We find that short interspersed nuclear elements (SINEs) are the more involved TEs in the active chromatin and 3D genome architecture. In particular, mammalian-wide interspersed repeat (MIR), a SINE family member, is highly associated with naïve-specific genomic interactions compared to the expanded state. Primarily, in the naïve pluripotent state, the MIR enhancer is co-opted by ESRRB for the naïve-specific gene expression program. This ESRRB and MIR enhancer interaction is crucial for the formation of loops that build a network of enhancers and super-enhancers regulating pluripotency genes. We demonstrate that loss of an ESRRB-bound MIR enhancer impairs self-renewal. We also find that MIR is co-bound by a structural protein complex, ESRRB-YY1, in the naïve pluripotent state.
    CONCLUSIONS: Altogether, our study highlights the topological regulation of ESRRB on MIR in the naïve pluripotent state.
    Keywords:  3D genome; Mouse embryonic stem cells; Pluripotency; Transposable element
    DOI:  https://doi.org/10.1186/s13059-025-03577-8
  22. Sci Rep. 2025 Apr 25. 15(1): 14489
      Transposable elements (TEs) make up 45% of the human genome, are a source of genetic variability that is difficult to detect, and are involved in processes related to gene regulation and disease. Nanopore sequencing is recognized as one of the best technologies for detecting TEs; however, tools for analyzing human TE insertions and deletions with nanopore-based data can be improved. RetroInspector is an easy-to-use, configurable Snakemake pipeline that performs detection, annotation, enrichment, and genotyping of TEs. RetroInspector requires the FASTQ files of the samples and the reference genome to start the identification and analysis of TEs. The user can also set the threshold for the number of supporting reads for the variant filtering. RetroInspector also allows users to compare the results of two samples. Different versions of the reference genome can be used and the presence of retrotransposition features can be annotated. RetroInspector has been run on three nanopore sequencing datasets and validated experimentally using proprietary and public data with over 80% precision.
    DOI:  https://doi.org/10.1038/s41598-025-98847-7
  23. medRxiv. 2025 Apr 25. pii: 2025.04.23.25326276. [Epub ahead of print]
      The TOMM40'523 poly-T repeat polymorphism (rs10524523), located in the TOMM40 gene and in linkage disequilibrium with APOE, has been associated with cognitive decline and Alzheimer's disease (AD) progression. Accurate genotyping of this polymorphism is crucial for understanding its role in neurodegeneration. Challenges in processing whole-genome sequencing (WGS) data traditionally require additional PCR and targeted sequencing assays to genotype these polymorphisms. Here, we introduce a novel computational pipeline that integrates multiple short tandem repeat (STR) detection tools in an ensemble machine learning model using XGBoost. This approach leverages STR tool predictions, k-mer counts, and related features to enhance poly-T repeat length estimation. Using a sample of 1,202 participants from four cohort studies, we benchmarked our method against PCR-based measures. Our ensemble model outperformed individual STR tools, improving repeat length estimation accuracy (R² = 0.92) and achieving an accuracy rate of 93.2% with PCR-derived genotypes as the gold standard. Additionally, we validated our WGS-derived genotypes by replicating previously reported associations between TOMM40'523 variants and cognitive decline, demonstrating consistency with prior findings. Our results suggest that computational genotyping from WGS data is a scalable and reliable alternative to PCR-based assays, enabling broader investigations of TOMM40 variation in studies where WGS data is available.
    DOI:  https://doi.org/10.1101/2025.04.23.25326276
  24. Proc Natl Acad Sci U S A. 2025 May 06. 122(18): e2411446122
      Transposable elements (TEs) make up the bulk of eukaryotic genomes and examples abound of TE-derived sequences repurposed for organismal function. The process by which TEs become coopted remains obscure because most cases involve ancient, transpositionally inactive elements. Reports of active TEs serving beneficial functions are scarce and often contentious due to difficulties in manipulating repetitive sequences. Here, we show that recently active TEs in zebrafish encode products critical for embryonic development. Knockdown and rescue experiments demonstrate that the endogenous retrovirus family BHIKHARI-1 (Bik-1) encodes a Gag protein essential for mesoderm development. Mechanistically, Bik-1 Gag associates with the cell membrane, and its ectopic expression in chicken embryos alters cell migration. Similarly, depletion of BHIKHARI-2 Gag, a relative of Bik-1, causes defects in neural crest development in zebrafish. We propose an "addiction" model to explain how active TEs can be integrated into conserved developmental processes.
    Keywords:  addiction; cooperation; embryogenesis; transposable elements; zebrafish
    DOI:  https://doi.org/10.1073/pnas.2411446122
  25. Biology (Basel). 2025 Apr 20. pii: 448. [Epub ahead of print]14(4):
      This study reports the first complete mitochondrial genome assembly of Hippophae salicifolia, an ecologically and economically important plant endemic to the Himalayas. The 475,105 bp genome has a 44.80% GC content and an overall AT bias, comprising 74 genes (37 protein-coding, 31 tRNA, three rRNA, and three pseudogenes). We identified extensive repetitive elements, including 188 SSRs, 20 tandem repeats, and 455 dispersed repeats, and explored their potential roles in genome evolution. Codon usage analysis showed a bias for codons ending in A or U, while RNA editing analysis revealed 415 sites that mostly convert hydrophilic to hydrophobic amino acids. Phylogenetic and collinearity analyses clarified evolutionary relationships within Hippophae and uncovered genome rearrangements. In addition, extensive gene transfer was detected between the mitochondrial and chloroplast genomes. Ka/Ks and nucleotide diversity analyses indicate that most genes are under purifying selection, with some possibly undergoing positive selection. Overall, these findings enhance our understanding of the structural and evolutionary features of the H. salicifolia mitochondrial genome and provide valuable insights for the genetic improvement and conservation of Hippophae species.
    Keywords:  Hippophae salicifolia; RNA editing; mitochondrial genome; phylogeny; repetitive sequences
    DOI:  https://doi.org/10.3390/biology14040448
  26. Microb Genom. 2025 May;11(5):
      Repeats are the most diverse and dynamic but also the least well-understood component of microbial genomes. For all we know, repeat-associated mutations such as duplications, deletions, inversions and gene conversion might be as common as point mutations, but because of short-read myopia and methodological bias, they have received much less attention. Long-read DNA sequencing opens the perspective of resolving repeats and systematically investigating the mutations they induce. For this study, we assembled the genomes of 16 closely related strains of the bacterial pathogen Mycobacterium tuberculosis from Pacific Biosciences HiFi reads, with the aim of characterizing the full spectrum of DNA polymorphisms. We found that complete and accurate genomes can be assembled from HiFi reads, with read size being the main limitation in the presence of duplications. By combining a reference-free pangenome graph with extensive repeat annotation, we identified 110 variants, 58 of which could be assigned to repeat-associated mutational mechanisms such as strand slippage and homologous recombination. Whilst recombination events were less frequent than point mutations, they affected large regions and introduced multiple variants at once, as shown by three gene conversion events and a duplication of 7.3 kb that involved ppe18 and ppe57, two genes possibly involved in immune subversion. The vast majority of variants were present in single isolates, such that phylogenetic resolution was only marginally increased when estimating a tree from complete genomes. Our study shows that the contribution of repeat-associated mechanisms of mutation can be similar to that of point mutations at the microevolutionary scale of an outbreak. A large reservoir of unstudied genetic variation in this 'monomorphic' bacterial pathogen awaits investigation.
    Keywords:  PE/PPE genes; long-read sequencing; pangenome graph; phylogenetic resolution; recombination; repeats
    DOI:  https://doi.org/10.1099/mgen.0.001396
  27. Genome Biol. 2025 May 02. 26(1): 111
       BACKGROUND: Centromeres play a crucial role in maintaining genomic stability during cell division. They are typically composed of large arrays of tandem satellite repeats, which hinder high-quality assembly and complicate our efforts to understand their evolution across species. Here, we use long-read sequencing to generate near-complete genome assemblies for two Populus and two Salix species belonging to the Salicaceae family and characterize the genetic and epigenetic landscapes of their centromeres.
    RESULTS: The results show that only limited satellite repeats are present as centromeric components in these species, while most of them are located outside the centromere but exhibit a homogenized structure similar to that of the Arabidopsis centromeres. Instead, the Salicaceae centromeres are mainly composed of abundant transposable elements, including CRM and ATHILA, while LINE elements are exclusively discovered in the poplar centromeres. Comparative analysis reveals that these centromeric repeats are extensively expanded and interspersed with satellite arrays in a species-specific and chromosome-specific manner, driving rapid turnover of centromeres both in sequence compositions and genomic locations in the Salicaceae.
    CONCLUSIONS: Our results highlight the dynamic evolution of diverse centromeric landscapes among closely related species, mediated by satellite homogenization and widespread invasions of transposable elements, and shed further light on the role of centromeres in genome evolution and species diversification.
    DOI:  https://doi.org/10.1186/s13059-025-03578-7
  28. Front Plant Sci. 2025;16: 1573546
       Introduction: The rice improvement process, driven by modern breeding techniques, represents the second revolutionary advancement in rice agronomic traits, following domestication. Advances in pan-genomes and enhanced capacity for analyzing structural variations have increasingly highlighted their role in rice genetic improvement. Transposable element (TE) variants have been previously reported to influence rice genomic diversity during domestication, but their contribution to the improvement from landraces to improved varieties remains unclear.
    Methods: Here, we combined a high-quality pan-TE variation map, transcriptome profiles, and phenotypic data for 100 landraces and 92 improved varieties to investigate the contribution of TE variations to phenotypic improvement in rice.
    Results: The total number and length of TE variations in improved varieties were significantly greater than those in rice landraces, particularly for Ty3-retrotransposons, LTR Copia and Helitron elements. Comparing landraces and improved varieties, 4,334 selective TEs were detected within or near 3,070 genes that were enriched in basic metabolism and development and stress resistance. Among the 14,076 differentially expressed genes between the two groups, the expression levels of 3,480 (24.7%) genes were significantly associated with TE variations. Combined with haplotype analysis, we demonstrated potential patterns of how TEs affect gene expression variation and thereby participate in the improvement of important agronomic traits in rice.
    Discussion: Collectively, our results highlight the contributions of TE variations to rice improvement in shaping the genetic basis of modern rice varieties and will facilitate the exploration of superior genes and advance molecular breeding efforts in rice.
    Keywords:  improvement; molecular breeding; rice; super pan-genome; transposable element
    DOI:  https://doi.org/10.3389/fpls.2025.1573546
  29. Plant J. 2025 Apr;122(2): e70153
      Centromeres in eukaryotes are defined by the presence of histone H3 variant CENP-A/CENH3. Chlamydomonas encodes two predicted CENH3 paralogs, CENH3.1 and CENH3.2, that have not been previously characterized. We generated peptide antibodies to unique N-terminal epitopes for each of the two predicted Chlamydomonas CENH3 paralogs as well as an antibody against a shared CENH3 epitope. All three CENH3 antibodies recognized proteins of the expected size on immunoblots and had punctate nuclear immunofluorescence staining patterns. These results are consistent with both paralogs being expressed and localized to centromeres. CRISPR-Cas9-mediated insertional mutagenesis was used to generate predicted null mutations in either CENH3.1 or CENH3.2. Single mutants were viable but cenh3.1 cenh3.2 double mutants were not recovered, confirming that the function of CENH3 is essential. We sequenced and assembled two chromosome-scale Chlamydomonas genomes from strains CC-400 and UL-1690 (a derivative of CC-1690) with complete centromere sequences for 17/17 and 14/17 chromosomes respectively, enabling us to compare centromere evolution across four isolates with near complete assemblies. These data revealed significant changes across isolates between homologous centromeres including mobility and degeneration of ZeppL-LINE1 (ZeppL) transposons that comprise the major centromere repeat sequence in Chlamydomonas. We used cleavage under targets and tagmentation (CUT&Tag) to purify and map CENH3-bound genomic sequences and found enrichment of CENH3-binding almost exclusively at predicted centromere regions. An interesting exception was chromosome 2 in UL-1690, which had enrichment at its genetically mapped centromere repeat region as well as a second, distal location, centered around a single recently acquired ZeppL insertion. The CENH3-bound regions of the 17 Chlamydomonas centromeres ranged from 63.5 kb (average lower estimate) to 175 kb (average upper estimate). The relatively small size of its centromeres suggests that Chlamydomonas may be a useful organism for testing and deploying artificial chromosome technologies.
    Keywords:  CRISPR‐Cas9; Histone H3 variant; artificial chromosome; centromere; genome assembly; neocentromere; transposon
    DOI:  https://doi.org/10.1111/tpj.70153
  30. Sci Rep. 2025 Apr 29. 15(1): 14997
      Micronuclei originate from DNA damage generated by clastogenic and/or aneugenic effects. Depending on the pattern of damage, they may have distinct genomic origin and composition. Sequences of the centromere, telomere and rDNA have been identified in plant micronuclei. However, other DNA sequences may also be present in the micronuclei, and their DNA contents may differ. Here, we investigate the DNA content, genomic composition and origin of micronuclei induced in Zea mays by methyl methanesulfonate (MMS). DNA contents showed a wide range of distribution, suggesting their diverse genomic origins and illustrating how much of the nuclear genome can be lost due to mutagen effects. Micronuclei diversity was also evidenced by in situ probing with different DNA sequences (5S and 18S rDNAs, 180-bp knob and Grande LTR-retrotransposon) and by the 4',6-diamidino-2-phenylindole (DAPI) fluorochrome. Perhaps these sequences are hotspots for MMS damage, especially the Grande LTR-retrotransposon, 5S and 18S rDNAs, which are rich in guanine. In addition, probe pools were constructed from individual genomic DNA of two microdissected micronuclei. These probe pools hybridized on all Z. mays chromosomes. However, the centromere, knob and secondary constriction were hybridized by only one probe pool, evidencing the distinct genomic composition of the micronuclei. We illustrate the micronuclei genomic diversity as they originated from several different chromosomes following the MMS treatment, and demonstrate the extent of the genotoxic damage to the genome. We provide some insights into micronuclei structure and diversity, and show that they can be further explored in mutagenesis research.
    Keywords:  DNA content; Maize; Microdissection; Mutagenesis; Repetitive DNA sequences
    DOI:  https://doi.org/10.1038/s41598-025-99560-1
  31. Plants (Basel). 2025 Apr 14. pii: 1205. [Epub ahead of print]14(8):
      Retrozymes are a class of non-autonomous plant retrotransposons that have long terminal repeats (LTRs) containing hammerhead ribozymes (HHRs) that facilitate the circularization of the retrozyme RNA. The LTR of Nicotiana benthamiana retrozyme 1 (NbRZ1) has been shown to contain a promoter that directs transcription of this retroelement. In this study, we identified the transcription start site of the promoter contained in the LTR of NbRZ1 and mapped the promoter region essential for its transcriptional activity. Using transgenic Arabidopsis thaliana plants carrying the GUS gene under the control of the NbRZ1 LTR, the NbRZ1 transcript was demonstrated to potentially encode a protein targeted for proteasomal degradation in the plant cell. Overexpression of this protein in plants using a viral expression vector was found to cause severe necrosis. The data presented suggest a tight regulation of the expression of the NbRZ1-encoded polypeptide in plants and its potential functional importance, although further research is needed to determine whether circular and/or linear retrozyme RNA forms can be translated in plants.
    Keywords:  LTR promoter; circRNA; circRNA translation; long terminal repeat; retroelements; retrozymes; ribozymes; transcription
    DOI:  https://doi.org/10.3390/plants14081205
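
Illustrative note on entry 23: the abstract mentions k-mer counts and related features as inputs for poly-T repeat length estimation, without giving details. The sketch below shows one way such read-level features could be computed. It is an assumption for illustration only, not the authors' pipeline; the function names, the thresholds (10, 20, 30) and the toy reads are all hypothetical.

```python
import re

def longest_poly_t(read: str) -> int:
    """Return the length of the longest uninterrupted run of T in a read (0 if none)."""
    runs = re.findall(r"T+", read.upper())
    return max((len(run) for run in runs), default=0)

def poly_t_support(reads, thresholds=(10, 20, 30)):
    """For each threshold k, count reads whose longest poly-T run is at least k.

    Hypothetical feature: the thresholds are arbitrary illustration values.
    """
    counts = {k: 0 for k in thresholds}
    for read in reads:
        longest = longest_poly_t(read)
        for k in thresholds:
            if longest >= k:
                counts[k] += 1
    return counts

# Toy reads (made up): runs of 12 T, 25 T, and no long run.
reads = ["ACG" + "T" * 12 + "GCA", "ACG" + "T" * 25 + "GCA", "ACGTACGT"]
features = poly_t_support(reads)
```

Counts like these could be passed, alongside per-tool repeat length calls, as columns of a feature table to a regression model; how the published method actually builds its features is not stated in the abstract.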