Comput Struct Biotechnol J. 2021;19:3069-3076
Codon degeneracy of amino acid sequences permits an additional "mRNP code" layer underlying the genetic code that is related to RNA processing. In pre-mRNA splicing, splice site usage is determined both by intrinsic splice site strength and by sequence context, which provides RNA binding sites for splicing regulatory proteins. In this study, we systematically examined how the splicing regulatory properties in the neighborhood of a GT site, i.e., a potential splice site, can be modified without altering the encoded amino acids. We quantified the splicing regulatory properties of the neighborhood around a potential splice site by its Splice Site HEXplorer Weight (SSHW), based on the HEXplorer score algorithm. To systematically modify GT site neighborhoods, either minimizing or maximizing their SSHW, we designed the novel stochastic optimization algorithm ModCon, which applies a genetic algorithm with stochastic crossover, insertion and random mutation elements, supplemented by a heuristic sliding window approach. To assess the achievable range of SSHW in human splice donors without altering the encoded amino acids, we applied ModCon to a set of 1000 randomly selected Ensembl annotated human splice donor sites, achieving substantial and accurate changes in SSHW. Using ModCon optimization, we successfully switched splice donor usage in a splice site competition reporter containing coding sequences from FANCA, FANCB or BRCA2, while retaining their amino acid coding information. The ModCon algorithm and its R package implementation can assist in reporter design by introducing novel splice sites, by silencing accidental, undesired splice sites, and by generally modifying the entire mRNP code while maintaining the genetic code.
Keywords: HBond score; HEXplorer score; Splice donor; Splicing regulatory proteins; Splicing reporter; pre-mRNA splicing
Abbreviations: A, adenine; F1, filial sequence 1; G, guanine; GA, genetic algorithm; HBS, HBond score; HZEI, HEXplorer score; P1, parental sequence 1; SA, splice acceptor; SD, splice donor; SR proteins, serine- and arginine-rich proteins; SRP, splicing regulatory protein; SSHW, splice site HEXplorer weight; SW, sliding window; T, thymine; eGFP, enhanced green fluorescent protein; hnRNP, heterogeneous nuclear ribonucleoproteins; nt, nucleotides; pre-mRNA, precursor messenger RNA; snRNA, small nuclear RNA
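
The idea summarized in the abstract, searching the space of synonymous codon choices with a genetic algorithm to push a sequence-based score up or down while keeping the protein unchanged, can be illustrated with a minimal Python sketch. This is not the ModCon implementation and does not use the R package's API. All function names are hypothetical, `toy_score` is a placeholder (purines minus pyrimidines) standing in for the SSHW, which it does not compute, and the insertion and sliding window elements mentioned in the abstract are omitted.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(seq):
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def toy_score(seq):
    """Placeholder objective (assumption): purines minus pyrimidines.
    It only stands in for the SSHW and is NOT the HEXplorer score."""
    return sum(1 if base in "AG" else -1 for base in seq)


def mutate(codons, rng, rate=0.1):
    """Swap each codon for a random synonymous codon with probability `rate`."""
    return [rng.choice(SYNONYMS[CODON_TABLE[c]]) if rng.random() < rate else c
            for c in codons]


def crossover(a, b, rng):
    """One-point crossover at a codon boundary; both parents encode the
    same protein, so the child does too."""
    if len(a) < 2:
        return list(a)
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def recode(seq, score, maximize=True, pop_size=40, generations=60, seed=0):
    """Toy genetic algorithm over synonymous codon choices."""
    rng = random.Random(seed)
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    sign = 1 if maximize else -1

    def fitness(cs):
        return sign * score("".join(cs))

    # Start from the original sequence plus fully randomized synonymous variants.
    population = [codons] + [mutate(codons, rng, rate=1.0)
                             for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # the fitter half survives unchanged
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return "".join(max(population, key=fitness))


if __name__ == "__main__":
    original = "ATGGCTCTGAGTCCGAAA"  # arbitrary example encoding M-A-L-S-P-K
    for goal in (True, False):
        variant = recode(original, toy_score, maximize=goal)
        print(goal, variant, translate(variant), toy_score(variant))
```

Because the original sequence is part of the starting population and the fitter half survives each generation unchanged, the returned sequence never scores worse than the input in the chosen direction, and restricting mutation and crossover to whole codons guarantees that the translation is preserved. Substituting a real splicing regulatory score for `toy_score` would be the obvious next step, but that scoring function is outside this sketch.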