Comput Struct Biotechnol J. 2022;20:3718-3728
G Mazzocchetti,
A Poletti,
V Solli,
E Borsi,
M Martello,
I Vigliotta,
S Armuzzi,
B Taurisano,
E Zamagni,
M Cavo,
C Terragna.
Human cancer arises from a population of cells that have acquired a wide range of genetic alterations, most of which are targets of therapeutic treatments or are used as prognostic factors for patient risk stratification. Among these, copy number alterations (CNAs) are quite frequent. Currently, several molecular biology technologies, such as microarrays, NGS, and single-cell approaches, are used to define the genomic profile of tumor samples. The output data must then be analyzed with bioinformatic approaches, in particular computational algorithms. Molecular biology tools estimate the baseline region by comparing either the mean probe signals or the number of reads to the reference genome. However, when tumors display complex karyotypes, this approach can misestimate the baseline region and consequently cause errors in CNA calling. To overcome this issue, we designed an R package, BoBafit, that checks and, where necessary, adjusts the baseline region according to both the tumor-specific alteration context and the sample-specific clustered genomic lesions. Several databases were used to set up and validate the package, demonstrating that BoBafit can adjust copy number (CN) data from different tumors and analysis techniques. Notably, the analysis showed that up to 25% of samples need a baseline region adjustment and a redefinition of CNA calls, with a resulting change in the prognostic risk classification of the patients. We support the implementation of BoBafit within CN analysis bioinformatics pipelines to ensure correct stratification of patients into risk categories, regardless of the tumor type.
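As a rough illustration of why a mean-based baseline can fail in a complex karyotype, the following Python sketch centers a toy hyperdiploid sample on its overall mean and then re-centers it on a set of arms assumed to be unaltered. It is not the BoBafit algorithm: the arm names, the copy number values, the choice of "stable" arms, and the function names `center_on_mean` and `refit_baseline` are all hypothetical, and the median-based correction is only one simple way such an adjustment could be done.

```python
from statistics import median

# Hypothetical per-arm copy number for one hyperdiploid sample:
# six arms are truly diploid (2) and six are truly trisomic (3).
true_cn = {
    "1p": 2, "1q": 2, "10p": 2, "10q": 2, "12p": 2, "12q": 2,
    "3p": 3, "3q": 3, "5p": 3, "5q": 3, "9p": 3, "9q": 3,
}


def center_on_mean(cn):
    """Naive baseline: shift all values so the sample-wide mean sits at 2."""
    shift = sum(cn.values()) / len(cn) - 2
    return {arm: value - shift for arm, value in cn.items()}


def refit_baseline(cn, stable_arms):
    """Re-center so the median of the assumed-stable arms sits at 2.

    Returns the adjusted values and the correction that was subtracted.
    """
    correction = median(cn[arm] for arm in stable_arms) - 2
    adjusted = {arm: value - correction for arm, value in cn.items()}
    return adjusted, correction


# Mean-centering drags the baseline upward because half the arms are gained:
# diploid arms now read 1.5 and trisomic arms read 2.5.
naive = center_on_mean(true_cn)

# Assumption: these arms are rarely altered in this (hypothetical) tumor type.
stable_arms = ["10p", "10q", "12p", "12q"]
adjusted, correction = refit_baseline(naive, stable_arms)

print(naive["10p"], naive["3p"])        # 1.5 2.5
print(correction)                       # -0.5
print(adjusted["10p"], adjusted["3p"])  # 2.0 3.0
```

In this toy case the naive baseline makes diploid arms look partially lost and under-calls the gains, while re-centering on the assumed-stable arms restores values of 2 and 3.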
Keywords: Baseline region; Bioinformatic pipeline; Breast cancer; Clustering methods; Copy number alteration; Data correction; Multiple myeloma
Abbreviations: BAF, B-allele frequency; CN, Copy number; CNAs, Copy number alterations; CNVs, Copy Number Variations; CR, Correction Factor; F-CL, Final Chromosome List; FISH, Fluorescence In Situ Hybridization; HD, Hyperdiploidy; HR, High Risk; LOH, Loss of Heterozygosity; MM, Multiple Myeloma; NGS, Next Generation Sequencing; R-ISS, Revised International Staging System; S-CL, Starting Chromosome List; SNP, Single-Nucleotide Polymorphism; SR, Standard Risk; WES, Whole Exome Sequencing; WGD, Whole-genome doubling