bims-arihec Biomed News
on Artificial intelligence in healthcare
Issue of 2020‒03‒22
thirteen papers selected by
Céline Bélanger
Cogniges Inc.


  1. Database (Oxford). 2020 Jan 01. pii: baaa010. [Epub ahead of print]2020
    Ahmed Z, Mohamed K, Zeeshan S, Dong X.
      Precision medicine is one of the recent and powerful developments in medical care, which has the potential to improve the traditional symptom-driven practice of medicine, allowing earlier interventions using advanced diagnostics and tailoring better and economically personalized treatments. Identifying the best pathway to personalized and population medicine involves the ability to analyze comprehensive patient information together with broader aspects to monitor and distinguish between sick and relatively healthy people, which will lead to a better understanding of biological indicators that can signal shifts in health. While the complexities of disease at the individual level have made it difficult to utilize healthcare information in clinical decision-making, some of the existing constraints have been greatly minimized by technological advancements. To implement effective precision medicine with enhanced ability to positively impact patient outcomes and provide real-time decision support, it is important to harness the power of electronic health records by integrating disparate data sources and discovering patient-specific patterns of disease progression. Useful analytic tools, technologies, databases, and approaches are required to augment networking and interoperability of clinical, laboratory and public health systems, as well as to address ethical and social issues related to the privacy and protection of healthcare data with effective balance. Developing multifunctional machine learning platforms for clinical data extraction, aggregation, management and analysis can support clinicians by efficiently stratifying subjects to understand specific scenarios and optimize decision-making. Implementation of artificial intelligence in healthcare is a compelling vision that has the potential to lead to significant improvements toward the goals of providing real-time, better personalized and population medicine at lower costs. In this study, we focused on analyzing and discussing various published artificial intelligence and machine learning solutions, approaches and perspectives, aiming to advance academic solutions in paving the way for a new data-centric era of discovery in healthcare.
    DOI:  https://doi.org/10.1093/database/baaa010
  2. NPJ Digit Med. 2020 ;3 31
    Kalra S, Tizhoosh HR, Shah S, Choi C, Damaskinos S, Safarpoor A, Shafiei S, Babaie M, Diamandis P, Campbell CJV, Pantanowitz L.
      The emergence of digital pathology has opened new horizons for histopathology. Artificial intelligence (AI) algorithms are able to operate on digitized slides to assist pathologists with different tasks. Whereas AI-involving classification and segmentation methods have obvious benefits for image analysis, image search represents a fundamental shift in computational pathology. Matching the pathology of new patients with already diagnosed and curated cases offers pathologists a new approach to improve diagnostic accuracy through visual inspection of similar cases and computational majority vote for consensus building. In this study, we report the results from searching the largest public repository (The Cancer Genome Atlas, TCGA) of whole-slide images from almost 11,000 patients. We successfully indexed and searched almost 30,000 high-resolution digitized slides constituting 16 terabytes of data comprised of 20 million 1000 × 1000 pixels image patches. The TCGA image database covers 25 anatomic sites and contains 32 cancer subtypes. High-performance storage and GPU power were employed for experimentation. The results were assessed with conservative "majority voting" to build consensus for subtype diagnosis through vertical search and demonstrated high accuracy values for both frozen section slides (e.g., bladder urothelial carcinoma 93%, kidney renal clear cell carcinoma 97%, and ovarian serous cystadenocarcinoma 99%) and permanent histopathology slides (e.g., prostate adenocarcinoma 98%, skin cutaneous melanoma 99%, and thymoma 100%). The key finding of this validation study was that computational consensus appears to be possible for rendering diagnoses if a sufficiently large number of searchable cases are available for each cancer subtype.
    Keywords:  Cancer imaging; Data mining; Machine learning
    DOI:  https://doi.org/10.1038/s41746-020-0238-2
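The consensus step described in the abstract above (a majority vote over the diagnoses of retrieved similar cases) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `majority_vote`, the tie-handling rule, and the example labels are assumptions introduced here.

```python
from collections import Counter


def majority_vote(labels):
    """Return (label, vote_share) for the most frequent label.

    Hypothetical helper: returns (None, vote_share) when the top two
    labels tie, so that no consensus is reported in ambiguous cases.
    """
    if not labels:
        raise ValueError("need at least one retrieved case")
    counts = Counter(labels).most_common()
    top_label, top_count = counts[0]
    share = top_count / len(labels)
    if len(counts) > 1 and counts[1][1] == top_count:
        return None, share  # tie: refuse to name a consensus subtype
    return top_label, share


# Made-up subtype labels of the top search hits for one query slide.
retrieved = ["thymoma", "thymoma", "thymoma", "skin cutaneous melanoma"]
consensus, share = majority_vote(retrieved)
print(consensus, share)
```

Refusing to answer on a tie is one possible reading of a "conservative" vote; the study may define it differently.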
  3. J Neural Eng. 2020 Mar 19.
    Jin W, Fatehi M, Abhishek K, Mallya M, Toyota B, Hamarneh G.
      Primary brain tumors including gliomas continue to pose significant management challenges to clinicians. While the presentation, the pathology, and the clinical course of these lesions are variable, the initial investigations are usually similar. Patients who are suspected to have a brain tumor will be assessed with computed tomography (CT) and magnetic resonance imaging (MRI). The imaging findings are used by neurosurgeons to determine the feasibility of surgical resection and plan such an undertaking. Imaging studies are also an indispensable tool in tracking tumor progression or its response to treatment. As these imaging studies are non-invasive, relatively cheap and accessible to patients, there have been many efforts over the past two decades to increase the amount of clinically-relevant information that can be extracted from brain imaging. Most recently, artificial intelligence (AI) techniques have been employed to segment and characterize brain tumors, as well as to detect progression or treatment-response. However, the clinical utility of such endeavours remains limited due to challenges in data collection and annotation, model training, and the reliability of AI-generated information. We provide a review of recent advances in addressing the above challenges. First, to overcome the challenge of data paucity, different image imputation and synthesis techniques along with annotation collection efforts are summarized. Next, various training strategies are presented to meet multiple desiderata, such as model performance, generalization ability, data privacy protection, and learning with sparse annotations. Finally, standardized performance evaluation and model interpretability methods have been reviewed. We believe that these technical approaches will facilitate the development of a fully-functional AI tool in the clinical care of patients with gliomas.
    Keywords:  Brain radiomics; Deep learning; Glioma imaging; Machine learning
    DOI:  https://doi.org/10.1088/1741-2552/ab8131
  4. Neurorehabil Neural Repair. 2020 Mar 20. 1545968320909796
    Tozlu C, Edwards D, Boes A, Labar D, Tsagaris KZ, Silverstein J, Pepper Lane H, Sabuncu MR, Liu C, Kuceyeski A.
      Background. Accurate prediction of clinical impairment in upper-extremity motor function following therapy in chronic stroke patients is a difficult task for clinicians but is key in prescribing appropriate therapeutic strategies. Machine learning is a highly promising avenue with which to improve prediction accuracy in clinical practice. Objectives. The objective was to evaluate the performance of 5 machine learning methods in predicting postintervention upper-extremity motor impairment in chronic stroke patients using demographic, clinical, neurophysiological, and imaging input variables. Methods. A total of 102 patients (female: 31%, age 61 ± 11 years) were included. The upper-extremity Fugl-Meyer Assessment (UE-FMA) was used to assess motor impairment of the upper limb before and after intervention. Elastic net (EN), support vector machines, artificial neural networks, classification and regression trees, and random forest were used to predict postintervention UE-FMA. The performances of methods were compared using cross-validated R². Results. EN performed significantly better than other methods in predicting postintervention UE-FMA using demographic and baseline clinical data (median R²_EN = 0.91, R²_RF = 0.88, R²_ANN = 0.83, R²_SVM = 0.79, R²_CART = 0.70; P < .05). Preintervention UE-FMA and the difference in motor threshold (MT) between the affected and unaffected hemispheres were the strongest predictors. The difference in MT had greater importance than the absence or presence of a motor-evoked potential (MEP) in the affected hemisphere. Conclusion. Machine learning methods may enable clinicians to accurately predict a chronic stroke patient's postintervention UE-FMA. Interhemispheric difference in the MT is an important predictor of chronic stroke patients' response to therapy and, therefore, could be included in prospective studies.
    Keywords:  Fugl-Meyer Assessment; chronic stroke; machine learning; predictive models; white matter disconnectivity
    DOI:  https://doi.org/10.1177/1545968320909796
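The cross-validated R² used to compare methods in the abstract above can be illustrated with a small standard-library sketch. Several choices here are assumptions rather than the study's protocol: the predictor is a one-variable least-squares line (not elastic net), folds are interleaved, held-out predictions are pooled before computing R², and all data are synthetic.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_r2(xs, ys, k=5):
    """Pool held-out predictions from k interleaved folds, then score them."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):  # indices held out in this fold
            preds[i] = slope * xs[i] + intercept
    return r_squared(ys, preds)


# Synthetic, made-up data: a baseline score and a noisy follow-up score.
baseline = [10, 14, 18, 22, 26, 30, 34, 38, 42, 46]
followup = [13, 16, 22, 24, 30, 33, 39, 40, 47, 49]
print(round(cross_validated_r2(baseline, followup), 3))
```

Because each prediction comes from a model that never saw that patient, this score is typically lower than the R² of a fit on all the data, which is why it is the fairer basis for comparing methods.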
  5. Eur J Radiol. 2020 Mar 05. pii: S0720-048X(20)30107-8. [Epub ahead of print]126 108918
    Winkel DJ, Weikert TJ, Breit HC, Chabin G, Gibson E, Heye TJ, Comaniciu D, Boll DT.
      PURPOSE: To evaluate the performance of an artificial intelligence (AI) based software solution tested on liver volumetric analyses and to compare the results to the manual contour segmentation.
    MATERIALS AND METHODS: We retrospectively obtained 462 multiphasic CT datasets with six series for each patient: three different contrast phases and two slice thickness reconstructions (1.5/5 mm), totaling 2772 series. AI-based liver volumes were determined using multi-scale deep-reinforcement learning for 3D body markers detection and 3D structure segmentation. The algorithm was trained for liver volumetry on approximately 5000 datasets. We computed the absolute error of each automatically- and manually-derived volume relative to the mean manual volume. The mean processing time/dataset and method was recorded. Variations of liver volumes were compared using univariate generalized linear model analyses. A subgroup of 60 datasets was manually segmented by three radiologists, with a further subgroup of 20 segmented three times by each, to compare the automatically-derived results with the ground-truth.
    RESULTS: The mean absolute error of the automatically-derived measurement was 44.3 mL (representing 2.37 % of the averaged liver volumes). The liver volume was neither dependent on the contrast phase (p = 0.697), nor on the slice thickness (p = 0.446). The mean processing time/dataset with the algorithm was 9.94 s (sec) compared to manual segmentation with 219.34 s. We found an excellent agreement between both approaches with an ICC value of 0.996.
    CONCLUSION: The results of our study demonstrate that AI-powered fully automated liver volumetric analyses can be done with excellent accuracy, reproducibility, robustness, speed and agreement with the manual segmentation.
    Keywords:  Algorithms; Artificial intelligence; Liver; Reproducibility of results; Tomography; X-ray computed
    DOI:  https://doi.org/10.1016/j.ejrad.2020.108918
  6. Front Oncol. 2020 ;10 248
    Shen X, Yang F, Yang P, Yang M, Xu L, Zhuo J, Wang J, Lu D, Liu Z, Zheng SS, Niu T, Xu X.
      Background: Serous cystadenoma (SCA), mucinous cystadenoma (MCN), and intraductal papillary mucinous neoplasm (IPMN) are three subtypes of pancreatic cystic neoplasm (PCN). Due to the potential for malignant transformation, patients with MCN and IPMN require radical surgery while patients with SCA need periodic surveillance. However, accurate pre-surgery diagnosis between SCA, MCN, and IPMN remains challenging in the clinic. Methods: This study enrolled 164 patients including 76 with SCA, 40 with MCN and 48 with IPMN. Patients were randomly split into a training cohort (n = 115) and validation cohort (n = 49). We performed statistical analysis and the Boruta method to screen significantly distinct clinical factors and radiomics features extracted on pre-surgery contrast-enhanced computed tomography (CECT) images among three subtypes. Three reliable machine-learning algorithms, support vector machine (SVM), random forest (RF) and artificial neural network (ANN), were utilized to construct classifiers based on important radiomics features and clinical parameters. Precision, recall, and F1-score were calculated to assess the performance of the constructed classifiers. Results: Nine of 547 radiomics features and eight clinical factors showed a significant difference among SCA, MCN, and IPMN. Five radiomics features (Histogram_Entropy, Histogram_Skeweness, LLL_GLSZM_GLV, Histogram_Uniformity, HHL_Histogram_Kurtosis), and four clinical factors, including serum carbohydrate antigen 19-9, sex, age, and serum carcinoembryonic antigen, were identified as important by the Boruta method. The SVM classifier achieved an overall accuracy of 73.04% in the training cohort and 71.43% in the validation cohort. The RF classifier achieved an overall accuracy of 84.35% and 79.59%, respectively. The constructed ANN model showed an overall accuracy of 77.39% in the training dataset and 71.43% in the validation dataset. All three classifiers showed a high F1 score for differentiation among the three subtypes. Conclusion: Our study proved the feasibility and translational value of CECT-based radiomics classifiers for differentiation among SCA, MCN, and IPMN.
    Keywords:  contrast-enhanced computed tomography; differentiation diagnosis; machine learning; pancreatic cystic neoplasm; radiomics
    DOI:  https://doi.org/10.3389/fonc.2020.00248
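The per-class precision, recall, and F1-score named in the abstract above can be computed one-vs-rest for a three-class problem. The sketch below is illustrative only: the function name `per_class_metrics` is an assumption, and the labels and predictions are invented, not study data.

```python
def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Report 0.0 instead of dividing by zero when a class is never
        # predicted (precision) or never present (recall).
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Made-up ground truth and classifier output for six hypothetical patients.
truth = ["SCA", "SCA", "MCN", "IPMN", "IPMN", "MCN"]
predicted = ["SCA", "MCN", "MCN", "IPMN", "SCA", "MCN"]
for label, (p, r, f1) in per_class_metrics(truth, predicted).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Reporting these per class matters when, as here, the classes are imbalanced and a single overall accuracy can hide poor performance on the smaller subtypes.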
  7. Ann Transl Med. 2020 Feb;8(4): 82
    Luo Y, Tang Z, Hu X, Lu S, Miao B, Hong S, Bai H, Sun C, Qiu J, Liang H, Na N.
      Background: Pneumonia accounts for the majority of infection-related deaths after kidney transplantation. We aimed to build a predictive model based on machine learning for severe pneumonia in recipients of deceased-donor transplants within the perioperative period after surgery.
    Methods: We collected the features of kidney transplant recipients and used a tree-based ensemble classification algorithm (Random Forest or AdaBoost) and a nonensemble classifier (support vector machine, Naïve Bayes, or logistic regression) to build the predictive models. We used the area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC) to evaluate the predictive performance via ten-fold cross validation.
    Results: Five hundred nineteen patients who underwent transplantation from January 2015 to December 2018 were included. Forty-three severe pneumonia episodes (8.3%) occurred during hospitalization after surgery. Significant differences in the recipients' age, diabetes status, HBsAg level, operation time, reoperation, usage of anti-fungal drugs, preoperative albumin and immunoglobulin levels, preoperative pulmonary lesions, and delayed graft function, as well as donor age, were observed between patients with and without severe pneumonia (P<0.05). We screened eight important features correlated with severe pneumonia using the recursive feature elimination method and then constructed a predictive model based on these features. The top three features were preoperative pulmonary lesions, reoperation and recipient age (with importance scores of 0.194, 0.124 and 0.078, respectively). Among the machine learning algorithms described above, the Random Forest algorithm displayed better predictive performance, with a sensitivity of 0.67, specificity of 0.97, positive likelihood ratio of 22.33, negative likelihood ratio of 0.34, AUROC of 0.91, and AUPRC of 0.72.
    Conclusions: The Random Forest model is potentially useful for predicting severe pneumonia in kidney transplant recipients. Recipients with a potential preoperative pulmonary infection, who are of older age and who require reoperation should be monitored carefully to prevent the occurrence of severe pneumonia.
    Keywords:  Kidney transplantation; deceased donor; machine learning; predictive models; severe pneumonia
    DOI:  https://doi.org/10.21037/atm.2020.01.09
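The likelihood ratios reported in the abstract above follow from sensitivity and specificity by the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. A short check, with `likelihood_ratios` as a name introduced here for illustration:

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    if not (0 <= sensitivity <= 1 and 0 < specificity < 1):
        raise ValueError("sensitivity must be in [0, 1], specificity in (0, 1)")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Values reported for the Random Forest model in the abstract.
lr_pos, lr_neg = likelihood_ratios(0.67, 0.97)
print(round(lr_pos, 2), round(lr_neg, 2))  # 22.33 0.34
```

The output reproduces the reported positive and negative likelihood ratios of 22.33 and 0.34, so the four figures in the abstract are mutually consistent.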
  8. Eur Cardiol. 2020 Feb;15 1-7
    Heseltine TD, Murray SW, Ruzsics B, Fisher M.
      Recent rapid technological advancements in cardiac CT have improved image quality and reduced radiation exposure to patients. Furthermore, key insights from large cohort trials have helped delineate cardiovascular disease risk as a function of overall coronary plaque burden and the morphological appearance of individual plaques. The advent of CT-derived fractional flow reserve promises to establish an anatomical and functional test within one modality. Recent data examining the short-term impact of CT-derived fractional flow reserve on downstream care and clinical outcomes have been published. In addition, machine learning is a concept that is being increasingly applied to diagnostic medicine. Over the coming decade, machine learning will begin to be integrated into cardiac CT, and will potentially make a tangible difference to how this modality evolves. The authors have performed an extensive literature review and comprehensive analysis of the recent advances in cardiac CT. They review how recent advances currently impact on clinical care and potential future directions for this imaging modality.
    Keywords:  CT coronary angiography; Cardiac CT; atherosclerosis; cardiovascular disease risk; coronary artery calcium score; coronary artery disease; fractional flow reserve CT; machine learning
    DOI:  https://doi.org/10.15420/ecr.2019.14.2
  9. Eur J Radiol. 2020 Mar 09. pii: S0720-048X(20)30114-5. [Epub ahead of print]126 108925
    Blüthgen C, Becker AS, Vittoria de Martini I, Meier A, Martini K, Frauenfelder T.
      PURPOSE: To evaluate a deep learning based image analysis software for the detection and localization of distal radius fractures.
    METHOD: A deep learning system (DLS) was trained on 524 wrist radiographs (166 showing fractures). Performance was tested on internal (100 radiographs, 42 showing fractures) and external test sets (200 radiographs, 100 showing fractures). Single and combined views of the radiographs were shown to DLS and three readers. Readers were asked to indicate fracture location with regions of interest (ROI). The DLS yielded scores (range 0-1) and a heatmap. Detection performance was expressed as AUC, sensitivity and specificity at the optimal threshold and compared to radiologists' performance. Heatmaps were compared to radiologists' ROIs.
    RESULTS: The DLS showed excellent performance on the internal test set (AUC 0.93 (95% confidence interval (CI) 0.82-0.98) - 0.96 (0.87-1.00), sensitivity 0.81 (0.58-0.95) - 0.90 (0.70-0.99), specificity 0.86 (0.68-0.96) - 1.0 (0.88-1.0)). DLS performance decreased on the external test set (AUC 0.80 (0.71-0.88) - 0.89 (0.81-0.94), sensitivity 0.64 (0.49-0.77) - 0.92 (0.81-0.98), specificity 0.60 (0.45-0.74) - 0.90 (0.78-0.97)). Radiologists' performance was comparable on internal data (sensitivity 0.71 (0.48-0.89) - 0.95 (0.76-1.0), specificity 0.52 (0.32-0.71) - 0.97 (0.82-1.0)) and better on external data (sensitivity 0.88 (0.76-0.96) - 0.98 (0.89-1.0), specificities 0.66 (0.51-0.79) - 1.0 (0.93-1.0), p < 0.05). In over 90%, the areas of peak activation aligned with radiologists' annotations.
    CONCLUSIONS: The DLS was able to detect and localize wrist fractures with a performance comparable to radiologists, using only a small dataset for training.
    DOI:  https://doi.org/10.1016/j.ejrad.2020.108925
  10. Otol Neurotol. 2020 Apr;41(4): 452-457
    Waltzman SB, Kelsall DC.
      OBJECTIVE: Cochlear implant (CI) technology and techniques have advanced over the years. There has not been the same degree of change in programming and there remains a lack of standardization techniques. The purpose of this study is to compare performance in cochlear implant subjects using experienced clinician (EC) standard programming methods versus an Artificial Intelligence, FOX based algorithm for programming.
    STUDY DESIGN: Prospective, nonrandomized, multicenter study using within-subject experimental design.
    SETTING: Tertiary referral centers.
    PATIENTS: Fifty-five adult patients with ≥ 3 months experience with a Nucleus 5, 6, Kanso, or 7 series sound processor.
    INTERVENTION: Therapeutic.
    MAIN OUTCOME MEASURES: CNC words and AzBio sentences in noise (+10 dB SNR) tests were administered in a soundproof booth followed by a direct connect psychoacoustic battery using the EC program. Tests were repeated 1 month later using the optimized FOX program. Subjective measures of patient satisfaction were also measured.
    RESULTS: Performance for the EC program was compared to the FOX program for both measures. Group mean results revealed equivalent performance (Kruskal-Wallis ANOVA p = 0.934) with both programming methods. While some patients had better performance with the FOX method and some performed more poorly, the majority had equivalent performance and preferred the FOX system.
    CONCLUSION: The study demonstrated that on average, FOX outcomes are equivalent to those using traditional programming techniques. In addition, the FOX programming method can effect standardization across centers and increase access for many individuals who could benefit.
    DOI:  https://doi.org/10.1097/MAO.0000000000002566
  11. BMC Cancer. 2020 Mar 17. 20(1): 227
    Kawauchi K, Furuya S, Hirata K, Katoh C, Manabe O, Kobayashi K, Watanabe S, Shiga T.
      BACKGROUND: As the number of PET/CT scanners increases and FDG PET/CT becomes a common imaging modality for oncology, the demands for automated detection systems on artificial intelligence (AI) to prevent human oversight and misdiagnosis are rapidly growing. We aimed to develop a convolutional neural network (CNN)-based system that can classify whole-body FDG PET as 1) benign, 2) malignant or 3) equivocal.
    METHODS: This retrospective study investigated 3485 sequential patients with malignant or suspected malignant disease, who underwent whole-body FDG PET/CT at our institute. All the cases were classified into the 3 categories by a nuclear medicine physician. A residual network (ResNet)-based CNN architecture was built for classifying patients into the 3 categories. In addition, we performed a region-based analysis of CNN (head-and-neck, chest, abdomen, and pelvic region).
    RESULTS: There were 1280 (37%), 1450 (42%), and 755 (22%) patients classified as benign, malignant and equivocal, respectively. In the patient-based analysis, CNN predicted benign, malignant and equivocal images with 99.4, 99.4, and 87.5% accuracy, respectively. In region-based analysis, the prediction was correct with the probability of 97.3% (head-and-neck), 96.6% (chest), 92.8% (abdomen) and 99.6% (pelvic region), respectively.
    CONCLUSION: The CNN-based system reliably classified FDG PET images into 3 categories, indicating that it could be helpful for physicians as a double-checking system to prevent oversight and misdiagnosis.
    Keywords:  Convolutional neural network; Deep learning; FDG; PET
    DOI:  https://doi.org/10.1186/s12885-020-6694-x
  12. Am J Gastroenterol. 2020 Mar 18.
    Bang JY, Hough M, Hawes RH, Varadarajulu S.
      OBJECTIVES: Exposure to ionizing radiation remains a hazard for patients and healthcare providers. We evaluated the utility of an artificial intelligence (AI)-enabled fluoroscopy system to minimize radiation exposure during image-guided endoscopic procedures.
    METHODS: We conducted a prospective study of 100 consecutive patients who underwent fluoroscopy-guided endoscopic procedures. Patients underwent interventions using either a conventional or an AI-equipped fluoroscopy system that uses ultrafast collimation to limit radiation exposure to the region of interest. The main outcome measure was radiation exposure to patients, which was measured by dose area product. The secondary outcome was radiation scatter to endoscopy personnel, measured using a dosimeter.
    RESULTS: Of 100 patients who underwent procedures using traditional (n = 50) or AI-enabled (n = 50) fluoroscopy systems, there was no significant difference in demographics, body mass index, procedural type, and procedural or fluoroscopy time between the conventional and the AI-enabled fluoroscopy systems. Radiation exposure to patients was lower (median dose area product 2,178 vs 5,708 mGym, P = 0.001) and scatter effect to endoscopy personnel was less (total deep dose equivalent 0.28 vs 0.69 mSv; difference of 59.4%) for AI-enabled fluoroscopy as compared to conventional system. On multivariate linear regression analysis, after adjusting for patient characteristics, procedural/fluoroscopy duration, and type of fluoroscopy system, only AI-equipped fluoroscopy system (coefficient 3,331.9 [95% confidence interval: 1,926.8-4,737.1], P < 0.001) and fluoroscopy duration (coefficient 813.2 [95% confidence interval: 640.5-985.9], P < 0.001) were associated with radiation exposure.
    DISCUSSION: The AI-enabled fluoroscopy system significantly reduces radiation exposure to patients and scatter effect to endoscopy personnel (see Graphical abstract, Supplementary Digital Content, http://links.lww.com/AJG/B461).
    DOI:  https://doi.org/10.14309/ajg.0000000000000565
  13. Gland Surg. 2020 Feb;9(Suppl 2): S77-S85
    Barczyński M, Stopa-Barczyńska M, Wojtczak B, Czarniecka A, Konturek A.
      Background: In recent years, well-recognized scientific societies have introduced guidelines for ultrasound (US) malignancy risk stratification of thyroid nodules. These guidelines categorize the risk of malignancy in relation to a combination of several US features. Based on these US image lexicons, US-based computer-aided diagnosis (CAD) systems were developed. Nevertheless, their clinical utility has not been evaluated in any study of surgeon-performed office US of the thyroid. Hence, the aim of this pilot study was to validate S-Detect™ mode in semi-automated US classification of thyroid lesions during surgeon-performed office US.
    Methods: This is a prospective study of 50 patients who underwent surgeon-performed thyroid US (basic US skills without CAD vs. with CAD vs. expert US skills without CAD) in the out-patient office as part of the preoperative workup. The real-time CAD system software using artificial intelligence (S-Detect™ for Thyroid; Samsung Medison Co.) was integrated into the RS85 US system. Primary outcome was CAD system added-value to the surgeon-performed office US evaluation. Secondary outcomes were: diagnostic accuracy of CAD system, intra- and interobserver variability in the US assessment of thyroid nodules. Surgical pathology report was used to validate the pre-surgical diagnosis.
    Results: CAD system added-value to thyroid assessment by a surgeon with basic US skills was equal to 6% (overall accuracy of 82% for evaluation with CAD vs. 76% for evaluation without CAD system; P<0.001), and final diagnosis was different than predicted by US assessment in 3 patients (1 more true-positive and 2 more true-negative results). However, CAD system was inferior to thyroid assessment by a surgeon with expert US skills in 6 patients who had false-positive results (P<0.001).
    Conclusions: The sensitivity and negative predictive value of the CAD system for US classification of thyroid lesions were similar to those of a surgeon with expert US skills, whereas specificity and positive predictive value were significantly inferior but markedly better than the judgement of a surgeon with basic US skills alone.
    Keywords:  Thyroid lesions; artificial intelligence; computer-aided diagnosis (CAD); thyroid cancer; thyroid ultrasound
    DOI:  https://doi.org/10.21037/gs.2019.12.23