bims-arihec Biomed News
on Artificial Intelligence in Healthcare
Issue of 2020‒01‒12
twenty-six papers selected by
Céline Bélanger
Cogniges Inc.


  1. Neurospine. 2019 Dec;16(4): 643-653
      Developments in machine learning in recent years have precipitated a surge in research on the applications of artificial intelligence within medicine. Machine learning algorithms are beginning to impact medicine broadly, and the field of spine surgery is no exception. Electronic medical records are a key source of medical data that can be leveraged for the creation of clinically valuable machine learning algorithms. This review examines the current state of machine learning using electronic medical records as it applies to spine surgery. Studies across the electronic medical record data domains of imaging, text, and structured data are reviewed. Discussed applications include clinical prognostication, preoperative planning, diagnostics, and dynamic clinical assistance, among others. The limitations and future challenges for machine learning research using electronic medical records are also discussed.
    Keywords:  Artificial intelligence; Deep learning; Electronic medical records; Machine learning; Spine surgery
    DOI:  https://doi.org/10.14245/ns.1938386.193
  2. Front Neurosci. 2019;13:1346
      The use of artificial intelligence (AI) and machine learning in basic research and clinical neuroscience is increasing. AI methods enable the interpretation of large multimodal datasets that can provide unbiased insights into the fundamental principles of brain function, potentially paving the way for earlier and more accurate detection of brain disorders and better-informed intervention protocols. Despite AI's ability to produce accurate predictions and classifications, in most cases it lacks the ability to provide a mechanistic understanding of how inputs and outputs relate to each other. Explainable Artificial Intelligence (XAI) is a new set of techniques that attempts to provide such an understanding; here we report on some of these practical approaches. We discuss the potential value of XAI to the field of neurostimulation for both basic scientific inquiry and therapeutic purposes, as well as outstanding questions and obstacles to the success of the XAI approach.
    Keywords:  behavioral paradigms; closed-loop neurostimulation; computational psychiatry; data-driven discoveries of brain circuit theories; explainable AI; machine learning; neuro-behavioral decision systems
    DOI:  https://doi.org/10.3389/fnins.2019.01346
  3. Neurospine. 2019 Dec;16(4): 686-694
      Adult spinal deformity (ASD) is a complex disease that significantly affects the lives of many patients. Surgical correction has proven to be effective in achieving improvement of spinopelvic parameters as well as improving quality of life (QoL) for these patients. However, given the relatively high complication risk associated with ASD correction, it is of paramount importance to develop robust prognostic tools for predicting risk profile and outcomes. Historically, statistical models such as linear and logistic regression models were used to identify preoperative factors associated with postoperative outcomes. While these tools were useful for looking at simple associations, they represent generalizations across large populations, with little applicability to individual patients. More recently, predictive analytics utilizing artificial intelligence (AI) through machine learning for comprehensive processing of large amounts of data have become available for surgeons to implement. The use of these computational techniques has given surgeons the ability to leverage far more accurate and individualized predictive tools to better inform individual patients regarding predicted outcomes after ASD correction surgery. Applications range from predicting QoL measures to predicting the risk of major complications, hospital readmission, and reoperation rates. In addition, AI has been used to create a novel classification system for ASD patients, which will help surgeons identify distinct patient subpopulations with unique risk-benefit profiles. Overall, these tools will help surgeons tailor their clinical practice to address patients' individual needs and create an opportunity for personalized medicine within spine surgery.
    Keywords:  Artificial intelligence; Machine learning; Spinal deformity; Technology
    DOI:  https://doi.org/10.14245/ns.1938414.207
  4. Med Res Rev. 2020 Jan 10.
      Discovery and development of biopeptides are time-consuming, laborious, and dependent on various factors. Data-driven computational methods, especially the machine learning (ML) approach, can rapidly and efficiently predict the utility of therapeutic peptides. ML methods offer an array of tools that can accelerate and enhance decision making and discovery for well-defined queries with ample, high-quality data. Various ML approaches, such as support vector machines, random forest, extremely randomized trees, and more recently deep learning methods, are useful in peptide-based drug discovery. These approaches leverage the peptide data sets, created via high-throughput sequencing and computational methods, and enable the prediction of functional peptides with increased levels of accuracy. The use of ML approaches in the development of peptide-based therapeutics is relatively recent; however, these techniques are already revolutionizing protein research by unraveling novel therapeutic functions of peptides. In this review, we discuss several ML-based state-of-the-art peptide-prediction tools and compare these methods in terms of their algorithms, feature encodings, prediction scores, evaluation methodologies, and software utilities. We also assess the prediction performance of these methods using well-constructed independent data sets. In addition, we discuss the common pitfalls and challenges of using ML approaches for peptide therapeutics. Overall, we show that using ML models in peptide research can streamline the development of targeted peptide therapies.
    Keywords:  artificial intelligence; disease; machine learning; peptide therapeutics; random forest; support vector machine
    DOI:  https://doi.org/10.1002/med.21658
  5. Hepatology. 2020 Jan 06.
      Machine learning utilizes artificial intelligence to generate predictive models more efficiently and effectively than conventional methods through detection of hidden patterns within large data sets. With this in mind, there are several areas within hepatology where these methods can be applied. In this review, we examine the literature pertaining to machine learning in hepatology and liver transplant medicine. We provide an overview of the strengths and limitations of machine learning tools, and their potential applications to both clinical and molecular data in hepatology. Machine learning has been applied to various types of data in liver disease research, including clinical, demographic, molecular, radiologic, and pathologic data. We anticipate that the use of ML tools to generate predictive algorithms will change the face of clinical practice in hepatology and transplantation. This review will provide readers with the opportunity to learn about the ML tools available and their potential applications to questions of interest in hepatology.
    DOI:  https://doi.org/10.1002/hep.31103
  6. BMC Med Inform Decis Mak. 2020 Jan 06. 20(1): 3
      BACKGROUND: We used the Surveillance, Epidemiology, and End Results (SEER) database to develop and validate deep survival neural network machine learning (ML) algorithms to predict survival following a spino-pelvic chondrosarcoma diagnosis.
    METHODS: Data from the SEER 18 registries were used to build the Risk Estimate Distance Survival Neural Network (RED_SNN) model. The model was evaluated at each time window with receiver operating characteristic curves and areas under the curves (AUCs), as well as with the concordance index (c-index).
    RESULTS: The subjects (n = 1088) were separated into training (80%, n = 870) and test sets (20%, n = 218). The training data were randomly split into training and validation sets using 5-fold cross-validation. The median c-index of the five validation sets was 0.84 (95% confidence interval 0.79-0.87). The median AUC of the five validation subsets was 0.84. This model was evaluated with the previously separated test set. The c-index was 0.82 and the mean AUC of the 30 different time windows was 0.85 (standard deviation 0.02). According to the estimated survival probability (by 62 months), we divided the test group into five subgroups. The survival curves of the subgroups showed statistically significant separation (p < 0.001).
    CONCLUSIONS: This study is the first to analyze population-level data using artificial neural network ML algorithms for the role and outcomes of surgical resection and radiation therapy in spino-pelvic chondrosarcoma.
    Keywords:  Artificial intelligence; Chondrosarcoma; Neural network; Prediction; Survival
    DOI:  https://doi.org/10.1186/s12911-019-1008-4
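    For readers unfamiliar with the evaluation scheme above (80%/20% split, 5-fold cross-validation, concordance index), the following minimal Python sketch illustrates it with a Cox proportional hazards model standing in for the RED_SNN survival network; the column names and synthetic data are hypothetical, not the study's code.

        # Sketch of the split / 5-fold CV / c-index evaluation described above.
        # A Cox model stands in for the survival neural network; data are synthetic.
        import numpy as np
        import pandas as pd
        from lifelines import CoxPHFitter
        from lifelines.utils import concordance_index
        from sklearn.model_selection import KFold, train_test_split

        rng = np.random.default_rng(0)
        df = pd.DataFrame({"age": rng.integers(20, 85, 1088),
                           "tumor_size_cm": rng.uniform(1, 15, 1088),
                           "months": rng.integers(1, 120, 1088),
                           "dead": rng.integers(0, 2, 1088)})
        train, test = train_test_split(df, test_size=0.20, random_state=0)

        cv_cindex = []
        for tr_idx, va_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(train):
            tr, va = train.iloc[tr_idx], train.iloc[va_idx]
            cph = CoxPHFitter().fit(tr, duration_col="months", event_col="dead")
            risk = cph.predict_partial_hazard(va)          # higher = shorter survival
            cv_cindex.append(concordance_index(va["months"], -risk, va["dead"]))
        print("median validation c-index:", float(pd.Series(cv_cindex).median()))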
  7. Neurospine. 2019 Dec;16(4): 669-677
      The potential of big data analytics to improve the quality of care for patients with spine tumors is significant. At this moment, the application of big data analytics to oncology and spine surgery is at a nascent stage. As such, efforts are underway to advance data-driven oncologic care, improve patient outcomes, and guide clinical decision making. This is both relevant and critical in the practice of spine oncology, as clinical decisions are often made in isolation, based on select variables deemed relevant by the physician. With rapidly evolving therapeutics in surgery, radiation, interventional radiology, and oncology, there is a need to better develop decision-making algorithms utilizing the vast data available for each patient. The challenges and limitations inherent to big data analyses are presented with an eye towards future directions.
    Keywords:  Artificial intelligence; Machine learning; Predictive analytics; Primary spine tumor; Spine metastases; Spine tumor
    DOI:  https://doi.org/10.14245/ns.1938402.201
  8. BMC Med Inform Decis Mak. 2020 Jan 08. 20(1): 8
      BACKGROUND: Stroke severity is an important predictor of patient outcomes and is commonly measured with the National Institutes of Health Stroke Scale (NIHSS) scores. Because these scores are often recorded as free text in physician reports, structured real-world evidence databases seldom include the severity. The aim of this study was to use machine learning models to impute NIHSS scores for all patients with newly diagnosed stroke from multi-institution electronic health record (EHR) data.
    METHODS: NIHSS scores available in the Optum© de-identified Integrated Claims-Clinical dataset were extracted from physician notes by applying natural language processing (NLP) methods. The cohort analyzed in the study consists of the 7149 patients with an inpatient or emergency room diagnosis of ischemic stroke, hemorrhagic stroke, or transient ischemic attack and a corresponding NLP-extracted NIHSS score. A subset of these patients (n = 1033, 14%) were held out for independent validation of model performance and the remaining patients (n = 6116, 86%) were used for training the model. Several machine learning models were evaluated, and parameters optimized using cross-validation on the training set. The model with optimal performance, a random forest model, was ultimately evaluated on the holdout set.
    RESULTS: Leveraging machine learning we identified the main factors in electronic health record data for assessing stroke severity, including death within the same month as stroke occurrence, length of hospital stay following stroke occurrence, aphagia/dysphagia diagnosis, hemiplegia diagnosis, and whether a patient was discharged to home or self-care. Comparing the imputed NIHSS scores to the NLP-extracted NIHSS scores on the holdout data set yielded an R2 (coefficient of determination) of 0.57, an R (Pearson correlation coefficient) of 0.76, and a root-mean-squared error of 4.5.
    CONCLUSIONS: Machine learning models built on EHR data can be used to determine proxies for stroke severity. This enables severity to be incorporated in studies of stroke patient outcomes using administrative and EHR databases.
    Keywords:  Database; Outcomes research; Real-world evidence
    DOI:  https://doi.org/10.1186/s12911-019-1010-x
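    The imputation step described above, regressing the NLP-extracted NIHSS score on structured EHR features and scoring the result with R2, Pearson r, and RMSE, can be sketched as follows; the feature matrix here is a random placeholder rather than the Optum data.

        # Sketch: impute NIHSS with a random forest regressor and report the same
        # metrics as the study above (R^2, Pearson r, RMSE). Data are synthetic.
        import numpy as np
        from scipy.stats import pearsonr
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.metrics import mean_squared_error, r2_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X = rng.normal(size=(7149, 20))                 # placeholder EHR features
        y = rng.integers(0, 42, 7149)                   # placeholder NIHSS (0-41)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.14, random_state=0)

        rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)
        pred = rf.predict(X_te)
        print("R2  :", r2_score(y_te, pred))
        print("r   :", pearsonr(y_te, pred)[0])
        print("RMSE:", np.sqrt(mean_squared_error(y_te, pred)))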
  9. World J Urol. 2020 Jan 10.
      BACKGROUND: Optimal detection and surveillance of bladder cancer (BCa) rely primarily on the cystoscopic visualization of bladder lesions. AI-assisted cystoscopy may improve image recognition and accelerate data acquisition.
    OBJECTIVE: To provide a comprehensive review of machine learning (ML), deep learning (DL) and convolutional neural network (CNN) applications in cystoscopic image recognition.
    EVIDENCE ACQUISITION: A detailed search of original articles was performed using the PubMed-MEDLINE database to identify recent English literature relevant to ML, DL and CNN applications in cystoscopic image recognition.
    EVIDENCE SYNTHESIS: In total, two articles and one conference abstract were identified addressing the application of AI methods in cystoscopic image recognition. These investigations showed accuracies exceeding 90% for tumor detection; however, future work is necessary to incorporate these methods into AI-aided cystoscopy and to compare them with other tumor visualization tools. Furthermore, we present results from the RaVeNNA-4pi consortium initiative, which has extracted 4200 frames from 62 videos, analyzed them with the U-Net network, and achieved an average Dice score of 0.67. Improvements in its precision can be achieved by augmenting the video/frame database.
    CONCLUSION: AI-aided cystoscopy has the potential to outperform urologists at recognizing and classifying bladder lesions. To ensure their real-life implementation, however, these algorithms require external validation to generalize their results across other data sets.
    Keywords:  Cystoscopic images; Deep learning; Medical image analysis; Neural networks
    DOI:  https://doi.org/10.1007/s00345-019-03059-0
  10. JAMA Netw Open. 2020 Jan 03. 3(1): e1918962
      Importance: Accurate risk stratification of patients with heart failure (HF) is critical to deploy targeted interventions aimed at improving patients' quality of life and outcomes.
    Objectives: To compare machine learning approaches with traditional logistic regression in predicting key outcomes in patients with HF and evaluate the added value of augmenting claims-based predictive models with electronic medical record (EMR)-derived information.
    Design, Setting, and Participants: A prognostic study with a 1-year follow-up period was conducted including 9502 Medicare-enrolled patients with HF from 2 health care provider networks in Boston, Massachusetts ("providers" includes physicians, clinicians, other health care professionals, and their institutions that comprise the networks). The study was performed from January 1, 2007, to December 31, 2014; data were analyzed from January 1 to December 31, 2018.
    Main Outcomes and Measures: All-cause mortality, HF hospitalization, top cost decile, and home days loss greater than 25% were modeled using logistic regression, least absolute shrinkage and selection operator (LASSO) regression, classification and regression trees, random forests, and gradient-boosted modeling (GBM). All models were trained using data from network 1 and tested in network 2. After selecting the most efficient modeling approach based on discrimination, Brier score, and calibration, area under precision-recall curves (AUPRCs) and net benefit estimates from decision curves were calculated to focus on the differences when using claims-only vs claims + EMR predictors.
    Results: A total of 9502 patients with HF with a mean (SD) age of 78 (8) years were included: 6113 from network 1 (training set) and 3389 from network 2 (testing set). Gradient-boosted modeling consistently provided the highest discrimination, lowest Brier scores, and good calibration across all 4 outcomes; however, logistic regression had generally similar performance (C statistics for logistic regression based on claims-only predictors: mortality, 0.724; 95% CI, 0.705-0.744; HF hospitalization, 0.707; 95% CI, 0.676-0.737; high cost, 0.734; 95% CI, 0.703-0.764; and home days loss claims only, 0.781; 95% CI, 0.764-0.798; C statistics for GBM: mortality, 0.727; 95% CI, 0.708-0.747; HF hospitalization, 0.745; 95% CI, 0.718-0.772; high cost, 0.733; 95% CI, 0.703-0.763; and home days loss, 0.790; 95% CI, 0.773-0.807). Higher AUPRCs were obtained for claims + EMR vs claims-only GBMs predicting mortality (0.484 vs 0.423), HF hospitalization (0.413 vs 0.403), and home time loss (0.575 vs 0.521) but not cost (0.249 vs 0.252). The net benefit for claims + EMR vs claims-only GBMs was higher at various threshold probabilities for mortality and home time loss outcomes but similar for the other 2 outcomes.
    Conclusions and Relevance: Machine learning methods offered only limited improvement over traditional logistic regression in predicting key HF outcomes. Inclusion of additional predictors from EMRs to claims-based models appeared to improve prediction for some, but not all, outcomes.
    DOI:  https://doi.org/10.1001/jamanetworkopen.2019.18962
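    A minimal sketch of the head-to-head comparison reported above, training on one network and testing on the other, then scoring the C statistic, Brier score, and AUPRC for logistic regression versus gradient boosting; the arrays below are synthetic stand-ins for the claims/EMR predictors.

        # Sketch: compare logistic regression and gradient boosting on an external
        # network, using the discrimination and calibration-related metrics above.
        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import (average_precision_score, brier_score_loss,
                                     roc_auc_score)

        rng = np.random.default_rng(0)
        X1, y1 = rng.normal(size=(6113, 30)), rng.integers(0, 2, 6113)   # "network 1"
        X2, y2 = rng.normal(size=(3389, 30)), rng.integers(0, 2, 3389)   # "network 2"

        for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                            ("GBM", GradientBoostingClassifier(random_state=0))]:
            p = model.fit(X1, y1).predict_proba(X2)[:, 1]
            print(name, "C statistic:", roc_auc_score(y2, p),
                  "Brier:", brier_score_loss(y2, p),
                  "AUPRC:", average_precision_score(y2, p))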
  11. Int J Med Inform. 2019 Dec 28;136:104068. pii: S1386-5056(19)30614-8. [Epub ahead of print]
      BACKGROUND: A proper estimate of the risk of recurrence in early-stage oral tongue squamous cell carcinoma (OTSCC) is mandatory for individual treatment decision-making. However, this remains a challenge even for experienced multidisciplinary centers.
    OBJECTIVES: We compared the performance of four machine learning (ML) algorithms for predicting the risk of locoregional recurrence in patients with OTSCC. These algorithms were Support Vector Machine (SVM), Naive Bayes (NB), Boosted Decision Tree (BDT), and Decision Forest (DF).
    MATERIALS AND METHODS: The study cohort comprised 311 cases from the five university hospitals in Finland and the A.C. Camargo Cancer Center, São Paulo, Brazil. For comparison of the algorithms, we used the F1 score (the harmonic mean of precision and recall), specificity, and accuracy. These algorithms and their corresponding permutation feature importance (PFI) with the input parameters were externally tested on 59 new cases. Furthermore, we compared the performance of the algorithm that showed the highest prediction accuracy with the prognostic significance of depth of invasion (DOI).
    RESULTS: The results showed that the average specificity of all the algorithms was 71%. The SVM showed an accuracy of 68% and F1 score of 0.63, NB an accuracy of 70% and F1 score of 0.64, BDT an accuracy of 81% and F1 score of 0.78, and DF an accuracy of 78% and F1 score of 0.70. Additionally, these algorithms outperformed the DOI-based approach, which gave an accuracy of 63%. With PFI analysis, there was no significant difference in the overall accuracies of three of the algorithms; PFI-BDT accuracy increased to 83.1%, PFI-DF increased to 80%, PFI-SVM decreased to 64.4%, while PFI-NB accuracy increased significantly to 81.4%.
    CONCLUSIONS: Our findings show that the best classification accuracy was achieved with the boosted decision tree algorithm. Additionally, these algorithms outperformed the DOI-based approach. Furthermore, even with the few parameters identified in the PFI analysis, the ML techniques still showed the ability to predict locoregional recurrence. The application of the boosted decision tree machine learning algorithm can stratify OTSCC patients and thus aid in their individual treatment planning.
    Keywords:  Artificial intelligence; Machine learning; Oral tongue cancer; Prediction
    DOI:  https://doi.org/10.1016/j.ijmedinf.2019.104068
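    The permutation feature importance (PFI) analysis mentioned above can be reproduced in outline with scikit-learn; a gradient-boosted classifier stands in for the boosted decision tree, and the 311/59-case data below are random placeholders.

        # Sketch of permutation feature importance for a boosted-tree classifier,
        # evaluated on a held-out set as in the study above. Data are synthetic.
        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.inspection import permutation_importance
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X, y = rng.normal(size=(311, 12)), rng.integers(0, 2, 311)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=59, random_state=0)

        bdt = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
        pfi = permutation_importance(bdt, X_te, y_te, scoring="accuracy",
                                     n_repeats=30, random_state=0)
        print("features ranked by PFI:", np.argsort(pfi.importances_mean)[::-1])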
  12. Intensive Care Med. 2020 Jan 07.
      PURPOSE: We aimed to develop a machine-learning (ML) algorithm that can predict intensive care unit (ICU)-acquired bloodstream infections (BSI) among patients suspected of infection in the ICU.
    METHODS: The study was based on patients' electronic health records at Beth Israel Deaconess Medical Center (BIDMC) in Boston, Massachusetts, USA, and at Rambam Health Care Campus (RHCC), Haifa, Israel. We included adults from whom blood cultures were collected for suspected BSI at least 48 h after admission. Clinical data, including time-series variables and their interactions, were analyzed by an ML algorithm at each site. Prediction ability for ICU-acquired BSI was assessed by the area under the receiver operating characteristic curve (AUROC) of ten-fold cross-validation and validation sets with 95% confidence intervals.
    RESULTS: The datasets comprised 2351 patients from BIDMC (151 with BSI) and 1021 from RHCC (162 with BSI). The median (inter-quartile range) age was 62 (51-75) and 56 (38-69) years, respectively; the median Acute Physiology and Chronic Health Evaluation II scores were 26 (21-32) and 24 (20-29), respectively. The means of the cross-validation AUROCs were 0.87 ± 0.02 for BIDMC and 0.93 ± 0.03 for RHCC. AUROCs of 0.89 ± 0.01 and 0.92 ± 0.02 were maintained in both centers with internal validation, while external validation deteriorated. Valuable predictors were mainly the trends of time-series variables such as laboratory results and vital signs.
    CONCLUSION: An ML approach that uses temporal and site-specific data achieved high performance in recognizing blood culture samples with a high probability of ICU-acquired BSI.
    Keywords:  Bacteremia; Early diagnosis; Intensive care unit; Machine learning; Nosocomial infection
    DOI:  https://doi.org/10.1007/s00134-019-05876-8
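    Since the abstract highlights trends of time-series variables as the most valuable predictors, the sketch below shows one common way to encode such trends, as per-patient least-squares slopes of each laboratory value or vital sign; the long-format table and variable names are illustrative assumptions.

        # Sketch: convert each patient's measurement series into a trend (slope)
        # feature of the kind highlighted above. Data are a toy example.
        import numpy as np
        import pandas as pd

        obs = pd.DataFrame({"patient": [1, 1, 1, 2, 2, 2],
                            "variable": ["wbc"] * 6,
                            "hours_before_culture": [48, 24, 0, 48, 24, 0],
                            "value": [9.1, 11.4, 14.0, 7.8, 7.5, 7.9]})

        def slope(group):
            # least-squares slope of value over time; positive = rising trend
            return np.polyfit(-group["hours_before_culture"], group["value"], 1)[0]

        trend_features = obs.groupby(["patient", "variable"]).apply(slope).unstack()
        print(trend_features)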
  13. Spine J. 2019 Dec 31. pii: S1529-9430(19)31157-X. [Epub ahead of print]
      IMPORTANCE: Preoperative determination of the potential for postoperative opioid dependence in previously naïve patients undergoing elective spine surgery may facilitate targeted interventions.
    OBJECTIVE: The purpose of this study was to develop supervised machine learning algorithms for preoperative prediction of prolonged opioid prescription use in opioid naïve patients following lumbar spine surgery.
    DESIGN: Retrospective review of clinical registry data. Variables considered for prediction included demographics, insurance status, preoperative medications, surgical factors, laboratory values, comorbidities, and neighborhood characteristics. Five supervised machine learning algorithms were developed and assessed by discrimination, calibration, Brier score, and decision curve analysis.
    SETTING: One healthcare entity (two academic medical centers, three community hospitals), 2000-2018.
    PARTICIPANTS: Opioid-naïve patients undergoing decompression and/or fusion for lumbar disc herniation, stenosis, and spondylolisthesis.
    MAIN OUTCOME: Sustained prescription opioid use exceeding 90 days after surgery.
    RESULTS: Overall, of 8435 patients included, 359 (4.3%) were found to have prolonged postoperative opioid prescriptions. The elastic-net penalized logistic regression achieved the best performance in the independent testing set not used for algorithm development, with c-statistic = 0.70, calibration intercept = 0.06, calibration slope = 1.02, and Brier score = 0.039. The five most important factors for prolonged opioid prescriptions were use of instrumented spinal fusion, preoperative benzodiazepine use, preoperative antidepressant use, preoperative gabapentin use, and uninsured status. Individual patient-level explanations were provided for the algorithm predictions, and the algorithms were incorporated into an open-access digital application available at https://sorg-apps.shinyapps.io/lumbaropioidnaive/
    CONCLUSION AND RELEVANCE: The clinician decision aid developed in this study may be helpful to preoperatively risk-stratify opioid-naïve patients undergoing lumbar spine surgery. The tool demonstrates moderate discriminative capacity for identifying those at greatest risk of prolonged prescription opioid use. External validation is required to further support the potential utility of this tool in practice.
    Keywords:  disc herniation; opioid; prediction; spine; spondylolisthesis; stenosis
    DOI:  https://doi.org/10.1016/j.spinee.2019.12.019
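    The performance metrics quoted above (c-statistic, Brier score, calibration intercept and slope) can be computed from any model's predicted probabilities as sketched below; this is a generic recipe using statsmodels and scikit-learn, not the authors' code.

        # Sketch: discrimination and calibration metrics for predicted probabilities.
        # Calibration slope = coefficient of the predicted log-odds; calibration
        # intercept = intercept with the log-odds entered as an offset.
        import numpy as np
        import statsmodels.api as sm
        from sklearn.metrics import brier_score_loss, roc_auc_score

        def performance(y, p):
            lp = np.log(p / (1 - p))                               # predicted log-odds
            slope = sm.GLM(y, sm.add_constant(lp),
                           family=sm.families.Binomial()).fit().params[1]
            intercept = sm.GLM(y, np.ones((len(lp), 1)),
                               family=sm.families.Binomial(), offset=lp).fit().params[0]
            return {"c_statistic": roc_auc_score(y, p),
                    "brier": brier_score_loss(y, p),
                    "calibration_slope": slope,
                    "calibration_intercept": intercept}

        rng = np.random.default_rng(0)                 # synthetic outcomes/probabilities
        p_test = rng.uniform(0.01, 0.30, 2000)
        y_test = rng.binomial(1, p_test)
        print(performance(y_test, p_test))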
  14. BJU Int. 2020 Jan 04.
      OBJECTIVES: To develop and evaluate the feasibility of an objective method utilizing artificial intelligence (AI) and image processing in a semi-automated fashion for tumor-to-cortex peak early-phase enhancement ratio (PEER) to differentiate CD117(+) oncocytoma from chromophobe renal cell carcinoma (ChRCC) using convolutional neural networks (CNN) on computed tomography (CT) imaging.
    METHODS: The CNN was trained and validated to identify the kidney + tumor areas from 192 patients. The tumor type was differentiated through automated PEER after manual segmentation of tumors. The performance of this diagnostic model was compared to the manual expert identification and the tumor pathology through accuracy, sensitivity, and specificity, along with the root mean square error (RMSE), for the remaining 20 patients with CD117(+) oncocytoma or ChRCC.
    RESULTS: The Dice similarity score (DSS) of segmentation ± SD was 0.66 ± 0.14 for the CNN model to identify the kidney + tumor areas. PEER evaluation achieved an accuracy of 95% in tumor type classification (100% sensitivity and 89% specificity) compared to the final pathology results (RMSE of 0.15 for the PEER ratio).
    CONCLUSIONS: We demonstrate that deep learning could help to produce reliable discrimination for CD117(+) benign oncocytoma and malignant ChRCC through PEER measurements by computer vision.
    Keywords:  artificial intelligence; chromophobe; computer vision; convolutional neural network; deep learning; kidney cancer; oncocytoma; renal cell carcinoma
    DOI:  https://doi.org/10.1111/bju.14985
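    The PEER measurement itself, the mean early-phase enhancement inside the tumor mask divided by that inside the renal-cortex mask, reduces to a few lines once masks are available; the volume, masks, and decision cutoff below are illustrative assumptions only.

        # Sketch of a PEER-style ratio from an early-phase CT volume and two masks.
        import numpy as np

        rng = np.random.default_rng(0)
        early_phase = rng.integers(0, 300, size=(40, 256, 256))     # CT volume (HU)
        tumor_mask = np.zeros(early_phase.shape, dtype=bool)
        cortex_mask = np.zeros(early_phase.shape, dtype=bool)
        tumor_mask[18:22, 100:120, 100:120] = True                  # placeholder ROIs
        cortex_mask[18:22, 60:80, 60:80] = True

        peer = early_phase[tumor_mask].mean() / early_phase[cortex_mask].mean()
        label = "oncocytoma-like" if peer >= 1.0 else "ChRCC-like"  # illustrative cutoff
        print(f"PEER = {peer:.2f} -> {label}")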
  15. Pituitary. 2020 Jan 06.
      PURPOSE: To provide an overview of fundamental concepts in machine learning (ML), review the literature on ML applications in imaging analysis of pituitary tumors over the last 10 years, and highlight future directions for potential applications of ML for pituitary tumor patients.
    METHOD: We present an overview of the fundamental concepts in ML and its various stages used in healthcare, and highlight the key components typically present in an imaging-based tumor analysis pipeline. A search was conducted across four databases (PubMed, Ovid, Embase, and Google Scholar) to gather research articles from the past 10 years (2009-2019) involving pituitary tumor imaging and ML. We grouped the studies by imaging modality and analyzed the ML tasks in terms of data inputs, reference standards, methodologies, and limitations.
    RESULTS: Of the 16 studies included in our analysis, 10 appeared in 2018-2019. Most of the studies utilized retrospective data and followed a semi-automatic ML pipeline. The studies included use of magnetic resonance imaging (MRI), facial photographs, surgical microscopic video, spectrometry, and spectroscopy imaging. The objectives of the studies covered 14 distinct applications, and the majority of the studies addressed a binary classification problem. Only five of the 11 MRI-based studies had an external validation or a holdout set to test the performance of a final trained model.
    CONCLUSION: Through our concise evaluation and comparison of the studies using the concepts presented, we highlight future directions so that potential ML applications using different imaging modalities can be developed to benefit the clinical care of pituitary tumor patients.
    Keywords:  Image processing; Machine learning; Magnetic resonance imaging; Medical imaging; Pituitary adenoma; Pituitary tumor
    DOI:  https://doi.org/10.1007/s11102-019-01026-x
  16. Radiother Oncol. 2020 Jan 03;144:189-200. pii: S0167-8140(19)33489-9. [Epub ahead of print]
      BACKGROUND AND PURPOSE: Access to healthcare data is indispensable for scientific progress and innovation. Sharing healthcare data is time-consuming and notoriously difficult due to privacy and regulatory concerns. The Personal Health Train (PHT) provides a privacy-by-design infrastructure connecting FAIR (Findable, Accessible, Interoperable, Reusable) data sources and allows distributed data analysis and machine learning. Patient data never leaves a healthcare institute.
    MATERIALS AND METHODS: Lung cancer patient-specific databases (tumor staging and post-treatment survival information) of oncology departments were translated according to a FAIR data model and stored locally in a graph database. Software was installed locally to enable deployment of distributed machine learning algorithms via a central server. Algorithms (MATLAB, code and documentation publicly available) are patient privacy-preserving as only summary statistics and regression coefficients are exchanged with the central server. A logistic regression model to predict post-treatment two-year survival was trained and evaluated by receiver operating characteristic curves (ROC), root mean square prediction error (RMSE) and calibration plots.
    RESULTS: In 4 months, we connected databases with 23 203 patient cases across 8 healthcare institutes in 5 countries (Amsterdam, Cardiff, Maastricht, Manchester, Nijmegen, Rome, Rotterdam, Shanghai) using the PHT. Summary statistics were computed across databases. A distributed logistic regression model predicting post-treatment two-year survival was trained on 14 810 patients treated between 1978 and 2011 and validated on 8 393 patients treated between 2012 and 2015.
    CONCLUSION: The PHT infrastructure demonstrably overcomes patient privacy barriers to healthcare data sharing and enables fast data analyses across multiple institutes from different countries with different regulatory regimens. This infrastructure promotes global evidence-based medicine while prioritizing patient privacy.
    Keywords:  Big data; Distributed learning; FAIR data; Federated learning; Lung cancer; Machine learning; Prediction modeling; Survival analysis
    DOI:  https://doi.org/10.1016/j.radonc.2019.11.019
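    The privacy-preserving principle described above, where only coefficients and summary statistics leave each institute, can be illustrated with a toy federated gradient descent for logistic regression; this is a conceptual sketch, not the PHT/MATLAB implementation.

        # Toy federated logistic regression: each site returns only its gradient
        # (a summary statistic); raw patient data never leave the site.
        import numpy as np

        def local_gradient(w, X, y):
            p = 1.0 / (1.0 + np.exp(-X @ w))
            return X.T @ (p - y), len(y)

        rng = np.random.default_rng(0)                       # three simulated institutes
        sites = [(rng.normal(size=(n, 5)), rng.integers(0, 2, n)) for n in (300, 800, 150)]

        w = np.zeros(5)
        for _ in range(200):                                 # central server iterations
            grads = [local_gradient(w, X, y) for X, y in sites]
            total_n = sum(n for _, n in grads)
            w -= 0.1 * sum(g for g, _ in grads) / total_n    # averaged global gradient step
        print("federated coefficients:", w)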
  17. Dermatol Pract Concept. 2020;10(1):e2020011
      Background: Malignant melanoma can most successfully be cured when diagnosed at an early stage in the natural history. However, there is controversy over screening programs and many advocate screening only for high-risk individuals.
    Objectives: This study aimed to evaluate the accuracy of an artificial intelligence neural network (Deep Ensemble for Recognition of Melanoma [DERM]) to identify malignant melanoma from dermoscopic images of pigmented skin lesions and to show how this compares with doctors' performance as assessed by meta-analysis.
    Methods: DERM was trained and tested using 7,102 dermoscopic images of both histologically confirmed melanoma (24%) and benign pigmented lesions (76%). A meta-analysis was conducted of studies examining the accuracy of naked-eye examination, with or without dermoscopy, by specialist and general physicians whose clinical diagnosis was compared to histopathology. The meta-analysis was based on evaluation of 32,226 pigmented lesions including 3,277 histopathology-confirmed malignant melanoma cases. The receiver operating characteristic (ROC) curve was used to examine and compare the diagnostic accuracy.
    Results: DERM achieved a ROC area under the curve (AUC) of 0.93 (95% confidence interval: 0.92-0.94), and sensitivity and specificity of 85.0% and 85.3%, respectively. Avoidance of false-negative results is essential, so different decision thresholds were examined. At 95% sensitivity DERM achieved a specificity of 64.1%, and at 95% specificity the sensitivity was 67%. The meta-analysis showed that primary care physicians (10 studies) achieved an AUC of 0.83 (95% confidence interval: 0.79-0.86), with sensitivity and specificity of 79.9% and 70.9%; and dermatologists (92 studies) 0.91 (0.88-0.93), 87.5%, and 81.4%, respectively.
    Conclusions: DERM has the potential to be used as a decision support tool in primary care, by providing dermatologist-grade recommendation on the likelihood of malignant melanoma.
    Keywords:  artificial intelligence; detection; identification; melanoma; primary care
    DOI:  https://doi.org/10.5826/dpc.1001a11
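    The operating-point analysis above, fixing sensitivity (or specificity) at 95% and reading off the other quantity, follows directly from the ROC curve; the scores and labels in the sketch are synthetic placeholders, not DERM outputs.

        # Sketch: choose decision thresholds on an ROC curve for 95% sensitivity
        # or 95% specificity, as in the analysis above.
        import numpy as np
        from sklearn.metrics import roc_curve

        rng = np.random.default_rng(0)
        y = rng.integers(0, 2, 5000)                                 # 1 = melanoma
        scores = np.clip(0.35 * y + rng.normal(0.4, 0.2, 5000), 0, 1)

        fpr, tpr, thr = roc_curve(y, scores)
        i = np.argmax(tpr >= 0.95)                          # first point with sens >= 95%
        print("at 95% sensitivity: threshold", thr[i], "specificity", 1 - fpr[i])
        j = np.where(1 - fpr >= 0.95)[0][-1]                # last point with spec >= 95%
        print("at 95% specificity: threshold", thr[j], "sensitivity", tpr[j])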
  18. Eur Heart J. 2020 Jan 10. pii: ehz902. [Epub ahead of print]
      AIMS: Our aim was to develop a machine learning (ML)-based risk stratification system to predict 1-, 2-, 3-, 4-, and 5-year all-cause mortality from pre-implant parameters of patients undergoing cardiac resynchronization therapy (CRT).
    METHODS AND RESULTS: Multiple ML models were trained on a retrospective database of 1510 patients undergoing CRT implantation to predict 1- to 5-year all-cause mortality. Thirty-three pre-implant clinical features were selected to train the models. The best performing model [SEMMELWEIS-CRT score (perSonalizEd assessMent of estiMatEd risk of mortaLity With machinE learnIng in patientS undergoing CRT implantation)], along with pre-existing scores (Seattle Heart Failure Model, VALID-CRT, EAARN, ScREEN, and CRT-score), was tested on an independent cohort of 158 patients. There were 805 (53%) deaths in the training cohort and 80 (51%) deaths in the test cohort during the 5-year follow-up period. Among the trained classifiers, random forest demonstrated the best performance. For the prediction of 1-, 2-, 3-, 4-, and 5-year mortality, the areas under the receiver operating characteristic curves of the SEMMELWEIS-CRT score were 0.768 (95% CI: 0.674-0.861; P < 0.001), 0.793 (95% CI: 0.718-0.867; P < 0.001), 0.785 (95% CI: 0.711-0.859; P < 0.001), 0.776 (95% CI: 0.703-0.849; P < 0.001), and 0.803 (95% CI: 0.733-0.872; P < 0.001), respectively. The discriminative ability of our model was superior to other evaluated scores.
    CONCLUSION: The SEMMELWEIS-CRT score (available at semmelweiscrtscore.com) exhibited good discriminative capabilities for the prediction of all-cause death in CRT patients and outperformed the already existing risk scores. By capturing the non-linear association of predictors, the utilization of ML approaches may facilitate optimal candidate selection and prognostication of patients undergoing CRT implantation.
    Keywords:  Cardiac resynchronization therapy; Heart failure; Machine learning; Mortality prediction; Precision medicine; Risk stratification
    DOI:  https://doi.org/10.1093/eurheartj/ehz902
  19. Front Neurol. 2019;10:1305
      Purpose: Amino acid PET has shown high accuracy for the diagnosis and prognostication of malignant gliomas; however, this imaging modality is not widely available in clinical practice. This study explores a novel end-to-end deep learning framework ("U-Net") to assess its feasibility for detecting high amino acid uptake glioblastoma regions (i.e., metabolic tumor volume) using clinical multimodal MRI sequences.
    Methods: T2, fluid-attenuated inversion recovery (FLAIR), apparent diffusion coefficient map, contrast-enhanced T1, and alpha-[11C]-methyl-L-tryptophan (AMT)-PET images were analyzed in 21 patients with newly diagnosed glioblastoma. A U-Net system with data augmentation was implemented to deeply learn non-linear voxel-wise relationships between intensities of multimodal MRI as the input and metabolic tumor volume from AMT-PET as the output. The accuracy of the MRI- and PET-based volume measures to predict progression-free survival was tested.
    Results: In the augmented dataset using all four MRI modalities to investigate the upper limit of U-Net accuracy in the full study cohort, U-Net achieved high accuracy (sensitivity/specificity/positive predictive value [PPV]/negative predictive value [NPV]: 0.85/1.00/0.81/1.00, respectively) in predicting PET-defined tumor volumes. Exclusion of FLAIR from the MRI input set had a strong negative effect on sensitivity (0.60). In repeated hold-out validation in randomly selected subjects, specificity and NPV remained high (1.00), but mean sensitivity (0.62) and PPV (0.68) were moderate. AMT-PET-learned MRI tumor volume from this U-Net model within the contrast-enhancing volume predicted 6-month progression-free survival with 0.86/0.63 sensitivity/specificity.
    Conclusions: These data indicate the feasibility of PET-based deep learning for enhanced pretreatment glioblastoma delineation and prognostication by clinical multimodal MRI.
    Keywords:  amino acid; deep learning; glioblastoma; multimodal MRI; positron emission tomography; tryptophan
    DOI:  https://doi.org/10.3389/fneur.2019.01305
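    The sensitivity/specificity/PPV/NPV figures above are voxel-wise agreements between the MRI-predicted and the PET-defined tumor masks; the sketch below shows how such numbers are obtained from two boolean volumes (random placeholders here).

        # Sketch: voxel-wise sensitivity, specificity, PPV and NPV between a
        # predicted tumor mask and a reference (PET-defined) mask.
        import numpy as np

        rng = np.random.default_rng(0)
        pred = rng.random((64, 64, 64)) > 0.7        # predicted mask (boolean)
        ref = rng.random((64, 64, 64)) > 0.7         # reference mask (boolean)

        tp = np.sum(pred & ref)
        tn = np.sum(~pred & ~ref)
        fp = np.sum(pred & ~ref)
        fn = np.sum(~pred & ref)
        print("sensitivity", tp / (tp + fn), "specificity", tn / (tn + fp))
        print("PPV        ", tp / (tp + fp), "NPV        ", tn / (tn + fn))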
  20. Front Comput Neurosci. 2019;13:84
      An important challenge in segmenting real-world biomedical imaging data is the presence of multiple disease processes within individual subjects. Most adults above age 60 exhibit a variable degree of small vessel ischemic disease, as well as chronic infarcts, which will manifest as white matter hyperintensities (WMH) on brain MRIs. Subjects diagnosed with gliomas will also typically exhibit some degree of abnormal T2 signal due to WMH, rather than just due to tumor. We sought to develop a fully automated algorithm to distinguish and quantify these distinct disease processes within individual subjects' brain MRIs. To address this multi-disease problem, we trained a 3D U-Net to distinguish between abnormal signal arising from tumors vs. WMH in the 3D multi-parametric MRI (mpMRI, i.e., native T1-weighted, T1-post-contrast, T2, T2-FLAIR) scans of the International Brain Tumor Segmentation (BraTS) 2018 dataset (n training = 285, n validation = 66). Our trained neuroradiologist manually annotated WMH on the BraTS training subjects, finding that 69% of subjects had WMH. Our 3D U-Net model had a 4-channel 3D input patch (80 × 80 × 80) from mpMRI, four encoding and decoding layers, and an output of either four [background, active tumor (AT), necrotic core (NCR), peritumoral edematous/infiltrated tissue (ED)] or five classes (adding WMH as the fifth class). For both the four- and five-class output models, the median Dice for whole tumor (WT) extent (i.e., union of AT, ED, NCR) was 0.92 in both training and validation sets. Notably, the five-class model achieved significantly (p = 0.002) lower/better Hausdorff distances for WT extent in the training subjects. There was strong positive correlation between manually segmented and predicted volumes for WT (r = 0.96) and WMH (r = 0.89). Larger lesion volumes were positively correlated with higher/better Dice scores for WT (r = 0.33), WMH (r = 0.34), and across all lesions (r = 0.89) on a log(10) transformed scale. While the median Dice for WMH was 0.42 across training subjects with WMH, the median Dice was 0.62 for those with at least 5 cm3 of WMH. We anticipate the development of computational algorithms that are able to model multiple diseases within a single subject will be a critical step toward translating and integrating artificial intelligence systems into the heterogeneous real-world clinical workflow.
    Keywords:  convolutional neural network; deep learning; glioblastoma; multi-disease classification; radiology; segmentation; white matter hyperintensities
    DOI:  https://doi.org/10.3389/fncom.2019.00084
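    The per-class and whole-tumor Dice scores quoted above are computed from the predicted and reference label maps as sketched below; the label maps here are random placeholders and the label numbering is an assumption.

        # Sketch: per-class Dice and "whole tumor" Dice (union of AT, NCR, ED)
        # for a five-class segmentation that includes WMH as the fifth class.
        import numpy as np

        def dice(pred, ref, labels):
            p, r = np.isin(pred, labels), np.isin(ref, labels)
            return 2 * np.sum(p & r) / (np.sum(p) + np.sum(r))

        rng = np.random.default_rng(0)
        pred = rng.integers(0, 5, size=(80, 80, 80))     # predicted label map
        ref = rng.integers(0, 5, size=(80, 80, 80))      # reference label map

        for label, name in enumerate(["background", "AT", "NCR", "ED", "WMH"]):
            print(name, round(dice(pred, ref, [label]), 3))
        print("whole tumor", round(dice(pred, ref, [1, 2, 3]), 3))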
  21. Neurospine. 2019 Dec;16(4): 697-702
      The use of artificial intelligence (AI) as a tool supporting the diagnosis and treatment of spinal diseases is eagerly anticipated. In the field of diagnostic imaging, the possible application of AI includes diagnostic support for diseases requiring highly specialized expertise, such as trauma in children, scoliosis, symptomatic diseases, and spinal cord tumors. Moiré topography, which describes the 3-dimensional surface of the trunk with band patterns, has been used to screen students for scoliosis, but the interpretation of the band patterns can be ambiguous. Thus, we created a scoliosis screening system that estimates spinal alignment, the Cobb angle, and vertebral rotation from moiré images. In our system, a convolutional neural network (CNN) estimates the positions of 12 thoracic and 5 lumbar vertebrae, 17 spinous processes, and the vertebral rotation angle of each vertebra. We used this information to estimate the Cobb angle. The mean absolute error (MAE) of the estimated vertebral positions was 3.6 pixels (~5.4 mm) per person. T1 and L5 had smaller MAEs than the other levels. The MAE per person between the Cobb angle measured by doctors and the estimated Cobb angle was 3.42°. The MAE was 4.38° in normal spines, 3.13° in spines with a slight deformity, and 2.74° in spines with a mild to severe deformity. The MAE of the angle of vertebral rotation (AVR) was 2.9°±1.4°, and was smaller when the deformity was milder. The proposed method of estimating the Cobb angle and AVR from moiré images using a CNN is expected to enhance the accuracy of scoliosis screening.
    Keywords:  Adolescent idiopathic scoliosis; Artificial intelligence; Cobb angle; Estimation; Moiré; Vertebral rotation
    DOI:  https://doi.org/10.14245/ns.1938426.213
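    How a Cobb angle is derived from estimated vertebral positions is not spelled out in the abstract; one common automated proxy, shown below purely as an illustration and not as the authors' method, takes the maximum difference in tilt along a curve fitted through the vertebral centroids.

        # Illustrative Cobb-like angle from vertebral centroid coordinates:
        # fit a smooth curve through the centroids and take the spread of its tilt.
        import numpy as np

        y = np.linspace(0, 400, 17)              # vertical positions of T1..L5 (px)
        x = 30 * np.sin(np.pi * y / 400)         # made-up lateral deviations (px)

        coeffs = np.polyfit(y, x, 4)             # smooth curve through the centroids
        tilt = np.degrees(np.arctan(np.polyval(np.polyder(coeffs), y)))
        print(f"Cobb-like angle: {tilt.max() - tilt.min():.1f} degrees")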
  22. Neurospine. 2019 Dec;16(4): 678-685
      Machine learning represents a promising frontier in epidemiological research on spine surgery. It consists of a series of algorithms that determine relationships within data. Machine learning offers numerous advantages over conventional regression techniques, such as a reduced requirement for a priori knowledge of predictors and a better ability to manage large datasets. Current studies have made extensive strides in employing machine learning to a greater capacity in spinal cord injury (SCI). Analyses using machine learning algorithms have been done on both traumatic SCI and nontraumatic SCI, the latter of which typically represents degenerative spine disease resulting in spinal cord compression, such as degenerative cervical myelopathy. This article is a literature review of current studies published in traumatic and nontraumatic SCI that employ machine learning for the prediction of a host of outcomes. The studies described utilize machine learning in a variety of capacities, including imaging analysis and prediction in large epidemiological data sets. We discuss the performance of these machine learning-based clinical prognostic models relative to conventional statistical prediction models. Finally, we detail the future steps needed for machine learning to become a more common modality for statistical analysis in SCI.
    Keywords:  Degenerative cervical myelopathy; Machine learning; Magnetic resonance imaging; Outcomes; Spinal cord injury
    DOI:  https://doi.org/10.14245/ns.1938390.195
  23. Cancer Manag Res. 2019;11:10851-10858
      Radiomics is a novel concept that relies on obtaining image data from examinations such as computed tomography (CT), magnetic resonance imaging (MRI), or positron emission tomography (PET). With the appropriate algorithm, the extracted results have broad applicability and the potential for a massive positive impact in radiology. For example, clinicians can verify treatment efficiency, predict the location of tumor metastasis, correlate results with a histopathological examination, or more accurately define the type of cancer. Combining radiomics with other testing techniques allows every patient to have a personalized treatment plan, which is essential for advanced examination and treatment. This article explains the process of radiomics, including data collection mechanisms, combined use with genomics, and artificial intelligence and immunology techniques, which may solve many of the challenges faced by doctors in diagnosing and treating their patients.
    Keywords:  artificial intelligence; genomics; immunology; personalized therapy; radiology; workflow
    DOI:  https://doi.org/10.2147/CMAR.S232473
  24. Gut. 2020 Jan 08. pii: gutjnl-2019-320056. [Epub ahead of print]
      BACKGROUND: The objective evaluation of endoscopic disease activity is key in ulcerative colitis (UC). A composite of endoscopic and histological factors is the goal in UC treatment. We aimed to develop an operator-independent computer-based tool to determine UC activity based on endoscopic images.
    METHODS: First, we built a computer algorithm using data from 29 consecutive patients with UC and 6 healthy controls (construction cohort). The algorithm (red density: RD) was based on the red channel of the red-green-blue pixel values and pattern recognition from endoscopic images. The algorithm was refined in sequential steps to optimise correlation with endoscopic and histological disease activity. In a second phase, the operating properties were tested in patients with UC flares requiring treatment escalation. To validate the algorithm, we tested the correlation between RD score and clinical, endoscopic and histological features in a validation cohort.
    RESULTS: We constructed the algorithm based on the integration of pixel colour data from the redness colour map along with vascular pattern detection. These data were linked with Robarts histological index (RHI) in a multiple regression analysis. In the construction cohort, RD correlated with RHI (r=0.74, p<0.0001), Mayo endoscopic subscores (r=0.76, p<0.0001) and UC Endoscopic Index of Severity scores (r=0.74, p<0.0001). The RD sensitivity to change had a standardised effect size of 1.16. In the validation set, RD correlated with RHI (r=0.65, p=0.00002).
    CONCLUSIONS: RD provides an objective computer-based score that accurately assesses disease activity in UC. In a validation study, RD correlated with endoscopic and histological disease activity.
    Keywords:  artificial intelligence; evaluation; remission; response to treatment
    DOI:  https://doi.org/10.1136/gutjnl-2019-320056
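    The core idea of the red density (RD) score, summarizing red-channel pixel information and relating it to histology, can be sketched as below; the redness summary, the images, and the RHI values are illustrative placeholders, not the published algorithm.

        # Sketch: summarize the red channel of each endoscopic image and correlate
        # that summary with the Robarts histological index (RHI).
        import numpy as np
        from scipy.stats import pearsonr

        def redness(rgb):
            img = rgb.astype(float)                           # H x W x 3 array
            return np.mean(img[..., 0] - 0.5 * (img[..., 1] + img[..., 2]))

        rng = np.random.default_rng(0)
        images = [rng.integers(0, 256, size=(256, 256, 3)) for _ in range(30)]
        rhi = rng.integers(0, 34, size=30)                    # per-patient RHI scores

        rd = np.array([redness(im) for im in images])
        r, p = pearsonr(rd, rhi)
        print(f"correlation with RHI: r = {r:.2f}, p = {p:.3f}")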
  25. Front Med. 2020 Jan 07.
      Mesial temporal lobe epilepsy (mTLE), the most common type of focal epilepsy, is associated with functional and structural brain alterations. Machine learning (ML) techniques have been successfully used in discriminating mTLE from healthy controls. However, either functional or structural neuroimaging data are mostly used separately as input, and the opportunity to combine both has not been exploited yet. We conducted a multimodal ML study based on functional and structural neuroimaging measures. We enrolled 37 patients with left mTLE, 37 patients with right mTLE, and 74 healthy controls and trained a support vector ML model to distinguish them by using each measure and the combinations of the measures. For each single measure, we obtained a mean accuracy of 74% and 69% for discriminating left mTLE and right mTLE from controls, respectively, and 64% when all patients were combined. We achieved an accuracy of 78% by integrating functional data and 79% by integrating structural data for left mTLE, and the highest accuracy of 84% was obtained when all functional and structural measures were combined. These findings suggest that combining multimodal measures within a single model is a promising direction for improving the classification of individual patients with mTLE.
    Keywords:  functional magnetic resonance imaging; machine learning; mesial temporal lobe epilepsy; structural magnetic resonance imaging; support vector machine
    DOI:  https://doi.org/10.1007/s11684-019-0718-4
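    The multimodal combination described above amounts to concatenating the functional and structural feature vectors of each subject before training a single classifier; the sketch below uses a linear SVM with 10-fold cross-validation on random placeholder features.

        # Sketch: combine functional and structural features in one SVM, as above.
        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        functional = rng.normal(size=(148, 200))       # e.g., fMRI-derived measures
        structural = rng.normal(size=(148, 150))       # e.g., morphometric measures
        y = np.r_[np.ones(74), np.zeros(74)]           # mTLE patients vs. controls

        combined = np.hstack([functional, structural])
        clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
        print("mean CV accuracy:", cross_val_score(clf, combined, y, cv=10).mean())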
  26. Ann Surg Oncol. 2020 Jan 08.
      OBJECTIVE: The aim of this study was to develop quantitative feature-based models from histopathological images to distinguish hepatocellular carcinoma (HCC) from adjacent normal tissue and predict the prognosis of HCC patients after surgical resection.
    METHODS: A fully automated pipeline was constructed using computational approaches to analyze the quantitative features of histopathological slides of HCC patients, in which the features were extracted from the hematoxylin and eosin (H&E)-stained whole-slide images of HCC patients from The Cancer Genome Atlas and tissue microarray images from West China Hospital. The extracted features were used to train the statistical models that classify tissue slides and predict patients' survival outcomes by machine-learning methods.
    RESULTS: A total of 1733 quantitative image features were extracted from each histopathological slide. The diagnostic classifier based on 31 features was able to successfully distinguish HCC from adjacent normal tissues in both the test [area under the receiver operating characteristic curve (AUC) 0.988] and external validation sets (AUC 0.886). The random-forest prognostic model using 46 features was able to significantly stratify patients in each set into longer- or shorter-term survival groups according to their assigned risk scores. Moreover, the prognostic model we constructed showed predictive accuracy comparable to that of TNM staging systems in predicting patients' survival at different time points after surgery.
    CONCLUSIONS: Our findings suggest that machine-learning models derived from image features can assist clinicians in HCC diagnosis and its prognosis prediction after hepatectomy.
    DOI:  https://doi.org/10.1245/s10434-019-08190-1