bims-arihec Biomed News
on Artificial intelligence in healthcare
Issue of 2020–01–05
23 papers selected by
Céline Bélanger, Cogniges Inc.



  1. Biomed Res Int. 2019 ;2019 8427042
      Artificial intelligence (AI) proves to have enormous potential in many areas of healthcare including research and chemical discoveries. Using large amounts of aggregated data, AI can discover and learn, further transforming these data into "usable" knowledge. Being well aware of this, the world's leading pharmaceutical companies have already begun to use artificial intelligence to improve their research regarding new drugs. The goal is to exploit modern computational biology and machine learning systems to predict the molecular behaviour and the likelihood of getting a useful drug, thus saving time and money on unnecessary tests. Clinical studies, electronic medical records, high-resolution medical images, and genomic profiles can be used as resources to aid drug development. Pharmaceutical and medical researchers have extensive data sets that can be analyzed by strong AI systems. This review focused on how computational biology and artificial intelligence technologies can be implemented by integrating the knowledge of cancer drugs, drug resistance, next-generation sequencing, genetic variants, and structural biology in cancer precision drug discovery.
    DOI:  https://doi.org/10.1155/2019/8427042
  2. Intern Emerg Med. 2020 Jan 02.
      Length of stay (LOS) and discharge destination predictions are key parts of the discharge planning process for general medical hospital inpatients. It is possible that machine learning, using natural language processing, may be able to assist with accurate LOS and discharge destination prediction for this patient group. Emergency department triage and doctor notes were retrospectively collected on consecutive general medical and acute medical unit admissions to a single tertiary hospital from a 2-month period in 2019. These data were used to assess the feasibility of predicting LOS and discharge destination using natural language processing and a variety of machine learning models. 313 patients were included in the study. The artificial neural network achieved the highest accuracy on the primary outcome of predicting whether a patient would remain in hospital for > 2 days (accuracy 0.82, area under the receiver operating characteristic curve 0.75, sensitivity 0.47 and specificity 0.97). When predicting LOS as an exact number of days, the artificial neural network achieved a mean absolute error of 2.9 and a mean squared error of 16.8 on the test set. For the prediction of home as a discharge destination (vs any non-home alternative), all models performed similarly with an accuracy of approximately 0.74. This study supports the feasibility of using natural language processing to predict general medical inpatient LOS and discharge destination. Further research is indicated with larger, more detailed, datasets from multiple centres to optimise and examine the accuracy that may be achieved with such predictions.
    Keywords:  Artificial intelligence; Deep learning; Machine learning; Natural language processing; Neural network; Prognostication
    DOI:  https://doi.org/10.1007/s11739-019-02265-3
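The performance figures quoted in item 2 (accuracy, sensitivity, specificity, mean absolute error, mean squared error) follow standard definitions. The sketch below shows how such figures are typically derived. The function names and all counts and values are hypothetical illustrations, not data from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # share of all predictions that are correct
        "sensitivity": tp / (tp + fn),    # share of true positives that are detected
        "specificity": tn / (tn + fp),    # share of true negatives that are detected
    }


def mean_absolute_error(actual, predicted):
    """Average absolute gap between predicted and observed values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average squared gap; penalises large misses more than MAE does."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical counts (NOT from the study): a classifier that rarely raises
# false alarms but misses many long-stay patients.
example = binary_metrics(tp=40, fp=10, tn=90, fn=60)
print(example)

# Hypothetical lengths of stay in days (NOT from the study).
observed = [1, 3, 5]
predicted = [2, 3, 9]
print(mean_absolute_error(observed, predicted))
print(mean_squared_error(observed, predicted))
```

With these made-up counts, specificity is high while sensitivity is low, the same qualitative pattern the abstract reports for the > 2 days outcome.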
  3. Int J Retina Vitreous. 2019 ;5 52
      Eye surgery, specifically retinal micro-surgery, involves sensory and motor skill that approaches human boundaries and physiological limits for steadiness, accuracy, and the ability to detect the small forces involved. Despite assumptions as to the benefit of robots in surgery and also despite great development effort, numerous challenges to the full development and adoption of robotic assistance in surgical ophthalmology remain. Historically, the first in-human robot-assisted retinal surgery occurred nearly 30 years after the first experimental papers on the subject. Similarly, artificial intelligence emerged decades ago and it is only now being more fully realized in ophthalmology. The delay between conception and application has in part been due to the necessary technological advances required to implement new processing strategies. Chief among these has been the better matched processing power of specialty graphics processing units for machine learning. Transcending the classic concept of robots performing repetitive tasks, artificial intelligence and machine learning are related concepts that have proven their abilities to design concepts and solve problems. The implication of such abilities is that future machines may further intrude on the domain of heretofore "human-reserved" tasks. Although the potential of artificial intelligence/machine learning is profound, present marketing promises and hype exceed its stage of development, analogous to the seventeenth century mathematical "boom" with algebra. Nevertheless, robotic systems augmented by machine learning may eventually improve robot-assisted retinal surgery and could potentially transform the discipline. This commentary analyzes advances in retinal robotic surgery, its current drawbacks and limitations, and the potential role of artificial intelligence in robotic retinal surgery.
    Keywords:  Artificial intelligence; Ophthalmology; Retina; Robotic surgical procedures; Robotics
    DOI:  https://doi.org/10.1186/s40942-019-0202-y
  4. J Med Internet Res. 2020 Jan 03. 22(1): e15645
       BACKGROUND: Timely, precise, and localized surveillance of nonfatal events is needed to improve response and prevention of opioid-related problems in an evolving opioid crisis in the United States. Records of naloxone administration found in prehospital emergency medical services (EMS) data have helped estimate opioid overdose incidence, including nonhospital, field-treated cases. However, as naloxone is often used by EMS personnel in unconsciousness of unknown cause, attributing naloxone administration to opioid misuse and heroin use (OM) may misclassify events. Better methods are needed to identify OM.
    OBJECTIVE: This study aimed to develop and test a natural language processing method that would improve identification of potential OM from paramedic documentation.
    METHODS: First, we searched Denver Health paramedic trip reports from August 2017 to April 2018 for keywords naloxone, heroin, and both combined, and we reviewed narratives of identified reports to determine whether they constituted true cases of OM. Then, we used this human classification as reference standard and trained 4 machine learning models (random forest, k-nearest neighbors, support vector machines, and L1-regularized logistic regression). We selected the algorithm that produced the highest area under the receiver operating characteristic curve (AUC) for model assessment. Finally, we compared positive predictive value (PPV) of the highest performing machine learning algorithm with PPV of searches of keywords naloxone, heroin, and combination of both in the binary classification of OM in unseen September 2018 data.
    RESULTS: In total, 54,359 trip reports were filed from August 2017 to April 2018. Approximately 1.09% (594/54,359) indicated naloxone administration. Among trip reports with reviewer agreement regarding OM in the narrative, 57.6% (292/516) were considered to include information revealing OM. Approximately 1.63% (884/54,359) of all trip reports mentioned heroin in the narrative. Among trip reports with reviewer agreement, 95.5% (784/821) were considered to include information revealing OM. Combined results accounted for 2.39% (1298/54,359) of trip reports. Among trip reports with reviewer agreement, 77.79% (907/1166) were considered to include information consistent with OM. The reference standard used to train and test machine learning models included details of 1166 trip reports. L1-regularized logistic regression was the highest performing algorithm (AUC=0.94; 95% CI 0.91-0.97) in identifying OM. Tested on 5983 unseen reports from September 2018, the keyword naloxone inaccurately identified and underestimated probable OM trip report cases (63 cases; PPV=0.68). The keyword heroin yielded more cases with improved performance (129 cases; PPV=0.99). Combined keyword and L1-regularized logistic regression classifier further improved performance (146 cases; PPV=0.99).
    CONCLUSIONS: A machine learning application enhanced the effectiveness of finding OM among documented paramedic field responses. This approach to refining OM surveillance may lead to improved first-responder and public health responses toward prevention of overdoses and other opioid-related problems in US communities.
    Keywords:  artificial intelligence; emergency medical services; heroin; naloxone; natural language processing; opioid crisis; substance-related disorders
    DOI:  https://doi.org/10.2196/15645
  5. Knee. 2019 Dec 26. pii: S0968-0160(19)30310-2. [Epub ahead of print]
       BACKGROUND: Preoperative identification of knee arthroplasty is important for planning revision surgery. However, up to 10% of implants are not identified prior to surgery. The purposes of this study were to develop and test the performance of a deep learning system (DLS) for the automated radiographic 1) identification of the presence or absence of a total knee arthroplasty (TKA); 2) classification of TKA vs. unicompartmental knee arthroplasty (UKA); and 3) differentiation between two different primary TKA models.
    METHOD: We collected 237 anteroposterior (AP) knee radiographs with equal proportions of native knees, TKA, and UKA and 274 AP knee radiographs with equal proportions of two TKA models. Data augmentation was used to increase the number of images for deep convolutional neural network (DCNN) training. A DLS based on DCNNs was trained on these images. Receiver operating characteristic (ROC) curves with area under the curve (AUC) were generated. Heatmaps were created using class activation mapping (CAM) to identify image features most important for DCNN decision-making.
    RESULTS: DCNNs trained to detect TKA and distinguish between TKA and UKA both achieved AUC of 1. Heatmaps demonstrated appropriate emphasis of arthroplasty components in decision-making. The DCNN trained to distinguish between the two TKA models achieved AUC of 1. Heatmaps showed emphasis of specific unique features of the TKA model designs, such as the femoral component anterior flange shape.
    CONCLUSIONS: DCNNs can accurately identify presence of TKA and distinguish between specific arthroplasty designs. This proof-of-concept could be applied towards identifying other prosthesis models and prosthesis-related complications.
    Keywords:  Artificial intelligence; Deep learning; Knee Arthroplasty; Knee prosthesis; Neural networks
    DOI:  https://doi.org/10.1016/j.knee.2019.11.020
  6. Health Informatics J. 2019 Dec 30. 1460458219894494
      In order to evaluate mortality predictions based on boosted trees, this retrospective study uses electronic medical record data from three academic health centers for inpatients 18 years or older with at least one observation of each vital sign. Predictions were made 12, 24, and 48 hours before death. Models fit to training data from each institution were evaluated using hold-out test data from the same institution, and from the other institutions. Gradient-boosted trees (GBT) were compared to regularized logistic regression (LR) predictions, support vector machine (SVM) predictions, quick Sepsis-Related Organ Failure Assessment (qSOFA), and Modified Early Warning Score (MEWS) using area under the receiver operating characteristic curve (AUROC). For training and testing GBT on data from the same institution, the average AUROCs were 0.96, 0.95, and 0.94 across institutional test sets for 12-, 24-, and 48-hour predictions, respectively. When trained and tested on data from different hospitals, GBT AUROCs achieved up to 0.98, 0.96, and 0.96, for 12-, 24-, and 48-hour predictions, respectively. Average AUROC for 48-hour predictions for LR, SVM, MEWS, and qSOFA were 0.85, 0.79, 0.86 and 0.82, respectively. GBT predictions may help identify patients who would benefit from increased clinical care.
    Keywords:  electronic health record; machine learning; mortality; prediction
    DOI:  https://doi.org/10.1177/1460458219894494
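Several items in this issue, including item 6, compare models by area under the receiver operating characteristic curve (AUROC). As a minimal sketch of what that number means, the function below computes AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The function name and the sample labels and scores are hypothetical, and this is not the evaluation code used in any of the studies.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 outcomes; scores: model outputs, higher = more
    likely positive. Quadratic in sample size, which is fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and risk scores (NOT from the study).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 3 of 4 positive/negative pairs are ordered correctly
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the values quoted in these abstracts should be read.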
  7. Nature. 2020 Jan;577(7788): 89-94
      Screening mammography aims to identify breast cancer at earlier stages of the disease, when treatment can be more successful. Despite the existence of screening programmes worldwide, the interpretation of mammograms is affected by high rates of false positives and false negatives. Here we present an artificial intelligence (AI) system that is capable of surpassing human experts in breast cancer prediction. To assess its performance in the clinical setting, we curated a large representative dataset from the UK and a large enriched dataset from the USA. We show an absolute reduction of 5.7% and 1.2% (USA and UK) in false positives and 9.4% and 2.7% in false negatives. We provide evidence of the ability of the system to generalize from the UK to the USA. In an independent study of six radiologists, the AI system outperformed all of the human readers: the area under the receiver operating characteristic curve (AUC-ROC) for the AI system was greater than the AUC-ROC for the average radiologist by an absolute margin of 11.5%. We ran a simulation in which the AI system participated in the double-reading process that is used in the UK, and found that the AI system maintained non-inferior performance and reduced the workload of the second reader by 88%. This robust assessment of the AI system paves the way for clinical trials to improve the accuracy and efficiency of breast cancer screening.
    DOI:  https://doi.org/10.1038/s41586-019-1799-6
  8. Acta Neurol Scand. 2019 Dec 30.
       OBJECTIVE: People with epilepsy are at increased risk for mental health comorbidities. Machine-learning methods based on spoken language can detect suicidality in adults. This study's purpose was to use spoken words to create machine-learning classifiers that identify current or lifetime history of comorbid psychiatric conditions in teenagers and young adults with epilepsy.
    MATERIALS AND METHODS: Eligible participants were >12 years old with epilepsy. All participants were interviewed using the Mini International Neuropsychiatric Interview (MINI) or the MINI Kid Tracking and asked five open-ended conversational questions. N-grams and Linguistic Inquiry and Word Count (LIWC) word categories were used to construct machine learning classification models from language harvested from interviews. Data was analyzed for four individual MINI identified disorders and for three mutually exclusive groups: participants with no psychiatric disorders, participants with non-suicidal psychiatric disorders, and participants with any degree of suicidality. Performance was measured using areas under the receiver operating characteristic curve (AROCs).
    RESULTS: Classifiers were constructed from 227 interviews with 122 participants (7.5 ± 3.1 minutes and 454 ± 299 words). AROCs for models differentiating the non-overlapping groups and individual disorders ranged from 57% to 78% (many with p < 0.02).
    DISCUSSION AND CONCLUSION: Machine learning classifiers of spoken language can reliably identify current or lifetime history of suicidality and depression in people with epilepsy. Data suggests identification of anxiety and bipolar disorders may be achieved with larger data sets. Machine learning analysis of spoken language can be promising as a useful screening alternative when traditional approaches are unwieldy (e.g. telephone calls, primary care offices, school health clinics).
    Keywords:  artificial intelligence; childhood absence epilepsy; natural language processing; psychiatric screening
    DOI:  https://doi.org/10.1111/ane.13216
  9. Eur Radiol. 2020 Jan 03.
       OBJECTIVES: We aimed to establish and validate an artificial intelligence-based radiomics strategy for predicting personalized responses of hepatocellular carcinoma (HCC) to first transarterial chemoembolization (TACE) session by quantitatively analyzing contrast-enhanced ultrasound (CEUS) cines.
    METHODS: One hundred and thirty HCC patients (89 for training, 41 for validation), who received ultrasound examination (CEUS and B-mode) within 1 week before the first TACE session, were retrospectively enrolled. Ultrasonographic data was used for building and validating deep learning radiomics-based CEUS model (R-DLCEUS), machine learning radiomics-based time-intensity curve of CEUS model (R-TIC), and machine learning radiomics-based B-Mode images model (R-BMode), respectively, to predict responses (objective-response and non-response) to TACE with reference to modified response evaluation criteria in solid tumor. The performance of models was compared by areas under the receiver operating characteristic curve (AUC) and the DeLong test was used to compare different AUCs. The prediction robustness was assessed for each model.
    RESULTS: AUCs of R-DLCEUS, R-TIC, and R-BMode were 0.93 (95% CI, 0.80-0.98), 0.80 (95% CI, 0.64-0.90), and 0.81 (95% CI, 0.67-0.95) in the validation cohort, respectively. AUC of R-DLCEUS shows significant difference compared with that of R-TIC (p = 0.034) and R-BMode (p = 0.039), whereas R-TIC was not significantly different from R-BMode. The performance was highly reproducible with different training and validation cohorts.
    CONCLUSIONS: DL-based radiomics method can effectively utilize CEUS cines to achieve accurate and personalized prediction. It is easy to operate and holds good potential for benefiting TACE candidates in clinical practice.
    KEY POINTS: • Deep learning (DL) radiomics-based CEUS model can accurately predict responses of HCC patients to their first TACE session by quantitatively analyzing their pre-operative CEUS cines. • The visualization of the 3D CNN analysis adopted in CEUS model provided direct insight into what computers "see" on CEUS cines, which can help people understand the interpretation of CEUS data. • The proposed prediction method is easy to operate and labor-saving for clinical practice, facilitating the clinical treatment decision of HCCs with very few time costs.
    Keywords:  Deep learning; Hepatocellular carcinoma; Therapeutic chemoembolization; Ultrasonography
    DOI:  https://doi.org/10.1007/s00330-019-06553-6
  10. J Am Med Inform Assoc. 2019 Dec 30. pii: ocz204. [Epub ahead of print]
       OBJECTIVE: To identify predictors of prediabetes using feature selection and machine learning on a nationally representative sample of the US population.
    MATERIALS AND METHODS: We analyzed n = 6346 men and women enrolled in the National Health and Nutrition Examination Survey 2013-2014. Prediabetes was defined using American Diabetes Association guidelines. The sample was randomly partitioned to training (n = 3174) and internal validation (n = 3172) sets. Feature selection algorithms were run on training data containing 156 preselected exposure variables. Four machine learning algorithms were applied on 46 exposure variables in original and resampled training datasets built using 4 resampling methods. Predictive models were tested on internal validation data (n = 3172) and external validation data (n = 3000) prepared from National Health and Nutrition Examination Survey 2011-2012. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC). Predictors were assessed by odds ratios in logistic models and variable importance in others. The Centers for Disease Control (CDC) prediabetes screening tool was the benchmark to compare model performance.
    RESULTS: Prediabetes prevalence was 23.43%. The CDC prediabetes screening tool produced 64.40% AUROC. Seven optimal (≥ 70% AUROC) models identified 25 predictors including 4 potentially novel associations; 20 by both logistic and other nonlinear/ensemble models and 5 solely by the latter. All optimal models outperformed the CDC prediabetes screening tool (P < 0.05).
    DISCUSSION: Combined use of feature selection and machine learning increased predictive performance outperforming the recommended screening tool. A range of predictors of prediabetes was identified.
    CONCLUSION: This work demonstrated the value of combining feature selection with machine learning to identify a wide range of predictors that could enhance prediabetes prediction and clinical decision-making.
    Keywords:  NHANES; feature selection; machine learning; prediabetes; predictors
    DOI:  https://doi.org/10.1093/jamia/ocz204
  11. Eur Radiol. 2020 Jan 03.
       OBJECTIVES: To perform test-retest reproducibility analyses for deep learning-based automatic detection algorithm (DLAD) using two stationary chest radiographs (CRs) with short-term intervals, to analyze influential factors on test-retest variations, and to investigate the robustness of DLAD to simulated post-processing and positional changes.
    METHODS: This retrospective study included patients with pulmonary nodules resected in 2017. Preoperative CRs without interval changes were used. Test-retest reproducibility was analyzed in terms of median differences of abnormality scores, intraclass correlation coefficients (ICC), and 95% limits of agreement (LoA). Factors associated with test-retest variation were investigated using univariable and multivariable analyses. Shifts in classification between the two CRs were analyzed using pre-determined cutoffs. Radiograph post-processing (blurring and sharpening) and positional changes (translations in x- and y-axes, rotation, and shearing) were simulated and agreement of abnormality scores between the original and simulated CRs was investigated.
    RESULTS: Our study analyzed 169 patients (median age, 65 years; 91 men). The median difference of abnormality scores was 1-2% and ICC ranged from 0.83 to 0.90. The 95% LoA was approximately ± 30%. Test-retest variation was negatively associated with solid portion size (β, - 0.50; p = 0.008) and good nodule conspicuity (β, - 0.94; p < 0.001). A small fraction (15/169) showed discordant classifications when the high-specificity cutoff (46%) was applied to the model outputs (p = 0.04). DLAD was robust to the simulated positional change (ICC, 0.984, 0.996), but relatively less robust to post-processing (ICC, 0.872, 0.968).
    CONCLUSIONS: DLAD was robust to the test-retest variation. However, inconspicuous nodules may cause fluctuations of the model output and subsequent misclassifications.
    KEY POINTS: • The deep learning-based automatic detection algorithm was robust to the test-retest variation of the chest radiographs in general. • The test-retest variation was negatively associated with solid portion size and good nodule conspicuity. • High-specificity cutoff (46%) resulted in discordant classifications of 8.9% (15/169; p = 0.04) between the test-retest radiographs.
    Keywords:  Artificial intelligence; Computer-assisted radiographic image interpretation; Radiography; Reproducibility of results; Solitary pulmonary nodule
    DOI:  https://doi.org/10.1007/s00330-019-06589-8
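Item 11 reports test-retest agreement as 95% limits of agreement (LoA) and a discordant classification rate. The sketch below assumes the conventional Bland-Altman definition of LoA (mean difference ± 1.96 standard deviations of the paired differences). The abstract does not spell out its formula, so that definition is an assumption. The function name and the paired scores are hypothetical; only the 15/169 fraction comes from the abstract.

```python
from statistics import mean, stdev


def limits_of_agreement(first, second):
    """Return (lower, bias, upper) for paired measurements.

    Assumes the Bland-Altman convention: bias is the mean paired difference
    and the limits are bias +/- 1.96 sample standard deviations.
    """
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias - spread, bias, bias + spread


# Hypothetical abnormality scores (%) on two radiographs of the same
# patients (NOT from the study).
test_scores = [10, 20, 30, 40]
retest_scores = [12, 18, 33, 37]
lower, bias, upper = limits_of_agreement(test_scores, retest_scores)
print(round(lower, 2), bias, round(upper, 2))

# Discordant classification rate quoted in the abstract: 15 of 169 patients.
print(round(100 * 15 / 169, 1))  # 8.9
```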
  12. Stroke. 2019 Dec 30. STROKEAHA119027457
      Background and Purpose- Selection of patients with acute ischemic stroke for endovascular treatment generally relies on dynamic susceptibility contrast magnetic resonance imaging or computed tomography perfusion. Dynamic susceptibility contrast magnetic resonance imaging requires injection of contrast, whereas computed tomography perfusion requires high doses of ionizing radiation. The purpose of this work was to develop and evaluate a deep learning (DL)-based algorithm for assisting the selection of suitable patients with acute ischemic stroke for endovascular treatment based on 3-dimensional pseudo-continuous arterial spin labeling (pCASL). Methods- A total of 167 image sets of 3-dimensional pCASL data from 137 patients with acute ischemic stroke scanned on 1.5T and 3.0T Siemens MR systems were included for neural network training. The concurrently acquired dynamic susceptibility contrast magnetic resonance imaging was used to produce labels of hypoperfused brain regions, analyzed using commercial software. The DL and 6 machine learning (ML) algorithms were trained with 10-fold cross-validation. The eligibility for endovascular treatment was determined retrospectively based on the criteria of perfusion/diffusion mismatch in the DEFUSE 3 trial (Endovascular Therapy Following Imaging Evaluation for Ischemic Stroke). The trained DL algorithm was further applied on twelve 3-dimensional pCASL data sets acquired on 1.5T and 3T General Electric MR systems, without fine-tuning of parameters. Results- The DL algorithm can predict the dynamic susceptibility contrast-defined hypoperfusion region in pCASL with a voxel-wise area under the curve of 0.958, while the 6 ML algorithms ranged from 0.897 to 0.933. For retrospective determination for subject-level endovascular treatment eligibility, the DL algorithm achieved an accuracy of 92%, with a sensitivity of 0.89 and specificity of 0.95. When applied to the GE pCASL data, the DL algorithm achieved a voxel-wise area under the curve of 0.94 and a subject-level accuracy of 92% for endovascular treatment eligibility. Conclusions- pCASL perfusion magnetic resonance imaging in conjunction with the DL algorithm provides a promising approach for assisting decision-making for endovascular treatment in patients with acute ischemic stroke.
    Keywords:  arterial spin labeling; deep learning; magnetic resonance imaging; perfusion imaging; stroke
    DOI:  https://doi.org/10.1161/STROKEAHA.119.027457
  13. Curr Opin Urol. 2019 Dec 27.
       PURPOSE OF REVIEW: To investigate the application of artificial intelligence in the management of nephrolithiasis.
    RECENT FINDINGS: Although rising, the number of publications on artificial intelligence for the management of urinary stone disease is still low. Most publications focus on diagnostic tools and prediction of outcomes after clinical interventions. Artificial intelligence can, however, play a major role in development of surgical skills and automated data extraction to support clinical research.
    SUMMARY: The combination of artificial intelligence with new technological developments in the field of endourology will create new possibilities in the management of urinary stones. The application of artificial intelligence can lead to better patient selection, higher success rates, and improved patient safety.
    DOI:  https://doi.org/10.1097/MOU.0000000000000707
  14. Arthritis Res Ther. 2019 Dec 30. 21(1): 305
       BACKGROUND: Systemic sclerosis (SSc) is a rare disease with studies limited by small sample sizes. Electronic health records (EHRs) represent a powerful tool to study patients with rare diseases such as SSc, but validated methods are needed. We developed and validated EHR-based algorithms that incorporate billing codes and clinical data to identify SSc patients in the EHR.
    METHODS: We used a de-identified EHR with over 3 million subjects and identified 1899 potential SSc subjects with at least 1 count of the SSc ICD-9 (710.1) or ICD-10-CM (M34*) codes. We randomly selected 200 as a training set for chart review. A subject was a case if diagnosed with SSc by a rheumatologist, dermatologist, or pulmonologist. We selected the following algorithm components based on clinical knowledge and available data: SSc ICD-9 and ICD-10-CM codes, positive antinuclear antibody (ANA) (titer ≥ 1:80), and a keyword of Raynaud's phenomenon (RP). We performed both rule-based and machine learning techniques for algorithm development. Positive predictive values (PPVs), sensitivities, and F-scores (which account for PPVs and sensitivities) were calculated for the algorithms.
    RESULTS: PPVs were low for algorithms using only 1 count of the SSc ICD-9 code. As code counts increased, the PPVs increased. PPVs were higher for algorithms using ICD-10-CM codes versus the ICD-9 code. Adding a positive ANA and RP keyword increased the PPVs of algorithms only using ICD billing codes. Algorithms using ≥ 3 or ≥ 4 counts of the SSc ICD-9 or ICD-10-CM codes and ANA positivity had the highest PPV at 100% but a low sensitivity at 50%. The algorithm with the highest F-score of 91% was ≥ 4 counts of the ICD-9 or ICD-10-CM codes with an internally validated PPV of 90%. A machine learning method using random forests yielded an algorithm with a PPV of 84%, sensitivity of 92%, and F-score of 88%. The most important feature was RP keyword.
    CONCLUSIONS: Algorithms using only ICD-9 codes did not perform well to identify SSc patients. The highest performing algorithms incorporated clinical data with billing codes. EHR-based algorithms can identify SSc patients across a healthcare system, enabling researchers to examine important outcomes.
    Keywords:  Algorithms; Bioinformatics; Electronic health records; Systemic sclerosis
    DOI:  https://doi.org/10.1186/s13075-019-2092-7
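Item 14 describes F-scores as accounting for both PPV and sensitivity. Assuming this means the usual F1 score (the harmonic mean of the two, which the abstract does not state explicitly), the figures reported for the random forest algorithm can be checked with a few lines. The function name is a hypothetical illustration.

```python
def f_score(ppv, sensitivity):
    """F1 score: harmonic mean of PPV (precision) and sensitivity (recall)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Random forest figures quoted in the abstract: PPV 84%, sensitivity 92%.
print(round(f_score(0.84, 0.92), 2))  # 0.88, matching the reported F-score of 88%

# The highest-PPV algorithms: PPV 100% but sensitivity 50%.
print(round(f_score(1.00, 0.50), 2))  # 0.67
```

The second case shows why a perfect PPV alone does not make an algorithm the best performer: the harmonic mean is pulled down by the low sensitivity.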
  15. Eur Radiol. 2020 Jan 03.
       OBJECTIVES: Patients with multiple sclerosis (MS) regularly undergo MRI for assessment of disease burden. However, interpretation may be time consuming and prone to intra- and interobserver variability. Here, we evaluate the potential of artificial neural networks (ANN) for automated volumetric assessment of MS disease burden and activity on MRI.
    METHODS: A single-institutional dataset with 334 MS patients (334 MRI exams) was used to develop and train an ANN for automated identification and volumetric segmentation of T2/FLAIR-hyperintense and contrast-enhancing (CE) lesions. Independent testing was performed in a single-institutional longitudinal dataset with 82 patients (266 MRI exams). We evaluated lesion detection performance (F1 scores), lesion segmentation agreement (DICE coefficients), and lesion volume agreement (concordance correlation coefficients [CCC]). Independent evaluation was performed on the public ISBI-2015 challenge dataset.
    RESULTS: The F1 score was maximized in the training set at a detection threshold of 7 mm3 for T2/FLAIR lesions and 14 mm3 for CE lesions. In the training set, mean F1 scores were 0.867 for T2/FLAIR lesions and 0.636 for CE lesions, as compared to 0.878 for T2/FLAIR lesions and 0.715 for CE lesions in the test set. Using these thresholds, the ANN yielded mean DICE coefficients of 0.834 and 0.878 for segmentation of T2/FLAIR and CE lesions in the training set (fivefold cross-validation). Corresponding DICE coefficients in the test set were 0.846 for T2/FLAIR lesions and 0.908 for CE lesions, and the CCC was ≥ 0.960 in each dataset.
    CONCLUSIONS: Our results highlight the capability of ANN for quantitative state-of-the-art assessment of volumetric lesion load on MRI and potentially enable a more accurate assessment of disease burden in patients with MS.
    KEY POINTS: • Artificial neural networks (ANN) can accurately detect and segment both T2/FLAIR and contrast-enhancing MS lesions in MRI data. • Performance of the ANN was consistent in a clinically derived dataset, with patients presenting all possible disease stages in MRI scans acquired from standard clinical routine rather than with high-quality research sequences. • Computer-aided evaluation of MS with ANN could streamline both clinical and research procedures in the volumetric assessment of MS disease burden as well as in lesion detection.
    Keywords:  Artificial intelligence; Diagnosis, computer-assisted; Magnetic resonance imaging; Multiple sclerosis; Neural networks (computer)
    DOI:  https://doi.org/10.1007/s00330-019-06593-y
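Item 15 reports segmentation agreement as DICE coefficients. As a minimal sketch, the Dice coefficient for two segmentations is twice the number of shared voxels divided by the total number of voxels in both. The representation of a segmentation as a set of voxel coordinates, the function name, and the sample voxels are assumptions made for illustration and are not taken from the study.

```python
def dice(predicted, reference):
    """Dice coefficient between two segmentations given as voxel collections.

    Returns 1.0 for identical masks and 0.0 for masks with no overlap.
    Two empty masks are treated as perfect agreement (a convention chosen
    here, not one stated in the abstract).
    """
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0
    overlap = len(predicted & reference)
    return 2 * overlap / (len(predicted) + len(reference))


# Hypothetical lesion voxels as (row, column) pairs (NOT from the study).
network_mask = {(0, 0), (0, 1), (1, 1)}
manual_mask = {(0, 1), (1, 1), (2, 2)}
print(round(dice(network_mask, manual_mask), 3))  # 2 shared voxels out of 3 + 3
```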
  16. Alzheimers Dement (N Y). 2019 ;5 933-938
       Introduction: Machine learning (ML) may harbor the potential to capture the metabolic complexity in Alzheimer Disease (AD). Here we set out to test the performance of metabolites in blood to categorize AD when compared to CSF biomarkers.
    Methods: This study analyzed samples from 242 cognitively normal (CN) people and 115 with AD-type dementia utilizing plasma metabolites (n = 883). Deep Learning (DL), Extreme Gradient Boosting (XGBoost) and Random Forest (RF) were used to differentiate AD from CN. These models were internally validated using Nested Cross Validation (NCV).
    Results: On the test data, DL produced an AUC of 0.85 (0.80-0.89), XGBoost 0.88 (0.86-0.89), and RF 0.85 (0.83-0.87). By comparison, XGBoost models using CSF measures of amyloid, p-tau, and t-tau (together with age and gender) produced AUC values of 0.78, 0.83, and 0.87, respectively.
    Discussion: This study showed that plasma metabolites have the potential to match the AUC of well-established AD CSF biomarkers in a relatively small cohort. Further studies in independent cohorts are needed to validate whether this specific panel of blood metabolites can separate AD from controls, and how specific it is for AD as compared with other neurodegenerative disorders.
    Keywords:  Alzheimer's disease; Biomarkers; EMIF-AD; Machine-Learning; Metabolomics
    DOI:  https://doi.org/10.1016/j.trci.2019.11.001
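    Entry 16 reports internal validation by nested cross-validation. The sketch below shows only the index bookkeeping behind that idea: an outer loop holds out a test fold, and an inner loop splits the remaining samples for hyperparameter selection. The fold counts (5 outer, 3 inner), the function names, and the unshuffled contiguous folds are assumptions for illustration. The abstract does not state the study's configuration, and a real analysis would normally shuffle and stratify by class.

```python
def k_fold_indices(n_items, k):
    """Split range(n_items) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nested_cv_splits(n_items, outer_k=5, inner_k=3):
    """Yield (inner_splits, outer_test) for each outer fold.

    inner_splits is a list of (train, validation) index pairs drawn only
    from the outer training portion, so the outer test fold never
    influences hyperparameter selection.
    """
    for outer_test in k_fold_indices(n_items, outer_k):
        held_out = set(outer_test)
        outer_train = [i for i in range(n_items) if i not in held_out]
        inner_splits = []
        for fold in k_fold_indices(len(outer_train), inner_k):
            validation = [outer_train[j] for j in fold]
            validation_set = set(validation)
            train = [i for i in outer_train if i not in validation_set]
            inner_splits.append((train, validation))
        yield inner_splits, outer_test


# 242 CN + 115 AD participants, as reported in the abstract.
for inner_splits, outer_test in nested_cv_splits(242 + 115):
    pass  # tune on inner_splits, refit on the outer training set, score on outer_test
```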
  17. J Alzheimers Dis. 2019 Dec 26.
    Alzheimer’s Disease Neuroimaging Initiative
       BACKGROUND: Amyloid-β positivity (Aβ+) based on PET imaging is part of the enrollment criteria for many of the clinical trials of Alzheimer's disease (AD), particularly in trials for amyloid-targeted therapy. Predicting Aβ positivity prior to PET imaging can decrease unnecessary patient burden and costs of running these trials.
    OBJECTIVE: The aim of this study was to evaluate the performance of a machine learning model in estimating the individual risk of Aβ+ based on the gold standard of PET imaging.
    METHODS: We used data from an amnestic mild cognitive impairment (aMCI) subset of the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort to develop and validate the models. The predictors of Aβ status included demographic and ApoE4 status in all models plus a combination of neuropsychological tests (NP), MRI volumetrics, and cerebrospinal fluid (CSF) biomarkers.
    RESULTS: The models that included NP and MRI measures separately showed an area under the receiver operating characteristic curve (AUC) of 0.74 and 0.72, respectively. However, using NP and MRI measures jointly in the model did not improve prediction. The models including CSF biomarkers significantly outperformed other models, with AUCs between 0.89 and 0.92.
    CONCLUSIONS: Predictive models can be effectively used to identify persons with aMCI likely to be amyloid positive on a subsequent PET scan.
    Keywords:  Alzheimer’s disease; amyloid imaging; machine learning; mild cognitive impairment; predictive analytics
    DOI:  https://doi.org/10.3233/JAD-191038
  18. J Neurosurg. 2020 Jan 03. pii: 2019.10.JNS191400. [Epub ahead of print] 1-10
       OBJECTIVE: Cushing's disease (CD) involves brain impairments caused by excessive cortisol. Whether these impairments are reversible in remitted CD after surgery has long been controversial due to a lack of high-quality longitudinal studies. In this study the authors aimed to assess the reversibility of whole-brain changes in remitted CD after transsphenoidal surgery (TSS), and its correlations with clinical and hormonal parameters, in the largest longitudinal study cohort to date for CD patient brain analysis.
    METHODS: Fifty patients with pathologically diagnosed CD and 36 matched healthy controls (HCs) were enrolled in a tertiary comprehensive hospital and national pituitary disease registry center in China. 3-T MRI studies were analyzed using an artificial intelligence-assisted web-based autosegmentation tool to quantify 3D brain volumes. Clinical parameters as well as levels of serum cortisol, adrenocorticotrophic hormone (ACTH), and 24-hour urinary free cortisol were collected for the correlation analysis. All CD patients underwent TSS and 46 patients achieved remission. All clinical, hormonal, and MRI parameters were reevaluated at the 3-month follow-up after surgery.
    RESULTS: Widespread brain volume loss was observed in active CD patients compared with HCs, including total gray matter (p = 0.003, with false discovery rate [FDR] correction) and the frontal, parietal, occipital, and temporal lobes; insula; cingulate lobe; and enlargement of lateral and third ventricles (p < 0.05, corrected with FDR). All affected brain regions improved significantly after TSS (p < 0.05, corrected with FDR). In patients with remitted CD, total gray matter and most brain regions (except the frontal and temporal lobes) showed full recovery of volume, with volumes that did not differ from those of HCs (p > 0.05, corrected with FDR). ACTH and serum cortisol changes were negatively correlated with brain volume changes during recovery (p < 0.05).
    CONCLUSIONS: This study demonstrates the rapid reversal of total gray matter loss in remitted CD. The combination of full recovery areas and partial recovery areas after TSS is consistent with the incomplete recovery of memory and cognitive function observed in CD patients in clinical practice. Correlation analyses suggest that ACTH and serum cortisol levels are reliable serum biomarkers of brain recovery for clinical use after surgery.
    Keywords:  24hUFC = 24-hour urinary free cortisol; ACTH = adrenocorticotrophic hormone; CD = Cushing’s disease; CS = Cushing’s syndrome; Cushing’s disease; DTI = diffusion tensor imaging; FDR = false discovery rate; GC = glucocorticoid; HC = healthy control; TSS = transsphenoidal surgery; artificial intelligence; brain imaging; pituitary surgery; transsphenoidal surgery
    DOI:  https://doi.org/10.3171/2019.10.JNS191400
  19. Kidney Int. 2019 Nov 09. pii: S0085-2538(19)31116-0. [Epub ahead of print]
      Symptoms are common in patients on maintenance hemodialysis but identification is challenging. New informatics approaches including natural language processing (NLP) can be utilized to identify symptoms from narrative clinical documentation. Here we utilized NLP to identify seven patient symptoms from notes of maintenance hemodialysis patients of the BioMe Biobank and validated our findings using a separate cohort and the MIMIC-III database. NLP performance was compared for symptom detection with International Classification of Diseases (ICD)-9/10 codes and the performance of both methods was validated against manual chart review. From 1034 and 519 hemodialysis patients within BioMe and MIMIC-III databases, respectively, the most frequently identified symptoms by NLP were fatigue, pain, and nausea/vomiting. In BioMe, sensitivity for NLP (0.85 - 0.99) was higher than for ICD codes (0.09 - 0.59) for all symptoms with similar results in the BioMe validation cohort and MIMIC-III. ICD codes were significantly more specific for nausea/vomiting in BioMe and more specific for fatigue, depression, and pain in the MIMIC-III database. A majority of patients in both cohorts had four or more symptoms. Patients with more symptoms identified by NLP, ICD, and chart review had more clinical encounters. NLP had higher specificity in inpatient notes but higher sensitivity in outpatient notes and performed similarly across pain severity subgroups. Thus, NLP had higher sensitivity compared to ICD codes for identification of seven common hemodialysis-related symptoms, with comparable specificity between the two methods. Hence, NLP may be useful for the high-throughput identification of patient-centered outcomes when using electronic health records.
    Keywords:  geriatric nephrology; hemodialysis; natural language processing; patient-centered outcomes; symptoms
    DOI:  https://doi.org/10.1016/j.kint.2019.10.023
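    The comparison in entry 19 scores two detection methods (NLP and ICD codes) against manual chart review as the reference standard. A minimal sketch of that scoring follows. The function name and the per-patient flags are invented for illustration and are not study data.

```python
def sensitivity_specificity(predicted, reference):
    """Score binary predictions against a reference standard.

    Returns (sensitivity, specificity). Assumes the reference contains at
    least one positive and one negative case.
    """
    pairs = list(zip(predicted, reference))
    true_pos = sum(1 for p, r in pairs if p and r)
    false_neg = sum(1 for p, r in pairs if not p and r)
    true_neg = sum(1 for p, r in pairs if not p and not r)
    false_pos = sum(1 for p, r in pairs if p and not r)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical flags for one symptom in eight patients (1 = symptom present).
chart_review = [1, 1, 1, 1, 0, 0, 0, 0]  # reference standard
nlp_flags = [1, 1, 1, 0, 0, 0, 0, 1]     # sensitive, with one false positive
icd_flags = [1, 0, 0, 0, 0, 0, 0, 0]     # specific, but misses most cases

nlp_result = sensitivity_specificity(nlp_flags, chart_review)
icd_result = sensitivity_specificity(icd_flags, chart_review)
```

    In this toy data the ICD-style flags miss most true cases while producing no false positives, which mirrors the direction of the pattern the abstract describes.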
  20. Cancer Med. 2020 Jan 01.
      Early identification of metastatic or recurrent colorectal cancer (CRC) patients who will be sensitive to FOLFOX (5-FU, leucovorin and oxaliplatin) therapy is very important. We performed microarray meta-analysis to identify differentially expressed genes (DEGs) between FOLFOX responders and nonresponders in metastatic or recurrent CRC patients, and found that the expression levels of WASHC4, HELZ, ERN1, RPS6KB1, and APPBP2 were downregulated, while the expression levels of IRF7, EML3, LYPLA2, DRAP1, RNH1, PKP3, TSPAN17, LSS, MLKL, PPP1R7, GCDH, C19ORF24, and CCDC124 were upregulated in FOLFOX responders compared with nonresponders. Subsequent functional annotation showed that DEGs were significantly enriched in autophagy, ErbB signaling pathway, mitophagy, endocytosis, FoxO signaling pathway, apoptosis, and antifolate resistance pathways. Based on those candidate genes, several machine learning algorithms were applied to the training set, then performances of models were assessed via the cross validation method. Candidate models with the best tuning parameters were applied to the test set and the final model showed satisfactory performance. In addition, we also reported that MLKL and CCDC124 gene expression were independent prognostic factors for metastatic CRC patients undergoing FOLFOX therapy.
    Keywords:  FOLFOX; colorectal cancer; machine learning algorithm; microarray meta-analysis
    DOI:  https://doi.org/10.1002/cam4.2786
  21. JAMA Netw Open. 2020 Jan 03. 3(1): e1918377
       Importance: Social and economic costs of depression are exacerbated by prolonged periods spent identifying treatments that would be effective for a particular patient. Thus, a tool that reliably predicts an individual patient's response to treatment could significantly reduce the burden of depression.
    Objective: To estimate how accurately an outcome of escitalopram treatment can be predicted from electroencephalographic (EEG) data on patients with depression.
    Design, Setting, and Participants: This prognostic study used a support vector machine classifier to predict treatment outcome using data from the first Canadian Biomarker Integration Network in Depression (CAN-BIND-1) study. The CAN-BIND-1 study comprised 180 patients (aged 18-60 years) diagnosed with major depressive disorder who had completed 8 weeks of treatment. Of this group, 122 patients had EEG data recorded before the treatment; 115 also had EEG data recorded after the first 2 weeks of treatment.
    Interventions: All participants completed 8 weeks of open-label escitalopram (10-20 mg) treatment.
    Main Outcomes and Measures: The ability of EEG data to predict treatment outcome, measured as accuracy, specificity, and sensitivity of the classifier at baseline and after the first 2 weeks of treatment. The treatment outcome was defined in terms of change in symptom severity, measured by the Montgomery-Åsberg Depression Rating Scale, before and after 8 weeks of treatment. A patient was designated as a responder if the Montgomery-Åsberg Depression Rating Scale score decreased by at least 50% during the 8 weeks and as a nonresponder if the score decrease was less than 50%.
    Results: Of the 122 participants who completed a baseline EEG recording (mean [SD] age, 36.3 [12.7] years; 76 [62.3%] female), the classifier was able to identify responders with an estimated accuracy of 79.2% (sensitivity, 67.3%; specificity, 91.0%) when using only the baseline EEG data. For a subset of 115 participants who had additional EEG data recorded after the first 2 weeks of treatment, use of these data increased the accuracy to 82.4% (sensitivity, 79.2%; specificity, 85.5%).
    Conclusions and Relevance: These findings demonstrate the potential utility of EEG as a treatment planning tool for escitalopram therapy. Further development of the classification tools presented in this study holds the promise of expediting the search for optimal treatment for each patient.
    DOI:  https://doi.org/10.1001/jamanetworkopen.2019.18377
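    The responder definition in entry 21 (a Montgomery-Åsberg Depression Rating Scale score that falls by at least 50% over 8 weeks) reduces to a one-line rule, sketched below with hypothetical function names and scores. The second helper only checks arithmetic on the published figures: the reported accuracies (79.2% and 82.4%) are close to the mean of the reported sensitivity and specificity, although the abstract does not say how accuracy was computed.

```python
def is_responder(baseline_score, week8_score):
    """True if the score decreased by at least 50% of its baseline value."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    relative_decrease = (baseline_score - week8_score) / baseline_score
    return relative_decrease >= 0.5


def mean_of_sensitivity_and_specificity(sensitivity, specificity):
    """Unweighted mean of the two rates (often called balanced accuracy)."""
    return (sensitivity + specificity) / 2


# Hypothetical patients: (baseline score, score after 8 weeks).
examples = [(30, 15), (30, 16), (28, 10)]
labels = [is_responder(before, after) for before, after in examples]

# Figures reported in the abstract, in percent.
baseline_only = mean_of_sensitivity_and_specificity(67.3, 91.0)
with_week2_data = mean_of_sensitivity_and_specificity(79.2, 85.5)
```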
  22. PLoS One. 2019 ;14(12): e0227324
       BACKGROUND: Initiation of the antiarrhythmic medication dofetilide requires an FDA-mandated 3 days of telemetry monitoring due to heightened risk of toxicity within this time period. Although a recommended dose management algorithm for dofetilide exists, there is a range of real-world approaches to dosing the medication.
    METHODS AND RESULTS: In this multicenter investigation, clinical data from the Antiarrhythmic Drug Genetic (AADGEN) study were examined for 354 patients undergoing dofetilide initiation. Univariate logistic regression identified a starting dofetilide dose of 500 mcg (OR 5.0, 95%CI 2.5-10.0, p<0.001) and sinus rhythm at the start of dofetilide loading (OR 2.8, 95%CI 1.8-4.2, p<0.001) as strong positive predictors of successful loading. Any dose adjustment during loading (OR 0.19, 95%CI 0.12-0.31, p<0.001) and a history of coronary artery disease (OR 0.33, 95%CI 0.19-0.59, p<0.001) were strong negative predictors of successful dofetilide loading. Based on the observation that any dose adjustment was a significant negative predictor of successful initiation, we applied multiple supervised approaches to attempt to predict the dose adjustment decision, but none of these approaches identified dose adjustments better than a probabilistic guess. Principal component analysis and cluster analysis identified 8 clusters as a reasonable data reduction method. These 8 clusters were then used to define patient states in a tabular reinforcement learning model trained on 80% of dosing decisions. Testing of this model on the remaining 20% of dosing decisions revealed good accuracy of the reinforcement learning model, with only 16/410 (3.9%) instances of disagreement.
    CONCLUSIONS: Dose adjustments are a strong determinant of whether patients are able to successfully initiate dofetilide. A reinforcement learning algorithm informed by unsupervised learning was able to predict dosing decisions with 96.1% accuracy. Future studies will apply this algorithm prospectively as a data-driven decision aid.
    DOI:  https://doi.org/10.1371/journal.pone.0227324
  23. Anticancer Res. 2020 Jan;40(1): 271-280
       BACKGROUND/AIM: To investigate whether a radiomic machine learning (ML) approach employing texture-analysis (TA) features extracted from primary tumor lesions (PTLs) is able to predict tumor grade (TG) and nodal status (NS) in patients with oropharyngeal (OP) and oral cavity (OC) squamous-cell carcinoma (SCC).
    PATIENTS AND METHODS: Contrast-enhanced CT images of 40 patients with OP and OC SCC were post-processed to extract TA features from PTLs. A feature selection method and different ML algorithms were applied to find the most accurate subset of features to predict TG and NS.
    RESULTS: For the prediction of TG, the best accuracy (92.9%) was achieved by Naïve Bayes (NB), bagging of NB and K Nearest Neighbor (KNN). For the prediction of NS, J48, NB, bagging of NB and boosting of J48 exceeded 90% accuracy.
    CONCLUSION: A radiomic ML approach applied to PTLs is able to predict TG and NS in patients with OC and OP SCC.
    Keywords:  Head and neck squamous cell carcinoma; computed tomography; machine learning; texture analysis
    DOI:  https://doi.org/10.21873/anticanres.13949