bims-arihec Biomed News
on Artificial Intelligence in Healthcare
Issue of 2020‒01‒26
twenty-two papers selected by
Céline Bélanger
Cogniges Inc.


  1. Psychiatry Res. 2019 Dec 09. pii: S0165-1781(19)32083-9. [Epub ahead of print]284 112732
    Graham SA, Lee EE, Jeste DV, Van Patten R, Twamley EW, Nebeker C, Yamada Y, Kim HC, Depp CA.
      Preserving cognition and mental capacity is critical to aging with autonomy. Early detection of pathological cognitive decline facilitates the greatest impact of restorative or preventative treatments. Artificial Intelligence (AI) in healthcare is the use of computational algorithms that mimic human cognitive functions to analyze complex medical data. AI technologies like machine learning (ML) support the integration of biological, psychological, and social factors when approaching diagnosis, prognosis, and treatment of disease. This paper serves to acquaint clinicians and other stakeholders with the use, benefits, and limitations of AI for predicting, diagnosing, and classifying mild and major neurocognitive impairments, by providing a conceptual overview of this topic with emphasis on the features explored and AI techniques employed. We present studies that fell into six categories of features used for these purposes: (1) sociodemographics; (2) clinical and psychometric assessments; (3) neuroimaging and neurophysiology; (4) electronic health records and claims; (5) novel assessments (e.g., sensors for digital data); and (6) genomics/other omics. For each category we provide examples of AI approaches, including supervised and unsupervised ML, deep learning, and natural language processing. AI technology, still nascent in healthcare, has great potential to transform the way we diagnose and treat patients with neurocognitive disorders.
    Keywords:  Dementia; Machine learning; Mild cognitive impairment; Natural language processing; Sensors
    DOI:  https://doi.org/10.1016/j.psychres.2019.112732
  2. PLoS One. 2020 ;15(1): e0224445
    Engle E, Gabrielian A, Long A, Hurt DE, Rosenthal A.
      Availability of trained radiologists for fast processing of CXRs in regions burdened with tuberculosis has always been a challenge, affecting both timely diagnosis and patient monitoring. The paucity of annotated images of lungs of TB patients hampers attempts to apply data-oriented algorithms for research and clinical practices. The TB Portals Program database (TBPP, https://TBPortals.niaid.nih.gov) is a global collaboration curating a large collection of the most dangerous, hard-to-cure drug-resistant tuberculosis (DR-TB) patient cases. TBPP, with 1,179 (83%) DR-TB patient cases, is a unique collection that is well positioned as a testing ground for deep learning classifiers. As of January 2019, the TBPP database contains 1,538 CXRs, of which 346 (22.5%) are annotated by a radiologist and 104 (6.7%) by a pulmonologist, leaving 1,088 (70.7%) CXRs without annotations. The Qure.ai qXR artificial intelligence automated CXR interpretation tool was blind-tested on the 346 radiologist-annotated CXRs from the TBPP database. Qure.ai qXR CXR predictions for cavity, nodule, pleural effusion, and hilar lymphadenopathy successfully matched human expert annotations. In addition, we tested the 12 Qure.ai classifiers to find whether they correlate with treatment success (information provided by treating physicians). Ten descriptors were found as significant: abnormal CXR (p = 0.0005), pleural effusion (p = 0.048), nodule (p = 0.0004), hilar lymphadenopathy (p = 0.0038), cavity (p = 0.0002), opacity (p = 0.0006), atelectasis (p = 0.0074), consolidation (p = 0.0004), indicator of TB disease (p < .0001), and fibrosis (p < .0001). We conclude that applying the fully automated Qure.ai CXR analysis tool is useful for fast, accurate, uniform, large-scale CXR annotation assistance, as it performed well even for DR-TB cases that were not used for initial training. Testing artificial intelligence algorithms (encapsulating both machine learning and deep learning classifiers) on diverse data collections, such as TBPP, is critically important toward progressing to clinically adopted automatic assistants for medical data analysis.
    DOI:  https://doi.org/10.1371/journal.pone.0224445
  3. Br J Dermatol. 2020 Jan 20.
    Du-Harpur X, Watt FM, Luscombe NM, Lynch MD.
      In the past, the skills required to make an accurate dermatological diagnosis have required exposure to thousands of patients over many years. However, in recent years, artificial intelligence (AI) has made enormous advances, particularly in the area of image classification. This has led computer scientists to apply these techniques to develop algorithms that are able to recognise skin lesions, particularly melanoma. Since 2017, there have been numerous studies assessing the accuracy of algorithms with some reporting that accuracy matches or surpasses that of a dermatologist. Whilst the principles underlying these methods are relatively straightforward, it can be challenging for the practising dermatologist to make sense of a plethora of unfamiliar terms in this domain. Here, we explain the concepts of artificial intelligence, machine learning, neural networks and deep learning, and explore the principles of how these tasks are accomplished. We critically evaluate the studies that assess the efficacy of these methods and discuss limitations and potential ethical issues. The burden of skin cancer is growing within the Western world, with major implications for both population skin health, and the provision of dermatology services. AI has the potential to assist in the diagnosis of skin lesions and may have particular value at the interface between primary and secondary care. The emerging technology represents an exciting opportunity for dermatologists, who are the individuals best informed to explore the utility of this powerful novel diagnostic tool, and facilitate its safe and ethical implementation within healthcare systems.
    DOI:  https://doi.org/10.1111/bjd.18880
  4. J Clin Med. 2020 Jan 17. pii: E248. [Epub ahead of print]9(1):
    Chumbita M, Cillóniz C, Puerta-Alcalde P, Moreno-García E, Sanjuan G, Garcia-Pouton N, Soriano A, Torres A, Garcia-Vidal C.
      The use of artificial intelligence (AI) to support clinical medical decisions is a rather promising concept. There are two important factors that have driven these advances: the availability of data from electronic health records (EHR) and progress made in computational performance. These two concepts are interrelated with respect to complex mathematical functions such as machine learning (ML) or neural networks (NN). Indeed, some published articles have already demonstrated the potential of these approaches in medicine. When considering the diagnosis and management of pneumonia, the use of AI and chest X-ray (CXR) images primarily have been indicative of early diagnosis, prompt antimicrobial therapy, and ultimately, better prognosis. Coupled with this is the growing research involving empirical therapy and mortality prediction, too. Maximizing the power of NN, the majority of studies have reported high accuracy rates in their predictions. As AI can handle large amounts of data and execute mathematical functions such as machine learning and neural networks, AI can be revolutionary in supporting the clinical decision-making processes. In this review, we describe and discuss the most relevant studies of AI in pneumonia.
    Keywords:  artificial intelligence; pneumonia
    DOI:  https://doi.org/10.3390/jcm9010248
  5. Artif Intell Med. 2020 Jan;pii: S0933-3657(18)30648-1. [Epub ahead of print]102 101771
    Ben Miled Z, Haas K, Black CM, Khandker RK, Chandrasekaran V, Lipton R, Boustani MA.
      Our aim is to develop a machine learning (ML) model that can predict dementia in a general patient population from multiple health care institutions one year and three years prior to the onset of the disease without any additional monitoring or screening. The purpose of the model is to automate the cost-effective, non-invasive, digital pre-screening of patients at risk for dementia. Towards this purpose, routine care data, which is widely available through Electronic Medical Record (EMR) systems is used as a data source. These data embody a rich knowledge and make related medical applications easy to deploy at scale in a cost-effective manner. Specifically, the model is trained by using structured and unstructured data from three EMR data sets: diagnosis, prescriptions, and medical notes. Each of these three data sets is used to construct an individual model along with a combined model which is derived by using all three data sets. Human-interpretable data processing and ML techniques are selected in order to facilitate adoption of the proposed model by health care providers from multiple institutions. The results show that the combined model is generalizable across multiple institutions and is able to predict dementia within one year of its onset with an accuracy of nearly 80% despite the fact that it was trained using routine care data. Moreover, the analysis of the models identified important predictors for dementia. Some of these predictors (e.g., age and hypertensive disorders) are already confirmed by the literature while others, especially the ones derived from the unstructured medical notes, require further clinical analysis.
    Keywords:  Dementia; EMR; Machine learning; Prediction; Random forest
    DOI:  https://doi.org/10.1016/j.artmed.2019.101771
  6. Curr Treat Options Gastroenterol. 2020 Jan 21.
    Hoerter N, Gross SA, Liang PS.
      PURPOSE OF REVIEW: This review highlights the history, recent advances, and ongoing challenges of artificial intelligence (AI) technology in colonic polyp detection.
    RECENT FINDINGS: Hand-crafted AI algorithms have recently given way to convolutional neural networks with the ability to detect polyps in real-time. The first randomized controlled trial comparing an AI system to standard colonoscopy found a 9% increase in adenoma detection rate, but the improvement was restricted to polyps smaller than 10 mm and the results need validation. As this field rapidly evolves, important issues to consider include standardization of outcomes, dataset availability, real-world applications, and regulatory approval. AI has shown great potential for improving colonic polyp detection while requiring minimal training for endoscopists. The question of when AI will enter endoscopic practice depends on whether the technology can be integrated into existing hardware and an assessment of its added value for patient care.
    Keywords:  Artificial intelligence; Colonic neoplasm; Computer-aided detection; Convolutional neural network; Machine learning
    DOI:  https://doi.org/10.1007/s11938-020-00274-2
  7. Artif Intell Med. 2020 Jan;pii: S0933-3657(19)30617-7. [Epub ahead of print]102 101779
    Abdelaziz Ismael SA, Mohammed A, Hefny H.
      Cancer is the second leading cause of death after cardiovascular diseases. Out of all types of cancer, brain cancer has the lowest survival rate. Brain tumors can have different types depending on their shape, texture, and location. Proper diagnosis of the tumor type enables the doctor to make the correct treatment choice and help save the patient's life. There is a high need in the Artificial Intelligence field for a Computer Assisted Diagnosis (CAD) system to assist doctors and radiologists with the diagnosis and classification of tumors. Over recent years, deep learning has shown promising performance in computer vision systems. In this paper, we propose an enhanced approach for classifying brain tumor types using Residual Networks. We evaluate the proposed model on a benchmark dataset containing 3064 MRI images of 3 brain tumor types (Meningiomas, Gliomas, and Pituitary tumors). We achieved an accuracy of 99%, outperforming previous work on the same dataset.
    Keywords:  Artificial neural network; Cancer classification; Convolutional neural network; Deep residual network; Machine learning
    DOI:  https://doi.org/10.1016/j.artmed.2019.101779
  8. Intensive Care Med. 2020 Jan 21.
    Fleuren LM, Klausch TLT, Zwager CL, Schoonmade LJ, Guo T, Roggeveen LF, Swart EL, Girbes ARJ, Thoral P, Ercole A, Hoogendoorn M, Elbers PWG.
      PURPOSE: Early clinical recognition of sepsis can be challenging. With the advancement of machine learning, promising real-time models to predict sepsis have emerged. We assessed their performance by carrying out a systematic review and meta-analysis.
    METHODS: A systematic search was performed in PubMed, Embase.com and Scopus. Studies targeting sepsis, severe sepsis or septic shock in any hospital setting were eligible for inclusion. The index test was any supervised machine learning model for real-time prediction of these conditions. Quality of evidence was assessed using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) methodology, with a tailored Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) checklist to evaluate risk of bias. Models with a reported area under the curve of the receiver operating characteristic (AUROC) metric were meta-analyzed to identify strongest contributors to model performance.
    RESULTS: After screening, a total of 28 papers were eligible for synthesis, from which 130 models were extracted. The majority of papers were developed in the intensive care unit (ICU, n = 15; 54%), followed by hospital wards (n = 7; 25%), the emergency department (ED, n = 4; 14%) and all of these settings (n = 2; 7%). For the prediction of sepsis, diagnostic test accuracy assessed by the AUROC ranged from 0.68-0.99 in the ICU, to 0.96-0.98 in-hospital and 0.87 to 0.97 in the ED. Varying sepsis definitions limit pooling of the performance across studies. Only three papers clinically implemented models with mixed results. In the multivariate analysis, temperature, lab values, and model type contributed most to model performance.
    CONCLUSION: This systematic review and meta-analysis show that on retrospective data, individual machine learning models can accurately predict sepsis onset ahead of time. Although they present alternatives to traditional scoring systems, between-study heterogeneity limits the assessment of pooled results. Systematic reporting and clinical implementation studies are needed to bridge the gap between bytes and bedside.
    Keywords:  Machine learning; Meta-analysis; Prediction; Sepsis; Septic shock; Systematic review
    DOI:  https://doi.org/10.1007/s00134-019-05872-y
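    The AUROC used above to compare sepsis models can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal Python sketch of that pairwise definition follows; the function name `auroc` and the toy labels and scores are hypothetical and are not taken from any reviewed study.

```python
def auroc(labels, scores):
    """Pairwise AUROC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = sepsis, 0 = no sepsis.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auc = auroc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

    The quadratic loop is chosen for readability; for large cohorts a rank-based computation gives the same value more efficiently.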
  9. Artif Intell Med. 2020 Jan;pii: S0933-3657(18)30712-7. [Epub ahead of print]102 101742
    Zahia S, Garcia Zapirain MB, Sevillano X, González A, Kim PJ, Elmaghraby A.
      Pressure injuries represent a tremendous healthcare challenge in many nations. Elderly and disabled people are the most affected by this fast growing disease. Hence, an accurate diagnosis of pressure injuries is paramount for efficient treatment. The characteristics of these wounds are crucial indicators for the progress of the healing. While invasive methods to retrieve information are not only painful to the patients but may also increase the risk of infections, non-invasive techniques by means of imaging systems provide a better monitoring of the wound healing processes without causing any harm to the patients. These systems should include an accurate segmentation of the wound, the classification of its tissue types, the metrics including the diameter, area and volume, as well as the healing evaluation. Therefore, the aim of this survey is to provide the reader with an overview of imaging techniques for the analysis and monitoring of pressure injuries as an aid to their diagnosis, and proof of the efficiency of Deep Learning to overcome this problem and even outperform the previous methods. In this paper, 114 out of 199 papers retrieved from 8 databases have been analyzed, including also contributions on chronic wounds and skin lesions.
    Keywords:  Deep learning; Machine learning algorithms; Pressure injury; Wound image analysis
    DOI:  https://doi.org/10.1016/j.artmed.2019.101742
  10. JCO Clin Cancer Inform. 2020 Jan;4 50-59
    Beck JT, Rammage M, Jackson GP, Preininger AM, Dankwa-Mullan I, Roebuck MC, Torres A, Holtzen H, Coverdill SE, Williamson MP, Chau Q, Rhee K, Vinegra M.
      PURPOSE: Less than 5% of patients with cancer enroll in clinical trials, and 1 in 5 trials are stopped for poor accrual. We evaluated an automated clinical trial matching system that uses natural language processing to extract patient and trial characteristics from unstructured sources and machine learning to match patients to clinical trials.
    PATIENTS AND METHODS: Medical records from 997 patients with breast cancer were assessed for trial eligibility at Highlands Oncology Group between May and August 2016. System and manual attribute extraction and eligibility determinations were compared using the percentage of agreement for 239 patients and 4 trials. Sensitivity and specificity of system-generated eligibility determinations were measured, and the time required for manual review and system-assisted eligibility determinations were compared.
    RESULTS: Agreement between system and manual attribute extraction ranged from 64.3% to 94.0%. Agreement between system and manual eligibility determinations was 81%-96%. System eligibility determinations demonstrated specificities between 76% and 99%, with sensitivities between 91% and 95% for 3 trials and 46.7% for the 4th. Manual eligibility screening of 90 patients for 3 trials took 110 minutes; system-assisted eligibility determinations of the same patients for the same trials required 24 minutes.
    CONCLUSION: In this study, the clinical trial matching system displayed a promising performance in screening patients with breast cancer for trial eligibility. System-assisted trial eligibility determinations were substantially faster than manual review, and the system reliably excluded ineligible patients for all trials and identified eligible patients for most trials.
    DOI:  https://doi.org/10.1200/CCI.19.00079
  11. Physiol Meas. 2020 Jan 24.
    Oster J, Hopewell JC, Ziberna K, Wijesurendra R, Camm CF, Casadei B, Tarassenko L.
      Atrial Fibrillation (AF) is the most common cardiac arrhythmia, with an estimated prevalence of around 1.6% in the adult population. The analysis of the Electrocardiogram (ECG) data acquired in the UK Biobank represents an opportunity to screen for AF in a large sub-population in the UK. The main objective of this paper is to assess ten machine-learning methods for automated detection of subjects with AF in the UK Biobank dataset. Six classical machine-learning methods based on Support Vector Machines are proposed and compared with state-of-the-art techniques (including a deep-learning algorithm), and finally a combination of a classical machine-learning and deep learning approaches. Evaluation is carried out on a subset of the UK Biobank dataset, manually annotated by human experts. The combined classical machine-learning and deep learning method achieved an F1 score of 84.8% on the test subset, and a Cohen's Kappa coefficient of 0.83, which is similar to the inter-observer agreement of two human experts. The level of performance indicates that the automated detection of AF in patients whose data have been stored in a large database, such as the UK Biobank, is possible. Such automated identification of AF patients would enable further investigations aimed at identifying the different phenotypes associated with AF.
    Keywords:  Electrocardiogram; atrial fibrillation; big data; biobank; machine learning; signal processing
    DOI:  https://doi.org/10.1088/1361-6579/ab6f9a
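    The F1 score and Cohen's Kappa coefficient quoted above are both derived from a 2×2 confusion matrix. The following Python sketch shows the standard formulas; the function name and the example counts are hypothetical and do not reproduce the UK Biobank results.

```python
def f1_and_kappa(tp, fp, fn, tn):
    """Return (F1, Cohen's kappa) for a binary confusion matrix."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Observed agreement between classifier and reference annotation.
    observed = (tp + tn) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    kappa = (observed - expected) / (1 - expected)
    return f1, kappa


# Hypothetical counts, for illustration only.
example_f1, example_kappa = f1_and_kappa(tp=40, fp=10, fn=10, tn=40)
print(round(example_f1, 3), round(example_kappa, 3))
```

    F1 ignores true negatives, whereas kappa corrects overall agreement for chance, which is one reason the two are often reported together.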
  12. Artif Intell Med. 2020 Jan;pii: S0933-3657(19)30363-X. [Epub ahead of print]102 101746
    Lorencin I, Anđelić N, Španjol J, Car Z.
      In this paper, a urinary bladder cancer diagnostic method based on a Multi-Layer Perceptron and a Laplacian edge detector is presented. The aim of this paper is to investigate the possibility of implementing a simpler method (Multi-Layer Perceptron) alongside commonly used methods, such as Deep Learning Convolutional Neural Networks, for urinary bladder cancer detection. The dataset used for this research consisted of 1997 images of bladder cancer and 986 images of non-cancer tissue. The results showed that a Multi-Layer Perceptron trained and tested with images pre-processed with the Laplacian edge detector achieves AUC values up to 0.99. When different image sizes are compared, the best results are achieved with 50×50 and 100×100 images.
    Keywords:  Artificial intelligence; Image pre-processing; Laplacian edge detector; Multi-layer perceptron; Urinary bladder cancer
    DOI:  https://doi.org/10.1016/j.artmed.2019.101746
  13. Artif Intell Med. 2020 Jan;pii: S0933-3657(19)30585-8. [Epub ahead of print]102 101758
    Sengupta S, Singh A, Leopold HA, Gulati T, Lakshminarayanan V.
      An overview of the applications of deep learning for ophthalmic diagnosis using retinal fundus images is presented. We describe various retinal image datasets that can be used for deep learning purposes. Applications of deep learning for segmentation of optic disk, optic cup, blood vessels as well as detection of lesions are reviewed. Recent deep learning models for classification of diseases such as age-related macular degeneration, glaucoma, and diabetic retinopathy are also discussed. Important critical insights and future research directions are given.
    Keywords:  Classification; Deep learning; Fundus image datasets; Fundus photos; Image segmentation; Ophthalmology; Retina
    DOI:  https://doi.org/10.1016/j.artmed.2019.101758
  14. J Neurol Sci. 2020 Jan 03. pii: S0022-510X(20)30003-4. [Epub ahead of print]410 116667
    Chung CC, Hong CT, Huang YH, Su EC, Chan L, Hu CJ, Chiu HW.
      OBJECTIVE: To develop artificial neural network (ANN)-based functional outcome prediction models for patients with acute ischemic stroke (AIS) receiving intravenous thrombolysis based on immediate pretreatment parameters.
    METHODS: The derived cohort consisted of 196 patients with AIS treated with intravenous thrombolysis between 2009 and 2017 at Shuang Ho Hospital in Taiwan. We evaluated the predictive value of parameters associated with major neurologic improvement (MNI) at 24 h after thrombolysis as well as the 3-month outcome. ANN models were applied for outcome prediction. The generalizability of the model was assessed through 5-fold cross-validation. The performance of the models was assessed according to the accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC).
    RESULTS: The parameters associated with MNI were blood pressure (BP), heart rate, glucose level, consciousness level, National Institutes of Health Stroke Scale (NIHSS) score, and history of diabetes mellitus (DM). The parameters associated with the 3-month outcome were age, consciousness level, BP, glucose level, hemoglobin A1c, history of DM, stroke subtype, and NIHSS score. After adequate training, ANN Model 1 to predict MNI achieved an AUC of 0.944. Accuracy, sensitivity, and specificity were 94.6%, 89.8%, and 95.9%, respectively. ANN Model 2 to predict the 3-month outcome achieved an AUC of 0.933, with accuracy, sensitivity, and specificity of 88.8%, 94.7%, and 86.5%, respectively.
    CONCLUSIONS: The ANN-based models achieved reliable performance to predict MNI and 3-month outcomes after thrombolysis for AIS. The models proposed have clinical value to assist in decision-making, especially when invasive adjuvant strategies are considered.
    Keywords:  Artificial intelligence; Artificial neural network; Outcome; Prediction; Stroke; Thrombolysis
    DOI:  https://doi.org/10.1016/j.jns.2020.116667
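    The 5-fold cross-validation mentioned above partitions the cohort into five disjoint folds, each serving once as the held-out set. A minimal Python sketch of such a split for a 196-patient cohort follows; the function name, the shuffling seed, and the interleaved assignment of patients to folds are assumptions for illustration, since the abstract does not describe how the folds were drawn.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Cohort size taken from the abstract; everything else is illustrative.
splits = list(k_fold_indices(196, k=5))
for train, test in splits:
    print(len(train), len(test))
```

    Each patient appears in exactly one test fold, so every prediction used for evaluation comes from a model that did not see that patient during training.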
  15. Invest Radiol. 2020 Jan 21.
    Finck T, Li H, Grundl L, Eichinger P, Bussas M, Mühlau M, Menze B, Wiestler B.
      OBJECTIVES: The aim of the study was to implement a deep-learning tool to produce synthetic double inversion recovery (synthDIR) images and compare their diagnostic performance to conventional sequences in patients with multiple sclerosis (MS).
    MATERIALS AND METHODS: For this retrospective analysis, 100 MS patients (65 female, 37 [22-68] years) were randomly selected from a prospective observational cohort between 2014 and 2016. In a subset of 50 patients, an artificial neural network (DiamondGAN) was trained to generate a synthetic DIR (synthDIR) from standard acquisitions (T1, T2, and fluid-attenuated inversion recovery [FLAIR]). With the resulting network, synthDIR was generated for the remaining 50 subjects. These images as well as conventionally acquired DIR (trueDIR) and FLAIR images were assessed for MS lesions by 2 independent readers, blinded to the source of the DIR image. Lesion counts in the different modalities were compared using a Wilcoxon signed-rank test, and interrater analysis was performed. Contrast-to-noise ratios were compared for objective image quality.
    RESULTS: Utilization of synthDIR allowed detection of significantly more lesions compared with the use of FLAIR images (31.4 ± 20.7 vs 22.8 ± 12.7, P < 0.001). This improvement was mainly attributable to an improved depiction of juxtacortical lesions (12.3 ± 10.8 vs 7.2 ± 5.6, P < 0.001). Interrater reliability was excellent in FLAIR 0.92 (95% confidence interval [CI], 0.85-0.95), synthDIR 0.93 (95% CI, 0.87-0.96), and trueDIR 0.95 (95% CI, 0.85-0.98). Contrast-to-noise ratio in synthDIR exceeded that of FLAIR (22.0 ± 6.4 vs 16.7 ± 3.6, P = 0.009); no significant difference was seen in comparison to trueDIR (22.0 ± 6.4 vs 22.4 ± 7.9, P = 0.87).
    CONCLUSIONS: Computationally generated DIR images improve lesion depiction compared with the use of standard modalities. This method demonstrates how artificial intelligence can help improve imaging in specific pathologies.
    DOI:  https://doi.org/10.1097/RLI.0000000000000640
  16. Health Informatics J. 2020 Jan 22. 1460458219900452
    Cresswell K, Callaghan M, Khan S, Sheikh Z, Mozaffar H, Sheikh A.
      There is growing interest in the potential of artificial intelligence to support decision-making in health and social care settings. There is, however, currently limited evidence of the effectiveness of these systems. The aim of this study was to investigate the effectiveness of artificial intelligence-based computerised decision support systems in health and social care settings. We conducted a systematic literature review to identify relevant randomised controlled trials conducted between 2013 and 2018. We searched the following databases: MEDLINE, EMBASE, CINAHL, PsycINFO, Web of Science, Cochrane Library, ASSIA, Emerald, Health Business Fulltext Elite, ProQuest Public Health, Social Care Online, and grey literature sources. Search terms were conceptualised into three groups: artificial intelligence-related terms, computerised decision support-related terms, and terms relating to health and social care. Terms within groups were combined using the Boolean operator OR, and groups were combined using the Boolean operator AND. Two reviewers independently screened studies against the eligibility criteria and two independent reviewers extracted data on eligible studies onto a customised sheet. We assessed the quality of studies through the Critical Appraisal Skills Programme checklist for randomised controlled trials. We then conducted a narrative synthesis. We identified 68 hits of which five studies satisfied the inclusion criteria. These studies varied substantially in relation to quality, settings, outcomes, and technologies. None of the studies was conducted in social care settings, and three randomised controlled trials showed no difference in patient outcomes. Of these, one investigated the use of Bayesian triage algorithms on forced expiratory volume in 1 second (FEV1) and health-related quality of life in lung transplant patients. Another investigated the effect of image pattern recognition on neonatal development outcomes in pregnant women, and another investigated the effect of the Kalman filter technique for warfarin dosing suggestions on time in therapeutic range. The remaining two randomised controlled trials, investigating computer vision and neural networks on medication adherence and the impact of learning algorithms on assessment time of patients with gestational diabetes, showed statistically significant and clinically important differences to the control groups receiving standard care. However, these studies tended to be of low quality lacking detailed descriptions of methods and only one study used a double-blind design. Although the evidence of effectiveness of data-driven artificial intelligence to support decision-making in health and social care settings is limited, this work provides important insights on how a meaningful evidence base in this emerging field needs to be developed going forward. It is unlikely that any single overall message surrounding effectiveness will emerge - rather effectiveness of interventions is likely to be context-specific and calls for inclusion of a range of study designs to investigate mechanisms of action.
    Keywords:  artificial intelligence; decision support systems; narrative synthesis; randomised controlled trial; systematic review
    DOI:  https://doi.org/10.1177/1460458219900452
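    The search strategy described above (OR within each concept group, AND between groups) can be sketched in a few lines of Python. The function name and the example terms below are hypothetical placeholders; the abstract does not list the actual search terms, and real database syntax (field tags, truncation) varies by provider.

```python
def build_query(groups):
    """Combine terms with OR inside each group and AND between groups."""
    clauses = []
    for terms in groups:
        # Quote multi-word terms so they are searched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Hypothetical example terms for the three concept groups.
example_groups = [
    ["artificial intelligence", "machine learning"],
    ["decision support"],
    ["health care", "social care"],
]
example_query = build_query(example_groups)
print(example_query)
```

    Keeping each concept group as a separate list makes it straightforward to add synonyms to one group without changing how the groups are combined.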
  17. Clin Radiol. 2020 Jan 21. pii: S0009-9260(20)30001-5. [Epub ahead of print]
    Alis D, Bagcilar O, Senli YD, Isler C, Yergin M, Kocer N, Islak C, Kizilkilic O.
      AIM: To explore the value of quantitative texture analysis of conventional magnetic resonance imaging (MRI) sequences using artificial neural networks (ANN) for the differentiation of high-grade gliomas (HGG) and low-grade gliomas (LGG).
    MATERIALS AND METHODS: A total of 181 patients, 97 with HGG (53.5%) and 84 with LGG (46.5%) with brain MRI having T2-weighted (W) fluid attenuation inversion recovery (FLAIR), and contrast-enhanced T1W images were enrolled in the present study. Histogram parameters and high-order texture features were extracted using manually placed regions of interest (ROIs) on T2W-FLAIR and contrast-enhanced T1W images covering the whole volume of the tumours. The reproducibility of the features was assessed by interobserver reliability analyses. The cohort was divided into training (n=121) and test partitions (n=60). The training set was used for attribute selection and model development, and the test set was used to evaluate the diagnostic performance of the pre-trained ANNs in discriminating HGG and LGG.
    RESULTS: In the test cohort, the ANN models using texture data of T2W-FLAIR and contrast-enhanced T1W images achieved an area under the receiver operating characteristic curve (AUC) of 0.87 and 0.86, respectively. The combined ANN model with selected texture features achieved the highest diagnostic accuracy of 88.3% with an AUC of 0.92.
    CONCLUSIONS: Quantitative texture analysis of T2W-FLAIR and contrast-enhanced T1W enhanced by ANN can accurately discriminate HGG from LGG and might be of clinical value in tailoring the management strategies in patients with gliomas.
    DOI:  https://doi.org/10.1016/j.crad.2019.12.008
  18. JAMA Cardiol. 2020 Jan 22.
    Toba S, Mitani Y, Yodoya N, Ohashi H, Sawada H, Hayakawa H, Hirayama M, Futsuki A, Yamamoto N, Ito H, Konuma T, Shimpo H, Takao M.
      Importance: Chest radiography is a useful noninvasive modality to evaluate pulmonary blood flow status in patients with congenital heart disease. However, the predictive value of chest radiography is limited by the subjective and qualitative nature of the interpretation. Recently, deep learning has been used to analyze various images, but it has not been applied to analyzing chest radiographs in such patients.
    Objective: To develop and validate a quantitative method to predict the pulmonary to systemic flow ratio from chest radiographs using deep learning.
    Design, Setting, and Participants: This retrospective observational study included 1031 cardiac catheterizations performed for 657 patients from January 1, 2005, to April 30, 2019, at a tertiary center. Catheterizations without the Fick-derived pulmonary to systemic flow ratio or chest radiography performed within 1 month before catheterization were excluded. Seventy-eight patients (100 catheterizations) were randomly assigned for evaluation. A deep learning model that predicts the pulmonary to systemic flow ratio from chest radiographs was developed using the method of transfer learning.
    Main Outcomes and Measures: Whether the model can predict the pulmonary to systemic flow ratio from chest radiographs was evaluated using the intraclass correlation coefficient and Bland-Altman analysis. The diagnostic concordance rate was compared with 3 certified pediatric cardiologists. The diagnostic performance for a high pulmonary to systemic flow ratio of 2.0 or more was evaluated using cross tabulation and a receiver operating characteristic curve.
    Results: The study included 1031 catheterizations in 657 patients (522 males [51%]; median age, 3.4 years [interquartile range, 1.2-8.6 years]), in whom the mean (SD) Fick-derived pulmonary to systemic flow ratio was 1.43 (0.95). Diagnosis included congenital heart disease in 1008 catheterizations (98%). The intraclass correlation coefficient for the Fick-derived and deep learning-derived pulmonary to systemic flow ratio was 0.68, the log-transformed bias was 0.02, and the log-transformed precision was 0.12. The diagnostic concordance rate of the deep learning model was significantly higher than that of the experts (correctly classified 64 of 100 vs 49 of 100 chest radiographs; P = .02 [McNemar test]). For detecting a high pulmonary to systemic flow ratio, the sensitivity of the deep learning model was 0.47, the specificity was 0.95, and the area under the receiver operating characteristic curve was 0.88.
    Conclusions and Relevance: The present investigation demonstrated that deep learning-based analysis of chest radiographs predicted the pulmonary to systemic flow ratio in patients with congenital heart disease. These findings suggest that the deep learning-based approach may confer an objective and quantitative evaluation of chest radiographs in the congenital heart disease clinic.
    DOI:  https://doi.org/10.1001/jamacardio.2019.5620
  19. JMIR Med Inform. 2020 Jan 20. 8(1): e16912
    Tao L, Zhang C, Zeng L, Zhu S, Li N, Li W, Zhang H, Zhao Y, Zhan S, Ji H.
      BACKGROUND: Clinical decision support systems (CDSS) are an integral component of health information technologies and can assist disease interpretation, diagnosis, treatment, and prognosis. However, the utility of CDSS in the clinic remains controversial.
    OBJECTIVE: The aim is to assess the effects of CDSS integrated with British Medical Journal (BMJ) Best Practice-aided diagnosis in real-world research.
    METHODS: This was a retrospective, longitudinal observational study using routinely collected clinical diagnosis data from electronic medical records. A total of 34,113 hospitalized patient records were successively selected from December 2016 to February 2019 in six clinical departments. The diagnostic accuracy of the CDSS was verified before its implementation. A self-controlled comparison was then applied to detect the effects of CDSS implementation. Multivariable logistic regression and single-group interrupted time series analysis were used to explore the effects of CDSS. The sensitivity analysis was conducted using the subgroup data from January 2018 to February 2019.
    RESULTS: The total accuracy rates of the recommended diagnosis from CDSS were 75.46% in the first-rank diagnosis, 83.94% in the top-2 diagnosis, and 87.53% in the top-3 diagnosis in the data before CDSS implementation. Higher consistency was observed between admission and discharge diagnoses, shorter confirmed diagnosis times, and shorter hospitalization days after the CDSS implementation (all P<.001). Multivariable logistic regression analysis showed that the consistency rates after CDSS implementation (OR 1.078, 95% CI 1.015-1.144) and the proportion of hospitalization time 7 days or less (OR 1.688, 95% CI 1.592-1.789) both increased. The interrupted time series analysis showed that the consistency rates significantly increased by 6.722% (95% CI 2.433%-11.012%, P=.002) after CDSS implementation. The proportion of hospitalization time 7 days or less significantly increased by 7.837% (95% CI 1.798%-13.876%, P=.01). Similar results were obtained in the subgroup analysis.
    CONCLUSIONS: The CDSS integrated with BMJ Best Practice improved the accuracy of clinicians' diagnoses. Shorter confirmed diagnosis times and hospitalization days were also found to be associated with CDSS implementation in retrospective real-world studies. These findings highlight the utility of artificial intelligence-based CDSS to improve diagnosis efficiency, but these results require confirmation in future randomized controlled trials.
    Keywords:  BMJ Best Practice; accuracy and effect; aided diagnosis; artificial intelligence; clinical decision support systems
    DOI:  https://doi.org/10.2196/16912
  20. J Eur Acad Dermatol Venereol. 2020 Jan 22.
    Damiani G, Grossi E, Berti E, Conic RR, Radhakrishna U, Pacifico A, Bragazzi NL, Piccinno R, Linder D.
      BACKGROUND: Epithelial neoplasms of the scalp account for approximately 2% of all skin cancers and for about 10-20% of the tumors affecting the head and neck area. Radiotherapy is suggested for localized cutaneous squamous cell carcinomas (cSCC) without lymph node involvement, multiple or extensive lesions, for patients refusing surgery, for patients with a poor general medical status, as adjuvant for incompletely excised lesions and/or as a palliative treatment. To date, prognostic risk factors in scalp cSCC patients are poorly characterized.
    OBJECTIVE: To identify patterns of patients with higher risk of post-radiotherapy recurrence.
    METHODS: A retrospective observational study was performed on scalp cSCC patients with histological diagnosis who underwent conventional radiotherapy (50-120 kV) (between 1996 and 2008, follow-up from 1 to 140 months, median 14 months). Out of the 79 enrolled patients, 22 (27.8%) had previously undergone surgery. Two months after radiotherapy, 66 (83.5%) patients achieved a complete remission, 6 (7.6%) a partial remission, whereas 2 (2.5%) proved non-responsive to the treatment and 5 cases were lost to follow-up. Demographical and clinical data were preliminarily analyzed with classical descriptive statistics and with principal component analysis. All data were then re-evaluated with a machine learning-based approach using a fourth-generation artificial neural network (ANN)-based algorithm.
    RESULTS: ANN analysis revealed four previously undescribed scalp cSCC profiles among radiotherapy-responsive patients: 1) stage T2 cSCC type, aged 70-80 years; 2) frontal cSCC type, aged <70 years; 3) non-recurrent nodular or nodulo-ulcerated, stage T3 cSCC type, of the vertex and treated with >60 Grays (Gy); and 4) flat, occipital, stage T1 cSCC type, treated with 50-59 Gy. The model uncovering these four predictive profiles displayed 85.7% sensitivity, 97.6% specificity, and 91.7% overall accuracy.
    CONCLUSIONS: Patient profiling/phenotyping with machine learning may be a new, helpful method to stratify patients with scalp cSCCs who may benefit from radiotherapy.
    Keywords:  Squamous cell carcinoma; artificial neural networks; machine learning; precision medicine; radiotherapy; scalp
    DOI:  https://doi.org/10.1111/jdv.16210
  21. PLoS One. 2020 ;15(1): e0227401
    Lown M, Brown M, Brown C, Yue AM, Shah BN, Corbett SJ, Lewith G, Stuart B, Moore M, Little P.
      BACKGROUND: Atrial fibrillation (AF) is the most common arrhythmia worldwide with a global age-adjusted prevalence of 0.5% in 2010. Anticoagulation treatment using warfarin or direct oral anticoagulants is effective in reducing the risk of AF-related stroke by approximately two-thirds and can provide a 10% reduction in overall mortality. There has been increased interest in detecting AF due to its increased incidence and the possibility to prevent AF-related strokes. Inexpensive consumer devices which measure the ECG may have the potential to accurately detect AF but do not generally incorporate diagnostic algorithms. Machine learning algorithms have the potential to improve patient outcomes particularly where diagnoses are made from large volumes or complex patterns of data such as in AF.
    METHODS: We designed a novel AF detection algorithm using a de-correlated Lorenz plot of 60 consecutive RR intervals. In order to reduce the volume of data, the resulting images were compressed using a wavelet transformation (JPEG2000 algorithm) and the compressed images were used as input data to a Support Vector Machine (SVM) classifier. We used the Massachusetts Institute of Technology (MIT)-Beth Israel Hospital (BIH) Atrial Fibrillation database and the MIT-BIH Arrhythmia database as training data and verified the algorithm performance using RR intervals collected using an inexpensive consumer heart rate monitor device (Polar-H7) in a case-control study.
    RESULTS: The SVM algorithm yielded excellent discrimination in the training data with a sensitivity of 99.2% and a specificity of 99.5% for AF. In the validation data, the SVM algorithm correctly identified AF in 79/79 cases; sensitivity 100% (95% CI 95.4%-100%) and non-AF in 328/336 cases; specificity 97.6% (95% CI 95.4%-99.0%).
    CONCLUSIONS: An inexpensive wearable heart rate monitor and machine learning algorithm can be used to detect AF with very high accuracy and has the capability to transmit ECG data which could be used to confirm AF. It could potentially be used for intermittent screening or continuously for prolonged periods to detect paroxysmal AF. Further work could lead to cost-effective and accurate estimation of AF burden and improved risk stratification in AF.
    DOI:  https://doi.org/10.1371/journal.pone.0227401
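    Editor's illustration: the validation counts reported above (79/79 AF cases and 328/336 non-AF cases correctly identified) can be turned into sensitivity, specificity, and a binomial confidence interval with a few lines of Python. The sketch below is a minimal example that assumes an exact (Clopper-Pearson) interval; the abstract does not state which interval method the authors used, and the helper names (`binom_cdf`, `exact_ci`) are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root(f, lo=0.0, hi=1.0, iters=80):
    """Bisection root of a function that is increasing on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes out of n trials.

    Assumption: this is one common 'exact' method, chosen for
    illustration; it is not confirmed to be the authors' method.
    """
    half = alpha / 2
    # Lower bound: p such that P(X >= x | p) = alpha/2.
    lower = 0.0 if x == 0 else _root(lambda p: 1 - binom_cdf(x - 1, n, p) - half)
    # Upper bound: p such that P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else _root(lambda p: half - binom_cdf(x, n, p))
    return lower, upper


# Counts taken from the RESULTS paragraph of the abstract.
sensitivity = 79 / 79
specificity = 328 / 336
sens_ci = exact_ci(79, 79)
spec_ci = exact_ci(328, 336)

print(f"sensitivity {sensitivity:.3f}, CI {sens_ci[0]:.3f}-{sens_ci[1]:.3f}")
print(f"specificity {specificity:.3f}, CI {spec_ci[0]:.3f}-{spec_ci[1]:.3f}")
```

    When every case is detected (x = n), the lower bound has the closed form (alpha/2)^(1/n), which is a convenient sanity check on the bisection.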
  22. JAMA Netw Open. 2020 Jan 03. 3(1): e1919657
    Pépin JL, Letesson C, Le-Dong NN, Dedave A, Denison S, Cuthbert V, Martinot JB, Gozal D.
      Importance: Given the high prevalence of obstructive sleep apnea (OSA), there is a need for simpler and automated diagnostic approaches.
    Objective: To evaluate whether mandibular movement (MM) monitoring during sleep coupled with an automated analysis by machine learning is appropriate for OSA diagnosis.
    Design, Setting, and Participants: Diagnostic study of adults undergoing overnight in-laboratory polysomnography (PSG) as the reference method compared with simultaneous MM monitoring at a sleep clinic in an academic institution (Sleep Laboratory, Centre Hospitalier Universitaire Université Catholique de Louvain Namur Site Sainte-Elisabeth, Namur, Belgium). Patients with suspected OSA were enrolled from July 5, 2017, to October 31, 2018.
    Main Outcomes and Measures: Obstructive sleep apnea diagnosis required either evoking signs or symptoms or related medical or psychiatric comorbidities coupled with a PSG-derived respiratory disturbance index (PSG-RDI) of at least 5 events/h. A PSG-RDI of at least 15 events/h satisfied the diagnosis criteria even in the absence of associated symptoms or comorbidities. Patients who did not meet these criteria were classified as not having OSA. Agreement analysis and diagnostic performance were assessed by Bland-Altman plot comparing PSG-RDI and the Sunrise system RDI (Sr-RDI) with diagnosis threshold optimization via receiver operating characteristic curves, allowing for evaluation of the device sensitivity and specificity in detecting OSA at 5 events/h and 15 events/h.
    Results: Among 376 consecutive adults with suspected OSA, the mean (SD) age was 49.7 (13.2) years, the mean (SD) body mass index was 31.0 (7.1), and 207 (55.1%) were men. Reliable agreement was found between PSG-RDI and Sr-RDI in patients without OSA (n = 46; mean difference, 1.31; 95% CI, -1.05 to 3.66 events/h) and in patients with OSA with a PSG-RDI of at least 5 events/h with symptoms (n = 107; mean difference, -0.69; 95% CI, -3.77 to 2.38 events/h). An Sr-RDI underestimation of -11.74 (95% CI, -20.83 to -2.67) events/h in patients with OSA with a PSG-RDI of at least 15 events/h was detected and corrected by optimization of the Sunrise system diagnostic threshold. The Sr-RDI showed diagnostic capability, with areas under the receiver operating characteristic curve of 0.95 (95% CI, 0.92-0.96) and 0.93 (95% CI, 0.90-0.93) for corresponding PSG-RDIs of 5 events/h and 15 events/h, respectively. At the 2 optimal cutoffs of 7.63 events/h and 12.65 events/h, Sr-RDI had accuracy of 0.92 (95% CI, 0.90-0.94) and 0.88 (95% CI, 0.86-0.90) as well as posttest probabilities of 0.99 (95% CI, 0.99-0.99) and 0.89 (95% CI, 0.88-0.91) at PSG-RDIs of at least 5 events/h and at least 15 events/h, respectively, corresponding to positive likelihood ratios of 14.86 (95% CI, 9.86-30.12) and 5.63 (95% CI, 4.92-7.27), respectively.
    Conclusions and Relevance: Automatic analysis of MM patterns provided reliable performance in RDI calculation. The use of this index in OSA diagnosis appears to be promising.
    DOI:  https://doi.org/10.1001/jamanetworkopen.2019.19657
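    Editor's illustration: the agreement analysis described above compares two measurements of the same index (PSG-RDI and Sr-RDI) by their paired differences. The sketch below shows the usual Bland-Altman summary, the mean difference (bias) and 95% limits of agreement, under the assumption of approximately normal differences. The numbers are made-up toy values, not study data, and the function name `bland_altman` is hypothetical.

```python
from statistics import mean, stdev


def bland_altman(reference, device):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias   = mean of (device - reference)
    limits = bias +/- 1.96 * sample SD of the differences
    """
    if len(reference) != len(device):
        raise ValueError("paired measurements must have equal length")
    diffs = [d - r for r, d in zip(reference, device)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Toy values in events/h, invented purely for illustration.
psg_rdi = [4.0, 10.0, 20.0, 35.0]
sr_rdi = [5.0, 9.0, 18.0, 32.0]

bias, lower, upper = bland_altman(psg_rdi, sr_rdi)
print(f"bias {bias:.2f} events/h, limits of agreement {lower:.2f} to {upper:.2f}")
```

    A negative bias in this convention means the device reads lower than the reference on average, which is the direction of the underestimation the abstract reports for the more severe group.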