bims-arihec Biomed News
on Artificial intelligence in healthcare
Issue of 2020–04–19
eighteen papers selected by
Céline Bélanger, Cogniges Inc.



  1. NPJ Digit Med. 2020 ;3 51
      Hospital systems, payers, and regulators have focused on reducing length of stay (LOS) and early readmission, with uncertain benefit. Interpretable machine learning (ML) may assist in transparently identifying the risk of important outcomes. We conducted a retrospective cohort study of hospitalizations at a tertiary academic medical center and its branches from January 2011 to May 2018. All consecutive hospitalizations in the study period were included. Algorithms were trained on medical, sociodemographic, and institutional variables to predict readmission, LOS, and death within 48-72 h. Prediction performance was measured by the area under the receiver operating characteristic curve (AUC), the Brier score loss (BSL), which measures how well predicted probabilities match observed outcomes, and other metrics. Interpretations were generated using multiple feature extraction algorithms. The study cohort included 1,485,880 hospitalizations for 708,089 unique patients (median age 59 years, first and third quartiles (QI) [39, 73]; 55.6% female; 71% white). There were 211,022 30-day readmissions, for an overall readmission rate of 14% (16% for patients ≥65 years). Median LOS, including observation and labor and delivery patients, was 2.94 days (QI [1.67, 5.34]), or, if these patients are excluded, 3.71 days (QI [2.15, 6.51]). Predictive performance was as follows: 30-day readmission (AUC 0.76/BSL 0.11); LOS > 5 days (AUC 0.84/BSL 0.15); death within 48-72 h (AUC 0.91/BSL 0.001). Explanatory diagrams showed the factors that influenced each prediction.
    Keywords:  Health care economics; Outcomes research; Risk factors
    DOI:  https://doi.org/10.1038/s41746-020-0249-z
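The AUC and Brier score reported in the readmission study above can both be computed directly from outcome labels and predicted probabilities. The sketch below is a minimal pure-Python illustration under stated assumptions: `roc_auc` and `brier_score` are hypothetical helper names, AUC is computed with the pairwise Mann-Whitney formulation (ties count as half a win), and the labels and probabilities are invented toy values, not study data.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (Mann-Whitney U formulation). This is
    O(n_pos * n_neg), which is fine for a toy example.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Hypothetical 30-day readmission labels and predicted probabilities.
y = [0, 0, 1, 0, 1, 0, 0, 1]
p = [0.10, 0.20, 0.70, 0.30, 0.40, 0.05, 0.60, 0.80]
print(f"AUC={roc_auc(y, p):.3f}  Brier={brier_score(y, p):.3f}")  # AUC=0.933  Brier=0.124
```

Because the Brier score averages squared errors over all patients, it tends to be small when the outcome is rare and most predicted probabilities sit near zero, so it is best read alongside a discrimination measure such as AUC.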
  2. J Clin Med. 2020 Apr 13. pii: E1107. [Epub ahead of print] 9(4):
      Kidney diseases are among the major health burdens worldwide and are associated with high economic costs, mortality, and morbidity. The value of collecting large quantities of health-related data from human cohorts, what scholars refer to as "big data", has been increasingly recognized, with the establishment of many large cohorts and the adoption of electronic health records (EHRs) in nephrology and transplantation. These data are valuable and can be used by researchers to advance knowledge in the field. Furthermore, progress in big data is stimulating the growth of artificial intelligence (AI), an excellent tool for handling and processing large amounts of data, which may be applied to shed more light on the effectiveness of medical interventions for kidney-related complications and to enable more precise phenotyping and outcome prediction. In this article, we discuss the advances and challenges in big data, EHRs, and AI, with particular emphasis on their use in nephrology and transplantation.
    Keywords:  acute kidney injury; artificial intelligence; big data; chronic kidney disease; kidney transplantation; machine learning; nephrology; transplantation
    DOI:  https://doi.org/10.3390/jcm9041107
  3. Front Med (Lausanne). 2020 ;7 100
      Artificial intelligence (AI) has become an increasingly prevalent research topic in medicine and is being applied more and more to dermatology. There is a need to understand this technology's progress to help guide and shape the future for medical care providers and recipients. We reviewed the literature to evaluate the types of publications on the subject, the specific dermatological topics addressed by AI, and the most challenging barriers to its implementation. A substantial number of original articles and commentaries have been published to date, but only a few detailed reviews exist. Most AI applications focus on differentiating between benign and malignant skin lesions; however, others exist pertaining to ulcers, inflammatory skin diseases, allergen exposure, dermatopathology, and gene expression profiling. Applications commonly analyze and classify images; however, other tools such as risk assessment calculators are becoming increasingly available. Although many applications are technologically feasible, important implementation barriers have been identified, including systematic biases, difficulty of standardization, interpretability, and acceptance by physicians and patients alike. This review provides insight into future research needs and possibilities. There is a strong need for clinical investigation in dermatology that provides evidence of success in overcoming the identified barriers. With these research goals in mind, an appropriate role for AI in dermatology may be achieved in the not-so-distant future.
    Keywords:  artificial intelligence; barriers; contact allergens; dermatology; machine learning; melanoma; nevi; psoriasis
    DOI:  https://doi.org/10.3389/fmed.2020.00100
  4. NPJ Digit Med. 2020 ;3 53
      Artificial intelligence (AI) and machine learning (ML) systems in medicine are poised to significantly improve health care, for example, by offering earlier diagnoses of diseases or recommending optimally individualized treatment plans. However, the emergence of AI/ML in medicine also creates challenges to which regulators must pay attention. Which medical AI/ML-based products should be reviewed by regulators? What evidence should be required to permit marketing of AI/ML-based software as a medical device (SaMD)? How can we ensure the safety and effectiveness of AI/ML-based SaMD that may change over time as it is applied to new data? The U.S. Food and Drug Administration (FDA), for example, has recently issued a discussion paper to address some of these issues. But it misses an important point: we argue that regulators like the FDA need to widen their scope from evaluating medical AI/ML-based products to assessing systems. This shift in perspective, from a product view to a system view, is central to maximizing the safety and efficacy of AI/ML in health care, but it also poses significant challenges for agencies like the FDA that are used to regulating products, not systems. We offer several suggestions to help regulators make this challenging but important transition.
    Keywords:  Health policy; Law
    DOI:  https://doi.org/10.1038/s41746-020-0262-2
  5. Biomed Eng Online. 2020 Apr 15. 19(1): 20
       INTRODUCTION: This is a systematic review of the main algorithms that use machine learning (ML) in retinal image processing for glaucoma diagnosis and detection. ML has proven to be a significant tool for the development of computer-aided technology. Furthermore, secondary research on this subject has been widely conducted over the years for ophthalmologists. These aspects indicate the importance of ML in the context of retinal image processing.
    METHODS: The publications included in this review were gathered from the Scopus, PubMed, IEEEXplore, and Science Direct databases, and papers published between 2014 and 2019 were then selected. Studies that used the segmented optic disc method were excluded. Moreover, only methods that included a classification step were considered. These studies were systematically analyzed, and the results were summarized.
    DISCUSSION: Regarding the architectures used for ML in retinal image processing, some studies applied feature extraction and dimensionality reduction to detect and isolate important parts of the analyzed image. Other works, by contrast, used a deep convolutional network. Across the evaluated studies, the main differences between the architectures are the number of images required for processing and the high computational cost of deep learning techniques.
    CONCLUSIONS: All the analyzed publications indicated that it was possible to develop an automated system for glaucoma diagnosis. The severity of the disease and its high prevalence justify the research that has been carried out. Recent computational techniques, such as deep learning, have been shown to be promising in fundus imaging. Although such techniques require an extensive database and high computational costs, the studies show that data augmentation and transfer learning have been applied as alternative ways to optimize and reduce network training.
    Keywords:  Classification; Deep learning; Glaucoma; Machine learning; Retinal image processing
    DOI:  https://doi.org/10.1186/s12938-020-00767-2
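The glaucoma review names data augmentation as one way to reduce the amount of labeled fundus data that deep networks need. As a minimal sketch, assuming images are NumPy arrays of shape (H, W) or (H, W, C), the hypothetical helper `dihedral_augment` below generates the eight rotation and left-right flip variants of an image. The reviewed studies may use other transformations (for example, brightness or contrast changes), and transfer learning is not shown here.

```python
import numpy as np


def dihedral_augment(image: np.ndarray) -> list[np.ndarray]:
    """Return the 8 rotation/flip variants of an (H, W) or (H, W, C) image.

    Rotations are by 0, 90, 180 and 270 degrees in the image plane; each is
    also returned mirrored left-right. Channels (axis 2) are left untouched.
    """
    variants = []
    for k in range(4):
        rotated = np.rot90(image, k=k, axes=(0, 1))
        variants.append(rotated)
        variants.append(np.fliplr(rotated))
    return variants


# Toy 4x4 RGB patch standing in for a fundus image (random, hypothetical data).
rng = np.random.default_rng(0)
patch = rng.random((4, 4, 3))
augmented = dihedral_augment(patch)
print(len(augmented), augmented[0].shape)  # 8 (4, 4, 3)
```

A left-right flip makes a right-eye fundus image resemble a left-eye image, which may or may not be acceptable depending on the task, so the set of transformations is a modeling choice rather than a fixed recipe.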
  6. Bull World Health Organ. 2020 Apr 01. 98(4): 257-262
      Artificial intelligence holds great promise in terms of beneficial, accurate and effective preventive and curative interventions. At the same time, there is also awareness of the potential risks and harms that may be caused by unregulated development of artificial intelligence. Guiding principles are being developed around the world to foster trustworthy development and application of artificial intelligence systems. These guidelines can support developers and governing authorities when making decisions about the use of artificial intelligence. The High-Level Expert Group on Artificial Intelligence set up by the European Commission launched the report Ethical guidelines for trustworthy artificial intelligence in 2019. The report aims to contribute to reflection and discussion on the ethics of artificial intelligence technologies also beyond the countries of the European Union (EU). In this paper, we use the global health sector as a case and argue that the EU's guidance leaves too much room for local, contextualized discretion for it to foster trustworthy artificial intelligence globally. We point to the urgency of shared globalized efforts to safeguard against the potential harms of artificial intelligence technologies in health care.
    DOI:  https://doi.org/10.2471/BLT.19.237289
  7. N Engl J Med. 2020 Apr 14.
    BONSAI Group
       BACKGROUND: Nonophthalmologist physicians do not confidently perform direct ophthalmoscopy. The use of artificial intelligence to detect papilledema and other optic-disk abnormalities from fundus photographs has not been well studied.
    METHODS: We trained, validated, and externally tested a deep-learning system to classify optic disks as being normal or having papilledema or other abnormalities from 15,846 retrospectively collected ocular fundus photographs that had been obtained with pharmacologic pupillary dilation and various digital cameras in persons from multiple ethnic populations. Of these photographs, 14,341 from 19 sites in 11 countries were used for training and validation, and 1505 photographs from 5 other sites were used for external testing. Performance at classifying the optic-disk appearance was evaluated by calculating the area under the receiver-operating-characteristic curve (AUC), sensitivity, and specificity, as compared with a reference standard of clinical diagnoses by neuro-ophthalmologists.
    RESULTS: The training and validation data sets from 6779 patients included 14,341 photographs: 9156 of normal disks, 2148 of disks with papilledema, and 3037 of disks with other abnormalities. The percentage classified as being normal ranged across sites from 9.8 to 100%; the percentage classified as having papilledema ranged across sites from zero to 59.5%. In the validation set, the system discriminated disks with papilledema from normal disks and disks with nonpapilledema abnormalities with an AUC of 0.99 (95% confidence interval [CI], 0.98 to 0.99) and normal from abnormal disks with an AUC of 0.99 (95% CI, 0.99 to 0.99). In the external-testing data set of 1505 photographs, the system had an AUC for the detection of papilledema of 0.96 (95% CI, 0.95 to 0.97), a sensitivity of 96.4% (95% CI, 93.9 to 98.3), and a specificity of 84.7% (95% CI, 82.3 to 87.1).
    CONCLUSIONS: A deep-learning system using fundus photographs with pharmacologically dilated pupils differentiated among optic disks with papilledema, normal disks, and disks with nonpapilledema abnormalities. (Funded by the Singapore National Medical Research Council and the SingHealth Duke-NUS Ophthalmology and Visual Sciences Academic Clinical Program.).
    DOI:  https://doi.org/10.1056/NEJMoa1917130
  8. PLoS One. 2020 ;15(4): e0231468
      We present a case study of implementing a machine learning algorithm within an incremental value framework in the domain of lung cancer research. Machine learning methods have often been shown to be competitive with conventional prediction models in some domains; however, their implementation is still at an early stage. These methods are often only compared directly with existing methods; here we present a framework for assessing a machine learning model by measuring its incremental value. We developed a machine learning model to identify and classify lung nodules and assessed the incremental value it adds to existing risk prediction models. Multiple external datasets were used for validation. We found that our image model, trained on a dataset from The Cancer Imaging Archive (TCIA), improves upon existing models that are restricted to patient characteristics, but the results were inconclusive as to whether it improves on models that consider nodule features. Another interesting finding is the variable performance across datasets, suggesting that generalizing machine learning models across populations may be more challenging than is often assumed.
    DOI:  https://doi.org/10.1371/journal.pone.0231468
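The lung nodule abstract does not state which incremental-value statistic the authors used. One common choice, shown here purely as an illustration, is the categorical net reclassification improvement (NRI): it measures how often adding a new predictor (here, an image-model output) moves patients with the outcome into a higher risk band and patients without it into a lower one. The risk cutoffs, function names, and toy probabilities below are all hypothetical.

```python
from typing import Sequence


def risk_band(p: float, cutoffs: Sequence[float]) -> int:
    """Index of the risk band that probability p falls into (cutoffs ascending)."""
    return sum(1 for c in cutoffs if p >= c)


def categorical_nri(y: Sequence[int], p_base: Sequence[float],
                    p_new: Sequence[float], cutoffs: Sequence[float]):
    """Categorical net reclassification improvement of p_new over p_base.

    Returns (NRI among events, NRI among non-events, total NRI).
    """
    up_e = down_e = up_ne = down_ne = n_e = n_ne = 0
    for yi, pb, pn in zip(y, p_base, p_new):
        move = risk_band(pn, cutoffs) - risk_band(pb, cutoffs)
        if yi == 1:  # event: moving to a higher band is an improvement
            n_e += 1
            up_e += int(move > 0)
            down_e += int(move < 0)
        else:  # non-event: moving to a lower band is an improvement
            n_ne += 1
            up_ne += int(move > 0)
            down_ne += int(move < 0)
    nri_events = (up_e - down_e) / n_e
    nri_nonevents = (down_ne - up_ne) / n_ne
    return nri_events, nri_nonevents, nri_events + nri_nonevents


# Hypothetical malignancy labels and risks: a base model using patient
# characteristics versus the same model plus an image-model output.
y = [1, 1, 1, 0, 0, 0, 0, 0]
p_base = [0.15, 0.30, 0.08, 0.12, 0.25, 0.05, 0.35, 0.18]
p_new = [0.40, 0.35, 0.22, 0.06, 0.12, 0.04, 0.45, 0.22]
cutoffs = [0.10, 0.30]  # hypothetical low / intermediate / high bands

ev, nonev, total = categorical_nri(y, p_base, p_new, cutoffs)
print(f"events={ev:.3f} non-events={nonev:.3f} NRI={total:.3f}")
```

A positive NRI means the extended model reclassifies patients in the right direction on balance. Because it depends on the chosen cutoffs, a change in AUC is often reported alongside it.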
  9. Bull World Health Organ. 2020 Apr 01. 98(4): 251-256
      The prospect of patient harm caused by the decisions made by an artificial intelligence-based clinical tool is something to which current practices of accountability and safety worldwide have not yet adjusted. We focus on two aspects of clinical artificial intelligence used for decision-making: moral accountability for harm to patients; and safety assurance to protect patients against such harm. Artificial intelligence-based tools are challenging the standard clinical practices of assigning blame and assuring safety. Human clinicians and safety engineers have weaker control over the decisions reached by artificial intelligence systems and less knowledge and understanding of precisely how the artificial intelligence systems reach their decisions. We illustrate this analysis by applying it to an example of an artificial intelligence-based system developed for use in the treatment of sepsis. The paper ends with practical suggestions for ways forward to mitigate these concerns. We argue for a need to include artificial intelligence developers and systems safety engineers in our assessments of moral accountability for patient harm. Meanwhile, none of the actors in the model robustly fulfil the traditional conditions of moral accountability for the decisions of an artificial intelligence system. We should therefore update our conceptions of moral accountability in this context. We also need to move from a static to a dynamic model of assurance, accepting that considerations of safety are not fully resolvable during the design of the artificial intelligence system before the system has been deployed.
    DOI:  https://doi.org/10.2471/BLT.19.237487
  10. Clin Neurophysiol. 2020 Apr 02. pii: S1388-2457(20)30116-4. [Epub ahead of print] 131(6): 1174-1179
       OBJECTIVE: To validate an artificial intelligence-based computer algorithm for detection of epileptiform EEG discharges (EDs) and subsequent identification of patients with epilepsy.
    METHODS: We developed an algorithm for automatic detection of EDs, based on a novel deep learning method that requires only a small amount of labeled EEG data for training. Detected EDs are automatically grouped into clusters of the same ED type for rapid visual inspection. We validated the algorithm on an independent dataset of 100 patients with sharp transients in their EEG recordings (54 with epilepsy and 46 with non-epileptic paroxysmal events). The diagnostic gold standard was derived from video-EEG recordings of the patients' habitual events.
    RESULTS: The algorithm had a sensitivity of 89% for identifying EEGs with EDs recorded from patients with epilepsy, a specificity of 70%, and an overall accuracy of 80%.
    CONCLUSIONS: Automated detection of EDs using an artificial intelligence-based computer algorithm had a high sensitivity. Human (expert) supervision is still necessary for confirming the clusters of detected EDs and for describing clinical correlations. Further studies on different patient populations will be needed to confirm our results.
    SIGNIFICANCE: The automated algorithm described here is a useful tool that assists neurophysiologists in the rapid assessment of EEG recordings.
    Keywords:  Automatic spike detection; Biomarker; Deep learning; EEG; Epilepsy; Interictal epileptiform discharges
    DOI:  https://doi.org/10.1016/j.clinph.2020.02.032
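The EEG abstract reports percentages rather than counts. Assuming each of the 100 patients was classified once at the patient level, only 48 of the 54 patients with epilepsy rounds to 89% sensitivity, and only 32 of the 46 patients with non-epileptic events rounds to 70% specificity. The sketch below checks that these back-calculated counts (inferred here, not reported by the authors) also reproduce the 80% overall accuracy; `diagnostic_metrics` is a hypothetical helper.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Patient-level sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Which whole-number counts round to the reported 89% and 70%?
tp_candidates = [k for k in range(55) if round(100 * k / 54) == 89]  # 54 with epilepsy
tn_candidates = [k for k in range(47) if round(100 * k / 46) == 70]  # 46 without epilepsy
print(tp_candidates, tn_candidates)  # [48] [32]

m = diagnostic_metrics(tp=48, fn=54 - 48, tn=32, fp=46 - 32)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
# sensitivity: 88.9%
# specificity: 69.6%
# accuracy: 80.0%
```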
  11. NPJ Digit Med. 2020 ;3 54
      The diagnosis of heart failure can be difficult, even for heart failure specialists. An artificial intelligence-based clinical decision support system (AI-CDSS) has the potential to assist physicians in diagnosing heart failure. The aim of this work was to evaluate the diagnostic accuracy of an AI-CDSS for heart failure. The AI-CDSS for cardiology was developed with a hybrid (expert-driven and machine-learning-driven) approach to knowledge acquisition, which was used to evolve its knowledge base for heart failure diagnosis. A retrospective cohort of 1198 patients with and without heart failure was used to develop the AI-CDSS (training dataset, n = 600) and to test its performance (test dataset, n = 598). A prospective clinical pilot study of 97 patients with dyspnea was used to compare the diagnostic accuracy of the AI-CDSS with that of non-heart failure specialists. The concordance rate between the AI-CDSS and heart failure specialists was evaluated. In the retrospective cohort, the concordance rate was 98.3% in the test dataset. The concordance rates for patients with heart failure with reduced ejection fraction, heart failure with mid-range ejection fraction, heart failure with preserved ejection fraction, and no heart failure were 100%, 100%, 99.6%, and 91.7%, respectively. In the prospective pilot study of 97 patients presenting with dyspnea to the outpatient clinic, 44% had heart failure. The concordance rate between the AI-CDSS and heart failure specialists was 98%, whereas that between non-heart failure specialists and heart failure specialists was 76%. In conclusion, the AI-CDSS showed high diagnostic accuracy for heart failure and may therefore be useful for diagnosing heart failure, especially when heart failure specialists are not available.
    Keywords:  Heart failure; Outcomes research
    DOI:  https://doi.org/10.1038/s41746-020-0261-3
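The concordance rates in the heart failure study are agreement proportions measured against the heart failure specialists' diagnoses. A minimal sketch is shown below, assuming that per-class rates are computed within the classes assigned by the reference rater (as the phrase "for patients with" suggests); the function name and the label lists are hypothetical, not study data.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def concordance(reference: Sequence[Hashable], predicted: Sequence[Hashable]):
    """Overall and per-class agreement; classes are those assigned by the reference rater."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(reference, predicted))
    overall = sum(r == p for r, p in pairs) / len(pairs)
    totals = Counter(r for r, _ in pairs)            # patients per reference class
    agree = Counter(r for r, p in pairs if r == p)   # agreements per reference class
    per_class = {c: agree[c] / n for c, n in totals.items()}
    return overall, per_class


# Hypothetical diagnoses: specialist as the reference, AI-CDSS as the rater.
specialist = ["HFrEF", "HFpEF", "none", "none", "HFmrEF", "HFpEF", "none", "HFrEF"]
ai_cdss = ["HFrEF", "HFpEF", "none", "HFpEF", "HFmrEF", "HFpEF", "none", "HFrEF"]
overall, per_class = concordance(specialist, ai_cdss)
print(overall, per_class)
```

Raw agreement of this kind is not corrected for the agreement expected by chance.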
  12. PLoS One. 2020 ;15(4): e0231172
      Arterial hypotension during the early phase of anesthesia can lead to adverse outcomes such as a prolonged postoperative stay or even death. Predicting hypotension during anesthesia induction is complicated by its diverse causes. We investigated the feasibility of developing a machine-learning model to predict postinduction hypotension. Naïve Bayes, logistic regression, random forest, and artificial neural network models were trained to predict postinduction hypotension occurring between tracheal intubation and incision. The models used data from the period between the start of anesthesia induction and immediately before tracheal intubation, obtained from an anesthesia monitor, a drug infusion pump, an anesthesia machine, and patients' demographics, together with preexisting disease information from electronic health records. Among 222 patients, 126 developed postinduction hypotension. The random-forest model showed the best performance, with an area under the receiver operating characteristic curve of 0.842 (95% confidence interval [CI]: 0.736-0.948). This was higher than that of the Naïve Bayes (0.778; 95% CI: 0.65-0.898), logistic regression (0.756; 95% CI: 0.630-0.881), and artificial-neural-network (0.760; 95% CI: 0.640-0.880) models. The most important features affecting the accuracy of machine-learning prediction were a patient's lowest systolic blood pressure, lowest mean blood pressure, and mean systolic blood pressure before tracheal intubation. We found that machine-learning models using data obtained from various anesthesia devices between the start of anesthesia induction and immediately before tracheal intubation can predict hypotension occurring between tracheal intubation and incision.
    DOI:  https://doi.org/10.1371/journal.pone.0231172
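The hypotension models learn from signals recorded between the start of induction and intubation, and predict hypotension between intubation and incision. The sketch below shows one way to build the three most important features named in that abstract, plus a binary label, from a blood-pressure series. The `BPSample` record, the function names, the toy readings, and especially the hypotension definition (any mean blood pressure below 65 mmHg) are assumptions; the abstract does not state the authors' threshold.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class BPSample:
    t: float    # seconds since the start of anesthesia induction
    sbp: float  # systolic blood pressure, mmHg
    mbp: float  # mean blood pressure, mmHg


def preintubation_features(samples: list[BPSample], t_intubation: float) -> dict[str, float]:
    """Features from the window [induction start, intubation)."""
    window = [s for s in samples if s.t < t_intubation]
    if not window:
        raise ValueError("no samples before intubation")
    return {
        "min_sbp": min(s.sbp for s in window),
        "min_mbp": min(s.mbp for s in window),
        "mean_sbp": mean(s.sbp for s in window),
    }


def postinduction_hypotension(samples: list[BPSample], t_intubation: float,
                              t_incision: float, mbp_threshold: float = 65.0) -> bool:
    """Label: any MBP below the (assumed) threshold between intubation and incision."""
    return any(t_intubation <= s.t <= t_incision and s.mbp < mbp_threshold
               for s in samples)


# Hypothetical readings; intubation at t = 200 s, incision at t = 400 s.
samples = [
    BPSample(0.0, 130.0, 95.0),
    BPSample(60.0, 118.0, 85.0),
    BPSample(120.0, 104.0, 76.0),
    BPSample(180.0, 96.0, 70.0),
    BPSample(240.0, 88.0, 62.0),
    BPSample(300.0, 92.0, 66.0),
]
feats = preintubation_features(samples, t_intubation=200.0)
label = postinduction_hypotension(samples, t_intubation=200.0, t_incision=400.0)
print(feats, label)
```

Keeping feature extraction strictly before intubation matters: any reading taken after intubation would leak information from the prediction window into the inputs.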
  13. Yearb Med Inform. 2020 Apr 17.
       OBJECTIVE: To create practical recommendations for the curation of routinely collected health data and artificial intelligence (AI) in primary care with a focus on ensuring their ethical use.
    METHODS: We defined data curation as the management of data throughout its lifecycle to ensure that it remains usable in the future. We used a literature review and Delphi exercises to capture insights from the Primary Care Informatics Working Group (PCIWG) of the International Medical Informatics Association (IMIA).
    RESULTS: We created six recommendations: (1) Ensure consent and formal process to govern access and sharing throughout the data life cycle; (2) Sustainable data creation/collection requires trust and permission; (3) Pay attention to Extract-Transform-Load (ETL) processes as they may have unrecognised risks; (4) Integrate data governance and data quality management to support clinical practice in integrated care systems; (5) Recognise the need for new processes to address the ethical issues arising from AI in primary care; (6) Apply an ethical framework mapped to the data life cycle, including an assessment of data quality to achieve effective data curation.
    CONCLUSIONS: The ethical use of data needs to be integrated within the curation process and hence to run throughout the data lifecycle. Current information systems may not fully detect the risks associated with ETL and AI; they need careful scrutiny. In distributed integrated care systems, where data are often used far from where they were documented, harmonised data quality assessment, management, and governance are important. These recommendations should help maintain trust and connectedness in contemporary information systems and planned developments.
    DOI:  https://doi.org/10.1055/s-0040-1701980
  14. Dig Endosc. 2020 Apr 13.
       OBJECTIVES: Detecting early gastric cancer is difficult, and it may even be overlooked by experienced endoscopists. Recently, artificial intelligence based on deep learning through convolutional neural networks (CNNs) has enabled significant advancements in the field of gastroenterology. However, it remains unclear whether a CNN can outperform endoscopists. In this study, we evaluated whether the performance of a CNN in detecting early gastric cancer is better than that of endoscopists.
    METHODS: The CNN was constructed using 13,584 endoscopic images from 2,639 lesions of gastric cancer. Subsequently, its diagnostic ability was compared to that of 67 endoscopists using an independent test dataset (2,940 images from 140 cases).
    RESULTS: The average diagnostic times for analyzing the 2,940 test endoscopic images were 45.5 ± 1.8 s for the CNN and 173.0 ± 66.0 min for the endoscopists. The sensitivity, specificity, and positive and negative predictive values for the CNN were 58.4%, 87.3%, 26.0%, and 96.5%, respectively. These values for the 67 endoscopists were 31.9%, 97.2%, 46.2%, and 94.9%, respectively. The CNN had a significantly higher sensitivity than the endoscopists (by 26.5%; 95% confidence interval, 14.9-32.5%).
    CONCLUSION: The CNN detected more early gastric cancer cases in a shorter time than the endoscopists. The CNN needs further training to achieve higher diagnostic accuracy. However, a diagnostic support tool for gastric cancer using a CNN will be realized in the near future.
    DOI:  https://doi.org/10.1111/den.13688
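In the gastric cancer study, the CNN's positive predictive value (26.0%) is low despite 87.3% specificity, which is what Bayes' theorem predicts when positives are rare. The abstract does not report prevalence, so the sketch below back-calculates the proportion of positive units (images or lesions, depending on how the authors scored them) implied by the reported sensitivity, specificity, and PPV, then checks that this value reproduces the reported NPV. The helper names are hypothetical, and the implied prevalence is an inference, not a reported figure.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    tp = sensitivity * prevalence              # true positives per unit population
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    fn = (1 - sensitivity) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)


def implied_prevalence(sensitivity: float, specificity: float, ppv: float) -> float:
    """Solve PPV = s*p / (s*p + (1 - spec)*(1 - p)) for the prevalence p."""
    fpr = 1 - specificity
    return ppv * fpr / (sensitivity * (1 - ppv) + ppv * fpr)


prev = implied_prevalence(0.584, 0.873, 0.260)
ppv, npv = predictive_values(0.584, 0.873, prev)
print(f"implied prevalence={prev:.3f}  PPV={ppv:.3f}  NPV={npv:.3f}")
# implied prevalence=0.071  PPV=0.260  NPV=0.965
```

At a prevalence near 7%, false positives outnumber true positives by nearly three to one even at 87.3% specificity; at the same prevalence, the endoscopists' higher specificity (97.2%) is what gives them the higher PPV despite their lower sensitivity.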
  15. Curr Med Imaging. 2020 Apr 15.
       BACKGROUND: Colon cancer generally begins as neoplastic growths of tissue, called polyps, originating from the inner lining of the colon wall. Most colon polyps are considered harmless, but over time they can evolve into colon cancer, which is often fatal when diagnosed at later stages. Hence, time is of the essence in the early detection of polyps and the prevention of colon cancer.
    METHODS: To aid this endeavour, many computer-aided methods have been developed that use a wide array of techniques to detect, localize, and segment polyps in CT colonography images. In this paper, a comprehensive review of state-of-the-art methods is presented, and the work is categorized broadly by the classification techniques used, namely machine learning and deep learning.
    CONCLUSIONS: The performance of each proposed approach is compared with that of existing methods, and the ways in which these approaches can support timely and accurate detection of colon polyps are discussed.
    Keywords:  CNN; CT Colonography (CTC); Computer Aided Detection (CADe); Deep Learning; Machine Learning (ML); polyps
    DOI:  https://doi.org/10.2174/2213335607999200415141427