bims-librar Biomed News
on Biomedical librarianship
Issue of 2024–08–04
27 papers selected by
Thomas Krichel, Open Library Society



  1. J Med Internet Res. 2024 Jul 31. 26 e58764
      Evidence-based medicine (EBM), which emphasizes the integration of the best research evidence with clinical expertise and patient values, emerged from McMaster University in the 1980s and 1990s. The Health Information Research Unit (HiRU) was created at McMaster University in 1985 to support EBM. Early on, digital health informatics took the form of teaching clinicians how to search MEDLINE with modems and phone lines. Searching and retrieval of published articles were transformed as electronic platforms provided greater access to clinically relevant studies, systematic reviews, and clinical practice guidelines, with PubMed playing a pivotal role. In the early 2000s, the HiRU introduced Clinical Queries, validated search filters derived from the curated, gold-standard, human-appraised Hedges dataset, to enhance the precision of searches, allowing clinicians to hone their queries based on study design, population, and outcomes. Currently, almost 1 million articles are added to PubMed annually. To filter through this volume of heterogeneous publications for clinically important articles, the HiRU team and other researchers have been applying classical machine learning, deep learning, and, increasingly, large language models (LLMs). These approaches are built upon the foundation of gold-standard annotated datasets and humans in the loop for active machine learning. In this viewpoint, we explore the evolution of health informatics in supporting evidence search and retrieval processes over the past 25+ years within the HiRU, including the evolving roles of LLMs and responsible artificial intelligence, as we continue to facilitate the dissemination of knowledge, enabling clinicians to integrate the best available evidence into their clinical practice.
    Keywords:  Boolean; Health Information Research Unit; HiRU; NLP; article; evidence-based; evidence-based medicine; health informatics; health information; information retrieval; journal; natural language processing
    DOI:  https://doi.org/10.2196/58764
  2. Br J Hosp Med (Lond). 2024 Jul 30. 85(7): 1-3
      Academic hospitalists play an integral role in the day-to-day care of hospitalized patients, education, and research. They are well positioned to engage in scholarly and research activities and to inform clinical practice. Hospital medicine also offers a compelling career path for those seeking to maintain a broad clinical focus while pursuing opportunities in quality improvement (QI), clinical research, and medical education (MedEd) projects. Participation in these endeavors not only fosters scholarly growth but also enhances career satisfaction for hospitalists. Therefore, there is a need to explore and implement feasible strategies to equip hospitalists with the knowledge and resources necessary to generate scholarship and promote academic growth within the field.
    Keywords:  Academic hospitalist; Faculty development; Mentorship; Promotion; Scholarship
    DOI:  https://doi.org/10.12968/hmed.2024.0323
  3. J Dent. 2024 Jul 25. pii: S0300-5712(24)00428-7. [Epub ahead of print]149 105259
       OBJECTIVES: Artificial intelligence (AI) tools utilizing machine learning (ML) have gained increasing utility in medicine and academia as a means of enhancing efficiency. ASReview is one such AI program designed to streamline the systematic review process through the automated prioritization of relevant articles for screening. This study examined the screening efficiency of ASReview when conducting systematic reviews and the potential factors that could influence its efficiency.
    METHODS: Six distinct topics within the field of periodontics were searched in PubMed and Web of Science to obtain articles for screening within ASReview. Through a "training" process, relevant and irrelevant articles were manually incorporated to develop "prior knowledge" and facilitate ML optimization. Screening was then conducted following ASReview's algorithmically-generated relevance rankings. Screening efficiency was evaluated based on the normalized number of articles not requiring detailed review and on the total time expenditure.
    RESULTS: Across the six topics, an average of 60.2 % of articles did not warrant extensive screening, given that all relevant articles were discovered within the first 39.8 % of publications reviewed. No significant variations in efficiency were observed with differing methods of assembling prior knowledge articles or with modifications in article ratios and numbers.
    CONCLUSIONS: On average, ASReview conferred a 60.2 % improvement in screening efficiency, largely attributed to its dynamic ML capabilities. While advanced technologies like ASReview promise enhanced efficiencies, the accurate human discernment of article relevancy and quality remains indispensable when training these AI tools.
    CLINICAL SIGNIFICANCE: Using ASReview has the potential to save approximately 60 % of time and effort required for screening articles.
    Keywords:  ASReview; Artificial intelligence; Efficiency; Systematic review
    DOI:  https://doi.org/10.1016/j.jdent.2024.105259
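    The screening-efficiency figure above (the share of records that never needed detailed review because every relevant article surfaced early in the ranking) reduces to a simple calculation over a relevance-ranked screening log. A minimal Python sketch, assuming the ranked inclusion labels are already available; the labels below are invented and this is not ASReview's own code:
      # Sketch: fraction of records that never needed detailed screening once
      # all relevant records had appeared in the prioritised ranking.
      # Invented labels for illustration; not ASReview's internal code.

      def screening_savings(ranked_labels):
          """ranked_labels: booleans in screening order, True = relevant."""
          if not any(ranked_labels):
              return 0.0
          last_relevant = max(i for i, relevant in enumerate(ranked_labels) if relevant)
          screened_fraction = (last_relevant + 1) / len(ranked_labels)
          return 1.0 - screened_fraction

      if __name__ == "__main__":
          # Hypothetical topic: 1000 ranked records, all relevant ones found
          # within the first 398 positions (mirroring the 39.8% figure above).
          labels = [i in {3, 40, 120, 250, 397} for i in range(1000)]
          print(f"Records not needing detailed review: {screening_savings(labels):.1%}")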
  4. Syst Rev. 2024 Aug 02. 13(1): 206
       BACKGROUND: To describe the algorithm of a novel systematic review automation tool, "the Deduplicator", and to investigate its efficacy in removing duplicate records from a multi-database systematic review search.
    METHODS: We constructed and tested the efficacy of the Deduplicator tool by using 10 previous Cochrane systematic review search results to compare the Deduplicator's 'balanced' algorithm to a semi-manual EndNote method. Two researchers each performed deduplication on the 10 libraries of search results. For five of those libraries, one researcher used the Deduplicator, while the other performed semi-manual deduplication with EndNote. They then switched methods for the remaining five libraries. In addition to this analysis, comparison between the three different Deduplicator algorithms ('balanced', 'focused' and 'relaxed') was performed on two datasets of previously deduplicated search results.
    RESULTS: Before deduplication, the mean library size for the 10 systematic reviews was 1962 records. When using the Deduplicator, the mean time to deduplicate was 5 min per 1000 records compared to 15 min with EndNote. The mean error rate with Deduplicator was 1.8 errors per 1000 records in comparison to 3.1 with EndNote. Evaluation of the different Deduplicator algorithms found that the 'balanced' algorithm had the highest mean F1 score of 0.9647. The 'focused' algorithm had the highest mean accuracy of 0.9798 and the highest recall of 0.9757. The 'relaxed' algorithm had the highest mean precision of 0.9896.
    CONCLUSIONS: Using the Deduplicator for duplicate record detection reduces the time taken to deduplicate while maintaining or improving accuracy compared with a semi-manual EndNote method. However, further research comparing a wider range of deduplication methods is needed to establish the Deduplicator's performance relative to other approaches.
    Keywords:  Automatic; Deduplication; Duplicate article; Duplicate record; Searching; Systematic review
    DOI:  https://doi.org/10.1186/s13643-024-02619-9
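    The precision, recall, accuracy, and F1 figures above come from comparing each algorithm's duplicate flags against a gold-standard, hand-deduplicated library. A minimal Python sketch of that scoring step, with invented labels; it is not the Deduplicator's implementation:
      # Sketch: score a deduplication run against gold-standard labels.
      # gold[i] / pred[i] are True when record i is flagged as a duplicate.
      # Invented labels; not the Deduplicator's implementation.

      def dedup_metrics(gold, pred):
          tp = sum(g and p for g, p in zip(gold, pred))          # duplicates correctly flagged
          fp = sum(p and not g for g, p in zip(gold, pred))      # unique records wrongly flagged
          fn = sum(g and not p for g, p in zip(gold, pred))      # duplicates missed
          tn = sum(not g and not p for g, p in zip(gold, pred))  # unique records kept
          precision = tp / (tp + fp) if tp + fp else 0.0
          recall = tp / (tp + fn) if tp + fn else 0.0
          f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
          accuracy = (tp + tn) / len(gold)
          return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

      gold = [True, True, False, False, True, False, False, True]
      pred = [True, False, False, False, True, False, True, True]
      print(dedup_metrics(gold, pred))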
  5. JMIR Med Inform. 2024 Jul 31. 12 e54345
       BACKGROUND: Artificial intelligence (AI) chatbots have recently been taken up by health care practitioners in medical practice. However, the output of these AI chatbots has been found to contain varying degrees of hallucination in content and references. Such hallucinations raise doubts about their output and their implementation.
    OBJECTIVE: The aim of our study was to propose a reference hallucination score (RHS) to evaluate the authenticity of AI chatbots' citations.
    METHODS: Six AI chatbots were challenged with the same 10 medical prompts, requesting 10 references per prompt. The RHS is composed of 6 bibliographic items and the reference's relevance to prompts' keywords. RHS was calculated for each reference, prompt, and type of prompt (basic vs complex). The average RHS was calculated for each AI chatbot and compared across the different types of prompts and AI chatbots.
    RESULTS: Bard failed to generate any references. ChatGPT 3.5 and Bing generated the highest RHS (score=11), while Elicit and SciSpace generated the lowest RHS (score=1), and Perplexity generated a middle RHS (score=7). The highest degree of hallucination was observed for reference relevancy to the prompt keywords (308/500, 61.6%), while the lowest was for reference titles (169/500, 33.8%). ChatGPT and Bing had comparable RHS (β coefficient=-0.069; P=.32), while Perplexity had significantly lower RHS than ChatGPT (β coefficient=-0.345; P<.001). AI chatbots generally had significantly higher RHS when prompted with scenarios or complex format prompts (β coefficient=0.486; P<.001).
    CONCLUSIONS: The variation in RHS underscores the necessity for a robust reference evaluation tool to improve the authenticity of AI chatbots. Further, the variations highlight the importance of verifying their output and citations. Elicit and SciSpace had negligible hallucination, while ChatGPT and Bing had critical hallucination levels. The proposed AI chatbots' RHS could contribute to ongoing efforts to enhance AI's general reliability in medical research.
    Keywords:  Bing; ChatGPT; Elicit; Perplexity; SciSpace; artificial intelligence (AI) chatbots; bibliographic verification; reference hallucination
    DOI:  https://doi.org/10.2196/54345
  6. South Med J. 2024 Aug;117(8): 467-473
       OBJECTIVES: Our aim was to compare the usability and reliability of answers to "real-world" clinical questions raised during the care of patients, as provided by Chat-Generative Pre-Trained Transformer (ChatGPT) and by a human-authored Web source (www.Pearls4Peers.com).
    METHODS: Two domains of clinical information quality were studied: usability, based on organization/readability, relevance, and usefulness, and reliability, based on clarity, accuracy, and thoroughness. The top 36 most viewed real-world questions from a human-authored Web site (www.Pearls4Peers.com [P4P]) were posed to ChatGPT 3.5. Anonymized answers by ChatGPT and P4P (without literature citations) were separately assessed for usability by 18 practicing physicians ("clinician users") in triplicate and for reliability by 21 expert providers ("content experts") on a Likert scale ("definitely yes," "generally yes," or "no") in duplicate or triplicate. Participants also directly compared the usability and reliability of paired answers.
    RESULTS: The usability and reliability of ChatGPT answers varied widely depending on the question posed. ChatGPT answers were not considered useful or accurate in 13.9% and 13.1% of cases, respectively. In within-individual rankings for usability, ChatGPT was inferior to P4P in organization/readability, relevance, and usefulness in 29.6%, 28.3%, and 29.6% of cases, respectively, and for reliability, inferior to P4P in clarity, accuracy, and thoroughness in 38.1%, 34.5%, and 31% of cases, respectively.
    CONCLUSIONS: The quality of ChatGPT responses to real-world clinical questions varied widely, with nearly one-third or more answers considered inferior to a human-authored source in several aspects of usability and reliability. Caution is advised when using ChatGPT in clinical decision making.
    DOI:  https://doi.org/10.14423/SMJ.0000000000001715
  7. World J Urol. 2024 Jul 29. 42(1): 455
       PURPOSE: Large language models (LLMs) are a form of artificial intelligence (AI) that uses deep learning techniques to understand, summarize and generate content. The potential benefits of LLMs in healthcare are predicted to be immense. The objective of this study was to examine the quality of patient information leaflets (PILs) produced by 3 LLMs on urological topics.
    METHODS: Prompts were created to generate PILs from 3 LLMs: ChatGPT-4, PaLM 2 (Google Bard) and Llama 2 (Meta) across four urology topics (circumcision, nephrectomy, overactive bladder syndrome, and transurethral resection of the prostate). PILs were evaluated using a quality assessment checklist. PIL readability was assessed by the Average Reading Level Consensus Calculator.
    RESULTS: PILs generated by PaLM 2 had the highest overall average quality score (3.58), followed by Llama 2 (3.34) and ChatGPT-4 (3.08). PaLM 2-generated PILs were of the highest quality for all topics except TURP, and PaLM 2 was the only LLM to include images. Medical inaccuracies were present in all generated content, including instances of significant error. Readability analysis identified PaLM 2-generated PILs as the simplest (age 14-15 average reading level). Llama 2 PILs were the most difficult (age 16-17 average).
    CONCLUSION: While LLMs can generate PILs that may help reduce healthcare professional workload, the generated content requires clinician input for accuracy and for inclusion of health literacy aids, such as images. LLM-generated PILs were above the average reading level for adults, necessitating improvement in LLM algorithms and/or prompt design. How satisfied patients are with LLM-generated PILs remains to be evaluated.
    Keywords:  Artificial intelligence (AI); ChatGPT; Google bard; Large language model (LLM); Patient education; Patient information leaflet
    DOI:  https://doi.org/10.1007/s00345-024-05146-3
  8. Cleft Palate Craniofac J. 2024 Aug 01. 10556656241266368
       INTRODUCTION: The application of artificial intelligence (AI) in healthcare has expanded in recent years, and the use of tools such as ChatGPT to generate patient-facing information has garnered particular interest. Online cleft lip and palate (CL/P) surgical information supplied by academic/professional (A/P) sources was therefore evaluated against ChatGPT regarding accuracy, comprehensiveness, and clarity.
    METHODS: Eleven plastic and reconstructive surgeons and 29 non-medical individuals blindly compared responses written by ChatGPT or A/P sources to 30 frequently asked CL/P surgery questions. Surgeons indicated preference, determined accuracy, and scored comprehensiveness and clarity. Non-medical individuals indicated preference. Readability scores were calculated using seven readability formulas. Statistical analysis of CL/P surgical online information was performed using paired t-tests.
    RESULTS: Surgeons blindly preferred material generated by ChatGPT over A/P sources 60.88% of the time. Additionally, surgeons consistently indicated that ChatGPT-generated material was more comprehensive and had greater clarity. No significant difference was found between ChatGPT and resources provided by professional organizations in terms of accuracy. Among individuals with no medical background, ChatGPT-generated materials were preferred 60.46% of the time. For materials from both ChatGPT and A/P sources, readability scores surpassed the advised levels for patient proficiency across all seven readability formulas.
    CONCLUSION: As the prominence of ChatGPT-based language tools rises in the healthcare space, potential applications of the tools should be assessed by experts against existing high-quality sources. Our results indicate that ChatGPT is capable of producing high-quality material in terms of accuracy, comprehensiveness, and clarity preferred by both plastic surgeons and individuals with no medical background.
    Keywords:  accuracy; artificial intelligence; clarity; cleft lip and palate; comprehensiveness; online resources; quality; readability
    DOI:  https://doi.org/10.1177/10556656241266368
  9. Cureus. 2024 Jul;16(7): e63580
       BACKGROUND: Low back pain (LBP) is a prevalent healthcare concern that is frequently responsive to conservative treatment. However, it can also stem from severe conditions, marked by 'red flags' (RF) such as malignancy, cauda equina syndrome, fractures, infections, spondyloarthropathies, and aneurysm rupture, which physicians should be vigilant about. Given the increasing reliance on online health information, this study assessed ChatGPT-3.5's (OpenAI, San Francisco, CA, USA) and GoogleBard's (Google, Mountain View, CA, USA) accuracy in responding to RF-related LBP questions and their capacity to discriminate the severity of the condition.
    METHODS: We created 70 questions on RF-related symptoms and diseases following the LBP guidelines. Among them, 58 had a single symptom (SS), and 12 had multiple symptoms (MS) of LBP. Questions were posed to ChatGPT and GoogleBard, and responses were assessed by two authors for accuracy, completeness, and relevance (ACR) using a 5-point rubric criteria.
    RESULTS: Cohen's kappa values (0.60-0.81) indicated substantial agreement between the two authors. The average scores for responses ranged from 3.47 to 3.85 for ChatGPT-3.5 and from 3.36 to 3.76 for GoogleBard for the 58 SS questions, and from 4.04 to 4.29 for ChatGPT-3.5 and from 3.50 to 3.71 for GoogleBard for the 12 MS questions. The ratings for these responses ranged from 'good' to 'excellent'. Most SS responses effectively conveyed the severity of the situation (93.1% for ChatGPT-3.5, 94.8% for GoogleBard), and all MS responses did so. No statistically significant differences were found between ChatGPT-3.5 and GoogleBard scores (p>0.05).
    CONCLUSIONS: In an era characterized by widespread online health information seeking, artificial intelligence (AI) systems play a vital role in delivering precise medical information. These technologies may hold promise in the field of health information if they continue to improve.
    Keywords:  artificial intelligence; chatgpt; googlebard; health information; low back pain; red flags
    DOI:  https://doi.org/10.7759/cureus.63580
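    The inter-rater agreement above is Cohen's kappa computed between the two authors' rubric scores. A minimal Python sketch using scikit-learn, with invented example ratings:
      # Sketch: Cohen's kappa for two raters scoring the same responses on a
      # 5-point rubric. Example ratings are invented for illustration.
      from sklearn.metrics import cohen_kappa_score

      rater_a = [4, 5, 3, 4, 2, 5, 4, 3, 4, 5]
      rater_b = [4, 4, 3, 4, 3, 5, 4, 3, 5, 5]

      # Pass weights="linear" for a weighted kappa on ordinal rubric scores.
      kappa = cohen_kappa_score(rater_a, rater_b)
      print(f"Cohen's kappa: {kappa:.2f}")  # 0.61-0.80 is conventionally read as substantial agreement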
  10. Front Oncol. 2024;14: 1386718
       Background: Many patients use artificial intelligence (AI) chatbots as a rapid source of health information. This raises important questions about the reliability and effectiveness of AI chatbots in delivering accurate and understandable information.
    Purpose: To evaluate and compare the accuracy, conciseness, and readability of responses from OpenAI ChatGPT-4 and Google Bard to patient inquiries concerning the novel 177Lu-PSMA-617 therapy for prostate cancer.
    Materials and methods: Two experts listed the 12 most commonly asked questions by patients on 177Lu-PSMA-617 therapy. These twelve questions were prompted to OpenAI ChatGPT-4 and Google Bard. AI-generated responses were distributed using an online survey platform (Qualtrics) and blindly rated by eight experts. The performances of the AI chatbots were evaluated and compared across three domains: accuracy, conciseness, and readability. Additionally, potential safety concerns associated with AI-generated answers were also examined. The Mann-Whitney U and chi-square tests were utilized to compare the performances of AI chatbots.
    Results: Eight experts participated in the survey, evaluating 12 AI-generated responses across the three domains of accuracy, conciseness, and readability, resulting in 96 assessments (12 responses x 8 experts) for each domain per chatbot. ChatGPT-4 provided more accurate answers than Bard (2.95 ± 0.671 vs 2.73 ± 0.732, p=0.027). Bard's responses had better readability than ChatGPT-4 (2.79 ± 0.408 vs 2.94 ± 0.243, p=0.003). Both ChatGPT-4 and Bard achieved comparable conciseness scores (3.14 ± 0.659 vs 3.11 ± 0.679, p=0.798). Experts categorized the AI-generated responses as incorrect or partially correct at a rate of 16.6% for ChatGPT-4 and 29.1% for Bard. Bard's answers contained significantly more misleading information than those of ChatGPT-4 (p = 0.039).
    Conclusion: AI chatbots have gained significant attention, and their performance is continuously improving. Nonetheless, these technologies still need further improvements to be considered reliable and credible sources for patients seeking medical information on 177Lu-PSMA-617 therapy.
    Keywords:  177 Lu-PSMA-617 therapy; Bard; ChatGPT; artificial intelligence; chatbot; information literacy; machine learning; prostate cancer
    DOI:  https://doi.org/10.3389/fonc.2024.1386718
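    The group comparisons above use the Mann-Whitney U test for the ordinal expert ratings and the chi-square test for categorical counts; both are available in SciPy. A minimal sketch with invented numbers standing in for the study's survey data:
      # Sketch: compare two chatbots' ordinal accuracy ratings with the
      # Mann-Whitney U test, and their misleading/not-misleading counts with a
      # chi-square test. All numbers are invented, not the study's data.
      from scipy.stats import chi2_contingency, mannwhitneyu

      chatgpt_ratings = [3, 3, 2, 4, 3, 3, 2, 4, 3, 3, 4, 2]
      bard_ratings = [2, 3, 2, 3, 3, 2, 2, 3, 3, 2, 3, 2]

      u_stat, p_value = mannwhitneyu(chatgpt_ratings, bard_ratings, alternative="two-sided")
      print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.3f}")

      # Rows: chatbot; columns: [misleading, not misleading] response counts.
      table = [[16, 80],
               [28, 68]]
      chi2, p, dof, expected = chi2_contingency(table)
      print(f"Chi-square = {chi2:.2f}, dof = {dof}, p = {p:.3f}")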
  11. Indian J Anaesth. 2024 Jul;68(7): 631-636
       Background and Aims: Artificial intelligence (AI) chatbots such as Conversational Generative Pre-trained Transformer (ChatGPT) have recently created much buzz, especially regarding patient education. Well-informed patients understand and adhere to management and become involved in shared decision making. The accuracy and understandability of the generated educational material are prime concerns. Thus, we compared ChatGPT with traditional patient information leaflets (PILs) about chronic pain medications.
    Methods: Patients' frequently asked questions were generated from PILs available on the official websites of the British Pain Society (BPS) and the Faculty of Pain Medicine. Eight blinded annexures were prepared for evaluation, consisting of traditional PILs from the BPS and AI-generated patient information materials structured similar to PILs by ChatGPT. The authors performed a comparative analysis to assess materials' readability, emotional tone, accuracy, actionability, and understandability. Readability was measured using Flesch Reading Ease (FRE), Gunning Fog Index (GFI), and Flesch-Kincaid Grade Level (FKGL). Sentiment analysis determined emotional tone. An expert panel evaluated accuracy and completeness. Actionability and understandability were assessed with the Patient Education Materials Assessment Tool.
    Results: Traditional PILs generally exhibited higher readability (P values < 0.05), with [mean (standard deviation)] FRE [62.25 (1.6) versus 48 (3.7)], GFI [11.85 (0.9) versus 13.65 (0.7)], and FKGL [8.33 (0.5) versus 10.23 (0.5)] but varied emotional tones, often negative, compared to more positive sentiments in ChatGPT-generated texts. Accuracy and completeness did not significantly differ between the two. Actionability and understandability scores were comparable.
    Conclusion: While AI chatbots offer efficient information delivery, ensuring accuracy and readability, patient-centeredness remains crucial. It is imperative to balance innovation with evidence-based practice.
    Keywords:  AI; ChatGPT; analgesia; artificial intelligence; chronic pain; medication adherence; patient education; readability
    DOI:  https://doi.org/10.4103/ija.ija_204_24
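    Several entries in this issue rely on standard readability indices; the three used above (FRE, GFI, FKGL) are simple functions of sentence, word, and syllable counts. A minimal Python sketch with a deliberately crude syllable heuristic, so the outputs only approximate those of dedicated calculators:
      # Sketch: Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL)
      # and Gunning Fog Index (GFI) from raw text. The syllable counter is a
      # rough heuristic, so scores only approximate dedicated tools.
      import re

      def count_syllables(word):
          return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

      def readability(text):
          sentences = max(1, len(re.findall(r"[.!?]+", text)))
          words = re.findall(r"[A-Za-z']+", text)
          n_words = max(1, len(words))
          syllables = sum(count_syllables(w) for w in words)
          complex_words = sum(1 for w in words if count_syllables(w) >= 3)

          fre = 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)
          fkgl = 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59
          gfi = 0.4 * ((n_words / sentences) + 100 * complex_words / n_words)
          return {"FRE": round(fre, 1), "FKGL": round(fkgl, 1), "GFI": round(gfi, 1)}

      sample = ("Take this medicine with food. If you feel dizzy or sick, "
                "stop taking it and contact your doctor straight away.")
      print(readability(sample))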
  12. Abdom Radiol (NY). 2024 Aug 01.
       PURPOSE: To assess the accuracy, reliability, and readability of publicly available large language models in answering fundamental questions on hepatocellular carcinoma diagnosis and management.
     METHODS: Twenty questions on liver cancer diagnosis and management were asked in triplicate to ChatGPT-3.5 (OpenAI), Gemini (Google), and Bing (Microsoft). Responses were assessed by six fellowship-trained physicians from three academic liver transplant centers who actively diagnose and/or treat liver cancer. Responses were categorized as accurate (score 1; all information is true and relevant), inadequate (score 0; all information is true but does not fully answer the question or provides irrelevant information), or inaccurate (score -1; any information is false). Means with standard deviations were recorded. A response was considered accurate overall if its mean score was > 0, and a question's answer was considered reliable if the mean score was > 0 across all responses to that question. Responses were also quantified for readability using the Flesch Reading Ease Score and Flesch-Kincaid Grade Level. Readability and accuracy across 60 responses were compared using one-way ANOVAs with Tukey's multiple comparison tests.
    RESULTS: Of the twenty questions, ChatGPT answered nine (45%), Gemini answered 12 (60%), and Bing answered six (30%) questions accurately; however, only six (30%), eight (40%), and three (15%), respectively, were both accurate and reliable. There were no significant differences in accuracy between any chatbot. ChatGPT responses were the least readable (mean Flesch Reading Ease Score 29; college graduate), followed by Gemini (30; college) and Bing (40; college; p < 0.001).
    CONCLUSION: Large language models provide complex responses to basic questions on hepatocellular carcinoma diagnosis and management that are seldom accurate, reliable, or readable.
    Keywords:  Artificial intelligence; Hepatocellular carcinoma; Large language model; Liver cancer
    DOI:  https://doi.org/10.1007/s00261-024-04501-7
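    The readability comparison above uses a one-way ANOVA followed by Tukey's multiple comparison test. A minimal sketch with SciPy and statsmodels on invented scores (not the study's data):
      # Sketch: one-way ANOVA plus Tukey's HSD across three chatbots'
      # readability scores. Scores are invented for illustration.
      import numpy as np
      from scipy.stats import f_oneway
      from statsmodels.stats.multicomp import pairwise_tukeyhsd

      chatgpt = [29, 31, 27, 33, 25, 30, 28, 32, 26, 29]
      gemini = [30, 34, 28, 33, 29, 31, 27, 35, 30, 32]
      bing = [40, 43, 38, 41, 37, 44, 39, 42, 36, 40]

      f_stat, p_value = f_oneway(chatgpt, gemini, bing)
      print(f"One-way ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

      scores = np.concatenate([chatgpt, gemini, bing])
      groups = ["ChatGPT"] * 10 + ["Gemini"] * 10 + ["Bing"] * 10
      print(pairwise_tukeyhsd(scores, groups))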
  13. Br J Clin Pharmacol. 2024 Aug 03.
      With its increasing popularity, healthcare professionals and patients may use ChatGPT to obtain medication-related information. This study was conducted to assess ChatGPT's ability to provide satisfactory responses (i.e., responses that directly answer the question and are accurate, complete and relevant) to medication-related questions posed to an academic drug information service. ChatGPT responses were compared to responses generated by the investigators through the use of traditional resources, and references were evaluated. Thirty-nine questions were entered into ChatGPT; the three most common categories were therapeutics (8; 21%), compounding/formulation (6; 15%) and dosage (5; 13%). Ten (26%) questions were answered satisfactorily by ChatGPT. Of the 29 (74%) questions that were not answered satisfactorily, deficiencies included lack of a direct response (11; 38%), lack of accuracy (11; 38%) and/or lack of completeness (12; 41%). References were included with eight (29%) responses, each of which contained fabricated references. Presently, healthcare professionals and consumers should be cautioned against using ChatGPT for medication-related information.
    Keywords:  ChatGPT; generative artificial intelligence; medication information
    DOI:  https://doi.org/10.1111/bcp.16212
  14. J Stomatol Oral Maxillofac Surg. 2024 Jul 26. pii: S2468-7855(24)00225-8. [Epub ahead of print] 101979
       OBJECTIVE: This study aims to evaluate the capacity of ChatGPT-4o to generate new systematic review ideas in the field of oral and maxillofacial surgery. The data obtained from this study will provide evidence-based information to oral and maxillofacial surgeons regarding the academic use of GPT-4o.
    MATERIALS AND METHODS: ChatGPT-4o was asked to provide four previously unpublished systematic review ideas each for the topics of impacted third molars, dental implants, orthognathic surgery, and temporomandibular disorders. A literature search was conducted in the PubMed database to check if the ideas generated by GPT-4o had been previously published, and the search results were compared with the ideas generated by the AI.
    RESULTS: The PubMed database search resulted in a total of 871 publications, with 37 publications found to be related to the topics generated by GPT-4o after the first and second screening. Out of the 16 publication ideas generated by GPT-4o, 9 (56.25 %) were determined to be previously unexplored according to the PubMed database search. There was no statistically significant relationship between the presence of ChatGPT's suggestions in PubMed and the subject areas of the studies.
    CONCLUSION: ChatGPT-4o has a high potential to be used as a valuable tool for suggesting systematic review topics in oral and maxillofacial surgery. Additionally, this tool can assist researchers not only in proposing publication ideas but also in developing the methodology of the study.
    Keywords:  Artificial intelligence; ChatGPT; Publication idea
    DOI:  https://doi.org/10.1016/j.jormas.2024.101979
  15. J Card Fail. 2024 Jul 31. pii: S1071-9164(24)00264-1. [Epub ahead of print]
      Online education materials are widely used by patients and caregivers to understand the management of complex chronic diseases such as heart failure (HF). Organizations such as the American Medical Association and National Institutes of Health recommend that materials be written at a 6th grade reading level. The current study examined the readability and accessibility of online education materials for patients with HF. Whole page text from each included website was entered into an online readability calculator. Five validated readability indices (Flesch-Kincaid Grade Level, Flesch Reading Ease Scale, Gunning Fog Index, Coleman-Liau Index, and Simple Measure of Gobbledygook (SMOG Index)) were used to evaluate each source. Websites were categorized by source (government, public, and private). The availability of audiovisual accessibility features and content in non-English languages were assessed for each website. Of the 36 online resources analyzed, the median readability level was 9-10th grade by the Flesch-Kincaid Grade Level and college level using the Flesch Reading Ease Scale. The Gunning Fog Index and Coleman-Liau Index both showed median readability scores corresponding to a 12th grade reading level, while the SMOG Index showed a median score corresponding to that of the 9th grade. Only 10 websites (28%) offered information in languages other than English, and none provided comprehensive accessibility features for users with disabilities. Common online educational materials for patients with HF are characterized by a higher readability level than that recommended by the National Institutes of Health and American Medical Association with limited multilingual and accessibility options, potentially limiting the accessibility of resources to patients and caregivers.
    DOI:  https://doi.org/10.1016/j.cardfail.2024.06.015
  16. Otolaryngol Head Neck Surg. 2024 Aug 02.
       OBJECTIVE: This cross-sectional website analysis aimed to determine the readability and quality of English and Spanish websites pertaining to the prevention of noise-induced hearing loss.
    STUDY DESIGN: Cross-sectional website analysis.
    SETTING: Various online search engines.
    METHODS: We queried four popular search engines using the term "noise-induced hearing loss prevention" to reveal the top 50 English and top 50 Spanish websites for data collection. Websites meeting inclusion criteria were stratified based on the presence of a Health on the Net Code certificate (independent assessment of honesty, reliability, and quality). Websites were then independently reviewed by experts using the DISCERN criteria in order to assess information quality. Readability was calculated using the Flesch reading ease score for English and the Fernandez-Huerta formula for Spanish websites.
    RESULTS: Thirty-six English websites and 32 Spanish websites met the inclusion criteria. English websites had significantly lower readability (average = 56.34, SD = 11.17) compared to Spanish websites (average = 61.88, SD = 5.33) (P < .05). Spanish websites (average = 37, SD = 8.47) were also of significantly higher quality than English websites (average = 25.13, SD = 10.11).
    CONCLUSION: This study emphasizes the importance of providing quality and readable materials to patients seeking information about noise-induced hearing loss prevention. All of the English and Spanish websites reviewed were written at a level higher than the American Medical Association-recommended sixth-grade reading level. The study also highlights the need for evidence-based information online provided by experts in our field.
    Keywords:  health disparities; health literacy; noise‐induced hearing loss
    DOI:  https://doi.org/10.1002/ohn.925
  17. JMIR Form Res. 2024 Aug 01. 8 e56594
       BACKGROUND: The development of internet technology has greatly increased the ability of patients with chronic obstructive pulmonary disease (COPD) to obtain health information, giving patients more initiative in the patient-physician decision-making process. However, concerns about the quality of online health information may dampen patients' enthusiasm for searching the web. Therefore, it is necessary to evaluate the current state of COPD information on the Chinese internet.
    OBJECTIVE: This study aims to evaluate the quality of COPD treatment information on the Chinese internet.
    METHODS: Using the standard Chinese disease name for "chronic obstructive pulmonary disease" and the commonly used public search terms for "COPD" and "emphysema", each combined with the Chinese keyword for "treatment", we searched the desktop (PC) web versions of the Baidu, Sogou, and 360 search engines and screened the first 50 links returned by each from July to August 2021. The language was restricted to Chinese for all the websites. The DISCERN tool was used to evaluate the websites.
    RESULTS: A total of 96 websites were included and analyzed. The mean overall DISCERN score for all websites was 30.4 (SD 10.3; range 17.3-58.7; low quality), no website reached the maximum DISCERN score of 75, and the mean score for each item was 2.0 (SD 0.7; range 1.2-3.9). There were significant differences in mean DISCERN scores between terms, with "chronic obstructive pulmonary disease" having the highest mean score.
    CONCLUSIONS: The quality of COPD information on the Chinese internet is poor, mainly reflected in the low reliability and relevance of COPD treatment information; this can easily lead consumers to make inappropriate treatment choices. The term "chronic obstructive pulmonary disease" had the highest DISCERN score among commonly used disease search terms. It is recommended that consumers use standard disease names when searching for website information, as the information obtained is relatively reliable.
    Keywords:  COPD; China; DISCERN; DISCERN instrument; chronic; chronic obstructive pulmonary disease; chronic pulmonary disease; cross-sectional study; evaluation; health information; information quality; internet; pulmonary; pulmonary disease; treatment; website information; websites
    DOI:  https://doi.org/10.2196/56594
  18. J Orthod. 2024 Jul 31. 14653125241264827
       OBJECTIVES: To evaluate the characteristics and content of YouTube™ videos created by patients undergoing orthodontic fixed appliance treatment and to assess the content accuracy of these videos.
    DESIGN: A mixed-methods quantitative and qualitative study.
    DATA SOURCE: YouTube™ webpage.
    METHODS: The term 'braces' was used to search for relevant videos on the YouTube™ webpage between 18 August and 30 August 2020, with no limits imposed regarding how long the video had been available on YouTube™. Videos were included if they were made by patients and were predominantly about patients' experiences during treatment with labial fixed appliances. The main themes/subthemes of the included videos were identified. A checklist was then developed to assess accuracy of the video content for two of the main themes and the videos were assessed against the checklist.
    RESULTS: The video search identified 350 videos, of which 64 were selected as potentially eligible; 41 were subsequently excluded as they related primarily to the bond up/debond experience or had minimal information about orthodontics. This meant that 23 videos were ultimately included for analysis. Six main themes were identified in the videos: problems with fixed appliances, effects of fixed appliances, oral hygiene maintenance, dietary advice, treatment duration/appointment frequency and auxiliaries used with fixed appliances. From the 23 videos, 20 were assessed against the checklist for content accuracy related to two selected themes: oral hygiene maintenance and dietary advice. The majority of videos had low content accuracy scores, indicating that important and relevant content was generally missing.
    CONCLUSION: Several included videos focused on oral hygiene maintenance and dietary advice associated with fixed appliances; however, the content was incomplete and not always accurate. This is concerning to the profession, and it is therefore recommended that clinicians consider collaborating with patients to produce videos that are patient-centred and that also contain accurate information.
    Keywords:  Internet; braces; orthodontic videos; social media
    DOI:  https://doi.org/10.1177/14653125241264827
  19. Anat Sci Educ. 2024 Jul 28.
      In modern medical curricula, embryology is typically taught through lectures, with a few institutions providing tutorials. The use of 3-D videos or animations enables students to study these embryological structures and how they change with time. The aim of this study was to assess the quality of cardiac embryology videos available on YouTube. A systematic literature review regarding the use of YouTube in teaching or learning cardiac embryology identified no papers that examined this specific question, so a systematic search of YouTube was then performed. A total of 1200 cardiac embryology videos were retrieved using 12 specific search terms; 370 videos retrieved under two or more search terms were excluded, and a further 511 videos were excluded under additional, specific criteria. The remaining 319 videos were evaluated with the YouTube Video Assessment Criteria (UTvAC), with 121 rated as "useful." Videos on YouTube are uploaded with a wide audience in mind, from children to cardiologists, and content control is imperfect. Multiple videos were identified as duplicates of videos from original channels, typically without attribution. While 49 videos showed operations or human material, none contained an ethical statement regarding consent, and only 10 of these included an age restriction or graphical advisory. While there are useful videos for medical students studying cardiac embryology on YouTube, intuitive search strategies will also identify many with irrelevant content and of variable quality. Digital competence and search strategies are not innate skills, so educators should teach students to assess information so as to avoid overload or "filter failure."
    Keywords:  E‐learning; anatomy and medical education; cardiac embryology; educational methodology; embryology; teaching of embryology
    DOI:  https://doi.org/10.1002/ase.2467
  20. J Laparoendosc Adv Surg Tech A. 2024 Aug 02.
      Purpose: This study aims to evaluate the educational quality and appropriateness of laparoscopic radical nephrectomy videos on YouTube using the LAParoscopic surgery Video Educational GuidelineS (LAP-VEGaS) criteria. It focuses on understanding the role of online resources in medical education and objectively assessing their quality.
    Methods: A search was conducted on YouTube™ for "laparoscopic radical nephrectomy" on August 15, 2023, leading to the selection of the first 125 videos. Videos were chosen based on length (over 1 minute), content (laparoscopic radical nephrectomy), language (English), and nonindustry sponsorship. The LAP-VEGaS criteria, encompassing 16 items under five main categories (video introduction, case presentation, procedures, outcomes, and educational content), were used for evaluation, assigning 0 or 1 point per criterion.
    Results: The 100 videos meeting the criteria were divided into two groups: personal uploads by expert surgeons (Group-1) and institutional uploads by hospitals and organizations (Group-2). Group-2 videos had longer durations and higher LAP-VEGaS scores. The transperitoneal approach was preferred in 88% of the videos, and 84% were right laparoscopic nephrectomies. Group-2 had significantly higher LAP-VEGaS scores (6.3 ± 2.2) compared with Group-1 (4 ± 2.1) (P < 0.001). The number of videos published over the years increased, while LAP-VEGaS scores fluctuated.
    Conclusion: Assessing laparoscopic radical nephrectomy videos on YouTube™ using the LAP-VEGaS criteria helped clarify the role of online sources in medical education. Institutional uploads were found to be more successful in educational aspects, emphasizing the need for continuous quality review of online medical education materials. This study also offers guidance on how to evaluate and improve medical education materials on online platforms.
    Keywords:  YouTube™; laparoscopy; online; radical nephrectomy
    DOI:  https://doi.org/10.1089/lap.2024.0175
  21. Niger J Clin Pract. 2024 Jul 01. 27(7): 886-890
       BACKGROUND: In the realm of healthcare, particularly after the COVID-19 pandemic, there has been a rising trend of sharing videos on YouTube. The increased popularity of these videos among Internet users can be attributed to the captivating nature of visual and auditory content compared with written information.
    AIM: This study aims to assess the content, accuracy, reliability, and quality of YouTube videos focusing on defibrillation applications-a critical component of cardiopulmonary resuscitation (CPR).
    METHODS: On October 17, 2022, a video search was conducted using the keyword "defibrillation" on the YouTube platform, sorted in order of interest. Various parameters, including views, view rate, duration, comments, total likes and dislikes, target population, JAMA, DISCERN, and GQS scores, were recorded. In addition, content information was evaluated by Emergency Medicine specialists.
    RESULTS: The average video duration was 263.95 seconds, with an average of 90,574.6 views, 587.4 likes, and 19.1 comments. The mean DISCERN score was 35.9 (poor), modified DISCERN score was 1.7, GQS score was 2.7, and JAMA score was 2. The mean score regarding the scope and detail of information in the videos was calculated as 6.1.
    CONCLUSIONS: Deficiencies in the accuracy and reliability of Internet information were observed, mirroring the findings in our study. Supervision in this regard was often found to be inadequate. We advocate for the evaluation of video appropriateness before sharing on the Internet. We believe that platforms ensuring easy access to accurate information about crucial interventions such as CPR will significantly contribute to improving health literacy.
    DOI:  https://doi.org/10.4103/njcp.njcp_68_24
  22. J Asthma. 2024 Jul 27. 1-18
      This study aims to analyze the quality, reliability, and content of YouTube videos on pediatric asthma inhaler techniques, both for parents and children. The study has a descriptive, retrospective, and cross-sectional design. The research was conducted by searching YouTube using the terms "Pediatric Metered Dose Inhaler," "Pediatric Accuhaler," and "Pediatric Diskus." Each video's popularity was measured using the Video Power Index. The quality and reliability of the videos were evaluated using the modified DISCERN and the Global Quality Scale (GQS). This study analyzed 55 YouTube videos on pediatric inhaler technique. In total, 19 of the videos related to pressurized metered-dose inhalers (pMDIs) with a spacer for tidal breathing, 14 to pMDIs with a spacer for single breath, and 22 to diskus devices. Findings show that videos demonstrating the use of pMDI devices for single breath have higher modified DISCERN reliability scores. However, videos related to tidal breathing are more popular than those showing the use of diskus devices and pMDI single breath. Based on the checklist for videos on diskus devices, the steps with the highest error rates are 'Check dose counter' at 72.7% and 'Breathe out gently, away from the inhaler' at 63.6%. A moderate correlation was observed between the modified DISCERN score and the GQS. While YouTube videos on the pMDI single-breath technique may be useful for pediatric patients and caregivers, it is crucial for them to receive inhaler technique education from their healthcare provider. This study's findings hold great significance for pediatric patients and caregivers, particularly those who rely on YouTube for health-related information.
    Keywords:  YouTube; asthma; inhaler; pediatric
    DOI:  https://doi.org/10.1080/02770903.2024.2385981
  23. AJOG Glob Rep. 2024 Aug;4(3): 100364
       Background: TikTok has increasingly become a source of information about reproductive health. Patients seeking health information about oral contraception on TikTok may be influenced by videos containing misinformation or biased information.
    Objective: This social media infodemiological study aims to provide a descriptive content analysis of the quality and reliability of oral contraceptive health information on TikTok.
    Study Design: Researchers screened 1,000 TikTok videos from December 2022 to March 2023 retrieved under various search terms related to oral contraceptives. Data, including engagement metrics such as views, likes, comments, saves, and shares, were recorded. Video content including contraceptive methods discussed, efficacy, tolerability, and side effects were recorded. Two reviewers independently used a modified DISCERN criteria and Global Quality Scale (GQS) to assess the quality and reliability of information for each video.
    Results: Five hundred seventy-four videos were analyzed after applying exclusion criteria. Videos had a median length of 27 seconds (Q1=13sec, Q3=57sec) and received a median of 35,000 total views (Q1=4856 views, Q3=411,400 views) and 166 views per day (Q1=28 views per day, Q3=2021 views per day). Video creators were 83.3% female and 58.7% white. The mean modified DISCERN score was 1.63 (SD=1.06) and the mean GQS score was 2.28 (SD=1.37). The most common topic discussed in the videos was the effects of contraception. Healthcare professionals had significantly higher DISCERN and GQS scores (p<.001) than non-healthcare professionals. However, they received fewer views, likes, and comments on their videos (p<.001). Healthcare professionals were 86 times more likely than non-healthcare professionals to post educational videos (p<.001). However, non-educational content received significantly more views, likes, and comments than educational content (p<.001).
    Conclusion: TikTok videos related to oral contraceptive health had low quality and reliability of information. The majority of videos were made by non-healthcare providers, and the most common topic discussed was the effects of contraception. Videos made by healthcare professionals contained more reliable contraceptive information, but received less engagement than videos made by non-healthcare professionals. Healthcare providers should consider the prevalence of poor-quality information about oral contraceptives on social media when counseling and educating patients about reproductive health.
    Keywords:  adolescent; birth control; contraception; contraceptive pill; infodemiology; misinformation; online content; oral contraceptives; reproductive health; sex education; social media; social media analytics
    DOI:  https://doi.org/10.1016/j.xagr.2024.100364
  24. Pediatr Nephrol. 2024 Jul 30.
       BACKGROUND: Social media platforms such as TikTok™ are key sources of health information for young patients and caregivers. Misinformation is prevalent on TikTok™ across healthcare fields, which can perpetuate false beliefs about medical care. Limited data exists on the reliability of pediatric nephrology TikTok™ content. This study aimed to describe the quality of medical content of TikTok™ Videos (TTVs), related to pediatric kidney disease and transplant.
    METHODS: TTVs were selected using specific search terms and categorized into pediatric kidney disease and kidney transplant, excluding duplicate and adult-related content. The top 100 TTVs in each category, based on views, were analyzed. TTV characteristics were stratified by account type (physician, non-physician healthcare professional (HCP), non-HCP) and video aim (personal story, education, entertainment). DISCERN scoring, a validated questionnaire evaluating health information reliability, was conducted by 4 independent raters. Inter-rater reliability was assessed using a 2-way random effects model, and differences between content creator types were evaluated using one-way ANOVA and post hoc Tukey tests.
    RESULTS: TTVs had a total of 12.5 million likes and 113.1 million views. Over 70% of videos were created by non-HCPs (n = 147/200). DISCERN scoring revealed low reliability of medical information across content creator types. TTVs created by physicians and non-physician HCPs about kidney disease had significantly higher mean DISCERN scores compared to those created by non-HCPs (2.85, p < 0.001 and 2.48, p = 0.005, respectively).
    CONCLUSIONS: Educators within the pediatric nephrology community must keep in mind the lack of reliability of medical information available on TikTok™ and coordinate collective efforts to consider utilizing TikTok™ for patient education.
    Keywords:  Family-centered care; Patient engagement; Pediatric chronic kidney disease; Pediatric kidney transplant; Social media
    DOI:  https://doi.org/10.1007/s00467-024-06462-x
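    The two-way random-effects reliability model above corresponds to the ICC(2) family of intraclass correlation coefficients. A minimal sketch using the pingouin package on invented long-format ratings (four raters scoring the same videos); the package and column names are assumptions of this example, not the study's code:
      # Sketch: intraclass correlation for four raters scoring the same videos;
      # the ICC2/ICC2k rows correspond to the two-way random-effects model.
      # Example ratings are invented; the pingouin package is assumed.
      import pandas as pd
      import pingouin as pg

      ratings = {
          "video1": [2, 3, 2, 2],
          "video2": [4, 4, 3, 4],
          "video3": [1, 2, 1, 1],
          "video4": [3, 3, 4, 3],
          "video5": [2, 2, 2, 3],
      }
      rows = [
          {"video": video, "rater": f"rater{i}", "score": score}
          for video, scores in ratings.items()
          for i, score in enumerate(scores, start=1)
      ]
      df = pd.DataFrame(rows)

      icc = pg.intraclass_corr(data=df, targets="video", raters="rater", ratings="score")
      print(icc[["Type", "ICC", "CI95%"]])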
  25. Facial Plast Surg Aesthet Med. 2024 Aug 02.
      Background: With the rising popularity of online search tools, patients seeking information on facial palsy are increasingly turning to the Internet for medical knowledge.
    Objective: To categorize the most common online questions about Bell's palsy or facial paralysis and the sources that provide answers to those queries.
    Methods: Query volumes for terms pertaining to facial palsy were obtained using Google Search trends. The top 40 keywords associated with the terms "Bell's palsy" and "facial paralysis" were extracted. People Also Ask (PAA) Questions, a Google search engine response page feature, were used to identify the top questions associated with each keyword.
    Results: A total of 151 PAA Questions pertaining to the top 40 keywords associated with "Bell's palsy" and "facial paralysis" were identified. Etiology questions were the most frequent (n = 50, 33.1%), whereas those pertaining to treatment were the most accessible (119.5 average search engine response pages/question, 35.5%). Most sources were academic (n = 81, 53.6%). Medical practice group sites were the most accessible (211.9 average search engine response pages/website, 44.8%).
    Conclusion: Most PAA questions pertained to etiology and were sourced by academic sites. Questions regarding treatment and medical practice sites appeared on more search engine response pages than all other categories.
    DOI:  https://doi.org/10.1089/fpsam.2023.0277
  26. BMC Public Health. 2024 Jul 30. 24(1): 2054
       BACKGROUND: Health information consumers can acquire knowledge regarding health problems, combat health problems, make health-related decisions, and change their behaviour by conducting health information searches. This study aims to identify the sociodemographic and economic factors affecting individuals' search for health information on the internet before and during COVID-19.
    METHODS: In this study, micro data sets of the Household Information Technologies (IT) Usage Survey conducted by the Turkish Statistical Institute in 2018 and 2021 were used. The binary logistic regression analysis was also used in the study.
    RESULTS: Age, gender, education level, occupation, social media use, searching for information about goods and services, internet banking use, e-government use, having a desktop computer, having a tablet computer, and region were associated with searching for health information on the internet during the COVID-19 period.
    CONCLUSION: The increase in health information searches during the COVID-19 epidemic can be attributed to several key factors, such as society's heightened need for information, access to up-to-date health data, and increased trust in official sources. The study's findings serve as a valuable resource for health service providers and information sources attempting to identify the health information-seeking behaviour of the public and to meet their needs in this context.
    Keywords:  Binary logistic regression; COVID-19; Health Information Search; Pandemic; Türkiye
    DOI:  https://doi.org/10.1186/s12889-024-19546-y
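    The analysis above is a binary logistic regression of internet health-information seeking on sociodemographic predictors. A minimal sketch with statsmodels on simulated data; the variable names are illustrative and are not those of the Household IT Usage Survey microdata:
      # Sketch: binary logistic regression of "searched for health information
      # online" on a few sociodemographic predictors. The data are simulated;
      # variable names do not come from the Turkish survey microdata.
      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(42)
      n = 500
      age = rng.integers(18, 80, n)
      female = rng.integers(0, 2, n)
      uses_social_media = rng.integers(0, 2, n)

      # Simulate an outcome that becomes less likely with age and more likely
      # with social media use, then fit the model as if from survey responses.
      linear = 1.0 - 0.04 * age + 0.3 * female + 1.2 * uses_social_media
      prob = 1 / (1 + np.exp(-linear))
      searched = rng.binomial(1, prob)

      df = pd.DataFrame({"searched": searched, "age": age,
                         "female": female, "uses_social_media": uses_social_media})

      model = smf.logit("searched ~ age + female + uses_social_media", data=df).fit(disp=False)
      print(model.summary())
      print("Odds ratios:")
      print(np.exp(model.params).round(2))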