bims-librar Biomed News
on Biomedical librarianship
Issue of 2024-11-17
thirty-two papers selected by
Thomas Krichel, Open Library Society



  1. Bioinformatics. 2024 Nov 09. pii: btae672. [Epub ahead of print]
       SUMMARY: Over 55% of author names in PubMed are ambiguous: the same name is shared by different individual researchers. This poses significant challenges for precise literature retrieval with author name queries, a common behavior in biomedical literature search. In response, we present a comprehensive dataset of disambiguated authors. Specifically, we complement the automatic PubMed Computed Authors algorithm with the latest ORCID data for improved accuracy. As a result, the enhanced algorithm achieves high performance in author name disambiguation, and our dataset contains more than 21 million disambiguated authors for over 35 million PubMed articles, updated incrementally on a weekly basis. More importantly, we make the dataset publicly available to the community so that it can be used in a wide variety of potential applications beyond assisting PubMed's author name queries. Finally, we propose a set of guidelines for best practices for authors pertaining to the use of their names.
    AVAILABILITY AND IMPLEMENTATION: The PubMed Computed Authors dataset is publicly available for bulk download at: https://ftp.ncbi.nlm.nih.gov/pub/lu/ComputedAuthors/. Additionally, it is available for query through web API at: https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/authors/.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btae672
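    A minimal Python sketch of calling the web API above is shown below. It is illustrative only: the "name" query parameter and the JSON response handling are assumptions rather than documented behaviour, so consult the API page for the actual interface.

        import requests

        # Endpoint for the PubMed Computed Authors web API (from the availability note above).
        API_URL = "https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/authors/"

        def lookup_author(name):
            """Query the Computed Authors API for an author name.

            The 'name' parameter and the JSON response format are assumed for
            illustration; the real interface may differ.
            """
            response = requests.get(API_URL, params={"name": name}, timeout=30)
            response.raise_for_status()
            return response.json()

        if __name__ == "__main__":
            print(lookup_author("John Smith"))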
  2. BMC Med Res Methodol. 2024 Nov 09. 24(1): 271
       BACKGROUND: Systematic reviews (SRs) are used to inform clinical practice guidelines and healthcare decision making by synthesising the results of primary studies. Efficiently retrieving as many relevant SRs as possible with a minimum number of databases is challenging, as there is currently no guidance on how to do this optimally. In a previous study, we determined which individual databases contain the most SRs, and which combination of databases retrieved the most SRs. In this study, we aimed to validate those previous results using a different, larger, and more recent set of SRs.
    METHODS: We obtained a set of 100 Overviews of Reviews that included a total of 2276 SRs. SR inclusion was assessed in MEDLINE, Embase, and Epistemonikos. The mean inclusion rates (% of included SRs) and corresponding 95% confidence intervals were calculated for each database individually, as well as for combinations of MEDLINE with each other database and reference checking. Features of SRs not identified by the best database combination were reviewed qualitatively.
    RESULTS: Inclusion rates of SRs were similar in all three databases (mean inclusion rates in % with 95% confidence intervals: 94.3 [93.9-94.8] for MEDLINE, 94.4 [94.0-94.9] for Embase, and 94.4 [93.9-94.9] for Epistemonikos). Adding reference checking to MEDLINE increased the inclusion rate to 95.5 [95.1-96.0]. The best combination of two databases plus reference checking consisted of MEDLINE and Epistemonikos (98.1 [97.7-98.5]). Among the 44/2276 SRs not identified by this combination, 34 were published in journals from China, four were other journal publications, three were health agency reports, two were dissertations, and one was a preprint. When discounting the journal publications from China, the SR inclusion rate in the recommended combination (MEDLINE, Epistemonikos and reference checking) was even higher than in the previous study (99.6 vs. 99.2%).
    CONCLUSIONS: A combination of databases and reference checking was the best approach to searching for biomedical SRs. MEDLINE and Epistemonikos, complemented by checking the references of the included studies, was the most efficient and produced the highest recall. However, our results point to the presence of geographical bias, because some publications in journals from China were not identified.
    STUDY REGISTRATION: https://doi.org/10.17605/OSF.IO/R5EAS (Open Science Framework).
    Keywords:  Databases; Evidence synthesis; Geographical bias; Information specialist; Overview of review; Review methods; Search strategy; Systematic reviews; Umbrella review
    DOI:  https://doi.org/10.1186/s12874-024-02384-2
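    As a rough illustration of the inclusion-rate calculation, the Python sketch below computes a database's SR inclusion percentage with a simple Wald 95% confidence interval. The counts are invented for illustration, and the study's own interval method (calculated across the 100 overviews) may differ.

        import math

        def inclusion_rate_ci(found, total, z=1.96):
            """Percentage of SRs found in a database, with a Wald 95% CI.

            Normal-approximation interval for illustration only; the study's
            actual CI calculation may use a different method.
            """
            p = found / total
            se = math.sqrt(p * (1 - p) / total)
            return (100 * p, 100 * (p - z * se), 100 * (p + z * se))

        # Invented counts chosen to roughly match the reported MEDLINE rate (~94.3%).
        rate, lower, upper = inclusion_rate_ci(2146, 2276)
        print(f"{rate:.1f}% [{lower:.1f}-{upper:.1f}]")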
  3. Med Teach. 2024 Nov 14. 1-8
       INTRODUCTION: Bibliographic databases are essential research tools. In medicine, key databases are MEDLINE/PubMed, Embase, and Cochrane Central (MEC). In education, the Education Resource Information Center (ERIC) is a major database. Medical education, situated between medicine and education, has no dedicated database of its own. Many medical education researchers use MEC, some use ERIC and some do not.
    METHODS: We performed a descriptive analysis using search strategies to retrieve medical education references from MEC and ERIC. ERIC references which were duplicates with MEC references were removed. Unique ERIC references were tallied.
    RESULTS: Between 1977 and 2022, MEC contained 359,354 unique references relevant to medical education. ERIC provided 3925 unique references for the same period, all of which would be missed by searching only MEC. The mean number of unique ERIC medical education references per year across all 46 years was 85 (SD = ±29), or 119 (SD = ±15) for the last 10 years (2013 to 2022).
    CONCLUSION: ERIC consistently offered a small yet significant number of unique references relevant to medical education for decades. We recommend the use of ERIC for medical education research when comprehensive literature searches are required, such as in systematic reviews, scoping reviews, evidence synthesis, or guideline development.
    Keywords:  Medical education research; medicine; methods; profession; teaching and learning
    DOI:  https://doi.org/10.1080/0142159X.2024.2422003
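    The deduplication step described in the methods can be approximated in a few lines; the sketch below matches records on DOI when available and otherwise on a normalised title, which is an assumed matching rule for illustration rather than the authors' actual procedure.

        def normalise(title):
            """Lowercase a title and strip non-alphanumeric characters for matching."""
            return "".join(ch for ch in title.lower() if ch.isalnum())

        def unique_to_eric(eric_records, mec_records):
            """Return ERIC records not already present in MEC.

            Records are dicts with optional 'doi' and 'title' keys; matching on
            DOI or normalised title is an illustrative rule only.
            """
            seen = {r["doi"].lower() for r in mec_records if r.get("doi")}
            seen |= {normalise(r["title"]) for r in mec_records if r.get("title")}
            unique = []
            for r in eric_records:
                keys = {k for k in ((r.get("doi") or "").lower(), normalise(r.get("title", ""))) if k}
                if not keys & seen:
                    unique.append(r)
            return unique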
  4. Nucleic Acids Res. 2024 Nov 11. pii: gkae979. [Epub ahead of print]
      The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence repository and the PubMed® repository of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 31 distinct repositories and knowledgebases. The E-utilities serve as the programming interface for most of these. Resources receiving significant updates in the past year include PubMed, PubMed Central, Bookshelf, the NIH Comparative Genomics Resource, BLAST, Sequence Read Archive, Taxonomy, iCn3D, Conserved Domain Database, Pathogen Detection, antimicrobial resistance resources and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
    DOI:  https://doi.org/10.1093/nar/gkae979
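    The E-utilities mentioned above can be called from any HTTP client; a minimal Python example of an ESearch query against PubMed (retmode=json is a documented option) looks like this:

        import requests

        EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

        def pubmed_search(term, retmax=5):
            """Return PubMed IDs matching a query via the ESearch E-utility."""
            params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
            r = requests.get(f"{EUTILS}/esearch.fcgi", params=params, timeout=30)
            r.raise_for_status()
            return r.json()["esearchresult"]["idlist"]

        print(pubmed_search("biomedical librarianship[Title/Abstract]"))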
  5. Clin Exp Optom. 2024 Nov 13. 1-7
       CLINICAL RELEVANCE: Clinical skills training is essential in optometry curricula to develop core graduate entry competencies, including self-directed learning to facilitate life-long learning. Efficient and efficacious approaches are required to optimise student and educator time and resources.
    BACKGROUND: A video library of optometric clinical skills was created in 2012 to support self-directed student learning and face-to-face training. Use of videos in higher education generally increased during the COVID-19 pandemic and has remained above pre-pandemic levels. This study aimed to capture and evaluate student access patterns in the library to determine which videos were accessed most, and if this changed with training stage and following the pandemic.
    METHODS: Data on student viewing and critique submission were extracted for 71 videos from a clinical skills video library from 2018 to 2023. The number of videos viewed by students was analysed by year, cohort, video type (gold standard, scripted errors, summary, and student generated) and video category (history, screening, refraction, anterior segment, posterior segment and tonometry).
    RESULTS: First-year students viewed the most videos, and this decreased significantly during and following the pandemic. Overall, the number of videos viewed decreased with increasing course stage. Video access, by category, aligned with the curriculum. Views were highest for gold standard and student videos. Viewing of scripted error videos and submission of critiques of procedural videos was low for all course stages and years.
    CONCLUSION: A web-based video library of optometric clinical skills was used for self-directed learning, mostly by students early in their training. Similar resources developed in the future should align with the curriculum and include exemplar and student-generated videos. Student and educator perspectives on the factors that affect engagement with the online library warrant further exploration to facilitate optimal integration in post-pandemic times.
    Keywords:  COVID-19; Clinical; online; student; videos
    DOI:  https://doi.org/10.1080/08164622.2024.2425666
  6. Coll Res Libr. 2024 Nov;85(7): 994-1005
      The Duke University Clinical and Translational Science Institute Community Engaged Research Initiative (CERI) created an e-Library in 2018. This e-Library was developed in response to requests from academic researchers and the community for reliable, easily accessible information about community-engaged research approaches and concepts. It was vetted by internal and external partners. The e-Library's goal is to compile and organize nationally relevant community-engaged research resources to build bi-directional capacity between diverse community collaborators and the academic research community. Key elements of the e-Library's development included a selection of LibGuides as the platform; iterative community input; adaptation during the COVID-19 pandemic; and modification of this resource as needs grow and change.
    DOI:  https://doi.org/10.5860/crl.85.7.994
  7. Health Info Libr J. 2024 Nov 08.
       BACKGROUND: Pakistan is a densely populated South Asian country. It is facing numerous health challenges, as well as problems of the digital divide. The government of Pakistan established e-libraries as a pilot project in 2018. These libraries are functioning through community centers/public libraries in the largest province of the country.
    OBJECTIVE: This paper examines the role of Pakistani e-libraries in creating health awareness and providing health information to the public.
    METHODS: The qualitative research design was based on focus group discussions with the head librarians of 13 of the 20 e-libraries contacted.
    RESULTS: The findings revealed that e-libraries actively create health-related awareness and connect the public to health advisors. The e-libraries were engaged in four types of health-related activities (seminars, awareness campaigns, open health camps, and special health day celebrations) with high attendance from the public. Attendees of these programs returned to librarians with additional health-related queries.
    CONCLUSIONS: The study suggests a need for more liaison between the community and local healthcare institutions. This approach can make these programs more effective in helping individuals manage their health. The results of this study can serve as a useful guide for other developing nations in developing similar services.
    Keywords:  Asia; developing economies; digital information resources; health disparities; information services; public health; public libraries; south
    DOI:  https://doi.org/10.1111/hir.12554
  8. Autoimmun Rev. 2024 Nov 13. pii: S1568-9972(24)00179-4. [Epub ahead of print]23(12): 103688
      This study focuses on the search strategies used in bibliometric analyses within the field of autoimmune ear diseases, critically examining ways to improve search accuracy and relevance. Using the study by Liu et al. as an example, we found that the extensive search terms employed resulted in the inclusion of numerous irrelevant studies, weakening the specificity of the research findings. To address this issue, we propose a more precise search strategy using a combination of specific terms and wildcard symbols to ensure the search scope focuses on literature related to autoimmune ear diseases. Additionally, we recommend limiting search terms to titles, abstracts, and author keywords to reduce interference from unrelated literature. Moreover, we identify potential errors in keyword analysis caused by unmerged synonyms and suggest optimizing the accuracy of keyword co-occurrence analysis through synonym merging. This study aims to provide a more reliable methodological guide for future bibliometric analyses, thereby improving the quality and scientific rigor of research on autoimmune ear diseases.
    Keywords:  Autoimmune ear disease; Bibliometric analysis; Keyword consolidation; Precision research; Search strategy optimization
    DOI:  https://doi.org/10.1016/j.autrev.2024.103688
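    The synonym-merging step recommended above can be implemented as a simple mapping applied before counting keyword co-occurrences. A small Python sketch, with an invented synonym map purely for illustration, is shown below.

        from collections import Counter
        from itertools import combinations

        # Illustrative synonym map: variants are collapsed onto one canonical keyword.
        SYNONYMS = {
            "aied": "autoimmune inner ear disease",
            "autoimmune hearing loss": "autoimmune inner ear disease",
        }

        def merge(keyword):
            """Map a keyword to its canonical form (lowercased) before counting."""
            k = keyword.strip().lower()
            return SYNONYMS.get(k, k)

        def cooccurrence(records):
            """Count how often each pair of (merged) keywords appears in the same record."""
            pairs = Counter()
            for keywords in records:
                merged = sorted({merge(k) for k in keywords})
                pairs.update(combinations(merged, 2))
            return pairs

        records = [["AIED", "steroids"], ["autoimmune hearing loss", "steroids"]]
        print(cooccurrence(records))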
  9. Int J Prev Med. 2024;15: 52
       Background: The growing diversity of medical information resources and health information needs in recent years has given rise to another form of summarization and abstract writing, the information capsule (IC), alongside the various existing types of abstracts. The present study was conducted to analyze current ICs, give a unified definition and a standard structure for developing an IC, and describe how it can be represented and implemented.
    Methods: This study was conducted in three phases in the form of a qualitative study. In the first phase, a library review study was done on the relevant websites and international databases, such as PubMed, Science Direct, Web of Sciences, Google Scholar, ProQuest, and Embase. In the second phase, the results of the previous stage were discussed with a panel of experts. In the third phase, a suggested frame for an IC was stated.
    Results: A specific structure was suggested for the IC so that, in addition to the parts found in comparable summary formats, it contains extra parts. The suggested frame includes the title, names of the IC writers and reviewers, question or goals, design or methods, setting, patient or community of the study, result, commentary, citation, topic, picture, and tag, and can be used in different fields.
    Conclusions: Given the importance of ICs in summarizing information, our suggested structure should be applied in other fields and refined through trial and error.
    Keywords:  Capsule summary; information capsule; informative capsule; medical information; research brief
    DOI:  https://doi.org/10.4103/ijpvm.ijpvm_254_23
  10. Arthrosc Sports Med Rehabil. 2024 Oct;6(5): 100963
       Purpose: To assess the differences in frequently asked questions (FAQs) and responses related to rotator cuff surgery between Google and ChatGPT.
    Methods: Both Google and ChatGPT (version 3.5) were queried for the top 10 FAQs using the search term "rotator cuff repair." Questions were categorized according to Rothwell's classification. In addition to questions and answers for each website, the source that the answer was pulled from was noted and assigned a category (academic, medical practice, etc). Responses were also graded as "excellent response not requiring clarification" (1), "satisfactory requiring minimal clarification" (2), "satisfactory requiring moderate clarification" (3), or "unsatisfactory requiring substantial clarification" (4).
    Results: Overall, 30% of questions were similar between what Google and ChatGPT deemed to be the most frequently asked questions. For questions from Google web search, most answers came from medical practices (40%). For ChatGPT, most answers were provided by academic sources (90%). For numerical questions, ChatGPT and Google provided similar responses for 30% of questions. For most of the questions, both Google and ChatGPT responses were either "excellent" or "satisfactory requiring minimal clarification." Google had 1 response rated as satisfactory requiring moderate clarification, whereas ChatGPT had 2 responses rated as unsatisfactory.
    Conclusions: Both Google and ChatGPT offer mostly excellent or satisfactory responses to the most frequently asked questions regarding rotator cuff repair. However, ChatGPT may provide inaccurate or even fabricated answers and associated citations.
    Clinical Relevance: In general, the quality of online medical content is low. As artificial intelligence develops and becomes more widely used, it is important to assess the quality of the information patients are receiving from this technology.
    DOI:  https://doi.org/10.1016/j.asmr.2024.100963
  11. Prostate. 2024 Nov 08.
       BACKGROUND: Large language model (LLM) chatbots, a form of artificial intelligence (AI) that excels at prompt-based interactions and mimics human conversation, have emerged as a tool for providing patients with information about urologic conditions. We aimed to examine the quality of information related to benign prostatic hyperplasia surgery from four chatbots and how they would respond to sample patient messages.
    METHODS: We identified the top three queries in Google Trends related to "treatment for enlarged prostate." These were entered into ChatGPT (OpenAI), Bard (Google), Bing AI (Microsoft), and Doximity GPT (Doximity), both unprompted and prompted for specific criteria (optimized). The chatbot-provided answers to each query were evaluated for overall quality by three urologists using the DISCERN instrument. Readability was measured with the built-in Flesch-Kincaid reading level tool in Microsoft Word. To assess the ability of chatbots to answer patient questions, we prompted the chatbots with a clinical scenario related to holmium laser enucleation of the prostate, followed by 10 questions that the National Institutes of Health recommends patients ask before surgery. Accuracy and completeness of responses were graded with Likert scales.
    RESULTS: Without prompting, the quality of information was moderate across all chatbots but improved significantly with prompting (mean [SD], 3.3 [1.2] vs. 4.4 [0.7] out of 5; p < 0.001). When answering simulated patient messages, the chatbots were accurate (mean [SD], 5.6 [0.4] out of 6) and complete (mean [SD], 2.8 [0.3] out of 3). Additionally, 98% (39/40) had a median score of 5 or higher for accuracy, which corresponds to "nearly all correct." The readability was poor, with a mean (SD) Flesch-Kincaid reading level grade of 12.1 (1.3) (unprompted).
    CONCLUSIONS: LLM chatbots hold promise for patient education, but their effectiveness is limited by the need for careful prompting from the user and by responding at a reading level higher than that of most Americans (grade 8). Educating patients and physicians on optimal LLM interaction is crucial to unlock the full potential of chatbots.
    Keywords:  ChatGPT; DISCERN; HoLEP; artificial intelligence; large language model; patient education
    DOI:  https://doi.org/10.1002/pros.24814
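    The Flesch-Kincaid grade level referred to above is a simple formula over words, sentences, and syllables. The sketch below uses the standard coefficients with a crude vowel-group syllable counter, so its output only approximates tools such as the one built into Microsoft Word.

        import re

        def count_syllables(word):
            """Approximate syllables as groups of consecutive vowels (a rough heuristic)."""
            return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

        def fk_grade(text):
            """Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
            sentences = max(1, len(re.findall(r"[.!?]+", text)))
            words = re.findall(r"[A-Za-z']+", text)
            syllables = sum(count_syllables(w) for w in words)
            return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

        print(round(fk_grade("Benign prostatic hyperplasia is treated with surgery."), 1))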
  12. Int J Paediatr Dent. 2024 Nov 12.
       BACKGROUND: With the increasing popularity of online sources for health information, parents may seek information related to early childhood caries (ECC) from artificial intelligence-based chatbots.
    AIM: The aim of this article was to evaluate the usefulness, quality, reliability, and readability of ChatGPT answers to parents' questions about ECC.
    DESIGN: Eighty questions commonly asked about ECC were compiled from experts and keyword research tools. ChatGPT 3.5 was asked these questions independently. The answers were evaluated by experts in paediatric dentistry.
    RESULTS: ChatGPT provided "very useful" and "useful" responses to 82.5% of the questions. The mean global quality score was 4.3 ± 1 (good quality). The mean reliability score was 18.5 ± 8.9 (average to very good). The mean understandability score was 59.5% ± 13.8 (not highly understandable), and the mean actionability score was 40.5% ± 12.8 (low actionability). The mean Flesch-Kincaid reading ease score was 32% ± 25.7, and the mean Simple Measure of Gobbledygook index readability score was 15.3 ± 9.1 (indicating poor readability for the lay person). Misleading and false information was detected in some answers.
    CONCLUSION: ChatGPT has significant potential as a tool for answering parents' questions about ECC. Concerns, however, do exist about the readability and actionability of the answers, and the presence of false information should not be overlooked.
    Keywords:  ChatGPT; early childhood caries; parents
    DOI:  https://doi.org/10.1111/ipd.13283
  13. J Clin Med. 2024 Oct 30. pii: 6512. [Epub ahead of print]13(21):
      Background: This study evaluates the ability of six popular chatbots, ChatGPT-3.5, ChatGPT-4.0, Gemini, Copilot, Chatsonic, and Perplexity, to provide reliable answers to questions concerning keratoconus.
    Methods: Chatbot responses were assessed using mDISCERN (range: 15-75) and Global Quality Score (GQS) (range: 1-5) metrics. Readability was evaluated using nine validated readability assessments. We also addressed the quality and accountability of the websites from which the questions originated.
    Results: We analyzed 20 websites, 65% "Private practice or independent user" and 35% "Official patient education materials". The mean JAMA benchmark score was 1.40 ± 0.91 (0-4 points), indicating low accountability. Reliability, measured using mDISCERN, ranged from 42.91 ± 3.15 (ChatGPT-3.5) to 46.95 ± 3.53 (Copilot). The most frequent question was "What is keratoconus?", with 70% of websites providing relevant information. This received the highest mDISCERN score (49.30 ± 4.91) and a relatively high GQS score (3.40 ± 0.56), with an Automated Readability Level Calculator score of 13.17 ± 2.13. Moderate positive correlations were determined between the website numbers and both mDISCERN (r = 0.265, p = 0.25) and GQS (r = 0.453, p = 0.05) scores. The quality of information, assessed using the GQS, ranged from 3.02 ± 0.55 (ChatGPT-3.5) to 3.31 ± 0.64 (Gemini) (p = 0.34). The differences in readability between the texts were statistically significant: Gemini emerged as the easiest to read, while ChatGPT-3.5 and Perplexity were the most difficult. Based on mDISCERN scores, Gemini and Copilot exhibited the highest percentage of responses in the "good" range (51-62 points). For the GQS, the Gemini model exhibited the highest percentage of responses in the "good" quality range, with 40% of its responses scoring 4-5.
    Conclusions: While all chatbots performed well, Gemini and Copilot showed better reliability and quality. However, their readability often exceeded recommended levels. Continuous improvements are essential to match information with patients' health literacy for effective use in ophthalmology.
    Keywords:  chatbots; keratoconus; large language models
    DOI:  https://doi.org/10.3390/jcm13216512
  14. Cureus. 2024 Oct;16(10): e71105
      Introduction: Minimally invasive spine surgery (MISS) has evolved over the last three decades as a less invasive alternative to traditional spine surgery, offering benefits such as smaller incisions, faster recovery, and lower complication rates. With patients frequently seeking information about MISS online, the comprehensibility and accuracy of this information are crucial. Recent studies have shown that much of the online material regarding spine surgery exceeds the recommended readability levels, making it difficult for patients to understand. This study explores the clinical appropriateness and readability of responses generated by Chat Generative Pre-Trained Transformer (ChatGPT) to frequently asked questions (FAQs) about MISS.
    Methods: A set of 15 FAQs was formulated based on clinical expertise and existing literature on MISS. Each question was independently inputted into ChatGPT five times, and the generated responses were evaluated by three neurosurgery attendings for clinical appropriateness. Appropriateness was judged based on accuracy, readability, and patient accessibility. Readability was assessed using seven standardized readability tests, including the Flesch-Kincaid Grade Level and Flesch Reading Ease (FRE) scores. Statistical analysis was performed to compare readability scores across preoperative, postoperative, and intraoperative/technical question categories.
    Results: The mean readability scores for preoperative, postoperative, and intraoperative/technical questions were 15±2.8, 16±3, and 15.7±3.2, respectively, significantly exceeding the recommended sixth- to eighth-grade reading level for patient education (p=0.017). Differences in readability across individual questions were also statistically significant (p<0.001). All responses required a reading level above 11th grade, with a majority indicating college-level comprehension. Although preoperative and postoperative questions generally elicited clinically appropriate responses, 50% of intraoperative/technical questions yielded either "inappropriate" or "unreliable" responses, particularly for inquiries about radiation exposure and the use of lasers in MISS.
    Conclusions: While ChatGPT is proficient in providing clinically appropriate responses to certain FAQs about MISS, it frequently produces responses that exceed the recommended readability level for patient education. This limitation suggests that its utility may be confined to highly educated patients, potentially exacerbating existing disparities in patient comprehension. Future AI-based patient education tools must prioritize clear and accessible communication, with oversight from medical professionals to ensure accuracy and appropriateness. Further research comparing ChatGPT's performance with other AI models could enhance its application in patient education across medical specialties.
    Keywords:  ai; chatgpt; minimally invasive spine surgery; patient education; readability
    DOI:  https://doi.org/10.7759/cureus.71105
  15. JBRA Assist Reprod. 2024 Nov 14.
       OBJECTIVE: Access to reproductive health information on the Internet helps patients understand their infertility journey and make decisions about their treatment. This study aimed to evaluate the quality of fertility clinic websites accredited by the Latin American Network for Assisted Reproduction (REDLARA) using the QUality Evaluation Scoring Tool (QUEST).
    METHODS: This observational, cross-sectional, and online study evaluated the clinic websites registered as accredited centers on the REDLARA website. The QUEST was used for the quality assessment of the websites. Data were collected from the available websites of all accredited fertility clinics between September 2023 and January 2024.
    RESULTS: A total of 173 websites from fertility clinics accredited by REDLARA were evaluated, and 152 (87.8%) clinics had functioning websites. The majority of analyzed websites were from Brazilian fertility clinics (n=58; 38.1%), followed by Mexican (n=23; 15.1%) and Argentine (n=21; 13.8%). No indication of authorship or username was observed on most websites. Some form of support for the patient-physician relationship was reported by 86.8% of websites. The mean (±standard deviation, SD) total score obtained by all fertility clinics was 12.73±4.7 (range: 1-26). Brazil had the highest total score (mean±SD=16.03±4.6), whereas Peru had the lowest (6.42±1.7). Statistical analysis revealed a difference in the quality of websites among Latin American countries.
    CONCLUSIONS: The health information disseminated by fertility clinic websites in Latin America is of poor quality. Therefore, REDLARA should implement rules for building good-quality websites.
    Keywords:  advertising; assisted reproductive technology; in vitro fertilization; internet; quality; website
    DOI:  https://doi.org/10.5935/1518-0557.20240074
  16. Mhealth. 2024;10: 28
       Background: The increasing prevalence of irritable bowel syndrome (IBS) in Saudi Arabia has led to a growing interest in understanding how patients seek health information online. While it is known that digital platforms, such as search engines, social media, and artificial intelligence (AI) chatbots, are commonly used for health information seeking, there is limited knowledge about the specific behaviors of IBS patients in this context and how these behaviors correlate with their self-care activities. This study aimed to explore online health information-seeking behavior and its correlation with self-care activities among patients with IBS in Saudi Arabia, focusing on the use of these digital platforms.
    Methods: A cross-sectional survey was conducted at King Khalid University Hospital in Riyadh, Saudi Arabia, from January to July 2023. The survey, available in both English and Arabic, targeted IBS patients aged 16 years or older. The questionnaire covered demographics, general internet usage, online health information-seeking behavior, and IBS knowledge and awareness.
    Results: In this study, 451 IBS patients completed the survey. Notably, 95.1% of participants were internet users, primarily accessing health information through mobile phones and search engines. The results highlighted a significant correlation between online health information-seeking behaviors and self-care practices (P=0.009) like exercise and dietary adjustments, despite a moderate basic knowledge [standard deviation (SD) 2.26%] of IBS. Symptomatically, 93.3% experienced abdominal pain weekly, yet 63% did not fully meet the Rome criteria for IBS. Common management strategies included hydration, diet modifications, and exercise. About 28.4% visited the emergency room (ER) for severe symptoms, and 20% regularly consulted doctors every 3-6 months. Surprisingly, 80% were unaware of the FODMAP (fermentable oligosaccharides, disaccharides, monosaccharides and polyols) diet, often suggested for IBS.
    Conclusions: The research indicates a rise in digital health literacy among IBS patients in Saudi Arabia, highlighting the need for accurate and culturally appropriate online resources. It suggests that healthcare professionals and policymakers should direct patients to reliable information and address the digital divide to enhance self-care and IBS management outcomes.
    Keywords:  Information-seeking behavior; artificial intelligence chatbots (AI chatbots); digital health literacy; irritable bowel syndrome (IBS); self-care
    DOI:  https://doi.org/10.21037/mhealth-24-14
  17. J Am Podiatr Med Assoc. 2024 Sep-Oct;114(5): pii: 21-155. [Epub ahead of print]
       BACKGROUND: As the incidence of diabetes mellitus increases, the incidence of diabetic foot also increases. This situation, which may lead to devastating complications and progress to limb loss, exposes patients and their social environments to a major crisis. Thus, patients may seek second opinions from online sources about information they initially obtained from health institutions. We aimed to evaluate the diabetic foot-related information content on the Internet that patients are likely to encounter when searching.
    METHODS: After software optimization and reset, related queries with the keyword diabetic foot were determined on Google Trends. Selected keywords were searched in three search engines, and the results were examined. Web sites were classified into five subcategories (nongovernmental health institution, governmental institution, academic, blog, and university) and evaluated with an information content scale (ICS) based on the literature, Journal of the American Medical Association benchmark criteria, the Flesch-Kincaid readability test, and presence of the Health On the Net Foundation Code of Conduct certificate. The search engines, keywords, and Web site subcategories were investigated with the evaluation criteria.
    RESULTS: In terms of finding Web sites eligible for assessment, the Google search engine listed more eligible Web sites than did Bing and Yahoo. Concerning the ICS, there was no significant difference between search engines for total scores (P > .05). Concerning ICS diagnosis and evaluation and ICS total score, academic Web sites scored significantly higher than other subcategories.
    CONCLUSIONS: Results that can be obtained with an Internet search for diabetic foot depend on the proper keyword selection, Web site type, and search engine to help patients reach more appropriate content.
    DOI:  https://doi.org/10.7547/21-155
  18. Urol Pract. 2024 Nov 07. 101097UPJ0000000000000740
       PURPOSE: No consensus exists on performance standards for the evaluation of generative artificial intelligence (AI) in generating medical responses. The purpose of this study was to assess the ability of Chat Generative Pre-training Transformer (ChatGPT) to address medical questions in prostate cancer.
    MATERIALS AND METHODS: A global online survey was conducted April-June 2023 among >700 medical oncologists or urologists who treat patients with prostate cancer. Participants were unaware this was a survey evaluating AI. In component 1, responses to nine questions were written independently by medical writers (MW; from medical websites) and ChatGPT-4.0 (AI-generated from publicly available information). Respondents were randomly exposed and blinded to both AI-generated and MW-curated responses; evaluation criteria and overall preference were recorded. Exploratory component 2 evaluated AI-generated responses to five complex questions with nuanced answers in the medical literature. Responses were evaluated on a 5-point Likert scale. Statistical significance was denoted by P < .05.
    RESULTS: In component 1, respondents (N = 602) consistently preferred the clarity of AI-generated responses over MW-curated responses in 7/9 questions (P < .05). Despite favoring AI-generated responses when blinded to questions/answers, respondents considered medical websites a more credible source (52%-67%) than ChatGPT (14%). Respondents in component 2 (N = 98) also considered medical websites more credible than ChatGPT, but rated AI-generated responses highly for all evaluation criteria, despite nuanced answers in the medical literature.
    CONCLUSIONS: These findings provide insight into how clinicians rate AI-generated and MW-curated responses with evaluation criteria that can be used in future AI validation studies.
    Keywords:  artificial intelligence; medical oncology; proof of concept study; survey and questionnaires; urology
    DOI:  https://doi.org/10.1097/UPJ.0000000000000740
  19. Shoulder Elbow. 2024 Sep 25. 17585732241283971
       Background: ChatGPT is rapidly becoming a source of medical knowledge for patients. This study aims to assess the completeness and accuracy of ChatGPT's answers to the most frequently asked patients' questions about shoulder pathology.
    Methods: ChatGPT (version 3.5) was queried to produce the five most common shoulder pathologies: biceps tendonitis, rotator cuff tears, shoulder arthritis, shoulder dislocation and adhesive capsulitis. Subsequently, it generated the five most common patient questions regarding these pathologies and was queried to respond. Responses were evaluated by three shoulder and elbow fellowship-trained orthopedic surgeons with a mean of 9 years of independent practice, on Likert scales for accuracy (1-6) and completeness (rated 1-3).
    Results: For all questions, responses were deemed acceptable, rated at least "nearly all correct," indicated by a score of 5 or greater for accuracy, and "adequately complete," indicated by a minimum of 2 for completeness. The mean scores for accuracy and completeness, respectively, were 5.5 and 2.6 for rotator cuff tears, 5.8 and 2.7 for shoulder arthritis, 5.5 and 2.3 for shoulder dislocations, 5.1 and 2.4 for adhesive capsulitis, and 5.8 and 2.9 for biceps tendonitis.
    Conclusion: ChatGPT provides both accurate and complete responses to the most common patients' questions about shoulder pathology. These findings suggest that Large Language Models might play a role as a patient resource; however, patients should always verify online information with their physician.
    Level of Evidence: Level V Expert Opinion.
    Keywords:  Artificial intelligence; ChatGPT; large language model; machine learning; shoulder
    DOI:  https://doi.org/10.1177/17585732241283971
  20. J Med Syst. 2024 Nov 13. 48(1): 102
      This research evaluates the readability and quality of patient information material about female urinary incontinence (fUI) in ten popular artificial intelligence (AI)-supported chatbots. We used the most recent versions of 10 widely used chatbots: OpenAI's GPT-4, Claude-3 Sonnet, Grok 1.5, Mistral Large 2, Google Palm 2, Meta's Llama 3, HuggingChat v0.8.4, Microsoft's Copilot, Gemini Advanced, and Perplexity. Prompts were created to generate texts about UI, stress-type UI, urge-type UI, and mixed-type UI. The modified Ensuring Quality Information for Patients (EQIP) technique and the Quality Evaluation Scoring Tool (QUEST) were used to assess quality, and the average of eight well-known readability formulas, the Average Reading Level Consensus (ARLC), was used to evaluate readability. There were significant differences in the mean mEQIP and QUEST scores across the ten chatbots (p = 0.049 and p = 0.018). Gemini received the greatest mean scores for mEQIP and QUEST, whereas Grok had the lowest values. The chatbots also exhibited significant differences in mean ARLC, word count, and sentence count (p = 0.047, p = 0.001, and p = 0.001, respectively). For readability, Grok was the easiest to read, while Mistral was the most complex to understand. AI-supported chatbot technology needs to be improved in terms of the readability and quality of patient information regarding female UI.
    Keywords:  Artificial Intelligence; Claude; Copilot; Female Urinary Incontinence; GPT-4; Gemini; Google Palm; Grok; Huggingchat; Llama; Mistral; Perplexity
    DOI:  https://doi.org/10.1007/s10916-024-02125-4
  21. Cureus. 2024 Oct;16(10): e71472
       AIM: To enhance outcomes for patients with pulmonary arterial hypertension (PAH), comprehensive and individualized therapy is needed. A large language model, the Chat Generative Pre-trained Transformer (ChatGPT), has the ability to provide expert yet patient-friendly care. We wanted to determine how well ChatGPT could accurately and consistently respond to inquiries on the knowledge and management of PAH.
    MATERIALS AND METHODS: At the time of diagnosis, 20 PAH patients were asked what concerns they had about PAH and what they had researched online. From this evaluation, eight queries that patients frequently searched the Internet for were identified. These eight queries were posed to ChatGPT, and its responses were recorded. Ten experts in the field of PAH assessed the trustworthiness, value, and hazard of the answers generated by ChatGPT.
    RESULTS: According to evaluations conducted by experts, the ChatGPT-generated responses were deemed trustworthy with an average score of 8.4 (7.7-9.2) and valuable with an average score of 7.9 (7.4-8.2). Based on the statistical analysis, it can be inferred that most professionals believed that the utilization of prompts provided by ChatGPT did not present a substantial risk, with a mean of 2.1 (1.7-2.5). The answers were assessed for readability using two different indicators, namely the Flesch-Kincaid Grade Level (FKGL) and the Simple Measure of Gobbledygook (SMOG). The average FKGL value was determined to be 13.52 ± 2.40, indicating a "difficult" level of readability.
    CONCLUSION: ChatGPT provides reliable PAH-related information, but it is important to seek professional medical advice before making any decisions regarding PAH. ChatGPT can only provide general information and support, but a qualified healthcare provider can offer tailored recommendations.
    Keywords:  artificial intelligence; chatbot; language models; patient information; pulmonary arterial hypertension; readability; safety
    DOI:  https://doi.org/10.7759/cureus.71472
  22. JAMA Netw Open. 2024 Nov 04. 7(11): e2444988
    CDC Prevention Epicenters Program
    DOI:  https://doi.org/10.1001/jamanetworkopen.2024.44988
  23. Eur Arch Paediatr Dent. 2024 Nov 07.
       PURPOSE: To depict and evaluate the characteristics, engagement, content, and quality of YouTube videos containing information about silver diamine fluoride (SDF).
    METHODS: A total of 200 YouTube™ videos were selected and screened, and the video characteristics and engagement indicators were recorded. They were then reviewed for consistency with current professional guidelines on this topic. Two independent reviewers scored the videos using a customized 8-point scoring and 5-point Global Quality Scale (GQS) to assess the content information and the overall quality of each video. These videos were further classified into good, moderate, and poor videos. Kruskal-Wallis, Chi-squared, and Spearman's correlation tests were used for the statistical analysis.
    RESULTS: 110 videos met the inclusion criteria. The median total content score was 3 (IQR = 4) and the median GQS score was 2 (IQR = 2). Less than half (n = 49; 45.5%) of the videos were uploaded by healthcare professionals. The video content was classified as good (n = 26; 23.64%), moderate (n = 43; 39.09%), and poor (n = 41; 37.27%). Good-quality videos had a significantly higher information content score than the other groups (P = 0.001). A strong correlation was found between the total content score and the GQS score (rho = 0.970, P = 0.001). Longer duration, a higher interaction index, and a more recent upload date were associated with higher content and quality scores.
    CONCLUSION: A considerable number of videos are available on YouTube about SDF treatment and are attracting public interest. The content and quality of these videos vary widely and are related to several factors.
    Keywords:  Fluoride; Silver diamine fluoride; Social media; Videos; YouTube
    DOI:  https://doi.org/10.1007/s40368-024-00958-8
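    Correlations like the one reported above (rho = 0.970 between content score and GQS) are typically computed with Spearman's rank correlation; with SciPy this is a one-liner, shown here on invented scores purely for illustration.

        from scipy.stats import spearmanr

        # Invented per-video scores for illustration only (not the study's data).
        content_scores = [1, 2, 3, 4, 5, 6, 7, 8]
        gqs_scores = [1, 1, 2, 2, 3, 4, 4, 5]

        rho, p_value = spearmanr(content_scores, gqs_scores)
        print(f"rho = {rho:.3f}, p = {p_value:.3f}")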
  24. PLoS Negl Trop Dis. 2024 Nov 11. 18(11): e0012660
       BACKGROUND: Mycetoma is a fungal neglected tropical disease. Accurate dissemination of information is critical in endemic areas. YouTube, a popular platform for health information, hosts numerous videos on mycetoma, but the quality and reliability of these videos remain largely unassessed.
    METHODS: We used the modified DISCERN and the Global Quality Score (GQS) to assess reliability and quality, respectively. Video duration, views, likes, and comments were recorded. Spearman's rank correlation and Mann-Whitney U tests were used to identify correlations between these metrics and the quality scores.
    RESULTS: A total of 73 mycetoma-related YouTube videos were analyzed. The median GQS score was 4.00 (IQR = 3.33-4.00), indicating generally high-quality content, while the median mDISCERN score was 3.00 (IQR = 3.00-3.00), reflecting moderate reliability. Videos produced by professionals had significantly higher scores compared with consumer-generated content (p < 0.001). A significant positive correlation was observed between video duration and both GQS (r = 0.417, p < 0.001) and mDISCERN (r = 0.343, p = 0.003). However, views, likes, and comments did not significantly correlate with video quality. Additionally, videos longer in duration (p < 0.001) and older in upload date (p = 0.014) had higher quality scores.
    CONCLUSIONS: The study shows that mycetoma-related videos on YouTube are generally of high quality, with moderate reliability. This emphasizes the need for expert involvement in content creation and efforts to improve health information online.
    DOI:  https://doi.org/10.1371/journal.pntd.0012660
  25. PeerJ. 2024 ;12 e18344
       Background: The Internet has transformed global information access, particularly through platforms like YouTube, which launched in 2005 and has since become the second largest search engine worldwide, with over two billion monthly users. While YouTube offers extensive educational content, including health topics like cardiopulmonary resuscitation (CPR) and basic life support (BLS), it also poses risks due to potential misinformation. Our study focuses on evaluating the accuracy of CPR and BLS videos on YouTube according to the latest 2020 American Heart Association (AHA) guidelines. This research aims to highlight inconsistencies and provide insights into improving YouTube as a reliable educational resource for both lay rescuers and healthcare professionals.
    Methods: In this cross-sectional observational study, English YouTube videos uploaded between October 21, 2020, and May 1, 2023, were searched using keywords related to CPR and basic life support. Videos were assessed for their source, duration, views, use of human or mannequin models, and mean assessment scores by two emergency medicine physicians. A third physician's opinion was sought in cases of disagreement. The first assessment evaluated video validity based on specified information criteria, while the second assessed their ability to convey advanced medical information aligned with the 2020 AHA guidelines.
    Results: In this study, 201 English YouTube videos uploaded between October 21, 2020, and May 1, 2023, were evaluated based on search terms related to CPR and BLS, resulting in 95 videos meeting inclusion criteria after 106 were excluded for various reasons. Most included videos were from healthcare professionals (49.5%), followed by anonymous sources (29.5%) and official medical organizations (21.1%). Video durations ranged widely from 43 to 6,019 seconds, with an average of 692 seconds. Videos featuring mannequins predominated (91.6%), followed by those using human subjects (5.3%) or both (3.2%). Videos from healthcare professionals and official medical organizations scored significantly higher than those of unknown origin (p = 0.001). Video length did not correlate significantly with view counts, although shorter videos under 5 minutes tended to have higher average views.
    Discussion: The results presented in this study demonstrated that English-language videos on YouTube related to BLS and CPR, throughout the study period, did not conform to the 2020 AHA guidelines in terms of providing basic information for lay rescuers. Furthermore, healthcare professionals cannot obtain advanced medical knowledge through these videos. We recommend a professional oversight mechanism in health-related videos that does not tolerate such misinformation.
    Keywords:  Basic life support; Cardiopulmonary resuscitation; YouTube education
    DOI:  https://doi.org/10.7717/peerj.18344
  26. J Pain Res. 2024;17: 3577-3586
       Background: As cannabis legalization expands nationally and globally, its use for chronic pain increases, prompting people to seek information on social media platforms like YouTube. This study evaluates the accuracy and quality of information of popular YouTube videos on cannabis for chronic pain.
    Methods: Using search terms related to cannabis for pain, the top 66 videos by view count were selected. Each video was classified as useful, misleading, or neither. The quality and reliability of each video were assessed using the modified DISCERN (mDISCERN) score and the Global Quality Scale (GQS). The video characteristics, usefulness classification, mDISCERN scores, and GQS scores were summarized. Continuous and categorical outcomes were compared using t-tests and chi-square tests, respectively.
    Results: Of the 66 videos, 22.73% (n=15) were classified as useful, and 77.27% (n=51) were classified as neither. Of useful videos, 40.00% (n=6) were uploaded by physicians, 40.00% (n=6) were uploaded by corporations, and 6.67% (n=1) were uploaded by an independent user. Of videos classified as neither useful nor misleading, news sources uploaded 27.45% (n=14) of these videos (P=0.02). Physicians uploaded 37.50% (n = 18) of videos with a GQS score ≥3 (P=0.04), while independent users uploaded significantly more videos with a mDISCERN score <3 (22.20%, P=0.02). Useful videos had a mean GQS of 4.00 ± 0.65 compared to a mean GQS of 2.76 ± 0.86 for videos deemed neither (P<0.0001).
    Conclusion: This study suggests a moderate quality of YouTube content on cannabis use for chronic pain. Given cannabis's growing popularity and potential for misinformation on popular social media platforms, healthcare professionals and organizations should consider uploading educational videos on this topic on YouTube.
    Keywords:  YouTube; analgesics; cannabis; medical cannabis; social media
    DOI:  https://doi.org/10.2147/JPR.S479200
  27. Ind Psychiatry J. 2024 Aug;33(Suppl 1): S36-S44
       Background: YouTube™ is an important online resource to access health-related online information by the public worldwide. However, the quality of information available on it has not been adequately characterized.
    Aim: To assess the quality and reliability of information available on the treatment of premature ejaculation (PME) on YouTube™ in the Hindi and English language videos.
    Materials and Methods: A total of 151 (Hindi: 109, English: 42) YouTube videos were selected for assessment. The quality was evaluated using structured tools: Patient Education Materials Assessment Tool (PEMAT); and a 5-point modified DISCERN questionnaire (Range: 1-serious shortcomings; 5-minimal shortcomings). PEMAT assesses the understandability and actionability of video as separate percentages.
    Results: Three most common treatments suggested for PME were Kegel exercise (22.5%), start-stop technique (21.9%), and antidepressant medications (20.5%). Antidepressant medications, stop-squeeze techniques, and psychotherapy were more frequently suggested in English videos, whereas ayurvedic or herbal medicines were more frequently suggested in Hindi videos. About two-thirds of videos presented information in an easy-to-understand and actionable manner (PEMAT scores ≥70%). Only 6% of videos had a DISCERN score of ≥4, indicating good overall quality of information presented in them.
    Conclusion: People are likely to encounter poor-quality information when seeking information on PME treatment on YouTube. A large number of videos suggested ineffective or unproven treatment strategies for PME. Healthcare professionals need to be mindful of this while counselling patients, and should guide them to useful and reliable sources of health information available online.
    Keywords:  DISCERN; PEMAT; internet; misinformation; premature ejaculation
    DOI:  https://doi.org/10.4103/ipj.ipj_333_23
  28. JPRAS Open. 2024 Dec;42 311-314
      Oncologic breast reconstruction (OBR) is a complex process that requires consideration of multiple factors, including chemoradiation, extent of cancer treatment, and surgical approach. Patients often feel uncertain about the numerous surgical options and may turn to popular social media platforms like YouTube for information. Thus, this study aims to assess the quality and reliability of YouTube videos related to OBR. We conducted a retrospective, cross-sectional analysis of YouTube videos related to OBR. Search terms were obtained from plasticsurgery.org and Google Trends. The first ten videos for each search term were analyzed. Videos were categorized by source and subject matter and independently reviewed by three evaluators using the DISCERN scale. This study examined 172 YouTube videos. Five video source categories were identified: Health Care Administrations, Physicians, Non-physician Providers, News Organizations, and Patients. Health Care Administration accounts received the highest overall DISCERN score of 3.6 ± 0.51, followed by Physicians at 2.98 ± 0.93, with News Organization accounts scoring the lowest at 2.22 ± 0.60 (p < 0.001). Videos from academic sources (Physicians and Health Care Administrators) had higher DISCERN scores compared to non-academic sources (News Organizations and Patients), 2.98 ± 0.95 versus 2.28 ± 0.77, respectively (p < 0.001). Our findings indicate that videos from academic sources generally exhibit higher DISCERN scores, pointing to a higher content quality and reliability standard. Given the increasing reliance on YouTube for healthcare information, our study underscores the need for healthcare professionals to engage more actively in content creation and dissemination.
    Keywords:  Oncoplastic breast reconstruction; Patient education; Plastic surgery; Social media; Youtube
    DOI:  https://doi.org/10.1016/j.jpra.2024.10.006
  29. Neuromodulation. 2024 Nov 06. pii: S1094-7159(24)01185-1. [Epub ahead of print]
       OBJECTIVES: YouTube is an important source of medical information for various medical topics and procedures. The purpose of the present study is to appraise the quality of medical information available on YouTube on the topic of peripheral nerve stimulation (PNS) for chronic pain.
    MATERIALS AND METHODS: A total of 53 videos were appraised by four individuals using three scales for appraisal: 1) the Modified DISCERN scale, 2) the Journal of the American Medical Association (JAMA) Benchmark scoring, and 3) the Global Quality Scale. Descriptive characteristics and author type of each video were recorded. The mean scores of these scales among all four reviewers based on author type were calculated. One-way analysis of variance was used to compare mean scores of the three scales among author types, and post hoc pairwise Tukey's honestly significant difference test was used to evaluate for significant differences between mean scores. Furthermore, mean scale scores of videos above and below the total average view count and total average "thumbs ups" were calculated and compared.
    RESULTS: Most videos (n = 31, 58.5%) were submitted from private practice. The mean Modified DISCERN and JAMA scores of videos by academic and society authors (M = 3.54 and 2.83, respectively) were significantly higher (p < 0.05) than the mean Modified DISCERN and JAMA scores of videos by private practice authors (M = 2.10 and 2.03, respectively). Interestingly, the mean scale scores of videos with above-average view counts were found to be lower than scores of videos with below-average view counts across all three scoring instruments.
    CONCLUSIONS: YouTube videos on PNS for chronic pain are low to moderate in quality. Videos from academic sources were higher in quality than private practice videos. Furthermore, videos with above-average view counts had lower mean scores on all three instruments, suggesting most of the viewership had watched lower-quality video content.
    Keywords:  Peripheral nerve stimulation; YouTube; social media
    DOI:  https://doi.org/10.1016/j.neurom.2024.09.472
  30. West Afr J Med. 2024 Nov 10. [Epub ahead of print]41(11 Suppl 1): S53
       Background: Parental use of online search engines to obtain information on the diagnosis and treatment options for their children's illnesses, a common practice in developed countries, is creeping into our society, especially in the face of chronic and life-threatening illnesses. Health-related information on the internet is largely unregulated, and disease-specific information accessed online may be hard for parents to understand and assimilate, making it necessary to cross-check such information with the child's healthcare provider.
    Objectives: This study was undertaken to ascertain the proportion of parents of children with neurological disorders browsing the internet for medical information and factors associated with this behaviour.
    Methods: This cross-sectional study was carried out in the paediatric neurology clinic of the Rivers State University Teaching Hospital, consecutively recruiting 106 child-parent pairs attending the clinic. A questionnaire was used to collect information on biodata and parents' use of the internet to access information on their children's diseases. Data were analysed with SPSS 23, with statistical significance set at a P value < 0.05.
    Result: The mean ages of the children, mothers, and fathers were 5.5±4.6 years, 37.2±6.9 years, and 44.6±6.9 years, respectively. Most mothers (63.2%) and fathers (61.3%) had attained tertiary education and were of middle socioeconomic class. Of the 54 (50.9%) parents who had browsed the internet, 49 (90.7%) used Google, 50 (92.6%) used their phones, but only 11 (20.4%) discussed the information obtained with a physician. Fifteen (27.8%) parents browsed the internet to conveniently obtain medical information, while 50.8% were satisfied with their online search. Tertiary education among parents and middle socioeconomic status were significantly associated with browsing the internet.
    Conclusion: A good proportion of enlightened parents are browsing the internet for medical information but few are verifying this information with physicians, which may have untoward consequences in the future such as the adoption of non-scientific harmful practices.
    Keywords:  Online; children with neurological disorders; information seeking by parents
  31. Pediatr Dermatol. 2024 Nov 12.
      Patient education materials (PEMs) are crucial for improving patient adherence and outcomes; however, they may not be accessible due to high reading levels. Our study used seven readability measures to compare the readability of Spanish PEMs from the Society for Pediatric Dermatology (SPD) with those generated by OpenAI's ChatGPT 4.0 and Google Gemini. Our results showed that when prompted to produce material at a 6th grade level, both AI chatbots generated significantly improved readability scores compared with the SPD handouts. These findings suggest that AI-generated PEMs could better meet readability standards and potentially improve patient outcomes, although further studies are needed to confirm this.
    Keywords:  artificial intelligence; health literacy; healthcare equity; patient education; readability
    DOI:  https://doi.org/10.1111/pde.15805