bims-librar 2024-03-24 papers

bims-librar

Biomed News

on Biomedical librarianship

Issue of 2024-03-24
twenty papers selected by
Thomas Krichel, Open Library Society

Assessment of search strategies in Medline to identify studies on the impact of long COVID on workability.
Development of a search filter to retrieve reports of interrupted time series studies from MEDLINE and PubMed.
Efficacy of searching in biomedical databases beyond MEDLINE in identifying randomised controlled trials on hyperbaric oxygen treatment.
Online Search Strategies and Results From a Crowdsourced Survey on Asymptomatic Bacteriuria.
How developing a point of need training tool for evidence synthesis can improve librarian support for researchers.
Systematic online academic resource (SOAR) review: Pediatric respiratory infectious disease.
How Does ChatGPT Use Source Information Compared With Google? A Text Network Analysis of Online Health Information.
Generative artificial intelligence chatbots may provide appropriate informational responses to common vascular surgery questions by patients.
GPT-4 as a Source of Patient Information for Anterior Cervical Discectomy and Fusion: A Comparative Analysis Against Google Web Search.
Dr. Google vs. Dr. ChatGPT: Exploring the Use of Artificial Intelligence in Ophthalmology by Comparing the Accuracy, Safety, and Readability of Responses to Frequently Asked Patient Questions Regarding Cataracts and Cataract Surgery.
Artificial intelligence insights into osteoporosis: assessing ChatGPT's information quality and readability.
Quality assessment of available Internet information on early orthodontic treatment.
Assessing readability and comprehension of web-based patient education materials by American Heart Association (AHA) and CardioSmart online platform by American College of Cardiology (ACC): How useful are these websites for patient understanding?
Assessing and Improving the Effectiveness of Online Patient Education Materials on Essential Vocal Tremor: A Comprehensive Evaluation.
Arabic Web-Based Information on Oral Lichen Planus: Content Analysis.
What do popular YouTube videos say about genetically modified foods? A content analysis.
Evaluation of YouTube As A Source For Graves' Disease Information: Is High-Quality Guideline-Based Information Available?
Breath of Change: Evaluating Asthma Information on TikTok and Introducing the Video Health Information Credibility Score.
Associations of online health information seeking with health behaviors of cancer survivors.
Parents' User Experience Accessing and Using a Web-Based Map of COVID-19 Recommendations for Health Decision-Making: Qualitative Descriptive Study.

Front Res Metr Anal. 2024;9: 1300533

Assessment of search strategies in Medline to identify studies on the impact of long COVID on workability.

Jean-François Gehanno, Isabelle Thaon, Carole Pelissier, Laetitia Rollin.

   Objectives: Studies on the impact of long COVID on work capacity are increasing but are difficult to locate in bibliographic databases, due to the heterogeneity of the terms used to describe this new condition and its consequences. This study aims to report on the effectiveness of different search strategies to find studies on the impact of long COVID on work participation in PubMed and to create validated search strings.
Methods: We searched PubMed for articles on long COVID that included information about work. Relevant articles were identified and their reference lists were screened. Occupational health journals were manually scanned to identify articles that could have been missed. A total of 885 potentially relevant articles were collected, and 120 were finally included in a gold standard database. Recall, Precision, and Number Needed to Read (NNR) of various keywords or combinations of keywords were assessed.
Results: Overall, 123 search terms, alone or in combination, were tested. The highest Recall achieved with a single MeSH term was 23%, and with a single text word, 90%. Two different search strings were developed: one optimizing Recall while keeping Precision acceptable (Recall 98.3%, Precision 15.9%, NNR 6.3) and one optimizing Precision while keeping Recall acceptable (Recall 90.8%, Precision 26.1%, NNR 3.8).
Conclusions: No single MeSH term retrieves all relevant studies on the impact of long COVID on work ability in PubMed. Combining various MeSH and non-MeSH terms is required to retrieve such studies without being overwhelmed by irrelevant articles.
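The three retrieval metrics in this abstract are simple ratios, and NNR is the reciprocal of Precision. The sketch below assumes the standard definitions; the function names are illustrative and do not come from the paper. The count of 118 of 120 gold-standard articles is our own back-calculation from the reported 98.3% Recall, not a figure given in the abstract.

```python
def recall(relevant_retrieved: int, total_relevant: int) -> float:
    """Share of gold-standard articles that a search string retrieves."""
    return relevant_retrieved / total_relevant


def precision(relevant_retrieved: int, total_retrieved: int) -> float:
    """Share of retrieved records that belong to the gold standard."""
    return relevant_retrieved / total_retrieved


def number_needed_to_read(prec: float) -> float:
    """Records screened, on average, per relevant record found (1 / precision)."""
    return 1 / prec


# Assumed back-calculation: 118 of the 120 gold-standard articles retrieved
print(f"Recall: {recall(118, 120):.1%}")

# Precision values reported for the two search strings
for label, prec in [("Recall-optimized", 0.159), ("Precision-optimized", 0.261)]:
    print(f"{label}: NNR = {number_needed_to_read(prec):.1f}")
```

Run as is, this prints NNR values of 6.3 and 3.8, matching the abstract. That agreement is what you would expect if NNR was computed as 1 / Precision.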

Keywords:  MEDLINE; bibliometrics; information retrieval methods; long COVID; work

DOI:  https://doi.org/10.3389/frma.2024.1300533
Res Synth Methods. 2024 Mar 17.

Development of a search filter to retrieve reports of interrupted time series studies from MEDLINE and PubMed.

Phi-Yen Nguyen, Joanne E McKenzie, Simon L Turner, Matthew J Page, Steve McDonald.

   BACKGROUND: Interrupted time series (ITS) studies contribute importantly to systematic reviews of population-level interventions. We aimed to develop and validate search filters to retrieve ITS studies in MEDLINE and PubMed.
METHODS: A total of 1017 known ITS studies (published 2013-2017) were analysed using text mining to generate candidate terms. A control set of 1398 time-series studies was used to select differentiating terms. Various combinations of candidate terms were iteratively tested to generate three search filters. An independent set of 700 ITS studies was used to validate the filters' sensitivities. The filters were test-run in Ovid MEDLINE, and a random sample of the retrieved records was screened for ITS studies to determine their precision. Finally, all MEDLINE filters were translated to PubMed format and their sensitivities in PubMed were estimated.
RESULTS: Three search filters were created in MEDLINE: a precision-maximising filter with high precision (78%; 95% CI 74%-82%) but moderate sensitivity (63%; 59%-66%), most appropriate when there are limited resources to screen studies; a sensitivity-and-precision-maximising filter with higher sensitivity (81%; 77%-83%) but lower precision (32%; 28%-36%), providing a balance between expediency and comprehensiveness; and a sensitivity-maximising filter with high sensitivity (88%; 85%-90%) but likely very low precision, useful when combined with specific content terms. Similar sensitivity estimates were found for PubMed versions.
CONCLUSION: Our filters strike different balances between comprehensiveness and screening workload and suit different research needs. Retrieval of ITS studies would be improved if authors identified the ITS design in the titles.
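Sensitivities like these are binomial proportions estimated on the 700-study validation set. The abstract does not say how the authors computed their 95% confidence intervals. The sketch below uses the Wilson score interval, which is one common choice, and the count of 441 retrieved studies (63% of 700) is an illustrative figure, not one reported in the paper.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion such as sensitivity.

    z = 1.96 gives an approximate 95% interval.
    """
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical: 441 of 700 validation ITS studies retrieved by a filter
low, high = wilson_interval(441, 700)
print(f"sensitivity = {441 / 700:.0%}, 95% CI {low:.0%}-{high:.0%}")
```

Precision was estimated from a random sample of retrieved records rather than from the validation set, so its interval would be computed the same way but on the screened sample.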

Keywords:  interrupted time series; literature search; search filter; search strategy; sensitivity; specificity

DOI:  https://doi.org/10.1002/jrsm.1716
Diving Hyperb Med. 2024 Mar 31. 54(1): 2-8

Efficacy of searching in biomedical databases beyond MEDLINE in identifying randomised controlled trials on hyperbaric oxygen treatment.

Hira Khan, Mohammad Sindeed Islam, Manvinder Kaur, Joseph K Burns, Cole Etherington, Pierre-Marc Dion, Sarah Alsayadi, Sylvain Boet.

   Introduction: Literature searches are routinely used by researchers for conducting systematic reviews as well as by healthcare providers, and sometimes patients, to quickly guide their clinical decisions. Using more than one database is generally recommended but may not always be necessary for some fields. This study aimed to determine the added value of searching additional databases beyond MEDLINE when conducting a literature search of hyperbaric oxygen treatment (HBOT) randomised controlled trials (RCTs).
Methods: This study consisted of two phases: a scoping review of all RCTs in the field of HBOT, followed by a statistical analysis of sensitivity, precision, 'number needed to read' (NNR) and 'number unique' included by individual biomedical databases. MEDLINE, Embase, Cochrane Central Register of Controlled Trials (CENTRAL), and Cumulative Index to Nursing and Allied Health Literature (CINAHL) were searched without date or language restrictions up to December 31, 2022. Screening and data extraction were conducted in duplicate by pairs of independent reviewers. RCTs were included if they involved human subjects and HBOT was offered either on its own or in combination with other treatments.
Results: Of the 5,840 distinct citations identified, 367 were included in the analysis. CENTRAL was the most sensitive (87.2%) and had the most unique references (7.1%). MEDLINE had the highest precision (23.8%) and the best NNR (four). Among included references, 14.2% were unique to a single database.
Conclusions: Systematic reviews of RCTs in HBOT should always utilise multiple databases, which at minimum include MEDLINE, Embase, CENTRAL and CINAHL.
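Per-database sensitivity, precision, NNR and 'number unique' can all be derived from set operations once every included RCT has been traced back to the databases that retrieved it. The sketch below is a minimal illustration. The record IDs and database contents are hypothetical, and the paper does not describe its own tooling.

```python
from collections import Counter

# Hypothetical: records retrieved by each database ("r" = included RCT, "x" = excluded)
retrieved = {
    "MEDLINE": {"r1", "r2", "r3", "x1"},
    "Embase": {"r1", "r2", "r4", "x1", "x2"},
    "CENTRAL": {"r1", "r2", "r3", "r4", "r5", "x3"},
    "CINAHL": {"r2", "r6", "x4", "x5"},
}
included = {"r1", "r2", "r3", "r4", "r5", "r6"}

# Number of databases that found each included record
hits = Counter(rec for recs in retrieved.values() for rec in recs & included)

results = {}
for db, recs in retrieved.items():
    found = recs & included
    prec = len(found) / len(recs)
    results[db] = {
        "sensitivity": len(found) / len(included),
        "precision": prec,
        "nnr": 1 / prec,
        # Included records that no other database retrieved
        "unique": sum(1 for rec in found if hits[rec] == 1),
    }
    print(db, {k: round(v, 2) for k, v in results[db].items()})
```

In this toy data CENTRAL and CINAHL each contribute one record that no other database retrieves. That situation is what leads the authors to recommend searching all four databases.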

Keywords:  Biomedical databases; Research methods; Systematic review

DOI:  https://doi.org/10.28920/dhm54.1.2-8
Urogynecology (Phila). 2024 Mar 13.

Online Search Strategies and Results From a Crowdsourced Survey on Asymptomatic Bacteriuria.

Megan S Bradley, Melanie D Hetzel-Riggin, Julia C Knight, Ashley Murillo, Halina Zyczynski, Christopher R Shelton.

IMPORTANCE: Despite the prevalence of asymptomatic bacteriuria (ASB), the proportion of the population aware of this condition and the quality of internet resources on it are currently unknown.
OBJECTIVE: This study aimed to use an online crowdsourcing platform to explore general knowledge and internet search strategies, along with the quality of information, on ASB.
STUDY DESIGN: An online survey built in Qualtrics, an online survey tool, was administered through a crowdsourcing platform to women aged 50 years or older. Participants completed a survey on ASB and were asked how they would search the internet for information on both urinary test results and ASB. Outcomes included survey responses; qualitative data were coded and analyzed thematically. χ² testing and regression modeling were used to identify variables associated with concern about ASB.
RESULTS: A total of 518 participants passed the attention checks, and only 45 respondents (8.7%) had heard of ASB. Many were concerned about progression to a worsening infection (n = 387 [77.6%]). After controlling for confounders, education beyond a college degree was not associated with lower concern about ASB compared with a high school education or less (adjusted odds ratio, 0.63; 95% confidence interval, 0.25-1.55; P = 0.31). Medical providers were the target audience for most of the websites, and many of the patient-facing results were of poor quality.
CONCLUSIONS: Our national survey of women demonstrated a prevalent knowledge deficit surrounding ASB. We must seek to create high-quality, readily available, patient-facing information to increase awareness of ASB, allay concerns, and increase antibiotic stewardship.
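An odds ratio, its 95% confidence interval and its P value should agree with one another. The sketch below back-calculates an approximate P value from the reported figures. It assumes a Wald interval that is symmetric on the log-odds scale, which is typical for logistic regression but not stated in the abstract. The function name is illustrative.

```python
import math


def p_from_or_ci(odds_ratio: float, lower: float, upper: float, z: float = 1.96) -> float:
    """Approximate two-sided P value from an odds ratio and its 95% CI.

    Assumes a Wald interval, symmetric on the log-odds scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z)  # standard error of log(OR)
    z_stat = math.log(odds_ratio) / se
    # Standard normal CDF via the error function
    phi = 0.5 * (1 + math.erf(abs(z_stat) / math.sqrt(2)))
    return 2 * (1 - phi)


print(f"P = {p_from_or_ci(0.63, 0.25, 1.55):.2f}")
```

This gives roughly 0.32, close to the reported P = 0.31. The small gap likely reflects rounding of the published odds ratio and interval.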

DOI: https://doi.org/10.1097/SPV.0000000000001500
Health Info Libr J. 2024 Mar 19.

How developing a point of need training tool for evidence synthesis can improve librarian support for researchers.

Bronte Chiang, Caitlin McClurg.

Medical and health sciences librarians who are involved in evidence synthesis projects will know that systematic reviews are intensely rigorous, requiring research teams to devote significant resources to the methodological process. As expert searchers, librarians are often identified as the personnel to conduct the database searching and/or are approached as methodology experts to guide research teams through the project lifecycle. This research method has surged in popularity at our campus, and demand for librarian participation has become unsustainable. In response, the library created self-directed learning objects in the form of a roadmap to help researchers learn the knowledge synthesis methodology in an expedient, self-directed manner. This paper discusses the creation, implementation and feedback around our educational offering: Systematic & Scoping Reviews: Your Roadmap to Conducting an Evidence Synthesis.

Keywords:  eLearning; education and training; higher education; research support; review and systematic search; students, medical; teaching

DOI:  https://doi.org/10.1111/hir.12524
AEM Educ Train. 2024 Feb;8(1): e10945

Systematic online academic resource (SOAR) review: Pediatric respiratory infectious disease.

Joshua Belfer, Cindy G Roskind, Andrew Grock, JooYeon Jung, Shirley W Bae, Lisa Zhao, Brad Sobolewski.

Background: Free open access medical education (FOAM) resources have become increasingly popular in graduate medical education. Despite their accessibility, assessing the quality of FOAM resources is challenging because of their decentralized nature and the diverse qualifications of their authors and distribution platforms. In this first pediatric systematic online academic resource (SOAR) review, we used a systematic methodology to aggregate and assess the quality of FOAM resources on pediatric respiratory infectious disease topics.
Methods: We searched 177 keywords using FOAMSearch, the top 50 FOAM websites on the Social Media Index, and seven additional pediatric emergency medicine-focused blogs. Following a basic initial screen, resources then underwent full-text quality assessment utilizing the revised Medical Education Translational Resources: Impact and Quality (rMETRIQ) tool.
Results: The search yielded 44,897 resources. After 44,456 were excluded, 441 underwent quality assessment. A total of 36/441 posts (8%) reached the high-quality threshold score (rMETRIQ ≥ 16). The most frequent topics overall were pneumonia and bronchiolitis. A total of 67/441 posts (15%) had an rMETRIQ score of 7 or less, which may indicate poor quality.
Conclusions: We systematically identified, described, and performed quality assessment on FOAM resources pertaining to the topic of pediatric respiratory infectious disease. We found that there is a paucity of high-quality posts on this topic. Despite this, the curated list of high-quality resources can help guide trainees and educators toward relevant educational information and suggest unmet needs for future FOAM resources.

DOI: https://doi.org/10.1002/aet2.10945
Clin Orthop Relat Res. 2024 Apr 01. 482(4): 578-588

How Does ChatGPT Use Source Information Compared With Google? A Text Network Analysis of Online Health Information.

Oscar Y Shen, Jayanth S Pratap, Xiang Li, Neal C Chen, Abhiram R Bhashyam.

BACKGROUND: The lay public is increasingly using ChatGPT (a large language model) as a source of medical information. Traditional search engines such as Google provide several distinct responses to each search query and indicate the source of each response, but ChatGPT responds in prose paragraphs without citing the sources used, which makes it difficult or impossible to ascertain whether those sources are reliable. One practical method to infer the sources used by ChatGPT is text network analysis. By understanding how ChatGPT uses source information relative to traditional search engines, physicians and physician organizations can better counsel patients on the use of this new tool.
QUESTIONS/PURPOSES: (1) In terms of key content words, how similar are ChatGPT and Google Search responses for queries related to topics in orthopaedic surgery? (2) Does the source distribution (academic, governmental, commercial, or in the form of a scientific manuscript) of Google Search responses differ based on the topic's level of medical consensus, and how is this reflected in the text similarity between ChatGPT and Google Search responses? (3) Do these results vary between different versions of ChatGPT?
METHODS: We evaluated three search queries relating to orthopaedic conditions: "What is the cause of carpal tunnel syndrome?," "What is the cause of tennis elbow?," and "Platelet-rich plasma for thumb arthritis?" These were selected because of their relatively high, medium, and low consensus in the medical evidence, respectively. Each question was posed to ChatGPT version 3.5 and version 4.0 20 times for a total of 120 responses. Text network analysis using term frequency-inverse document frequency (TF-IDF) was used to compare text similarity between responses from ChatGPT and Google Search. In the field of information retrieval, TF-IDF is a weighted statistical measure of the importance of a key word to a document in a collection of documents. Higher TF-IDF scores indicate greater similarity between two sources. TF-IDF scores are most often used to compare and rank the text similarity of documents. Using this type of text network analysis, text similarity between ChatGPT and Google Search can be determined by calculating and summing the TF-IDF for all keywords in a ChatGPT response and comparing it with each Google search result to assess their text similarity to each other. In this way, text similarity can be used to infer relative content similarity. To answer our first question, we characterized the text similarity between ChatGPT and Google Search responses by finding the TF-IDF scores of the ChatGPT response and each of the 20 Google Search results for each question. Using these scores, we could compare the similarity of each ChatGPT response to the Google Search results. To provide a reference point for interpreting TF-IDF values, we generated randomized text samples with the same term distribution as the Google Search results. 
By comparing ChatGPT TF-IDF values with those of the random text samples, we could assess whether they differed significantly from values obtained by random chance, which also allowed us to test whether text similarity was an appropriate quantitative statistical measure of relative content similarity. To answer our second question, we classified the Google Search results to better understand sourcing. Google Search provides 20 or more distinct sources of information, but ChatGPT gives only a single prose paragraph in response to each query. Therefore, we used TF-IDF to ascertain whether the ChatGPT response was principally driven by one of four source categories: academic, government, commercial, or material that took the form of a scientific manuscript but was not peer-reviewed or indexed on a government site (such as PubMed). We then compared the TF-IDF similarity between ChatGPT responses and each source category. To answer our third research question, we repeated both analyses and compared the results when using ChatGPT 3.5 versus ChatGPT 4.0.
RESULTS: The ChatGPT response was dominated by the top Google Search result. For example, for carpal tunnel syndrome, the top result was an academic website with a mean TF-IDF of 7.2. A similar result was observed for the other search topics. To provide a reference point for interpreting TF-IDF values, a randomly generated sample of text compared with Google Search would have a mean TF-IDF of 2.7 ± 1.9, controlling for text length and keyword distribution. The observed TF-IDF distribution was higher for ChatGPT responses than for random text samples, supporting the claim that keyword text similarity is a measure of relative content similarity. When comparing source distribution, the ChatGPT response was most similar to the most common source category from Google Search. For the subject where there was strong consensus (carpal tunnel syndrome), the ChatGPT response was most similar to high-quality academic sources rather than lower-quality commercial sources (TF-IDF 8.6 versus 2.2). For topics with low consensus, the ChatGPT response paralleled lower-quality commercial websites compared with higher-quality academic websites (TF-IDF 14.6 versus 0.2). ChatGPT 4.0 had higher text similarity to Google Search results than ChatGPT 3.5 (mean increase in TF-IDF similarity of 0.80 to 0.91; p < 0.001). The ChatGPT 4.0 response was still dominated by the top Google Search result and reflected the most common search category for all search topics.
CONCLUSION: ChatGPT responses are similar to individual Google Search results for queries related to orthopaedic surgery, but the distribution of source information can vary substantially based on the relative level of consensus on a topic. For example, for carpal tunnel syndrome, where there is widely accepted medical consensus, ChatGPT responses had higher similarity to academic sources and therefore used those sources more. When fewer academic or government sources are available, especially in our search related to platelet-rich plasma, ChatGPT appears to have relied more heavily on a small number of nonacademic sources. These findings persisted even as ChatGPT was updated from version 3.5 to version 4.0.
CLINICAL RELEVANCE: Physicians should be aware that ChatGPT and Google likely use the same sources for a specific question. The main difference is that ChatGPT can draw on multiple sources to create one aggregate response, whereas Google keeps sources distinct by presenting them as separate results. For topics with low consensus, and therefore few quality sources, there is a much higher chance that ChatGPT will use less-reliable sources; in such cases, physicians should take the time to educate patients on the topic or provide resources that give more reliable information. Physician organizations should make it clear when the evidence is limited so that ChatGPT can reflect the lack of quality information or evidence.
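The methods describe scoring text similarity by summing the TF-IDF weights of a ChatGPT response's keywords against each Google Search result. The sketch below is one plausible reading of that description. It uses TF as term count divided by document length and IDF as log(N / document frequency) over the Google results only. The authors' tokenisation, stop-word handling and IDF corpus are not stated, so these are assumptions, and the query texts are hypothetical toy strings.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; hyphenated words are split (an assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_similarity(response: str, documents: list[str]) -> list[float]:
    """Score each document by summing TF-IDF weights of the response's keywords."""
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    keywords = set(tokenize(response))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        score = 0.0
        for term in keywords:
            if counts[term]:
                tf = counts[term] / len(doc)
                idf = math.log(n_docs / df[term])
                score += tf * idf
        scores.append(score)
    return scores


# Hypothetical stand-ins for Google Search results and a ChatGPT response
google_results = [
    "Carpal tunnel syndrome is caused by pressure on the median nerve.",
    "Tennis elbow is caused by overuse of forearm muscles.",
    "Platelet-rich plasma injections for thumb arthritis.",
]
chatgpt = "Carpal tunnel syndrome results from compression of the median nerve."
print([round(s, 3) for s in tfidf_similarity(chatgpt, google_results)])
```

Because IDF is zero for terms found in every result, shared boilerplate words add nothing to the score. The random-text baseline described in the methods would be scored with the same function.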

DOI: https://doi.org/10.1097/CORR.0000000000002995
Vascular. 2024 Mar 18. 17085381241240550

Generative artificial intelligence chatbots may provide appropriate informational responses to common vascular surgery questions by patients.

Ethan Chervonski, Keerthi B Harish, Caron B Rockman, Mikel Sadek, Katherine A Teter, Glenn R Jacobowitz, Todd L Berland, Joann Lohr, Colleen Moore, Thomas S Maldonado.

   OBJECTIVES: Generative artificial intelligence (AI) has emerged as a promising tool to engage with patients. The objective of this study was to assess the quality of AI responses to common patient questions regarding vascular surgery disease processes.
METHODS: OpenAI's ChatGPT-3.5 and Google Bard were queried with 24 mock patient questions spanning seven vascular surgery disease domains. Six experienced vascular surgery faculty at a tertiary academic center independently graded AI responses on their accuracy (rated 1-4 from completely inaccurate to completely accurate), completeness (rated 1-4 from totally incomplete to totally complete), and appropriateness (binary). Responses were also evaluated with three readability scales.
RESULTS: ChatGPT responses were rated, on average, more accurate than Bard responses (3.08 ± 0.33 vs 2.82 ± 0.40, p < .01). ChatGPT responses were also scored, on average, as more complete than Bard responses (2.98 ± 0.34 vs 2.62 ± 0.36, p < .01). Most ChatGPT responses (75.0%, n = 18) and almost half of Bard responses (45.8%, n = 11) were unanimously deemed appropriate. Almost one-third of Bard responses (29.2%, n = 7) were deemed inappropriate by at least two reviewers, and two Bard responses (8.4%) were considered inappropriate by the majority. The mean Flesch Reading Ease, Flesch-Kincaid Grade Level, and Gunning Fog Index of ChatGPT responses were 29.4 ± 10.8, 14.5 ± 2.2, and 17.7 ± 3.1, respectively, indicating that responses were readable with a post-secondary education. Bard's mean readability scores were 58.9 ± 10.5, 8.2 ± 1.7, and 11.0 ± 2.0, respectively, indicating that responses were readable with a high-school education (p < .0001 for all three metrics). ChatGPT's mean response length (332 ± 79 words) was greater than Bard's (183 ± 53 words, p < .001). There was no difference in the accuracy, completeness, readability, or response length of ChatGPT or Bard between disease domains (p > .05 for all analyses).
CONCLUSIONS: AI offers a novel means of educating patients that avoids the inundation of information from "Dr Google" and the time barriers of physician-patient encounters. ChatGPT provides largely valid, though imperfect, responses to myriad patient questions at the expense of readability. While Bard responses are more readable and concise, their quality is poorer. Further research is warranted to better understand failure points for large language models in vascular surgery patient education.
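Several papers in this issue report the Flesch Reading Ease, the Flesch-Kincaid Grade Level and the Gunning Fog Index. These are fixed formulas over word, sentence and syllable counts. The sketch below takes those counts as inputs. Counting syllables and "complex" words (three or more syllables) is left to a separate tool, and because those counters differ, scores can vary somewhat between software packages. The example counts are hypothetical.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog Index; complex words have three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


# Hypothetical passage: 100 words, 5 sentences, 150 syllables, 10 complex words
print(f"{flesch_reading_ease(100, 5, 150):.1f}")   # 59.6
print(f"{flesch_kincaid_grade(100, 5, 150):.1f}")  # 9.9
print(f"{gunning_fog(100, 5, 10):.1f}")            # 12.0
```

Longer sentences and more syllables per word push the grade-level scores up and the Reading Ease score down. This matches the pattern reported here: ChatGPT's long responses scored lower on Reading Ease and higher on grade level than Bard's.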

Keywords:  ChatGPT; Vascular surgery; artificial intelligence; google bard; patient education; readability

DOI:  https://doi.org/10.1177/17085381241240550
Global Spine J. 2024 Mar 21. 21925682241241241

GPT-4 as a Source of Patient Information for Anterior Cervical Discectomy and Fusion: A Comparative Analysis Against Google Web Search.

Paul G Mastrokostas, Leonidas E Mastrokostas, Ahmed K Emara, Ian J Wellington, Elizabeth Ginalis, John K Houten, Amrit S Khalsa, Ahmed Saleh, Afshin E Razi, Mitchell K Ng.

   STUDY DESIGN: Comparative study.
OBJECTIVES: This study aims to compare Google and GPT-4 in terms of (1) question types, (2) response readability, (3) source quality, and (4) numerical response accuracy for the top 10 most frequently asked questions (FAQs) about anterior cervical discectomy and fusion (ACDF).
METHODS: "Anterior cervical discectomy and fusion" was searched on Google and GPT-4 on December 18, 2023. Top 10 FAQs were classified according to the Rothwell system. Source quality was evaluated using JAMA benchmark criteria and readability was assessed using Flesch Reading Ease and Flesch-Kincaid grade level. Differences in JAMA scores, Flesch-Kincaid grade level, Flesch Reading Ease, and word count between platforms were analyzed using Student's t-tests. Statistical significance was set at the .05 level.
RESULTS: Frequently asked questions from Google were varied, while GPT-4 focused on technical details and indications/management. GPT-4 showed a higher Flesch-Kincaid grade level (12.96 vs 9.28, P = .003), lower Flesch Reading Ease score (37.07 vs 54.85, P = .005), and higher JAMA scores for source quality (3.333 vs 1.800, P = .016). Numerically, 6 out of 10 responses varied between platforms, with GPT-4 providing broader recovery timelines for ACDF.
CONCLUSIONS: This study demonstrates GPT-4's ability to elevate patient education by providing high-quality, diverse information tailored to those with advanced literacy levels. As AI technology evolves, refining these tools for accuracy and user-friendliness remains crucial, catering to patients' varying literacy levels and information needs in spine surgery.

Keywords:  GPT-4; Google; anterior cervical discectomy and fusion; artificial intelligence; health literacy; patient education; readability

DOI:  https://doi.org/10.1177/21925682241241241
Semin Ophthalmol. 2024 Mar 22. 1-8

Dr. Google vs. Dr. ChatGPT: Exploring the Use of Artificial Intelligence in Ophthalmology by Comparing the Accuracy, Safety, and Readability of Responses to Frequently Asked Patient Questions Regarding Cataracts and Cataract Surgery.

Samuel A Cohen, Arthur Brant, Ann Caroline Fisher, Suzann Pershing, Diana Do, Carolyn Pan.

PURPOSE: Patients are using online search modalities to learn about their eye health. While Google remains the most popular search engine, the use of large language models (LLMs) like ChatGPT has increased. Cataract surgery is the most common surgical procedure in the US, and there are limited data on the quality of online information returned by searches related to cataract surgery on search engines such as Google and LLM platforms such as ChatGPT. We identified the most common patient frequently asked questions (FAQs) about cataracts and cataract surgery and evaluated the accuracy, safety, and readability of the answers to these questions provided by both Google and ChatGPT. We also demonstrated the utility of ChatGPT in writing notes and creating patient education materials.
METHODS: The top 20 FAQs related to cataracts and cataract surgery were recorded from Google. Responses to the questions provided by Google and ChatGPT were evaluated by a panel of ophthalmologists for accuracy and safety. Evaluators were also asked to distinguish between Google and LLM chatbot answers. Five validated readability indices were used to assess the readability of responses. ChatGPT was instructed to generate operative notes, post-operative instructions, and customizable patient education materials according to specific readability criteria.
RESULTS: Responses to 20 patient FAQs generated by ChatGPT were significantly longer and written at a higher reading level than responses provided by Google (p < .001), with an average grade level of 14.8 (college level). Expert reviewers were able to correctly distinguish between a human-reviewed and a chatbot-generated response an average of 31% of the time. Google answers contained incorrect or inappropriate material 27% of the time, compared with 6% of LLM-generated answers (p < .001). When expert reviewers were asked to compare the responses directly, chatbot responses were favored (66%).
CONCLUSIONS: When comparing the responses to patients' cataract FAQs provided by ChatGPT and Google, practicing ophthalmologists overwhelmingly preferred ChatGPT responses. LLM chatbot responses were less likely to contain inaccurate information. ChatGPT represents a viable information source for eye health for patients with higher health literacy. ChatGPT may also be used by ophthalmologists to create customizable patient education materials for patients with varying health literacy.

Keywords:  Cataract surgery; ChatGPT; Google; cataracts; patient education; readability

DOI:  https://doi.org/10.1080/08820538.2024.2326058
Arch Osteoporos. 2024 Mar 19. 19(1): 17

Artificial intelligence insights into osteoporosis: assessing ChatGPT's information quality and readability.

Yakup Erden, Mustafa Hüseyin Temel, Fatih Bağcıer.

Accessible and accurate information, together with readability, plays a crucial role in empowering individuals managing osteoporosis. This study showed that the responses generated by ChatGPT regarding osteoporosis had serious problems with quality and were at a level of complexity that requires approximately 17 years of education.
PURPOSE: The use of artificial intelligence (AI) applications as a source of information in the field of health is increasing. Readable and accurate information plays a critical role in empowering patients to make decisions about their disease. The aim was to examine the quality and readability of responses provided by ChatGPT, an AI chatbot, to commonly asked questions regarding osteoporosis, representing a major public health problem.
METHODS: Google Trends was used to identify the 25 most frequently searched Google keywords for "osteoporosis," "female osteoporosis," and "male osteoporosis." A selected set of 38 keywords was sequentially entered into the ChatGPT chat interface. The responses were evaluated with the Ensuring Quality Information for Patients (EQIP) tool, the Flesch-Kincaid Grade Level (FKGL), and the Flesch-Kincaid Reading Ease (FKRE).
RESULTS: EQIP scores ranged from 36.36 to 61.76, with a mean of 48.71, indicating "serious problems with quality." FKRE scores ranged from 13.71 to 56.06, with a mean of 28.71, and FKGL scores ranged from 8.48 to 17.63, with a mean of 13.25. There were no statistically significant correlations between the EQIP score and the FKGL or FKRE scores.
CONCLUSIONS: Although ChatGPT is easily accessible for patients to obtain information about osteoporosis, its current quality and readability fall short of meeting comprehensive healthcare standards.

Keywords:  Artificial intelligence; ChatGPT; Chatbot; Health information; Osteoporosis

DOI:  https://doi.org/10.1007/s11657-024-01376-5
BMC Oral Health. 2024 Mar 19. 24(1): 351

Quality assessment of available Internet information on early orthodontic treatment.

Mehmed Taha Alpaydin, Tugce Alpaydin, Merve Koklu, Suleyman Kutalmış Buyuk.

   BACKGROUND: This study aimed to evaluate the content, reliability, quality and readability of information on Internet websites about early orthodontic treatment.
METHODS: The search term "early orthodontic treatment" was entered into each of four web search engines. The content quality and reliability of websites meeting predetermined criteria were assessed with the DISCERN instrument, the Journal of the American Medical Association (JAMA) benchmarks, and the Health on the Net code (HONcode). Readability was evaluated with the Flesch Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL).
RESULTS: Of the 200 websites reviewed, 86 were suitable for inclusion and scoring. Of these, 80.2% belonged to orthodontists, 15.1% to multidisciplinary dental clinics, and 4.7% to professional organizations. The mean DISCERN score of all websites (parts 1 and 2) was 27.98/75, ranging from 19 to 67. Professional organization websites had the highest DISCERN scores. Moreover, 45.3% of websites met JAMA's disclosure criterion, 7% the currency criterion, 5.8% the authorship criterion, and 5.8% the attribution criterion. Only three websites, all belonging to professional organizations, met all JAMA criteria. None of the websites displayed the HONcode logo. Mean FRES and FKGL were 47.6 and 11.6, respectively.
CONCLUSIONS: The quality of web-based information about early orthodontic treatment is poor, and its readability is insufficient. More accurate, higher-quality Internet sources on this topic are needed.

Keywords:  Early orthodontic treatment; Internet; Patient information; Quality of information; Websites analysis

DOI:  https://doi.org/10.1186/s12903-024-04019-w
Am Heart J Plus. 2023 Aug;32 100308

Assessing readability and comprehension of web-based patient education materials by American Heart Association (AHA) and CardioSmart online platform by American College of Cardiology (ACC): How useful are these websites for patient understanding?

Amanpreet Singh Wasir, Annabelle Santos Volgman, Meenakshi Jolly.

  Cardiovascular diseases (CVD) are a leading cause of morbidity and mortality worldwide. Patient education materials help patients understand the disease and its management. Limited health literacy is an important challenge that may contribute to health inequities and disparities. The National Institutes of Health and the American Medical Association recommend that patient education materials be written at or below a sixth-grade reading level.
Objective: To evaluate the readability and comprehension of CVD-related patient education materials available on the American Heart Association (AHA) website and on CardioSmart, the web platform of the American College of Cardiology (ACC).
Method: We examined the readability and comprehension of 63 patient education materials (accessed June 2022) using (a) the Flesch-Kincaid Reading Ease (FKRE), which measures readability on a 0-100 scale (goal > 70), and (b) the Flesch-Kincaid Grade Level (FKGL) (goal = grade 7). AHA and ACC scores were compared using descriptive statistics and t-tests. A P-value ≤ 0.05 was considered significant.
Results: Sixty-three web pages of patient education materials (AHA 24, ACC 39) were reviewed in June 2022. Mean ± standard deviation (SD) FKRE was 54.9 ± 6.8 across all web pages; an FKRE of 50-60 corresponds to "fairly difficult to read." Mean ± SD FKGL was 10.0 ± 1.3. AHA patient education materials were significantly more difficult to read and comprehend, were longer, and contained more complex words than ACC patient education materials.
Conclusions: CVD-related patient education materials available online from leading national organizations are not congruent with the recommendations of national healthcare organizations and are not as user-friendly as they could be. These gaps and unmet needs urgently need to be recognized to optimize patient health literacy.

Keywords:  American Heart Association; American college of cardiology; Cardiovascular conditions; Comprehension; Health literacy; Patient education materials

DOI:  https://doi.org/10.1016/j.ahjo.2023.100308
J Voice. 2024 Mar 15. pii: S0892-1997(24)00061-4. [Epub ahead of print]

Assessing and Improving the Effectiveness of Online Patient Education Materials on Essential Vocal Tremor: A Comprehensive Evaluation.

Bethany Ho, Ellen M Hong, Brian E Benson.

   INTRODUCTION: Health literacy, a strong indicator of health outcomes, is an important aspect of good patient care. With increasing reliance on the Internet for health information, online patient materials should be easily understood by the average reader. The American Medical Association (AMA) and National Institutes of Health (NIH) recommend that patient education materials be written at a sixth-grade level. Creating effective digital information requires careful consideration not only of word choice but also of many other factors, including actionability, comprehensiveness, evidence, and visual organization. To support the creation of valuable online health content, the Office of Disease Prevention and Health Promotion (ODPHP) published Health Literacy Online, a research-based guide that discusses why and how to design digital health information tools. This study aims to assess the effectiveness of online patient education materials on vocal tremor, to assess the effectiveness of patient education materials published by the American Laryngological Association, and to evaluate the usefulness of the Health Literacy Online guide in creating effective online patient education materials on laryngological diseases.
METHODS: The first 50 unsponsored search results for each of the terms "vocal tremor" and "essential vocal tremor" were evaluated. Each website was analyzed using the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL) readability tests, the DISCERN instrument, and the Patient Education Materials Assessment Tool (PEMAT). The resources published by the American Laryngological Association were evaluated in the same way.
RESULTS: Of the 100 websites identified from the initial queries, 14 were included in this analysis. The average FRES and FKGL scores were 47.21 ± 10.47 and 10.96 ± 2.46, respectively, indicating that readers need an 11th-grade education to comprehend the materials. The average DISCERN score was 22.50 ± 9.76, indicating "very poor" quality with serious shortcomings and sources not appropriate for information about treatment choices. The average PEMAT understandability score was 68.43% ± 9.80%, and the actionability score was 20.00% ± 23.53%, indicating that the information was fairly difficult to process and did not help readers identify next steps. For the materials published by the American Laryngological Association (ALA), the average FRES and FKGL scores were 38.33 ± 12.81 and 12.56 ± 2.15, respectively, indicating a 12th-grade reading level. The ALA materials consistently received a DISCERN score of 27, indicating "very poor" quality. The PEMAT understandability score was 45%, with an actionability score of 0%, indicating that the materials are difficult to process and do not help readers identify next steps. After a revised sample of the ALA information was written based on the ODPHP's Health Literacy Online tool, the new FRES and FKGL scores were 75.6 and 5.9, respectively. The new DISCERN score was 35, and the new PEMAT understandability and actionability scores were 79% and 80%, respectively.
CONCLUSION: This study found that most publicly available online patient education materials on essential vocal tremor and other laryngological diseases do not use plain language and require reading levels too advanced for the average reader. In addition, most websites were of very poor quality and readability and were therefore less likely to help individuals in their decision-making. In an age when most people seek information on the Internet, the lack of easily understood online patient resources reduces their usefulness for many individuals. Professional organizations and societies such as the American Laryngological Association may consider using the Health Literacy Online tool as a resource for providing accurate and easily understandable patient education materials.
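The PEMAT percentages in the results are proportions of applicable items rated "Agree," computed separately for understandability and actionability, with items rated "Not Applicable" left out of the denominator. A minimal Python sketch of that scoring rule follows; the function name and the True/False/None encoding of ratings are hypothetical, and the PEMAT items themselves (defined in the AHRQ user's guide) are not reproduced here.

```python
from typing import Optional, Sequence


def pemat_domain_score(ratings: Sequence[Optional[bool]]) -> float:
    """Return a PEMAT domain score (understandability or actionability) in percent.

    Hypothetical encoding of each item rating:
      True  -> Agree (1 point)
      False -> Disagree (0 points)
      None  -> Not Applicable (excluded from the denominator)
    """
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("no applicable items to score")
    points = sum(1 for r in applicable if r)  # count the "Agree" ratings
    return 100.0 * points / len(applicable)


# Hypothetical ratings for one web page (values invented for illustration):
# 7 items, 2 of them Not Applicable, 2 of the remaining 5 rated Agree.
ratings = [True, False, False, None, False, True, None]
print(pemat_domain_score(ratings))  # prints: 40.0
```

Excluding Not Applicable items means a page is not penalized for criteria that do not apply to it, but it also means two pages with the same percentage may have been judged on different numbers of items.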

Keywords:  Essential vocal tremor; Online resources; Patient education; Readability

DOI:  https://doi.org/10.1016/j.jvoice.2024.02.021
JMIR Form Res. 2024 Mar 19. 8 e49198

Arabic Web-Based Information on Oral Lichen Planus: Content Analysis.

Azzam AlMeshrafi, Arwa F AlHamad, Hamoud AlKuraidees, Lubna A AlNasser.

   BACKGROUND: The use of web-based health information (WBHI) is on the rise, serving as a valuable tool for educating the public about health concerns and enhancing treatment adherence. Consequently, evaluating the availability and quality of context-specific WBHI is crucial to tackle disparities in health literacy and advance population health outcomes.
OBJECTIVE: This study aims to explore and assess the quality of the WBHI available and accessible to the public on oral lichen planus (OLP) in Arabic.
METHODS: The Arabic translation of the term OLP and its derivatives were searched in three general search platforms, and each platform's first few hundred results were reviewed for inclusion. We excluded content related to cutaneous LP, content not readily accessible to the public (eg, requiring subscription fees or directed to health care providers), and content not created by health care providers or organizations (ie, community forums, blogs, and social media). We assessed the quality of the Arabic WBHI with three standardized and validated tools: DISCERN, Journal of the American Medical Association (JAMA) benchmarks, and Health On the Net (HON).
RESULTS: Of the 911 WBHI resources reviewed for eligibility, 49 were included in this study. Most WBHI resources were provided by commercial affiliations (n=28, 57.1%), with the remainder from academic or not-for-profit affiliations. WBHI were often presented with visual aids (ie, images; n=33, 67.4%). DISCERN scores were highest for WBHI resources that explicitly stated their aim and lowest for describing the effect of OLP (or OLP treatment) on quality of life. One-quarter of the resources (n=11, 22.4%) met all 4 JAMA benchmarks, indicating high quality, while the remainder failed to meet one or more of the JAMA benchmarks. HON scores showed that one-third of WBHI sources scored above 75%, indicating higher reliability and credibility, while one-fifth scored below 50%. Only 1 in 7 WBHI resources scored high on all three quality instruments simultaneously. Generally, WBHI from academic affiliations had higher quality scores than content from commercial affiliations.
CONCLUSIONS: There are considerable variations in the quality of WBHI on OLP in Arabic. Most WBHI resources were deemed to be of moderate quality at best. Providers of WBHI could benefit from increasing collaboration between commercial and academic institutions in creating WBHI and integrating guidance from international quality assessment tools to improve the quality and, hopefully, the utility of these valuable WBHI resources.

Keywords:  Arab; Arabic; chronic; credibility; credible; dental; dentist; dentistry; health information; inflammation; inflammatory; information seeking; medical information; mouth; mucous membrane; mucous membranes; online information; oral; oral lichen planus; periodontology; quality; reliability; reliable

DOI:  https://doi.org/10.2196/49198
Dialogues Health. 2023 Dec;2 100131

What do popular YouTube videos say about genetically modified foods? A content analysis.

Sawyer I Basch, Lalitha Samuel, Joseph Fera.

   Purpose: YouTube is one of the most popular media-sharing platforms and enables both professionals and lay people to participate in disseminating knowledge and opinions. Its wide reach allows both top-down and bottom-up flow of information between experts and lay audiences. Because a large proportion of Americans obtain health-related information digitally, the purpose of this study was to describe the content of the 100 most viewed English-language YouTube videos about genetically modified foods (GMFs).
Methods: Using the search term "genetically modified foods," the URLs and metadata of the 100 English-language YouTube videos with the highest viewership were curated. Each video was viewed and dichotomously coded for the presence or absence of ten content categories. Descriptive statistics, percentages of categorical variables, and independent one-tailed t-tests (α=.05) were used to assess the effect of the presence or absence of these categories on the number of views and likes the videos received.
Results: Cumulatively, the 100 videos observed received 65,536,885 views and 1,328,605 likes. Only 7% of the videos were created by professionally credentialed individuals or organizations. More than 90% of the sampled videos described GMFs with an example, 50% mentioned their role in alleviating hunger, and 65% mentioned ecological concerns attributed to GMFs.
Conclusions: Our results underscore the need for health professionals to increase their digital presence on online media-sharing platforms such as YouTube and to capitalize on the pervasiveness of these platforms as conduits of accurate scientific information that equip consumers to make evidence-based, informed decisions regarding GMFs.

Keywords:  Genetically modified foods; Informed decision; YouTube; online information

DOI:  https://doi.org/10.1016/j.dialog.2023.100131
OTO Open. 2024 Jan-Mar;8(1): e118

Evaluation of YouTube As A Source For Graves' Disease Information: Is High-Quality Guideline-Based Information Available?

Oluwatobiloba Ayo-Ajibola, Ryan J Davis, Claire Theriault, Christopher Lamb, Deborah Choe, Matthew E Lin, Trevor E Angell, Daniel I Kwon.

   Objective: To assess the quality of informational Graves' disease (GD) videos on YouTube with respect to treatment decision-making and the inclusion of American Thyroid Association (ATA) treatment guidelines.
Study Design: Cross-sectional cohort.
Setting: Informational YouTube videos with subject matter "Graves' Disease treatment."
Method: The top 50 videos for our query were assessed using the DISCERN instrument. This validated instrument rates treatment-related information from excellent (≥4.5) to very poor (<1.9). Videos were also screened for inclusion of ATA guidelines. Descriptive statistics were used for cohort characterization. Univariate and multivariate linear regressions characterized factors associated with DISCERN scores. Significance was set at P < .05.
Results: On average, the videos had 57,513.43 views (SD = 162,579.25), 1054.70 likes (SD = 2329.77), and 168.80 comments (SD = 292.97). Most were patient education (52%) or patient experience (24%) videos. A minority (40%) were made by thyroid specialists (endocrinologists, endocrine surgeons, or otolaryngologists). Under half (44%) did not mention all 3 treatment modalities, and 54% did not mention any ATA recommendations. Overall, videos displayed poor reliability (mean = 2.26, SD = 0.67), treatment information quality (mean = 2.29, SD = 0.75), and overall video quality (mean = 2.47, SD = 1.07). Physician videos were associated with fewer likes, views, and comments (P < .001) but higher DISCERN reliability (P = .015) and overall score (P = .019). Longer videos (P = .015), patient accounts (P = .013), and patient experience videos (P = .002) were associated with lower scores.
Conclusion: The most readily available GD treatment content on YouTube varies significantly in the quality of its medical information. This may contribute to suboptimal disease understanding, especially among patients highly engaged with online health information sources.

Keywords:  DISCERN; Graves' disease; YouTube; social media; treatment decision‐making

DOI:  https://doi.org/10.1002/oto2.118
Cureus. 2024 Feb;16(2): e54247

Breath of Change: Evaluating Asthma Information on TikTok and Introducing the Video Health Information Credibility Score.

Bilal Irfan, Ihsaan Yasin, Aneela Yaqoob.

  Introduction: Asthma's global prevalence underscores the need for accessible health information, especially in the digital age. TikTok, known for its wide reach and diverse content, presents both opportunities and challenges for health information dissemination. This study aims to characterize the quality and reach of asthma-related content on TikTok and introduces the Video Health Information Credibility Score (VHICS) as a novel tool for quality assessment.
Materials and methods: We used a systematic methodology to analyze the top 100 TikTok videos tagged with #asthma, ranked by number of likes. Data were collected in June 2023 and January 2024 to allow temporal trend analysis. Videos were evaluated on engagement metrics (views, likes, comments, shares, and favorites) and on quality using the DISCERN instrument.
Results: Physician-generated content accounted for a significant proportion of asthma-related videos, with varying levels of engagement. DISCERN scores, which range from 1 (lowest) to 5 (highest), provided insights into content quality and revealed trends in user engagement and information reliability over time. Temporal analysis indicated changes in content creation and audience interaction.
Discussion: The study highlights the evolving landscape of digital health communication on TikTok. The introduction of VHICS added depth to the quality assessment and points to future directions, underscoring the need for accurate and reliable health information on social media. The findings suggest an imperative for healthcare professionals to address misinformation and to leverage digital platforms effectively for patient education.
Conclusions: TikTok is a significant medium for health information dissemination, with substantial potential for impact in patient education. VHICS can enrich the analysis of video content, offering a robust tool for assessing the quality of health information on social media. This study underscores the importance of credible, clear, and audience-relevant health communication in the digital era.

Keywords:  asthma; digital health education; online patient education; social media analytics; tiktok; vhics

DOI:  https://doi.org/10.7759/cureus.54247
Digit Health. 2024 Jan-Dec;10 20552076241238074

Associations of online health information seeking with health behaviors of cancer survivors.

Zhaoli Liu, Yue Liao, Chueh-Lung Hwang, Chad D Rethorst, Xiaoli Zhang.

   Objective: To examine the effects of online health information seeking (OHIS) behavior on five health behaviors (regular physical activity, less sedentary behavior, calorie checking, no alcohol consumption, and no smoking) among adult cancer survivors in the United States.
Methods: A cross-sectional analysis was conducted with adult cancer survivors (≥18 years old) from Cycles 2, 3, and 4 of the Health Information National Trends Survey (HINTS). The respondents self-reported OHIS, and the data on the five health behaviors were pooled to perform descriptive and multivariable logistic regression analyses using Stata 17.0.
Results: Of the 1245 adult cancer survivors, approximately 74% reported OHIS for themselves in the year before the survey. After adjusting for age, sex, race/ethnicity, education, income, body mass index (BMI), marital status, depression, and general health, OHIS was significantly and positively associated with the level of physical activity (odds ratio [OR] = 1.53, p = .002) and calorie checking (OR = 1.64, p = .001), but not with sedentary behavior, smoking, or alcohol consumption.
Conclusions: Findings from this study suggest that most cancer survivors used various forms of digital tools and platforms to seek health information. The study also demonstrated an independent impact of OHIS behavior on physical activity and calorie checking. Healthcare professionals may need to encourage and guide cancer survivors to seek credible eHealth information and further utilize digital health tools as a platform for care delivery, promoting health behaviors and preventing adverse health outcomes among cancer survivors.

Keywords:  Online health information seeking; cancer survivors; eHealth; health behaviors; physical activity

DOI:  https://doi.org/10.1177/20552076241238074
JMIR Form Res. 2024 Mar 20. 8 e53593

Parents' User Experience Accessing and Using a Web-Based Map of COVID-19 Recommendations for Health Decision-Making: Qualitative Descriptive Study.

Samantha Cyrkot, Lisa Hartling, Shannon D Scott, Sarah A Elliott.

   BACKGROUND: The eCOVID19 Recommendations Map & Gateway to Contextualization (RecMap) website was developed to identify all COVID-19 guidelines, assess the credibility and trustworthiness of the guidelines, and make recommendations understandable to various stakeholder groups. To date, little has been done to understand and explore parents' experiences when accessing and using the RecMap website for COVID-19 health decision-making.
OBJECTIVE: To explore (1) where parents look for COVID-19 health information and why, (2) parents' user experience when accessing and using the RecMap website to make health decisions, and (3) what knowledge mobilization activities are needed to increase parents' awareness, use, and engagement with the RecMap website.
METHODS: We conducted a qualitative descriptive study using semistructured interviews and a think-aloud activity with parents of children aged 18 years or younger living in Canada. Participants were asked to provide feedback on the RecMap website and to "think aloud" as they navigated the website to find relevant COVID-19 health recommendations. Demographic information was collected using a web-based questionnaire. A hybrid deductive and inductive thematic approach guided analysis and data synthesis.
RESULTS: A total of 21 participants (13/21, 62% mothers) were interviewed and participated in a think-aloud activity. The data were categorized into four sections, representative of key elements that deductively and inductively emerged from the data: (1) parent information-seeking behaviors and preferences for COVID-19, (2) RecMap website usability, (3) perceived usefulness of the RecMap website, and (4) knowledge mobilization strategies to increase awareness, use, and engagement of the RecMap website. Parents primarily used the internet to find COVID-19 information and focused on sources that they judged to be credible, trustworthy, simple, and engaging. As the pandemic evolved, participants' information-seeking behaviors changed, specifically their topics of interest and search frequency. Most parents were not aware of the RecMap website before this study but were satisfied with its concept and layout and expressed intentions to use it and share it with others. Parents experienced some barriers to using the RecMap website and suggested key areas for improvement to facilitate its usability and perceived usefulness. Recommendations included a more user-friendly home page for lay audiences (a separate public-facing user interface), improved search and filter options, quicker navigation, clearer titles, more family-friendly graphics, and better mobile access. Participants also suggested several strategies to disseminate the RecMap website, including a mix of traditional and nontraditional methods (handouts and social media) in credible, high-traffic locations that parents frequent.
CONCLUSIONS: Overall, parents liked the concept of the RecMap website but had some suggestions to improve its usability (language, navigation, and website interface). These findings can be used to improve the RecMap website for parents and offer insight for the development and dissemination of effective web-based health information tools and resources for the general public.

Keywords:  COVID-19; SARS-CoV-2; awareness; credibility; credible; descriptive; guidelines; health evidence; information behavior; information needs; information seeking; information-seeking behaviour; interface; internet; interview; knowledge mobilization; parent; parenting; public health; qualitative; recommendation; recommender; think-aloud; think-aloud activity; trust; trustworthy; usability; user experience; web design; website

DOI:  https://doi.org/10.2196/53593