J Med Syst. 2025 Nov 10. 49(1): 158
DeepSeek-R1, a cost-effective open-source artificial intelligence (AI) model developed in China, holds significant potential for healthcare applications. As a health education tool, it could help patients acquire health knowledge and improve health literacy. Low back pain (LBP) is the most common musculoskeletal problem globally, and patients increasingly use large language model (LLM)-based AI chatbots to access health information about it, making it critical to examine the quality of that information. This study aimed to evaluate the response quality and readability of answers generated by DeepSeek-R1 to common patient questions about LBP. Ten questions were formulated inductively from a literature analysis and Baidu Index data and presented to DeepSeek-R1 on March 10, 2025. The evaluation covered readability, understandability, actionability, clinician assessment, and reference assessment. Readability was measured using the Flesch-Kincaid Grade Level, Flesch Reading Ease Scale, Gunning Fog Index, Coleman-Liau Index, and Simple Measure of Gobbledygook (SMOG) Index. Understandability and actionability were assessed with the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P). Clinicians evaluated accuracy, completeness, and relevance. A reference evaluation tool was used to assess reference quality and the presence of hallucinations. Readability analysis indicated that DeepSeek-R1's responses were overall "difficult to read": Flesch-Kincaid Grade Level (mean 12.39, SD 1.91), Flesch Reading Ease Scale (mean 19.55, Q1 12.94, Q3 29.78), Gunning Fog Index (mean 13.95, SD 2.61), Coleman-Liau Index (mean 17.46, SD 2.30), and SMOG Index (mean 11.04, SD 1.37). PEMAT-P revealed good understandability but weak actionability. Consensus among five clinicians confirmed satisfactory accuracy, completeness, and relevance.
Reference assessment identified 9 instances (14.8%) of hallucinated references, while reference support was rated as moderate, with most references sourced from authoritative platforms. Our study demonstrates the potential of DeepSeek-R1 for generating educational content for patients with LBP. It can be employed as a supplement to patient education tools rather than as a substitute for clinical judgment.
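The readability indices reported above follow standard published formulas based on sentence length and syllable density. As an illustrative sketch (not the validated tooling used in the study, and assuming a naive vowel-group syllable counter rather than dictionary-based counting), the Flesch-Kincaid Grade Level and Flesch Reading Ease can be computed as:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count contiguous vowel groups. Professional
    # readability tools use dictionary-based syllable counts instead.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability(text: str) -> tuple[float, float]:
    # Count sentences by terminal punctuation runs and words by letter runs.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # average words per sentence
    spw = syllables / len(words)   # average syllables per word
    # Standard published formulas:
    fkgl = 0.39 * wps + 11.8 * spw - 15.59    # Flesch-Kincaid Grade Level
    fre = 206.835 - 1.015 * wps - 84.6 * spw  # Flesch Reading Ease
    return round(fkgl, 2), round(fre, 2)
```

On this scale, higher Reading Ease means easier text; the study's mean of 19.55 falls in the range conventionally labeled "very difficult", consistent with the grade-level scores around the 12th grade.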
Keywords: Artificial intelligence; DeepSeek-R1; Large language models; Low back pain; Patient education; Readability assessment