J Fluency Disord. 2025 Aug 15;85:106149. pii: S0094-730X(25)00051-8. [Epub ahead of print]
OBJECTIVE: This study aimed to examine how well ChatGPT comprehended and answered frequently asked questions regarding stuttering.
METHODS: In this exploratory study, eleven common questions about stuttering were posed in a single conversation with GPT-4o mini. A panel of five certified speech and language pathologists (SLPs), blinded to the source of the answers (AI or SLPs), was asked to judge whether each response had been produced by the ChatGPT chatbot or written by SLPs. The panel also evaluated the responses on several criteria, including the presence of inaccuracies, the potential for causing harm and the degree of harm that could result, and alignment with the prevailing consensus within the SLP community. All ChatGPT responses were additionally evaluated on several readability features, including the Flesch Reading Ease Score (FRES), Gunning Fog Scale Level (GFSL), Dale-Chall Score (D-CS), number of words, number of sentences, words per sentence (WPS), characters per word (CPW), and percentage of difficult words. Spearman's rank correlation coefficient was employed to examine the relationship between the evaluations of the panel of certified SLPs and the readability features.
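For readers who wish to reproduce this type of analysis, the readability-plus-correlation pipeline described above could be approximated with standard open-source tools. The sketch below is illustrative only: the abstract does not name the software used, and the textstat and scipy packages, the sample responses, and the panel ratings shown are assumptions, not study materials.

    # Hypothetical sketch of the readability and correlation analysis (not the authors' code).
    import textstat
    from scipy.stats import spearmanr

    def readability_profile(text: str) -> dict:
        """Compute the readability features listed in the Methods for one response."""
        words = textstat.lexicon_count(text)
        sentences = textstat.sentence_count(text)
        return {
            "FRES": textstat.flesch_reading_ease(text),
            "GFSL": textstat.gunning_fog(text),
            "D-CS": textstat.dale_chall_readability_score(text),
            "words": words,
            "sentences": sentences,
            "WPS": words / sentences if sentences else 0.0,
            "CPW": sum(len(w) for w in text.split()) / words if words else 0.0,
            "pct_difficult": 100 * textstat.difficult_words(text) / words if words else 0.0,
        }

    # Example: correlate one readability feature with a panel rating across responses
    # (both lists below are placeholders, not data from the study).
    responses = ["...ChatGPT response 1...", "...ChatGPT response 2...", "...ChatGPT response 3..."]
    panel_rating = [4.2, 3.8, 4.5]
    fres_scores = [readability_profile(r)["FRES"] for r in responses]
    rho, p_value = spearmanr(fres_scores, panel_rating)  # Spearman's rank correlation
    print(f"rho = {rho:.3f}, p = {p_value:.3f}")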
RESULTS: A substantial proportion of the AI-generated responses (45.50 %) were incorrectly identified by the SLP panel as having been written by other SLPs, indicating high perceived human-likeness (origin). Regarding content quality, 83.60 % of the responses were found to be accurate (incorrectness), 63.60 % were rated as harmless (harm), and 38.20 % were considered to cause only minor to moderate impact (extent of harm). In terms of professional alignment, 62 % of the responses reflected the prevailing views within the SLP community (consensus). The means ± standard deviations of FRES, GFSL, and D-CS were 26.52 ± 13.94 (readable for college graduates), 18.17 ± 3.39 (readable for graduate students), and 9.90 ± 1.08 (readable for 13th to 15th grade [college]), respectively. On average, each response contained 99.73 words and 6.80 sentences, with 17.44 WPS, 5.79 CPW, and 27.96 % difficult words. The correlation coefficients ranged from a very large negative value (r = -0.909, p < 0.05) to a very large positive value (r = 0.918, p < 0.05).
CONCLUSION: The results indicate that ChatGPT has a promising capability to provide appropriate responses to frequently asked questions in the field of stuttering, as evidenced by the panel of certified SLPs perceiving about 45 % of its responses to have been written by SLPs. However, given the increasing accessibility of AI tools, particularly among individuals with limited access to professional services, it is crucial to emphasize that such tools are intended solely for educational purposes and should not replace diagnosis or treatment by qualified SLPs.
Keywords: Artificial intelligence; ChatGPT; Health literacy; Patient education; Stuttering