J Clin Epidemiol. 2026 Apr 20. pii: S0895-4356(26)00147-2. [Epub ahead of print]
112272
Isabel O'Byrne,
Johanna Pope,
Paula Byrne,
Darren Dahly,
Conor Judge,
Martin O Donnell,
Finn Krewer,
Mengqi Li,
Km Saif-Ur-Rahman,
Tom Conway,
James Thomas,
Geraldine Sheils,
Nikita Burke,
Marie O'Byrne,
Aislinn O'Byrne,
Declan Devane.
Podcasts can make health evidence easier to follow, but it is unclear whether AI-assisted production can match human production when both use the same audio format. We will run a randomised, two-arm, non-inferiority trial comparing AI-assisted podcasts with human-produced podcasts. Adults (≥18 years; English-proficient) will be recruited from the general public via Prolific, an online research participant recruitment platform, and randomly allocated 1:1 to listen to three short episodes (6-8 minutes each) based on the same Cochrane Plain Language Summaries. The AI arm uses Wondercraft AI in a human-in-the-loop workflow; the human arm features experienced communicators working to an identical brief. In both arms, content is limited to the Plain Language Summary, with authorship masked for participants and expert raters. The primary outcome is comprehension, measured by a 10-item test per episode, with the primary analysis using the participant-level mean score across the three episodes, aligned with the QUEST "Understanding" dimension. Secondary outcomes include format accessibility (listenability), quality of information, perceived trust, and safety. Non-inferiority margins are pre-specified; for comprehension, the margin is 1 point on the 10-item scale. If non-inferiority is shown, we will also assess superiority. We plan to recruit 458 participants. Differences between arms will be estimated using appropriate repeated-measures models, with two-sided 95% confidence intervals. This trial evaluates whether a vetted AI workflow can match human communicators on comprehension, quality, safety, accessibility, and trust when both deliver podcasts derived from the same evidence base. 
By providing head-to-head evidence in the same audio format, the study will address a practical question faced by journals and health organisations already experimenting with AI tools: can AI generate clear, safe, and trusted audio content at scale? It will also identify where human input remains essential.
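As an illustration of the decision rule described above (non-inferiority margin of 1 point, two-sided 95% confidence interval, superiority assessed only once non-inferiority is shown), the following is a minimal sketch and not the trial's analysis code. It assumes the between-arm difference is expressed as AI minus human with higher scores better, and that an estimate and standard error have already been obtained from some model; the function name, arguments, and the normal approximation are hypothetical choices made for illustration.

```python
from statistics import NormalDist


def assess_non_inferiority(diff, se, margin=1.0, level=0.95):
    """Classify a between-arm difference against a non-inferiority margin.

    diff   : estimated mean difference, AI minus human (assumed direction)
    se     : standard error of that estimate (assumed supplied by the model)
    margin : non-inferiority margin in score points (1 point in the abstract)
    level  : two-sided confidence level (95% in the abstract)
    """
    # Normal-approximation critical value; an assumption of this sketch.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower, upper = diff - z * se, diff + z * se

    # Non-inferior if the whole interval lies above -margin.
    non_inferior = lower > -margin
    # Superiority is only considered after non-inferiority is shown.
    superior = non_inferior and lower > 0

    return {
        "ci": (lower, upper),
        "non_inferior": non_inferior,
        "superior": superior,
    }


# Hypothetical numbers, not trial results.
result = assess_non_inferiority(diff=-0.3, se=0.2)
print(result["non_inferior"], result["superior"])
```

With these hypothetical inputs the lower confidence limit lies above -1 but below 0, so the sketch would report non-inferiority without superiority.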
Keywords: Cochrane reviews; QUEST framework; artificial intelligence; health communication; non-inferiority trial; plain language summaries; podcasts; systematic reviews