Ther Innov Regul Sci. 2025 Jun 14.
INTRODUCTION: Generative artificial intelligence (AI) has the potential to transform and accelerate how information is accessed during the regulation of human drug and biologic products.
OBJECTIVES: To determine whether a generative AI application with a retrieval-augmented generation (RAG) architecture can correctly answer questions about the information contained in FDA guidance documents.
METHODS: Five large language models (LLMs), Flan-UL2, GPT-3.5 Turbo, GPT-4 Turbo, Granite, and Llama 2, were evaluated with the RAG application Golden Retriever to assess their ability to answer questions about the information contained in clinically oriented FDA guidance documents. The models were configured in precise mode with a low temperature setting to generate precise, non-creative answers, with the aim of providing reliable clinical regulatory review guidance to users.
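The abstract does not report the exact temperature value or how "precise mode" is implemented. As a minimal sketch, assuming the models use standard temperature-scaled softmax sampling, the example below shows why a low temperature yields more deterministic, less "creative" output: dividing the token scores (logits) by a small temperature concentrates nearly all probability on the top-scoring token. The logits and the function name `softmax_with_temperature` are hypothetical and for illustration only.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw token scores (logits) into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens
logits = [2.0, 1.0, 0.5]
for t in (1.0, 0.2):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

At temperature 1.0 the top token receives roughly two thirds of the probability mass; at 0.2 it receives almost all of it, so repeated runs tend to produce the same answer.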
RESULTS: During preliminary testing, GPT-4 Turbo was the highest-performing LLM. It was therefore selected for additional evaluation, in which it generated a correct response with additional helpful information 33.9% of the time, a correct response 35.7% of the time, a response containing some of the required correct information 17.0% of the time, and a response containing any incorrect information 13.4% of the time. The RAG application cited the correct source document 89.2% of the time.
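The four GPT-4 Turbo response categories add up to 100.0%, which suggests they are mutually exclusive; that is an assumption, not something the abstract states. Under it, the reported figures can be combined as in the short tally below: 69.6% of responses were fully correct (with or without additional helpful information), and 86.6% contained no incorrect information. The category keys are illustrative labels, not terms from the study.

```python
# GPT-4 Turbo response categories as reported in the Results (percent of responses).
# Assumption: the categories are mutually exclusive, as their 100.0% sum suggests.
categories = {
    "correct_plus_helpful": 33.9,
    "correct": 35.7,
    "partially_correct": 17.0,
    "any_incorrect": 13.4,
}

total = round(sum(categories.values()), 1)
fully_correct = round(categories["correct_plus_helpful"] + categories["correct"], 1)
no_incorrect_info = round(fully_correct + categories["partially_correct"], 1)

print(f"total={total}%  fully correct={fully_correct}%  free of incorrect info={no_incorrect_info}%")
```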
CONCLUSION: The ability of the generative AI application to identify the correct guidance document and answer questions could substantially reduce the time needed to find answers to questions about FDA guidance documents. However, because sponsors and FDA staff may rely on the information in FDA guidance documents to guide important drug development decisions, the use of incorrect information could have a significant negative impact on the drug development process. Based on our results, the cited source documents can be used to locate the document that contains the relevant information more quickly, but further refinement of generative AI will likely be required before this tool can be relied on to answer questions about the information contained in FDA guidance documents. Rephrasing questions to include additional context, reconfiguring the embedding and chunking parameters, and applying other prompt engineering techniques may improve the rate of fully correct and complete responses.
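The abstract does not describe how Golden Retriever splits guidance documents before embedding them. To illustrate what "chunking parameters" typically refers to in a RAG pipeline, the sketch below assumes a simple fixed-size, character-based chunker with overlap; `chunk_text`, `chunk_size`, and `overlap` are hypothetical names, and production systems often chunk by tokens, sentences, or document sections instead.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into fixed-size character windows that overlap by `overlap` characters.

    Hypothetical illustration of chunking parameters; not the Golden Retriever implementation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window already reaches the end
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

Smaller chunks tend to make retrieval more targeted but can split an answer across chunks; overlap reduces that risk at the cost of storing some text twice. Tuning these two values is one concrete form of the reconfiguration suggested above.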
Keywords: Document search; FDA guidance; Generative AI; Large language model