HealthBench: How Did GPT-5 "Thinking Mode" Hit 1.6% Hallucination?
If you have been monitoring the RAG (Retrieval-Augmented Generation) space as closely as I have, you’ve likely seen the recent headlines: "GPT-5 Thinking Mode hits 1.6% hallucination rate on the new HealthBench dataset."