The 94% AI That Becomes 34% in Your Hands
Oxford researchers found that AI models like GPT-4 correctly diagnose medical conditions 94.9% of the time when tested in isolation, yet real people using the same AI achieve less than 34.5% accuracy, no better than a Google search. The failure point isn't the AI's knowledge but the conversation itself, which reveals a critical flaw in how medical chatbots are currently deployed.
medical AI
ChatGPT healthcare
GPT-4 medical diagnosis
LLM reliability