Microsoft’s AI Is Better Than Doctors at Diagnosing Disease

Medicine may be a combination of art and science, but Microsoft just showed that much of both can be learned—by a bot.

The company reports in a study published on the preprint site arXiv that its AI-based medical program, the Microsoft AI Diagnostic Orchestrator (MAI-DxO), correctly diagnosed 85% of cases described in the New England Journal of Medicine. That's roughly four times the accuracy of human doctors, who arrived at the correct diagnosis about 20% of the time.

The cases come from the journal's weekly series designed to stump doctors: complicated, challenging scenarios in which the diagnosis isn't obvious. Microsoft took about 300 of these cases and compared the performance of MAI-DxO with that of 21 general-practice doctors in the U.S. and U.K. To mimic the iterative way doctors typically approach such cases (collecting information, analyzing it, ordering tests, and then making decisions based on the results), Microsoft's team first turned each case into a stepwise decision-making benchmark. This allowed both the doctors and the AI system to ask questions and decide on next steps, such as which tests to order, based on what they had learned at each step. The process works much like a decision-making flow chart, in which each question and action depends on the information gleaned from the previous ones.

The 21 doctors were compared to a pooled set of off-the-shelf AI models that included Claude, DeepSeek, Gemini, GPT, Grok, and Llama. To further mirror the way human doctors approach such challenging cases, the Microsoft team also built an Orchestrator: a virtual emulation of the sounding board of colleagues and consultations that physicians often seek out in complex cases.

In the real world, ordering medical tests costs money, so Microsoft tracked the tests that the AI system and the human doctors ordered to see which approach reached a diagnosis more cheaply.

Not only did MAI-DxO far outperform doctors in landing on the correct diagnosis, but the AI bot was able to do so at a 20% lower cost on average.

“The four-fold increase in accuracy was more than previous studies have shown,” says Dr. Eric Topol, chair of translational medicine and director and founder of the Scripps Research Translational Institute, who provided insights on the project. “Most of the time there is a 10% absolute percentage difference, so this is a really big jump.” But what really got his attention was cost. “Not only was the AI more accurate, but it was much less expensive,” he says.

MAI-DxO is still in development and is not yet available for use outside of research. But incorporating such a model into medicine could reduce medical errors, which account for a significant share of health care costs, and make human doctors more efficient, which could in turn lead to better outcomes for patients.

“This is a startling result,” says Mustafa Suleyman, CEO of Microsoft AI. “I think it gives us a clear line of sight to making the very best expert diagnostics available to everybody in the world at an unbelievably affordable price point.”

A decade ago, when AI algorithms were first introduced in medicine, they were focused on binary tasks, Suleyman says, such as scanning images to detect tumors. “Today, these models are having fluent conversations at very high quality, asking the right questions and probing in the right ways, suggesting the right testing and interventions at the right time,” he says.

Another advantage an AI system may have is that it’s free of many of the biases inherent in the human experience. “We all have confirmation bias,” says Dr. Dominic King, vice president of Microsoft AI. “Sometimes clinicians will see something and think, ‘I’m sure this is just like the patient I saw last week.’ But AI is thinking slightly differently.”

MAI-DxO doesn’t just spit out an answer. It shows its work, so that doctors can potentially study and scrutinize its reasoning process. “It’s available for real-time oversight by the human clinician,” says Suleyman. “That’s a level of transparency and visibility into the thinking process that we haven’t seen before.” That, in turn, could improve the education and training doctors receive to further increase diagnostic accuracy and ultimately patient outcomes.

Still, some experts in the field of AI and medicine note that Microsoft’s approach isn’t entirely novel, since its diagnoses depended on the combined performance of multiple AI models. “In my mind, they are not testing any individual model that is optimized for health care,” says Keith Dreyer, chief data science officer at Massachusetts General Hospital and Brigham and Women’s Hospital Center for Clinical Data Science. “They are testing the concept of testing all of the models out there today and combining their decision-making together. That part to me is not surprising.”

Dreyer also points out that the results don't necessarily bring such systems closer to approval by regulatory agencies like the U.S. Food and Drug Administration, which has yet to weigh in on whether such systems qualify as medical devices.

Microsoft isn't the only company pursuing an AI-based medical program for diagnosing disease. Google is developing a conversation-based system that emulates the back-and-forth between doctor and patient, mimicking the reasoning physicians use to collect information from patients and interpret their symptoms to arrive at a diagnosis. In early tests, the system outperformed doctors at diagnosing simulated patient cases. In a 2024 test similar to Microsoft's case-study evaluation, an earlier version of Google's system correctly diagnosed 59% of cases, compared with 33% for human doctors.

The real test, however, will be seeing how these AI systems perform in actual health systems. That’s the next step for understanding how AI could complement or supplement the doctor’s role in diagnosing disease. “It’s impressive what they did,” says Topol. “But it doesn’t change medical practice until they take it out on the real medical highway.”

Topol hopes the AI systems will be tested in different health systems, where doctors and the AI platform could be compared on a range of more typical cases. That would require a full-scale clinical trial, as well as approval from regulatory agencies to ensure that relying more heavily on AI-based decision-making does not expose patients to harm. "We are very much on that journey to create the evidence base required to support both clinicians and patients to make a difference in their health," says King.

If confirmed, results like these could set the stage for introducing high-quality medical expertise in parts of the world that may not currently have access to major academic institutions or cutting-edge health care. “My primary focus in the next five to 10 years is to make sure everybody in the world gets access to the very best medical advice of all kinds,” says Suleyman. “We are very, very excited about this.”