ChatGPT might give you bad medical advice, studies warn

As tech companies roll out platforms specifically designed for health care consultation, AI is rapidly becoming a key player in many people’s medical decisions. According to OpenAI, the maker of ChatGPT, more than 40 million people consult the platform every day for health information.

But new research suggests AI may mislead users in certain medical scenarios.

One risk: While AI puts vast medical knowledge at your fingertips, many laypeople don’t know how to harness it effectively. In a study published recently in the journal Nature Medicine, researchers tried to simulate how people use AI chatbots by giving participants medical scenarios and asking them to consult AI tools. After conversing with the bots, participants correctly identified the hypothetical condition only about a third of the time.

Only 43% made the correct decision about next steps, such as whether to go to the emergency room or stay home.

“People don’t know what they are supposed to be telling the model,” says Andrew Bean, who studies AI systems at Oxford University and was one of the authors on this study.

Bean says that when using AI, arriving at a helpful conclusion often comes down to word choice. “Doctors are trained to ask you questions about symptoms you might not have realized you should have mentioned,” says Bean.

In one scenario, two different users gave slightly different depictions of the same scenario. One of them described “the worst headache I’ve ever had,” and was directed by the AI to go to the emergency room immediately. The other – who did not use that explicit description – was told to take aspirin and stay home. “Turns out this was actually a life-threatening condition,” says Bean.

There are some instances when AI excels at identifying medical issues — in some studies, large language models have sometimes matched or even outperformed physicians on diagnostic reasoning tasks. But the way people use AI chatbots, says Bean, is far messier than the controlled, clinical situations in which it performs well.

Correct diagnosis, wrong advice

Even in circumstances where AI is able to correctly identify the condition, it often does not present the next steps with the appropriate amount of urgency, according to another study.

Researchers presented the AI bots with different medical scenarios. In 52% of emergency cases, the bots “under-triaged,” meaning they treated the ailment as less serious than it was. In one example, a bot failed to direct a hypothetical patient with diabetic ketoacidosis and impending respiratory failure — a life-threatening condition — to go to the emergency department.

“When there was a textbook medical emergency, ChatGPT got it right,” said Girish Nadkarni, a doctor and AI researcher at Mount Sinai who is an author on the study. The problem, said Nadkarni, arose in more complicated scenarios in which there was an “element of time” at play — the bot often both over- and underestimated how long a patient could wait before pursuing care.

A spokesperson from OpenAI said this study did not represent the way people actually use ChatGPT, and that the previous study used an older version of ChatGPT that the company argues has since addressed some of the concerns that surfaced.

AI can improve a doctor’s visit

Despite concerns about inaccuracy, doctors who study AI believe there is value in patients using it for health care information, and point to times it has even provided lifesaving advice.

“I encourage patients to use these tools,” says Robert Wachter, a doctor at UC San Francisco and author of the recently published book, A Giant Leap: How AI Is Transforming Health Care and What That Means for Our Future.

Wachter argues that with health care difficult to afford and access, consulting AI is still often better than the alternatives. “The advice you get from the tools is substantially better than nothing and better than what you would get from your second cousin,” says Wachter.

Still, Wachter stresses, AI is not a replacement for a doctor.

Adam Rodman, a hospitalist who researches AI programs at Harvard Medical School, discourages people from using AI to triage emergency situations, but says AI can add significant value to a patient’s interaction with a human medical practitioner.

“A good time to use a large language model is when you’re about to go see a doctor — or after you see your doctor,” says Rodman. It can help you become more informed about your condition in advance of an appointment and use time with your providers efficiently, he says, giving patients the opportunity to partner with their doctor on decisions rather than engage in lengthy question and answer sessions.

“There are no downsides to better understanding your health,” says Rodman.

AI in health care is here to stay

Doctors interviewed for this story acknowledge that AI and medicine are already inextricably entangled and imagine that both AI and humans will become more skilled at engaging with each other.

“My hope is that you might see AI as an extension of a human relationship,” says Rodman. He imagines a future where both doctors and patients partner with AI in order to facilitate communication and overcome medical bureaucracy.

Rodman says AI also carries risks. He fears a time when people would be informed of scary diagnoses — such as cancer — by a bot rather than a human. Studies show that when health care is treated more like a business or marketplace product, people trust doctors less.

“What I hope is that this technology can be used in a way that enhances humanity in medicine,” says Rodman, “and not in a way that cuts out the doctor-patient relationship.”

Transcript:

A MARTÍNEZ, HOST:

Hundreds of millions of people are turning to chatbots for advice on health and wellness these days. Now, that’s according to OpenAI, the maker of ChatGPT. But several recent studies published in the journal Nature Medicine suggest AI medical advice frequently leads people astray. NPR’s Katia Riddle is here to explain. So a lot of people are incorporating these AI tools into their decisions around health. You’re saying, though, Katia, that consulting AI chatbots doesn’t help people correctly identify medical problems.

KATIA RIDDLE, BYLINE: Less than half the time – that was the finding in one of these studies. In another study, researchers proposed different medical scenarios to the bots, and they found that even when it did correctly identify the problem, it often did not express an appropriate amount of urgency in seeking help for potentially dangerous conditions.

MARTÍNEZ: All right. So that sounds problematic. What went wrong in these scenarios?

RIDDLE: I spoke with an author of one study, Andrew Bean. He studies AI systems at Oxford University. In his study, he and his colleagues tried to simulate the way people actually use AI by giving them scripts to discuss with the bots – hypothetical medical issues they were having. He talked about one scenario where two different users gave slightly different descriptions of the same scenario – describing a headache.

ANDREW BEAN: One of them said, it’s the worst headache I’ve ever had, and that person was told, go to the ER immediately. Now, it turns out this was actually a life-threatening condition. And the other one was told, take aspirin. Stay home.

MARTÍNEZ: All right. So two different sets of directives there. So who’s to blame for this communication failure – the person or the bot itself?

RIDDLE: Well, it’s both. Humans aren’t always the best reporters of their own symptoms. And in this scenario, the AI was not sufficiently curious to ask questions to get at the information the human was not giving it.

MARTÍNEZ: All right. So does that mean, then, that AI isn’t replacing medical professionals anytime soon?

RIDDLE: Well, it’s complicated. In controlled studies, large language models have sometimes matched or even outperformed physicians on diagnostic reasoning tasks. But these two studies suggest that human doctors are still better at evaluating patients and also better at recommending next steps for treatment.

MARTÍNEZ: All right. So what does OpenAI say about all this?

RIDDLE: I did reach out to them. They point out that in one study, the version of ChatGPT that the researchers evaluated is outdated. They say they’ve course-corrected with newer versions. However, another study did look at the most recent version. In that case, the company argues that the methodology did not reflect how people typically use ChatGPT.

MARTÍNEZ: So should all of us just stop using AI as a cheap and accessible doctor?

RIDDLE: Not necessarily. First of all, AI is here to stay. Even if doctors wanted their patients to stop using it, there’s no reason to think they would. AI can sometimes be very useful or even lifesaving. One person I talked to is Robert Wachter. He’s a doctor and researcher at UC San Francisco. He just wrote a book on AI and medicine. ChatGPT isn’t perfect, he says, but he argues that it is often better than the alternatives.

ROBERT WACHTER: I encourage patients to use these tools ’cause it’s not easy to get in to see a doctor. And often the advice you get from the tools is substantially better than nothing and better than what you would get from your second cousin.

RIDDLE: Wachter says we’re in an awkward new relationship stage with AI now. We’re just getting to know each other. He thinks both humans and AI will figure out how, over time, to communicate better with each other.

MARTÍNEZ: So our relationship with AI is still complicated.

RIDDLE: That’s right.

MARTÍNEZ: That’s NPR’s Katia Riddle. Thanks a lot.

RIDDLE: Thank you.

(SOUNDBITE OF PLEIJ’S “INSIDEOUT”)
