You’re dragging a bit as you get out of bed, but you’re roused by the greeting of your AI health coach: “Ready for a healthy, happy day?” it chirps from your smartwatch.
“I’ve been noticing a trend,” it continues, unbidden. Not again, you think. “Since you started fasting from food for 18 hours, your daily stress levels are higher. But when you fast for 12 hours, your sleep, stress, and exercise are closer to optimal.”
It’s true that you’ve felt more tired and worn-down lately. In fact, you were thinking about going to see your doctor about it. “What’s the upshot of all this, coach?” you ask your watch.
“Based on your patterns and some of the research literature on intermittent fasting,” it says, “I suggest keeping your daily fasts to 12 hours and seeing if your stats improve.”
This scenario, which I just made up based on researching a new generation of health trackers, could in fact be a real dialogue between you and your new device very soon. Though consumers have been able to track their steps and calories burned with wearable devices and related apps for more than a decade, today’s fitness trackers capture more biomarkers than ever—from heart rate rhythms to oxygen levels. And AI-backed services, integrated into these health platforms, promise to analyze vast amounts of physical data, integrate it with knowledge of health literature, and spit out recommendations in personable, chatbot fashion. In addition to, say, optimizing the length of a regular fasting period or advising when to take a break from cardio training, some tech optimists suggest these AI-empowered health “coaches” could be a valuable new first line of detection when something is going wrong with your health. The ability to predict illnesses before people experience any clinical symptoms is the Holy Grail of healthcare.
But there are pitfalls. Some of them potentially major. Perhaps a user follows their AI coach’s advice and shortens their daily fast. And maybe it seems to work—they notice an uptick in energy. But what if the short-term victory causes them to decide against visiting their primary care doctor? A human medical professional’s scrutiny could reveal the need for follow-up tests, which could uncover that the underlying culprit of the person’s dwindling energy is hypothyroidism, a serious endocrine disorder. Delayed treatment could result in heart disease, slowed cognition, and nerve damage. Oops.
The AI coach’s recommendations are also only as good as the information it takes in and its analysis of it. And these devices are currently operated by companies, which have a financial stake in keeping users hooked into their services—incentives that might not always align with humans’ best health interests. Even as AI health coaches are already being rolled out to the public—one from fitness tracker company Whoop arrived last fall, and others are in the works from Apple, Google, and others—it’s still unclear how we should balance their promise with their serious limitations.
One argument for AI-powered coaches is that human care providers, including primary care doctors and health coaches, are overwhelmed by the sheer number of people with chronic diseases, anxiety, and depression—conditions that lifestyle changes involving sleep, diet, and exercise can sometimes ease or even prevent. AI coaches are cheaper to deploy, plus they’re available around the clock, making them more scalable than their human analogues.
An AI coach could be a lifesaver. It could observe changes in patterns of data suggesting a user’s health is declining before the person is even aware of any symptoms. “Many times, your behaviors will trend in the wrong direction before some clinical measure would be indicative,” says John Jakicic, an exercise physiologist at the University of Kansas Medical Center. Thanks to your AI coach, you make an appointment with your doctor, who may indeed confirm a problem that needs immediate attention. But can you trust the data? You don’t want to be running off to the doctor every time your AI coach gives you a prompt, and overworked doctors aren’t going to appreciate a new rush of appointments booked on the strength of an algorithmic nudge.
One main challenge relates to a new development in AI, the large language models (LLMs) that inform these coaches. LLMs are trained on massive quantities of text and, from that, learn to predict which words are most likely to come next, based on patterns in their training data. As such, they could have a difficult time sifting through the nuances and uncertainties of conflicting conclusions among scientists about health, a field in which it is notoriously challenging in many cases to find definitive recommendations, especially in the context of an individual’s traits. (Humans, as it turns out, are messy, complex systems.) Researchers in academia and at companies are working to “tune” the LLMs of these products to certain areas of health expertise, but an AI coach would still struggle to assert with 100 percent confidence, for example, whether to fast for 12, 15, or 18 hours per day, because it won’t find any consensus in the research literature. “The quality of AI really depends on the data it’s trained on,” says June-Ho Kim, a primary care physician at Brigham and Women’s Hospital and a professor focusing on digital health at Harvard Medical School. “And healthcare as a field has huge data problems.”
LLMs learn from vast amounts of relatively static, discrete information, so they aren’t designed to keep pace with a dynamic, complicated system like the human body. But number-crunching machine learning algorithms—another type of AI—can analyze this living data and complement the ability of LLMs to reason logically about tomes of scientific literature amassed by humans. Then the AI health coach could churn it all back to users in chatty, interactive, rapport-building conversations.
Whoop’s current model offers a beta-mode AI coach: Users can ask the chatbot about various trends in data—sleep, exercise, and heart rate, for example—collected by its wristband, and it will reply with insights.
Hélène Chassay, a 43-year-old graphic designer in Australia, credits the platform with revealing how excessive exercise was leading to more stress the following day. She has a history of injuries from overexertion. “You end up hitting a brick wall after a while if you behave like that,” she says. “I started doing the things Whoop suggested—basics like more sleep and water—and it’s balanced me out.” But she’s noticed that some guidance from the app can be confusing. “You have to take it with a grain of salt,” Chassay says.
For one, the platform seems to sometimes struggle to connect displayed data with the chat function. It and others also likely face the same suite of challenges that have plagued the fitness tracker industry since its inception: collecting valid data. “There’s a lot of noise out there,” says cell biologist Renée Deehan, vice president of science and AI at InsideTracker, a health analytics company.
As people begin to put even greater stock in feedback from these platforms, data fidelity will become increasingly important. A major obstacle is that commercial, off-the-shelf trackers—as opposed to FDA-cleared devices—gather heart rate readings by shining light through the skin and measuring how the blood beneath it absorbs and reflects that light. But some studies suggest that this way of measuring heart rate is less accurate for people with darker skin. And on a large, AI-integrated scale, this front-line shortcoming has massive health equity implications.
Another issue yet to be resolved is what happens when an AI health coach and a doctor disagree, Kim says. Can their differing opinions be resolved in real time, before a user blindly follows the coach’s advice simply because it’s already chirping in their ear—and seems like a lower lift than making an appointment with their doctor? “I do think there’s going to be a point at which the recommendations of a human clinician and an AI chatbot conflict,” Kim says. “As much as we all want to race forward to the next thing,” he says, caution is warranted. (A Whoop representative told me the coach is not intended to replace your doctor but aims to help users be proactive about health decisions.)
These questions are likely to be unwittingly ground-tested by millions of users. In addition to Whoop, which introduced its AI health coach last September, tech companies Zing, Zepp, and January AI are also touting their own AI-powered coaching algorithms, and versions made by Google, Apple, and others will launch in the next few months. Cost and access are also at issue. Whoop’s wearable fitness tracker runs $239 and requires an ongoing subscription to use its services, currently an additional $239 for each subsequent year. Whether these tools will reach those who might benefit from them most is an open question.
As AI coaching becomes more personalized, a fundamental issue is whether these wellness managers will grab as much of consumers’ time and attention as possible, or more responsibly step back if users start to obsess over exercise or calorie counting. The ethical AI health coach must be self-aware enough to understand its own limits. For all its powerful tools, it must embrace uncertainty and prioritize connecting users to human care providers when it spots a trend whose meaning it can’t, or shouldn’t, guess.
While AI coaches can be useful for certain things, Deehan says, “it’s very important for human beings to be in this equation. Nobody wants to go see the robot doctor.” At least not yet.
Lead image: Christian Baloga / Shutterstock