
Machine healing
Artificial intelligence is up to the challenge of reducing human suffering, experts say. Are we?
When Adam Rodman was a second-year medical student in the 2000s, he visited the library to research the case of a patient whose illness had left doctors stumped. Rodman searched the catalog, copied research papers, and shared them with the team.
“It made a big difference in that patient’s care,” Rodman said. “Everyone said, ‘This is so great. This is evidence-based medicine.’ But it took two hours. I can do that today in 15 seconds.”
Rodman, now an assistant professor at Harvard Medical School and a doctor at Beth Israel Deaconess Medical Center, these days carries a medical library in his pocket — a smartphone app created after the release of the large language model ChatGPT in 2022. OpenEvidence — developed in part by Medical School faculty — allows him to query specific diseases and symptoms. It searches the medical literature, drafts a summary of findings, and lists the most important sources for further reading, providing answers while Rodman is still face-to-face with his patient.
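OpenEvidence’s internals are proprietary, but the retrieve-then-summarize pattern behind tools like it can be sketched in a few lines of Python. The sketch below is a hypothetical illustration, not the app’s actual code: the toy abstracts, the TF-IDF ranking, and the model name are all assumptions.

```python
# Hypothetical retrieve-then-summarize pipeline in the spirit of literature
# Q&A tools like OpenEvidence -- not its actual implementation.
from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy "literature"; a real system would index millions of papers.
ABSTRACTS = [
    "Tethered cord syndrome presents with back pain worsened by movement...",
    "Miglitol, an alpha-glucosidase inhibitor, is used in Type 2 diabetes...",
    "Adverse drug events account for a large share of inpatient harm...",
]

def answer_clinical_question(question: str, k: int = 2) -> str:
    # 1. Rank abstracts by lexical similarity to the question.
    vec = TfidfVectorizer().fit(ABSTRACTS + [question])
    scores = cosine_similarity(vec.transform([question]),
                               vec.transform(ABSTRACTS))[0]
    sources = [ABSTRACTS[i] for i in scores.argsort()[::-1][:k]]

    # 2. Ask an LLM to summarize only the retrieved evidence, with citations.
    prompt = ("Answer the clinical question using ONLY the numbered sources, "
              "citing them by number.\n\n"
              + "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(sources))
              + f"\n\nQuestion: {question}")
    client = OpenAI()  # expects OPENAI_API_KEY in the environment
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content
```

Grounding the model in retrieved sources, rather than letting it answer from memory, is what lets such tools list the most important sources for further reading.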
“We say, ‘Wow, the technology is really powerful.’ But what do we do with it to actually change things?”
Adam Rodman
Artificial intelligence in various forms has been used in medicine for decades — but not like this. Experts predict that the adoption of large language models will reshape medicine. Some compare the potential impact with the decoding of the human genome, even the rise of the internet. The impact is expected to show up in doctor-patient interactions, physicians’ paperwork load, hospital and physician practice administration, medical research, and medical education.
Most of these effects are likely to be positive, increasing efficiency, reducing mistakes, easing the nationwide crunch in primary care, bringing data to bear more fully on decision-making, reducing administrative burdens, and creating space for longer, deeper person-to-person interactions.

Adam Rodman, assistant professor at Harvard Medical School and physician at Beth Israel Deaconess Medical Center
“The optimist in me hopes that AI can make us doctors better versions of ourselves to better care for our patients.”
Transcript:
ADAM RODMAN: I am obsessed with metacognition, with thinking about thinking. So what excites me most about AI and medicine? Well, the optimist in me hopes that AI and medicine can make us doctors better versions of ourselves to better care for our patients. I think the best case scenario for me is a world in which an artificial intelligence is communicating with me and my patients, looking for signs of implicit bias, looking for signs that I might be making the wrong decision, and more importantly, feeding back that information to me so that I can improve over time, so that I can become a better human. My worry is actually directly related to this. These are very powerful reasoning technologies, and really what is medical education other than a way to frame and shape the medical mind? So part of my worry is that because these technologies are so powerful, they’ll shortcut many of the ways that we know that doctors learn and get better, and we may end up with generations of physicians who don’t know how to think the best. I don’t think that this is the foregone conclusion, but it really is my worry about the way that things are going.
But there are serious concerns, too.
Current data sets too often reflect societal biases that reinforce gaps in access and quality of care for disadvantaged groups. Without correction, these data have the potential to cement existing biases into ever-more-powerful AI that will increasingly influence how healthcare operates.
Another important issue, experts say, is that AIs remain prone to “hallucination,” making up “facts” and presenting them as if they were real.
Then there’s the danger that medicine won’t be bold enough. The latest AI has the potential to remake healthcare top to bottom, but only if given a chance. The wrong priorities — too much deference to entrenched interests, a focus on money instead of health — could easily reduce the AI “revolution” to an underwhelming exercise in tinkering around the edges.
“I think we’re in this weird space,” Rodman said. “We say, ‘Wow, the technology is really powerful.’ But what do we do with it to actually change things? My worry, as both a clinician and a researcher, is that if we don’t think big, if we don’t try to rethink how we’ve organized medicine, things might not change that much.”

Shoring up the ‘tottering edifice’
Five years ago, when asked about AI in healthcare, Isaac Kohane responded with frustration: Teenagers tapping away on social media apps had better technology at their fingertips than many doctors did. The situation today couldn’t be more different, he says.
Kohane, chair of the Medical School’s Department of Biomedical Informatics and editor-in-chief of the New England Journal of Medicine’s new AI initiative, describes the abilities of the latest models as “mind boggling.” To illustrate the point, he recalled getting an early look at OpenAI’s GPT-4. He tested it with a complex case — a child born with ambiguous genitalia — that might have stymied even an experienced endocrinologist. Kohane asked GPT-4 about genetic causes, biochemical pathways, next steps in the workup, even what to tell the child’s parents. It aced the test.
“This large language model was not trained to be a doctor; it’s just trained to predict the next word,” Kohane said. “It could speak as coherently about wine pairings with a vegetarian menu as about diagnosing a complex patient. It was truly a quantum leap from anything that anybody in computer science who was honest with themselves would have predicted in the next 10 years.”

Isaac Kohane, chair of Harvard Medical School’s Department of Biomedical Informatics and editor-in-chief of the New England Journal of Medicine’s new AI journal
“Having an instant second opinion after any interaction with a clinician will change, for the better, the nature of the doctor-patient relationship.”
Transcript:
ISAAC KOHANE: I am most excited that AI is going to transform the patient experience. Just merely having an instant second opinion after any interaction with a clinician will change, for the better, the nature of the doctor-patient relationship. Also, with regard to what things I fear could go wrong, it’s that parties that do not have the patient’s best interest at heart will be the ones steering the tendencies/biases or prejudices of our new AI companions.
And none too soon. The U.S. healthcare system, long criticized as costly, inefficient, and inordinately focused on treatment over prevention, has been showing cracks. Kohane, recalling a new faculty member in his department who couldn’t find a primary care physician, is tired of seeing those cracks up close.
“The medical system, which I have long said is broken, is broken in extremely obvious ways in Boston,” he said. “People worry about equity problems with AI. I’m here to say we have a huge equity problem today. Unless you’re well connected and are willing to pay literally thousands of extra dollars for concierge care, you’re going to have trouble finding a timely primary care visit.”
Early worries that AI would replace physicians have yielded to the realization that the system needs both AI and its human workforce, Kohane said. Teaming nurse practitioners and physician assistants with AI is one among several promising scenarios.
“It is no longer a conversation about, ‘Will AI replace doctors,’ so much as, ‘Will AI, with a set of clinicians who may not look like the clinicians that we’re used to, firm up the tottering edifice that is organized medicine?’”

Building the optimal assistant
How LLMs were rolled out — to everyone at once — accelerated their adoption, Kohane says. Doctors immediately experimented with eye-glazing yet essential tasks, like writing prior authorization requests to insurers explaining the necessity of specific, usually expensive, treatments.
“People just did it,” Kohane said. “Doctors were tweeting back and forth about all the time they were saving.”
Patients did it too, seeking virtual second opinions, like the child whose recurring pain was misdiagnosed by 17 doctors over three years. In the widely publicized case, the boy’s mother entered his medical notes into ChatGPT, which suggested a condition no doctor had mentioned: tethered cord syndrome, in which the spinal cord binds inside the backbone. When the patient moves, rather than sliding smoothly, the spinal cord stretches, causing pain. The diagnosis was confirmed by a neurosurgeon, who then corrected the anatomic anomaly.
One of the perceived benefits of employing AI in the clinic, of course, is to make doctors better the first time around. Greater, faster access to case histories, suggested diagnoses, and other data is expected to improve physician performance. But plenty of work remains, a recent study shows.
Research published in JAMA Network Open in October compared diagnoses delivered by an individual doctor, a doctor using an LLM diagnostic tool, and an LLM alone. The results were surprising, showing a statistically insignificant improvement in accuracy for the physicians using the LLM — 76 percent versus 74 percent for the solitary physician. More surprising still, the LLM by itself did best, scoring 16 percentage points higher than the physicians working alone.
Rodman, one of the paper’s senior authors, said it’s tempting to conclude that LLMs aren’t that helpful for doctors, but he insisted that it’s important to look deeper at the findings. Only 10 percent of the physicians, he said, were experienced LLM users before the study — which took place in 2023 — and the rest received only basic training. Indeed, when Rodman later looked at the transcripts, he found that most physicians had used the LLMs for basic fact retrieval.
“The best way a doctor could use it now is for a second opinion, to second-guess themselves when they have a tricky case,” he said. “How could I be wrong? What am I missing? What other questions should I ask? Those are the ways, we know from psychological literature, that complement how humans think.”
Among the other potential benefits of AI is the chance to make medicine safer, according to David Bates, co-director of the Center for Artificial Intelligence and Bioinformatics Learning Systems at Mass General Brigham. A recent study by Bates and colleagues showed that as many as one in four admissions to Massachusetts hospitals results in some kind of patient harm. Many of those incidents trace back to adverse drug events.
“AI should be able to look for medication-related issues and identify them much more accurately than we’re able to do right now,” said Bates, who is also a professor of medicine at the Medical School and of health policy and management at the Harvard T.H. Chan School of Public Health.

David Bates, co-director of the Center for Artificial Intelligence and Bioinformatics Learning Systems at Mass General Brigham
“AI has a tendency to hallucinate, and that is a worry, because we don’t want things in people’s records that are not really there.”
Transcript:
DAVID BATES: AI has a great deal of promise. Burnout is rampant in many parts of medicine, especially, for example, primary care, and artificial intelligence will make many routine tasks like documentation much faster. Ambient scribes in particular are already doing that. There are also concerns about things going wrong. There are many ways that any time gains could be used, for example, just to increase physician workloads. It’s also very important that medical records be correct, and AI has a tendency to hallucinate, and that is a worry, because we don’t want things in people’s records that are not really there.
Another opportunity stems from AI’s growing competence in a mundane area: note-taking and summarization, according to Bernard Chang, dean for medical education at the Medical School.
Systems for “ambient documentation” will soon be able to listen in on patient visits, record everything that is said and done, and generate an organized clinical note in real time. When symptoms are discussed, the AI can suggest diagnoses and courses of treatment. Later, the physician can review the summary for accuracy.
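In outline, such a system is a two-stage pipeline: speech-to-text on the visit audio, then summarization into a structured draft note. A minimal sketch, assuming OpenAI’s transcription and chat APIs as stand-ins (the model names and prompt are illustrative, not any vendor’s actual product):

```python
# Hypothetical ambient-documentation pipeline: transcribe the visit, then
# draft a structured note that the physician reviews before signing.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def draft_clinical_note(audio_path: str) -> str:
    # 1. Speech-to-text on the recorded visit.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=f
        ).text

    # 2. Summarize the conversation into a SOAP-style draft note.
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative
        messages=[
            {"role": "system",
             "content": ("Draft a SOAP note from this doctor-patient "
                         "conversation. Do not invent findings that are not "
                         "in the transcript; flag uncertain items for review.")},
            {"role": "user", "content": transcript},
        ],
    )
    return reply.choices[0].message.content  # a draft, pending physician review
```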
Automation of notes and summaries would benefit healthcare workers in more than one way, Chang said. It would ease doctors’ paperwork load, often cited as a cause of burnout, and it would reset the doctor-patient relationship. One of patients’ biggest complaints about office visits is the physician sitting at the computer, asking questions and recording the answers. Freed from the note-taking process, doctors could sit face-to-face with patients, opening a path to stronger connections.
“It’s not the most magical use of AI,” Chang said. “We’ve all seen AI do something and said, ‘Wow, that’s amazing.’ This is not one of those things. But this program is being piloted at different ambulatory practices across the country and the early results are very promising. Physicians who feel overburdened and burnt out are starting to say, ‘You know what, this tool is going to help me.’”

Bernard Chang, dean for medical education at Harvard Medical School
“I see AI as a transformative tool on par with the availability of the internet in terms of its effect on medicine and medical education.”
Transcript:
BERNARD CHANG: What most excites me about AI’s promise in medicine is that these technological tools will allow physicians to spend more time on the human aspects of the profession, which is sorely needed, while facilitating the ability to access information quickly, analyze large amounts of important data, and make the difficult connections necessary to consider the rare diagnoses, the less obvious treatment paradigms, and ultimately the optimal care for patients. In medical education, students can use AI tools to accelerate their learning and move more quickly beyond rote practice to higher levels of cognitive analysis on their way to becoming the most outstanding doctors of the future. Whether things might go wrong lies in our hands. We need to be cautious about hallucinations and misinformation, bias, an erosion of fundamentals in learning, and an over-reliance on machines. As a society, we need to be mindful of the environmental impacts of the high energy costs involved. On the whole, I see AI as a transformative tool on par with the availability of the internet in terms of its effect on medicine and medical education.

The bias threat
For all their power, LLMs are not ready to be left alone.
“The technology is not good enough to have that safety level where you don’t need a knowledgeable human,” Rodman said. “I can understand where it might have run aground. I can take a step further with the diagnosis. I can do that because I learned the hard way. In residency you make a ton of mistakes, but you learn from those mistakes. Our current system is incredibly suboptimal, but it does train your brain. When people in medical school interact with things that can automate those processes — even if they’re, on average, better than humans — how are they going to learn?”
Doctors and scientists also worry about bad information. Pervasive data bias stems from biomedicine’s roots in wealthy Western nations whose science was shaped by white men studying white men, says Leo Celi, an associate professor of medicine and a physician in the Division of Pulmonary, Critical Care and Sleep Medicine at Beth Israel Deaconess Medical Center.

Leo Celi, associate professor of medicine and a physician in Beth Israel Deaconess Medical Center’s Division of Pulmonary, Critical Care and Sleep Medicine
“We need to design human AI systems, rather than build algorithms. We have to be able to predict how humans will mess up.”
Transcript:
LEO CELI: AI could be the Trojan horse we’ve been waiting for to redesign systems from a clean slate. I am talking about systems for knowledge creation, health care delivery, and education, which are all quite broken. The legacy of AI is to make us better critical thinkers, by putting data at the front and center, and making the breadth and the depth of the problems crystal clear. But we need to design human AI systems, rather than build algorithms. We have to be able to predict how humans will mess up. The designs should be similar to those of systems for aviation, road safety, space, nuclear power generation. We need psychologists, cognitive scientists, behavioral economists, anthropologists to design human AI systems.
“You need to understand the data before you can build artificial intelligence,” Celi said. “That gives us a new perspective on the design flaws of legacy systems for healthcare delivery, legacy systems for medical education. It becomes clear that the status quo is so bad — we knew it was bad and we’ve come to accept that it is a broken system — that all the promises of AI are going bust unless we recode the world itself.”
Celi cited research on disparities in care between English-speaking and non-English-speaking patients hospitalized with diabetes. Non-English speakers are woken up less frequently for blood sugar checks, raising the likelihood that changes will be missed. That impact is hidden, however, because the data isn’t obviously biased, only incomplete; it still contributes to a disparity in care.
“They have one or two blood-sugar checks compared to 10 if you speak English well,” he said. “If you average it, the computers don’t see that this is a data imbalance. There’s so much missing context that experts may not be aware of what we call ‘data artifacts.’ This arises from a social patterning of the data generation process.”
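A toy simulation makes the statistical point (the numbers are invented for illustration, not drawn from the Beth Israel data): two patients with identical glucose physiology, one checked twelve times a day and one twice, produce nearly identical averages even though the sparsely monitored patient’s swings go unobserved.

```python
# Illustration of how averaging hides a monitoring disparity.
import numpy as np

rng = np.random.default_rng(0)

# One day of "true" glucose (mg/dL) for two physiologically identical patients.
hours = np.arange(24)
true_glucose = 140 + 40 * np.sin(hours / 24 * 2 * np.pi) + rng.normal(0, 10, 24)

frequent = true_glucose[::2]     # checked 12 times a day
infrequent = true_glucose[::12]  # checked twice a day

# The averages are nearly identical, so a model trained on mean values
# sees no problem...
print(f"mean, frequent checks:   {frequent.mean():.0f} mg/dL")
print(f"mean, infrequent checks: {infrequent.mean():.0f} mg/dL")

# ...but the peaks and troughs of the sparsely monitored patient go unseen.
print(f"observed range, frequent:   {np.ptp(frequent):.0f} mg/dL")
print(f"observed range, infrequent: {np.ptp(infrequent):.0f} mg/dL")
```

The average passes a naive fairness check; only the sampling counts reveal the imbalance Celi describes.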
Bates offered additional examples, including a skin cancer device that does a poor job detecting cancer on highly pigmented skin and a scheduling algorithm that wrongly predicted Black patients would have higher no-show rates, leading to overbooking and longer wait times.
“Most clinicians are not aware that every medical device that we have is, to a certain degree, biased,” Celi said. “They don’t work well across all groups because we prototype them and we optimize them on, typically, college-age, white, male students. They were not optimized for an ICU patient who is 80 years old and has all these comorbidities, so why is there an expectation that the numbers they represent are objective ground truths?”
The exposure of deep biases in legacy systems presents an opportunity to get things right, Celi said. Accordingly, more researchers are pushing to ensure that clinical trials enroll diverse populations from geographically diverse locations.
One example is Beth Israel’s MIMIC database, which reflects the hospital’s diverse patient population. The tool, overseen by Celi, offers investigators de-identified electronic medical records — notes, images, test results — in an open-source format. It has been used in 10,000 studies by researchers all around the world and is set to expand to 14 additional hospitals, he said.
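Working with MIMIC looks roughly like this in practice. A hedged sketch, assuming local CSV copies downloaded from PhysioNet after credentialing; the paths and column names follow the MIMIC-IV documentation and should be verified against the release you use:

```python
# Hypothetical first look at the de-identified MIMIC-IV tables with pandas.
import pandas as pd

patients = pd.read_csv("mimic-iv/hosp/patients.csv.gz")
admissions = pd.read_csv("mimic-iv/hosp/admissions.csv.gz")

# Join demographics onto admissions to inspect cohort diversity,
# the property Celi highlights above.
cohort = admissions.merge(patients, on="subject_id", how="left")
print(cohort["race"].value_counts(normalize=True).head())      # self-reported race
print(cohort["language"].value_counts(normalize=True).head())  # relevant to the
# English/non-English monitoring disparity discussed earlier
```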

Age of agility
As in the clinic, AI models used in the lab aren’t perfect, but they are opening pathways that hold promise to greatly accelerate scientific progress.
“They provide instant insights at the atomic scale for some molecules that are still not accessible experimentally or that would take a tremendous amount of time and effort to generate,” said Marinka Zitnik, an associate professor of biomedical informatics at the Medical School. “These models provide in-silico predictions that are accurate, that scientists can then build upon and leverage in their scientific work. That, to me, just hints at this incredible moment that we are in.”
“What is becoming increasingly important is to develop reliable, faithful benchmarks or techniques that allow us to evaluate how well the outputs of AI models behave in the real world.”
Marinka Zitnik
Zitnik’s lab recently introduced Procyon, an AI model aimed at closing knowledge gaps around protein structures and their biological roles.
Until recently, it was difficult for scientists to determine a protein’s shape: how the long molecules fold and twist onto themselves in three dimensions. Shape matters because the twists and turns expose portions of the molecule and hide others, making those sites easier or harder for other molecules to interact with, which affects the protein’s chemical properties.
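One way to make the exposed-versus-hidden distinction concrete: given a solved structure, standard tools can estimate how much of each residue’s surface a neighboring molecule could actually reach. A brief sketch using Biopython’s Shrake-Rupley solvent-accessibility routine (the file name is a placeholder and the exposure cutoff is arbitrary):

```python
# Estimate per-residue solvent-accessible surface area (SASA): residues with
# high SASA sit on the outside of the fold, reachable by other molecules.
from Bio.PDB import PDBParser
from Bio.PDB.SASA import ShrakeRupley

structure = PDBParser(QUIET=True).get_structure("prot", "protein.pdb")
ShrakeRupley().compute(structure, level="R")  # attach .sasa to each residue

for res in structure.get_residues():
    if res.sasa > 100:  # square angstroms; illustrative cutoff
        print(res.get_resname(), res.id[1], f"{res.sasa:.0f}")
```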

Marinka Zitnik, associate professor of biomedical informatics
“Insights from research labs don’t always translate into effective treatments, and AI could amplify this gap if it’s not designed to bridge it.”
Transcript:
MARINKA ZITNIK: I am most excited about AI’s ability to learn and innovate on its own, instead of just analyzing existing knowledge. AI can generate new ideas, uncover hidden patterns, and propose solutions that humans might not consider. In biomedical research and drug development, this means AI could design new molecules, predict how these molecules interact with biological systems, and match treatments to patients with greater accuracy. By integrating information across genetics, proteins, all the way to clinical outcomes, AI can speed up discoveries in ways that were previously not possible. A major challenge, however, is that AI models tend to focus on problems that have already been extensively studied, while other important areas receive less attention. If we are not careful, medical advances may become concentrated in familiar areas, while other conditions remain under-explored, not because they are less important, but because there is less existing knowledge to guide AI systems. Another issue is that AI-driven drug design and treatment recommendations often rely on experimental findings generated in research labs that might not fully capture the complexity of real patients. Insights from research labs don’t always translate into effective treatments, and AI could amplify this gap if it’s not designed to bridge it. The opportunity is to build AI that makes discoveries and ensure that those discoveries lead to meaningful advances, bringing innovation to areas where it’s needed most.
Today, predicting a protein’s shape — down to nearly every atom — from its known sequence of amino acids is feasible, Zitnik said. The major challenge is linking those structures to their functions and phenotypes across various biological settings and diseases. About 20 percent of human proteins have poorly defined functions, and an overwhelming share of research — 95 percent — is devoted to just 5,000 well-studied proteins.
“We are addressing this gap by connecting molecular sequences and structures with functional annotations to predict protein phenotypes, helping move the field closer to being able to in-silico predict functions for each protein,” Zitnik said.
A long-term goal for AI in the lab is the development of “AI scientists” that function as research assistants, with access to the entire body of scientific literature, the ability to integrate that knowledge with experimental results, and the capacity to suggest next steps. These systems could evolve into true collaborators, Zitnik said, noting that some models have already generated simple hypotheses. Her lab used Procyon, for example, to identify domains in the maltase-glucoamylase protein that bind miglitol, a drug used to treat Type 2 diabetes. In another project, the team showed that Procyon could functionally annotate poorly characterized proteins implicated in Parkinson’s disease. The tool’s broad range of capabilities is possible because it was trained on massive experimental data sets and the entire scientific literature, resources far exceeding what humans can read and analyze, Zitnik said.
The classroom comes before the lab, and the AI dynamic of flexibility, innovation, and constant learning is also being applied to education. The Medical School has introduced a course on AI in healthcare and added a Ph.D. track on AI in medicine; it is planning a “tutor bot” to provide supplemental material beyond lectures and developing a virtual patient on which students can practice before their first nerve-wracking encounter with the real thing. Meanwhile, Rodman is leading a steering group on the use of generative AI in medical education.
These initiatives are a good start, he said. Still, the rapid evolution of AI technology makes it difficult to prepare students for careers that will span 30 years.
“The Harvard view, which is my view as well, is that we can give people the basics, but we just have to encourage agility and prepare people for a future that changes rapidly,” Rodman said. “Probably the best thing we can do is prepare people to expect the unexpected.”