A team of researchers recently developed an artificial intelligence model that can predict which coronavirus variants will likely dominate and cause surges. The work was led by Jacob Lemieux, an assistant professor of medicine at Harvard Medical School and Massachusetts General Hospital, and Pardis Sabeti, a member at the Broad Institute of MIT and Harvard, professor of organismic and evolutionary biology at Harvard’s Faculty of Arts and Sciences, and of immunology and infectious diseases at the Harvard T.H. Chan School of Public Health. It also benefits from the work of AI researchers Fritz Obermeyer and Martin Jankowiak, who joined the Broad in 2020 from Uber AI Labs, where they developed a machine-learning model that can handle massive amounts of data and that provided a foundation for the latest work. The Gazette spoke with Lemieux and Sabeti about the new AI/machine-learning model, called PyR0 (pie-R-naught), and how it will help in the current pandemic and for diseases to come.
GAZETTE: You and colleagues have developed a machine-learning model that predicted the emergence of at least two particularly transmissible SARS-CoV-2 variants that caused a lot of illness globally. Can you tell us a little bit about that?
LEMIEUX: The clearest prediction that the model made was that, among the Omicron sub-lineages, BA.2 was the fittest. At the time that we analyzed the data, it was a BA.1 (Omicron) epidemic and BA.1 was the variant that everyone was focused on, in South Africa initially and then just about everywhere else in the world. The model made a fairly strong and confident prediction that BA.2 was fitter. That prediction was based on BA.2’s dynamics in a few locations, mainly India and Denmark, and it turned out to be quite accurate. Since that time, BA.2 has taken over from BA.1 just about everywhere, and BA.4 and BA.5 are actually sub-lineages of BA.2. That was a vote of confidence in the model’s ability to at least forecast dynamics.
We also conducted a retrospective analysis of what the model would have predicted in different regions and globally. The model would have picked up the alpha variant, B.1.1.7, and it would have picked up delta, around the same time that these lineages were picked up by leading, highly collaborative, and very labor-intensive surveillance efforts. So, we think it complements, but doesn’t replace, people staring really hard at the data and fitting focused models on individual regions. The nice thing is that it can process all the data at once and aggregate information across regions, which is something that can be hard for a single person to do. It’s a useful tool in that regard.
GAZETTE: With BA.4 and BA.5 taking off this summer, what have recent runs told you about the course of the pandemic to come?
LEMIEUX: The model currently suggests that BA.2.75 is one to watch, although it doesn’t think the fitness differences are too great relative to other circulating variants. This suggests BA.2.75 may take over in some places but probably won’t change the pandemic in a major way.
GAZETTE: Does it say anything about disease severity?
LEMIEUX: Nothing. Growth rate is just one microbial phenotype, but there are so many other microbial phenotypes, like disease severity, that probably also have a genetic basis and that hopefully we’re going to be able to figure out using approaches like this. There’s already been a lot of work in this area for drug resistance; that’s been the one where we’ve had a good link between microbial genotype and microbial phenotype. So, I’m optimistic that with the growing scale of data and the new algorithmic tools and increasing computing power, we’ll be able to tackle some of these questions.