What crowdsourced big data may be able to tell us about COVID
Health information self-reported by app fuels infection prediction model
Among the pandemic’s biggest challenges for public health experts have been just how novel it is, how hard it’s been to come by sufficient useful data, and how few tools scientists have for accurately tracking and predicting its spread.
A recent study looking at information gathered by an app that 500,000 people use to log daily symptoms, health status, and exposures to COVID-19 hints at the possible role crowdsourced big data can play in understanding and predicting the spread of infection.
The analysis looked at self-reported data from the How We Feel app collected during April and May to determine which populations were likeliest to have been tested for the virus, the prevalence of social distancing and mask wearing, and what factors were most associated with people who tested positive in that period, such as key symptoms, exposure risks, preexisting medical conditions, and demographic information.
The study showed that Black and Latinx users, frontline health care workers, and essential workers had double the risk of infection compared with other groups after adjusting for socioeconomic factors and preexisting medical conditions, and that those same groups, along with people who were symptomatic, were likelier than others to be tested during April and May.
According to the researchers, this was a double-edged sword: While it meant sick people were being tested, it also meant asymptomatic cases were likely being missed because strict testing guidelines called for only those with symptoms to be checked. The team also found that 36 percent of app users who tested positive either reported symptoms not listed by the Centers for Disease Control and Prevention during the April-May timeframe or had no symptoms at all.
“The first message from the paper is that we should provide more widespread testing beyond the vulnerable groups and symptomatic subjects,” said Harvard Professor Xihong Lin, one of the paper’s senior authors. “Those asymptomatic and mildly symptomatic cases are still infectious, so it’s important to capture those people early and to isolate them in order to avoid the spread.”
The scientists then took their results and, using novel statistical and machine learning methods, laid the foundation for models that can predict who is likely to test positive for COVID-19. The hope is that predictive models like these can soon be used to help overcome testing capacity limitations and identify disease hotspots.
Researchers found that their models, which were cross-validated but need further analysis, had about an 80 percent chance of correctly forecasting whether an individual would test positive or negative.
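For readers unfamiliar with the term, cross-validation means repeatedly holding out part of the data, fitting a model on the rest, and scoring it on the held-out part. The sketch below illustrates only that general idea. It is not the study's method: the article does not describe the models' internals, so a deliberately simple cutoff on a hypothetical "symptom score" stands in for them, and every function name and all of the data are made up for illustration.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_threshold(scores, labels):
    """Stand-in model (an assumption, not the study's): choose the score
    cutoff that gives the highest accuracy on the training data."""
    best_cut, best_acc = 0, -1.0
    for cut in sorted(set(scores)):
        acc = sum((s >= cut) == y for s, y in zip(scores, labels)) / len(labels)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut

def cross_validated_accuracy(scores, labels, k=5):
    """Average held-out accuracy over k folds."""
    accuracies = []
    for held_out in k_fold_indices(len(scores), k):
        held = set(held_out)
        train = [i for i in range(len(scores)) if i not in held]
        # Fit only on the training folds, never on the held-out fold.
        cut = fit_threshold([scores[i] for i in train],
                            [labels[i] for i in train])
        correct = sum((scores[i] >= cut) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k

if __name__ == "__main__":
    # Hypothetical data: a 0-5 symptom score and a noisy test result.
    rng = random.Random(1)
    scores = [rng.randint(0, 5) for _ in range(200)]
    labels = [(s + rng.choice([-2, -1, 0, 0, 1, 2])) >= 3 for s in scores]
    print("held-out accuracy:", cross_validated_accuracy(scores, labels))
```

Because each fold is scored by a model that never saw it, the averaged figure is a fairer estimate of performance on new users than accuracy measured on the training data itself.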
The study was published in Nature Human Behaviour by a team of 36 researchers from Harvard, MIT, the Broad Institute of MIT and Harvard, and a number of other institutions.
The app is the first product from the How We Feel Project, a nonprofit created from a collaboration involving Lin, professor of biostatistics at the Harvard T.H. Chan School of Public Health and professor of statistics at the Faculty of Arts and Sciences; Feng Zhang of the Broad Institute; Gary King, Albert J. Weatherhead III University Professor and director of the Institute for Quantitative Social Science; and Pinterest CEO Ben Silbermann. Others working on the project include researchers from Cornell, Stanford, University of Pennsylvania, University of Maryland School of Medicine, Howard Hughes Medical Institute, and the Bill & Melinda Gates Foundation. Teams of independent volunteers also lent their assistance.
The researchers say the idea for the study and the app, which launched in April, sprang from a need to help fill sizeable information gaps on the rampant spread of the virus in the U.S.
“Understanding the features of the COVID-19 epidemic in the U.S. by analyzing large, real data is pivotal for guiding evidence-based policies on surveillance, screening, and control measures,” Lin said. “The findings from the analysis of the How We Feel data will help achieve this goal.”
The findings argue for the importance of widespread testing, especially because of the group’s findings on asymptomatic and mildly symptomatic cases, which the researchers said were likely underestimated. Lin notes that in a recent modeling study she and another team of researchers conducted for the outbreak in Wuhan, China, they found 87 percent of cases went undetected. A recent CDC serological survey found similar results in the U.S.
Coincidentally, the results of the How We Feel study were published the same week in August that the CDC modified testing guidelines to say people without symptoms didn’t need a test.
When it came to social distancing and using masks, the study found that while a substantial portion of users, 61 percent, ventured outside their homes on a daily basis from April to May, the majority reported complying with guidelines on distancing and face coverings.
Some trends troubled researchers, however.
Seven percent of those who received a positive test ignored it and went to work, though the vast majority reported quarantining at home for two to seven days. Researchers also saw that 3 percent of those who tested positive for COVID-19, 10 percent who tested negative, and 13 percent who weren’t tested went to work without masks. Those who had tested positive and those who had tested negative said they came into close contact with a median of one and four people, respectively, within three days.
“Given the evidence that mask wearing and social distancing is effective in slowing or preventing the spread of COVID-19, there was room to do better at the time in some parts of the country,” said William Allen, a junior fellow in the Harvard Society of Fellows and one of the paper’s lead authors. These numbers are likely not representative of the situation now, he said.
Along with Allen, numerous postdoctoral fellows and graduate students at Harvard, MIT, and the Broad Institute worked on the study, including co-first authors Han Altae-Tran, James Briggs, Xin Jin, Glen McGee, and Andy Shi.
Other findings from the How We Feel data showed that household and community exposure were major factors in infection. People living with someone who was infected were at 19 times the risk of testing positive themselves, and those exposed to someone in the community with the virus were at almost four times the risk. People living in high-density neighborhoods had almost double the risk of testing positive.
Forty percent of respondents who reported losing smell, taste, or both and were tested for the virus received positive results. The finding adds to growing evidence that the symptom is the greatest predictor of a positive test and that it could be used to distinguish the virus from the common flu.
While the results were provocative, the team noted the limitations of their study. Volunteers were self-selected, predominantly women (80 percent), and thus didn’t represent the general population. Also, a disproportionate number were from either Connecticut or California. How We Feel has a partnership with Connecticut, and Pinterest is based in California.
Researchers are currently focused on analyzing data from over the summer, further validating the prediction models they created, and studying data from a new emotional well-being module in the app that looks at mental health.
“[How We Feel] has continued to grow since our initial analysis,” said Zhang, the paper’s other senior author, who, with Lin, supervised all aspects of the work. “We are looking forward to sharing this rich data set with others and continuing to mine it for important insights that can help stop the spread of COVID-19.”