Harvard Data Science Initiative co-directors David Parkes and Francesca Dominici during the HDSI 2017 launch. Today, a collaboration between HDSI and Amazon Web Services was announced.

File photo by Kris Snibbe/Harvard Staff Photographer

Applying cloud computing to major global problems

Harvard Data Science Initiative, Amazon Web Services join to boost research, unlock solutions to health, climate, economic challenges

Today, the Harvard University Data Science Initiative announced the AWS Impact Computing Project at the HDSI, a collaboration with Amazon Web Services (AWS) aimed at reimagining data science to identify potential solutions for society’s most complex challenges.

The alliance, enabled by Harvard’s Office of Technology Development, will support current and future faculty projects by creating new data solutions that will amplify the University’s social impact and transform research capabilities.

Since its launch in 2017, the HDSI has advanced data science methodology and application across Harvard and has sought out opportunities presented by new data streams to better understand society. It brings together experts from across the University to work on large, complex, and evolving issues through the lenses of artificial intelligence, computer science, and statistics, with attention to policy and context. The AWS Impact Computing Project at the HDSI will empower faculty, students, and researchers at Harvard with new funding and new capabilities to accelerate their work and expand its impact, unlocking new possibilities for learning and discovery through data.

“This alliance will bring together researchers from across the University, representing multiple disciplines and areas of focus, with collaborators at AWS. They will address some of the most consequential problems facing humanity and the world. Whether the topic is socioeconomic and racial aspects of health disparities, or the climate crisis, the application of novel data-science approaches will lead to deeper insights and better designed solutions,” said Harvard Provost Alan M. Garber.

The Gazette spoke with HDSI co-directors Francesca Dominici, the Clarence James Gamble Professor of Biostatistics, Population and Data Science, and David Parkes, the George F. Colony Professor of Computer Science, to learn more about the alliance, what it means for the field of data science, and how it will help further HDSI's research and teaching mission.

GAZETTE: The AWS Impact Computing Project at the HDSI will be a catalyst for impact computing. How would you describe that field to someone who has never heard of it before?

DOMINICI: Impact computing reimagines data science with the goal of addressing and finding potential solutions to society's biggest challenges. The idea is that large, complex problems in society require sophisticated solutions that harness new types of data, new ways of making that data usable, new methods to examine that data, and new ways of leveraging underlying computing power, all with the input of the populations and communities that those solutions are built for.

PARKES: We’re thinking of issues like social determinants of health, mass migration, and economic resilience, to name a few. To study any of these challenges requires working with complex data, both historical and that which is generated in real time. Harvard faculty are already leading advances in extracting scientific understanding from these data, but we’ve heard from many of them that they’re eager to do more and work at new scale. Our vision of impact computing will require building new coalitions between academia, industry, governments, and NGOs to create better outcomes for stakeholders everywhere.

GAZETTE: How will the HDSI and this new project with AWS help advance impact computing and data science more broadly? And what are the gaps in the field that this project will be able to fill because of this collaboration?

DOMINICI: We’re really taking a bottom-up approach by beginning with our faculty and research colleagues and listening to the data bottlenecks that they face, be it access to better data, resolving issues around data ownership, or access to computational scale. AWS is already a major player in data science and high-performance computing, especially in building solutions to real-world challenges. By bringing these groups together, our goal is to help faculty unstick these bottlenecks, for example improving access to data or building data environments that enable new kinds of work.

PARKES: I’d add that those solutions will be designed so that others at Harvard and beyond can incorporate them into their work in the future. This is a new kind of impact. In one way, you can view the cloud computing that this alliance allows us to engage in as a democratizing force — it enables access to data, access to tools, access to the high-performance computing environments that are required for this ambitious work. We hope the project will act as an accelerant for Harvard as a whole and beyond. I think there is an incredible opportunity here to leverage investments that have been made around campus — for example, the Kempner Institute and the Salata Institute — that directly address hard questions such as understanding natural and artificial intelligence or seeking durable climate solutions. The HDSI works side-by-side with these groups to advance the data science that will help realize their goals. And our alliance further supports and catalyzes that work.

“This is an opportunity, through data science, to simulate large complex problems … Combined with the resources of Harvard, with our expertise in computer science, and fields like economics, this collaboration means we’re in a strong position to have an impact on these issues.”

David Parkes

GAZETTE: So, what are the top societal challenges that HDSI will tackle in this collaboration?

PARKES: Prioritizing where to begin has been a challenge in itself — there are so many issues Harvard faculty are working on, and all of them are urgent. That being said, there are some opportunities that we can see taking shape as quickly as within the next few months. One challenge that we’re eager to explore is food insecurity related to droughts and climate change. Our colleague Peter Huybers, in Earth and Planetary Sciences, is at the forefront of interpreting satellite data, and wants to use this data to understand how climate change impacts food disparities in places like Madagascar, where the issue is so far-reaching, but the data that can be used to solve the problem — from satellites, from maps of agriculture, from yield outcomes — is incomplete.

We’ve also learned that there are similar efforts being conducted elsewhere in Africa, including in South Africa. We’re now bringing these groups together to share learnings, and hope that in the near term, we’ll be able to build a new community that can drive systematic understanding of the drivers of food insecurity and address crises due to famine.

DOMINICI: I’d also mention Caroline Buckee from epidemiology, who is studying global problems such as crisis-driven mass migration, including people fleeing the war in Ukraine. There are massive data-engineering problems associated with tracking people as they cross borders, associated with geographic scale, working through different regulatory environments, and handling data privacy and important ethical concerns. At the same time, there is an urgent need to use data effectively in responding to humanitarian crises, and driving good policy decisions, and part of this is an urgent need to be able to run statistical methods efficiently and at a new scale.

But beyond these two examples, there is huge interest in broad topics like understanding complex economic systems through multi-agent modeling, studying the drivers of trust and mistrust, and finding major social determinants of health, to name a few. Echoing David, there are so many urgent societal problems where we can hope to make positive impacts by enabling access to better technology, better data, and better partners for our faculty through this project. It’s really changing the speed with which Harvard faculty can respond to challenges as they arise.

GAZETTE: What difference will the gift component of the AWS Impact Computing Project at the HDSI make?

PARKES: The gift will allow HDSI to commit to a vision of building the new field of impact computing, and doing so in a way that respects data, respects methods, and respects the challenges themselves. AWS understood that HDSI needed flexible support to continue building a community beyond what we've done so far, for example by investing in undergraduate research programs or supporting graduate, postdoctoral, and faculty work through open funding calls. The data-science community at Harvard is hungry for these kinds of training and education opportunities. We see these activities as symbiotic with the new impact computing projects that we will be undertaking.

DOMINICI: To expand on David’s point, flexibility is key. We know we have only begun to scratch the surface of what it means to “do” impact computing. By committing their support, this alliance enables us to stay open to pursuing new opportunities that build the field of impact computing when they arise, for example piloting new projects, bringing in dedicated data-science expertise, or convening the community through events that celebrate and further the incredible work Harvard faculty are doing.

GAZETTE: How does this project build upon the work the Data Science Initiative has supported in its first five years?

DOMINICI: The HDSI launched in 2017 with the explicit goal of uniting computer scientists, statisticians, and domain experts to derive meaningful and actionable insights that shape the new science of data. We're proud of the community of faculty that we've brought together, who are doing work on a wide range of topics, from understanding and preserving democracy, to improving AI modeling of chest X-rays, to creating new methods that can establish causal effects beyond correlation. But it was the overlapping crises of 2020 (the emergence of COVID-19 and our national reckoning with systemic racism in the aftermath of the murder of George Floyd) that allowed Harvard's data-science community to use the full spectrum of our ability to amplify the impact of research, moving in real time with community stakeholders. This is the future of data science impact computing that we envision building with AWS: more faculty involvement, larger projects, and at a faster speed, and being able to bring people together who would not otherwise be working together.

On Nov. 15-16, HDSI will host a conference showcasing data science in research and education through panels, keynotes, workshops, and tutorials featuring speakers from across Harvard, academia, and industry. Please go to www.hdsiconference.org for more information.