Harvard and MIT researchers reflect on open data in MOOCs


A follow-up study led by a joint team of Harvard and MIT researchers explores the promise and perils of de-identifying learner data from MOOCs (massive open online courses) and offers recommendations on how to balance privacy with open data.

The dataset, made available in May, contains the original learning data from the 16 HarvardX and MITx courses offered in 2012–13. These data formed the basis of the first HarvardX and MITx working papers (released in January) and underpin a suite of open-source interactive visualization tools (released in February).

Led by John P. Daries, Senior Research Analyst at MIT (Institutional Research/Office of the Provost), the new report examines the team's motivations for releasing learner data, the current regulatory framework governing student privacy, the team's efforts to comply with those regulations while creating an open dataset from MOOCs, and some analytical consequences of de-identification.

The report, "Quality social science research and the privacy of human subjects requires trust," is published in the online computer magazine ACM Queue.

Beyond MOOCs and online learning, the team expects the work to inform broader conversations about open data in the social sciences and to motivate either technological solutions or new policies that could allow open access to potentially re-identifiable data while governing how those data are used.

Daries's co-authors are Justin Reich (Harvard), Jim Waldo (Harvard), Elise M. Young (Harvard), Jonathan Whittinghill (Harvard), Daniel Thomas Seaton (MIT), Andrew Dean Ho (Harvard), and Isaac Chuang (MIT).