Harvard and MIT release de-identified learning data from open online courses

A research team from Harvard University and MIT has released the de-identified learning data, its third and final promised deliverable from an initial study of online learning based on each institution’s first-year courses on the edX platform.

Specifically, the dataset contains the original learning data from the 16 HarvardX and MITx courses offered in 2012-13 that formed the basis of the first HarvardX and MITx working papers (released in January) and underpin a suite of powerful open-source interactive visualization tools (released in February).

The dataset went through a careful de-identification process that removed personally identifiable information using best practices, including aggregation, anonymization via random identifiers, and blurring to reduce the individuality of sensitive data fields, among other techniques.
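
For readers curious what such techniques can look like in practice, here is a minimal Python sketch of the three named above. It is an illustration only, not the research team's actual pipeline: the field names (`user_id`, `email`, `birth_year`, `country`, `grade`), the `deidentify` function, and the `min_group_size` threshold are all hypothetical.

```python
import secrets
from collections import Counter


def deidentify(records, min_group_size=2):
    """Return de-identified copies of `records` (a list of dicts).

    Hypothetical field names; not the schema of the released dataset.
    """
    id_map = {}  # original user id -> random identifier
    # Count how many records mention each country.
    country_counts = Counter(r["country"] for r in records)

    cleaned = []
    for r in records:
        # Anonymization: replace the real id with a random one,
        # reusing the same random id for the same learner.
        if r["user_id"] not in id_map:
            id_map[r["user_id"]] = secrets.token_hex(8)

        # Blurring: keep only the decade of birth, not the exact year.
        birth_decade = (r["birth_year"] // 10) * 10

        # Aggregation: fold countries that appear in fewer than
        # `min_group_size` records into a single "Other" bucket.
        country = r["country"]
        if country_counts[country] < min_group_size:
            country = "Other"

        # Direct identifiers such as email are simply not copied over.
        cleaned.append({
            "anon_id": id_map[r["user_id"]],
            "birth_decade": birth_decade,
            "country": country,
            "grade": r["grade"],
        })
    return cleaned


# Made-up example records.
records = [
    {"user_id": "alice01", "email": "a@example.com", "birth_year": 1987,
     "country": "US", "grade": 0.9},
    {"user_id": "bob02", "email": "b@example.com", "birth_year": 1992,
     "country": "IS", "grade": 0.5},
    {"user_id": "alice01", "email": "a@example.com", "birth_year": 1987,
     "country": "US", "grade": 0.7},
]
result = deidentify(records)
```

In this sketch the two records for the same learner share one random identifier, exact birth years become decades, and the country that appears only once is reported as "Other". Real de-identification work typically involves additional checks beyond what is shown here.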