Harvard’s growing interest in data science became even clearer in the third week of January, when the inaugural Harvard DataFest conference reached its capacity of 166, with several dozen students, researchers, and staff waitlisted.

“We were able to create new course materials, and build awareness of existing Harvard resources, for developing skills in working with data,” said Mercè Crosas, chief data science and technology officer at the Institute for Quantitative Social Science (IQSS), who organized the conference.

Showcasing the depth and breadth of data science training at Harvard, the conference was a joint effort of the Data Science Services, Program on Survey Research, and Center for Geographic Analysis groups at IQSS; the Research Computing groups at Harvard Medical School, the Faculty of Arts and Sciences, and Harvard Business School; the Harvard Data Management Working Group; the Harvard Chan Bioinformatics Core; the Office of the Vice Provost for Advances in Learning (VPAL); the Harvard Libraries; and Digital Arts and Humanities at Harvard.

Ista Zahn (front of room), IQSS data science specialist, walks participants through the basics of web scraping and cleaning up messy data. Photo by Dwayne Liburd

On Jan. 17-18, 32 speakers explored themes such as data cleaning, data workflow management, and data visualization. Sessions ran the gamut from hands-on workshops in Python, R, and D3 to expert panel discussions.

“We have an amazing data community with smart people working on data workflows, data curation, data visualization, and frontier-level algorithms, that is really spread out across Harvard, and smart students interested in all of this,” said Professor Dustin Tingley, faculty director at VPAL-Research. “DataFest is amazing because it’s bringing us all together to share and teach. I think only an appearance by Bruce Springsteen could make it better.”

For those who missed DataFest, Crosas had this advice: “You don’t need to wait until the next DataFest to get started with R, Python, and other tools. At IQSS, we offer workshops year-round for all Harvard affiliates through Data Science Services and the Center for Geographic Analysis, partnering with HBS Research Computing Services. Harvard Chan Bioinformatics Core and the Harvard Library [data visualization team and Wolbach Library] offer similar workshops.”

More about DataFest and course materials can be found at DataFest2017 and IQSS.

Kareem Carr is a Ph.D. candidate in the Department of Biostatistics at Harvard’s Graduate School of Arts and Sciences and was the instructor of the DataFest workshop on Text Analysis in Python.