At Dataverse Community Meeting, an emphasis on data quality

More than 180 attendees gathered for the Dataverse Community Meeting last month. Photo by Dwayne Liburd

During the fifth annual Dataverse Community Meeting from June 19 to 21, more than 180 participants, representing over 70 universities and research organizations from around the world, gathered in Harvard’s CGIS South building to learn about, discuss, and improve Dataverse, a software platform for publishing, citing, and archiving research data. Led by a team at Harvard’s Institute for Quantitative Social Science (IQSS), a growing global community of developers, librarians, archivists, and researchers develops the open source software, which is used by 46 production sites on six continents to host data repositories.

Attendees and speakers held presentations, workshops, panels, and an ideathon/hackathon, working on issues related to reproducibility of research, scaling outreach programs, software integrations for increasing Dataverse’s data storage options, and enhancing interoperability between data repositories and tools for visualization and curation.

“One of the hardest but most essential elements of an open source software project is community engagement,” said Merce Crosas, chief data science & technology officer at IQSS and university research data officer at the office of the vice provost for research. “This year’s community meeting has shown that community engagement is an unquestionable success of the Dataverse project. While we celebrate our project’s success, we are also reminded of the importance of our work and the work we have ahead. Through Dataverse repositories now hosted on six continents, we are together making data accessible to more and more people, enabling them to verify research results, develop evidence-based decisions, and solve future challenges with high-quality, comprehensive data.” 

Crosas and Jonathan Crabtree spoke about the Global Dataverse Community Consortium, which formed a year ago with Crosas, Crabtree (UNC), and Peter Doorn (DANS) as chairs. The consortium has secured bulk pricing for registering persistent identifiers for Dataverse repositories and plans to provide services to support the Dataverse community.

Attendees heard from Martha Whitehead, who was just two weeks into her tenure as vice president for the Harvard Library and Roy E. Larsen Librarian for the Faculty of Arts and Sciences. Drawing on experiences as vice provost and university librarian at Queen’s University in Canada, Whitehead shared her vision for the library’s role in influencing the layers of policy, infrastructure, and services needed to facilitate data creation, use, and preservation.

A look at the agenda and presentation slides also shows how the community’s work continues to evolve, with an emphasis on increasing the quality of shared data. Yin Shenqin, director of the Science Data Center at Fudan University in Shanghai, spoke about the Shanghai Municipal Education Commission’s highly distributed deployment of Dataverse installations across dozens of universities and how they’ve scaled educational initiatives that help researchers and administrators improve how they share data.

Ceilyn Boyd, Harvard Library’s research data program manager, moderated a panel discussion with administrators of four Dataverse repositories, including Harvard’s free and open data repository Harvard Dataverse. The administrators spoke about the benefits and challenges of different curation models.

To learn more about Dataverse and how to get involved, visit the Dataverse Project.