Books meet bytes

At Radcliffe, thoughts on the future of digital library collections

The digital age is shaking the world of libraries, changing patterns of readership, information retrieval, and perhaps even brain circuitry.

The dance toward the digital drew archivists from around the world to Harvard on Thursday. The occasion was a two-day workshop on technology and archival processing at the Radcliffe Institute for Advanced Study.

First was a look at the Digital Public Library of America, a free Internet platform that links U.S. libraries, archives, and museums. So far, viewers have access to about 6 million items, three times more than when DPLA went live just 11 months ago. A one-click link to all these books, pictures, and artifacts depends on 11 service hubs (Harvard is one) and about 1,200 smaller content suppliers in 15 states.

The very idea of DPLA had been conceived in the same room where this week’s workshop was held, the main hall of the Knafel Center. In 2010, a Radcliffe seminar convened by Carl H. Pforzheimer University Professor and Director of the University Library Robert Darnton brought 40 experts together around an idea dreamed of for two decades: a collective American library open to everyone on the Internet. “It really is great,” said Dan Cohen, “to be back at the place this all began.”

A historian, Cohen is DPLA’s first executive director and he delivered the workshop’s first-day keynote address. Among his messages: The digital age is moving libraries from static repositories to dynamic platforms — “modern discovery systems,” he said, that are open, interoperable, international, and poised to absorb the “mass digitization” the future will bring.

To round out the afternoon, the attendees heard a panel of young scholars — two archivists and two historians — ruminate on a more technical issue: the digital transformation of “finding aids.” That may seem like a walk in the weeds, but the idea is simple. Finding aids are the tools — catalogs, calendars, and annotated lists — that describe the contents of a particular collection. Without them, researchers would be lost in a sea of paper, without a moon or stars or an astrolabe.

Scholars Rhae Lynn Barnes, Maureen Callahan, Suzanne Kahn, and Trevor Owens asked: Are finding aids a technology the future will need? (One subtext being that they take a lot of time — that is, money.)

The collective answer, strained through caveats and jargon, was: Yes. The future of libraries, and the archives at the heart of them, will likely be a hybrid of old and new. On the analog side, archivists will still investigate collections thoroughly enough to describe contents, provide context, and even supply a sense of emotional fullness. “This is a choice to guide rather than map,” said Callahan, an archivist at New York University. She described finding aids as “sense-making.”

Kahn, a fifth-year doctoral student in history at Columbia University, is a veteran researcher at a half-dozen prominent archives. The experience left her with the impression that “it’s really invaluable, what archivists do,” she said. The core of that is the finding aid, a tradition that Kahn said was praised by every student in a senior thesis seminar, even though they “live online absolutely more than I do.”

Barnes, a Ph.D. candidate in history at Harvard, is a champion of the digital realm. She is a co-founder of U.S. History Scene, a curriculum website, and a believer in digital archives’ power to “get out new stories in new ways” and “complicate the canon” with multimedia holdings that in turn will require multimedia finding aids. At the same time, she said of old and new finding aids, “there’s no reason those two systems can’t coexist.”

The emerging technology of those new finding aids is the backbone of an invitation-only workshop today. In the morning: a session on optical character recognition technologies, which have the potential to read handwritten documents, led by plenary speaker Lambert Schomaker, a Dutch professor of artificial intelligence. Schomaker directs the landmark Monk software project on machine-recognition of handwriting.

An afternoon session will investigate advances in digital technologies for facial recognition (to scan digitally archived images) and speech recognition (for the new frontier of archived audio). Introducing it will be Cat Holbrook, a manuscript cataloger and processor at the Radcliffe Institute’s Schlesinger Library.

Tools like these fit squarely within the digital future that Owens imagines for archives, one that includes robotlike programs that he said will do some of the heavy lifting in the realm of finding aids. “The value archivists add can be supplemented by other means,” he said. That includes, perhaps, a future in which the task of “summarizing and contextualizing records” for finding aids is left to what Owens called “cyborg overlords.”

He outlined how archivists might unpack and organize an acquisition from the digital realm. His test case: 300 floppy disks containing 19,000 documents in WordPerfect, a technology dating back 20 years. Future archivists might rely on topic modeling, a probabilistic strategy used to uncover thematic structures within large blocks of text.

But in the end, the primacy of the archivist stands. “Topics don’t mean anything until human beings interpret them,” he said. “Most important is the judgment call of the archivist.”

Still, won’t the future be cool, even in libraries? Owens, a millennial who displayed affection for the word “awesome” during the panel, likened digital tools for the archivist to a means of enhancing traditional analog skills. Imagine, he said, “a mechanized shirt of armor that extends your capabilities.”