The ability to search the actual text of millions of books — instead of just titles or summaries — will change the way students and academics conduct research, revealing a host of new sources invisible to current search methods, a Harvard University Library official working on the Google project said on March 28.
Dale Flecker, associate director of the Harvard University Library for planning and systems, gave an auditorium full of Harvard library, museum, and information technology administrators an overview of Harvard’s collaboration with Google Inc. to digitize a portion of the University’s enormous collection of library books. He also gave a tour of the capabilities of Google Inc.’s “book search” system, hinting that these capabilities are poised for continued rapid growth.
Flecker was just one of several speakers at a half-day conference, “Libraries, Museums, and Instructional Technology Program,” sponsored by the Provost’s Office. The conference, held at the Center for Government and International Studies (CGIS), featured speakers on topics including course Web sites, portals, and other digital resources; speakers on the Harvard University Library’s digital initiative; and breakout sessions on topics such as Geographic Information Systems, the Harvard Map Collection, Presidential Instructional Technology Fellows, and finding digital images at Harvard.
Harvard Chief Information Officer Daniel Moriarty said the conference provided an opportunity for leaders from Harvard’s libraries, museums, and instructional computing groups — which provide digital content and software used by faculty and students — to meet and potentially to collaborate.
Moriarty said the University has an impressive record of “distributed innovation” created by individuals across the University seeking to enhance teaching and research and find new ways to solve old problems. As that process moves forward, Moriarty said, it is important that lines of communication be established and remain open, so that innovations and resources are shared to gain the widest possible benefit.
“What is striking is the sheer amount of activity and innovation that is going on,” Moriarty said in introductory remarks. “I think it is collectively enormously impressive what you all are doing.”
Harvard University Library director and Pforzheimer University Professor Sidney Verba gave a brief history of the Harvard University Library. Verba said the digitization of the library catalog through Hollis created a unified Harvard University Library from the collections held in the dozens of libraries around the University.
Verba said that people congratulate Harvard University Library for creating the Hollis catalog, but they really have it backwards: it was the Hollis catalog, by creating a central resource to find books and other works held in various libraries, that created the Harvard University Library.
Even today, Verba said, students and faculty don’t realize just how complex the system is. They don’t worry about which School or department owns a particular book or journal, or who pays for the subscription or the overhead for the library where it’s held. All they care about is that they find their needed reference. And that, Verba said, is a mark of the success of the University library system.
“Our students don’t have any idea of the complexity behind it, and that’s a good thing,” Verba said.
Verba hailed the Google collaboration as the next breakthrough in the evolving character of digital libraries, a sentiment echoed by Flecker.
Flecker said Google’s project to digitize 15 million books is an enormous undertaking that will take at least 10 years. Google is negotiating with publishers to scan current books, but the vast collection of books held in libraries will eventually dwarf that effort, Flecker said.
Google’s current work with several prominent libraries, including Harvard’s, will scan and digitize only books in the public domain. In addition, however, some libraries are including books that are still in copyright, though with greatly restricted access to the content. A book search of a copyrighted work, Flecker said, will just show a page or, in some cases, a few paragraphs holding the text of interest.
Because of the need to scan so many books, industrial scanning methods are being used, meaning rare or fragile books are not being included.
“I bet that library books will shortly be the majority [of books included in the search] because there are an awful lot more old books in the world than there are new books,” Flecker said.
Flecker gave a short tour of Google’s book search engine, http://books.google.com, which links results with a variety of other useful tools, such as booksellers, maps showing places named in the book, and related items.
Flecker used an anecdote to illustrate the potential power of a text search as a research tool, telling the story of a doctoral student who, when she was getting close to completing her dissertation, used the book search and found 20 books central to her topic.
Flecker predicted that new kinds of scholarship will be enabled by such a search tool.
“Once you have all the text of all published works of the 19th century, you can do the kind of research on aggregate data you couldn’t if you had to read them one by one,” Flecker said.
Other speakers outlined coming changes to the Harvard University Art Museums (HUAM), which is considering new ways to allow access to its collections as it plans for pending renovations of existing facilities and a move into a new facility in Allston.
Susan Rogers, the manager of iCommons, and Paul Bergen, director of the Faculty of Arts and Sciences’ Instructional Computing Group, gave an overview of the growing use of technology and innovation in creating course Web sites and other technology-assisted ways to enhance learning.
Bergen said the Presidential Instructional Technology Fellows program, which matches computer-savvy students with faculty needing IT help, has been a great success, and said he’s beginning to see a change in how faculty are using technology.
Instead of asking for help creating Web sites and other tools for instruction, he said, faculty are increasingly asking for ways to improve communication between instructors and students.