BHL, Internet Archive, and the eBook

On October 24-25, BHL Program Director Martin Kalfatovic and BHL Program Manager Grace Costantino attended the Internet Archive Leaders Forum in San Francisco, CA. The meeting, attended by 23 representatives from projects partnering with Internet Archive, provided attendees with a chance to showcase their projects, discuss their collaborations with IA, and nurture ideas associated with eBooks.

Since BHL’s inception, we have partnered with Internet Archive for digitization services. Whether by sending large shipments of books to local IA mass scanning centers, or by operating a single scanning machine provided by IA within their own institution, most BHL members have contributed material to BHL via IA collaborations. While we ingest, store, and serve book metadata via our own hardware, the page images on BHL are served directly from Internet Archive. As such, even books scanned outside of IA are ultimately ingested into the Archive (via software developed by Joel Richard at the Smithsonian Libraries) so they can be served via the BHL portal.

The Internet Archive was founded in 1996 by Brewster Kahle as a non-profit digital library that provides free, unrestricted access to digitized materials, including not only books but also music and images and, through the Wayback Machine, an archive of the Internet itself. The Archive recently began work in television, archiving all news produced over the past three years on twenty news channels. On the last day of the IA Leaders Forum, the Archive announced that it now houses over 10 petabytes of digitized material.


Although the meetings took place at the San Francisco Public Library Richmond Branch, just a few blocks from the IA headquarters, Robert Miller, IA’s Global Director of Books and the meeting’s emcee, gave attendees a tour of the main IA building – a repurposed former Christian Science facility. Though transformed to meet the needs of a digital archive, the building still maintains the charms of the 100-year-old church, including marble drinking fountains, stained glass, and a massive auditorium whose pews are replete with statues crafted in the likeness of IA employees who have been with the project at least three years – a practice modeled after the famous “Terracotta Army” of China’s first emperor, Qin Shi Huang.

The theme of this year’s meeting was “eBook Lending Library: the 2nd Million eBooks.” In early 2011, IA launched an In-Library Lending Program through their OpenLibrary project. The project now provides access to over 1,000,000 free eBooks, 250,000 of which are contemporary (post-1923), with another 200,000 contemporary titles available in DAISY format for the print disabled. These eBooks are freely available to all participating libraries and OpenLibrary users and can be downloaded directly to your computer or eReader, such as a Kindle. For more information about the OpenLibrary project, visit this website. IA is working diligently to provide another 1,000,000 eBook titles in OpenLibrary, and has already obtained funding for 500,000 of those titles.

This meeting was an opportunity for IA’s stakeholders (those of us who partner with IA for digitization services) to help shape the future of the OpenLibrary lending program. Eleven states, representing 110 million people, are currently members of the program (meaning residents of those states can freely “check out” the contemporary eBooks in OpenLibrary), and IA hopes to extend that access to another 50 million people by Q1 of 2013. IA also has an MOU in place with COSLA (Chief Officers of State Library Agencies) to implement the eBook lending program in all 50 states and the District of Columbia. For many libraries, particularly smaller state libraries, OpenLibrary may constitute the only eLending program available to patrons.

IA particularly hopes to build the contemporary collection in OpenLibrary, and discussions at the meeting focused on brainstorming ways to encourage participation among the organizations represented around the table. Of particular concern for many present were the legal implications of participation, as well as ensuring adequate access to popular titles within the collection, which essentially means ensuring that multiple copies of each eBook are available.

Each participant also had a chance to talk about eBooks in the context of their own project and to elaborate on their requirements for such a program. Grace Costantino provided an overview of the BHL project, highlighting that users can freely download PDFs, custom PDFs, OCR, and high resolution images from BHL. The “View at Internet Archive” option in BHL also allows users to download additional eBook formats generated by IA and based on uncorrected OCR. BHL has also expanded eBook possibilities with its recent collections in iTunes U. However, what makes BHL particularly valuable for users is the additional services we provide on top of the digitized texts, chiefly our name finding technology, which locates species names throughout the entire corpus. As Costantino articulated, if eBooks are going to work for the science community, they must be openly available, users must be able to “mine” them for the data required, and name finding services and linked data options must be supported.

We are entering an exciting time in this digital era. Open access projects like Internet Archive and BHL are providing unprecedented access to materials formerly confined to physical availability in select libraries. Such projects remove the barriers to research and repatriate knowledge to all parts of the world. As we explore ways to make this content even more usable and accessible, it’s exciting to know that projects like IA are listening to the needs of their stakeholders and striving to build programs that fulfill those requirements. While BHL’s primary eBook efforts will continue to revolve around improving PDF download functionality (with various other eBook options available through the IA link in BHL – see fig. 1 above) and expanding collaborations with iTunes, we’re invested in exploring eBook options with IA and OpenLibrary, and are eager to see where these initiatives take us!


Avatar for Michelle Strizever
Written by