Export of titles & scientific names in BHL now available for download

A series of files is now available for download that will enable libraries and other data providers to identify digitized titles available within BHL.

This suite of files also includes metadata about each volume scanned, as well as information about the millions of scientific names that have been identified throughout the BHL corpus and the pages on which those names occur.

Download files:

NOTE: These files represent a first cut at how we want to make data providers and libraries aware of the content within BHL. Yes, we will build services, including an OpenURL resolver, but for now our partners have asked for a low-barrier export that they can manipulate for their own specific uses. The files above are automatically generated from the BHL database on a monthly basis. The datestamp on the files themselves indicates when they were last generated.

If you are interested only in the titles we have digitized, and the items (“books” or “volumes”) for each title, you only need to download the (significantly smaller) files for the following tables:

The full .zip download is not for the faint of heart! It’s a monster file because it includes the export of the 36 million occurrences of scientific names (updated 3/13/2009; previously 27 million) identified in the BHL corpus through indexing by TaxonFinder.
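For anyone who does want to work with the name occurrences, here is a minimal sketch of one way to read a very large file straight out of a .zip, one row at a time, rather than loading it all into memory. It is an illustration only: the function name, the member file name, the column name, and the tab-delimited UTF-8 layout with a header row are all assumptions, so check them against the files you actually download.

```python
import csv
import io
import zipfile
from collections import Counter


def count_name_occurrences(zip_path, member, name_column, delimiter="\t"):
    """Stream one delimited file out of a .zip and tally rows per name.

    Assumptions (not confirmed by the export itself): the member is
    UTF-8 text, has a header row, and uses `delimiter` between fields.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        # archive.open() yields a binary stream; wrap it so csv sees text.
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # DictReader pulls one row at a time, so the full file is
            # never held in memory; only the per-name tallies are.
            for row in csv.DictReader(text, delimiter=delimiter):
                counts[row[name_column]] += 1
    return counts
```

A call might look like `count_name_occurrences("export.zip", "names.txt", "NameString").most_common(10)`, where the archive, member, and column names are placeholders for whatever the real export uses.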

Finally, we are considering this version a “warts and all” export. Merging the contents of multiple library catalogues and streamlining the digitization process to avoid duplication are the biggest challenges we face in building BHL, and to be frank, our metadata is far from pristine in these early stages of our project. We are building functionality that allows librarians at BHL institutions to curate these digital books in ways that make sense to both scientists and librarians and that accommodate the variety of ways in which historic works have been catalogued over time. It’s a challenge we’ve just begun to tackle, and we look forward to any and all feedback you care to provide.

Written by

Chris Freeland served as the BHL Technical Director from 2006-2012. He is currently the Director of the Open Libraries program at Internet Archive. In this capacity he works with libraries & publishers to digitize their collections, working towards the Archive’s mission of providing “universal access to all knowledge.”