Export of titles & scientific names in BHL now available for download

A series of files is now available for download that will enable libraries and other data providers to identify digitized titles available within BHL.

This suite of files also includes metadata about each volume scanned, as well as information about the millions of scientific names that have been identified throughout the BHL corpus and the pages on which those names occur.

Download files:

Documentation (updated 3/13/2009)
Download .zip file of all tables (147MB)

NOTE: These files represent a first cut at how we want to make data providers and libraries aware of the content within BHL. Yes, we will build services, including an OpenURL resolver, but for now our partners have asked for a low-barrier export that they can manipulate for their own specific uses. The files above are automatically generated from the BHL database on a monthly basis. The datestamp on the files themselves indicates when they were last generated.

If you are interested only in the titles we have digitized, and the items (“books” or “volumes”) for each title, you only need to download the (significantly smaller) files for the following tables:

The full .zip download is not for the faint of heart! It’s a monster file because it includes the export of the 36 million occurrences of scientific names (updated 3/13/2009; previously 27 million) identified in the BHL corpus through indexing by TaxonFinder.
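Because the names table is so large, it may be easier to stream it row by row than to load it into memory at once. The following is a minimal sketch in Python, not part of the BHL export itself. The member name `names.txt`, the tab delimiter, and the column name `NameConfirmed` are assumptions made for illustration; check the documentation file for the actual file names, delimiter, and columns.

```python
import csv
import io
import zipfile
from collections import Counter


def iter_rows(zip_path, member, delimiter="\t", encoding="utf-8"):
    """Yield one dict per row from a delimited text file inside a zip archive.

    The delimiter and encoding defaults are assumptions, not documented
    properties of the BHL export.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # errors="replace" substitutes undecodable bytes instead of
            # raising, so one bad character does not abort a long run.
            text = io.TextIOWrapper(
                raw, encoding=encoding, errors="replace", newline=""
            )
            yield from csv.DictReader(text, delimiter=delimiter)


def count_names(rows, name_column="NameConfirmed"):
    """Count how many rows mention each name (column name is hypothetical)."""
    counts = Counter()
    for row in rows:
        counts[row[name_column]] += 1
    return counts
```

A call such as `count_names(iter_rows("export.zip", "names.txt"))` would then read the file one row at a time and hold only the per-name tallies in memory, not the rows themselves.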

Finally, we are considering this version a “warts and all” export. Merging the contents of multiple library catalogues and streamlining the digitization process to avoid duplication are the biggest challenges we face in building BHL, and to be frank our metadata is far from pristine in these early stages of our project. We are building functionality that allows librarians at BHL institutions to curate these digital books in ways that make sense to both scientists and librarians and that accommodate the variety of ways in which historic works have been catalogued over time. It’s a challenge we’ve just begun to tackle, and we look forward to any and all feedback you care to provide.

September 11, 2008

Written by oneclickorders

1 Comment

Gerwin Kasperek, February 11, 2009 at 5:31 am

For our virtual library of biology (vifabio), we have downloaded BHL’s title data and included it in our federated search, so information on digitized taxonomic literature from BHL can be retrieved together with information on holdings of some German biology libraries and information from several bibliographic databases.

In our first approach, we had to limit our efforts to title information from the Title table and the TitleIdentifier table, neglecting data on specific volumes of serials or multi-volume titles. We used Library of Congress Subject Headings (extracted from Call numbers) to enrich title data with coarse descriptive terms for many titles. There have been many difficulties with heterogeneous character encoding, making any diacritical character a problem. Nevertheless, the task was no doubt worthwhile, and we see the integration of BHL’s title data as a great enhancement of our virtual catalogue (http://www.vifabio.de/servlet/Top/searchadvanced?language=en).

In the future, we intend to update our downloads from time to time, and we would greatly welcome any improvements in the data, especially regarding completeness of author information, and character encoding. Of course, for an implementation like ours, a dynamic interface to query bibliographic data in BHL would be very useful. But we know that these things form difficult tasks, and that the digitization process itself will be your focus for a long time.

(vifabio’s web pages are in German language, sorry, but some of the most important pages are available in English as well, and others will be so in the near future.)

About BHL

The Biodiversity Heritage Library (BHL) is the world’s largest open access digital library for biodiversity literature and archives. Headquartered at the Smithsonian Libraries and Archives in Washington, D.C., BHL operates as a worldwide consortium of natural history, botanical, research, and national libraries working together to digitize the natural history literature held in their collections and make it freely available for open access as part of a global “biodiversity community.”
