This suite of files also includes metadata about each volume scanned, as well as information about the millions of scientific names that have been identified throughout the BHL corpus and the pages on which those names occur.
Download files:
- Documentation (updated 3/13/2009)
- Download .zip file of all tables (147MB)
If you are interested only in the titles we have digitized, and the items ("books" or "volumes") for each title, you only need to download the (significantly smaller) files for the following tables:
- Documentation
- Download contents of Title table as a tab-delimited text file. (4MB+)
- Download contents of TitleIdentifier table as a tab-delimited text file. (400KB)
- Download contents of Item table as a tab-delimited text file. (3MB+)
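As a rough illustration of how the tab-delimited files above might be combined, here is a minimal Python sketch that reads a title export and an item export and groups the items under their parent titles. It assumes each file starts with a header row, and the column names used here (`TitleID`, `FullTitle`, `ItemID`, `Volume`) are hypothetical placeholders; check the documentation for the actual field names. The sample rows are made up for the demonstration.

```python
import csv
import io
from collections import defaultdict


def read_tab_delimited(handle):
    """Yield one dict per row from a tab-delimited export with a header row."""
    # QUOTE_NONE treats quote characters as ordinary data, so a stray
    # double quote inside a title does not swallow the following fields.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    yield from reader


def items_by_title(title_rows, item_rows, key="TitleID"):
    """Pair each title row with the list of item rows sharing its key.

    `key` is an assumed column name, not one confirmed by the export.
    """
    items = defaultdict(list)
    for item in item_rows:
        items[item[key]].append(item)
    return {t[key]: (t, items.get(t[key], [])) for t in title_rows}


# Tiny made-up samples standing in for the downloaded files.
TITLE_SAMPLE = (
    "TitleID\tFullTitle\n"
    "1\tFlora of \"Example\" Island\n"
    "2\tJournal of Examples\n"
)
ITEM_SAMPLE = (
    "ItemID\tTitleID\tVolume\n"
    "10\t2\tv.1\n"
    "11\t2\tv.2\n"
    "12\t1\t\n"
)

titles = list(read_tab_delimited(io.StringIO(TITLE_SAMPLE)))
items = list(read_tab_delimited(io.StringIO(ITEM_SAMPLE)))
grouped = items_by_title(titles, items)

for title_id, (title, volumes) in grouped.items():
    print(title_id, title["FullTitle"], [v["Volume"] for v in volumes])
```

To run this against a real download, replace the `io.StringIO(...)` handles with something like `open(path, encoding="utf-8", errors="replace", newline="")`, where `path` is wherever you saved the file. The UTF-8 encoding is an assumption; `errors="replace"` keeps the read from failing on unexpected bytes at the cost of substituting a replacement character.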
Finally, we are considering this version a "warts and all" export. Merging the contents of multiple library catalogues and streamlining the digitization process to avoid duplication are the biggest challenges we face in building BHL, and, to be frank, our metadata is far from pristine in these early stages of our project. We are building functionality that allows librarians at BHL institutions to curate these digital books in ways that make sense to both scientists and librarians and that accommodate the variety of ways in which historic works have been catalogued over time. It's a challenge we've just begun to tackle, and we look forward to any and all feedback you care to provide.
Chris Freeland
BHL Technical Director
chris dot freeland at mobot dot org
1 comment:
For our virtual library of biology (vifabio), we have downloaded BHL's title data and included it in our federated search, so information on digitized taxonomic literature from BHL can be retrieved together with information on holdings of some German biology libraries and information from several bibliographic databases.
In our first approach, we had to limit our efforts to title information from the Title and TitleIdentifier tables, neglecting data on specific volumes of serials or multi-volume titles. We used Library of Congress Subject Headings (extracted from call numbers) to enrich the title data with coarse descriptive terms for many titles. There have been many difficulties with heterogeneous character encoding, making any diacritical character a problem. Nevertheless, the task was no doubt worthwhile, and we see the integration of BHL's title data as a great enhancement of our virtual catalogue ( http://www.vifabio.de/servlet/Top/searchadvanced?language=en ).
In the future, we intend to update our downloads from time to time, and we would greatly welcome any improvements in the data, especially regarding completeness of author information and character encoding. Of course, for an implementation like ours, a dynamic interface for querying bibliographic data in BHL would be very useful. But we know that these are difficult tasks, and that the digitization process itself will be your focus for a long time.
(vifabio's web pages are in German, sorry, but some of the most important pages are available in English as well, and others will be in the near future.)