Additions to Text Exports Coming Soon

The BHL website was recently updated with new links for downloading content. The TSV Data Exports will be updated on 1 September 2019 to mirror this change.


Recently, we added new URLs to the site to make it easier to get the text, images, or PDFs of items at BHL. When viewing an item (for example, Darwin’s Origin of Species), the Download Contents > Download Book option presents four choices for downloading its contents. Three of these choices now have new, normalized URLs.

We added these links because we discovered some inconsistencies in how our content was connected to the Internet Archive. Additionally, now that BHL Partners can upload transcribed text, we needed a way to download the updated text rather than the original OCR.

What has changed?

In summary, three new URL fields are being added to the tab-delimited Item (volumes) TSV download files. The presence of these fields will affect any downstream process that relies on the order of the fields. Please review your code if it relies on the field order instead of the field names of the TSV file.

In the past, the fields were:

ItemID, TitleID, ThumbnailPageID, BarCode, MARCItemID, CallNumber, VolumeInfo, ItemURL, LocalID, Year, InstitutionName, ZQuery, CreationDate

On 1 September 2019, the fields will change to the following:

ItemID, TitleID, ThumbnailPageID, BarCode, MARCItemID, CallNumber, VolumeInfo, ItemURL, ItemTextURL, ItemPDFURL, ItemImagesURL, LocalID, Year, InstitutionName, ZQuery, CreationDate

These fields mirror those of the new links mentioned above and will save you from needing to create the URLs to download content.
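One way to avoid depending on field order is to read the file by header name. The following is a minimal sketch in Python, assuming the export's first line is a header row containing the field names listed above. The sample rows, the example.org URLs, and the helper names read_items and download_urls are hypothetical, and the samples use only a subset of the columns to stay short.

```python
import csv
import io

# New columns announced for the Item TSV export (names taken from the lists above).
NEW_URL_FIELDS = ("ItemTextURL", "ItemPDFURL", "ItemImagesURL")


def read_items(tsv_file):
    """Yield one dict per row, keyed by the header names on the first line."""
    # QUOTE_NONE treats quote characters as ordinary text, which suits plain TSV.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


def download_urls(row):
    """Return the new URL fields for a row; None where a column is absent."""
    return {name: row.get(name) for name in NEW_URL_FIELDS}


# Hypothetical sample data: the IDs and example.org URLs are made up,
# and only a subset of the real columns is shown.
OLD_SAMPLE = (
    "ItemID\tTitleID\tItemURL\tLocalID\n"
    "1\t10\thttps://example.org/item/1\tabc\n"
)
NEW_SAMPLE = (
    "ItemID\tTitleID\tItemURL\tItemTextURL\tItemPDFURL\tItemImagesURL\tLocalID\n"
    "1\t10\thttps://example.org/item/1\thttps://example.org/text/1\t"
    "https://example.org/pdf/1\thttps://example.org/images/1\tabc\n"
)

for sample in (OLD_SAMPLE, NEW_SAMPLE):
    for row in read_items(io.StringIO(sample)):
        # Lookups by name give the same LocalID whether or not the new columns exist.
        print(row["ItemID"], row["LocalID"], download_urls(row))
```

Because every value is looked up by name, the same code reads LocalID correctly from both the old and the new layout. A positional lookup such as the ninth column would return LocalID before the change and ItemTextURL after it.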

Please update your code or processes if necessary!

Written by

Joel Richard is the head of the Web and IT department for the Smithsonian Libraries and Archives, and the Technical Coordinator for the Biodiversity Heritage Library. Joel is also the creator and developer of the Macaw software used by BHL partners to add content to BHL.