Additions to Text Exports Coming Soon
The BHL website was recently updated with new links for downloading content. The TSV Data Exports are being updated on 1 September 2019 to mirror this change.
Recently, we added new URLs to the site to make it easier to get the text, images or PDFs of the items at BHL. When viewing an item (for example, Darwin’s Origin of Species), the Download Contents > Download Book option presents four choices for downloading the contents of the item. Three of these now have new, normalized URLs:
- PDF: https://www.biodiversitylibrary.org/itempdf/124544
- All: (unchanged)
- JPEG 2000: https://www.biodiversitylibrary.org/itemimages/124544
- Text: https://www.biodiversitylibrary.org/itemtext/124544
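If a script has only an ItemID to work with, the three links above suggest a simple pattern. The following is a minimal sketch that assumes every item follows the same pattern as the example item 124544; the function name `item_download_urls` and the dictionary keys are made up for illustration and are not part of any BHL API.

```python
# Base address taken from the example links above.
BASE = "https://www.biodiversitylibrary.org"


def item_download_urls(item_id):
    """Build the PDF, images and text URLs for one item.

    Assumption: all items follow the pattern shown for item 124544.
    """
    return {
        "pdf": f"{BASE}/itempdf/{item_id}",
        "images": f"{BASE}/itemimages/{item_id}",
        "text": f"{BASE}/itemtext/{item_id}",
    }


urls = item_download_urls(124544)
for kind, url in urls.items():
    print(kind, url)
```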
We added these links because we discovered some inconsistencies in how our content was connected to the Internet Archive. Additionally, now that BHL Partners can upload transcribed text, we needed a way to download the updated text rather than the original OCR.
What has changed?
In summary, these three new URLs are being added as fields to the tab-delimited Item (volumes) TSV download files. The new fields will affect any downstream process that relies on the order of the fields. Please review your code if it relies on the field order instead of the field names of the TSV file.
In the past, the fields were:
ItemID, TitleID, ThumbnailPageID, BarCode, MARCItemID, CallNumber, VolumeInfo, ItemURL, LocalID, Year, InstitutionName, ZQuery, CreationDate
On 1 September 2019, the fields will change to the following:
ItemID, TitleID, ThumbnailPageID, BarCode, MARCItemID, CallNumber, VolumeInfo, ItemURL, ItemTextURL, ItemPDFURL, ItemImagesURL, LocalID, Year, InstitutionName, ZQuery, CreationDate
These fields contain the new links described above, so you no longer need to construct the download URLs yourself.
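One way to avoid depending on field order is to read the export by field name. The sketch below uses Python's standard `csv` module and assumes the export begins with a header row containing the field names listed above. The sample row is a placeholder built in memory (only the ItemID and the three example URLs are filled in), and the helper name `read_items` is hypothetical.

```python
import csv
import io

# Field names as announced for the 1 September 2019 export.
NEW_FIELDS = [
    "ItemID", "TitleID", "ThumbnailPageID", "BarCode", "MARCItemID",
    "CallNumber", "VolumeInfo", "ItemURL", "ItemTextURL", "ItemPDFURL",
    "ItemImagesURL", "LocalID", "Year", "InstitutionName", "ZQuery",
    "CreationDate",
]

# Placeholder row: everything is left blank except the example item's links.
row = {name: "" for name in NEW_FIELDS}
row.update({
    "ItemID": "124544",
    "ItemTextURL": "https://www.biodiversitylibrary.org/itemtext/124544",
    "ItemPDFURL": "https://www.biodiversitylibrary.org/itempdf/124544",
    "ItemImagesURL": "https://www.biodiversitylibrary.org/itemimages/124544",
})
sample = "\t".join(NEW_FIELDS) + "\n" + "\t".join(row[n] for n in NEW_FIELDS) + "\n"


def read_items(handle):
    """Yield one dict per row, keyed by the names in the header row."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for record in reader:
        yield record


records = list(read_items(io.StringIO(sample)))
for record in records:
    # .get() returns None for files that predate the new fields.
    print(record["ItemID"], record.get("ItemTextURL"))
```

Because each value is looked up by name, inserting the three new fields in the middle of the row does not shift any of the existing lookups. For a real export, replace the in-memory sample with an opened file, for example `open(path, newline="", encoding="utf-8")`, where `path` points at your downloaded copy.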
Please update your code or processes if necessary!