Article download now available!

Since BHL’s public launch in February 2008, the BHL technical development team has received repeated requests for a way to download a PDF of an individual article from one of the digitized books in BHL. As previously reported, this is a fairly challenging task, but with the right technology and a little luck we’ve devised a solution that is working well in production and receiving positive feedback. Here’s how it works.

Suppose you come across the following reference:

Wormald, H. “Variation in the male hop, Humulus lupulus L.” The Journal of agricultural science. 7:175-197. 1915.

A quick title search for The Journal of agricultural science shows that it is available through BHL at http://www.biodiversitylibrary.org/bibliography/8643, and that volume 7 is online at http://www.biodiversitylibrary.org/item/35866. Scrolling to page 175, you find Wormald’s article.

To download a PDF of this article, hover your mouse over the “Download/About this book” link and click “Select pages to download”. On the resulting page, check the boxes next to pages 175 through 197, then click the “Next” button.
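Going from a citation to the set of pages to tick is simple arithmetic on the volume:pages fragment. The sketch below is a hypothetical helper, not part of the BHL interface; it only turns “7:175-197” into the printed page numbers you would select (23 pages, inclusive). It assumes printed page numbers are what you match against, and unnumbered plates inside an article may still need to be checked by hand.

```python
import re
from typing import NamedTuple


class Citation(NamedTuple):
    volume: int
    first_page: int
    last_page: int


def parse_volume_pages(ref: str) -> Citation:
    """Parse a 'volume:first-last' fragment such as '7:175-197' (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+)\s*:\s*(\d+)\s*-\s*(\d+)\s*", ref)
    if match is None:
        raise ValueError(f"unrecognized volume/page fragment: {ref!r}")
    volume, first, last = (int(g) for g in match.groups())
    if last < first:
        raise ValueError("last page precedes first page")
    return Citation(volume, first, last)


def pages_to_select(cite: Citation) -> list[int]:
    """Printed page numbers to check in the page-selection form (inclusive range)."""
    return list(range(cite.first_page, cite.last_page + 1))


cite = parse_volume_pages("7:175-197")
pages = pages_to_select(cite)
print(cite.volume, len(pages), pages[0], pages[-1])  # 7 23 175 197
```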

From here we ask you to do a little optional data entry by adding the article’s title and author(s). We don’t require this, but if you take the time to fill out this information we’ll hold onto it and index it so that other users will be able to find the article in future searches. In this way your work will benefit the wider community of BHL users.
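To illustrate how optional title and author entries can make an article findable later, here is a minimal in-memory inverted index. It is a sketch under stated assumptions, not BHL’s search implementation: the ArticleRecord and ArticleIndex names are hypothetical, and a query matches only records whose title or author text contains every query term. A request where the user skipped data entry is still stored but contributes no search terms.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ArticleRecord:
    item_id: int                 # BHL item (volume) the pages came from
    pages: tuple[int, int]       # first and last printed page
    title: str = ""              # optional, supplied by the requesting user
    authors: list[str] = field(default_factory=list)  # optional


def tokenize(text: str) -> set[str]:
    """Lowercased word tokens; punctuation is dropped."""
    return set(re.findall(r"\w+", text.lower()))


class ArticleIndex:
    """Hypothetical index over user-contributed article metadata."""

    def __init__(self) -> None:
        self.records: list[ArticleRecord] = []
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, record: ArticleRecord) -> None:
        rec_id = len(self.records)
        self.records.append(record)
        # Records without a title or authors add no terms to the index.
        for term in tokenize(record.title + " " + " ".join(record.authors)):
            self.postings[term].add(rec_id)

    def search(self, query: str) -> list[ArticleRecord]:
        terms = tokenize(query)
        if not terms:
            return []
        # Keep only records that contain every query term.
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.records[i] for i in sorted(hits)]


index = ArticleIndex()
index.add(ArticleRecord(35866, (175, 197),
                        "Variation in the male hop, Humulus lupulus L.", ["Wormald, H."]))
index.add(ArticleRecord(35866, (1, 10)))  # hypothetical request with no metadata entered
print([r.pages for r in index.search("humulus wormald")])  # [(175, 197)]
```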

After you click “Submit”, your job is added to our queue and you’ll receive an e-mail confirming that we’ve received your request. Then the tech-fun begins! We use an open source library called iText to generate the PDF, passing it the URL of the JPEG2000 image for each page, stored on Internet Archive’s servers. iText combines that series of page images into a single PDF and writes the file to BHL servers. Depending on server and network load and the length of the article, this process can take anywhere from a few seconds to several minutes. Once it’s complete, you’ll receive another e-mail telling you that your PDF is available for download. For the request above, you would receive the following:
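The request lifecycle described above (queue the job, send an acknowledgement, build the PDF from per-page image URLs, store it, send a completion notice) can be sketched as a single-threaded worker. This is an illustration only, not BHL’s code: PdfRequest, page_image_url, and the file naming scheme are hypothetical, e-mail is simulated by appending to a list, and build_pdf is a placeholder standing in for the iText step rather than a real PDF writer.

```python
import queue
from dataclasses import dataclass


@dataclass
class PdfRequest:
    item_id: int
    page_ids: list[int]      # hypothetical internal IDs of the selected page images
    email: str
    include_ocr: bool = False


def page_image_url(page_id: int) -> str:
    # HYPOTHETICAL URL pattern; the real location of each JPEG2000 file
    # on Internet Archive's servers is not given here.
    return f"https://example.org/jp2/{page_id}.jp2"


def send_email(to: str, body: str, outbox: list) -> None:
    # Stand-in for a mail client: record the message instead of sending it.
    outbox.append((to, body))


def build_pdf(image_urls: list[str], include_ocr: bool) -> bytes:
    # Placeholder for the iText step, which would fetch each page image and
    # write one PDF; here we only return marker bytes.
    return b"%PDF-placeholder " + str(len(image_urls)).encode()


def submit(req: PdfRequest, jobs: queue.Queue, outbox: list) -> None:
    """Enqueue the job and acknowledge it immediately."""
    jobs.put(req)
    send_email(req.email, "We have received your PDF generation request.", outbox)


def run_worker(jobs: queue.Queue, storage: dict[str, bytes], outbox: list) -> None:
    """Drain the queue, generating and storing one PDF per request."""
    while True:
        try:
            req = jobs.get_nowait()
        except queue.Empty:
            return
        urls = [page_image_url(p) for p in req.page_ids]
        name = f"{req.item_id}-{req.page_ids[0]}.pdf"  # hypothetical naming scheme
        storage[name] = build_pdf(urls, req.include_ocr)
        send_email(req.email, f"Your PDF generation request has been completed: {name}", outbox)
        jobs.task_done()


jobs: queue.Queue = queue.Queue()
outbox: list = []
storage: dict[str, bytes] = {}
submit(PdfRequest(35866, [101, 102, 103], "reader@example.org", include_ocr=True), jobs, outbox)
run_worker(jobs, storage, outbox)
print(len(outbox), list(storage))  # 2 ['35866-101.pdf']
```

Decoupling submission from generation is what lets the site respond right away while slow, network-bound PDF builds run in the background; a production system would likely use a persistent queue and a separate worker process instead of an in-memory queue.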

Your PDF generation request has been completed.

The PDF can be downloaded from the following location: http://www.biodiversitylibrary.org/pdf1/000107600035866.pdf.

We include a cover sheet in the PDF that lists value-added information such as bibliographic metadata for the title and volume, along with attribution for the library that contributed the volume and the organization that sponsored its digitization. The PDF can also include the OCR text for the article; this is an option you select when choosing pages (the PDF above includes the text).
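The cover sheet and the optional OCR text amount to a small assembly step. The sketch below is hypothetical: the VolumeInfo fields, the line labels, and the per-page dictionary are illustrative choices, and it does not decide how the text is physically placed in the PDF (for example as a hidden text layer or as separate pages). It only shows the user-supplied article fields appearing when present and OCR text attached only when requested. The contributor and sponsor values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class VolumeInfo:
    title: str          # bibliographic title
    volume: str         # volume designation as displayed
    contributor: str    # library that contributed the volume
    sponsor: str        # organization that sponsored digitization


def cover_sheet(info: VolumeInfo, article_title: str = "", authors: str = "") -> list[str]:
    """Cover page lines; article fields appear only if the user supplied them."""
    lines = [f"Title: {info.title}", f"Volume: {info.volume}"]
    if article_title:
        lines.append(f"Article: {article_title}")
    if authors:
        lines.append(f"Author(s): {authors}")
    lines.append(f"Contributed by: {info.contributor}")
    lines.append(f"Sponsored by: {info.sponsor}")
    return lines


def assemble(info: VolumeInfo, page_ocr: list[str], include_ocr: bool,
             article_title: str = "", authors: str = "") -> tuple[list[str], list[dict]]:
    """Return (cover sheet lines, per-page entries); OCR text is kept only when requested."""
    cover = cover_sheet(info, article_title, authors)
    pages = [
        {"page": n, "ocr_text": text if include_ocr else None}
        for n, text in enumerate(page_ocr, start=1)
    ]
    return cover, pages


info = VolumeInfo(
    title="The Journal of agricultural science",
    volume="7 (1915)",
    contributor="(contributing library)",  # placeholder
    sponsor="(digitization sponsor)",      # placeholder
)
cover, pages = assemble(info, ["text of p. 175", "text of p. 176"], include_ocr=True,
                        article_title="Variation in the male hop, Humulus lupulus L.",
                        authors="Wormald, H.")
print(cover[2])               # Article: Variation in the male hop, Humulus lupulus L.
print(pages[1]["ocr_text"])   # text of p. 176
```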

This feature is in production now, but it’s still new and needs refinement, so we encourage users to try it out and provide feedback. We’ll continue to improve the functionality over time based on user requests and suggestions. Please leave your comments below or submit them through our Feedback form.

Written by Chris Freeland

Chris Freeland served as the BHL Technical Director from 2006-2012. He is currently the Director of the Open Libraries program at Internet Archive. In this capacity he works with libraries & publishers to digitize their collections, working towards the Archive’s mission of providing “universal access to all knowledge.”