Thursday, January 15, 2009

Article download now available!

Since the public launch of BHL in February 2008, the BHL technical development team has received repeated requests for an interface that would let users download a PDF of an individual article within one of the digitized books in BHL. This is actually a fairly challenging task, as previously reported, but with the right technology and a little bit of luck we've devised a solution that is working very well in production and is receiving positive feedback. Here's how it works.

Suppose you come across the following reference:
Wormald, H. "Variation in the male hop, Humulus lupulus L." The Journal of agricultural science. 7:175-197. 1915.
A quick title search for The Journal of agricultural science shows that it is available through BHL at http://www.biodiversitylibrary.org/bibliography/8643, and that volume 7 is online at http://www.biodiversitylibrary.org/item/35866. Scrolling to page 175, you find Wormald's article.

To download a PDF of this article, hold your mouse over the "Download/About this book" link and click "Select pages to download". From the resulting page, check the boxes next to pages 175 through 197, then click the "Next" button.
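
If you're working from a citation, the volume and page range tell you which boxes to check. Here is a minimal Python sketch of that mapping, for illustration only: `parse_locator` is a hypothetical helper (not part of any BHL interface), and it assumes the locator has the form `volume:first-last` and that the printed page numbers run consecutively.

```python
import re

def parse_locator(locator):
    """Split a 'volume:first-last' locator into (volume, list of page numbers).

    Hypothetical helper for illustration; not part of any BHL interface.
    """
    match = re.fullmatch(r"\s*(\d+)\s*:\s*(\d+)\s*-\s*(\d+)\s*", locator)
    if match is None:
        raise ValueError(f"unrecognized locator: {locator!r}")
    volume, first, last = (int(group) for group in match.groups())
    if last < first:
        raise ValueError("last page precedes first page")
    # The range is inclusive, so both the first and last pages are selected.
    return volume, list(range(first, last + 1))

volume, pages = parse_locator("7:175-197")
print(volume, pages[0], pages[-1], len(pages))  # 7 175 197 23
```

For the Wormald article, that comes to 23 pages in volume 7.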

From here we ask you to do a little optional data entry by adding the article's title and author(s). We don't require this, but if you take the time to fill out this information we'll hold onto it and index it so that other users will be able to find the article in future searches. In this way your work will benefit the wider community of BHL users.

After you click "Submit", your job is added to our queue and you'll get an e-mail notification that we've received your request. And then the tech-fun begins! We use an open source application called iText to generate the PDF by passing it the URL of the JPEG2000 image for each page, stored on Internet Archive's servers. iText converts that series of pages into a single PDF and writes the file out to BHL servers. Depending on server and network load, and the size of each article, this process can take anywhere from a few seconds to several minutes. Once it's complete, you'll receive another e-mail notifying you that your PDF is available for download. For the request above, you would receive the following:
Your PDF generation request has been completed.

The PDF can be downloaded from the following location: http://www.biodiversitylibrary.org/pdf1/000107600035866.pdf.
We include a cover sheet in the PDF that lists value-added information like bibliographic metadata about the title and volume, as well as attribution for the library that contributed the volume and the organization that sponsored its digitization. We also include the OCR text for the article (actually it's a selection you can make when choosing pages to include; the PDF above has the text included).
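
The request lifecycle described above (queue the job, acknowledge it, build the PDF, send the link) can be sketched roughly as follows. This is an illustration only, not BHL's actual implementation: `PdfRequest`, `submit`, `process_next`, the field names, the wording of the first e-mail, and the `example.org` addresses are all assumptions, and the iText step is replaced by a stand-in function.

```python
from dataclasses import dataclass, field
from queue import Queue

@dataclass
class PdfRequest:
    """One article download request (all field names are illustrative)."""
    item_id: int
    pages: list
    email: str
    title: str = ""                               # optional, entered by the user
    authors: list = field(default_factory=list)   # optional, entered by the user
    include_ocr: bool = True                      # whether to include the OCR text

def submit(queue, request, notify):
    """Add the request to the queue and acknowledge it by e-mail."""
    queue.put(request)
    # Assumed wording; the post does not quote the first notification.
    notify(request.email, "Your PDF generation request has been received.")

def process_next(queue, build_pdf, notify):
    """Take one request off the queue, build its PDF, and e-mail the link."""
    request = queue.get()
    url = build_pdf(request)  # stand-in for the iText generation step
    notify(request.email,
           "Your PDF generation request has been completed.\n\n"
           f"The PDF can be downloaded from the following location: {url}.")
    queue.task_done()
    return url

# Usage with in-memory stand-ins for e-mail and PDF generation.
sent = []

def notify(address, body):
    sent.append((address, body))

def fake_build_pdf(request):
    return f"http://example.org/pdf/{request.item_id}.pdf"  # placeholder URL

jobs = Queue()
submit(jobs, PdfRequest(35866, list(range(175, 198)), "reader@example.org"), notify)
process_next(jobs, fake_build_pdf, notify)
print(len(sent))  # 2
```

Keeping the PDF builder and the notifier as plain function arguments makes it easy to swap the stand-ins for real components without touching the queue logic.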

This feature is in production now, but it's still new and needs refinement, so we encourage users to try it out and provide feedback. We'll continue to improve the functionality based on requests and suggestions from users over time. Please either leave your comments below or submit them to our Feedback form.

Chris Freeland

Comments:

kehan said...

Great, thanks Chris,

Seeing as you're sticking a cover page on the article and it's being generated anyway, would it be too much bother to embed XMP metadata in the article so that reference management software like JabRef and Papers can parse this information automatically from the article? I know it's a tall order, but I wanted to put it on the radar somewhere.

tompasley said...

Hi Chris,

Good to see content like the example you've given, "Journal of Agricultural Science", in the Biodiversity Heritage Library.

A couple of questions:

Cambridge University Press also have an archive of their content which they sell [http://journals.cambridge.org/action/displaySpecialPage?pageId=852]. Is this material really out of copyright?

I see that you mention on your site that you'll support OpenURL linking [http://www.biodiversitylibrary.org/Tools.aspx]. Any idea of a timeline, and how do we get to add your items to our knowledgebase so we can implement the linking?

Any further information or an API would be appreciated.

On a related note, I've tried finding information about how to access the information (using a machine API) for Open Content Alliance material, and have found the OAI-PMH interface for the Internet Archive, but can you give any further info on this too?
