New Article PDF Content Available

The BHL Tech Team is pleased to announce a new form of content available in BHL: Article PDFs. This may not sound new; after all, we have had a tool for downloading PDF content for some time. But this update changes both how the PDFs are created and maintained and how BHL is viewed by content aggregators on the internet, most notably Unpaywall.

Screenshot of the Download PDF icon.

The new Download PDF icon

How to use it? While browsing an article, you will now see a Download PDF icon below the View Article link on the right side of the page. Clicking the icon immediately downloads the PDF to your computer (or opens it in your web browser, depending on your settings).

The benefits of the immediate download are:

  • No waiting.
  • No selecting pages.
  • The PDF contains embedded, searchable, copy-paste-able text.
  • The PDF contains rich XMP-based metadata about the article.
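For readers curious about the XMP metadata, here is a minimal sketch of one way to pull the raw XMP packet out of a downloaded PDF using only the Python standard library. It assumes the packet is stored uncompressed and wrapped in the standard xpacket markers, which is common but not guaranteed. The function name and the sample bytes are invented for illustration and do not reflect the actual metadata in BHL's PDFs.

```python
import re

# XMP packets are wrapped in <?xpacket begin=...?> ... <?xpacket end=...?> markers,
# which makes them findable by scanning the raw bytes of a file.
XMP_PACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def extract_xmp_packet(pdf_bytes):
    """Return the first XMP packet found in the bytes as text, or None."""
    match = XMP_PACKET.search(pdf_bytes)
    if match is None:
        return None
    return match.group(1).decode("utf-8", errors="replace").strip()


# Hypothetical stand-in for the bytes of a downloaded PDF (not real BHL data).
sample_pdf_bytes = (
    b"%PDF-1.7\n"
    b'<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">Sample title</x:xmpmeta>\n'
    b'<?xpacket end="w"?>\n'
    b"%%EOF\n"
)

xmp_text = extract_xmp_packet(sample_pdf_bytes)
```

For a real file, the bytes could be read with open(path, "rb").read(). A dedicated PDF or XMP library would be a more robust choice for anything beyond a quick look.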

An important change to note is that when viewing an article within an item at BHL, the Download Contents > Download Article link will now direct the visitor’s browser to the new PDFs for immediate download. Previously, the pages of the article were pre-selected for download, and the visitor then had to complete the process and wait for the PDF to be generated. We expect the new PDFs to be an improvement for visitors who come to download articles. View the How do I download a PDF of an article? FAQ for simple download instructions.

Visitors to BHL are still able to manually create PDFs using the Download Contents > Select Pages to Download feature. This feature has not been removed, but it still takes some time to create those PDFs, and the visitor is emailed when the PDF is ready. This option is useful for articles that have not been indexed in BHL and therefore do not have a Download Article link. View the How do I generate a custom PDF of selected pages from the book? FAQ for complete instructions.

The most important feature of the new Article PDFs is the embedded text within the document. The selectable text is an invisible layer in the PDF that becomes visible when you select or search for text within the document:

A sample of a printed page of a book with highlighted text superimposed over the printed text

An example of selectable text in an Article PDF.

While the appearance of the text may look… less than ideal, rest assured that the text can be copied out intact and used in another program. Example:

It is perhaps needless for me here to reiterate the great importance
of arriving at a final decision as to the real nature of
the haloliranic forms, for it will be obvious that if they have
nothing to do with the normal fresh-water series, and are to
be regarded as the remnant of an ancient sea, our views
respecting the past history of the African interior must be
greatly changed.

Another, less visible benefit is that the PDFs are linked directly from the citation_pdf_url meta tag on the article's web page, which makes them more findable by Google Scholar, Unpaywall, and potentially other aggregators.
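To illustrate how an aggregator might pick up that link, here is a minimal sketch that scans a page's HTML for citation_pdf_url meta tags using Python's standard html.parser module. The class and function names, the sample markup, and the example.org URL are all hypothetical; this is not how any particular aggregator is known to work, and it is not BHL's actual page markup.

```python
from html.parser import HTMLParser


class CitationPdfFinder(HTMLParser):
    """Collects the content of every <meta name="citation_pdf_url"> tag."""

    def __init__(self):
        super().__init__()
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name == "citation_pdf_url" and content:
            self.pdf_urls.append(content)


def find_citation_pdf_urls(html):
    """Return all citation_pdf_url values found in an HTML string."""
    finder = CitationPdfFinder()
    finder.feed(html)
    return finder.pdf_urls


# Hypothetical article page markup; the URL is a placeholder, not a real BHL link.
sample_html = """
<html>
  <head>
    <title>Sample article</title>
    <meta name="citation_title" content="Sample article">
    <meta name="citation_pdf_url" content="https://example.org/articles/12345.pdf">
  </head>
  <body></body>
</html>
"""

pdf_urls = find_citation_pdf_urls(sample_html)
```

Because the meta tag points straight at a ready-made file, a crawler can fetch the PDF without stepping through a page-selection workflow.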

For the technically minded, the PDFs (many tens of thousands of them) are created in advance and stored on BHL’s servers. When an article's data changes within BHL, its PDF is updated automatically, usually within several hours.

We hope that this is a welcome addition to BHL.

 

– Please note that the text is only as good as the OCR that was generated for the text on the page. While the OCR text is probably very good for the prose sections of an article, titles, tables, and other special content may not appear as expected.

Written by

Joel Richard is the head of the Web and IT department for the Smithsonian Libraries and Archives, and the Technical Coordinator for the Biodiversity Heritage Library. Joel is also the creator and developer of the Macaw software used by BHL partners to add content to BHL.