Biodiversity Heritage Library - Program news and collection highlights from BHL

All posts in Tech Updates

BHL News, Blog Reel, Tech Updates

BHL is Round Tripping Persistent Identifiers with the Wikidata Query Service

In the spring of 2022, the BHL Cataloging and Metadata Committee investigated the possibility of harvesting persistent identifiers (PIDs) from Wikidata as part of the group’s longstanding project to disambiguate and deduplicate author records in the BHL database. The motivation behind this one-time experimental data harvest was to see if BHL could:

  1. Enhance BHL author records with additional PID data points;
  2. Improve the committee’s ability to disambiguate author names in the BHL database; and
  3. Respond to an outstanding user request from two of Wikimedia’s superstar editors, Siobhan Leachman and Andy Mabbett, to expose BHL’s author data on the BHL website and include hyperlinks to other authoritative knowledge bases on the web.

In particular, Wikimedians wanted to see the Wikidata Q identifier exposed, providing a link to the corresponding creator item record in Wikidata.

There are multiple motivations for undertaking this work. By adding the BHL Creator ID to the corresponding Wikidata item, Wikidata editors help link BHL to the richer biographical data about that person held in Wikidata. The Wikidata item for a person may contain links to their Wikipedia page or to images of the person held in the image repository Wikimedia Commons. Wikidata items also act as identifier hubs and contain links to other databases and identifiers.

By adding the BHL Creator ID to this list of identifiers, the Wikidata editor is linking the content held in BHL to the content held in multiple other datasets and repositories.

These extra author data points provide Wikimedians and BHL catalogers with crucial clues that aid in name disambiguation. In particular, hyperlinks to other knowledge bases are incredibly valuable because they lead to new knowledge pathways that help confirm a person’s identity in a complex game of “Who’s Who?”
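As a rough illustration of what such a harvest could look like, the sketch below builds a SPARQL query for the Wikidata Query Service and flattens the JSON response into simple rows. It is not the committee's actual query. The property ID P4081 for the BHL Creator ID is an assumption that should be verified on Wikidata before use, VIAF (P214) is only one example of an additional identifier, and the sample response values are invented.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
BHL_CREATOR_ID = "P4081"  # assumption: Wikidata property for the BHL Creator ID
VIAF_ID = "P214"          # one example of an additional identifier to harvest


def build_pid_query(limit=100):
    """Return a SPARQL query for items that carry a BHL Creator ID."""
    return f"""
SELECT ?item ?bhlCreatorId ?viaf WHERE {{
  ?item wdt:{BHL_CREATOR_ID} ?bhlCreatorId .
  OPTIONAL {{ ?item wdt:{VIAF_ID} ?viaf . }}
}}
LIMIT {int(limit)}
""".strip()


def build_request_url(query):
    """Return a GET URL that asks the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def parse_bindings(results):
    """Flatten a SPARQL JSON result into one dict per author."""
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({
            # The Q identifier is the last path segment of the entity URI.
            "qid": binding["item"]["value"].rsplit("/", 1)[-1],
            "bhl_creator_id": binding["bhlCreatorId"]["value"],
            # OPTIONAL values are simply absent when Wikidata has no match.
            "viaf": binding.get("viaf", {}).get("value"),
        })
    return rows


# Invented sample in the standard SPARQL JSON results shape.
SAMPLE_RESPONSE = {
    "results": {
        "bindings": [
            {
                "item": {"value": "http://www.wikidata.org/entity/Q12345"},
                "bhlCreatorId": {"value": "1234"},
            }
        ]
    }
}

if __name__ == "__main__":
    print(build_request_url(build_pid_query(limit=5)))
    print(parse_bindings(SAMPLE_RESPONSE))
```

The response is parsed separately from the request so that the mapping from Q identifier to BHL Creator ID can be checked offline before any live query is sent.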

Continue reading
February 15, 2023 by JJ Dearborn and Siobhan Leachman
BHL News, Blog Reel, Tech Updates

Providing More Robust Data in BHL’s OAI-PMH Dublin Core Feed

Recently, BHL performed a comprehensive review of all live data feeds and outputs to ensure that we are providing robust metadata to our downstream consumers. Live BHL data can be found at BHL’s Developer and Data Tools. BHL’s live data outputs include:

  • API v3
  • OAI-PMH

OAI-PMH is an acronym for the Open Archives Initiative Protocol for Metadata Harvesting. It allows other discovery services and aggregators to harvest BHL’s metadata in standard formats such as Metadata Object Description Schema (MODS) and Dublin Core (DC).

To provide more robust data in BHL’s OAI-PMH Dublin Core feed, three changes have been made to the feed:

  1. Creative Commons (CC) license information was added as a second <rights> element;
  2. A <relation> element was added to titles that are part of a monographic series, allowing BHL to model more complex bibliographic relationships that exist in the BHL database; and
  3. A non-standard “type” attribute was removed from the <relation> element for parts.

Important: If you are a developer whose code relies on the non-standard “type” attribute at the part level, this is a breaking change. Please take note and update your code accordingly.
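For downstream consumers, a minimal sketch of reading these elements from a harvested record might look like the following. The sample record is invented for illustration and is not copied from the BHL feed, so the text inside each element is an assumption. The namespaces are the standard OAI Dublin Core ones.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for oai_dc records.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample record; real BHL values will differ.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example title in a monographic series</dc:title>
  <dc:rights>Example copyright status statement</dc:rights>
  <dc:rights>https://creativecommons.org/publicdomain/zero/1.0/</dc:rights>
  <dc:relation>Example monographic series</dc:relation>
</oai_dc:dc>
"""


def summarize_record(xml_text):
    """Collect rights and relation values from one oai_dc record."""
    root = ET.fromstring(xml_text)
    # A record may now carry more than one rights element, so read them all.
    rights = [el.text for el in root.findall("dc:rights", NS)]
    relations = root.findall("dc:relation", NS)
    return {
        "rights": rights,
        "relations": [el.text for el in relations],
        # True only if a relation element still carries a "type" attribute.
        "uses_type_attribute": any(el.get("type") for el in relations),
    }


if __name__ == "__main__":
    print(summarize_record(SAMPLE_RECORD))
```

Code written this way reads every `<rights>` element rather than only the first, and it takes relation values from the element text, so it does not depend on the removed "type" attribute.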

Continue reading
February 3, 2023 by JJ Dearborn
BHL News, Blog Reel, Tech Updates

BHL Technical Development: Year in Review

For BHL, 2022 was a year to focus on critical upgrades to the BHL platform to ensure the sustainability of our services for our global users. Although BHL’s basic technical infrastructure remains the same, reflecting years of refinement, knowledge, and reliability, a few updates were definitely in order. Most of these upgrades were “behind-the-scenes” work and will not be noticeable to most of our users. However, keeping up with these important enhancements is a crucial component of any technology project.

Continue reading
January 31, 2023 by JJ Dearborn
Blog Reel, Tech Updates

OCR Improvements: An Early Analysis

Optical character recognition (OCR) plays a critical part in BHL’s contributions to the scientific community. OCR in and of itself is a remarkable achievement, converting images of typewritten text to computer-readable text with “pretty good” accuracy. OCR on handwritten text is an even greater challenge and is beyond the scope of the improvements discussed here. The scientific work that BHL supports demands the best accuracy that we can provide using available tools and, let’s be honest, available budgets.

Recently, our colleagues at the Internet Archive made the transition away from the ABBYY FineReader OCR software to the Tesseract Open Source OCR engine. Over the past year or more, the OCR team at the Internet Archive has adapted and fine-tuned Tesseract to their workflows. Our first impression is that Tesseract OCR is more than “pretty good” in its ability to identify text from the page images provided to it.

The downside is that the Internet Archive has rightfully chosen not to reprocess all existing text content through the Tesseract OCR engine. Doing so would be prohibitively expensive and time-consuming: they have 35 million text-based items, and reprocessing them would take several years and use up resources that could otherwise be used for gathering new content.

However, in the interests of supporting the efforts of the BHL community, the BHL Tech Team is working with our Internet Archive partner to reprocess some of BHL’s oldest content with the newest available version of Tesseract OCR. We are currently in a testing phase, and this blog post details some of our early results.

Continue reading
July 19, 2022 by Joel Richard
BHL News, Blog Reel, Tech Updates

New Article PDF Content Available

The BHL Tech Team is pleased to announce a new form of content available in BHL: Article PDFs. This may not sound like anything new (after all, we have had a tool to download PDF content for some time), but this update changes both how the PDFs are created and maintained and how BHL is viewed by content aggregators on the internet, most notably Unpaywall.

Continue reading
March 14, 2022 by Joel Richard
BHL News, Blog Reel, Tech Updates

What Is BHL’s New Persistent Identifier Working Group DOI’ng?

In October 2020, BHL launched a new working group with a momentous goal: to make the content on BHL persistently discoverable, citable and trackable using DOIs (Digital Object Identifiers).

A DOI is like an electronic fingerprint: a unique and permanent alphanumeric string that provides a persistent link to a piece of content online. Modern publications receive a DOI at the point of publication. A DOI is a key part of a publication’s bibliographic metadata and should be included in any mention or citation of that publication. Reference lists in modern publications are filled with DOIs, which allow readers to click from publication to publication in a (theoretically) never-ending chain of knowledge.
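To make the "persistent link" idea concrete, here is a small sketch of turning a DOI, however it was written in a citation, into a resolvable link through the doi.org resolver. The function name and the example DOI are hypothetical, and the validity check is deliberately loose.

```python
from urllib.parse import quote

# Common ways a DOI is written in citations and reference lists.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi):
    """Normalize a DOI string to an https://doi.org/ link."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Loose check: DOIs start with "10." and have a prefix/suffix separator.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    # Hypothetical DOI used only for illustration.
    print(doi_to_url("doi:10.1234/example.5678"))
```

Storing the bare DOI and generating the link on demand keeps citations stable even if the preferred resolver address changes.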

Continue reading
May 10, 2021 by Nicole Kearney
BHL News, Blog Reel, Tech Updates

Updates to Bibliography Pages in BHL

We have updated the bibliography pages in BHL to streamline the presentation of both descriptive information and metadata export options for content in the Library.

Continue reading
February 11, 2021 by Grace Costantino

Tech Updates

Keep up with all the latest technical development news from the Biodiversity Heritage Library, including announcements of new features and improvements to library services, with our Tech Blog.

About BHL

The Biodiversity Heritage Library (BHL) is the world’s largest open access digital library for biodiversity literature and archives. Headquartered at the Smithsonian Libraries and Archives in Washington, D.C., BHL operates as a worldwide consortium of natural history, botanical, research, and national libraries working together to digitize the natural history literature held in their collections and make it freely available for open access as part of a global “biodiversity community.”

Inspiring Discovery through Free Access to Biodiversity Knowledge.

The Biodiversity Heritage Library makes it easier than ever for you to access the information you need to study and explore life on Earth…for free, anytime, anywhere.

60+ Million Pages of Biodiversity Literature Online.

Tools and Services to Transform Research.

300,000+ Illustrations on Flickr.
