What Is BHL’s New Persistent Identifier Working Group DOI’ng?

In October 2020, BHL launched a new working group with a momentous goal: to make the content on BHL persistently discoverable, citable and trackable using DOIs (Digital Object Identifiers).

Graphic showing the members of BHL's Persistent Identifier Working Group

The members of BHL’s new Persistent Identifier Working Group (PIWG).

A DOI is like an electronic fingerprint: a unique and permanent alphanumeric string that provides a persistent link to a piece of content online. Modern publications receive a DOI at the point of publication. This DOI becomes a key part of a publication’s bibliographic metadata and should be included in any mention or citation of that publication. Reference lists in modern publications are filled with DOIs, allowing readers to click from publication to publication in what is (in theory) a never-ending chain of knowledge.
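For readers who like to see the moving parts, here is a minimal Python sketch of how a DOI is put together: a prefix beginning with "10." that identifies the registrant, a slash, and a suffix chosen by that registrant. Prepending the resolver address https://doi.org/ turns the string into a persistent link. The function names `split_doi` and `doi_url` are made up for this illustration and are not part of any BHL tool.

```python
from typing import Tuple
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix), accepting bare, doi: and URL forms."""
    cleaned = doi.strip()
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(lead):
            cleaned = cleaned[len(lead):]
            break
    # The prefix ends at the first slash; everything after it is the suffix.
    prefix, slash, suffix = cleaned.partition("/")
    if not (prefix.startswith("10.") and slash and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_url(doi: str) -> str:
    """Build the resolver link for a DOI, percent-encoding unusual characters."""
    prefix, suffix = split_doi(doi)
    return f"{DOI_RESOLVER}{prefix}/{quote(suffix, safe='/')}"


# The DOI of the 1799 Platypus description mentioned later in this post.
print(split_doi("10.5962/p.304567"))   # ('10.5962', 'p.304567')
print(doi_url("doi:10.5962/p.304567"))  # https://doi.org/10.5962/p.304567
```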

This reciprocal linking of DOIs has created a great linked network of scholarly research, but that network is missing the historic literature. The vast majority of historic publications lack DOIs. This means they appear in reference lists as unlinked citations. In our increasingly online world, readers are far more likely to read (and thus cite) publications they can click through to (particularly when libraries are inaccessible during a global pandemic). The upshot is that the millions of pages of historic literature on BHL, the foundation of our understanding of biodiversity, are in danger of falling into obscurity.

BHL has been retrospectively minting DOIs for historic publications since 2011, but the focus has primarily been on monographs. BHL’s new Persistent Identifier Working Group (PIWG) is (at least initially) focusing on journal articles. Minting DOIs for articles on BHL is a far more complex and time-consuming task than minting DOIs for monographs. This is because article DOIs need article data: every journal volume uploaded onto BHL must be accompanied by journal and volume data, but there is no requirement that contributors provide article data.

Thankfully, there have been considerable efforts to add article data to BHL (and thereby make it possible to search for the titles and authors of these articles both within BHL and via external engines). A huge proportion of this article data has been contributed to BHL by Roderic Page via BioStor: 75% of the 300,753 articles indexed in BHL as of 4 May 2021 were “defined” by BioStor. It is very difficult to determine how many articles are actually on BHL (hidden within all those journal volumes). But, while we don’t know what proportion of BHL’s journal content still needs to be made discoverable, we know there is still a huge amount of work to do.

COVID-19 provided an unexpected opportunity to make a considerable dent in this work. With no access to scanners or library materials, a number of BHL contributors, including Harvard University Libraries, Muséum National d’Histoire Naturelle and BHL Australia, pivoted from making new content accessible to making their existing content on BHL more discoverable. For example, BHL Australia’s digitisation volunteers gathered, gap-filled and checked article-level metadata for over 30,000 articles in 2020.

Once an article has been defined, i.e. it exists as a publication unit in BHL and has its own article landing page (and we’ve checked that the article does not already have a DOI), we can assign a DOI to it. Articles that have recently been assigned BHL DOIs include some very old publications, such as the first scientific description of the Duck-billed Platypus, published in 1799 (https://doi.org/10.5962/p.304567), and the species descriptions from A specimen of the botany of New Holland, the first publication dedicated to Australian flora (1793-5), e.g. https://doi.org/10.5962/p.312432. The PIWG has also started assigning DOIs to in-copyright publications (with permission from the rights holders). These include articles from the Bulletin of the British Museum, e.g. https://doi.org/10.5962/p.310418, and the Bulletin of the African Bird Club, e.g. https://doi.org/10.5962/p.308885.
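The checks described above can be written down as a small decision procedure. The Python sketch below is only an illustration: the `Article` class, its field names and the step labels are hypothetical and do not reflect BHL's actual data model or tools.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    """Hypothetical stand-in for an article record; not BHL's real schema."""
    title: str
    has_landing_page: bool               # True once the article is "defined"
    existing_doi: Optional[str] = None   # a DOI already minted for the article


def next_step(article: Article) -> str:
    """Decide what has to happen before (or instead of) assigning a BHL DOI."""
    if not article.has_landing_page:
        # No article-level data yet, so there is nothing to attach a DOI to.
        return "define article"
    if article.existing_doi:
        # Never mint a second DOI for an article that already has one.
        return "keep existing DOI"
    return "assign BHL DOI"


platypus = Article("The Duck-Billed Platypus, Platypus anatinus", has_landing_page=True)
print(next_step(platypus))  # assign BHL DOI
```

The order of the checks matters: an article has to be defined before anyone can ask whether it already carries a DOI.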

Screenshot of the landing page in BHL for the description of The Duck-Billed Platypus, Platypus anatinus.

Shaw, George (1799), The Duck-Billed Platypus, Platypus anatinus, The Naturalist’s Miscellany: https://doi.org/10.5962/p.304567 (illustration by Frederick Polydore Nodder).

If an article on BHL has an existing non-BHL DOI, we add this key piece of bibliographic metadata to the BHL landing page for the article. This ensures that BHL users can link to the definitive version of the article (the one the DOI resolves to), and more importantly, that other parties (and their algorithms) can find our versions from elsewhere. This is particularly important when commercial websites lock their DOI’d versions of public domain articles behind paywalls. Having their DOIs on our freely accessible versions ensures that services like Unpaywall can find them. To learn more about how this works, see our blog post: BHL Journal Articles Are Now Discoverable via Unpaywall.
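As a rough illustration of how a third party might look up free copies of an article by its DOI, here is a minimal Python sketch against Unpaywall's public REST API. It assumes the v2 endpoint, which takes a DOI in the path and a contact email as a query parameter, and it assumes that the returned record lists open copies under `oa_locations`, each with a `url` field. Check both against Unpaywall's own documentation before relying on them. The function names are invented for this example, and this is not a description of how BHL or Unpaywall work internally.

```python
import json
from typing import List
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base address of Unpaywall's v2 REST API.
UNPAYWALL_API = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi: str, email: str) -> str:
    """Build the lookup address for one DOI (the API asks for a contact email)."""
    return f"{UNPAYWALL_API}{quote(doi, safe='/')}?email={quote(email)}"


def open_locations(record: dict) -> List[str]:
    """Return the URLs of the open-access copies listed in a record."""
    locations = record.get("oa_locations") or []
    return [loc["url"] for loc in locations if loc.get("url")]


def fetch_record(doi: str, email: str) -> dict:
    """Download and decode the record for one DOI (needs network access)."""
    with urlopen(unpaywall_url(doi, email), timeout=30) as response:
        return json.load(response)
```

A call such as `open_locations(fetch_record(some_doi, "you@example.org"))` would then list the freely readable copies known for that DOI, where the email address is a placeholder for your own.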

DOIs not only improve discoverability and enable persistent linking to our historic content; they also allow us to track how BHL content is being used. In the six months following the minting of its new DOI (Oct 2020 to April 2021), the 1799 Platypus description was tweeted by 219 Twitter accounts, referenced in six Wikipedia pages, picked up by one news outlet and cited in one academic paper (data from Altmetric, April 2021). We know this because the article has a DOI.

Screenshot of the Altmetric dashboard for the first scientific description of the Duck-billed Platypus

Altmetric’s overview of attention for the first scientific description of the Duck-billed Platypus (Shaw 1799): https://www.altmetric.com/details/91788579.

The PIWG has spent the past six months creating, refining and testing tools that will allow BHL contributors to do this work themselves. We have also been producing documentation that explains a) how to use the new tools, and b) why this work is so important. These tools will facilitate every step in the article discoverability and DOI assignment process including: downloading existing article data for a given journal title to allow for correction and gap-filling (in development); bulk uploading of article data for new articles (available now); and adding articles and titles to BHL’s (new) DOI Assignment Queue (available now). Our dream is that, whenever anyone uploads a journal volume to BHL, they also provide the data for the articles it contains (and thus take responsibility for making that content discoverable).

The Persistent Identifier Working Group (PIWG) is fueled by the technical expertise, metadata dexterity and incredible passion of:

  • Nicole Kearney, Manager BHL Australia (Chair)
  • Mike Lichtenberg, BHL Lead Developer
  • Susan Lynch, Systems, Digitization & Web Services Librarian, The New York Botanical Garden
  • Bess Missell, Metadata Librarian, Smithsonian Libraries and Archives
  • Roderic Page, Professor of Taxonomy, University of Glasgow
  • Joel Richard, BHL Technical Coordinator | Head of Web Services & IT, Smithsonian Libraries and Archives
  • Diane Rielinger, Digital Projects Librarian, Botany Libraries, Harvard University Herbaria
  • Colleen Funkhouser, BHL Program Manager

The specific goals of the group are:

  • To add article-level metadata to journal articles on BHL
  • To add existing DOIs to (new and existing) article landing pages on BHL (particularly for those articles where the DOI’d version is behind a paywall elsewhere)
  • To assign BHL DOIs to articles that lack DOIs

Want to know more about BHL’s Persistent Identifier Working Group? See:

  • Discovering the Platypus: From its scientific description to its DOI, Biodiversity Information Science and Standards (TDWG) Conference, 6 October 2020: https://youtu.be/4UVSEoWsSrw?t=1285
  • #RetroPIDs: making historic Platypus Infinitely Discoverable (PID), PIDapalooza: the Festival of Persistent Identifiers, 28 January 2021: https://youtu.be/CSeQNe5KR5U

For the latest news about BHL’s DOI work, check out #RetroPIDs on Twitter.

Written by

Nicole Kearney is the Manager of Biodiversity Heritage Library Australia and chairs BHL's Persistent Identifier Working Group. She is obsessed with open access, persistent identifiers, and Striped Possums.