Providing More Robust Data in BHL’s OAI-PMH Dublin Core Feed

Overview of Live Data from BHL

Recently, BHL performed a comprehensive review of all live data feeds and outputs to ensure that we are providing robust metadata to our downstream consumers. Live BHL data can be found at BHL’s Developer and Data Tools. BHL’s live data outputs include:

  • API v3
  • OAI-PMH

OAI-PMH stands for the Open Archives Initiative Protocol for Metadata Harvesting. It allows discovery services and aggregators to harvest BHL’s metadata in standard formats such as Metadata Object Description Schema (MODS) and Dublin Core (DC).

OAI-PMH requires only that metadata be available in the unqualified Dublin Core (DC) format, but the protocol can be extended to expose metadata in other formats as well. Because unqualified DC is constrained to just 15 core elements, it is common for OAI-PMH repositories to also provide metadata in a richer format. BHL provides MODS in addition to DC. Consumers of the BHL OAI-PMH feed are encouraged to use the MODS-formatted data instead of Dublin Core because it provides additional data elements that aid in user discovery.
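As a rough illustration of choosing between the two formats, the sketch below builds GetRecord request URLs for the same record in each format. The base URL and the `oai_dc` prefix are taken from the sample response later in this post; the `mods` prefix and the helper name `get_record_url` are assumptions, so confirm the available prefixes with a `ListMetadataFormats` request before relying on them.

```python
from urllib.parse import urlencode

# Base URL as shown in the <request> element of the sample response in this post.
BHL_OAI_BASE = "https://www.biodiversitylibrary.org/oai"


def get_record_url(identifier, metadata_prefix="mods"):
    """Build an OAI-PMH GetRecord URL for one record in the requested format.

    The default prefix "mods" is an assumption; "oai_dc" appears in this post.
    """
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{BHL_OAI_BASE}?{query}"


record_id = "oai:biodiversitylibrary.org:title/965"
dc_url = get_record_url(record_id, "oai_dc")  # unqualified Dublin Core
mods_url = get_record_url(record_id)          # MODS (assumed prefix)
```

Only the `metadataPrefix` argument differs between the two URLs, so switching a harvester from DC to MODS is mostly a matter of changing the parser on the receiving end.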

It’s important to note that BHL provides five sets of metadata via OAI-PMH:

  1. Item = This set contains individual volumes hosted by BHL. The content is viewable in BHL.
  2. Title = This set contains metadata about the monographs and journals represented in BHL.
  3. Part = This set contains articles/chapters/treatments/etc. hosted by BHL. The content is viewable in BHL.
  4. Item External = This set contains individual volumes not hosted by BHL. The content must be viewed on a site not maintained by BHL.
  5. Part External = This set contains articles/chapters/treatments/etc. not hosted by BHL. The content must be viewed on a site not maintained by BHL.
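A harvester would typically request one set at a time. The following sketch shows how a selective `ListRecords` request might be built. Only the `title` setSpec appears in this post's sample response; the other four setSpec strings, the `SET_SPECS` table, and the function name are assumptions for illustration, and a `ListSets` request will report the actual values.

```python
from urllib.parse import urlencode

BHL_OAI_BASE = "https://www.biodiversitylibrary.org/oai"

# Only "title" is confirmed by the sample response in this post.
# The remaining setSpec values are guesses; verify them with ?verb=ListSets.
SET_SPECS = {
    "Item": "item",
    "Title": "title",
    "Part": "part",
    "Item External": "itemexternal",
    "Part External": "partexternal",
}

LIST_SETS_URL = f"{BHL_OAI_BASE}?verb=ListSets"


def list_records_url(set_name, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords URL for one set, or for the next page of results."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: it is sent
        # with the verb only, without set or metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {
            "verb": "ListRecords",
            "metadataPrefix": metadata_prefix,
            "set": SET_SPECS[set_name],
        }
    return f"{BHL_OAI_BASE}?{urlencode(params)}"


first_page = list_records_url("Title")
next_page = list_records_url("Title", resumption_token="TOKEN-FROM-PREVIOUS-RESPONSE")
```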

Improving BHL’s OAI-PMH Dublin Core Feed

To provide more robust data in BHL’s OAI-PMH Dublin Core feed, three changes have been made to the feed:

  1. Creative Commons (CC) license information was added as a second <rights> element;
  2. A <relation> element was added to titles that are part of a monographic series, allowing BHL to model more complex bibliographic relationships that exist in the BHL database; and
  3. A non-standard “type” attribute was removed from the <relation> element for parts.

Important: If your code relies on the non-standard “type” attribute of the <relation> element at the part level, this is a breaking change. Please take note and update your code accordingly.
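One way to make harvesting code robust to the removed attribute is to read only the text of each relation element and never look at its attributes. The sketch below compares a fragment that carries a “type” attribute with one that does not. Both fragments are hypothetical: the attribute value `placeholder` and the `example.org` URL are invented for illustration and are not taken from BHL’s feed.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
RELATION_TAG = "{" + DC + "}relation"


def wrap(relation_xml):
    """Wrap a relation element in a minimal oai_dc:dc container."""
    return (
        f'<oai_dc:dc xmlns:oai_dc="{OAI_DC}" xmlns:dc="{DC}">'
        f"{relation_xml}</oai_dc:dc>"
    )


# Hypothetical part-level fragments (attribute value and URL are made up).
BEFORE = wrap('<dc:relation type="placeholder">https://example.org/parent</dc:relation>')
AFTER = wrap("<dc:relation>https://example.org/parent</dc:relation>")


def relation_values(dc_xml):
    """Return the text of every dc:relation element, ignoring attributes."""
    root = ET.fromstring(dc_xml)
    values = []
    for element in root.iter(RELATION_TAG):
        text = (element.text or "").strip()
        if text:
            values.append(text)
    return values


# The same values come back whether or not the attribute is present.
same_result = relation_values(BEFORE) == relation_values(AFTER)
```

Code that previously branched on the attribute value would need another signal, such as the shape of the relation URL, to tell relation kinds apart.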

Below is an example output of a monographic series using the <relation> element at the title-level:

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2023-01-31T13:45:05Z</responseDate>
<request verb="GetRecord" metadataPrefix="oai_dc" identifier="oai:biodiversitylibrary.org:title/965"> https://www.biodiversitylibrary.org/oai </request>
<GetRecord>
<record>
<header>
<identifier>oai:biodiversitylibrary.org:title/965</identifier>
<datestamp>2008-01-02T11:47:12Z</datestamp>
<setSpec>title</setSpec>
</header>
<metadata>
<oai_dc:dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Adephagous and clavicorn Coleoptera from the Tertiary deposits at Florissant, Colorado, with descriptions of a few other forms and a systematic list of the nonrhynchophorous Tertiary Coleoptera of North America</dc:title>
<dc:creator>Scudder, Samuel Hubbard, 1837-1911</dc:creator>
<dc:subject>Beetles, Fossil</dc:subject>
<dc:subject>Paleontology</dc:subject>
<dc:subject>Tertiary</dc:subject>
<dc:publisher>Washington, Govt. print. off, 1900</dc:publisher>
<dc:contributor>Smithsonian Libraries</dc:contributor>
<dc:date>1900</dc:date>
<dc:type>Book</dc:type>
<dc:type>text</dc:type>
<dc:identifier>https://www.biodiversitylibrary.org/bibliography/965</dc:identifier>
<dc:identifier>info:doi/10.5962/bhl.title.965</dc:identifier>
<dc:language>English</dc:language>
<dc:relation>https://www.biodiversitylibrary.org/bibliography/42496</dc:relation>
</oai_dc:dc>
</metadata>
</record>
</GetRecord>
</OAI-PMH>
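For harvesters who want to pick up the new element, here is a minimal parsing sketch using Python's standard library. The embedded `SAMPLE` is a trimmed copy of the response above; the function name `dc_fields` is an illustrative choice. Values are collected into lists because Dublin Core elements can repeat, as with the two type and identifier elements here, or a second rights element where one is present.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/"}

# Trimmed copy of the sample response shown above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
<GetRecord>
<record>
<header>
<identifier>oai:biodiversitylibrary.org:title/965</identifier>
<setSpec>title</setSpec>
</header>
<metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:type>Book</dc:type>
<dc:type>text</dc:type>
<dc:identifier>https://www.biodiversitylibrary.org/bibliography/965</dc:identifier>
<dc:identifier>info:doi/10.5962/bhl.title.965</dc:identifier>
<dc:relation>https://www.biodiversitylibrary.org/bibliography/42496</dc:relation>
</oai_dc:dc>
</metadata>
</record>
</GetRecord>
</OAI-PMH>"""


def dc_fields(response_xml):
    """Map each Dublin Core element name to the list of its values."""
    root = ET.fromstring(response_xml)
    container = root.find(".//oai_dc:dc", NS)
    if container is None:
        return {}
    fields = defaultdict(list)
    for element in container:
        # Tags look like "{namespace}relation"; keep only the local name.
        name = element.tag.split("}", 1)[-1]
        text = (element.text or "").strip()
        if text:
            fields[name].append(text)
    return dict(fields)


fields = dc_fields(SAMPLE)
print(fields["relation"])  # ['https://www.biodiversitylibrary.org/bibliography/42496']
```

Using `fields.get("relation", [])` rather than direct indexing keeps the code safe for titles that are not part of a monographic series and so carry no relation element.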

Data harvesters currently consuming BHL’s OAI-PMH feeds, please update your code accordingly. Also, feel free to leave a comment for the BHL Technical Team with any feedback regarding any of our live data outputs. Thank you and happy harvesting!

Written by

JJ Dearborn joined the Biodiversity Heritage Library as Data Manager in 2022 and works to open up BHL data to the larger biodiversity community and the world. As a longtime advocate for the free-culture movement, she has worked on open access projects for the Peabody Essex Museum, Harvard University’s Department of Organismic and Evolutionary Biology, the Smithsonian Museum of Natural History, Harvard-Smithsonian Center for Astrophysics, the City of Boston, and the State of Massachusetts.