Improving the Efficiency of Scientific Research

The Biodiversity Heritage Library and the Hymenoptera Anatomy Ontology

Ontology concerns the nature of reality: determining what exists, how it fits within a hierarchy, and how various elements are organized according to their similarities and differences. Traditionally a philosophical question within metaphysics, ontology today also has a firm application within systems biology.

Anatomy ontologies describe the structural and developmental relationships between the various parts of an organism. Defining anatomical ontologies reveals a complete list of distinguishing characteristics for that organism or group of organisms. The act of creating an anatomical ontology requires precise definitions of the terminology used to describe a variety of phenotypes.

Authors who contributed to the past 250 years of taxonomic literature did not use standardized vocabularies. Katja C. Seltmann (Project Manager for the Tri-Trophic Thematic Collection Network at the American Museum of Natural History) sought a way to efficiently analyze this multi-century body of literature to create a single anatomical ontology, specifically for the insect order Hymenoptera. Accomplishing this feat required the Biodiversity Heritage Library (BHL).

Millions of pages of analog biodiversity literature, spanning the 15th-21st centuries, are digitized and made freely available online by the Biodiversity Heritage Library. Among the over 59,000 titles in the collection is the Journal of Hymenoptera Research (JHR), published by the International Society of Hymenopterists since 1992. Seltmann and a team of four other researchers used this publication from BHL to help build the Hymenoptera Anatomy Ontology (HAO).

The NSF-funded Hymenoptera Anatomy Ontology is based on a language recognition tool (called the “Proofer”), which can be run across biodiversity literature to discover domain-specific anatomy terms. Running the tool across the OCR text for JHR resulted in the discovery of nearly 1,200 new terms for HAO. Furthermore, the development of the ontology is iterative. As the “Proofer” is applied to new collections of literature, it both finds matches to existing terms and proposes new terms to add to the ontology. A human reviewer then examines the proposed terms and selects those to be added to the growing database.
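The match-and-propose loop described above can be pictured with a small sketch. This is not the actual “Proofer”, whose internals are not described here; it is a simplified illustration that assumes plain word tokens, a set of terms already in the ontology, and a frequency threshold for suggesting candidates. The function name `propose_terms`, the stopword list, and the sample sentence are all hypothetical.

```python
import re
from collections import Counter

def propose_terms(text, known_terms, stopwords, min_count=2):
    """Split OCR text into matches to known terms and candidate new terms.

    Hypothetical illustration only: the real tool's matching rules are
    not described in this post.
    """
    # Lowercase the text and keep alphabetic word tokens only.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)

    # Words that already exist in the ontology.
    matched = {t: c for t, c in counts.items() if t in known_terms}

    # Unknown, non-stopword words seen often enough to show a human reviewer.
    candidates = {
        t: c for t, c in counts.items()
        if t not in known_terms and t not in stopwords and c >= min_count
    }
    return matched, candidates

# Made-up sample data, not taken from JHR.
known = {"mesosoma", "petiole"}
stop = {"the", "is", "and", "of", "with", "a"}
sample = (
    "The mesosoma is smooth and the propodeum is punctate. "
    "Propodeum with a carina; petiole short."
)
matched, candidates = propose_terms(sample, known, stop)
```

In this sketch the candidates are only suggestions. As in the workflow described above, a person would still decide which of them belong in the ontology before the next pass over the literature.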

After creation, this ontology can be applied as a filter to the literature in order to reveal trends in term occurrence within species descriptions, ultimately allowing researchers to analyze hundreds of years' worth of scientific publications without having to sift page by page through the texts. The tool is thus instrumental in improving the efficiency of scientific research, and the process and its impact were detailed in the 2013 PLoS ONE article “Utilizing Descriptive Statements from the Biodiversity Heritage Library to Expand the Hymenoptera Anatomy Ontology” (Seltmann et al.).*
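As a rough illustration of using an ontology as a filter, the sketch below counts, per decade, how many descriptions mention each term. It assumes the descriptions are already available as (year, text) pairs and that terms are single words; the function name `term_trends` and the sample records are hypothetical and are not drawn from the published analysis.

```python
import re
from collections import defaultdict

def term_trends(descriptions, ontology_terms):
    """Count, per decade, how many descriptions mention each ontology term."""
    trends = defaultdict(lambda: defaultdict(int))
    for year, text in descriptions:
        decade = (year // 10) * 10
        # Use a set so each description counts a term at most once.
        words = set(re.findall(r"[a-z]+", text.lower()))
        for term in ontology_terms & words:
            trends[term][decade] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {term: dict(by_decade) for term, by_decade in trends.items()}

# Made-up sample records: (publication year, description text).
docs = [
    (1895, "Petiole long; propodeum rugose."),
    (1899, "Propodeum smooth."),
    (1994, "Mesosoma and propodeum punctate."),
]
trends = term_trends(docs, {"propodeum", "mesosoma", "petiole"})
```

A real analysis would also need to handle multi-word terms, synonyms, and OCR errors, which this sketch ignores.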

According to Seltmann, the Biodiversity Heritage Library plays a critical role in modern scientific research, including her own work:

“I am very fond of the BHL. It set a precedent for open access to literature that I feel initiated a cascading of change in our expectations. Sharing information, publications and open access is no longer the suspicious topic it used to be only a few years ago. Now, expectation is that publications, data and otherwise will be readily available. BHL, in my opinion, was truly one of the first examples of an open model becoming successful in the biological community, and, because it was useful, it changed attitudes.”

The process used to create the Hymenoptera Anatomy Ontology can be applied to other disciplines in order to build any phenotype-relevant ontology. However, as the PLoS ONE article articulates,

“Natural language processing methods for biological data discovery is only possible through open access publications, and efforts such as the Biodiversity Heritage Library to make legacy literature freely available. This exercise to observe trends in the terminology illustrates how the accessibility to literature facilitates anatomy ontology construction.”

This use case thus provides a clear example of how the BHL is inspiring scientific discovery through free access to biodiversity knowledge.

Interested in telling us about how BHL has helped support your research? Send us feedback or write to

* Seltmann KC, Pénzes Z, Yoder MJ, Bertone MA, Deans AR (2013) Utilizing Descriptive Statements from the Biodiversity Heritage Library to Expand the Hymenoptera Anatomy Ontology. PLoS ONE 8(2): e55674. doi:10.1371/journal.pone.0055674

Written by

Grace Costantino served as the Outreach and Communication Manager for the Biodiversity Heritage Library from 2014 to 2021. In this capacity, she developed and managed BHL's communication strategy, oversaw social media initiatives, and engaged with the public to excite audiences about the wealth of biodiversity heritage available in BHL. Prior to her role as Outreach and Communication Manager, Grace served as the Digital Collections Librarian for Smithsonian Libraries and as the Program Manager for BHL.