Thursday, August 30, 2012

Interested in improving access to millions of digital images?

The Biodiversity Heritage Library (BHL) has made significant contributions to the research community over the past five years. One of the largest has been to digitize a substantial body of biodiversity literature (nearly 40 million pages) and make that literature available for open access and responsible use as part of a global “biodiversity commons.”

Yet despite this success, BHL still faces several challenges with access to and distribution of its digitized content. One of these is that users cannot easily find the millions of natural history illustrations hidden within the pages of the BHL corpus. Only a small percentage of pages have been tagged as having illustrations because tagging is currently a labor-intensive manual task (a small selection of the diversity of BHL images can be viewed in its Flickr stream). Even once a page is tagged, users still cannot search on an illustration’s content using criteria such as species names, dates, and creators, because images have not been described at that level of detail.

The NEH-funded Art of Life project has set out to solve this problem in two ways: by developing an algorithm to automatically identify which pages contain illustrations, and by creating a schema to classify the illustrations and guide their description so that they are more accessible to users. Once the algorithm tags pages containing illustrations, they will be pushed out to image-sharing platforms such as Flickr and Wikimedia Commons for crowdsourcing of the descriptions. The schema will provide guidance on recording fields and their values. (See the example below of a BHL illustration marked up with the Art of Life schema.)

Example of BHL illustration marked up with proposed Art of Life schema


Here’s how you can help

A draft of the schema has been developed, and we are looking for feedback on how well it will serve the needs of five primary audiences that we believe would benefit from access to these illustrations: 1) artists, 2) biologists, 3) humanities scholars, 4) librarians, and 5) educators. We particularly want to know whether the schema incorporates the access points by which these user groups want to find images, or whether they might want to search for images based on fields not incorporated in the schema.

Whether you anticipate being a user of the illustrations from the BHL or you are a subject specialist or cataloger interested in helping us describe their content, we are interested in hearing from you as to how this schema may be improved to support the description of and access to these images.

We have provided a brief survey for feedback here:


Feedback can also be posted to this blog, added directly to the schema draft (with Google Docs comments), or emailed to me.

Trish Rose-Sandler, Data Analyst, Missouri Botanical Garden