The Art of Life: Data Mining and Crowdsourcing the Identification and Description of Natural History Illustrations from the Biodiversity Heritage Library
Missouri Botanical Garden has received $260,000 in funding from the National Endowment for the Humanities to identify and describe natural history illustrations from the digitized books and journals in the online Biodiversity Heritage Library. The Art of Life project will develop software tools for automated identification and description of visual resources contained within the more than 100,000 volumes and 38 million pages of core historic literature made available through BHL digitization activities.
BHL’s digitized texts contain millions of visual resources (plates, illustrations, figures, maps, and other images), many of them produced by the finest botanical and zoological illustrators in the world, including John James Audubon, Georg Dionysius Ehret, and Pierre-Joseph Redouté. These images are currently described only minimally, at a structural page level. That is enough for citation resolvers and human users to navigate to illustrations by page number, but the images lack the descriptive metadata needed for dynamic filtering and inquiry based on factors such as image type, color content, subject matter, or even the names of the organisms depicted.
Project funding will help automate the manual processes BHL staff currently use to curate the images delivered via Flickr at http://www.flickr.com/photos/biodivlibrary/sets/. BHL technical staff at Missouri Botanical Garden will build new software tools and augment existing electronic publishing frameworks to run across the BHL corpus and identify the visual resources within. As a result, these images will be more useful to the scholars who already consult BHL on a regular basis, discoverable by new audiences, and better interconnected with related materials across the Web, including the Encyclopedia of Life. Scholars and educators who rely heavily on visual resources in their research and teaching (e.g., biologists, art historians, curators, historians of science) will be able to find and view a wealth of illustrations of plant and animal life and use them to make connections between science, art, culture, and history.
To realize the vision of a comprehensive and interactive repository for visual resources describing the world’s biota, the project team aims to achieve five primary objectives over the course of a two-year period:
Objective 1: Define an appropriate metadata schema for natural history illustrations, enabling capture of comprehensive scientific, thematic, and descriptive data;
Objective 2: Build software tools to automatically identify illustrations in the BHL corpus using various files and characteristics to determine location and placement of any type of visual resource;
Objective 3: Enhance existing tools to enable the initial sorting, viewing, and editing of these identified visual resources;
Objective 4: Integrate the Steve.museum application and Flickr APIs to enable a community of users to edit descriptive metadata for the illustrations identified through automated means;
Objective 5: Commit born-digital descriptive metadata generated by users into BHL’s preservation system, based on Fedora Commons.
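The filtering this announcement hopes to enable (by image type, color content, subject matter, or organism name) depends on the schema named in Objective 1. As a rough illustration only, the Python sketch below shows what one descriptive record and a simple filter over such records might look like. The class name `IllustrationRecord`, its fields, and the `filter_records` helper are all hypothetical, chosen for this example; they are not the project's schema or tools.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IllustrationRecord:
    """Hypothetical descriptive record for one illustration (not the project's schema)."""
    page_id: int          # assumed page-level identifier
    image_type: str       # e.g. "plate", "figure", "map"
    in_color: bool        # coarse stand-in for "color content"
    subjects: List[str] = field(default_factory=list)        # free-text subject terms
    organism_names: List[str] = field(default_factory=list)  # names of organisms depicted


def filter_records(
    records: List[IllustrationRecord],
    image_type: Optional[str] = None,
    in_color: Optional[bool] = None,
    organism: Optional[str] = None,
) -> List[IllustrationRecord]:
    """Return the records that match every criterion that was supplied."""
    matches = []
    for rec in records:
        if image_type is not None and rec.image_type != image_type:
            continue
        if in_color is not None and rec.in_color != in_color:
            continue
        if organism is not None and organism not in rec.organism_names:
            continue
        matches.append(rec)
    return matches
```

With records like these, a call such as `filter_records(records, image_type="plate", in_color=True)` would return only color plates, which is the sort of query that page-level structural metadata alone cannot answer.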
A complete list of awarded projects and descriptions is available at http://www.neh.gov/files/press-release/march2012statebystatefinal.pdf.