Thursday, January 29, 2015

We Need Your Help to Tag over a Million Biodiversity Images in Flickr

BHL Images in the new IA Book Images Flickr Stream

Two Ways to Access BHL Images in Flickr 

Images from the books and journals of the Biodiversity Heritage Library (BHL) are now more readily available in Flickr than ever before. Thanks to the work of researcher Kalev Leetaru and developers at Smithsonian Libraries (SIL), Missouri Botanical Garden (MBG), and the Internet Archive (IA), over 1 million images from BHL are being added to the IA's Book Images Flickr stream. This work began in the summer of 2014, when Leetaru extracted over 14 million images from 2 million IA public domain books and pushed them to the Flickr Commons. BHL images are a subset of this collection because IA, as a digitization partner for BHL, not only scans many of BHL’s books and journals but also hosts all of BHL's content as a mirror of the BHL portal. To improve the discoverability of the images, developers at SIL, MBG, and IA added metadata, such as BHL Collections, contributing library, and digitization sponsor tags, to the images in the IA Flickr stream.

As a result, BHL users now have two streams from which to access BHL images: the BHL Flickr stream and the IA Book Images Flickr stream.

The primary differences between the two streams lie in the number of images in each, their organization, and their presentation. For example, the BHL stream, which now contains approximately 94,000 images, is manually curated by BHL staff and can be browsed by albums (i.e., book titles), with images presented at the page level. The IA stream currently contains 350,000 BHL images but will rapidly grow to over 1 million in the next few months. While it holds significantly more images than the BHL stream, the IA stream is not organized into subsets or albums, and its images are cropped to each illustration’s borders, removing them from the context of the full page. The full page can be viewed by clicking the "View Book Page" link beneath each image in the IA stream.

The new IA stream, along with BHL’s existing Flickr stream, gives our users an even larger pool of BHL images to view and tag. This moves us further toward our goals for the Art of Life project: to automatically identify, classify, and describe BHL images and to improve access to them for both the BHL community and the scholars and educators who rely on visual resources in their research and teaching.

We Need Your Help! 

We need your help to add species common name tags to BHL images in Flickr, in either the BHL or the IA stream. Adding common names is a low-barrier way for non-specialists to engage with these images, and common name tags can be very useful for educators trying to locate images of plants and animals with which to illustrate their lessons.

To tag BHL images with common names, we recommend the following format:

taxonomy:common=[common name]

If the common name is composed of more than one word, put quotation marks around the phrase:

taxonomy:common="black bear"
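If you are preparing common name tags in bulk (say, from a spreadsheet of identifications), the quoting rule is easy to apply in a script. The sketch below uses a hypothetical helper, common_name_tag, which is not part of Flickr or any BHL tool; it only formats the tag text, which you would still add to the image in Flickr yourself.

```python
def common_name_tag(name: str) -> str:
    """Format a taxonomy:common machine tag for one common name.

    Multi-word names are wrapped in double quotes, following the
    recommended format; single words are left unquoted.
    """
    cleaned = " ".join(name.split())  # trim ends and collapse inner runs of spaces
    if not cleaned:
        raise ValueError("common name must not be empty")
    if " " in cleaned:
        return f'taxonomy:common="{cleaned}"'
    return f"taxonomy:common={cleaned}"


if __name__ == "__main__":
    for name in ["bear", "black bear", "  great   horned owl "]:
        print(common_name_tag(name))
```

Collapsing stray whitespace first means a name copied with extra spaces still produces a single, cleanly quoted phrase.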

We also still need your help to add taxonomic species name machine tags to images in the BHL Flickr stream. In 2011 we began asking users to add machine tags for taxonomic binomials to images in the BHL Flickr stream, and the community responded generously (as of September 2014, over 22,000 of these tags had been added to 14,000 BHL images). These tags allow the Encyclopedia of Life (EOL) to harvest the images to support its effort to create a web page for every species on Earth. See detailed instructions for adding species name machine tags.

Adding common and species name machine tags not only helps users in both Flickr and EOL discover our images; these tags will also be used to enhance the BHL portal. Currently, the portal does not support searching on image metadata, but BHL plans to add this functionality in the near future. The Flickr tags added by users will be ingested into BHL to eventually enable image search within the portal itself.

The British Library also has a great guide to additional tag formats, such as those for adding artist names, dates, or VIAF information. Take a look and feel free to add as much of this information as you can! 

Have questions while you're tagging? Add a comment to this blog post, post to our Facebook page, send us a message on Flickr, or tweet us with the hashtag #BHLTags.

Tips for Navigating the IA Flickr Stream 

Because the IA stream has limited browsing options, we’d like to offer some search tips to help users navigate this collection. All BHL images are tagged with bookcollection:biodiversity; you can either type this tag into the search box on any Flickr page or go directly to the subset.

To filter results further, you will need to use Flickr's advanced search. Because all images in the IA Flickr stream are tagged with bookyear, bookdecade, and bookcentury, these tags can be used as additional search criteria.

More than one tag can be entered in the advanced search box. Here is an example of searching on the biodiversity collection and the decade of the 1910s.

Results of an advanced search on "bookcollection:biodiversity bookdecade:1910"
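The same AND-style query can also be sent to the Flickr API's flickr.photos.search method, where tag_mode=all requires every listed tag to be present. The sketch below only builds the request URL: build_search_url is a hypothetical helper, YOUR_API_KEY is a placeholder for your own Flickr API key, and whether the API matches these colon-containing tags exactly as the web search box does is an assumption worth checking against a few results.

```python
from urllib.parse import urlencode

FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_search_url(api_key: str, tags: list, page: int = 1) -> str:
    """Build a flickr.photos.search request URL that requires every tag in `tags`.

    build_search_url is a hypothetical helper; only the method name and the
    query parameters below come from the Flickr API.
    """
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,          # placeholder: use your own Flickr API key
        "tags": ",".join(tags),      # comma-separated list of tags
        "tag_mode": "all",           # "all" = AND across tags; the default "any" = OR
        "format": "json",
        "nojsoncallback": 1,         # return plain JSON rather than a JSONP wrapper
        "per_page": 100,
        "page": page,
    }
    return f"{FLICKR_REST_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("YOUR_API_KEY", ["bookcollection:biodiversity", "bookdecade:1910"]))
```

Fetching the resulting URL (for example with urllib.request.urlopen) should return one JSON page of photo records; incrementing page walks through larger result sets.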

Users may even want to search a single year of a specific journal.

Results of advanced search on Hardwicke’s Science Gossip bookyear:1967

We want to thank the BHL community for helping us crowdsource these image descriptions, and we look forward to seeing the fruits of your labor in a future version of the BHL portal. Happy tagging!

Trish Rose-Sandler
BHL Data Analyst, Missouri Botanical Garden

Wednesday, January 28, 2015

Notes and News from BHL

We are pleased to announce that the latest BHL Quarterly Report and Newsletter, for Winter 2015, are now available.

You'll notice a new format for our Quarterly Reports and a new design for our newsletters.

We're restructuring our Quarterly Reports as narratives detailing our quarterly activities and presenting content based on three themes: BHL Users, Member and Affiliate Activities, and Science. Our fourth Quarterly Report will constitute our BHL Annual Report, providing not only a report of the year's activities but also the statistics and program evaluations you've been accustomed to seeing in each Quarterly Report. See past reports on our website.

This quarter's report is themed "BHL Users," and in it we're pleased to feature many of the ways users across the world are using BHL and its resources to support their work. Next quarter, we'll feature the contributions our Members and Affiliates have made to BHL, including a summary of our Annual Members' Meeting in March. For our Summer Quarterly Report, we'll feature the many impacts BHL is having on global scientific research.

And don't forget our newsletter, which is a great way to get quarterly updates about BHL news and events directly to your inbox. See archived versions of past newsletters on our website, and if you're not on our mailing list, sign up today!

Tuesday, January 27, 2015

A Bridge to the Past: The Writings of William Brewster

William Brewster was a self-educated ornithologist who lived in Cambridge, Massachusetts. From the mid-1800s until his death in 1919, he amassed a tremendous specimen collection and became one of the foremost experts on birds in the northeastern United States. In 1906, the Nuttall Ornithological Club published The Birds of the Cambridge Region of Massachusetts, Brewster’s exhaustive work on the avian fauna of his own backyard. While the book is a valuable historical resource, it is Brewster’s journals and diaries—spanning over 50 years of his life—that contain the goldmine of his recorded observations. Last year, the Ernst Mayr Library of the Museum of Comparative Zoology at Harvard University made these journals and diaries available on BHL.

Portrait of William Brewster. The Auk. v. 37 (1920).

Increasingly, researchers and conservationists rely on collections of data points to understand species’ habits, population decline, and migration patterns. One such collection is eBird, a website created by the Cornell Lab of Ornithology and the National Audubon Society. eBird harnesses the contributions of bird-watchers around the world to create interactive maps that display individual observations as data points. These data points are integrated into systems such as the Global Biodiversity Information Facility (GBIF), where together they paint a rich picture of global environmental health that no single observation could provide.

Since eBird launched in 2002, it has captured millions of bird observations. Before the World Wide Web, however, and especially before bird-watching became a common recreational activity around 1900, the record is spottier. This makes existing historical data, such as William Brewster’s carefully recorded observations, all the more valuable.

A bird list from Brewster's 1890 journal.

Brewster saw the effects of urbanization and development on the Cambridge of his boyhood; more than the changed landscape, he lamented the loss of birds. In Birds of the Cambridge Region, he wrote of the Mt. Auburn area:

Knolls and ridges have been levelled, swamps and meadows drained or filled, and woods, groves, thickets and orchards swept away, to make place for settlements of houses...Most of the native birds have disappeared…So complete has been the transformation, that it is only by appealing to the imagination…that one can hope to reconstruct even the more prominent features of the landscape as it was twenty or thirty years ago.

In addition to the effects of development, Brewster witnessed the explosion of non-native House (English) Sparrows, the effects of overhunting, and the cost of human attitudes and ignorance. Brewster listed the Great Horned Owl as an “occasional or accidental” visitor to Cambridge and the Cooper’s Hawk as “expunged or doubtful”; at the time, people believed that these birds were a serious threat to domestic fowl and killed them at every opportunity. Wild Turkeys were completely extirpated from the region during Brewster’s lifetime.

Map of Fresh Pond c.1866, from The Birds of the Cambridge Region of Massachusetts. Image provided by Charles Sullivan, Cambridge Historical Society.

Fresh Pond, one of Brewster’s favorite birding spots, is a case study in the harmful effects of human engineering. Before Fresh Pond was made a public park in 1884, hunters decimated the once-abundant migratory duck populations. The establishment of the park allowed the ducks to return, but at the cost of the marsh birds: the vegetation at the water’s edge was cut down for a perimeter path, and some of the pond’s coves were filled in. With their habitat gone, several rail species that Brewster commonly encountered around the pond disappeared; according to eBird, they’ve rarely been seen there since. Even as the duck population rebounded, the city, fearing that the ducks would pollute the municipal water supply, ordered policemen to fire their guns to scare them off.

eBird Cooper's Hawk sightings in the Cambridge region for 2014.

House Sparrows may be here to stay, but not all the damage has been permanent. Cambridge is working to restore Fresh Pond as a sanctuary for local wildlife. Cooper’s Hawks are now spotted regularly in the city. And Great Horned Owls and Wild Turkeys reside in Mt. Auburn Cemetery, where Brewster is buried. As we continue to make progress, the historical information provided by Brewster and others serves as a guide to conservation, filling in critical gaps in the stories of hundreds of bird species. 

Making Brewster’s writings available on BHL is an important step, but the work doesn’t end there. For his journals and diaries to be truly useful, they need to be converted to searchable text. Ordinarily, a computer would do this using Optical Character Recognition (OCR), but because the technology has trouble reading cursive handwriting, the pages must be transcribed by hand, one at a time. Thanks to a grant from the Institute of Museum and Library Services (IMLS), volunteers and project partners at the Ernst Mayr Library are doing just that. Read about BHL’s involvement in the Purposeful Gaming grant, and if you want to help, try your hand at transcribing Brewster’s diaries and journals. By making these writings accessible, we can reach into the past to find information that will help us plot a course for the future.

Patrick Randall | Ernst Mayr Library, Museum of Comparative Zoology, Harvard University

Thursday, January 22, 2015

Wildflowers of Ecuador: Watercolors and eBooks

Missouri Botanical Garden, Peter H. Raven Library's first eBook: Wildflowers and Landscapes of Ecuador: The Way We Knew It.

Every now and then, an unusual and exciting opportunity arises to digitize a truly unique item. One such opportunity arrived in the inbox of Doug Holland, director of the Peter H. Raven Library at the Missouri Botanical Garden, one afternoon in January 2014. Anne Hess, daughter of artist Mary Barnas Pomeroy and granddaughter of artist and teacher Carl Barnas, had decided to donate a collection of artwork and her mother’s unfinished manuscript to the library. The Raven Library was honored to accept this collection. Not only are the paintings themselves beautiful, but the historical backdrop the collection provides is also fascinating.

Orchidaceae. Pomeroy, Mary Barnas. Wildflowers and Landscapes of Ecuador: The Way We Knew It (2015).

Escaping the rise of Nazism in 1930s Laubach, Germany, the Barnas family travelled to Quito, Ecuador, after a one-year stay in Czechoslovakia. It was in Ecuador that most of the artwork was created. Ms. Pomeroy not only created detailed watercolors of many botanical specimens but also recorded where each was found and sought to identify it.

Ms. Pomeroy’s interest in botanical illustration was piqued at the suggestion of her father. In 1938 she began the first of what would become more than 200 illustrations, and six years later she decided to start compiling all of the information into a book that would contain her numerous illustrations. She chose to focus on smaller plants and fungi, reasoning that the larger specimens might already have been described, as “the more attractive ones might have been easier noticed and better known to science.” Specimens from 41 different botanical families were examined under a magnifying glass so that all visible details could be replicated.

Ericaceae. Pomeroy, Mary Barnas. Wildflowers and Landscapes of Ecuador: The Way We Knew It (2015).

As her portfolio grew, she thought it prudent to try to identify and label each of the specimens she had already painted. Although she was never formally trained as a botanist, she enlisted the assistance of many botanists, such as Dr. Alfredo Paredes of the Universidad Central del Ecuador and Dr. Francis Pennell of the Academy of Natural Sciences of Philadelphia, to help identify or confirm the identity of various specimens so that each could be labeled as accurately as possible. Ms. Pomeroy was also employed at various times as a botanical illustrator; for instance, she worked for Herbert L. Mason of the University of California, Berkeley, who was then working on A Flora of the Marshes of California.

After a hiatus of more than 50 years, Ms. Pomeroy returned to her book of botanical illustrations of plants and fungi from Ecuador. Unfortunately, she passed away before she could complete it.

During her life, Ms. Pomeroy enthralled Dr. Peter H. Raven, former president of the Missouri Botanical Garden, with her early Ecuadorian illustrations. This led to a showing of the artwork both at the Missouri Botanical Garden in St. Louis and in various other cities.

Caritas rojas / Alonsoa meridionalis. Pomeroy, Mary Barnas. Wildflowers and Landscapes of Ecuador: The Way We Knew It (2015).

This is where Ms. Pomeroy’s daughter, Anne Hess, stepped into the foreground and offered her mother’s collection to the Raven Library. Once it became clear that most of the artwork was meant to tie in with an unpublished manuscript, it was suggested that the manuscript become the library’s first eBook publication, entitled Wildflowers and Landscapes of Ecuador: The Way We Knew It.

Ms. Pomeroy had already sorted all the images into major categories, which made it easy to pair the images with their respective chapters and descriptions.

Viguiera sodiroi. Pomeroy, Mary Barnas. Wildflowers and Landscapes of Ecuador: The Way We Knew It (2015).

In addition, Ms. Pomeroy had written a short fictional story titled “An Indian boy meets Mount Pichincha's flowers,” which is included toward the end of the book. The reader is taken on an expedition through the eyes of a young boy, Hilario, as he collects botanical specimens with a botanist named Professor Flores.

View Mr. Barnas’ landscapes and Ms. Pomeroy’s enchanting botanical illustrations, and relive her experiences, by perusing this wonderful work on the Biodiversity Heritage Library’s website.

Senna hirsuta. Pomeroy, Mary Barnas. Wildflowers and Landscapes of Ecuador: The Way We Knew It (2015).

Epidendrum secundum. Pomeroy, Mary Barnas. Wildflowers and Landscapes of Ecuador: The Way We Knew It (2015).

Campanea sp. Pomeroy, Mary Barnas. Wildflowers and Landscapes of Ecuador: The Way We Knew It (2015).

Cavendishia gilgiana. Pomeroy, Mary Barnas. Wildflowers and Landscapes of Ecuador: The Way We Knew It (2015).

Cephalis tomentosa. Pomeroy, Mary Barnas. Wildflowers and Landscapes of Ecuador: The Way We Knew It (2015).

Randy Smith
Image Technician | Missouri Botanical Garden  

Tuesday, January 20, 2015

Finding Agriculture among Biodiversity: Metadata in Practice

The Biodiversity Heritage Library is committed to providing free and open access to over 500 years of natural history literature from across the globe. Towards that goal, the Library currently contains over 45 million pages of biodiversity content, representing over 155,000 volumes and 90,000 titles.

However, hosting the content online is just part of our vision to "inspire discovery through free access to biodiversity knowledge." Users must be able to identify content relevant to their work and interests within this vast corpus. For this, metadata is all-important.

Metadata is "data that describes other data." BHL's metadata describes the digital resources in our collection, providing not only author, title, volume, publication year and place, and article information, but also keywords describing the topics discussed within each book. Eventually, full-text searching will also allow users to search across the actual text within a book to discover items relevant to their search parameters.

Dr. Jane Bromley, with her home-grown banana plant.

Dr. Jane Bromley knows the importance of metadata all too well. It is a critical component of her daily work.

Bromley is a research fellow at The Open University, where she has worked since 2012 as part of a subgroup of the Natural Language Processing group. Under the EU FP7-funded agINFRA project, which aims to promote data sharing in the agricultural sciences, Bromley's subgroup studies information extraction from legacy biodiversity literature. Specifically, they are seeking to enhance an existing specialist agricultural resource, AGRIS.

AGRIS is a collaborative network of more than 150 institutions providing free access to agricultural information in the form of more than 7 million bibliographic references on agricultural research and technology. This multilingual bibliographic database also contains links to related data resources on the Web.

BHL contains a vast amount of agricultural information. Searching the subject "agriculture" alone produces over 2,100 books and journals in BHL. Recognizing the potential of these resources, Bromley and her colleagues, including Dr. David King and Dr. David Morse, developed a process to enhance AGRIS with BHL content. They relied on BHL's metadata to do this.

"BHL is a unique resource for agricultural researchers," explains Bromley. "Its long-term view can prove invaluable in locating wild relatives of crops and understanding their relationship to local habitats and ecosystems. It is the only way to access this breadth of biodiversity literature electronically."

To filter BHL for relevant agricultural content, Bromley and her team downloaded the "Title Table" file from BHL, a list of all titles available in BHL with their associated URLs and Call Numbers. After also downloading the bibliographic information for each item in MODS format, Bromley used the Call Number field to identify agricultural material using the Library of Congress Classification (LCC) scheme. Call Numbers are alphanumeric codes that identify an item's shelf location in a library, but because the specific combination of letters and numbers arranges items by subject, a Call Number also indicates the subject of the book itself.
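Together with the genre check Bromley describes below, this Call Number filter can be sketched roughly as follows. The sketch assumes each title has already been reduced to a dictionary with hypothetical call_number and genre keys; the real Title Table and MODS files use their own field names, and the sample records and call numbers are invented for illustration. Extracting the leading class letters and checking them against the listed subclasses makes the rule explicit rather than relying on a loose string-prefix test.

```python
import re

# LCC class S (Agriculture) and the subclasses Bromley lists.
AGRICULTURE_CLASSES = {"S", "SB", "SD", "SF", "SH", "SK"}
# Genres she kept because they map onto material types AGRIS accepts.
ACCEPTED_GENRES = {"book", "thesis", "article", "bibliography"}


def lcc_class(call_number: str) -> str:
    """Return the leading class letters of an LCC call number ('SB355 .T7' -> 'SB')."""
    match = re.match(r"\s*([A-Z]+)", call_number or "")
    return match.group(1) if match else ""


def is_agricultural_candidate(record: dict) -> bool:
    """True only if the call number is in class S AND the genre is AGRIS-compatible."""
    return (
        lcc_class(record.get("call_number", "")) in AGRICULTURE_CLASSES
        and record.get("genre", "").lower() in ACCEPTED_GENRES
    )


if __name__ == "__main__":
    # Invented sample records; the call numbers are made up for illustration.
    records = [
        {"title": "A treatise on fruit trees", "call_number": "SB355 .T7", "genre": "book"},
        {"title": "Journal of Agricultural Research", "call_number": "S1 .J6", "genre": "journal"},
        {"title": "Missionary Travels and Researches in South Africa",
         "call_number": "DT731 .L7", "genre": "book"},
    ]
    for r in records:
        print(r["title"], "->", is_agricultural_candidate(r))
```

Only the first sample record passes: the journal fails the genre check despite its agricultural call number, and the travel narrative is shelved outside class S.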

"I selected items with code starting “S, SB, SD, SF, SH, SK” meaning class Agriculture or one of its subclasses," articulates Bromley. "I then selected only those items whose bibliographic data said they were of genre: book, thesis, article or bibliography, as AGRIS only accepts: books, book chapters, thesis, journal articles, conference papers, and bibliography. BHL contains complete journals rather than journal articles, so these were not included. That meant that items such as Canadian Journal of Agricultural Science or Journal of Agricultural Research were omitted. For items that passed both of these matches I also scraped the item’s URL to recover the location of the PDF image of the item and the OCR text."

Finally, Bromley wrote a new MODS file for each item matching her filtering criteria, resulting in 12,645 MODS files with the URLs of the associated PDF and OCR files appended. The AGRIS team converted each item's MODS data and included it in AGRIS, and the records are available here.
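We don't have Bromley's actual script, but a minimal sketch of the "append the URLs" step, using Python's standard xml.etree.ElementTree, might add the PDF and OCR links to an existing MODS record as url children of a location element. The helper name append_location_urls, the example.org URLs, and the displayLabel values are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)  # serialize with the familiar mods: prefix


def append_location_urls(mods_xml: str, pdf_url: str, ocr_url: str) -> str:
    """Return a copy of a MODS record with PDF and OCR links added under <location>."""
    root = ET.fromstring(mods_xml)
    location = ET.SubElement(root, f"{{{MODS_NS}}}location")
    for label, url in (("PDF", pdf_url), ("OCR text", ocr_url)):
        url_el = ET.SubElement(location, f"{{{MODS_NS}}}url", displayLabel=label)
        url_el.text = url
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    record = (
        '<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">'
        "<mods:titleInfo><mods:title>Example title</mods:title></mods:titleInfo>"
        "</mods:mods>"
    )
    print(append_location_urls(record, "https://example.org/item.pdf",
                               "https://example.org/item.txt"))
```

Working on the parsed tree rather than concatenating strings keeps the existing bibliographic fields intact and guarantees the output is well-formed XML.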

Bromley's experiments with filtering BHL are written up in a conference paper that she presented at the 8th Metadata and Semantics Research Conference in Karlsruhe, as part of the special track on Metadata and Semantics for Open Repositories, Research Information Systems and Data Infrastructures, jointly chaired by Imma Subirats (FAO) and Nikos Houssos (Greek National Documentation Centre). You can download a copy of the paper here:

Bromley's filtering strategy resulted in high precision but lower recall. Nearly all of the selected material was about agriculture, but many items that also contain agricultural information but are classified under different call numbers were omitted. To address this omission, Bromley would eventually like to download the OCR text for every item and mine it for agriculturally related terms.
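As a first step toward that kind of full-text mining, one could simply count agricultural terms in each item's OCR text. This is a minimal keyword-matching sketch, not Bromley's method: the term list is an invented illustration rather than a curated agricultural vocabulary, and real OCR text would need handling for recognition errors. As sample input it uses the Livingstone chapter headings that Bromley cites below, which a title- or subject-based filter would never see.

```python
import re
from collections import Counter

# Invented starter vocabulary for illustration; a real project would use a
# curated agricultural thesaurus and tolerate OCR misspellings.
AGRI_TERMS = ("farmer", "crop", "sugar-cane", "coffee", "vine", "cattle", "plantation")


def count_agri_terms(text: str, terms=AGRI_TERMS) -> Counter:
    """Count case-insensitive whole-word hits per term, allowing a plural 's'."""
    counts = Counter()
    for term in terms:
        pattern = rf"\b{re.escape(term)}s?\b"
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits:
            counts[term] = hits
    return counts


if __name__ == "__main__":
    # Chapter headings Bromley cites from Livingstone's table of contents.
    toc = ("Domestic animals. The Boers as Farmers. Discovery of grape-bearing vines. "
           "The sugar-cane. Coffee Estate. Coffee Plantations.")
    print(count_agri_terms(toc))
```

Any non-zero count would flag an item for review even when neither its title nor its subjects mention agriculture, trading some precision for the recall that the Call Number filter lacks.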

David Livingstone's Missionary Travels and Researches in South Africa is a fantastic example of material that is missed when a strict Call Number strategy is employed.

"This was the document that proved to me that in order to find all the relevant agricultural material in BHL we need to mine the whole texts," emphasizes Bromley. "It’s a seminal book in the British consciousness, which I never thought I’d get to read as part of my research. And, interestingly, it turns out to be a key document in my research. Despite containing information about: domestic animals, The Boers as Farmers, Discovery of grape-bearing vines, The sugar-cane, Coffee Estate, and Coffee Plantations amongst others (all listed in the table of contents), which are all relevant to agricultural research, there is no way to tell from the title or the Subjects that it contains these nuggets."

"Bakalahari women filling their egg-shells and water-skins at a pool in the desert." Livingstone, David. Missionary Travels and Researches in South Africa (1858).

BHL staff are constantly working to improve our Library's metadata. Ongoing work to enhance pagination information, merge duplicate title and author entries, associate related titles through bibliographic hyperlinks, and generally correct any errors that exist is tackled by a team of librarians throughout our consortium. Furthermore, current projects such as Mining Biodiversity and Purposeful Gaming are supporting OCR correction that will eventually enable full-text searching. The NEH Art of Life project has allowed us to systematically identify and manually classify illustrations throughout the BHL corpus, and in the next few weeks we will be calling for volunteers to help us tag a set of images in Flickr with keywords that describe the illustrations' content. These tags will eventually be ingested into BHL to further enhance our metadata.

You can help us improve our metadata! If you notice a problem or area for improvement, send us feedback! Real BHL librarians will not only answer your feedback but use it to help direct our metadata improvement efforts.

We are honored to present Dr. Bromley's work using BHL and its metadata to enhance biodiversity databases, expanding the reach of our content to new audiences and supporting a wide range of research initiatives. Do you have an example of how you've used BHL to support your research? Tell us about it by sending an email to