Tuesday, May 8, 2012

Partying with BHL: Tagging Flickr Images for EOL

BHL has over 38 million pages of taxonomic literature, freely available worldwide to anyone with an Internet connection. But if you think BHL is just about text, you'll be pleasantly surprised. BHL books also contain thousands of gorgeous natural history illustrations from the past 500 years. We wanted to provide better access to these images, and thus the BHL Flickr was born. BHL's Flickr currently contains more than 30,000 images.

The Encyclopedia of Life, with which BHL is closely associated, is an online encyclopedia dedicated to creating a web page for every species on earth. These pages contain information about each species, links to mentions of those species in BHL, distribution maps, and a myriad of media, including images. EOL harvests many of its images from Flickr, including BHL images that are tagged with a species name machine tag. See an example of an EOL species page with a BHL image here.

Machine tags are tags specially formatted to allow machines to read and understand them. For EOL, these tags tell machines which species (or other taxonomic designations) are depicted in each image. The format for EOL machine tags is "taxonomy:binomial=Genus species". You can replace "binomial" with another taxonomic tag, such as "genus" or "family", if you can only identify the organism at that level. Learn more about the Flickr tagging process and machine tag formats in our previous blog post.
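As a rough illustration of the tag format described above, the following Python sketch builds and parses such tags. The function names and the set of allowed ranks are assumptions made for this example (the post only names "binomial", "genus", and "family"); this is not the code EOL or Flickr actually runs.

```python
import re

# Ranks mentioned in the post. Treating this as the complete list is an
# assumption for the sake of the example.
KNOWN_RANKS = {"binomial", "genus", "family"}

# Matches "taxonomy:<rank>=<name>", e.g. "taxonomy:binomial=Genus species".
_TAG_RE = re.compile(r"^taxonomy:([a-z]+)=(.+)$")


def make_machine_tag(rank, name):
    """Build a tag such as 'taxonomy:binomial=Genus species' (hypothetical helper)."""
    rank = rank.strip().lower()
    if rank not in KNOWN_RANKS:
        raise ValueError(f"unknown rank: {rank!r}")
    # Collapse stray whitespace so the name is stored consistently.
    name = " ".join(name.split())
    if not name:
        raise ValueError("name must not be empty")
    return f"taxonomy:{rank}={name}"


def parse_machine_tag(tag):
    """Return (rank, name) for a taxonomy machine tag, or None for any other tag."""
    match = _TAG_RE.match(tag.strip())
    if match is None:
        return None
    return match.group(1), match.group(2).strip()
```

For example, make_machine_tag("genus", "Panthera") yields "taxonomy:genus=Panthera", while parse_machine_tag returns None for an ordinary tag such as "lion", so plain descriptive tags can be skipped.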

While BHL is working on ways to automatically add species tags to images in Flickr (learn more in the post about our recent NEH grant), the process is currently a manual one, requiring users to identify the species in each image with a taxonomic machine tag so that it can be ingested into EOL and associated with the correct species page. With over 30,000 images in the BHL Flickr, staff need help to get these images tagged. To facilitate this process, staff decided to call on the power of the masses and host a Flickr tagging party at the Smithsonian Institution.

The Flickr Tagging Party at the Smithsonian Institution.
Smithsonian employees were invited to gather on April 25th, 2012, for a 90-minute meeting at which BHL and EOL staff gave overviews of their respective projects and instructions on how to add taxonomic machine tags to BHL images. After a brief tutorial (see the tutorial on the EOL Flickr page), guests were encouraged to begin tagging images from a prepared list of books. Users run into a variety of challenges when tagging images, including outdated species names (users are encouraged to tag images with more modern names), plates without species names, and names in foreign languages or in fonts that are difficult to read. Although the books on the prepared list were selected to minimize these difficulties, attendees still encountered many of them, so BHL and EOL staff were positioned throughout the room to provide assistance when needed.

The tagging party was an overall success and an excellent learning opportunity for both guests and staff. The 23 attendees tagged approximately 170 images. A somewhat high tag error rate has prompted staff to refine instructions and develop a simpler format for future events. Additionally, a survey was sent to attendees to allow staff to identify further areas for improvement.

Staff hope to host many more of these events. Several more staff parties, incorporating lessons learned during this first attempt, are planned for the coming months. Later this summer, staff plan to host an event for a natural history society in the Washington, D.C. area before finally hosting the first public tagging party, most likely at the Smithsonian Institution. If you're interested in learning more about tagging Flickr images or perhaps participating in future public events, send us feedback. Be sure to check back on our blog regularly for more information about Flickr, EOL, and machine tags!

Visit the BHL Flickr page here: http://www.flickr.com/photos/biodivlibrary/sets/


6 comments:

CN said...

Interesting exercise, and good to hear about the progress. Wondering about the machine tags for species names. In using an index like ITIS/CoL or WoRMS, I found several possible synonyms to use when tagging plates of decapods from Ireland. Would it be a good idea to tag with the name as seen on the plate, in addition to the valid (current) synonym? Or is it desirable to use the same name as uBio shows for the OCR-read page (if available) on BHL?

CN said...

Interesting exercise and good to hear about the progress. I am curious about the choice of taxonomic names for the machine tags. On the BHL document, uBio may list a synonym for an OCR-read page. For the image plates, is it preferred to tag with this same name? Or one based on another authority, i.e., ITIS, WoRMS? Alternatively, users could tag based on the original name, as visible on the plate on Flickr. Perhaps then an authority would recognize the tag as one synonym and link to the updated name.

Biodiversity Heritage Library said...

@CN, Thanks for the comments! We're glad you're interested in our tagging process.

When it comes to tagging images in BHL for EOL, it is definitely preferable to tag with the modern, valid name, as these images will then be incorporated into the correct species page on EOL, or, if no page exists, a correctly-named page will be created. So, if you know the modern, valid name, just tag with that.

uBio is a good place to get names from. If BHL lists a name from uBio for an image, you can typically trust that name and tag the image with it. ITIS and WoRMS are also both great, so it really just depends on which authority you're able to find a modern name under or which you prefer to use. All of these authorities should be linked in EOL under a single species page, so using a name from any of them should get the image onto the correct page in EOL.

Tagging with the original name on the plate should be a last resort. If you cannot identify a modern name for the species using uBio, ITIS, WoRMS, or another authority, then tag with the original name. However, if you know the modern name, there is no need to add a tag with the original name. Having multiple species names from different genera (which is often the case with outdated vs. modern names) can cause hiccups in the EOL algorithm, so that the image is ignored and not inserted into any species pages.

So, short story:

Always tag with the modern, valid name if you know it.

If uBio is giving you a name associated with the page, use it. Otherwise, try other name authorities like ITIS.

If all else fails, tag with the name as it appears on the plate.

We hope this answers your question. Please don't hesitate to send us more questions or to let us know how your progress goes!

Biodiversity Heritage Library said...

Update:

BHL and EOL staff have met several times to review and revise the Flickr tagging process based on user feedback. In an effort to capture as much information as possible, and make the process as simple as possible for users, the following revisions have been implemented:

* Users are encouraged to submit as many species name machine tags (at the same hierarchical level) as possible. They are encouraged to tag images first with the name as it appears on the image, and secondly with the modern species name, if known.

* BHL images tagged with species name machine tags will be ingested into EOL on a weekly basis.

* All tags submitted to a single image in the BHL Flickr should, if possible, be at the same hierarchical level (e.g., all binomials or all genus-level tags). Multiple tags at varying hierarchical levels may result in the image being ignored by EOL and not associated with any species pages.
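The same-level rule above can be checked before tags are submitted. The Python sketch below is only an illustration: the function names are hypothetical, the taxon names are placeholders, and it assumes tags follow the "taxonomy:rank=Name" format described in the post. It does not reproduce EOL's actual ingestion logic.

```python
def ranks_used(tags):
    """Collect the ranks that appear in an image's taxonomy machine tags."""
    ranks = set()
    for tag in tags:
        prefix, colon, rest = tag.strip().partition(":")
        if prefix != "taxonomy" or not colon:
            continue  # ordinary tag or a different machine-tag namespace
        rank, equals, name = rest.partition("=")
        if equals and rank and name.strip():
            ranks.add(rank)
    return ranks


def same_level(tags):
    """True when all taxonomy tags on one image share a single rank."""
    return len(ranks_used(tags)) <= 1


# Placeholder names: a name as printed on the plate plus a modern name,
# both tagged as binomials, alongside an ordinary descriptive tag.
consistent = [
    "taxonomy:binomial=Oldgenus species",
    "taxonomy:binomial=Newgenus species",
    "crab",
]
# Mixing a binomial with a genus-level tag breaks the rule.
mixed = ["taxonomy:binomial=Newgenus species", "taxonomy:genus=Newgenus"]
```

Here same_level(consistent) is True and same_level(mixed) is False, which flags the second image for correction before it risks being ignored.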

Stay tuned for news about future tagging events, and if you have questions, leave a comment on this post or send feedback to http://www.biodiversitylibrary.org/Feedback.aspx

Anthony Goddard said...

Cool start, but why not extend this to the real crowd? 23 people hopefully provided valuable feedback, but 23 people isn't really calling "on the power of the masses" - have you considered writing an interface to encourage users online to participate, something like the taxonomizer app? http://bioblitz.tdwg.org/taxonomizer

Biodiversity Heritage Library said...

Hi Anthony,

Our ultimate vision is definitely to extend this to a larger audience. We hope to have public events soon, which we'll be sure to write about in a post on our blog.

We recently met with folks at the Encyclopedia of Life to discuss ways to create instructional materials for the public that would guide them in these tagging activities without us having to host physical events for them to attend. We have several solid ideas on how to proceed with these tutorials and supplemental materials, and now all that is required is to gather the resources (staff time particularly) to accomplish them.

We had not heard of the Taxonomizer app, but it looks like a particularly interesting approach to the issue. We'll definitely look into it as we move forward with these events!

Keep checking back on our blog for more updates about these events. Thanks for the comments, and don't hesitate to send other thoughts our way!