In the beginning..."meh"
I first became aware of the Biodiversity Heritage Library (BHL) around 2007. To be honest, I was initially underwhelmed. BHL didn't seem to have much literature, and what it did have was mostly about plants (I'm a zoologist by background). The interface was a bit clunky, and most of the content was pre-1923, which to me simply reinforced the impression that taxonomy is something of a backwater, obsessed with ancient documents and arcane terminology.
So at the start I wasn't much of a fan. But as BHL grew it started to add more recent content, particularly museum journals, as well as vital content such as the Bulletin of Zoological Nomenclature, and I realised that it was going to be much more useful than I'd previously thought. So I started playing with ways to visualise content from BHL, such as timelines to plot search results over time, and sparklines to show how the relative frequency of different names for the same organism changes over time (similar to the nice visualisations Ryan Schenk has done recently).
But where are the articles?
- Part 1- Part 4 (1833-38)
- 1856
- 1901, v. 1 (Jan.-Apr.)
- Jan-Apr 1906
- 1912 v. 2
- 1923, pt. 1-2 (pp. 1-481)
So any tool to find articles has to deal with these issues. But after a few experiments I decided it would be possible to find lots of articles in BHL, especially if I had access to all the BHL data on my own computers. So, I grabbed a copy of the data and created BioStor.
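To give a feel for the problem, here is a minimal Python sketch that pulls a year range and a volume number out of labels like those listed above. The function name `parse_volume_label` and the regular expressions are illustrative assumptions, not BioStor's actual code, and the heuristics would certainly need extending for real BHL data.

```python
import re

# Four-digit years between 1500 and 2099 (an assumed plausible range).
YEAR_RE = re.compile(r"\b(1[5-9]\d\d|20\d\d)\b")
# Abbreviated ranges such as "1833-38".
SHORT_RANGE_RE = re.compile(r"\b(1[5-9]\d\d|20\d\d)\s*-\s*(\d{2})\b")
# Volume markers such as "v. 1", "v 2" or "vol. 3".
VOLUME_RE = re.compile(r"\bv(?:ol)?\.?\s*(\d+)", re.IGNORECASE)


def parse_volume_label(label):
    """Return best-guess year_start, year_end and volume for a label."""
    years = [int(y) for y in YEAR_RE.findall(label)]

    # Expand "1833-38" into 1833 and 1838.
    short = SHORT_RANGE_RE.search(label)
    if short:
        start = int(short.group(1))
        end = (start // 100) * 100 + int(short.group(2))
        if end >= start:
            years.extend([start, end])

    volume = VOLUME_RE.search(label)
    return {
        "year_start": min(years) if years else None,
        "year_end": max(years) if years else None,
        "volume": int(volume.group(1)) if volume else None,
    }


for label in ["Part 1- Part 4 (1833-38)", "1901, v. 1 (Jan.-Apr.)", "1912 v. 2"]:
    print(label, "->", parse_volume_label(label))
```

Even this small example shows why article finding is fiddly: the same information (year, volume, part) can appear in any order, abbreviated or not, or be missing altogether.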
BioStor
Below is a screenshot of BioStor, which at the moment has over 31,000 articles from BHL.

There are two main ways to use BioStor. The first is as a website where you can browse or search for articles. You can search for articles about taxa by adding the taxon name to http://biostor.org/name/, for example http://biostor.org/name/Zonosaurus. In addition to displaying the article, BioStor displays the names found in the article as a tag cloud and a classification, and in some cases also shows a map with localities that have been automatically extracted from the text and displayed on the map, such as this example from A revision of the dwarf Zonosaurus Boulenger (Reptilia: Squamata: Cordylidae) from Madagascar, including descriptions of three new species:

The other way you can use BioStor is as an OpenURL resolver. Bibliographic software and websites such as EndNote, Zotero, and Mendeley all support OpenURL, so you can be looking at an article in one of those databases and automatically look for it in BioStor.
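As a rough sketch of what an OpenURL request looks like, the Python below builds a query from a journal name, volume, and starting page. The endpoint `http://biostor.org/openurl` is an assumption (this post does not give the resolver address), the parameter names are the common OpenURL 0.1 keys, and the reference values are made up purely for illustration.

```python
from urllib.parse import urlencode

# Assumed resolver address; check the BioStor site for the real one.
BIOSTOR_OPENURL = "http://biostor.org/openurl"


def build_openurl(journal, volume, start_page, year=None, base=BIOSTOR_OPENURL):
    """Build an OpenURL 0.1 style query for a journal article."""
    params = {
        "genre": "article",
        "title": journal,      # journal title in OpenURL 0.1
        "volume": volume,
        "spage": start_page,   # first page of the article
    }
    if year is not None:
        params["date"] = year
    return base + "?" + urlencode(params)


# Hypothetical reference details, for illustration only.
print(build_openurl("Tijdschrift voor Entomologie", 1, 10, year=1900))
```

A journal, volume, and starting page are usually enough to pin down an article, which is why those three fields are required here and the year is optional.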
BioStor needs bibliographies
One thing I've glossed over is how BioStor has managed to find thousands of articles. Some have been added manually, but this rapidly gets tedious. For the majority of articles, what I've done is take an existing bibliography for a journal or a taxonomic group, and write a small computer programme (or "script") to get BioStor to find the articles automatically. For example, I quickly added most of the articles in the journal Tijdschrift voor Entomologie because I had an EndNote file containing those references.
I spend a lot of time searching for bibliographies, downloading them or scraping them from websites, converting them into a readable format, then using scripts to ask BioStor to locate each article in BHL. I'm somewhat taken aback by how hard it is to get these bibliographies. If taxonomists and/or journal editors made these available, we could add many more articles to BioStor. While one approach is to beg, borrow, or steal bibliographies, I'm hoping that the rise of online bibliography databases and associated social networks, especially Mendeley, will generate the bibliographies I need to efficiently find articles in BHL.
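The "converting them into a readable format" step might look something like the following Python sketch, which reads references in RIS format (a common export format for reference managers) and turns each one into a dictionary keyed with OpenURL-style field names. This is an illustration under stated assumptions, not the actual BioStor scripts: the function name `parse_ris`, the field mapping, and the sample record are all invented for the example.

```python
import re

# An RIS line looks like "VL  - 12": two-character tag, two spaces, a dash.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

# Assumed mapping from RIS tags to OpenURL-style keys.
FIELD_MAP = {
    "T1": "atitle", "TI": "atitle",               # article title
    "JO": "title", "JF": "title", "T2": "title",  # journal title
    "VL": "volume",
    "SP": "spage",
    "EP": "epage",
    "PY": "date", "Y1": "date",
}


def parse_ris(text):
    """Return a list of dicts, one per RIS record in the text."""
    records, current = [], {}
    for raw in text.splitlines():
        match = RIS_LINE.match(raw.strip())
        if not match:
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":  # end of record
            if current:
                records.append(current)
            current = {}
        elif tag in FIELD_MAP and value:
            key = FIELD_MAP[tag]
            if key == "date":
                value = value[:4]  # keep only the year
            current.setdefault(key, value)  # first value wins
    return records


# A made-up record, for illustration only.
SAMPLE = """
TY  - JOUR
T1  - An example article title
JO  - Tijdschrift voor Entomologie
VL  - 1
SP  - 10
EP  - 20
PY  - 1900///
ER  -
"""

for reference in parse_ris(SAMPLE):
    print(reference)
```

Each resulting dictionary holds the journal, volume, and page details needed to ask a resolver for the matching article, so a whole bibliography can be processed in a loop.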
What's next?
BioStor has some obvious limitations, notably the assumption that older literature works the same way as modern articles. Whereas today figures, tables, and text are all contained within the page range of an article, it's not uncommon in older (pre-20th century) literature for figures and plates to be physically separate from the text. BioStor can't really handle this, so one day I plan to add the ability to have discontinuous page ranges that will include these figures and plates.
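One possible way to represent discontinuous page ranges is sketched below in Python: an article holds a list of inclusive (first, last) ranges rather than a single start and end page. The class and method names are hypothetical, and the numbers stand for positions in the scanned page sequence. This is not how BioStor currently stores articles.

```python
from dataclasses import dataclass, field


def merge_ranges(ranges):
    """Sort inclusive (first, last) ranges and merge overlapping or adjacent ones."""
    merged = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1] + 1:
            # Overlaps or touches the previous range, so extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged


@dataclass
class Article:
    title: str
    page_ranges: list = field(default_factory=list)

    def add_range(self, first, last):
        """Add an inclusive page range, e.g. the text or a block of plates."""
        if last < first:
            raise ValueError("last page must not precede first page")
        self.page_ranges = merge_ranges(self.page_ranges + [(first, last)])

    def contains(self, page):
        return any(first <= page <= last for first, last in self.page_ranges)

    def page_count(self):
        return sum(last - first + 1 for first, last in self.page_ranges)


# Hypothetical article: text on pages 10-30, plates bound separately at 200-203.
article = Article("An example article")
article.add_range(10, 30)
article.add_range(200, 203)
print(article.page_ranges, article.page_count())
```

Merging adjacent ranges keeps the stored list minimal, so an article whose text and plates happen to be contiguous still ends up with a single range.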
Although I've spent a lot of time creating and populating BioStor, in reality it is a side project running on a Mac Mini on my desk. At some point it would be nice to feed BioStor's data back into BHL itself, so users of BHL could more easily find articles without leaving that web site. BHL also has more resources for ensuring the long-term survival of data than I do.
What do I think of BHL now?
Despite my initial lack of enthusiasm, I now see BHL as one of the great resources of biodiversity informatics. There's some extraordinary stuff in BHL, and it keeps growing. It's also been great working with Chris Freeland, Phil Cryer, and Mike Lichtenberg, who have all been very helpful, even when I've written blog posts venting my frustration with BHL's limitations. I think it's definitely one of those cases where you only complain about the things you actually care about.