BHL has deployed a new taxonomic name finding tool to improve the speed and accuracy of identifying names throughout its 58+ million pages.
BHL is now using Global Names Architecture’s (GNA) gnfinder tool to locate taxonomic names in the BHL corpus. Prior to this deployment, BHL’s name finding services were based on an index of scientific names that GNA developers created six years ago by parsing every page in BHL one by one. That process took 45 days, and the cost of repeating it made updating or improving the index infeasible.
The gnfinder tool uses fast, scalable programming languages to significantly reduce computational time. Using open source applications written in Go and Scala, the tool detects candidate scientific names and verifies them against millions of scientific name-strings aggregated by GNA. The new process cuts the time needed for name detection from 35 days to 5 hours and the time needed for name verification from 7 days to 12 hours. As a result, the entire BHL corpus can now be indexed in less than a day, compared to the 45 days needed for the previous index. Because indexing is now so much faster, iterative improvements to the index are feasible.
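The two stages described above, detection followed by verification, can be illustrated with a small sketch. This is not gnfinder's algorithm or API: the regular expression, the `KNOWN_NAMES` set, and the function names are hypothetical stand-ins chosen for illustration, and a real verifier would check candidates against GNA's aggregated name-strings rather than a three-item set.

```python
import re

# Hypothetical stand-in for GNA's aggregated name-strings (illustration only).
KNOWN_NAMES = {"Homo sapiens", "Pardosa moesta", "Quercus alba"}

# Naive detection pattern (an assumption, not gnfinder's method):
# a capitalized genus-like word followed by a lowercase epithet-like word.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def detect_candidates(text):
    """Stage 1: return every string that merely looks like a binomial name."""
    return [f"{m.group(1)} {m.group(2)}" for m in CANDIDATE.finditer(text)]


def verify(candidates, known=KNOWN_NAMES):
    """Stage 2: keep only the candidates present in the reference set."""
    return [c for c in candidates if c in known]


def find_names(text):
    """Run detection, then verification, on one page of text."""
    return verify(detect_candidates(text))


page = (
    "The spider Pardosa moesta was found near Quercus alba trees. "
    "This page also mentions Homo sapiens."
)
print(detect_candidates(page))  # includes false positives such as "The spider"
print(find_names(page))         # only names found in the reference set
```

Splitting the work this way keeps detection cheap and permissive, and leaves the verification step to discard false positives such as "The spider".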
Unpaywall finds legal open access versions of paywalled literature. Thanks to the work of Richard Orr, Unpaywall’s Lead Developer, BHL is now one of the sources indexed in Unpaywall’s database. As of this week, 43,000 journal articles on the BHL website are discoverable via Unpaywall.
The Biodiversity Heritage Library (BHL) has added functionality that allows BHL Partners to upload transcriptions in place of the automatically generated OCR (Optical Character Recognition) text for archival materials digitized in BHL. This functionality supports transcriptions generated as part of Partner crowdsourcing projects on Smithsonian Transcription Center, DigiVol, and From the Page.
The Global Names Project held a workshop on 17-19 June 2019 on the campus of the University of Illinois at Urbana-Champaign. The workshop, titled “Scientific names indexing and data mobilization of Biodiversity Heritage Library using tools from Global Names project,” was hosted by the Species File Group at the Illinois Natural History Survey. Eighteen people attended, representing a variety of organizations interested in BHL content: Global Names Architecture, iDigBio, TaxonWorks, UIUC Species File Group, the Illinois Library, Encyclopedia of Life, the DINA Project, the Catalogue of Life, GBIF, Species File Group Argentina, the HathiTrust Research Center, and Global Biotic Interactions.
We’ve added functionality to the BHL book viewer that makes it easier to generate a PDF for an article.
When you are viewing an article that has been defined in BHL, you can now quickly and easily generate a PDF of that article using our new “Download Article” option in the “Download Contents” dropdown menu.
The Biodiversity Heritage Library (BHL) is the world’s largest open access digital library for biodiversity literature and archives. Headquartered at the Smithsonian Libraries and Archives in Washington, D.C., BHL operates as a worldwide consortium of natural history, botanical, research, and national libraries working together to digitize the natural history literature held in their collections and make it freely available for open access as part of a global “biodiversity community.”