BHL has deployed a new taxonomic name finding tool to improve the speed and accuracy of identifying names throughout its 58+ million pages.
BHL is now using Global Names Architecture’s (GNA) gnfinder tool to locate taxonomic names in the BHL corpus. Prior to this deployment, BHL’s name finding services were based on an index of scientific names created by GNA developers six years ago by parsing every page in BHL one by one. This took 45 days to accomplish, and the cost of repeating this process made updating or improving the index infeasible.
The gnfinder tool uses fast, scalable programming languages to significantly reduce computational time. Using open source applications written in Go and Scala, the tool detects candidate scientific names and compares them to millions of scientific name-strings aggregated by GNA for verification. The new process decreases the time needed for name detection from 35 days to 5 hours and the time needed for name verification from 7 days to 12 hours. As a result, the entire BHL corpus can now be indexed in less than a day, compared to the 45 days needed for the previous index. Because computation is now so much faster, iterative improvements to the index are also achievable.
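To put the reported timings in perspective, the short Python sketch below works out the speedup for each stage. It uses only the figures quoted above. The variable and function names are illustrative, and summing the two new stage times assumes the stages run one after the other. The two old stage times add up to 42 days, while the post quotes 45 days for the full previous index. The post does not break down the difference, so the sketch does not try to account for it.

```python
HOURS_PER_DAY = 24

# Stage timings quoted in the post, as (old_hours, new_hours).
stages = {
    "name detection": (35 * HOURS_PER_DAY, 5),
    "name verification": (7 * HOURS_PER_DAY, 12),
}


def speedup(old_hours, new_hours):
    """Return how many times faster the new process is."""
    return old_hours / new_hours


# Assumption: the two stages run sequentially, so their times add.
new_total_hours = sum(new for _, new in stages.values())

for name, (old, new) in stages.items():
    print(f"{name}: {old} h -> {new} h, about {speedup(old, new):.0f}x faster")

print(f"new total: {new_total_hours} h (under one day: {new_total_hours < HOURS_PER_DAY})")
```

Under these assumptions, detection comes out roughly 168 times faster and verification roughly 14 times faster, and the combined 17 hours is consistent with the claim that the corpus can be indexed in less than a day.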
On 10 April 2019, we will implement additions and changes to the export files available from the Biodiversity Heritage Library.
The updates involve the following:
This post was originally published on the rOpenSci blog on 28 August 2018 and is republished with permission of the author, Dr. Maëlle Salmon, and rOpenSci.
Armed with rOpenSci’s packages binding powerful C++ libraries and open taxonomy data, how much information can we automatically extract from images? Maybe not much, but, experimenting with gorgeous drawings from a natural history collection, we can at least explore image manipulation, optical character recognition (OCR), language detection, and taxonomic name resolution with rOpenSci’s packages.
We’re excited to announce the launch of the new About BHL site!
What is BHL’s history? Who’s involved in the Library? What tools and services does BHL offer? How do you search, download content or access data and developer tools in BHL? How can you get involved in the Library? What projects has BHL engaged in?
Find the answers to these questions and much more information about the Biodiversity Heritage Library on our new About site at about.biodiversitylibrary.org!
The new site, which lives alongside and is linked from our digital library portal at …
The Biodiversity Heritage Library is an open access digital library for biodiversity literature and archives. BHL’s global consortium of natural history, botanical, and research libraries cooperate to digitize and make their collections accessible as a part of a global “biodiversity commons.”