BHL has deployed a new taxonomic name finding tool to improve the speed and accuracy of identifying names throughout its 58+ million pages.
BHL is now using the Global Names Architecture’s (GNA) gnfinder tool to locate taxonomic names in the BHL corpus. Prior to this deployment, BHL’s name finding services were based on an index of scientific names that GNA developers created six years ago by parsing every page in BHL one by one. That process took 45 days, and the cost of repeating it made updating or improving the index infeasible.
The gnfinder tool uses fast, scalable programming languages to significantly reduce computational time. Using open-source applications written in Go and Scala, the tool detects candidate scientific names and verifies them against millions of scientific name-strings aggregated by GNA. The new process cuts the time needed for name detection from 35 days to 5 hours and the time needed for name verification from 7 days to 12 hours. As a result, the entire BHL corpus can now be indexed in less than a day, compared to the 45 days needed for the previous index. This reduction in computational time also makes iterative improvements to the index feasible.
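
To make the two stages described above concrete, here is a minimal Python sketch of detecting candidate names and then verifying them against a reference set. It is an illustration only and is not gnfinder's actual code or algorithm: the regular expression, the function names `detect_candidates` and `verify`, and the three-entry `KNOWN_NAMES` set are all assumptions made for this example, standing in for the tool's real detection logic and for the millions of name-strings aggregated by GNA.

```python
import re

# Hypothetical stand-in for the aggregated reference name-strings.
KNOWN_NAMES = {"Homo sapiens", "Quercus alba", "Panthera leo"}

# Naive detection heuristic (an assumption for this sketch): a capitalized
# word followed by a lowercase word, the rough shape of a binomial name.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+ [a-z]{2,}\b")


def detect_candidates(text):
    """Stage 1: return every string in the text that looks like a binomial."""
    return CANDIDATE.findall(text)


def verify(candidates, known=KNOWN_NAMES):
    """Stage 2: keep only the candidates found in the reference set."""
    return [name for name in candidates if name in known]


page = ("The oak Quercus alba grows here. "
        "Panthera leo was also recorded by Homo sapiens.")

candidates = detect_candidates(page)
verified = verify(candidates)

print(candidates)
print(verified)
```

In this sketch the detection step also picks up "The oak", which only looks like a name, and the verification step discards it because it is absent from the reference set. That split is why the two stages are timed separately in the figures above.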