Friday, April 10, 2009

Improved handling of diacritics in BHL searches

I wanted to let everyone know about a change that has been made to the search function of the BHL portal.

Until now, the search treated letters with diacritics (for example, ó, ö, è, é, û) as different from the same letters without them.

This meant that to find titles, authors, or subjects that included diacritics, you had to match each diacritic exactly. For example, to find all titles about "invertebrate zoology", you had to search twice: once for "invertebrate zoology" and once for "invertebrate zoölogy". (Or you had to search for something like "invertebrate zo" and hope you didn't get too much extra stuff in the search results.) Obviously, this limitation made it easy to miss relevant results.

Starting immediately, searches in the BHL portal are accent-insensitive, so no distinction is made between letters with and without diacritics. This means that a search for "invertebrate zoology" will now find all nine titles that contain either "invertebrate zoology" or "invertebrate zoölogy". See the search results here:

Another good example is a search for "Linne", which now returns instances of both "Linne" and "Linné".
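For anyone curious what accent-insensitive matching looks like in practice, here is a minimal Python sketch. It is not the BHL portal's actual implementation, which this post does not describe; it simply illustrates one common approach, which is to decompose each string with Unicode normalization, drop the combining marks, and compare what is left. The function names and the sample titles are hypothetical and exist only for illustration.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks (accents, umlauts, etc.) removed."""
    # NFD splits a letter such as "ö" into the base letter "o" plus a combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_match(query: str, candidate: str) -> bool:
    """True if query occurs in candidate, ignoring diacritics and letter case."""
    return strip_diacritics(query).casefold() in strip_diacritics(candidate).casefold()


# Hypothetical sample titles, not taken from the BHL catalog.
titles = ["Invertebrate zoology", "Invertebrate zoölogy", "Systema naturae"]
hits = [t for t in titles if accent_insensitive_match("invertebrate zoology", t)]
```

With this sketch, hits contains both spellings of "Invertebrate zoology", and strip_diacritics("Linné") gives "Linne". A database-backed search would more likely get the same effect from an accent-insensitive collation or index than from application code like this.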

While there is still more work to do on the search features, this is a good first step toward improving the quality of our search results.

Mike Lichtenberg
Missouri Botanical Garden