NDSR Residents Update

Hello again from the NDSR Residents! Since our last update in July, we’ve been focusing on transforming our research into recommendation outlines, which we presented to the BHL Tech Team last week. As we head into the final quarter of our residencies, we’ll be focusing on tweaking these ideas, developing workflows and proofs of concept, and finalizing our recommendations in a Best Practices White Paper by December. For this update, we wanted to give a preview of what some of these recommendations will look like and invite some preliminary feedback from the BHL Blog-o-sphere that we can consider as we move into these final months.

Katie has been evaluating current crowdsourcing projects and investigating long-term ones to enhance BHL data and metadata. This work has mostly focused on manuscript transcriptions and OCR corrections, but has broadened to include data extraction, named entity recognition, and optimizing, cleaning, and disambiguating collections data for use in large-scale computational research.

Her recommendations include the following:

  1. Develop a sustainable, long-term transcriptions and corrections crowdsourcing platform in which users identify items to correct or transcribe, tag text with scientific and common names, locations, events, dates/times, and other valuable observation data, and enjoy immediate access to updated text;
  2. Since crowdsourcing is likely not going to scale up to meet the transcription and corrections needs of 52+ million pages in BHL, staff and partner institutions should continue to investigate automated data recognition and extraction methods. Crowdsourced data will likely prove to be valuable training data for future algorithms;
  3. Add already-transcribed content to the portal as plain text, without markup;
  4. Disambiguate and add authority control to bibliographic metadata; and
  5. Donate data to Wikidata to expose collections to the semantic web. A linked data knowledge base, Wikidata will allow users to connect content in BHL to related collections and information across data repositories. Focus on bibliographic metadata, to ultimately enrich structured citations across Wikimedia Foundation projects including Wikipedia, Wikimedia Commons, Wikispecies, and Wikisource, and on BHL’s rich index of taxa, which can improve the discovery of protologues and other heritage descriptions and treatments in BHL literature.

Alicia has been performing content analyses on the BHL corpus to determine how much of the world’s biodiversity literature BHL has digitized thus far, in order to focus future digitization efforts. These analyses have required her to dive into some data and text mining to create proofs of concept for analyzing the 50+ million pages of BHL.

Her recommendations include the following:

  1. Improving BHL data exports and documentation to encourage more users to manipulate BHL data and look at the “collection as data”;
  2. Adding BHL citations to Wikidata; and
  3. Exploring ways to filter scientific names by kingdom, both for browsing BHL content and for targeting underrepresented taxa in BHL.

Ariadne has approached the goal of searching and browsing for illustrations in the BHL portal with a wide lens: understanding its strategic context, conveying lessons about data production and engagement from BHL’s illustration crowdsourcing efforts, and investigating the role of illustrations in the scholarly research cycle and among the BHL portal’s taxonomic users. She is looking forward to trying her hand at the interface design process, developing proofs of concept, and continuing to work with collaborators and advisors towards well-rounded final recommendations.

Her preliminary recommendations include:

  1. Build upon scientists’ and the public’s mutual love of illustrations to further the cause of biodiversity (e.g., inviting scientists to share information about their work, species, and history of the field using illustrations as a touchpoint);
  2. Fulfill the desire of crowdsourcing volunteers to make information accessible according to personal or group interests, within the constraints of limited management (e.g., pursuing connections with the Wikipedia community); and
  3. Pursue computer vision as a method of data production.

Pam has been distributing and analyzing user surveys to inform the next version of BHL. Over the summer, a survey was posted on the BHL website to capture the feedback of individual users coming to BHL for their research needs. Currently, two surveys are in progress – one gathering the feedback of those users affiliated with the consortium of BHL libraries, and the other seeking the input of organizations and individuals who use BHL at the system level.

Results for the first survey are still preliminary, but her initial recommendations include:

  1. Performing usability studies to help inform the design process after seeing navigation and user interface issues appear in the survey comments;
  2. Conducting focus groups to delve further into user needs and priorities for requirement gathering of particular features and enhancements; and
  3. Focusing on the top needs identified by users in the survey: improving search and browse; providing a more streamlined download experience; and enhancing named entities, including author names, scientific names, and geographic names.

Marissa has been researching best practices in digital libraries to make recommendations for Version 2 of the BHL portal. Through examining Europeana, the Digital Public Library of America, Trove, and other large-scale digital libraries as case studies, she is researching which tools, services, and standards are being used to present their collections.

Her preliminary recommendations include:

  1. Utilizing BHL’s current APIs to improve website functions. APIs from other libraries have been used to improve access to content in many ways, including letting users search collections by item type, creating Twitter bots to broadcast thematic or random items, and offering a Google Chrome extension to showcase high-res images, to name a few;
  2. Exploring data visualizations through tools including Kumu and Tableau. Visualizing BHL’s data would not only add another discovery layer to BHL but would also help BHL staff and members track statistics. In the coming months, Marissa will be working with Tableau to create visualizations; and
  3. Creating a beta site when BHL Version 2 is in development. Keeping a beta site active before the full transition will ease users into a completely revamped website and help them get accustomed to new features.

Written by

Marissa Kings is a BHL NDSR Resident for the "Foundations to Actions" project. She is stationed at the Natural History Museum of Los Angeles County, and her project, “Digital Library Best Practices Analysis”, focuses on identifying high value tools and services used by large-scale digital libraries which might be applied to the next generation of the Biodiversity Heritage Library.