Biodiversity Heritage Library - Program news and collection highlights from BHL
  • Home
  • News
  • Featured Books
    • All Featured Books
    • Book of the Month Series
  • User Stories
  • Campaigns
    • Fossil Stories
    • Garden Stories
    • Monsters Are Real
    • Page Frights
    • Her Natural History
    • Earth Optimism 2020
  • Tech Blog
  • Visit BHL

All posts tagged with ocr

Blog Reel, Tech Updates

OCR Improvements: An Early Analysis

Optical character recognition (OCR) plays a critical part in BHL’s contributions to the scientific community. OCR in and of itself is a remarkable achievement, converting images of typewritten text to computer-readable text with “pretty good” accuracy. OCR on handwritten text is an even greater challenge and is beyond the scope of the improvements discussed here. The scientific work that BHL supports demands the best accuracy that we can provide using available tools and, let’s be honest, available budgets.

Recently, our colleagues at the Internet Archive made the transition away from the ABBYY FineReader OCR software to the Tesseract Open Source OCR engine. Over the past year or more, the OCR team at the Internet Archive has adapted and fine-tuned Tesseract to their workflows. Our first impression is that Tesseract OCR is more than “pretty good” in its ability to identify text from the page images provided to it.

The downside is that the Internet Archive has rightfully chosen not to reprocess all existing text content through the Tesseract OCR engine. Doing so would be prohibitively expensive and time-consuming: they hold 35 million text-based items, and reprocessing them would take several years and use up resources that could otherwise go toward gathering new content.

However, in the interests of supporting the efforts of the BHL community, the BHL Tech Team is working with our Internet Archive partner to reprocess some of BHL’s oldest content with the newest available version of Tesseract OCR. We are currently in a testing phase, and this blog post details some of our early results.

Continue reading
July 19, 2022 by Joel Richard
Blog Reel

How’s your fern and bird coverage, BHL?

“Every one knows what a bird is,” asserts an early 20th-century book that I found while browsing the Biodiversity Heritage Library (BHL). As I’ve learned during my Professional Development Internship with Jacqueline Chapman at Smithsonian Libraries this summer, it’s not always that simple. Taxonomy is ever-changing, especially at the granular level needed by subject specialists around the world who use BHL to conduct research on organisms ranging from mosses to turtles to fungi. BHL is a consortial digital library whose member libraries digitize works in natural history and botany based on both user requests and subject librarians’ selections.
Continue reading
August 23, 2016 by Becca Greenstein
Blog Reel, User Stories

Original Publications at our Fingertips

Systematics is the branch of biology concerned with classification and nomenclature. It is sometimes used synonymously with taxonomy. In their 1970 publication Systematics in Support of Biological Research, Michener et al. defined systematic biology and taxonomy as: Identifying species, their relationships and evolutionary hierarchies, is critical to saving biodiversity.
Continue reading
January 13, 2015 by Grace Costantino
BHL News, Blog Reel

Crowdsourcing and BHL: Current Projects that Allow Users to Help Us Improve Our Library!

Recent crowdsourcing initiatives are revolutionizing scientific research, allowing the public to help scientists and researchers document, identify, and better understand biodiversity. For example, the Atlas of Living Australia’s FieldData program allows anyone to contribute sightings, photos and observational data to help researchers and natural resource management groups collect and manage biodiversity data.
Continue reading
November 6, 2014 by Grace Costantino and Trish Rose-Sandler
BHL News, Blog Reel

Game Laboratory Tiltfactor Selected for the Purposeful Gaming and BHL Project

BHL and the Missouri Botanical Garden are pleased to announce a major milestone in the “Purposeful Gaming and BHL” project: Dartmouth College’s Tiltfactor has been chosen to design the game that will help improve access to texts from the Biodiversity Heritage Library (BHL).

The Purposeful Gaming and BHL project is based at the Missouri Botanical Garden (MOBOT) in St. Louis, Missouri. In the fall of 2013, MOBOT was awarded a $449,641 grant by the Institute of Museum and Library Services (IMLS) to test new means of using crowdsourcing and gaming to support the enhancement of texts from the BHL.

Continue reading
June 30, 2014 by Grace Costantino
BHL News, Blog Reel, Tech Updates

The 2012 BHL Staff & Technical Meeting

On September 27-28, 2012, thirty-one staff members representing all 14 BHL member institutions convened at the Ernst Mayr Library at the Museum of Comparative Zoology at Harvard University for the 2012 BHL Staff and Technical Meeting. As a combined meeting, it brought together not only those who manage the digitization workflow at each member institution, but also those who work to keep BHL’s technical infrastructure running smoothly and constantly improving.

Continue reading
October 9, 2012 by Michelle Strizever
BHL News, Blog Reel

Wikimania 2012!

Two missions collide: Free, Open, and Global! Wikipedia, we love you.

Since 2009, we have been looking at Wikipedia as a way to drive new user traffic to the Biodiversity Heritage Library while improving the content and accuracy of Wikipedia’s articles. This symbiotic relationship has had a few bumps along the way, but our recent attendance at the 8th Annual Wikimania Conference, held in Washington, DC, reaffirmed our commitment to increasing our Wikipedia efforts, which include adding our Flickr images to the Wikimedia Commons file repository as well as inserting species citations and external links to auto-generated BHL taxon name bibliographies. During this week-long conference, we were inspired by the sense of mission, ingenuity, and passion that our fellow Wikipedians demonstrated.

Continue reading
July 30, 2012 by JJ Dearborn

Help Support BHL

BHL’s existence depends on the financial support of its patrons. Help us keep this free resource alive!

About BHL

The Biodiversity Heritage Library (BHL) is the world’s largest open access digital library for biodiversity literature and archives. Headquartered at the Smithsonian Libraries and Archives in Washington, D.C., BHL operates as a worldwide consortium of natural history, botanical, research, and national libraries working together to digitize the natural history literature held in their collections and make it freely available for open access as part of a global “biodiversity community.”

Inspiring Discovery through Free Access to Biodiversity Knowledge.

The Biodiversity Heritage Library makes it easier than ever for you to access the information you need to study and explore life on Earth…for free, anytime, anywhere.

60+ Million Pages of Biodiversity Literature Online.

Tools and Services to Transform Research.

300,000+ Illustrations on Flickr.
