Biodiversity Heritage Library - Program news and collection highlights from BHL

All posts tagged with internet-archive

Blog Reel, Tech Updates

OCR Improvements: An Early Analysis

Optical character recognition (OCR) plays a critical part in BHL’s contributions to the scientific community. OCR in and of itself is a remarkable achievement, converting images of typewritten text to computer-readable text with “pretty good” accuracy. OCR on handwritten text is an even greater challenge and is beyond the scope of the improvements discussed here. The scientific work that BHL supports demands the best accuracy that we can provide using available tools and, let’s be honest, available budgets.

Recently, our colleagues at the Internet Archive made the transition away from the ABBYY FineReader OCR software to the Tesseract Open Source OCR engine. Over the past year or more, the OCR team at the Internet Archive has adapted and fine-tuned Tesseract to their workflows. Our first impression is that Tesseract OCR is more than “pretty good” in its ability to identify text from the page images provided to it.

The downside is that the Internet Archive has rightly chosen not to reprocess all existing text content through the Tesseract OCR engine. Doing so would be prohibitively expensive and time-consuming: they have 35 million text-based items, and reprocessing them would take several years and use up resources that could otherwise go toward gathering new content.

However, in the interests of supporting the efforts of the BHL community, the BHL Tech Team is working with our Internet Archive partner to reprocess some of BHL’s oldest content with the newest available version of Tesseract OCR. We are currently in a testing phase, and this blog post details some of our early results.

Continue reading
July 19, 2022 by Joel Richard
Blog Reel, Featured Books

Field Note-Worthy: Thousands of Field Notes Now Available in BHL Thanks to the Field Notes Project!

In February 2016, the Biodiversity Heritage Library set out to digitize over 450,000 pages of field notes. While the BHL had already added some archival material to its collection before this project, the Field Notes Project is BHL’s largest undertaking of digitizing field notes to date.

We finished work on the project on May 31, 2018, and are pleased to report that the project team digitized over 517,000 pages of field notes!

Continue reading
June 28, 2018 by Adriana Marroquin
Blog Reel

Update re: Internet Archive Outage 8/4/2017.

UPDATE: Internet Archive is back online. Page images are now correctly displaying in BHL. If you experience continued issues, please submit feedback. Thank you for your patience!

Internet Archive is experiencing an outage on 4 August 2017. As a result, page images are not displaying in BHL. We apologize for the inconvenience, and we will update this post and social media as the status changes. Thank you for your patience and #StayTuned.
Continue reading
August 4, 2017 by Michelle Strizever
Blog Reel, Featured Books

The Southern Cultivator

The Expanding Access to Biodiversity Literature (EABL) collection has grown rapidly over the last year, with the addition of born-digital material and in-copyright titles scanned by various BHL member libraries. It wasn’t until recently, however, that the collection included titles contributed directly by non-BHL members. This process, a significant departure from usual BHL workflows, is part of EABL’s effort to digitize valuable content from organizations outside the consortium.

Continue reading
June 15, 2017 by Patrick Randall
BHL News, Blog Reel

Building Digital Field Notes Collections Together

At Internet Archive, we are excited to provide digitization services for BHL Field Notes Project contributors from coast to coast. We will be digitizing our partners’ selected field notebooks at two of our eight North American regional digitization centers (San Francisco, CA, and Princeton, NJ) and providing remote services for the American Museum of Natural History.

At regional centers, Internet Archive operators upload metadata from contributing partners, capture high-quality digital images using our Scribe system, then review each image for completeness and add structural metadata as appropriate.

Continue reading
March 2, 2017 by Elizabeth MacLeod
BHL News, Blog Reel

Internet Archive Library Leaders Forum 2016

At the end of October I attended the Internet Archive Library Leaders Forum 2016. This was the third time I’ve attended this meeting since 2009, and it was by far the best one yet! The Forum coincided with IA’s 20th anniversary, so there was a big push from IA to showcase their latest and greatest to celebrate their platinum year.

The most successful aspect of the Forum was meeting face to face with Internet Archive colleagues and partners, many of whom share similar digitization workflow and collection management challenges to BHL.
Continue reading
December 21, 2016 by Bianca Crowley
BHL News, Blog Reel

BHL Welcomes Two New Affiliates

The Internet Archive, a non-profit institution based in San Francisco and long-time BHL partner in digitization efforts, and the Naturalis Biodiversity Center, one of the world’s premier natural history museums, based in Leiden, The Netherlands, have joined the Biodiversity Heritage Library as Affiliates. These new partnerships will allow BHL to expand the breadth and depth of its online collection and strengthen the consortium’s technical expertise.
Continue reading
March 4, 2016 by Grace Costantino
About BHL

The Biodiversity Heritage Library (BHL) is the world’s largest open access digital library for biodiversity literature and archives. Headquartered at the Smithsonian Libraries and Archives in Washington, D.C., BHL operates as a worldwide consortium of natural history, botanical, research, and national libraries working together to digitize the natural history literature held in their collections and make it freely available for open access as part of a global “biodiversity community.”
