The Power of Community Science: How Smithsonian Volunpeers Transform Scientific Field Notes
Last month, Smithsonian Libraries and Archives (SLA), Smithsonian Transcription Center (STC), and the Biodiversity Heritage Library (BHL) celebrated a significant milestone – technical staff worked collaboratively to integrate over 43,000 pages of transcription materials from STC into BHL. An additional 151,362 scientific name access points have now been added to the BHL search index for SLA archival field notes. These transcriptions enhance BHL’s full-text search, enable taxonomic name recognition, improve accessibility for vision-impaired users, and support climate research.
The Smithsonian Field Books Collection
The Smithsonian Field Books Collection is a set of primary source archival records selected from the Smithsonian Libraries and Archives (SLA). This material dates from the late nineteenth century and the first comprehensive biological survey of the continental United States to the most recently accessioned materials at the Smithsonian Institution Archives.
The collection includes personal records of naturalists and scientists at work around the world, such as William Healey Dall (1845 – 1927) and Cleofe Calderon (1929 – 2007), and expedition records such as the Western Union Telegraph Expedition (1865 – 1867) of Russian America and the United States Exploring Expedition (1838 – 1842) of the Pacific Ocean.
Naturalists and scientists in the field recorded their firsthand observations and data in a wide variety of forms, including diaries and journals, hand-drawn maps and tables of data, observation logs and specimen catalogs, correspondence and reports, manuscripts, sketches, photographs, and even audio recordings.
Recognizing the Value of Accurate Transcription
Mining the rich information embedded in these field notes depends on accurate transcription. The majority of the field notes are handwritten – even the most recent ones. Digital surrogates provide high-resolution images but afford nothing beyond visual access. Transcription, however, opens up this material to full-text searching, pattern recognition, visual accessibility aids, and more.
Beyond the handwriting itself, the field notes also contain non-textual information that can only be captured through transcription. Examples include ornithologist Martin Moynihan’s notations of bird song in remote regions of Central America and Charles Doolittle Walcott’s sketches and diagrams of sedimentary stratification where his fossil specimens were found in the canyons of the American Southwest.
Vernon Orlando Bailey, chief naturalist with the U.S. Department of Agriculture, focused his “Journal kept by Bailey on field trip to Wyoming and New Mexico, March 15-June 1906” on techniques for exterminating gray wolves, techniques that would bring the species to near extinction in the continental United States not long afterwards. The transcript includes descriptions of the sketches he included in his notes.
The Smithsonian Field Book Project’s initial goal was to catalog this hidden biodiversity research material, improving discoverability in keeping with the FAIR (Findable, Accessible, Interoperable, Reusable) Data Principles. Doing so quickly resulted in additional researcher demand for more access and usability – first to view the field notes online (digitization) and then to examine them more closely (transcription). Grant funding assisted in a major rapid-capture digitization effort. However, when it came to transcribing the digitized field notes, the Archives simply lacked the capacity to meet the level of researchers’ demand.
Turning to digital volunteers and the just-launched Smithsonian Transcription Center in 2013 changed the transcription equation beyond our best expectations both in volume and in accuracy. We quickly came to recognize these community scientists as collaborators, “volunpeers”, in the effort to advance and disseminate knowledge.
We are far from the end of this journey. More than half the collection remains to be digitized, and over two thousand digitized field notes still need transcription.
Inspired to Transcribe
These archival field notes contain vital historic biodiversity information. By transcribing the handwriting into machine-readable text, volunpeers can help inform present-day scientists and assist with their research on a multitude of topics such as climate change, the extinction crisis, or the spread of invasive species. Transcribing field notes can also take volunpeers on an adventure across time and distance, accompanying the writer on their journey. Having these adventures transcribed in machine-readable text also helps science historians by making these historic documents more easily findable, searchable, and reusable.
Volunpeers collaborate to transcribe the pages of the field journals as accurately as possible. Multiple volunpeers work on each project page, transcribing the hard-to-read handwriting into machine-readable text. Once satisfied that the transcription is as complete as possible, a volunpeer marks the transcribed page as complete. The volunpeers then move on to the next page of the field notes until the project is finished.
Having many volunpeers working on one project helps ensure the quality of the transcription. What one volunpeer finds illegible, others may be able to read, especially as all the volunpeers working on a particular project become more familiar with the handwriting.
Volunpeers have transcribed hundreds of field journals over the years. One favorite has been the field journals of Vernon Orlando Bailey, a field naturalist who journeyed throughout the U.S. Midwest studying and collecting mammals. Another volunpeer favorite has been the papers of Arctic explorer and naturalist Robert Kennicott. Read more about these volunpeer experiences on the STC blog.
The Power of the Smithsonian Transcription Center
When the first of the field books were added to the Smithsonian Transcription Center (STC), the program was still available only as a beta version. Approximately 450 volunpeers were transcribing on the site (co-author Siobhan Leachman among them), with the first 6,000 completed pages under their belts. Even during this moment of immense energy and fresh connections, it must have been difficult to imagine what STC would become. Today, a little over a decade later, more than 91,000 individual volunpeers have worked together to transcribe and review over 1.4 MILLION (!) pages of historic and scientific collections.
STC is the largest digital volunteering and crowdsourcing program at the Smithsonian Institution, and provides opportunities to engage with and contribute to digitized materials from across the full breadth of content areas represented by its museums, archives, and libraries. Through collaborative transcription and review, Smithsonian staff and digital volunteers work together to ensure that this content is more readable, accessible, and text-searchable across Smithsonian data systems and beyond.
Currently, the most active and popular projects are the Freedmen’s Bureau Transcription Project, a collaboration with the National Museum of African American History and Culture that deepens insight into the Reconstruction period and empowers African American genealogical research, and Project PHaEDRA, a collaboration with the Harvard-Smithsonian Center for Astrophysics that illuminates the work and discoveries of early women computers at the Harvard College Observatory.
If you feel inspired by this data access success story, consider joining the digital volunteer community, or sign up for our newsletter to stay up-to-date on upcoming projects.
Reusing the Liberated Data
The successful integration of transcription materials into BHL makes historical scientific data more accessible and useful. Through the collaborative efforts of dedicated volunpeers and technical staff from the Biodiversity Heritage Library, Smithsonian Libraries and Archives, and the Smithsonian Transcription Center, 151,362 scientific name access points were added to the BHL search index, greatly enhancing search capabilities over the digitized corpus of Smithsonian field notes and archives. Out of the 556 eligible items reviewed, 522 were uploaded, contributing 43,460 pages of improved OCR text. These contributions not only improve BHL’s full-text search and taxonomic name recognition services but also provide better accessibility for vision-impaired users and support ongoing biodiversity and climate change research.
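For readers who like to see the numbers in proportion, the short Python sketch below works through the arithmetic behind the figures quoted above. It is only an illustration: the constant and function names are made up for this example, and the derived averages are rough ratios of the reported totals, not statistics published by BHL or STC. Scientific names are unlikely to be spread evenly across pages, so the per-page figure is just a sense of scale (roughly 94 percent of eligible items uploaded, about 83 pages per item, and around 3.5 name access points per page).

```python
# Totals as quoted in the post; everything derived below is illustrative arithmetic.
ELIGIBLE_ITEMS = 556          # items reviewed as eligible
UPLOADED_ITEMS = 522          # items actually uploaded to BHL
PAGES_UPLOADED = 43_460       # pages of transcription text contributed
NAME_ACCESS_POINTS = 151_362  # scientific name access points added


def summarize(eligible: int, uploaded: int, pages: int, names: int) -> dict[str, float]:
    """Return simple ratios derived from the reported totals."""
    return {
        "upload_rate_pct": 100 * uploaded / eligible,  # share of eligible items uploaded
        "pages_per_item": pages / uploaded,            # average pages per uploaded item
        "names_per_page": names / pages,               # average name access points per page
    }


summary = summarize(ELIGIBLE_ITEMS, UPLOADED_ITEMS, PAGES_UPLOADED, NAME_ACCESS_POINTS)

for label, value in summary.items():
    print(f"{label}: {value:.2f}")
```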
Special thanks go to Mike Lichtenberg, BHL’s Lead Developer and Systems Architect, and Paul Day, Lead Developer at the Smithsonian Transcription Center. As with many data improvement and platform enhancement projects, the requisite technical expertise is pivotal in ensuring the success of our collective efforts!
References and Resources
Dearborn, J., Lichtenberg, M., Richard, J. M., deVeer, J., Trizna, M., & Mika, K. 2023. [Presentation] Unearthing the Past for a Sustainable Future: Extracting and Transforming Data in the Biodiversity Heritage Library for Climate Action. Presented virtually at TDWG, Tasmania, Australia 2023. https://www.youtube.com/watch?v=8sGssyrpuJw
Trizna, M., & Dearborn, J. June 2023. [Poster] AI Models Are Getting Better at Reading Handwriting, but How Can We Find Handwritten Text to Begin With? 7th Annual Digital Data Conference, Leveraging Digital Data for Conservation, Ecology, Systematics, and Novel Biodiversity Research, Tempe, Arizona, United States of America. https://doi.org/10.25573/data.23523495.v1
Dearborn, J., & Mika, K. June 5, 2022. [Poster] Extracting Expedition Log Data Found in the Biodiversity Heritage Library. Through the Door and Through the Web: Releasing the Power of Natural History Collections Onsite and Online, Edinburgh, Scotland, United Kingdom: Society for the Preservation of Natural History Collections (SPNHC). https://doi.org/10.5281/zenodo.6593457