
BHL and Culturomics

On December 16, 2010, Science published a paper, “Quantitative Analysis of Culture Using Millions of Digitized Books,” that describes data mining research on a vast textual archive created by Google Books. The abstract reads, “We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of ‘culturomics’, focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. ‘Culturomics’ extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.”

The paper and subsequent commentary have accelerated nascent efforts to interrogate large historical textual data sets at a macroscopic scale using algorithms. Can similar methods be applied fruitfully to the BHL corpus?

Rod Page, bioinformatician and developer of BioStor, has already offered suggestive evidence that they can. He tracked a small sample of species names for the same organism through time in BHL texts and plotted the number of citations for each name. The resulting graph may be a visual representation of scientific debate and usage. Many other uses are possible, including:

Co-occurrence of place names with species
Frequency of co-occurrence of species names, especially with key words such as host, prey, predator, symbiont, etc.
Tracking trends in zoological and botanical research by tracking methodological terminology through time.
Identification of taxonomically significant “events” in the literature based on textual cues.
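
As a rough illustration of the first kinds of analysis listed above, the following Python sketch counts mentions of alternative names for one organism by publication year and counts the sentences in which a name appears alongside an ecological key word. It is a minimal sketch under stated assumptions, not Rod Page's method or a BHL API: the sample texts, the names, and the helper functions mentions_by_year and keyword_cooccurrence are all hypothetical, and real OCR text would need more careful name matching and sentence splitting.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample data: (publication year, OCR text) pairs.
# A real analysis would load page text from the corpus instead.
PAGES = [
    (1850, "Felis leo is found in Africa. Felis leo preys on antelope."),
    (1900, "Panthera leo, formerly Felis leo, is a predator of zebra."),
    (1950, "Panthera leo is the host of several parasites."),
]


def mentions_by_year(pages, names):
    """Count exact occurrences of each name per publication year."""
    counts = defaultdict(Counter)
    for year, text in pages:
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            counts[name][year] += len(re.findall(pattern, text))
    return counts


def keyword_cooccurrence(pages, name, keywords):
    """Count sentences in which `name` appears together with each keyword."""
    hits = Counter()
    for _, text in pages:
        # Naive sentence split on ., ! or ? followed by whitespace.
        # Abbreviated names such as "F. leo" would break this split.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if name not in sentence:
                continue
            lowered = sentence.lower()
            for keyword in keywords:
                if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
                    hits[keyword] += 1
    return hits


if __name__ == "__main__":
    by_year = mentions_by_year(PAGES, ["Felis leo", "Panthera leo"])
    for name, years in by_year.items():
        print(name, sorted(years.items()))
    print(keyword_cooccurrence(PAGES, "Panthera leo", ["host", "prey", "predator"]))
```

Plotting the per-year counts for each name on one chart would give the kind of graph described above. Matching whole words only is a deliberate simplification, so "preys" does not count as a hit for "prey"; a fuller treatment would normalize word forms and name variants first.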

Much of the follow-on activity to the Science paper is occurring in the “Digging into the Data” program. Accordingly, on May 9, BHL made its data available to researchers in that program.


BHL Director Tom Garnett will attend the “Digging into the Data” conference in June, where speakers, including the authors of the Science article, will address the challenges and opportunities of mining large textual corpora. With suitable partners, we may be able to seek NSF or Google funding for the unique use case that our growing text corpus presents. The framework for a proposal would be a team of biologists and a team of computer scientists posing research questions about the BHL corpus that are amenable to algorithmic investigation. Even if funding is not forthcoming, third-party researchers who use the BHL corpus to produce scientifically or historically salient results will enhance the value and use of BHL, which can lead to further collaborations.

algorithmic, biostor, culturomics, google, science

May 16, 2011

Written by Grace Costantino

Grace Costantino served as the Outreach and Communication Manager for the Biodiversity Heritage Library from 2014 to 2021. In this capacity, she developed and managed BHL's communication strategy, oversaw social media initiatives, and engaged with the public to excite audiences about the wealth of biodiversity heritage available in BHL. Prior to her role as Outreach and Communication Manager, Grace served as the Digital Collections Librarian for Smithsonian Libraries and as the Program Manager for BHL.


About BHL

The Biodiversity Heritage Library (BHL) is the world’s largest open access digital library for biodiversity literature and archives. Headquartered at the Smithsonian Libraries and Archives in Washington, D.C., BHL operates as a worldwide consortium of natural history, botanical, research, and national libraries working together to digitize the natural history literature held in their collections and make it freely available for open access as part of a global “biodiversity community.”
