Thursday, April 17, 2014

BHL Technical Advisory Group meets at the Missouri Botanical Garden

For the 2014 BHL Technical Meeting, the BHL Technical Advisory Group (TAG) met at the Missouri Botanical Garden (MBG) (2-3 April 2014) with William Ulate (Technical Director) and Martin Kalfatovic (BHL Program Director). Also joining the meeting were the BHL tech team based at MBG (Trish Rose-Sandler and Mike Lichtenberg), Carolyn Sheffield (BHL Program Manager), Bianca Crowley (BHL Collections Coordinator, by phone), and Connie Rinaldo (BHL Executive Committee Vice-Chair, by phone).

From left: William Ulate, Siang Hock Kia, Carolyn Sheffield, Mike Lichtenberg, John Mignault, Keri Thompson, Trish Rose-Sandler, Martin Kalfatovic, Joel Richard. Not pictured: Chris Freeland, Connie Rinaldo, Joe de Veer, Bianca Crowley.
The group met to review BHL technical development priorities, staffing and communications concerns, and related topics. Over the course of the two-day meeting, the group discussed priorities for both core BHL technical operations and the special projects in which BHL participates. Core operations include server and website maintenance for biodiversitylibrary.org. The Tech Team also works closely with the BHL Secretariat and BHL Staff to review user feedback on technical issues as they arise, as well as suggested improvements to BHL functionality. For example, one area of technical development that has come to the top of the list is full-text searching, and we're excited to share that work is getting underway to make that a reality.

The Tech Team is also actively involved in three grant-funded projects to enhance BHL: Art of Life (NEH), Purposeful Gaming & BHL (IMLS), and Mining Biodiversity (IMLS via the Digging Into Data Challenge). BHL also participates in technical discussions with the larger biodiversity community. This past December, William Ulate (Technical Director) and Joe DeVeer (TAG Member) virtually attended the iDigBio CITScribe Hackathon in Gainesville, Florida. In February, Ulate and TAG Member John Mignault participated in the NESCent-EOL-BHL research sprint.

All in all, the Tech Meeting was an opportunity to review several aspects of BHL's technical direction, identify priorities, and strategize communication workflows for existing and new priorities.


Wednesday, April 16, 2014

PDF Generation restored...

Dear BHL users:

We are glad to inform you that our IT staff has solved the technical difficulties found with our PDF Generation process.  We have tested the service and it seems to be working well.

We apologize for any inconvenience this may have caused you. Please let us know through our Feedback form if you find any issues with this or any other BHL functionality again.

Regards,

William Ulate
BHL Technical Director

Thursday, March 27, 2014

BHL and EOL team up for NESCent Research Sprint

Research teams at the NESCent-EOL-BHL Research Sprint.
Photograph by Cyndy Parr.

In early February, the National Evolutionary Synthesis Center (NESCent) hosted the EOL-BHL Research Sprint. NESCent, based in Durham, NC, is a non-profit science center supporting research in the evolutionary sciences. NESCent emphasizes an interdisciplinary approach to research, and so the idea behind the Sprint was to put together teams of programmers and life scientists to expose each other to questions and ways of thinking that they might not necessarily consider in their normal work. Informaticians could bring programming and data skills to bear on questions that scientists may not have had the programming expertise to implement effectively, using BHL's and EOL's now considerable amount of freely available data. Scientists, in turn, could pose questions about the data that programmers might not have considered. The meeting was also useful for finding out how well researchers could identify and retrieve the data they needed from the BHL text corpus. To this end, William Ulate, BHL Technical Director, and John Mignault, a member of the BHL Technical Advisory Group, attended the meeting.

The teams covered a wide variety of interesting topics, from studying the color of butterflies by extracting color information from images to studying changes in ontologies over time based on an analysis of the text in the BHL corpus (see http://bit.ly/1dnnhG0). Over the course of the sprint, the teams began data mining EOL and BHL for their data sets and started preliminary analyses of their data. Groups met at the end of each day to share experiences and progress. By the end of the sprint, each of the teams was sharing plans for further collaboration and completing its analyses. Plans for publication and grant proposals based on sprint ideas were also discussed. In an open, collaborative spirit, members shared the materials freely via Google Drive.

We learned some interesting things about the way people approach the BHL data set. Many of the teams on the first day wanted to use the BHL application programming interface for bulk data retrieval. Several team members asked us how they could download "all of the text." When we told them that this was impractical and would result in a great deal of unwanted data, they asked how they could retrieve data based on, for example, taxa: "I want to harvest all pages with names from this taxon (Chordata) or this common name (Vertebrate)." Others wanted data restricted by location. We tried to assist them given their specific needs rather than their initial request for the whole data set (see http://bit.ly/1rvbut3). This raised useful questions as to how we can provide researchers with the data they need in the ways they need it: should we offer ways to request bulk data downloads based on a specific set of criteria? Should we alter the API (http://www.biodiversitylibrary.org/api2/docs/docs.html) in order to make it possible to retrieve more closely focused data sets? As BHL becomes better known as a source of "Big Data" for the biodiversity community, we will need to evolve our access to that data in order to better meet the needs of our users.
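To make the idea of a "closely focused data set" concrete, here is a minimal sketch of selecting pages by taxon name. It does not call the BHL API; the `Page` record, its fields, and the `select_pages` helper are hypothetical names invented for illustration, and the sample data is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical page record; NOT the actual BHL API schema."""
    page_id: int
    names: set = field(default_factory=set)  # taxon names found on the page
    text: str = ""


def select_pages(pages, wanted_names):
    """Return the pages that mention at least one wanted name (case-insensitive)."""
    wanted = {n.lower() for n in wanted_names}
    return [p for p in pages if wanted & {n.lower() for n in p.names}]


# Made-up sample data standing in for a harvested corpus.
pages = [
    Page(1, {"Chordata", "Aves"}, "..."),
    Page(2, {"Mollusca"}, "..."),
    Page(3, {"chordata"}, "..."),
]

subset = select_pages(pages, ["Chordata"])
print([p.page_id for p in subset])
```

A criteria-based bulk download could apply the same kind of filter on the server side, so that researchers receive only the subset they asked for rather than the whole corpus.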

We were also surprised to discover the popularity of the R statistical programming language among scientists. Many team members used R in their work, to such an extent that a short R group discussion was scheduled for one morning during the meeting. Scott Chamberlain of Simon Fraser University has created an R interface to the BHL API, available at http://bit.ly/1oAFKjI. It is always good to see BHL and its data used in new and interesting ways. Follow further results from this Sprint at: http://blog.eol.org.

The Sprint was a valuable meeting for BHL: it exposed our data to more scientists and informaticians, and it gave BHL staff useful feedback on the uses of the BHL data corpus and its value to researchers. We would like to thank EOL, NESCent and the Richard Lounsbery Foundation for the opportunity and their collaboration in making this event a success.
     

Thursday, March 20, 2014

First Meeting of the Mining Biodiversity project

Meet our international partners to extract data from BHL 


Mining Biodiversity (MiBio project) is one of the projects that won during the third round of the transatlantic Digging Into Data Challenge, a competition aiming to promote the development of innovative computational techniques that can be applied to big data in the humanities and social sciences. The project is an international collaboration between the National Centre for Text Mining (UK), Missouri Botanical Garden (US) and Dalhousie University’s Big Data Analytics Institute (Canada) and Social Media Lab (Canada), along with colleagues from the Encyclopedia of Life and the Smithsonian Institution.

We will integrate novel text mining methods, visualization, crowdsourcing and social media into the BHL to provide a semantic search system that allows users to explore search results according to multiple information dimensions or facets.  The goal is to transform BHL into a next-generation social digital library resource that facilitates the study and discussion (via social media integration) of legacy science documents on biodiversity by a worldwide community.
Relations between the Work Packages of the project

The project has five major components, covered in 9 Work Packages (WP):

  1. Automatic correction of errors in OCR using Google n-grams by our colleagues of the Big Data Analytics Institute (WP2).
  2. Crowdsourcing the annotation of semantic metadata (concepts and events) in legacy texts (WP5).
  3. Automatic extraction of metadata (terms, concepts and significant events) and tracking of their change over time (WP3 & WP4) to facilitate semantic search (WP6), implemented with NaCTeM.
  4. Use of interactive visualization techniques to manage the search results, in collaboration with Dalhousie (WP7).
  5. Design of a social media layer as an environment for interaction and collaboration on science, education, awareness and outreach, led by our colleagues of the SocialMediaLab (WP8).
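As a rough illustration of the first component, the sketch below picks a correction for a suspicious OCR token by looking up how often each nearby candidate follows the previous word. This is only a toy under stated assumptions, not the method used in WP2: the bigram counts are invented, and the `correct` and `edit_distance` helpers are hypothetical names.

```python
from collections import Counter

# Invented bigram counts standing in for a large n-gram corpus (assumption).
BIGRAMS = Counter({
    ("the", "species"): 50,
    ("new", "species"): 30,
    ("the", "genus"): 40,
})
VOCAB = {word for pair in BIGRAMS for word in pair}


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def correct(prev_word, word, max_dist=2):
    """Replace `word` with the close candidate seen most often after `prev_word`."""
    candidates = [w for w in VOCAB if edit_distance(word, w) <= max_dist]
    scored = [(BIGRAMS[(prev_word, w)], w) for w in candidates]
    best_count, best = max(scored, default=(0, word))
    # Keep the original token when the context gives no evidence for a change.
    return best if best_count > 0 else word


print(correct("the", "spccies"))
```

The token is left untouched whenever no candidate has been observed in that context, which keeps a sketch like this from "correcting" rare but valid words such as taxon names.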

Manchester Town Hall
at Albert Square, UK
On February 17th, the first face-to-face meeting in Manchester, UK, marked the start of this new project. The Principal Investigators of the project, Dr. Anatoliy Gruzd from Dalhousie University (Canada) and William Ulate from Missouri Botanical Garden (USA), met with Dr. Sophia Ananiadou at the University of Manchester's National Centre for Text Mining (NaCTeM), where her colleagues involved in the project showcased the tools and services they have developed and will be adapting for our project.

The National Centre for Text Mining (NaCTeM)

NaCTeM has developed text mining services based on a number of generic natural language processing tools like Argo, their Web-based workflow construction platform for text mining, implemented on top of the OASIS Unstructured Information Management Architecture (UIMA) standard for interoperability among information processing components.

Example of an Argo workflow that automatically extracts
species and anatomical features using entity tagging
components that NaCTeM has developed for this purpose.
Several of the NaCTeM tools have been developed as modules that can be adapted and used as components in workflows, receiving input from the previous module, processing or performing a task and passing the results to another module.  In the case of named-entity recognizers, these receive text pre-processed into smaller units (sentences, tokens) and extract features automatically according to statistical models used by different entity taggers (specialized in gene, chemical, anatomical, habitat or species information, for example).
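The modular idea described above can be sketched as a chain of small functions, each consuming the previous one's output. This is not Argo or UIMA code: the lexicon lookup merely stands in for a trained statistical tagger, and every function name here is a hypothetical one chosen for illustration.

```python
import re

# Toy lexicon standing in for a trained statistical species tagger (assumption).
SPECIES_LEXICON = {"durvillaea antarctica", "diloma durvillaea"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentences):
    """Turn each sentence into a list of word tokens."""
    return [re.findall(r"\w+", s) for s in sentences]


def tag_species(token_lists):
    """Emit (sentence_index, name) for adjacent token pairs found in the lexicon."""
    hits = []
    for i, tokens in enumerate(token_lists):
        for a, b in zip(tokens, tokens[1:]):
            if f"{a} {b}".lower() in SPECIES_LEXICON:
                hits.append((i, f"{a} {b}"))
    return hits


def run_workflow(text, modules):
    """Pass the output of each module to the next one, as in a linear workflow."""
    data = text
    for module in modules:
        data = module(data)
    return data


text = "The holdfast of Durvillaea antarctica hosts molluscs. One is Diloma durvillaea."
print(run_workflow(text, [split_sentences, tokenize, tag_species]))
```

Because each step only depends on the shape of its input, a different tagger (for habitats or anatomical features, say) could be swapped in without touching the splitter or tokenizer.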

Another type of workflow component, in addition to the named entity recognizer, is the linker, which automatically links names or concepts found in text to entries in external vocabularies via unique identifiers, using a string similarity method.
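A linker of this kind might look roughly like the following sketch, which matches a mention against a small vocabulary using the similarity ratio from Python's standard `difflib` module. The vocabulary, its `EX:` identifiers, the `link` function and the 0.85 threshold are all assumptions made for illustration; the source does not say which similarity measure NaCTeM uses.

```python
from difflib import SequenceMatcher

# Hypothetical vocabulary: identifier -> preferred name (identifiers are made up).
VOCABULARY = {
    "EX:0001": "Durvillaea antarctica",
    "EX:0002": "Diloma durvillaea",
    "EX:0003": "Chordata",
}


def link(mention, vocabulary=VOCABULARY, threshold=0.85):
    """Return (identifier, score) for the most similar entry, or None below threshold."""
    best_id, best_score = None, 0.0
    for ident, name in vocabulary.items():
        score = SequenceMatcher(None, mention.lower(), name.lower()).ratio()
        if score > best_score:
            best_id, best_score = ident, score
    return (best_id, best_score) if best_score >= threshold else None


# A misspelled mention, as might come out of OCR, still resolves to an entry.
print(link("Durvillea antarctica"))
print(link("Homo sapiens"))
```

The threshold trades recall for precision: lowering it links more noisy mentions but raises the risk of attaching a name to the wrong identifier.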

Argo's functionality allows workflows to be deployed as a Web service so they can be invoked by external applications, just as BHL currently invokes name-finding web services to find the taxa within the text.

At NaCTeM commenting on the tools for the project.
L to R:  Mr. John McNaught, Dr. Anatoliy Gruzd,
Dr. Sophia Ananiadou, Ms. Riza Batista-Navarro,
Mr. Georgios  Kontonatsios and Mr. Paul Thompson.
Missing from the photo: Dr. Rafal Rak,
Claudiu Mihăilă, and Dr. Ioannis Korkontzelos.
In order to develop these named entity recognizers and linkers for the biodiversity domain, it is first necessary to identify which entity types are of interest (in our case, these could be names of persons, places and species, among others) and the vocabularies to link to for each type. To assist in this process, NaCTeM has also developed term extraction tools like TerMine. TerMine detects terms and acronyms in input text and can be used in building a term inventory for biodiversity. This is the initial task for our colleagues at Missouri Botanical Garden and Smithsonian: finding those authoritative sources (vocabularies, ontologies, thesauri, gazetteers, etc.) of terms to help build the term inventory and then train the entity taggers to be used in our workflows.

NaCTeM has also done substantial work on event extraction, i.e., the extraction of associations or interactions between concepts or entities. This experience will help us identify and extract the types of events that scientists, historians and other scholars have long wanted to extract from the BHL corpus (such as behavior, habitat, trophic relations, geographic range and others) for our own named entities: species, people and places throughout time. Finally, NaCTeM's extensive experience developing customized semantic search engines like KLEIO, ISHER and Europe PubMed Central EvidenceFinder will help us provide enhanced semantic search functionality over the BHL corpus text, allowing users to explore results according to multiple information dimensions or facets.

Additional information on the tools and services can be found at:
For an explanation of what Unstructured Information is and the terminology of the process around it, look at this nice introduction to the UIMA 1.0 Standard.

Or read more details about these or some of the other service systems and tools that NaCTeM has developed.

The Social Media Lab

On Monday, February 17th, 2014, our group was invited to attend a talk by our colleague in the project, Dr. Anatoliy Gruzd, as part of the third Social Media Workshop, which covered the outreach and impact aspects of the International Centre for Social Media Research at Manchester University. He presented the research done at the Dalhousie University Social Media Lab, including how to make sense of huge quantities of data and new methods for collecting information when studying online social networks through analysis and visualization.

Social Media Lab at
Dalhousie University, Canada
© All Rights Reserved  
For our project, the staff at Dalhousie have started investigating which users and communities are currently accessing, commenting on and sharing records from BHL across various social media platforms, such as Twitter and Flickr, and in what context they do so. For this work they will be employing some of the tools developed by the Social Media Lab (such as Netlytic.org) as well as other tools developed by third parties. Their goal is to add a social layer that integrates content from different biodiversity fora and social media sites with BHL via a user-friendly interface, to foster a community of users that could exploit BHL as an environment for sharing digital objects.

For more information, take a look at:
And some of the other publications by Dr. Gruzd and the Social Media Lab staff.

I hope this gives an idea of the work ahead and a better sense of what the project attempts to do and how it aims to do it. We will keep you informed as results become available, but in the meantime, let us know: How do you envision yourself using BHL as a social digital library? What information would you like to track, and how would you like to access it? Which vocabularies would you need to see included, and what types of named entities and associations would you want tagged in the BHL corpus?

William Ulate
BHL Technical Director
Missouri Botanical Garden

This project is made possible in part by a grant from the Institute of Museum and Library Services [Grant number LG-00-14-0032-14].

Tuesday, March 18, 2014

2014 Annual BHL meeting held in New York City, March 10-11, 2014

BHL members and affiliates met in New York City for the 2014 Annual Meeting (10-11 March 2014). The annual meeting is a chance for the leaders of BHL members and affiliates to learn what is happening around BHL and to give updates from their own institutions.

This year, the meeting was held jointly by the New York Botanical Garden and the American Museum of Natural History. The first day of meetings was hosted by Susan Fraser, Director of the LuEsther T. Mertz Library of the New York Botanical Garden. The morning session of the meeting included the 2014 BHL Program Director's Report by Martin R. Kalfatovic; an update on user engagement from Carolyn Sheffield (BHL Program Manager); an overview of BHL technical activities from William Ulate (BHL Technical Director); and a report on the recent Global BHL meetings and membership committee report by Vice-Chair Connie Rinaldo. Bob Corrigan, Director of Operations for the Encyclopedia of Life (EOL), also joined the meeting to give an update on EOL activities. Gregory Long, President and CEO of the New York Botanical Garden, welcomed the BHL members.

Attending the meeting were representatives from fifteen of the sixteen BHL members, including the three most recent members: Washington University in St. Louis; the National Library Board, Singapore (BHL Singapore); and the University of Illinois, Urbana-Champaign. Our newest affiliate, the Natural History Museum, Los Angeles County, also attended.

The business portion of the meeting took place the following day at the American Museum of Natural History, hosted by Tom Baione, the Harold Boeschenstein Director of the AMNH Research Library. Tom also gave the group a tour of the exhibition Natural Histories: Exploring Rare Books and Scientific Illustration, based on his book of the same title.
Pictured above are the meeting attendees:
Front Row, left to right: Susan Fraser (NYBG), Chris Mills (Kew), Christine Giannoni (Field Museum), Tomoko Steen (Library of Congress), Eric Chin (BHL Singapore), Nancy Gwinn (Smithsonian Libraries), Cathy Buckwalter (ANSP), Judy Warnement (Harvard Botany Libraries).
Second Row, left to right: Tom Baione (AMNH), Marty Schlabach (Cornell), Connie Rinaldo (Harvard/Museum of Comparative Zoology), Carolyn Sheffield (BHL Program Manager), Diane Reilinger (MBL/WHOI), Richard Hulser (NHMLAC).
Third Row, left to right: Kelli Trei (UIUC), Doug Holland (Missouri Botanical Garden), Chris Freeland (Washington University).
Back Row, left to right: William Ulate (BHL Technical Director), Martin R. Kalfatovic (BHL Program Director). 
NOT PICTURED: Jane Smith (Natural History Museum, London). Photograph taken at the Enid A. Haupt Conservatory, New York Botanical Garden.

Thursday, March 6, 2014

5th Global BHL Meeting, Lorne, Australia


Representatives from BHL-Global nodes at the
5th Global BHL Meeting 
The 5th Global Biodiversity Heritage Library Meeting was held in Lorne, Australia, February 1-2, 2014.   Representatives from each of BHL’s global nodes, with the exception of BHL Egypt, convened to discuss the status of current goals, the formation of new goals, and to work together in forming the overall direction of BHL Global.  The meeting consisted of reports from the global nodes, the election of officers, and discussion of bylaws, technical issues and goals.

The first day of the meeting consisted of presentations delivered by representatives from BHL Central and the Global Nodes.

BHL Central
Kicking off the presentations, Martin Kalfatovic, BHL Program Director, reported on BHL Central’s continued growth.  BHL Central is now comprised of 15 dues-paying member institutions, with a collection of over 42 million pages, and usage statistics that include over 3 million visitors since BHL launched in 2007.  In other news, the latest version of the Macaw software developed at the Smithsonian Libraries is now being tested at Harvard, New York Botanical Garden and the California Academy of Sciences with the University of Pretoria to also begin testing soon.  With this release, users can now upload to a cloud server, after which the files go to the Internet Archive and then the BHL portal.

BHL Africa
Anne-Lise Fourie, Principal Librarian at South African National Biodiversity Institute (SANBI), shared the good news that two more institutions in Kenya have joined BHL.  In South Africa, institutions are sending digitized content to the University of Pretoria for quality assurance.  To date, the Steering Committee has met twice with the possibility of more frequent meetings of the regional representatives to help build and maintain momentum.

Grants and Social Media 
Connie Rinaldo, Vice Chair of the BHL Executive Committee and Librarian of the Ernst Mayr Library, Harvard Museum of Comparative Zoology, reported on BHL’s grant-funded projects and on the status of BHL Central’s social media efforts. BHL currently has four active grants, two of which are about to wrap up and two that just recently kicked off. Connecting Content, an IMLS grant led by the California Academy of Sciences Library, is linking field notes, specimens, and published literature. Connie demonstrated MCZ-Harvard’s contributions with the William Brewster collection. Concurrently, the Art of Life is exploring automated ways of locating illustrations in natural history literature and providing metadata for them. Led by the Missouri Botanical Garden, this NEH grant will broaden and engage the BHL audience by integrating tagging applications so users can edit descriptive metadata, and by integrating that user-generated metadata to enhance access to illustrations. The two new grants, Purposeful Gaming and the BHL and Digging Into Data, are both funded by IMLS and led by the Missouri Botanical Garden. Purposeful Gaming and the BHL will develop a game to crowdsource OCR corrections for seed catalogs and transcriptions of field notes. Digging Into Data will explore new methods for the integration of text mining, visualization, crowdsourcing and social media for enhancing use of BHL content.

Social media has been a strong component of the BHL outreach strategy and in 2013 over 36,000 visits to the BHL website came from social media platforms (out of a total of 1.4 million visits).  With recent staff departures, BHL's social media presence is shifting to maintenance mode and we've seen a corresponding decrease in traffic.  A discussion ensued about how best to tailor outreach efforts for maximum impact with existing resources, including recent efforts in the education domain such as BHL Europe’s Historian app for teachers and BHL Africa’s push to teach younger students about the environment.

Encyclopedia of Life 
Nancy Gwinn, Chair of the BHL Executive Committee and Director, Smithsonian Libraries, presented on the recent EOL meetings in Canberra.  The meetings included demonstrations for new tool suites including the recently released Traitbank, which provides the capability to assemble similar traits from across species for comparisons.

Biodiversity Library Exhibitions 
Jiři Frank, Vice-Chair of BHL-Global, reported on the current status of the BHL-E exhibition software and the group discussed the idea of having a "Treasures of the Global BHL" online exhibition.  We’re very pleased that Connie Rinaldo and Jiři have both already graciously volunteered their time for coordination and training, respectively.

Following the presentations and going into the second day, attendees moved to setting the direction for Global BHL for the coming year. The Global BHL Officers, Ely Wallis, Jiři Frank and Nancy Gwinn, were each re-elected to two-year terms in the offices of Chair, Vice-Chair and Secretary, respectively. One of the first tasks that the re-elected executive committee will take on is the review of the bylaws.

Action items for the global nodes were also identified and will help guide collection and technical development, outreach efforts, and overall growth for the BHL global nodes in 2014.  BHL Central will work with the global nodes on creating new collections of content for inclusion in the BHL Portal and on continued development of Macaw.  Based on their extensive experience and thanks to their existing resources, BHL Europe will develop a marketing plan for others to use as a model.  BHL Australia will coordinate the collection of input from API users to help inform new features and improvements.  Finally, all nodes have agreed to work together on recruiting new nodes to ensure representation of all continents.

All told, it was a very successful meeting with inspiring updates from all and some exciting new directions for BHL-Global.  We're looking forward to working with our colleagues across the globe on completing the tasks we have set out to accomplish and working towards an ever-growing and adaptive BHL!



Tuesday, February 25, 2014

Helping Out with Diverse Interests in Biodiversity: Taxonomy of Molluscs and Birds

Prof. Hamish Spencer (right) and his long-time 
collaborator, Prof. Jon Waters (left) examining the 
holdfast of a brown alga, Durvillaea poha, a species 
they and a student of theirs described after showing
 it was genetically distinct from the widespread D. antarctica.  
The holdfasts pictured are the habitat for a number of 
interesting invertebrates (e.g., molluscs, crustaceans). 
One of these, the gastropod mollusc Diloma durvillaea
was also described by them.

New Zealand is an exciting place to study biodiversity for a number of reasons. First, its unique set of plants and animals, evolving in the context of an active geologic history, results in several model systems that are ideal for testing ideas about how evolution works. Second, the country still has areas of its natural environment that are relatively undisturbed, something of which the wider public is very proud and which means that many people are interested in and aware of many native species. And, third, New Zealand is still in an “age of discovery” with undescribed species turning up in numerous studies across almost all habitats.

Hamish Spencer has had the good fortune not only to live in such an amazing location but also to pursue a very rewarding career in preserving and improving understanding of the rich biodiversity found here. He currently serves as Director of the Allan Wilson Centre, a cross-institutional group of evolutionary biologists working on various aspects of New Zealand’s biodiversity. He is also a Professor in the Department of Zoology at the University of Otago, in the city of Dunedin, known as the Edinburgh of the South.

As part of our regular BHL and Our Users series, Professor Spencer has graciously agreed to answer some questions about how BHL has impacted that work.

When did you first discover BHL?

A while ago! Not sure.  Maybe 6 years?

What is your opinion of BHL and how has it impacted your research? 

It is fantastic! What is more important, even: it is getting better.  I am amazed at the breadth of its material, especially from the 19th Century. I work on the phylogenetics of a variety of groups (so far, molluscs, birds, trematodes, brown algae, polychaetes, crustaceans), usually as model systems to answer a question about the way evolution works. For example, I have been interested in the importance of long-distance dispersal in marine environments, especially the Pacific and Southern Oceans. With my collaborators, I have used brown algae, molluscs and crustaceans to investigate various questions about dispersal and sometimes venture into the taxonomy of these groups when the phylogenetic work reveals new species or clades. In order to do that properly, I like to consult original descriptions and, although Otago is New Zealand’s oldest university, with a good library dating back well over 130 years, sometimes that literature is simply unavailable.  That is where BHL comes in.

How often do you use BHL?

It is very sporadic.  Sometimes not for weeks at a time and then intensively for a week or so.

How do you usually use BHL? 

Usually I want a whole article from an ancient journal, so I download that.  I find that if I just read online, I inevitably need to check some detail and so I have to go back and look at it again. Sometimes, if I want some part of a book, I download just the relevant pages.

What are your favorite features/services on BHL?

I am amazed and impressed by the breadth of material available.  It is a real tribute to the many people who have gone to the effort of producing high-quality scans of so much material. It is seldom, now, that I am wanting to read something from the 19th century that is not there.  As a consequence BHL is becoming my first port-of-call for such items.  It is a bookmark I use frequently.

If you could change one thing about BHL, what would it be, or what developmental aspect would you like the BHL team to focus on next?

The one thing that does not work very well is the downloading of parts of an item.  Selecting a large number of pages and then finding a number of them are blank can waste quite a bit of time.  (I think you know this is an issue, already, though!)

If you had to choose one title/item in BHL that has most impacted your research, or one item that you prefer above any other in BHL, what would it be and why?

That’s hard, since I work on a range of groups, but I suppose the early issues of Proceedings of the Malacological Society of London would be up there (even though it is 20th century!).  More recently the (19th-century) Transactions of the Royal Society of South Australia have been very useful.

We send our deepest thanks to Hamish Spencer for his participation in this series. We’re always excited to learn more about how people are using BHL and the impact it has had on their work. Gathering feedback on what our users would like to see changed or improved also helps us guide future development so that we can continue to improve and transform BHL to meet the needs of our users. Have a story of how BHL has impacted your work? We would love to hear from you! Send us an email at feedback@biodiversitylibrary.org.