Report from the Digital Data in Biodiversity Research Conference, University of Michigan

The Digital Data in Biodiversity Research Conference was sponsored by iDigBio, the University of Michigan Herbarium, the University of Michigan Museum of Paleontology, and the University of Michigan Museum of Zoology. The conference was attended by about 185 people from a variety of institutions. I attended to participate in the GBIF North American Nodes Workshop and was joined by Alicia Esquivel (BHL NDSR Resident based at the Chicago Botanic Garden).

After a welcome from Dean Andrew D. Martin of the College of Literature, Science, and the Arts, the opening series of plenary talks began with Stephen Smith (University of Michigan) speaking on “The Utility of Large-scale Phylogenetic Analyses for Understanding the Evolution of Biodiversity.” The detailed talk covered the promise of a comprehensive view of the tree of life, whether for a particular clade or for the entire tree, a goal that has been a major motivation of the systematics community for decades. Smith described new efforts and methods for combining resources from the Open Tree of Life with other phylogenetic analyses to construct a dated, comprehensive tree. He also discussed the construction of a comprehensive tree for seed plants containing 80,037 taxa from GenBank and 356,807 total taxa.

Maureen Kearney, Associate Director for Science, National Museum of Natural History (Smithsonian Institution), spoke next. Kearney’s talk, “Expanding the Power of Natural History Knowledge: Frontiers in Research and Collections at the Smithsonian’s National Museum of Natural History,” was an inspiring overview of the role natural history museums play, in this era of rapid global change, in mobilizing collections data and natural history knowledge for science and society.

Kearney spoke of how natural history scientists help us comprehend the fundamental nature of the planet, of organisms (including humans), and of evolutionary and ecological interactions throughout the history of life on Earth. Enormous potential exists for natural history museums in the 21st century if they highlight their unique niche as irreplaceable research and data centers for the study of global change. Kearney also noted that this can only be realized if museums build large-scale pipelines and open-source, dynamic platforms to digitize, structure, link, and share our natural history data and knowledge. She spoke of key partners inside the National Museum of Natural History (such as the Global Genomics Initiative and the Encyclopedia of Life) and other partners at the Smithsonian, including the Biodiversity Heritage Library and the Smithsonian’s Digitization Program Office and its 3D imaging team.

Donald Hobern, GBIF Executive Secretary, spoke on “Preserving Evidence of Biodiversity Patterns: GBIF and Persistent Biodiversity Data Management.” Hobern gave an overview of GBIF as well as the goals of the GBIF implementation plan, which include simplifying and supporting data publishing and assisting with delivery of the most detailed version possible for each data source.

Other plenary talks included:

  • “Linking Heterogeneous Data in Biodiversity Research” by Pam Soltis, Florida Museum of Natural History, University of Florida
  • “Using ‘Digital Specimens’ to explore the behavioral phenotype” by Mike Webster, Cornell Laboratory of Ornithology

DAY 1: CONCURRENT SESSIONS
Full abstracts and details of the Concurrent Sessions are online. Sessions attended were:

  • 3D Surface Models in Paleontology and Archaeology by Dan Fisher, University of Michigan Museum of Paleontology. Fisher used the Buesching mastodon (now at the University of Michigan) as a case study and walked through the excavation and the 3D imaging process. Digital data on the external form of specimens are central to many paleontological and archaeological analyses. Digital models minimize handling of fragile and/or heavy specimens, facilitate access and collaboration, allow complex measurements, enhance visualization of surface topography, and simplify inspection of multi-object assemblies.
  • Paleobiology Database: A Community Based Data Service for Research, Education, and Museums by Mark Uhen, George Mason University
  • MorphoSource: A Virtual Museum and Digital Repository for 3D Specimen Data by Doug Boyer, Duke University
  • ePANDDA: enhancing Paleontological and Neontological Data Discovery API by Susan Butts, Yale University; Seth Kaufman, Whirligig Inc.
  • The Importance and Challenges of Database Integration: MorphoBank, MorphoSource, and the Paleobiology Database by Julie Winchester, Duke University

 

AFTERNOON WORKSHOP
In the afternoon, the meeting offered focused workshops. The Biodiversity Heritage Library was invited to participate in the Digital Data and the North American Nodes of the Global Biodiversity Information Facility session led by Bob Hanner, Stinger Guala, and James Macklin. A goal of the workshop was to discuss the current status, activities, and collaborations of the GBIF North American Nodes.

Attended by about 50 participants, the presentations at the workshop included:

  • National and Regional Coordination Roles within GBIF (Donald Hobern)
  • Biodiversity Information Serving Our Nation (BISON): Connections and Cooperation (Stinger Guala)
  • The Canadian Biodiversity Information Facility (CBIF) (James Macklin)
  • Canadensys: revealing the biodiversity of Canada (Anne Bruneau)
  • Overview of the Biodiversity Heritage Library Recent Activities (Martin Kalfatovic)
  • The Catalogue of Life: Infrastructure for Science (Tom Orrell)
  • Global Genome Biodiversity Network – Infrastructure for Genomic Research (Jon Coddington)
  • iDigBio, National Coordinating Center for NSF’s ADBC Program (Larry Page)

The BHL talk, “Overview of the Biodiversity Heritage Library: Recent Activities,” covered the following:

As the world’s largest open access digital library for biodiversity literature, the Biodiversity Heritage Library (BHL) is an unparalleled resource that has forever changed the way researchers around the globe understand, describe, and conserve life on Earth. BHL has become not only a model for digital libraries but also a fundamental resource for taxonomic literature aggregation, discovery, and presentation by engaging the taxonomic community and responding to user needs. To achieve this, BHL relies on many standards and tools, such as Digital Object Identifiers (DOIs), Application Programming Interfaces (APIs), and Global Names Architecture (GNA). These standards and tools help ensure that data in and about the literature matches the needs and expectations of the scientific community and is readily available for widespread reuse. To meet the evolving needs and expectations of researchers, we must continually innovate and adapt to the changing technological landscape. BHL is in the process of organizing a widespread user needs analysis and an environmental scan of information resources to define requirements for a next generation digital library.

The presentations were followed by a panel discussion and a conversation with the audience.

EVENING RECEPTION AND TOUR OF THE RESEARCH MUSEUMS CENTER
After the day’s meetings, attendees were bused to the Research Museums Center, located about 7.5 kilometers from the center of the University of Michigan campus. Participants were offered guided tours of the University of Michigan Herbarium, the collections areas of the Museum of Paleontology, and the wet and dry collections of the Museum of Zoology.


 

DAY 2: PLENARY TALKS
The second day again opened with plenary talks, which included:

  • Big Data, Museum Specimens, Access and Archiving – Lessons from #scanAllFish by Adam Summers, University of Washington. An amazing, high-energy talk about Summers’ project #scanAllFish, which covers over 1,991 species and 3,094 specimens from 109 collections. Summers expects to store over half a petabyte of data for 30,000 vertebrates, so storing and backing up these data is an issue. It is also interesting to consider what collections plan to do when these data are returned to them with the specimens.
  • Video Data and Motion Analysis in Comparative Biomechanics Research by Beth Brainerd, Brown University. “Film or video recordings have long been important primary data for research in comparative biomechanics. Innovations have included the use of two or more cameras to capture 3D motion, and the use of two X-ray video cameras (fluoroscopes) to capture 3D motion of bones in vivo. Over the past decade we have developed X-ray Reconstruction of Moving Morphology (XROMM), which combines dual-fluoroscopy with bone models from CT scans to produce accurate animations of 3D bones moving in 3D space.”
  • The PREDICTS Project: Projecting Responses of Ecological Diversity In Changing Terrestrial Systems by Adriana De Palma, Natural History Museum, London. “PREDICTS is a collaborative project that aims to produce global models of how local biodiversity responds to land use and related human impacts, in order to make projections under possible future scenarios.”
  • Field Collections to Digital Data: A Workflow for Fossils and the Use of Digital Data for Reconstructing Ancient Forests by Dori Contreras, University of California Museum of Paleontology. “The integration of curation and digitization with project-focused data collection is a key component to performing time-efficient studies from new fossil collections. Standard workflows for processing fossil specimens starting from initial field collection and continuing through digital analysis/measurement are not widely established. Here I present my workflow for reconstruction of a diverse Late Cretaceous flora from plant macrofossils preserved in an extensive recrystallized volcanic ashfall deposit.”
  • Natural History Data Pipelines: The Good, the Bad, and the Ugly by Andy Bentley, University of Kansas Biodiversity Institute. “Collections, aggregators, data re-packagers, publishers, researchers, and external user groups form a complex web of data connections and pipelines that form the natural history knowledge base essential for collections use by an ever increasing and diverse external user community.  We have made great strides in developing the individual parts of this knowledge base and we are now well poised to integrate these capabilities to address big picture questions.  Although we need to continue work on the individual pieces, the focus now needs to be on integration of these disparate sources of data that create the pipeline.”

DAY 2: CONCURRENT SESSIONS
Sessions attended included:

  • Using Statistical Analysis to Calculate the Size of Biodiversity Literature by Alicia Esquivel, Chicago Botanic Garden
  • Illustrating Value Added in Databasing Historical Collections: Entered, Proofed, and Done (or Not!) by Tony Reznicek, University of Michigan Herbarium
  • The Encyclopedia of Life v3: constructing a linked data model by Jennifer Hammock, National Museum of Natural History, Encyclopedia of Life, Smithsonian Institution
  • Encyclopedia of Life Version 3: New Tools for the Exploration of Biodiversity Knowledge by Katja Schulz, National Museum of Natural History, Encyclopedia of Life, Smithsonian Institution
  • How do People see Biodiversity? Using a Digital Identification Key in a Citizen Science Program by Mathilde Delaunay, Muséum national d’Histoire naturelle, Paris, France
  • Taxonomic Data Quality in GBIF: A Case Study of Aquatic Macroinvertebrate Groups by Joan Damerow, Field Museum of Natural History
  • Hole-y Plant Databases! Understanding and Preventing Biases in Botanical Big Data by Katelin D. Pearson, Florida State University

Alicia Esquivel’s talk was an excellent overview of the BHL NDSR Resident program and the important work being done by the group as BHL looks forward to BHL Version 2. Esquivel’s work focuses on identifying gaps in the BHL collections and on other collections analysis. In addition to her talk, she presented a poster (co-authored by Constance Rinaldo, BHL Chair / Museum of Comparative Zoology, Harvard University).

CAPSTONE SESSION

  • Prospects for the Use of Digitized Specimens in Studies of Plant Diversity and Evolution by Michael Donoghue, Yale University
  • A Vision for a National Cyberinfrastructure for Biodiversity Research and what NSF can do to Enable it by Peter McCartney, National Science Foundation


Martin R. Kalfatovic is BHL Program Director and Associate Director, Digital Programs and Initiatives Division, Smithsonian Libraries and Archives. As the BHL Program Director, Mr. Kalfatovic functions as the executive director and manager of the international consortium of over 80 natural history, botanical garden, government, and university libraries engaged in the mass digitization of taxonomic literature. The position also serves as a key contact for government, NGO, and academic leaders at both the national and international level. Within his role at Smithsonian Libraries and Archives, Mr. Kalfatovic is responsible for the Libraries’ active Digital Library program. This program includes the creation of digital editions of library materials, online exhibitions, and new digital publications.