A summary from the 1st Global BHL Technical Meeting
by William Ulate, Global BHL Project Coordinator
From September 22 to 24, the Global BHL Technical Meeting took place in Woods Hole, Massachusetts. It was the very first time that all signed and prospective BHL partners came together at such a meeting. There were representatives from all over the world, including
Throughout this exchange, participants arrived at a high-level description of the software and hardware components and agreed on milestones and deliverables for a global timeline, while at the same time sketching the definition of global governance and policies for collaboration in the project.
On the morning of Wednesday, September 22, all participants had arrived and enjoyed a delicious breakfast (everyone agreed that the catering throughout the whole meeting was outstanding). Our hosts then gave a warm welcome: first Cathy Norton, MBL Director, followed by our BHL Director, Tom Garnett, and our BHL Executive Committee Chair, Graham Higley. After a brief introduction from each participant, Martin Kalfatovic, our BHL Deputy Director, gave a picturesque multimedia presentation, "Taking Measure of the Biodiversity Heritage Library: 2003-2010", and Chris Freeland, Global BHL Technical Director, talked about the BHL-US role in the Global BHL. Phil Cryer and Anthony Goddard rounded off the first session of the meeting. They presented the lessons they learned while setting up clustered, distributed storage with commodity hardware and open source software to mirror BHL information.
After a comfort break, each regional node had the opportunity to present the details of its specific project to the rest of the group. Topics included:
why and how the project connects to the whole BHL and to other projects
the work already done
the digitized content available or planned, including dates of major milestones and deliverables
the resources available and the project's funding status
its regional requirements, among other things
The first partner to present was BHL Europe. Henning Scholz, Project Coordinator for BHL-E, gave an overview of the project's principles, objectives and partners. He also outlined its work plan with dates for deliverables and explained how BHL Europe can integrate into different networks such as this Global BHL initiative. Then Melita Birthälmer, also from the Museum für Naturkunde in Berlin, presented the project's activities related to content management. She began with the available and planned numbers of volumes from different providers and the quality of that content. She then explained in greater detail the Global Reference Index to Biodiversity (GRIB), a bibliographic database with content management and deduplication functionalities that is being developed in collaboration with the EDIT project. Even though the GRIB is still in a prototype phase (see http://grib.gbv.de/), it has been suggested as an option for a worldwide bibliographic database for a Global Biodiversity Heritage Library. Finally, Adrian Smales, from the Natural History Museum, talked about the technical implementation. He covered the different metadata views and formats used by BHL-E content providers, the current status of their infrastructure, and some considerations regarding an Open Archival Information System (OAIS) and a Preservation Archive System (PAS). He also discussed the GRIB and a general work plan for the technical implementation deliverables.
The next partner to present was
Abel Packer presented afterwards about the BHL
Finally, our colleague from
In the afternoon, after a comfort break, Bianca Crowley, BHL Collections Manager, presented the analysis of the BHL User Survey 2010. A total of 16 reusable questions were developed. For this first survey, an average of 1,020 successful responses per question were analysed. The aim was to understand how current user groups use BHL services and what new developments they expect in the future.
Chris Freeland, BHL Technical Director, led us in two interesting discussions about the name-finding process in BHL and what can be done to improve it. Species names currently have a 35% error rate after OCR. It was noted that two subprocesses are being carried out: string finding and name reconciliation. Some existing services, such as uBio, handle both processes. However, we have no mechanism in place to validate the 5.1 million names. The group therefore concluded that we should concentrate on OCR correction and let the specialists handle name reconciliation. We will make random samples of the data available to potential partners so that nomenclators, for example, can provide feedback on the ratio of good names.
Finally, to round off the first day, the group reviewed the implications of global Open Access and BHL's standpoint on it. It is no secret that copyright is very complex worldwide. The group discussed the various copyright issues and distribution limitations that we will encounter in sharing materials globally. BHL does not assume any copyright responsibility on its own; moreover, BHL does not own any copyright. A small group of colleagues was appointed to gather all input on this topic and develop a suitable statement. This statement will take into account that users might become confused and frustrated if we end up with different categories of access, and that the system would have to be rebuilt to support them.
On the second day, the group split into two subgroups. The Administration subgroup was in charge of the policies and procedures needed for global collaboration. The Technology subgroup worked on the components needed for a Global BHL.

The Administration subgroup dealt with topics such as the organization of each BHL node, Global BHL collaboration and governance, and communication models for the project leaders of each BHL node.

The Technology subgroup discussed four topics:
the content ingest process in the existing BHL
content replication, with particular reference to preservation (LOCKSS) and mirror sites
localization, including how to handle and share materials that cannot be openly distributed
global identifiers across the whole project
A range of other topics was covered during the rest of the meeting. These ran from the branding and identity of the project itself to funding opportunities, data mining, experiences with OCR and text correction, and improvements to existing and new services required for integration at the API and user interface levels. There were even some birds-of-a-feather sessions on content and data synchronization. Finally, a set of action items was sketched for follow-up.