A summary from the 1st Global BHL Technical Meeting by William Ulate, Global BHL Project Coordinator.
The Global BHL Technical Meeting took place September 22 to 24 in Woods Hole, Massachusetts. It was the very first time all signed and prospective BHL partners came together at such a meeting. There were representatives from all over the world, including Australia, Brazil, Egypt, Europe and the US; unfortunately, our colleagues from China were unable to make it. We had a very productive meeting, getting to know each other and presenting each other’s work in order to describe priorities and requirements for a Global BHL.
Throughout this exchange, participants arrived at a high-level description of software and hardware components and agreed on milestones and deliverables for a global timeline, while at the same time sketching global governance and policies for collaboration in the project.
On the morning of Wednesday, September 22, after all participants had arrived and enjoyed a delicious breakfast (everyone agreed the catering was outstanding throughout the whole meeting), we received a warm welcome from our host Cathy Norton, MBL Director, followed by our BHL Director, Tom Garnett, and our BHL Executive Committee Chair, Graham Higley. A brief introduction from each participant then provided the perfect setting for a picturesque multimedia presentation, Taking Measure of the Biodiversity Heritage Library: 2003-2010, by Martin Kalfatovic, our BHL Deputy Director, and Chris Freeland, Global BHL Technical Director, talking about the BHL-US role in the Global BHL. The first section of the meeting was rounded off by Phil Cryer and Anthony Goddard, who presented the lessons they learned while setting up clustered and distributed storage with commodity hardware and open source software to mirror BHL information.
After a comfort break, each regional node was given the opportunity to share with the rest of the group the details of its specific project: why and how it connects to the whole BHL and to other projects, the work already done, the digitized content available or planned (including dates of major milestones and deliverables), the resources available, its funding status, and its regional requirements, among other things.
The first partner to present was BHL Europe. Henning Scholz, Project Coordinator for BHL-E, gave an overview of their principles, objectives and partners, their work plan with dates for deliverables, and how BHL Europe can integrate into different networks like this Global BHL initiative. Then Melita Birthälmer, also from the Museum für Naturkunde in Berlin, presented the project’s activities related to content management. She started with the available and planned numbers of volumes from different providers and the quality of that content, and then explained in greater detail the Global Reference Index to Biodiversity (GRIB), a bibliographic database with content management and deduplication functionalities being developed in collaboration with the EDIT project. Although the GRIB is still in a prototype phase (see http://grib.gbv.de/), it has been suggested as an option for a worldwide bibliographic database for a Global Biodiversity Heritage Library. Finally, Adrian Smales, from the Natural History Museum, talked about the technical implementation, covering topics like the different metadata views and formats used by BHL-E content providers, their current infrastructure status, some considerations regarding an Open Archive Information System (OAIS) and a Preservation Archive System (PAS), the GRIB, and a general work plan for the technical implementation deliverables.
The next partner to present was Australia. Elycia Wallis from Museum Victoria showed the comprehensive work that the Atlas of Living Australia (ALA) has been carrying out and the context in which the BHL-Australia (BHL-Au) project is being developed as one of its Rich Data Stores component projects (see presentation here). She then presented their work, starting in mid-2010 with the BHL-Au and BHL kickoff meetings at Museum Victoria in Melbourne and ALA HQ in Canberra. Next she explained their achievements in setting up the infrastructure, including development and testing environments, assessing scanning workflows and adapting them to Australian conditions, and developing a new user interface for BHL that should be ready by the end of 2010 with the existing functionality (the test site can be accessed at http://bhl-test.ala.org.au). Following the topics proposed, Ely talked about the human and other resources they have available and then explained the plan and timing ahead. The mirroring, ingestion and uploading processes should be ready by March 2011. She also mentioned that Australian copyright law allows scanning documents up to 1955, but they might concentrate on scanning rare books in particular by mid-2011, and they feel confident they will be able to apply for further funds to perform maintenance after that. Additionally, Ely mentioned other very interesting projects going on at BHL-Au, like supporting annotations, scanning field notes and correcting OCR through volunteer work (crowdsourcing).
Abel Packer presented next, on the BHL Brazil / BHL SciELO Network of national and thematic collections of quality journals, funded by the Federal Government and the state of Sao Paulo government, the research community and the libraries (see his presentation here). The formalization of the network's governance should be in place by the end of 2010 or the start of 2011. Their slow but fully sustainable technical work has focused on procedures and criteria for the content to be digitized, the open technology used in the portal development, the implementation of OAI metadata exchange services, and VHL-provided search engine functions. The SciELO Network held a Kick-off Workshop on Essential Rare Works Collection in Biodiversity in February 2010; in close collaboration with the BHL Advisory Committee, it plans to validate the selection criteria and choose the first 200 journal/bulletin titles and books to scan. Its plans are to be operational and launched with 100 initial books by December 2010, to expand to Latin American and Caribbean countries starting in 2011, and to have more than 2000 books scanned, digitized and exposed through BHL by 2013.
Finally, our colleague from Egypt, Dr. Noha Adly, presented the progress at Bibliotheca Alexandrina (www.bibalex.org), a “center of excellence in the production and dissemination of knowledge” (see her presentation here) whose objectives fit perfectly with BHL’s. The library has been involved in digital libraries and projects for quite some time now: developing technical infrastructure, running long-established mass digitization and OCR workflows, mirroring the Internet Archive and massive data sets, training specialists for its workflow, and more recently working on the Arabic version of the Encyclopedia of Life. Bibliotheca Alexandrina has come a long way since it started with 1 scanner in 2003. It now has 120 trained specialists using its 10 scanners, 7 days a week on two shifts, digitizing and performing OCR on 167,000 Arabic books, photos, negatives, slides and maps to include in joint projects like Description de L’Egypte and the World Digital Library. They have also developed their own projects, like the Digital Assets Repository (composed of the Digital Assets Factory, Digital Assets Metadata, which uses Fedora to manage only metadata, Digital Assets Keeper and Digital Assets Publishers) and the Science Supercourse, a PowerPoint repository for health, agriculture, environment and computer engineering. Bibliotheca Alexandrina is interested in becoming a BHL partner, hosting a mirror site and working on infrastructure. It has also offered to organize our next Global BHL Technical Meeting (which everyone happily took note of).
In the afternoon, after a comfort break, Bianca Crowley, BHL Collections Manager, presented the analysis of the BHL User Survey 2010. A total of 16 reusable questions were developed, and for this first edition an average of 1020 successful responses per question were analysed to understand how current user groups are using BHL services and what new developments those groups expect in the future.
Chris Freeland, BHL Technical Director, led us in two interesting discussions about the names-finding process in BHL and what can be done to improve it, given an existing 35% error rate on species names when the OCR is performed. Here it was noted that two subprocesses are being carried out: string finding and name reconciliation. While some of the existing services take on both processes (UBIO, for example), it was concluded that we have no mechanism in place to validate the 5.1 million names, so we should concentrate on OCR correction and let the specialists handle name reconciliation. We will make random sample data available to potential partners so that nomenclators, for example, can provide feedback on the ratio of good names.
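To make the distinction between the two subprocesses concrete, here is a minimal Python sketch. It is only an illustration, not how BHL or any existing service does it: the regular expression, the tiny `KNOWN_NAMES` list and all function names are assumptions made up for this example. String finding pulls candidate binomials out of OCR text, name reconciliation checks them against a reference list, and a random sample gives a rough ratio of good names.

```python
import random
import re

# Hypothetical reference list; a real nomenclator would hold far more names.
KNOWN_NAMES = {"Homo sapiens", "Quercus alba", "Panthera leo"}

# Crude binomial pattern: capitalized genus followed by a lowercase epithet.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+ [a-z]{3,}\b")


def find_name_strings(ocr_text):
    """String finding: return candidate binomials found in OCR text."""
    return CANDIDATE.findall(ocr_text)


def reconcile(candidates, known=KNOWN_NAMES):
    """Name reconciliation: split candidates into matched and unmatched."""
    matched = [c for c in candidates if c in known]
    unmatched = [c for c in candidates if c not in known]
    return matched, unmatched


def good_name_ratio(candidates, sample_size, seed=0):
    """Estimate the share of good names from a random sample of candidates."""
    rng = random.Random(seed)
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    if not sample:
        return 0.0
    matched, _ = reconcile(sample)
    return len(matched) / len(sample)


# "Qnercus" stands in for an OCR misreading of "Quercus".
text = "Homo sapiens and Quercus alba were noted. A plate shows Qnercus alba."
candidates = find_name_strings(text)
matched, unmatched = reconcile(candidates)
```

In this toy input the misread "Qnercus alba" is found as a string but fails reconciliation, which is the kind of case OCR correction would address before specialists take over the reconciliation step.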
Finally, to round off the first day, the group reviewed the implications of Global Open Access and BHL’s standpoint on it. It’s no secret to anyone that the global world of copyright is very complex. The group commented on the several copyright issues and distribution limitations we will encounter in sharing materials globally. BHL is not assuming any copyright responsibility on its own; moreover, BHL doesn’t own any copyright. A small group of colleagues was designated to take all input on this topic and develop a suitable statement, taking into account that users might get confused and frustrated if we end up with different categories of access, and that the system would have to be rebuilt to support them.
On the second day, the group was divided into an Administration subgroup, in charge of the policies and procedures needed for a global collaboration, and a Technology subgroup, to work on the components needed for a Global BHL. The Administration group was to deal with topics like the organization of each BHL node, Global BHL collaboration and governance, and communication models for the project leaders of each BHL node. The Technology group, on the other hand, was set to discuss topics like the content ingest process in the existing BHL; content replication, with particular reference to preservation (LOCKSS) and mirror sites; localization, taking into consideration whether we would have to deal with and share materials that couldn’t be openly distributed; and finally, global identifiers within the whole project.
Other, more technical topics were covered during the rest of the meeting, from “Branding & Identity of the project itself” to funding opportunities, data mining, OCR and text correction experiences, and the improvement of existing services and the new ones required for integration at the API and user interface levels. Even some birds-of-a-feather sessions on Content and Data Synch were included. Finally, a set of Action Items was sketched for follow-up (see it here).