The “Orange Bag” problem and Citebank
Relative to other BHL staff, I came fairly late to the BHL project, having been on board only since September of 2010. One of my first tasks at BHL was to deal with what was referred to as the “Orange Bag” problem. For years there had been a growing backlog of data that couldn’t be ingested via the standard BHL workflow (i.e. scanned via Internet Archive Scribe machines, uploaded to the Internet Archive portal, then ingested into the BHL portal) because of its format. Data in the orange bag consisted primarily of simple digital objects (e.g. a single PDF of an article) rather than the more structured complex objects (e.g. multiple image files, one for each page of a scanned book or journal) that we typically ingest into the BHL portal. Files were given to BHL on CDs or hard drives and stored at BHL in, literally, a physical orange bag.
Upon arrival, I immediately moved the data from the more vulnerable physical media to permanent storage on a network server. The Citebank platform, built in Drupal and utilizing the Biblio module, was just beginning to be developed by our programmer David at that time, so I was able to work with him to identify the types of functionality needed for the administrative importer and for the end user interface.
As part of my work with data ingest into Citebank, I have developed a set of guidelines that helps contributors determine which citation data is needed depending on the publication type (e.g. article, book, conference proceeding, etc.) and how the values in those fields need to be formatted (e.g. lastName, firstName). In some cases, our contributors’ data cannot conform to these guidelines. I then work to normalize the data as much as possible before ingest so that it can interoperate well with existing data in the system and produce more effective search results for our end users.
Citebank has been public since early 2011 and continues to grow significantly in the number of citations ingested, whether via manual processes or automated processes using the OAI-PMH protocol. Citebank now contains well over 104,000 citations with corresponding content files that represent 22 collections (including the title records for BHL books and journals as well as the articles created by BHL users).
What else does a data analyst do?
In addition to my Citebank work, I advise on user interface changes for the BHL portal (e.g. I helped determine how data should be displayed in the brief, full and MODS displays), develop rights metadata, and conduct usability tests.
More recently I was involved in writing part of the NEH grant called the “Art of Life”. My background in the arts and humanities was helpful in explaining how audiences from those fields would benefit from BHL’s wealth of illustrations. I was thrilled to hear our NEH grant was accepted and we will begin working on it later this summer.
Collaboration: BHL’s key to success
One aspect of this project that continues to amaze me is how collaborative and productive a virtual team can be when its members work in many different parts of the world and communicate across multiple time zones. Somehow it all seems to come together despite the fact that we only see each other face to face a few times a year.
On a weekly basis, I get to work with very smart colleagues at the Smithsonian Libraries, particularly with the BHL collections manager, who helps determine what content is appropriate for Citebank while I help determine the best way to get the content into Citebank. I also get to collaborate with many of the other BHL staff by participating in the BHL Collections committee, presenting on BHL at conferences, and writing papers about BHL.
Our Users: My Inspiration
Attending last fall’s Life & Literature conference in Chicago was eye-opening for me in terms of better understanding our users, their needs, the role that BHL plays and can play in their work, as well as the types of audiences that could benefit from our content. The enthusiasm of our users is contagious, and one of the most inspiring things for me is to read the feedback from users who say their work would not be possible, or at least would be much more difficult, without the existence of BHL.
I have master’s degrees in both art history and library science from Indiana University. I never imagined I’d be working at a botanical garden or on a biodiversity project but my openness to go wherever the digital library winds want to take me has landed me in a pretty sweet spot and for that I am most grateful.
- Trish Rose-Sandler, Data Analyst, Biodiversity Heritage Library