Now serving all page images via djatoka

Last fall, developers Ryan Chute and Herbert Van de Sompel from Los Alamos National Laboratory’s Research Library released djatoka, a new Open Source JPEG 2000 image server. This new project, first reported in D-Lib Magazine, provides a scalable and open solution for delivering high-resolution JPEG 2000 images, such as the (nearly) 11 million pages scanned to date and made available through the Biodiversity Heritage Library.

This development was greeted with cheers by the BHL Technical Development Team, as it solves problems we’ve previously reported with the non-scalable, proprietary solutions used to serve JPEG 2000 images. Following a functional evaluation in December 2008, djatoka was integrated into the BHL staging site for performance testing and was promoted to production on Thursday, January 22, 2009.

To view djatoka in use on BHL materials, check out the following page from “Wild oxen, sheep & goats of all lands, living and extinct” by R. Lydekker published in 1898, selected in honor of the Chinese New Year, in this, the Year of the Ox:
http://www.biodiversitylibrary.org/page/9370105

JPEG 2000: An excellent format with poor support
Displaying JPEG 2000 images on the web can be a challenge because 1) the images are often too large to be downloaded quickly, 2) most web browsers don’t understand JPEG 2000 images without appropriate plug-ins, and 3) until the release of djatoka there were no active open source image servers; costly commercial solutions were the only option, aside from complete custom development.

JPEG 2000 images are different from traditional image formats like GIF, JPEG, and PNG in that they consist of several “layers”, one for each available resolution. Think of a JPEG 2000 image as a pyramid: each layer of the file is a copy of the same image, at a progressively larger size as you step down the pyramid. The top layer may be relatively small but the bottom layer could be quite large. Consider a JPEG 2000 image whose bottom layer is 18,000 pixels square, such as this example. To view this image in its entirety using standard 1024 x 1280 monitors would require an array of 270 monitors stacked 15 high and running 18 wide! Cool, yes, but impractical.
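
The arithmetic behind that monitor wall, and the shape of the pyramid itself, can be sketched in a few lines of Python. This is only an illustration: the function names are ours, the count of seven levels is hypothetical, and the sketch assumes each step up the pyramid halves the image's edge length, which is the usual JPEG 2000 arrangement but is not something we have measured on this particular file.

```python
import math


def pyramid_sizes(full_size: int, levels: int) -> list[int]:
    """Edge length in pixels at each level, smallest first.

    Assumes every level is half the size of the one below it.
    """
    return [math.ceil(full_size / 2 ** (levels - 1 - i)) for i in range(levels)]


def monitors_needed(image_px: int, monitor_w: int, monitor_h: int) -> tuple[int, int, int]:
    """Return (monitors across, monitors down, total) for a square image."""
    across = math.ceil(image_px / monitor_w)
    down = math.ceil(image_px / monitor_h)
    return across, down, across * down


if __name__ == "__main__":
    # Hypothetical 7-level pyramid for an 18,000 pixel square image.
    print(pyramid_sizes(18000, 7))
    # Monitors treated as 1024 pixels wide by 1280 pixels tall.
    print(monitors_needed(18000, 1024, 1280))  # (18, 15, 270)
```

Only the smallest levels of such a pyramid fit on a single screen, which is why the larger levels have to be cut into pieces before delivery.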

So, to deliver JPEG 2000 images to web users, scalable software is needed to carve up the JPEG 2000 image into smaller chunks at a given resolution, in a format natively understood by a user’s browser. And, since we are talking about this occurring during the load time of a web page, all of this has to happen very quickly.

BHL’s first approach to JPEG 2000 delivery
BHL has delivered JPEG 2000 images to users since its public launch in February 2007 using a mix of proprietary and open source technologies. BHL development activities are organized at Missouri Botanical Garden (MOBOT), whose Botanicus digital library system was an early prototype for BHL. Guided by previous work by MOBOT developers, and reusing the infrastructure already in place at MOBOT, BHL developers integrated LizardTech’s ExpressServer for server-side JPEG 2000 tiling and the open source GSIV JavaScript library for interface display.

Though this setup was functional, the addition of BHL’s 1,500 users per day pushed ExpressServer beyond its available capacity, causing a significant delay in delivering page images. Because ExpressServer is a commercial product licensed per processor, increasing capacity would have required the purchase of an additional license, costing upwards of $15,000 USD. An open source option was needed, but none was available until the release of djatoka.

About djatoka
djatoka (pronounced jay-too-kay) uses the Kakadu library to process and present JPEG 2000 images. It is a Java-based web application that runs on Apache Tomcat. djatoka allows a web page to make an asynchronous request for image parameters, such as the maximum size of the image or the number of resolution levels available. The web page can then execute some JavaScript to request the necessary image tiles from djatoka and arrange them within the browser to form a larger composite image, similar to the now-ubiquitous Google Maps interface.
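
As a rough illustration of those two kinds of request, the Python sketch below builds one URL that asks for image parameters and one that asks for a single tile. Treat every specific here as an assumption: the resolver address and image identifier are hypothetical, and the parameter names and the Y,X,H,W region order are modeled on djatoka's OpenURL-style interface as we recall it, so check them against the djatoka documentation before relying on them.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; substitute the address of a real djatoka install.
RESOLVER = "http://images.example.org/adore-djatoka/resolver"


def metadata_url(image_id: str) -> str:
    """URL asking for image parameters (dimensions, resolution levels)."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_id": image_id,
        # Assumed service identifier for the metadata request.
        "svc_id": "info:lanl-repo/svc/getMetadata",
    }
    return f"{RESOLVER}?{urlencode(params)}"


def region_url(image_id: str, level: int, y: int, x: int,
               height: int, width: int) -> str:
    """URL asking for one rectangular tile, returned as a JPEG."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_id": image_id,
        # Assumed service identifier and format key for the region request.
        "svc_id": "info:lanl-repo/svc/getRegion",
        "svc_val_fmt": "info:ofi/fmt:kev:mtx:jpeg2000",
        "svc.format": "image/jpeg",
        "svc.level": level,
        # Assumed ordering: top offset, left offset, height, width.
        "svc.region": f"{y},{x},{height},{width}",
    }
    return f"{RESOLVER}?{urlencode(params)}"


if __name__ == "__main__":
    image = "http://example.org/page.jp2"  # hypothetical image identifier
    print(metadata_url(image))
    print(region_url(image, level=3, y=0, x=256, height=256, width=256))
```

A viewer would call the first URL once per image, then issue one request of the second kind for each tile it needs to display.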

djatoka provides a streamlined API for developers to script against to generate and assemble the image tiles. Further, the existing IIPImage Javascript Viewer has been modified to work with djatoka, relieving potential adopters of the significant burden of writing complex code to calculate, request, and assemble necessary tiles. IIP also provides interface functionality that enables the user to pan around the image and zoom in or out.
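
To give a sense of the tile bookkeeping a viewer takes off a developer's hands, here is a minimal Python sketch that works out which tiles overlap the visible part of an image at one resolution level. It is not the IIPImage viewer's code: the function name, the 256 pixel tile size, and the assumption that the viewport starts inside the image are all ours.

```python
def tiles_for_viewport(level_w: int, level_h: int,
                       view_x: int, view_y: int,
                       view_w: int, view_h: int,
                       tile: int = 256) -> list[tuple[int, int, int, int]]:
    """Return (x, y, width, height) for every tile overlapping the viewport.

    Tiles on the right and bottom edges are clipped to the image bounds.
    Assumes the viewport's top-left corner lies inside the image.
    """
    # Clamp the far edges of the viewport to the image itself.
    x_end = min(view_x + view_w, level_w)
    y_end = min(view_y + view_h, level_h)

    first_col, last_col = view_x // tile, (x_end - 1) // tile
    first_row, last_row = view_y // tile, (y_end - 1) // tile

    tiles = []
    for row in range(first_row, last_row + 1):
        for col in range(first_col, last_col + 1):
            x, y = col * tile, row * tile
            tiles.append((x, y, min(tile, level_w - x), min(tile, level_h - y)))
    return tiles


if __name__ == "__main__":
    # A 1000 x 800 level viewed through a 500 x 400 window at offset (300, 100).
    for t in tiles_for_viewport(1000, 800, 300, 100, 500, 400):
        print(t)
```

Each tuple maps onto one tile request, and the viewer places the returned image at the same x and y offsets to rebuild the visible portion of the page.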

For detailed information about djatoka, including source code for download, visit:

djatoka info: http://african.lanl.gov/aDORe/projects/djatoka
djatoka on SourceForge: http://sourceforge.net/projects/djatoka

djatoka in BHL
djatoka offers a rich user interface through integration of the IIPImage JavaScript Viewer. BHL already had an interface for delivering images that tested well with users, so our goal was to use djatoka as a drop-in replacement for ExpressServer without affecting end user functionality. With that goal in mind, we made our changes to the IIPImage viewer and left the server alone; it’s running pretty much in its default state save for some caching options we customized.

djatoka brought with it some improvements to our user interface:

Preview thumbnail. This allows a user to see which portion of the overall image is being viewed. This also lets the user easily view different parts of the image by simply dragging the control to the desired part of the image.
Simplified zoom controls. With our djatoka implementation we got rid of the radio buttons formerly used to zoom in and out and replaced them with plus and minus buttons.

Some changes we made to the djatoka viewer are as follows:

We needed to retain the Save and Print image features from the previous image viewer. Therefore, we rolled these into the djatoka viewer.
BHL defaults to serving a single low-resolution JPEG image, when available, and only escalates to more costly JPEG 2000 processing when instructed by the user. We made the djatoka viewer friendlier toward raw JPEGs. djatoka itself handles raw JPEGs with aplomb.
We removed a text box that provides a way to embed a Region Of Interest (ROI), or “slice,” of the image as tiled by djatoka. While this is a very useful feature, we felt the current implementation consumed too much screen space of the visible page image. We plan to reintroduce this in a less obtrusive manner.

In the end, we were successful in creating a drop-in replacement for the existing viewer. We didn’t drastically change our user interface but still managed to make significant improvements. We chose an evolution this time around; the revolution has been carefully scheduled for another date.

Towards a scalable infrastructure
BHL developer Phil Cryer has written an excellent and detailed blog post about the technical infrastructure devised to provide scalability and fault tolerance to our djatoka implementation. It is available at http://www.fak3r.com/2009/01/27/howto-serve-jpeg2000-images-with-a-scalable-infrastructure/.

Future work
While we are happy with our initial implementation of djatoka, we have already planned some future enhancements and have begun discussions concerning priorities with the djatoka development community, including lead developers Chute and Van de Sompel. These enhancements are as follows:

An embed image link. This will directly replace functionality we removed from our implementation of the djatoka viewer. Drawing from similar features elsewhere, we are leaning toward a clickable link icon that pops up a box containing a URI for the image as the user is currently viewing it.
The current djatoka viewer is based on an old version of MooTools. Newer versions of MooTools are not compatible with the djatoka viewer. We’d like to fix this so that it’s easier to drop into a website that uses a current or future version of MooTools. This will allow us to use a newer and more flexible version of this useful code library, and will allow others to reuse our work more easily.

Currently, our changes to the djatoka viewer are pretty specific to the BHL. In the spirit of Open Source, we will make these enhancements available to the community. We have the goal of making the viewer generic enough so that it can be used without major customizations, and we will be considering several use cases for our work.

Conclusion
As described above, djatoka was integrated into the production BHL infrastructure and user interface with minimal effort. It has eliminated the problems that existed with the previous, proprietary solution used to deliver JPEG 2000 images, and thus far has performed without significant error or delay since being promoted into production at www.biodiversitylibrary.org. Because this is an open source solution, BHL can scale up without significant expense should we require additional capacity. Our experience implementing djatoka has been overwhelmingly positive, and we would encourage any project currently serving JPEG 2000 images to evaluate its features and functionality within its own infrastructure.

If you’d like to learn more about djatoka visit the main web site or get the code from SourceForge. Join the listservs to become an active member of the djatoka development community, as the lists are the best source of current information and have an active user base. Finally, please leave comments or feedback about our specific implementation of djatoka in BHL using the Comment form below.

Chris Freeland, BHL Technical Director
chris.freeland (at) mobot.org

Chris Moyers, BHL Developer
chris.moyers (at) mobot.org

Technical Notes

January 26, 2009

Written by Chris Freeland

Chris Freeland served as the BHL Technical Director from 2006 to 2012. He is currently the Director of the Open Libraries program at Internet Archive. In this capacity he works with libraries & publishers to digitize their collections, working towards the Archive’s mission of providing “universal access to all knowledge.”

16 Comments

peacay February 15, 2009 at 1:26 am

Thanks for the updating of the feed! It’s great to have more than title now.
kehan January 30, 2009 at 7:28 pm

seems to be working now, although I’m at home 😉

I really do think it’ll be working as I could access the images on port 80 at work.

I like the fullscreen button 😉
Cheers,
kehan
cminor9 January 30, 2009 at 4:35 pm

@peacay

Thanks for the feedback. This is most helpful for us to see.

Point taken about the magnifying glass and the zooming.

In case you are interested, the magnifying glass triggers the detail view, which allows you to zoom in and out. By default we load a smaller image, which makes the page load faster for general viewing. The downside is that this image doesn’t allow zooming. You must click a button to download a large image to enable zooming. This isn’t exactly seamless, but we’re working toward that as the ultimate goal 🙂

Thanks again for the feedback!

Chris Moyers
philcryer.com January 30, 2009 at 11:55 am

@kehan said…
” Maybe it’s my firewall at work, but I get the following error for the page frame: The connection was refused when attempting to contact images.biodiversitylibrary.org:81. I loaded the frame in a separate tab, changed the port to 80 and hey presto it was working”

I appreciate your feedback, the viewer is in an iframe on port 81 – which, while it works for most, seems that it’s likely blocked on some firewalls for outbound users. I suspect this was your issue, as it was for another poster which displayed a Squid error, which we don’t run at mobot. So, we have reset the ports, and are now pointing to the default http port, 80 – we think this will fix the issue, please let us know if it does not.

Thanks again for the feedback, feel free to contact me directly if you experience any other issues.

Phil

phil.cryer (at) mobot.org
kehan January 30, 2009 at 9:30 am

erm it broke 🙁
Maybe it’s my firewall at work, but I get the following error for the page frame:
The connection was refused when attempting to contact images.biodiversitylibrary.org:81.

I loaded the frame in a separate tab, changed the port to 80 and hey presto it was working – nice quality image I must say.
Chris Freeland January 29, 2009 at 3:57 pm

Thanks, Brewster! I've already been in touch with Raj to discuss how your new bookreader (which is reeeeeeaalllly nice!) might play well with djatoka and fit into needs for BHL. There's a lot of exciting work happening in open access book delivery…I'm wondering if you guys have plans to get groups together who are working on this – a one-day workshop on the future of the digital book, or something??! Would be fun & educating!
Anonymous January 29, 2009 at 12:59 pm

Very nice result !

Do you have any ideas when your modifications to the viewer will be available ? And where will you publish them ?
peacay January 28, 2009 at 12:12 am

Although the technicalities are far beyond me, this seems to really work well. Thank you.

Just a teeny tiny point: when I landed on the buffalo page, the first thing I looked for was a zooming button of some sort. I started double clicking on the main picture and also the thumbnail on the side. All of those icons at the top have mouseover text, except the magnifying glass.

Actually, looking at view source, it does appear to have text. But it doesn’t display, even on hard refresh. Odd. I’m on FF 3.0.5 with XP(sp2).

As a secondary point, why don’t those ‘+’ and ‘-‘ buttons display from the outset?? It’s a significant delay after clicking the magnifying glass icon: it reloads the whole viewing area page before displaying the ‘+’ and ‘-‘.

Not a criticism, just an observation.
Anonymous January 27, 2009 at 5:53 pm

Nice to see it work so well.

At the Archive, we too have been using the kakadu library underneath, but with a different shell.

way to go!

-brewster


About BHL

The Biodiversity Heritage Library (BHL) is the world’s largest open access digital library for biodiversity literature and archives. Headquartered at the Smithsonian Libraries and Archives in Washington, D.C., BHL operates as a worldwide consortium of natural history, botanical, research, and national libraries working together to digitize the natural history literature held in their collections and make it freely available for open access as part of a global “biodiversity community.”
