
Tuesday, October 11, 2011

Image Sizes in BHL. SEE ALSO: Piece of String, Length of.

“How long is a piece of string?” isn’t a familiar idiom to those living in the Midwest of the continental United States. Well, it wasn’t to at least one person living in the Midwest. It’s the answer you’ll get in the BHL AU office to questions like “How long does it take to build a website?” or any other question to which there isn’t a definitive answer for the general case, like “How big is a page image in BHL?”


Of course, another way of answering would be “It depends.” To take two completely random examples, this page from Prodromus of the zoology of Victoria is 1828 pixels wide by 2879 pixels high, while this page from Australian Lepidoptera and their transformations is 3496 pixels wide by 4785 pixels high.


Now, I’m sure you’ve gotten familiar with BHL’s API while you’ve been putting together your entry for the Life and Literature code challenge. I know you’re working hard on your entry, ‘cause it’s what all the other cool kids are doing.


You don’t need me to tell you that when you use the API to get an item’s metadata with the page flag set to true, you get a URL for a thumbnail image and a URL for the full size image of each page. Which is fantastic, if you want an image that will fit into a 200px by 300px box, or an IOUS (image of unusual size). What if you want an image that will fit into a 600px by 800px box? Do you get the thumbnail and scale it up? Yes, but only if you’re in a bad police procedural that creates image information from nowhere. In the real world, you need to get the full size image and downsize it. Until now.


Now you can get your Astacoides serratus at a range of sizes to suit your budget. Simply add the width and height of your bounding box at the end of the thumbnail image url, and Bob's your uncle. So, if you want an image to fit into a 600px by 800px box, instead of using the thumbnail url as is (http://bhl.ala.org.au/pagethumb/5221137), use http://bhl.ala.org.au/pagethumb/5221137,600,800 and you’ll get back an image that’s exactly 914px by 1440px.
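If you’re building these URLs in code, here’s a minimal Python sketch. The `pagethumb/<page id>,<width>,<height>` pattern comes straight from the example above; the helper name `resized_thumb_url` and its `host` parameter are hypothetical, and the sketch only builds the string rather than fetching the image.

```python
def resized_thumb_url(page_id: int, width: int, height: int,
                      host: str = "bhl.ala.org.au") -> str:
    """Build a pagethumb URL asking for an image that fits a width x height box.

    The "<page id>,<width>,<height>" suffix follows the example in the post;
    the function name and the host parameter are illustrative only.
    """
    if width <= 0 or height <= 0:
        raise ValueError("bounding box dimensions must be positive")
    return f"http://{host}/pagethumb/{page_id},{width},{height}"


# The example from the post: a 600px by 800px box for page 5221137.
print(resized_thumb_url(5221137, 600, 800))
# http://bhl.ala.org.au/pagethumb/5221137,600,800
```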


Okay, I know that a 914px by 1440px image doesn’t fit into a 600px by 800px box. You’re still going to have to scale the image down to 508px by 800px to fit, but at least you only have to download about a third as much data as for the full size image (148kB vs 436kB). So why aren't we providing an image of A. serratus at 508px by 800px?
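That last scaling step is a uniform scale by whichever ratio is tighter: min(600/914, 800/1440) ≈ 0.556, which takes 914px by 1440px down to 508px by 800px. A small sketch of the arithmetic; the name `fit_within` is illustrative and rounding to the nearest pixel is an assumption.

```python
def fit_within(width: int, height: int, box_w: int, box_h: int) -> tuple[int, int]:
    """Scale (width, height) uniformly so the image fits inside box_w x box_h.

    The aspect ratio is preserved by using the tighter of the two ratios;
    results are rounded to the nearest whole pixel.
    """
    scale = min(box_w / width, box_h / height)
    return round(width * scale), round(height * scale)


# The 914px x 1440px image from the post, fitted to a 600px x 800px box.
print(fit_within(914, 1440, 600, 800))  # (508, 800)
```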


Rather than have the server scale the full size image for each request, images are available at fixed fractions of the original dimensions. The fractions available are a half, a quarter, an eighth and a sixteenth. Those are fractions of the width and height, so each step down has only a quarter as many pixels as the one before.
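To see the whole ladder of sizes for a page, here’s a sketch that divides the original dimensions by 2, 4, 8 and 16. Rounding fractional pixels up is an assumption: it reproduces the 914px by 1440px half-size image of A. serratus, but the server’s actual rounding rule may differ.

```python
import math

# Divisors for the sizes the post lists: full size, 1/2, 1/4, 1/8 and 1/16.
DIVISORS = (1, 2, 4, 8, 16)


def available_sizes(full_w, full_h):
    """Return (divisor, width, height) for every available size, largest first.

    Assumption: fractional pixels are rounded up, which reproduces the
    914 x 1440 half-size image of a 1828 x 2879 page.
    """
    return [(d, math.ceil(full_w / d), math.ceil(full_h / d)) for d in DIVISORS]


for d, w, h in available_sizes(1828, 2879):
    label = "full" if d == 1 else f"1/{d}"
    # Each step down has roughly a quarter of the pixels of the one before.
    print(f"{label:>4}: {w} x {h} px, {w * h:,} pixels")
```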


The server will give you the smallest image available that won’t need to be scaled up to fit within your bounds. So, using our old mate A. serratus as the example, if you specified a bounding box of 915px x 1441px, you'd get the full size image at 1828px x 2879px. If you don’t provide a width and height, the assumed size of the bounding box is 200px by 300px.
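One way to read that selection rule, as a sketch: an image needs no scaling up when fitting it into the box shrinks it or leaves it unchanged, i.e. when min(box width / image width, box height / image height) ≤ 1. The function below walks from the smallest size upward and returns the first one that passes. The name `pick_size`, the round-up rule and the fallback to full size when even that is smaller than the box are assumptions rather than documented server behaviour, though the sketch does reproduce both examples above.

```python
import math

# Smallest first: 1/16, 1/8, 1/4, 1/2, then full size.
DIVISORS = (16, 8, 4, 2, 1)


def pick_size(full_w, full_h, box_w=200, box_h=300):
    """Return the smallest available (width, height) that needn't be scaled up.

    Fitting an image of size w x h into the box scales it by
    min(box_w / w, box_h / h); a factor <= 1 means no upscaling. If no size
    qualifies (the box is bigger than the full image), return the full size.
    Rounding fractional pixels up is an assumption.
    """
    for d in DIVISORS:
        w, h = math.ceil(full_w / d), math.ceil(full_h / d)
        if min(box_w / w, box_h / h) <= 1:
            return w, h
    return full_w, full_h


print(pick_size(1828, 2879, 600, 800))   # (914, 1440)
print(pick_size(1828, 2879, 915, 1441))  # (1828, 2879)
```

The default arguments stand in for the 200px by 300px box the server assumes when no width and height are given.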


I’ve got to be honest: while all the links point to the Australian node, the heavy lifting for this was done by the good folks at http://www.biodiversitylibrary.org/. You can replace bhl.ala.org.au with www.biodiversitylibrary.org and get exactly the same results.


Have fun playing with the images, and I look forward to seeing how you put it to good use.

3 comments:

Rod Page said...

Nice, but (of course) I want more. I understand the reasons for choosing a series of pre-determined image sizes, but this means images may often be bigger than they need to be. What about adding an option to get an image smaller than the bounding box?

Long term, we also need to think about removing the colour background, either totally or by minimising it. You can get much smaller images in black and white. For example, the image http://bhl.ala.org.au/pagethumb/34570914,800,1316 is 631K (and actually 1480 × 2436 pixels). I can get the same page down to 147K (800 × 1316 pixels) in b&w http://dl.dropbox.com/u/639486/bhl/0532.png.

I think size matters, especially in the context of mobile apps (iPhone, iPad, etc.). I think the goal should be to approach the level of image size optimisation seen in Google's PDF viewer.

Thesherrin said...

G'day Rod,
Thanks for your feedback, and I definitely agree with you that size matters. We're using the "out of the box" functionality provided by archive.org to get the resized images, and the selection of which image to return is done on their server. I can't see us adding the option of getting an image that's smaller than the requested bounding box in the near future. In the borderline cases, you could end up with an image that you'd have to scale up by a factor of almost 2, which doesn't result in the most readable text. Of course, you could say that in those cases, if the next size up was only a little bit bigger than the bounding box, the server should send the bigger image. However, the level of upscaling that's acceptable would be different for everybody.

If you'd like to have that level of control over your image selection, you can use a mini-thumbnail to get the image dimensions at a low bandwidth cost. For example, say you wanted to show my old mate Astacoides serratus in a 1000px by 1500px window. If you send off a request for a small bounding box of 100px by 200px, you'll get back an image that's 115px by 180px and only 3.73kB. From that, you can estimate that the widths of your larger images will be 230 (+/-2), 460 (+/-4), 920 (+/-8) and 1840 (+/-16). If you wanted to go for the smaller image and scale up, you’d send a request with a 912px by 1432px bounding box, get back the image at 914px by 1440px, and Robert would, once again, be your avuncular figure. Hope this helps. - Simon

Rod Page said...

OK, I see. I'd assumed this was something BHL had implemented. In order to get more control of the images I'm now moving to downloading the DjVu files from Internet Archive and extracting the images locally. For now I'm keeping the faded-brown look but I'm exploring B&W page images. The other reason for grabbing the DjVu files is I can then generate text overlays and eventually have a document viewer that has the same sort of functionality as Google's PDF viewer.

As an aside, this blog really needs a decent comment system (Blogger's is pretty terrible). I'm a fan of Disqus, which can also mirror the conversations on Twitter (which is pretty much the only way to get my attention).

Oh, and the ABs are going to crush your guys on Sunday ;)