Wednesday, July 31, 2013

Serving multiple images from database as a CSS sprite

Introduction

In the first public beta version of ZEEF which was somewhat thrown together (first get the minimum working using standard techniques, then review, refactor and improve it), all favicons were served individually. Although they were set to be agressively cached (1 year, whereby a reload is when necessary forced by the timestamp-in-query-string trick with the last-modified timestamp of the link), this resulted in case of an empty cache in a ridiculous amount of HTTP requests on a subject page with relatively a lot of links, such as Curaçao by Bauke Scholtz:

Yes, 209 image requests of which 10 are not for favicons, which nets as 199 favicon requests. Yes, that much links are currently on the Curaçao subject. The average modern webbrowser has only 6~8 simultaneous connections available on a specific domain. That's thus a huge queue. You can see it in the screenshot, it took on an empty cache nearly 5 seconds to get them all (on a primed cache, it's less than 1 second).

If you look closer, you'll see that there's another problem with this approach: links which doesn't have a favicon re-requests the very same default favicon again and again with a different last-modified timestamp of the link itself, ending up in copies of exactly same image in the browser cache. Also, links from the same domain which share the same favicon, have their favicons duplicated this way. In spite of the agressive cache, this was simply too inefficient.

Converting images to common format and size

The most straightforward solution would be to serve all those favicons as a single CSS sprite and make use of CSS background-position to reference the right favicon in the sprite. This however requires that all favicons are first parsed and converted to a common format and size which allows easy manipulation by standard Java 2D API (ImageIO and friends) and easy generation of the CSS sprite image. PNG was chosen as format as that's the most efficient and lossless format. 16x16 was chosen as default size.

As first step, a favicon parser was created which verifies and parses the scraped favicon file and saves every found image as PNG (the ICO format can store multiple images, usually each with a different dimension, e.g. 16x16, 32x32, 64x64, etc). For this, Image4J (a mavenized fork with bugfix) has been of a great help. The original Image4J had only a minor bug, it ran in an infinite loop on favicons with broken metadata, such as this one. This was fixed by vijedi/image4j. However, when an ICO file contained multiple images, this fix discarded all images, instead of only the broken one. So, another bugfix was done on top of that (which by the way just leniently returned the "broken" image — in fact, only the metadata was broken, not the image content itself). Every single favicon will now be parsed by ICODecoder and BMPDecoder of Image4J and then ImageIO#read() of standard Java SE API in this sequence. Whoever returned the first non-null BufferedImage(s) without exceptions, this will be used. This step also made us able to completely bypass the content-type check which we initially had, because we discovered that a lot of websites were doing a bad job in this, some favicons were even served as text/html which caused false negatives.

As second step, if the parsing of a favicon resulted in at least one BufferedImage, but no one was in 16x16 dimension, then it will be created based on the firstnext dimension which is resized back to 16x16 with help of thebuzzmedia/imgscalr which yielded high quality resizings.

Finally all formats are converted to PNG and saved in the DB (and cached in the local disk file system).

Serving images as CSS sprite

For this a simple servlet was been used which does basically ultimately the following in doGet() (error/cache checking omitted for simplicity):


Long pageId = Long.valueOf(request.getPathInfo().substring(1));
Page page = pageService.getById(pageId);
long lastModified = page.getLastModified();
byte[] content = faviconService.getSpriteById(pageId, lastModified);

if (content != null) { // Found same version in disk file system cache.
    response.getOutputStream().write(content);
    return;
}

Set<Long> faviconIds = new TreeSet<>();
faviconIds.add(0L); // Default favicon, appears as 1st image of sprite.
faviconIds.addAll(page.getFaviconIds());

int width = Favicon.DEFAULT_SIZE; // 16px.
int height = width * faviconIds.size();

BufferedImage sprite = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
Graphics2D graphics = sprite.createGraphics();
graphics.setBackground(new Color(0xff, 0xff, 0xff, 0)); // Transparent.
graphics.fillRect(0, 0, width, height);

int i = 0;

for (Long faviconId : faviconIds) {
    Favicon favicon = faviconService.getById(faviconId); // Loads from disk file system cache.
    byte[] content = favicon.getContent();
    BufferedImage image = ImageIO.read(new ByteArrayInputStream(content));
    graphics.drawImage(image, 0, width * i++, null);
}

ByteArrayOutputStream output = new ByteArrayOutputStream();
ImageIO.write(sprite, "png", output);
content = output.toByteArray();
faviconService.saveSprite(pageId, lastModified, content); // Store in disk file system cache.
response.getOutputStream().write(content);

To see it in action, you can get all favicons of the page Curaçao by Bauke Scholtz (which has page ID 18) as CSS sprite on the following URL: https://zeef.com/favicons/page/18.

Serving the CSS file containing sprite-image-specific selectors

In order to present the CSS sprite images at the right places, we should also have a simple servlet which generates the desired CSS stylesheet file containing sprite-image-specific selectors with the right background-position. The servlet should basically ultimately do the following in doGet() (error/cache checking omitted to keep it simple):


Long pageId = Long.valueOf(request.getPathInfo().substring(1));
Page page = pageService.getById(pageId);

Set<Long> faviconIds = new TreeSet<>();
faviconIds.add(0L); // Default favicon, appears as 1st image of sprite.
faviconIds.addAll(page.getFaviconIds());

long lastModified = page.getLastModified().getTime();
int height = Favicon.DEFAULT_SIZE; // 16px.

PrintWriter writer = response.getWriter();
writer.printf("[class^='favicon-']{background-image:url('../page/%d?%d')!important}", 
    pageId, lastModified);
int i = 0;

for (Long faviconId : faviconIds) {
    writer.printf(".favicon-%s{background-position:0 -%spx}", faviconId, height * i++);
}

To see it in action, you can get the CSS file of the page Curaçao by Bauke Scholtz (which has page ID 18) on the following URL: https://zeef.com/favicons/css/18.

Note that the background-image URL has the page's last modified timestamp in the query string which should force a browser reload of the sprite whenever a link has been added/removed in the page. The CSS file itself has also such a query string as you can see in HTML source code of the ZEEF page, which is basically generated as follows:


<link id="favicons" rel="stylesheet" 
    href="//zeef.com/favicons/css/#{zeef.page.id}?#{zeef.page.lastModified.time}" />

Also note that the !important is there to overrule the default favicon for the case the serving of the CSS sprite failed somehow. The default favicon is specified in general layout CSS file layout.css as follows:


#blocks .link.block li .favicon,
#blocks .link.block li [class^='favicon-'] {
    position: absolute;
    left: -7px;
    top: 4px;
    width: 16px;
    height: 16px;
}

#blocks .link.block li [class^='favicon-'] {
    background-image: url("#{resource['zeef:images/default_favicon.png']}");
}

Referencing images in HTML

It's rather simple, the links were just generated in a loop whereby the favicon image is represented via a plain HTML <span> element basically as follows:


<a id="link_#{linkPosition.id}" href="#{link.targetURL}" title="#{link.defaultTitle}">
    <span class="favicon-#{link.faviconId}" />
    <span class="text">#{linkPosition.displayTitle}</span>
</a>

The HTTP requests on image files have been reduced from 209 to 12 (note that 10 non-favicon requests have increased to 11 non-favicon requests due to changes in social buttons, but that's not further related to the matter):

It took on an empty cache on average only half a second to download the CSS file and another half a second to download the CSS sprite. Per saldo, that's thus 5 times faster with 197 connections less! On a primed cache it's even not requested at all. Noted should be that I'm here behind a relatively slow network and that the current ZEEF production server on a 3rd party host isn't using "state of the art" hardware yet. The hardware will be handpicked later on once we grow.

Reloading CSS sprite by JavaScript whenever necessary

When you're logged in as page owner, you can edit the page by adding/removing/drag'n'drop links and blocks. This all takes place by ajax without a full page reload. Whenever necessary, the CSS sprite can during ajax oncomplete be forced to be reloaded by the following script which references the <link id="favicons">:


function reloadFavicons() {
    var $favicons = $("#favicons");
    $favicons.attr("href", $favicons.attr("href").replace(/\?.*/, "?" + new Date().getTime()));
}

Basically, it just updates the timestamp in the query string of the <link href> which in turn forces the webbrowser to request it straight from the server instead of from the cache.

Note that in case of newly added links which do not exist in the system yet, favicons are resolved asynchronously in the background and pushed back via Server-Sent Events. In this case, the new favicon is still downloaded individually and explicitly set as CSS background image. You can find it in the global-push.js file:


function updateLink(data) {
    var $link = $("#link_" + data.id);
    $link.attr("title", data.title);
    $link.find(".text").text(data.text);
    $link.find("[class^='favicon-']").attr("class", "favicon")
        .css("background-image", "url(/favicons/link/" + data.icon + "?" + new Date().getTime() + ")");
    highlight($link);
}

But once the HTML DOM representation of the link or block is later ajax-updated after an edit or drag'n'drop, then it will re-reference the CSS sprite again.

The individual favicon request is also done in "Edit link" dialog. The servlet code for that is not exciting, but for the case you're interested, the URL is like https://zeef.com/favicons/link/354 and all the servlet basically does is (error/cache checking omitted for brevity):


Long linkId = Long.valueOf(request.getPathInfo().substring(1));
Link link = linkService.getById(linkId);
Favicon favicon = faviconService.getById(link.getFaviconId());
byte[] content = favicon.getContent();
response.getWriter().write(content);

Note that individual favicons are not downloaded by their own ID, but instead by the link ID, because a link doesn't necessarily have any favicon. This way the default favicon can easily be returned.