Sharovatov’s Weblog

data URI theory and practice

Posted in browsers, css by sharovatov on 11 May 2008

Theory

The data URI scheme is defined in RFC 2397, published in 1998. It's a URL scheme used to embed small resources right into an (X)HTML page.

Syntax is quite simple: data:[<mediatype>][;base64],<data>

To see how it works let's take the following code (testcase):

<link rel="stylesheet" type="text/css" href="data:text/css;base64,Ym9keXtiYWNrZ3JvdW5kOmdyZWVuO30=">

A browser that supports data URIs will base64-decode the encoded string Ym9keXtiYWNrZ3JvdW5kOmdyZWVuO30= to body{background:green;} and then load this string as if it were the result of an HTTP request for an external file containing this CSS code.
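To make the decoding step concrete, here is a minimal Python sketch of what a browser roughly does with such a URI. The function name parse_data_uri is made up for this illustration, and the parsing is deliberately simplified (it doesn't validate the media type or its parameters):

```python
import base64
from urllib.parse import unquote_to_bytes

def parse_data_uri(uri):
    """Split a data URI into (mediatype, payload bytes). Hypothetical helper."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first comma is the header, the rest is the data.
    header, sep, data = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URI has no comma separator")
    is_base64 = header.endswith(";base64")
    if is_base64:
        header = header[: -len(";base64")]
    # RFC 2397: an omitted media type defaults to text/plain;charset=US-ASCII.
    mediatype = header or "text/plain;charset=US-ASCII"
    # Without ;base64 the data part is percent-encoded text.
    payload = base64.b64decode(data) if is_base64 else unquote_to_bytes(data)
    return mediatype, payload

mediatype, payload = parse_data_uri(
    "data:text/css;base64,Ym9keXtiYWNrZ3JvdW5kOmdyZWVuO30="
)
print(mediatype, payload.decode("ascii"))  # text/css body{background:green;}
```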

According to the RFC we can embed any small resource into our page, e.g.:

  1. images (as img elements and CSS backgrounds)
  2. javascript
  3. html (links, iframes)
  4. css (and even dataURIed images inside dataURIed CSS!)
  5. any other resource supported by browsers

So theoretically we could have the same functionality as we have in MHTML — some or all external resources embedded directly in the page.

Data URI advocates say that, since most browsers open only 2 concurrent connections per server (but 6 in total), the data URI mechanism can potentially speed up page load by decreasing the number of HTTP requests (especially in the case of HTTPS, where encrypting the payload produces quite a big overhead). But:

  • The HTTP protocol already has methods that help build efficient applications: persistent connections to avoid recreating sockets, and caching mechanisms that either reduce overhead (conditional GET) or avoid requests altogether (aggressive caching using the Expires header).
  • Moreover, with a simple technique you can have the browser use 6 concurrent connections to parallelize fetching data as much as it can and thereby speed up page load.
  • Though the HTTP 1.1 spec says that we shouldn't have more than 2 concurrent connections per server, in the real world only Firefox and IE6/7 limit themselves to 2. In IE8b1 the number is 6; in Opera 9 and Safari it's 4. In the next post I will give more details on this.

So keeping all this in mind, we can't just say that data URI is the only usable way to improve page load times. But it definitely is the only option when you have limited access to the server and/or the server is not configured properly: you can't set the Expires header for aggressive caching, you can't set DNS wildcards or CNAME records to get your resources served from different hosts (and therefore leverage the maximum number of concurrent connections in browsers), or the server doesn't support HTTP caching properly.

Practice

So I can see only the following cases where dataURI can be effectively used:

  • CSS sprites, rounded-corner images, icons and other images that have only presentational semantics. They are the perfect target for dataURI + base64. If we embed them in the CSS file, we remove the HTTP requests that would be made if these images were normal files. These images are part of the design described in the stylesheet, so it makes perfect sense to embed them in CSS. CSS files can be cached perfectly well, and while the design doesn't change, we don't need to touch this CSS at all. But common sense applies here as well: firstly, base64 decoding takes system resources, and secondly, who wants to wait for a CSS file of a couple hundred kilobytes to load?
  • Reasonably small CSS files with rules specific to a page. If it makes semantic sense to define inline CSS on a page, then it makes perfect sense to set it using dataURI. Another thing is that a CSS file is not going to be parsed until it's fully downloaded by the browser. So when we embed a big image, a client opening the page for the first time has to wait until the CSS is fully loaded, and we lose our HTTP parallelism benefits here.
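As a rough illustration of the first case, the sketch below builds a CSS rule with an embedded image ahead of time, for example as part of a build step. The helper names data_uri and css_background_rule are invented for this example, and the "icon" is just placeholder bytes starting with the PNG signature rather than a real image. Keep in mind that base64 turns every 3 bytes into 4 characters, so the embedded image is about a third larger than the original file.

```python
import base64

def data_uri(payload, mime):
    """Return a base64 data URI for the given bytes (hypothetical helper)."""
    return "data:" + mime + ";base64," + base64.b64encode(payload).decode("ascii")

def css_background_rule(selector, payload, mime):
    """Return a CSS rule whose background image is embedded as a data URI."""
    return "%s { background-image: url(%s); }" % (selector, data_uri(payload, mime))

# Placeholder bytes standing in for a small icon; a real script would read a file.
icon = b"\x89PNG\r\n\x1a\n" + bytes(40)

rule = css_background_rule("div.menu", icon, "image/png")
print(rule)
print(len(icon), "bytes raw ->", len(data_uri(icon, "image/png")), "characters embedded")
```

The generated rule can be written into a static .css file once, so the server only ever serves a plain file that it can cache like any other.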

Please don't forget that if a resource is embedded on multiple pages, it's obviously going to be redownloaded as many times as these containing pages are. And if a resource is not dataURI'ed but referenced normally as an external file, it can be cached quite aggressively and requested from the server only once (all popular web-servers already provide good caching support for static files).

However, this is all an ideal world where specifications don't have flaws and all the browsers follow them.

In our world we have the following:

  • Lack of support. Only IE8b1/Opera9/Firefox2/Safari support data URI. No IE6/IE7. That means that for the next three or four years, while IE6 and IE7 still have a significant market share, we can't just go and start using dataURI.
  • Different size limits on URI length in different browsers. As far as I know, IE8 currently supports up to 32 kilobytes in a data: value. Even though all other browsers support bigger sizes, our limit will obviously be 32 KB.
    See testcase 1 with a data URI of 32755 bytes and testcase 2 with a data URI of 32868 bytes.
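A small check like the following could guard against going over that limit before embedding anything. This is only a sketch: the constant and function names are made up, and it assumes the limit is exactly 32768 characters counted over the whole URI, which I haven't verified beyond the two testcases above.

```python
# Assumption: IE8's limit is 32 * 1024 characters, counted over the whole URI.
IE8_DATA_URI_LIMIT = 32 * 1024

def fits_ie8_limit(uri, limit=IE8_DATA_URI_LIMIT):
    """Return True if the data URI is short enough under the assumed limit."""
    return len(uri) <= limit

def max_raw_bytes(mime, limit=IE8_DATA_URI_LIMIT):
    """Largest raw payload (in bytes) whose base64 data URI still fits."""
    prefix = len("data:" + mime + ";base64,")
    # base64 emits 4 characters for every (possibly padded) group of 3 bytes.
    return (limit - prefix) // 4 * 3

print(fits_ie8_limit("data:," + "a" * (32755 - 6)))  # URI of 32755 characters
print(fits_ie8_limit("data:," + "a" * (32868 - 6)))  # URI of 32868 characters
print(max_raw_bytes("image/png"))
```

So under this assumption an image has to stay under roughly 24 KB on disk to be embeddable, because of the base64 overhead.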

Also, I would strongly discourage dynamically base64-encoding and embedding images in CSS files with a scripting language unless you're well aware of HTTP caching principles.

Let's consider the following code, put together from the Wikipedia data: URI page:

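<?php
// Build a base64-encoded data URI for the given file and MIME type.
function data_url($file, $mime)
{
  $contents = file_get_contents($file);
  $base64   = base64_encode($contents);
  return 'data:' . $mime . ';base64,' . $base64;
}

// Serve the output of this script as a stylesheet.
header('Content-Type: text/css');

?>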
 
div.menu {
  background-image:url(<?php echo data_url('menu_background.png','image/png')?>);
}

Unless accompanied by a correct HTTP caching algorithm, this CSS file will be downloaded every time a page that references it is loaded! So every time a user accesses such a page, the server will get a request, run the script, base64-encode the image and send it back to the client. So you get rid of one simple request for an image (which, being a static file, would be cached perfectly well) but gain one heavy request that runs every time a user requests a page! Not a fair trade, I think. So again, if you decide to use the data URI scheme for your resources, encode and embed them beforehand or implement proper server-side HTTP caching and compression support.
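For the second option, the core of a conditional GET can be sketched in a few lines of Python. This is not tied to any framework: the respond function and its parameters are hypothetical, and a real handler would also want to store the generated CSS somewhere instead of rebuilding it on each request.

```python
import hashlib

def respond(css_bytes, if_none_match=None, max_age=86400):
    """Return (status, headers, body) for a request for the generated CSS.

    if_none_match is the value of the request's If-None-Match header, if any.
    """
    # A strong ETag derived from the content: it changes only when the CSS does.
    etag = '"' + hashlib.sha1(css_bytes).hexdigest() + '"'
    headers = {"ETag": etag, "Cache-Control": "max-age=%d" % max_age}
    if if_none_match == etag:
        # The client's copy is still valid: send no body at all.
        return 304, headers, b""
    headers["Content-Type"] = "text/css"
    return 200, headers, css_bytes

css = b"div.menu { background-image: url(data:image/png;base64,...); }"
status, headers, body = respond(css)
print(status, len(body))
status, _, body = respond(css, if_none_match=headers["ETag"])
print(status, len(body))
```

With max-age set, the browser doesn't even ask again until the cached copy expires, and after that a revalidation costs only headers when the stylesheet hasn't changed.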

Note for Russian-speaking users: there's a way to embed images even for IE6/IE7. It's rather a proof of concept, since it doesn't support HTTP caching/compression, but it works!

