3 hard weeks' Friday night

Too much news in just three weeks…

I'm now managing a small team of three developers and a sysadmin. They're great guys, and it seems to be a good team, so far at least :)

I didn't think a managing role would require so much work, but it's fair: the responsibility is much higher.

We've got our own quiet room with air conditioning, new furniture and a decent sound system. Each web developer has a fresh modern PC (Intel Core 2 Duo + 2GB RAM) with a dual 19″ monitor configuration. During the day we listen to the sky.fm New Age channel, which keeps us from talking to ourselves loudly and makes the atmosphere a bit more comfortable.

Two PHP developers and I started a reasonably small PHP/MSSQL Web 2.0 web-app project (approximately 200 man-hours), and our sysadmin started on things that had been planned for ages but that I never had time to look at or research. It's so cool to get things done on time!

I've signed up for a 45-day FogBugz trial. It's an awesome piece of project management software that uses evidence-based scheduling to estimate work and predict project release dates. I'll prepare a separate post about it when we get our web-app project done, but the only thing I can tell you for now is that it just rocks! We are going to buy a server license as soon as our trial expires :)

As we run Windows on our servers and workstations, I decided to make the following software our company-wide standard for PHP development:

  • Windows XP Professional: our OS standard, the most cost-effective system, offering a great level of integration for a small price
  • Eclipse PDT as a PHP IDE. It's free, has all the basic IDE features and a good Subversion plugin, Subclipse. It's definitely not Visual Studio 2005, but still very good
  • VisualSVN Server: a free and easy-to-use Subversion server with an MMC snap-in, which is very handy in the complex AD environment we have
  • the previously mentioned FogBugz: genuine project management and bug-tracking software
  • PHP 5.2 (FastCGI) + IIS6 on Windows Server 2003
  • FreeMind: a free and simple mind-mapping tool for brainstorming and drafting technical specifications.

So apart from FogBugz (and Windows XP, which would be installed anyway), the whole PHP development infrastructure costs nothing! Awesome! And FogBugz pricing is very reasonable for that kind of functionality!

So the project has started; we'll see how it goes. It's going to be my trial as a manager.

We are also trying to move away from virtual hosting to our own dedicated servers to get as much control over our environment, and as many integration opportunities, as possible.

As we have three geographically remote offices (London, Moscow and Volzhsky), my goal was to join them into one secure, transparent network with failover capability.

We got a D-Link DFL-800 firewall installed in each office. It's a hardware firewall with two WAN ports and VPN client/server support. Each office has two internet channels from separate ISPs, and the three firewalls are connected to each other with two VPN channels, one over the first ISP and another over the second, so even if three links fail, the offices are still connected.

Routing is configured so that any machine on any of the office networks can reach any machine on the others as if it were sitting right next to it.

The next thing I had to do was move the website from virtual hosting (protechweb.co.uk) to a dedicated server. The next post describes the process I followed.


HTTP History Lists and Back Button

While writing the post about form value persistence, I noticed that browsers handle the Back button differently depending on the HTTP caching headers a page was served with.

The HTTP 1.1 spec says the following:

13.13 History Lists

 User agents often have history mechanisms, such as "Back" buttons and
 history lists, which can be used to redisplay an entity retrieved
 earlier in a session.

 History mechanisms and caches are different. In particular history
 mechanisms SHOULD NOT try to show a semantically transparent view of
 the current state of a resource. Rather, a history mechanism is meant
 to show exactly what the user saw at the time when the resource was
 retrieved.

 By default, an expiration time does not apply to history mechanisms.
 If the entity is still in storage, a history mechanism SHOULD display
 it even if the entity has expired, unless the user has specifically
 configured the agent to refresh expired history documents.

 This is not to be construed to prohibit the history mechanism from
 telling the user that a view might be stale.

So it clearly recommends that UA authors keep history list behaviour separate from cache behaviour: when the user navigates through the history list (using the Back or Forward buttons), the HTTP spec recommends showing exactly the response the user saw before, regardless of whether it's stale or expired.

I've tested four major browsers (IE8, Firefox, Opera and Safari). Here is what each one does when you press Back:

Page served with | IE8 | FF | Opera | Safari
Expires in the future + Conditional GET validators | no request | no request | no request | no request
Expires in the future | no request | no request | no request | no request
Conditional GET validators | no request | no request | no request | no request
No HTTP caching headers | no request | no request | no request | full request
Expires in the past | no request | no request | no request | full request
Cache-Control: no-store | full request | full request | no request | full request
Cache-Control: no-store + Expires in the past | full request | full request | no request | full request

So only Opera follows the HTTP 1.1 recommendation.

IE and Firefox only send a request when caching is explicitly prohibited with Cache-Control: no-store, which goes against the HTTP spec recommendation. This seems to have been done intentionally, though: authors usually prohibit caching for a reason and don't want users to view those pages without revalidating them.

Safari makes a full request whenever the page hasn't been made explicitly cacheable (with a future Expires date or conditional GET validators).


IE8b1 — expressions support

Internet Explorer keeps on changing. One example is CSS expressions: their support shrinks from version to version, as this testcase shows when run in different versions of IE.

IE has been filtering values in expressions since IE6, and now IE8b1, in both rendering modes, doesn't even allow object property accessors (neither dot nor square-bracket notation works; see the testcase). So in IE8b1 you can only use plain string values in expressions (which is not handy at all) or call externally defined functions.

I can't think of any reason for disabling this other than protection from the potential XSS threat I described in the previous post.

Also, in IE8b1 expressions are not re-evaluated on the mouseover event (see the testcase), but onscroll still fires document.recalc. This again seems to have been left in intentionally, to support all the kludges that were invented to emulate things like position: fixed in IE versions that lack it.

Bottom line: if you use expressions in your CSS, don't wait. Move all the logic into JS functions and just call those functions from your expressions.


data URI browser issues

Length limit

Theory

The data URI specification says the following:

some applications that use URLs may impose a length limit; for
example, URLs embedded within <A> anchors in HTML have a length limit
determined by the SGML declaration for HTML [RFC1866]. The LITLEN
(1024) limits the number of characters which can appear in a single
attribute value literal, the ATTSPLEN (2100) limits the sum of all
lengths of all attribute value specifications which appear in a tag,
and the TAGLEN (2100) limits the overall length of a tag.

Though HTML 3.2 was the current HTML recommendation when the data URI specification was written, the author intentionally used the LITLEN, ATTSPLEN and TAGLEN values from the older HTML 2.0 SGML declaration to show that some user agents may impose a length limit on URIs.

HTTP 1.1 doesn't limit the length of a URI, but it warns us:

Note: Servers ought to be cautious about depending on URI lengths
above 255 bytes, because some older client or proxy
implementations might not properly support these lengths.

which basically means that if all clients and proxies in the network support URIs longer than 255 bytes, we're OK.

The HTML 3.2 SGML declaration sets the maximum attribute length to 65535. The HTML 4.01 SGML declaration also uses 65535, the maximum allowed in SGML, but says that fixed limits should be avoided. The XML 1.0 SGML declaration uses 99999999 just to show that no limit is specified.

Practice

Different browsers support different maximum lengths for data URI values.

According to the KB208427 article, IE supports URIs of up to 2048 bytes. According to Microsoft's IE8 data URI support whitepaper, IE8 supports data URIs of up to 32 KB and silently discards a data URI value larger than that (you can check this with testcase1 and testcase2). As I've already mentioned in the previous post, other browsers support bigger URIs, but I doubt IE8 will end up with a minor market share, so we will still have to stick to 32 KB. And I'll repeat: the data URI spec author points out that the only reasonable and semantic use of data URIs is embedding small resources, so realistically speaking the 32 KB limit shouldn't be a problem.
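Because base64 turns every 3 bytes into 4 characters, the 32 KB limit applies to the encoded string, not to the original file. Here is a rough calculator, a sketch that assumes "32 Kbytes" means 32 × 1024 characters for the whole data: value and that the limit is inclusive (both are my assumptions, not statements from the whitepaper). The function names are hypothetical, and the code runs in Node.js.

```javascript
// Assumption: "32 Kbytes" from the IE8 whitepaper taken as 32 * 1024 characters.
const IE8_DATA_URI_LIMIT = 32 * 1024;

// Length of "data:<mediaType>;base64,<payload>" for `byteCount` raw bytes.
// Base64 emits 4 characters for every 3 input bytes, padding the last group.
function dataUriLength(byteCount, mediaType) {
  const prefix = "data:" + mediaType + ";base64,";
  return prefix.length + 4 * Math.ceil(byteCount / 3);
}

// True if a file of `byteCount` bytes stays within the assumed IE8 limit
// (treating the limit as inclusive is also an assumption).
function fitsIe8(byteCount, mediaType) {
  return dataUriLength(byteCount, mediaType) <= IE8_DATA_URI_LIMIT;
}

// Largest raw file size that still fits, computed directly from the limit.
function maxRawBytes(mediaType) {
  const prefixLength = ("data:" + mediaType + ";base64,").length;
  return Math.floor((IE8_DATA_URI_LIMIT - prefixLength) / 4) * 3;
}

console.log(maxRawBytes("image/png")); // 24558
```

So under these assumptions a PNG of roughly 24 KB is about the biggest file you can embed for IE8, since base64 inflates the data by a third.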

Serving CSS dataURI'ed

In theory, CSS has to be served with its MIME type (text/css).

In practice, only Firefox cares about the MIME type CSS is served with, and only in standards compliance mode. Please see the testcases with CSS served with the wrong MIME type in standards compliance mode and in quirks mode. Opera, Safari and Internet Explorer 8 apply CSS served with any MIME type in all modes. The behaviour is the same for CSS served as a data URI and for normal external files.

Serving Javascript and HTML dataURI'ed

Safari, Opera and Firefox support embedding JavaScript using the data URI scheme. According to the whitepaper, IE8b1 doesn't. Here's the quote:

Scripts in data URIs are unsupported because they allow potentially harmful script to bypass server- and
proxy-script filters for applications such as HTML email. (Web-based email clients generally do not allow
emails to execute script; data URIs could be used to easily bypass these filters).

I agree that this is a valid point: it is a potential security issue, and data URI'ed JavaScript is even published as an XSS vector. Please see the testcase.

Opera and Safari run a data URI'ed HTML page in a separate, isolated context, and IE8b1 doesn't support data URI'ed HTML at all, so the only affected browser here is Firefox. There's an interesting Bugzilla entry describing the XSS (marked as a duplicate of the security proposal) which says:

The attack works by exploiting an ambiguity in RFC 2397 with regard to the Javascript same-origin security policy — what is the origin of a URI? Is it the containing page? If so, preventing this attack is the responsibility of site maintainers. If not, FF should launch the child of a data: URI without same-origin privileges.

The Firefox developers reply that filtering data URIs is the site maintainers' problem and compare it to filtering javascript: URIs (which is quite fair), but why create a new hole in every site for the vague benefit of executing data URI'ed scripts in the same context?

It seems a bit weird to me, especially given that other browsers do care about the execution context. The Bugzilla entry is still open, but I doubt this is going to be fixed. So, “site maintainers”, be aware!

Nested dataURI'es

Neither the data URI spec nor any other spec says whether data URIs can be nested. So here's a testcase where data URI'ed CSS has a data URI'ed image embedded. IE8b1, Firefox 3 and Safari apply the stylesheet and show the image; Opera 9.50 (build 9613) applies the stylesheet but doesn't show the embedded image! So it seems that Opera 9 doesn't expect anything to be embedded inside an already embedded resource! :D

Funnily enough, as IE8b1 supports both expressions and nested data URIs, it has the same potential security flaw as Firefox (described in the section above). See the testcase: the embedded CSS contains body { background: expression(a()); }, which calls a function a() defined in the main page's JavaScript, and this function is called every time the expression is re-evaluated. Because IE8b1 has limited expression support (which is going to be explained in a separate post), you can't use arbitrary code as the expression value; you can only call already defined functions or use plain string values. So to exploit this feature, a suitable JavaScript function must already exist on the page, and then it can simply be called from the expression embedded in the stylesheet. That's obviously not trivial, but if you have a website that lets people specify their own stylesheets and you want to be on the safe side, you have to either make sure there's no JavaScript function on the page that can cause any harm, or filter expressions out of people's stylesheets.
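To make the nesting concrete, here is a small Node.js sketch of how such a doubly embedded resource can be built: the image becomes a data URI inside the CSS text, and the CSS text then becomes a data URI itself. The helper toDataUri is hypothetical and the image bytes are a placeholder, not a real PNG.

```javascript
// Hypothetical helper: wrap raw bytes in a base64 data: URI.
function toDataUri(mediaType, bytes) {
  return "data:" + mediaType + ";base64," + Buffer.from(bytes).toString("base64");
}

// Placeholder bytes standing in for a real image file (not a valid PNG).
const fakeImage = Buffer.from("placeholder image bytes");
const imageUri = toDataUri("image/png", fakeImage);

// Inner level: the stylesheet points at the image through its own data: URI.
const cssText = "body{background:url(" + imageUri + ");}";

// Outer level: the whole stylesheet is wrapped in a second data: URI.
const cssUri = toDataUri("text/css", Buffer.from(cssText, "utf8"));
const linkTag = '<link rel="stylesheet" type="text/css" href="' + cssUri + '">';
```

The base64 alphabet contains no quotes or parentheses, so the inner URI can go into url() unquoted and the outer one into the href attribute without escaping.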

Line feeds

Firefox, Opera, Safari and IE8b1 support data URI values supplied both as one line (as a URI) and split into 76-character lines (as specified in the MIME and MHTML RFCs). See the testcase.

But the base64 RFC doesn't require base64 strings to be split:

MIME [3] is often used as a reference for base 64 encoding.  However,
MIME does not define "base 64" per se, but rather a "base 64
Content-Transfer-Encoding" for use within MIME.  As such, MIME
enforces a limit on line length of base 64 encoded data to 76
characters.  MIME inherits the encoding from PEM [2] stating it is
"virtually identical", however PEM uses a line length of 64
characters.  The MIME and PEM limits are both due to limits within
SMTP.

Implementations MUST NOT add line feeds to base-encoded data
unless the specification referring to this document explicitly
directs base encoders to add line feeds after a specific number of
characters.

DataURIed images with images turned off

When you turn off images in your browser, only Firefox still shows data URI'ed images. IE8b1, Safari and Opera hide them, as you would expect with images turned off. To test this, turn off images in your browser and run the testcase.

UPDATE: Firefox developers told me this is by design: unchecking the “Load images automatically” option in the browser settings only disables network requests for images. So if an image is available without a network request, either from the cache or embedded as a data URI, it will be displayed anyway.

Dynamically created dataURIes

As a data URI can contain binary data (e.g. images), people are experimenting with generating them on the fly. Ajaxian has a crazy article on creating a pure JS video player that doesn't use Flash but swaps data URI'ed images instead. This technique may evolve into something practical, but for now it's rather impractical.
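The Ajaxian demo itself isn't reproduced here. Instead, here is a minimal Node.js sketch of the underlying idea: generate image bytes in script, wrap them in a data URI, and produce a new URI for every frame so a page script could assign it to an img element's src. The helpers solidBmp and frameUri are hypothetical, and the uncompressed 24-bit BMP format is my own choice because it can be built by hand in a few lines; whether a particular browser renders BMP from a data URI is an assumption here.

```javascript
// Build a 24-bit uncompressed BMP of the given size filled with one colour.
function solidBmp(width, height, [r, g, b]) {
  const rowSize = Math.ceil((width * 3) / 4) * 4; // each pixel row is padded to 4 bytes
  const pixelBytes = rowSize * height;
  const buf = Buffer.alloc(54 + pixelBytes); // zero-filled, so unset fields stay 0
  buf.write("BM", 0, "ascii");
  buf.writeUInt32LE(54 + pixelBytes, 2); // total file size
  buf.writeUInt32LE(54, 10);             // offset of the pixel data
  buf.writeUInt32LE(40, 14);             // BITMAPINFOHEADER size
  buf.writeInt32LE(width, 18);
  buf.writeInt32LE(height, 22);          // positive height: rows stored bottom-up
  buf.writeUInt16LE(1, 26);              // colour planes
  buf.writeUInt16LE(24, 28);             // bits per pixel (compression stays 0 = none)
  buf.writeUInt32LE(pixelBytes, 34);     // size of the raw pixel data
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const i = 54 + y * rowSize + x * 3;
      buf[i] = b; // BMP stores each pixel as blue, green, red
      buf[i + 1] = g;
      buf[i + 2] = r;
    }
  }
  return buf;
}

// Each "frame" becomes a fresh data: URI; here the frames just cycle colours.
function frameUri(frameNumber) {
  const colours = [[255, 0, 0], [0, 255, 0], [0, 0, 255]];
  const bmp = solidBmp(2, 2, colours[frameNumber % colours.length]);
  return "data:image/bmp;base64," + bmp.toString("base64");
}
```

Even this toy shows why the approach is impractical for now: every frame is a full uncompressed image, re-encoded as base64 on each change.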


data URI theory and practice

Theory

The data URI scheme is defined in RFC 2397, published in 1998. It's a URL scheme used to embed small resources right into an (X)HTML page.

The syntax is quite simple: data:[<mediatype>][;base64],<data>

To see how it works, let's take the following code (testcase):

<link rel="stylesheet" type="text/css" href="data:text/css;base64,Ym9keXtiYWNrZ3JvdW5kOmdyZWVuO30=">

A browser that supports data URIs will base64-decode the encoded string Ym9keXtiYWNrZ3JvdW5kOmdyZWVuO30= to body{background:green;} and then use this string as if it were the result of an HTTP request to an external file containing this CSS code.

According to the RFC, we can embed any small resource into our page, e.g.:
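Here is roughly what the browser does with that href, as a minimal Node.js sketch (Buffer is Node's API; in a browser you'd use atob). The function parseDataUri is hypothetical; it follows the data:[<mediatype>][;base64],<data> syntax above and simplifies one thing: it assumes percent-encoded (non-base64) payloads are UTF-8 text.

```javascript
// Hypothetical helper: split a data: URI into its media type and decoded bytes,
// following the data:[<mediatype>][;base64],<data> syntax from RFC 2397.
function parseDataUri(uri) {
  if (!uri.startsWith("data:")) {
    throw new Error("not a data: URI");
  }
  const comma = uri.indexOf(",");
  if (comma === -1) {
    throw new Error("missing ',' separator");
  }
  // Everything between "data:" and the first comma is metadata.
  const meta = uri.slice(5, comma).split(";");
  const isBase64 = meta[meta.length - 1] === "base64";
  if (isBase64) meta.pop();
  // RFC 2397: an omitted media type defaults to text/plain;charset=US-ASCII.
  const mediaType = meta.join(";") || "text/plain;charset=US-ASCII";
  const payload = uri.slice(comma + 1);
  const bytes = isBase64
    ? Buffer.from(payload, "base64")
    : Buffer.from(decodeURIComponent(payload), "utf8"); // simplification: UTF-8 text
  return { mediaType, isBase64, bytes };
}

const css = parseDataUri("data:text/css;base64,Ym9keXtiYWNrZ3JvdW5kOmdyZWVuO30=");
console.log(css.mediaType);              // text/css
console.log(css.bytes.toString("utf8")); // body{background:green;}
```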

  1. images (as img elements and CSS backgrounds)
  2. javascript
  3. html (links, iframes)
  4. css (and even dataURIed images inside dataURIed CSS!)
  5. any other resource supported by browsers

So theoretically we could have the same functionality as we have in MHTML — some or all external resources embedded directly in the page.

Data URI advocates say that since most browsers open only 2 concurrent connections per server (but 6 in total), data URIs can potentially speed up page loads by reducing the number of HTTP requests (especially over HTTPS, where encryption adds considerable overhead). But:

  • The HTTP protocol already has mechanisms for building efficient applications: persistent connections to avoid re-creating sockets, and caching mechanisms to reduce overhead (conditional GET) or to reduce the total number of requests (aggressive caching with the Expires header).
  • What's more, with a simple technique (serving resources from several host names) you can get the browser to use up to 6 concurrent connections, parallelizing fetches as much as it can and therefore speeding up page loads.
  • Though the HTTP 1.1 spec says a client shouldn't open more than 2 concurrent connections per server, in the real world we have 2 only in Firefox and IE6/7. In IE8b1 the number is 6; in Opera 9 and Safari it's 4. In the next post I will give more details on this.
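The "several host names" technique can be sketched as a URL helper: each static resource path is mapped to one of a few host names (for example CNAMEs or a DNS wildcard all pointing at the same server), and the mapping must be deterministic so a file is always requested from the same host and stays cached. This is a minimal sketch in JavaScript; the host names are hypothetical, and the 32-bit FNV-1a hash is my own choice of a small stable hash.

```javascript
// Hypothetical static hosts, e.g. CNAME records all pointing at one server.
const STATIC_HOSTS = [
  "static1.example.com",
  "static2.example.com",
  "static3.example.com",
];

// 32-bit FNV-1a string hash: small and stable, so the same path always maps
// to the same host (otherwise one file would be cached under several URLs).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Spread resources across hosts so the browser can open more parallel connections.
function shardedUrl(path) {
  const host = STATIC_HOSTS[fnv1a(path) % STATIC_HOSTS.length];
  return "http://" + host + path;
}

console.log(shardedUrl("/img/logo.png"));
```

More hosts mean more DNS lookups, so a handful of host names is usually the practical compromise.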

So, keeping all this in mind, we can't just say that data URIs are the only usable way to improve page load times. But they definitely are the only option when you have limited access to the server and/or the server is not configured properly: you can't set the Expires header for aggressive caching, you can't set up DNS wildcards or CNAME records to get your resources served from different hosts (and therefore make use of the maximum number of concurrent connections browsers allow), or the server doesn't support HTTP caching properly.

Practice

So I can see only the following cases where data URIs can be used effectively:

  • CSS sprites, rounded-corner images, icons and other images that carry only presentational semantics. They are the perfect target for data URI + base64. If we embed them in the CSS file, we remove the HTTP requests that would be made if these images were normal files. These images are part of the design described in the stylesheet, so it makes perfect sense to embed them in CSS. CSS files can be cached perfectly well, and as long as the design doesn't change, we don't need to touch the CSS at all. But common sense applies here as well: firstly, base64 decoding takes system resources, and secondly, who wants to wait for a CSS file a couple of hundred kilobytes in size to load?
  • Reasonably small CSS files with page-specific rules. If it makes semantic sense to define inline CSS on a page, it makes just as much sense to set it using a data URI. Keep in mind, though, that a CSS file isn't parsed until the browser has fully downloaded it. So when we embed a big image, the first visitor opening the page has to wait until the whole CSS file is loaded, and we lose the benefits of HTTP parallelism here.

Don't forget that if a resource is embedded in multiple pages, it's going to be re-downloaded with every one of those pages. A resource referenced normally as an external file, on the other hand, can be cached quite aggressively and requested from the server only once (all popular web servers already provide good caching support for static files).

However, all this applies to an ideal world where specifications don't have flaws and all browsers follow them.

In our world we have the following:

  • Lack of support. Only IE8b1, Opera 9, Firefox 2 and Safari support data URIs; IE6 and IE7 don't. That means that for the next three or four years, while IE6 and IE7 still have a significant market share, we can't just go ahead and start using data URIs.
  • Different URI length limits in different browsers. As far as I know, IE8 currently supports up to 32 kilobytes in a data: value. Even though all other browsers support bigger sizes, our limit will obviously be 32 KB.
    See testcase 1 with a data URI of 32755 bytes and testcase 2 with a data URI of 32868 bytes.

I would also strongly discourage dynamically base64-encoding images and embedding them in CSS files with a server-side script unless you're well aware of HTTP caching principles.

Consider the following code, adapted from the Wikipedia data: URI page:

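<?php
// Reads the file and returns its contents as a base64-encoded data: URI.
// As written, this re-reads and re-encodes the file on every request.
function data_url($file, $mime)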
{
  $contents = file_get_contents($file);
  $base64   = base64_encode($contents);
  return ('data:' . $mime . ';base64,' . $base64);
}

header('Content-type: text/css');

?>

div.menu {
  background-image:url(<?php echo data_url('menu_background.png','image/png')?>);
}

Unless it's accompanied by a correct HTTP caching algorithm, this CSS file will be downloaded every time a page referencing it is loaded! So every time a user opens such a page, the server gets a request, parses the script, base64-encodes the image and sends it back to the client. You get rid of one simple request for an image (which, as a static file, would be cached perfectly well) but gain one heavy request that runs every time the user requests a page! Not a fair trade, I think. So again, if you decide to use the data URI scheme for your resources, encode and embed them beforehand or implement proper server-side HTTP caching and compression support.

Note for Russian-speaking readers: there's a way to embed images even in IE6/IE7. It's more of a proof of concept (it doesn't support HTTP caching or compression), but it works!
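Here is a minimal sketch of the "encode beforehand plus proper HTTP caching" alternative, written as plain JavaScript functions rather than a real server so the logic is easy to see. The stylesheet is built once, given an ETag, and served with aggressive caching headers; a conditional GET whose If-None-Match matches gets a bodiless 304. The function names, the one-year lifetime and the FNV-1a fingerprint are my assumptions, and comparing If-None-Match as a single string is a simplification (the header may carry a list of tags).

```javascript
const ONE_YEAR_MS = 365 * 24 * 60 * 60 * 1000;

// Build the data-URI'ed stylesheet once (e.g. at start-up or as a build step)
// instead of re-reading and re-encoding the image on every request.
function buildStylesheet(imageBytes) {
  const uri = "data:image/png;base64," + Buffer.from(imageBytes).toString("base64");
  const body = "div.menu {\n  background-image:url(" + uri + ");\n}\n";
  // Any stable fingerprint of the body can serve as an ETag; 32-bit FNV-1a keeps it short.
  let hash = 0x811c9dc5;
  for (let i = 0; i < body.length; i++) {
    hash = Math.imul(hash ^ body.charCodeAt(i), 0x01000193) >>> 0;
  }
  return { body, etag: '"' + hash.toString(16) + '"' };
}

// Decide what to send back; `ifNoneMatch` is the request's If-None-Match value.
function respond(stylesheet, ifNoneMatch) {
  const headers = {
    "Content-Type": "text/css",
    "ETag": stylesheet.etag,
    // Aggressive caching; one year is an assumed value.
    "Expires": new Date(Date.now() + ONE_YEAR_MS).toUTCString(),
    "Cache-Control": "public, max-age=31536000",
  };
  if (ifNoneMatch === stylesheet.etag) {
    return { status: 304, headers, body: "" }; // the client's cached copy is still valid
  }
  return { status: 200, headers, body: stylesheet.body };
}

// Usage: encode once, then reuse the result for every request.
const stylesheet = buildStylesheet(Buffer.from([1, 2, 3])); // placeholder image bytes
```

The expensive base64 step now happens once, and repeat visitors either skip the request entirely (Expires) or get a tiny 304.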


IE8b1 — attribute selectors, generated content

This is my first post after a great vacation in St. Petersburg, my first two-week vacation in the last four years. I've continued testing IE8 and found some interesting new stuff.

Attribute selectors

Both [class=myclass] and [className=myclass] work in IE7/IE8. The latter can be used as a CSS hack to target those browsers, but I would still recommend using conditional comments to target different IE versions.

If you look at the testcase, you will see that both the [class=test1] and [classname=test2] selectors work. When I saw className working, I immediately tested other DOM properties like nodeName. Unfortunately, that doesn't work (here's the testcase). If there were a way to access DOM properties rather than HTML attributes from CSS selectors, it would be really weird but interesting.

Generated content

While testing, I noticed that to get an element's class you can't use the content: attr(class) rule; you have to use content: attr(className). That's clearly a DOM property name rather than an HTML attribute name.

This violates the standard, which clearly says that attr(X) must return the string value of attribute X for the element matching the selector. It also violates the standard by returning null for attributes that don't exist, where the spec requires an empty string.

This behavior also gives us some strange options. Please see the testcase.

I don't know whether it's a bug or a feature: none of the Microsoft documents on IE8 describe this behaviour, so I don't know whether it's going to be fixed, but it may be used in some interesting ways.
For example, using the IE-only outerHTML DOM property, I rebuilt the testcase for the attribute selectors bug mentioned above. If you have IE8, have a look right away. And please look at another interesting thing, again IE8-only, as it uses the attr(nodeName) function to show every element's nodeName.

During testing I noticed a few more generated content bugs:

  • text-transform doesn't work for generated content (see the testcase)
  • text-indent doesn't work for generated content (see the testcase)
  • text-align doesn't work for generated content (see the testcase)


IE8b1 generated content support

I've tested the generated content model in IE8b1 quite thoroughly and found some rather weird bugs. Here's what I've come up with:

  1. The first bug I noticed happens when you set position: relative on a generated content rule. The tab with the page open dies. Then the newly introduced crash recovery system tries to recover the tab, loads the page and dies again, and so on: an infinite loop that you can't break. The weird thing is that it doesn't actually die; it shows a window prompting you to select a debugger, the kind of window that appears when you have errors in your JavaScript code.

    Here's the code sample:

    p:before {content: "test"; position: relative;}

    and the testcase.

  2. I noted the bug and continued testing.

    The next thing I found was that if the page has no IMG/OBJECT/IFRAME elements and no element with a background image, generated content is created after window.onload!

    Please have a look at the following testcases:

    1. The document contains none of the elements listed above, and the generated content doesn't appear until you press OK. This means it is created after window.onload has fired!
    2. Generated content is created before window.onload, as it should be, when an element has a CSS background-image rule set or the page includes an IMG, OBJECT or IFRAME element.

    At this point I thought: wait, that's strange, CSS rules have always been applied before window.onload! Anyway, I just went on testing.

  3. Then there was another strange thing: when you use content: attr(class), IE8b1 shows null instead of the attribute value. But if you set the rule as content: attr(className), it actually shows the attribute value!

    Here's the testcase for this bug.

  4. Another interesting thing is that expressions don't work in generated content rules.

    Please see the testcase.

Of course, I can only guess, but my feeling is that IE8b1 doesn't really support generated content natively; it's rather done by a hook somewhere that fires off a function generating the content. These bugs have something in common: the debugger prompt (usually shown for JavaScript errors), content being generated after window.onload in some cases, and the class attribute being read by its DOM name (className). Basically, it's all about JavaScript.

I can't help thinking that IE8b1 uses some hidden JavaScript code to support generated content, triggered by some hidden event like DOMContentLoaded.

If so, I'd be really happy if they gave us access to this handler :)


Selectors API support in IE8b1

As I mentioned in the previous post, IE8b1 introduced support for a very powerful way of accessing the DOM: the Selectors API. It's still a W3C working draft, but since IE and WebKit already support it, I bet Presto and Gecko will have it soon as well.

So what do we have? As per the spec, there are two methods, .querySelector() and .querySelectorAll(), which can be called on any HTMLElement (and on the document). Given a CSS selector string, they return the first matching Element or a StaticNodeList of all matching elements, respectively. Bottom line: you give them a CSS selector, and they return the matching element(s).

It gives you a new, flexible way to select elements in the DOM. We can do all sorts of weird and wonderful stuff by combining the power of JS with the flexibility of CSS selectors:

  1. Get all paragraphs with the .note class from one div? Not a problem: document.querySelectorAll('#myDiv p.note');
  2. Get all elements with some class? Forget slow document.getElementsByClassName kludges and use document.querySelectorAll('.myClass');
  3. Get the link with the .current class from your UL-based menu? document.querySelector('#menu a.current');

So we generally don't have to iterate over huge node lists ourselves anymore: the matching is done natively and very fast (much faster than in JS libraries). Please see the testcase prepared by the WebKit authors to measure their Selectors API support. It works in IE8b1 except for the CSS3 selectors block (IE8b1 doesn't support the CSS3 :nth- and :last-child selectors).

Bottom line: the Selectors API is a way to find elements in the DOM. Browsers already know how to do this when they parse CSS rules and find the elements those rules apply to, so it's just existing browser functionality exposed to the developer. Keep in mind that if a browser supports a CSS selector, it will let you query for elements with it using the Selectors API, and if it doesn't support a CSS selector, you won't be able to query with it either.

For example, as IE8b1 doesn't support the CSS3 :last-child selector, you can neither style such elements in CSS nor query them using the Selectors API.

Notes:

  1. Unfortunately, IE8b1 doesn't fully implement the Selectors API spec. Here's a quote from the MSDN article:

    Because Internet Explorer 8 does not formally support XHTML documents, it does not support the namespace features of the W3C Selectors API specification, such as the NSResolver parameter.

    But for websites that don't use namespaces, it won't be a problem.

  2. Another interesting issue the Selectors API spec raises is potential history theft.

    Basically, you could collect the hrefs of all visited links and send them somewhere via AJAX (it's just a matter of calling document.querySelectorAll("a:visited")).

    The spec leaves it to vendors to fix, and IE8b1 ignores the :visited and :link pseudo-classes when they appear in a selector query.

Please see the testcase.


IE8b1 initial tests

Some IE8b1 test results:

  1. As in earlier versions, alert([1,2,3,].length) shows 4, and the 4th element is undefined (ECMAScript says a single trailing comma doesn't add an element, so the length should be 3).
  2. Unfortunately, there's still no support for the much-wanted CSS3 :last-child selector, and :first-child support is buggy for dynamically added elements that match it (adding such an element should force the layout to be recalculated). See PPK's testcase on quirksmode.org.
  3. For some reason, we can't set padding on the html element (see the testcase).

But I really enjoy the Selectors API implementation in IE8b1. It's the second browser to support it, right after WebKit. I will describe the support and prepare some testcases in the next post.
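For comparison, here is what an ECMAScript-conforming engine such as Node.js does with the same literal. A single trailing comma is ignored, while an extra comma creates a genuine hole in the array.

```javascript
// A single trailing comma is ignored by conforming engines: three elements.
const trailing = [1, 2, 3,];
console.log(trailing.length); // 3

// Two trailing commas leave an elision: length 4, but index 3 doesn't exist.
const elided = [1, 2, 3, ,];
console.log(elided.length, 3 in elided); // 4 false
```

So any code that iterates up to .length over literals like [1,2,3,] will see an extra undefined entry in IE but not elsewhere.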

Have a good weekend, people!


Welcome to my new blog

Hi all!

This is my new blog. Previous entries can be found here. Some of the old articles will eventually be updated, translated to English and published here.

About me:

I'm a Russian web developer living in Volzhsky, a town just 20 miles from Volgograd (formerly known as Stalingrad).

My subjects of interest are CSS/JS/HTML/DOM/ASP/ASP.NET.

So stay connected and I promise to deliver some interesting stuff ;)
