Sharovatov’s Weblog

HTTP persistent connections, pipelining and chunked encoding

Posted in http by sharovatov on 5 November 2009

When I have free time, I like to reorganise the knowledge I’ve got and prepare mindmaps/cheatsheets/manuals of interesting stuff. The formal approach I usually take forces me to organise the data so that it won’t take me long to grasp the idea again if I forget something.

And I also like posting the resulting resources to my blog – it’s good practice for my English technical writing, plus some publicity for the knowledge ;)

So this post is another one in the HTTP series; it describes HTTP/1.1 persistent connections, pipelining and chunked encoding.

HTTP/1.0 said that for every request to a server you have to open a separate TCP/IP connection, write the request to the socket and read the data back.

But pages on the internet became more complex, and authors started including more and more resources in their pages (images, scripts, stylesheets, objects – everything that browsers had to download from the server). Clients were opening a separate connection for every resource request, and since opening a new connection takes time and CPU/memory resources, the resulting latency kept getting worse from the user’s perspective. Something had to be done to improve the situation.

So the IETF HTTP working group standardised a nice technique called “persistent connections”.

Persistent connections reduce network latency and CPU/memory usage of all the peers by allowing reuse of the already established TCP/IP connection for multiple requests.

As I mentioned, an HTTP/1.0 client closed the connection after each request. HTTP/1.1 introduced using one TCP/IP connection for multiple sequential requests, and either the server or the client can indicate that the connection has to be closed upon completion of the current request-response exchange by sending a Connection: close header.

Usually an HTTP/1.1 client sends the Connection: close header with the last request in the queue to indicate that it won’t need anything else from the server, so the TCP/IP connection can be safely closed after that request has been served with a response. (Say the client needs to download 10 images for an HTML page: it sends Connection: close with the 10th image request, and the server sends the last image and closes the connection when it’s done.)

Persistent connections are the default for HTTP/1.1 clients and servers.
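Here’s a minimal sketch of what connection reuse means in practice (Python, with example.com and the paths as placeholders) – one TCP/IP connection carries several sequential request-response exchanges:

import http.client

# one connection object, several requests over the same socket
conn = http.client.HTTPConnection("example.com")

for path in ("/", "/style.css", "/logo.png"):
    conn.request("GET", path)        # reuses the established connection
    response = conn.getresponse()
    body = response.read()           # the full body must be read before
    print(path, response.status, len(body))  # the next request is sent

conn.close()                         # the equivalent of Connection: close

Note that each response has to be read completely before the next request goes out – that’s the sequential mode. Pipelining (below) relaxes exactly this restriction.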

And even more interestingly, HTTP/1.1 introduced pipelining support – a concept where the client can send multiple requests without waiting for each response to come back, and the server then has to send the responses in the same order the requests came in.

Note: pipelining is not supported in IE/Safari/Chrome and is disabled by default in Firefox, leaving Opera as the only browser that supports it and has it enabled. I will cover this topic in one of the next posts.

In any case, if the connection is dropped, the client will initiate a new TCP/IP connection, and the requests that didn’t get a response back will be resubmitted through it.
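At the socket level, pipelining looks roughly like this (an illustration only – example.com is a placeholder, and a real client must be ready to retry if the server closes the connection midway):

import socket

host = "example.com"
s = socket.create_connection((host, 80))

# two GET requests written back-to-back, before any response is read;
# Connection: close on the last one tells the server we're done after it
requests = ("GET / HTTP/1.1\r\nHost: %s\r\n\r\n"
            "GET /about HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n") % (host, host)
s.sendall(requests.encode("ascii"))

# responses arrive in the same order the requests were sent
data = b""
while True:
    part = s.recv(4096)
    if not part:                     # server closed the connection
        break
    data += part
s.close()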

But as one connection is used to send multiple requests and receive responses, how does the client know when it has to finish reading each response?

Obviously, the Content-Length header must be set for each response.

But what happens when the data is dynamic or the whole response’s content length can’t be determined by the time transmission starts?

In HTTP/1.0 everything’s easy – the Content-Length header can just be left out: the transmission starts, the client reads the data as it arrives, and when the server finishes sending, it simply closes the TCP/IP connection; the client can’t read from the socket any more and considers the transmission complete.

However, as I’ve said, in HTTP/1.1 each transaction has to have a correct Content-Length header, because the client needs to know when each transmission is complete – so that it can either start waiting for the next response (if requests were pipelined), stop reading the current response and send a new request through the same TCP/IP connection (in normal sequential mode), or close the connection if that was the last response it expected to receive.

So, as the connection is reused to transfer the content of multiple resources, the client needs to know exactly when each resource download is complete, i.e. the exact number of bytes it has to read from the connection socket.

And it’s obvious that if Content-Length cannot be determined before the transmission starts, the whole persistent connections concept is useless.

That is why HTTP/1.1 introduced the chunked encoding concept.

The concept is quite simple – if the exact Content-Length for the resource is unknown when the transmission starts, the server may send the resource content piece by piece (in so-called chunks), prefixing each chunk with its size, and finish with a zero-sized chunk at the end of the whole response to notify the client that the response transmission is complete.

To let HTTP/1.1-conforming clients know that a chunked response is coming, the server sends a special header – Transfer-Encoding: chunked.

The chunked encoding approach allows the client to read the data safely – it knows the exact number of bytes to read for each chunk, and it knows that when a zero-sized chunk arrives, the resource transmission is complete.
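Here’s a rough sketch of what a client’s chunked-body reader does (assuming stream is a file-like object positioned right after the response headers, and ignoring optional trailers):

def read_chunked(stream):
    body = b""
    while True:
        # every chunk is prefixed with its size in hex, ended by CRLF
        size_line = stream.readline().strip()
        size = int(size_line.split(b";")[0], 16)  # drop optional chunk extensions
        if size == 0:
            break                 # zero-sized chunk: transmission complete
        body += stream.read(size)
        stream.readline()         # consume the CRLF that trails each chunk
    stream.readline()             # consume the final CRLF (no trailers assumed)
    return body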

It’s a little more complex than the HTTP/1.0 scenario where the server just closes the connection as soon as it’s finished, but it’s truly worth it – persistent connections save server resources and reduce overall network latency, therefore improving the overall user experience.


HTTPBis group is awesome!

Posted in Firefox, http, IE8, web-development by sharovatov on 21 October 2009

I’m finally back to blogging – I’ve started finding time between doing stuff at home, working at my great place of work and studying English :)

As you know, the HTTP/1.1 spec said that conforming clients SHOULD NOT open more than 2 concurrent connections to one host. This was defined back in 1997, and at that time it seemed reasonable for a client to have 2 simultaneous connections; given that HTTP/1.1 introduced the persistent connections concept, people thought that 2 simultaneously opened reusable TCP/IP connections would be enough for general use.

However, everything changes. Broadband internet came to the mass market, and people started thinking that more parallel downloads could benefit whole-website or webapp performance. The history started with IE5.01, which opened two connections by default but provided a way to configure the number. So if you had a really good internet connection, you could make websites load significantly faster.

By the time IE8 development started, broadband connections had become the standard for home internet, so IE8 started opening 6 connections (if the bandwidth allows it – on dialup or behind a proxy it will still open 2). So the IE8 engineers made a smart move and presented the world with a browser that seemed to load sites faster.

Needless to say, Firefox 3 decided to change the value as well, so now Firefox 3 has 6 as the default value for the network.http.max-persistent-connections-per-server configuration setting. Good for Mozilla for copying stuff from IE again!

And now the HTTPBis team (Julian Reschke) has committed a change which states that in the forthcoming HTTP standard the maximum number of concurrent connections is not limited even with a “SHOULD NOT” clause :)

Thanks HTTPBis team!

IE8 Rendering Modes theory and practice

Posted in browsers, css, http, IE7, IE8, web-development by sharovatov on 18 May 2009

Many web-developers moan about the Rendering Modes switch that’s been introduced in IE8. I’ve got some thoughts on this.

Rendering modes switches history

As you may know, IE5 on Mac (followed by IE6 on Windows) introduced quirks and standards compliancy modes with a DOCTYPE switch. All other browsers followed the move and introduced a similar approach – they tried to mimic IE behaviour in quirks mode and tried to support CSS as close to the W3C standard as they could.

The DOCTYPE switch approach was really straightforward – if you put a valid DOCTYPE on your HTML page, the browser switches to standards compliancy mode; if the DOCTYPE is omitted, quirks mode is used.

This switch was required because many sites had been built at the time when even CSS1 was a draft, and for browsers to evolve safely (and add support for newer standards) without breaking the existing web, these old websites had to keep rendering properly.

At the time, the idea of using DOCTYPE as a rendering mode switch was fine – almost all pages that had been built without standards support in mind didn’t have a DOCTYPE. And if developers wanted to use modern standards, they added an HTML or XHTML DOCTYPE and the page was rendered using the standards-compliant parser.

Quirks mode and the DOCTYPE switch locked the old web to an old rendering engine – the IE5 engine.

I’m not wrong – after IE4 won the first browser war (BW I), nearly all sites were built for IE4 and IE5 only, so all other browsers had nothing to do but copy IE5 behaviour.

It’s important to note that DOCTYPE can only toggle standards compliancy on and off.

Therefore the DOCTYPE switch only solves the backwards compatibility issue and does nothing to prevent future compatibility issues.

When developers started using the DOCTYPE switch, IE6 was the major browser and all sites were built around its level of CSS support. So when IE7 shipped, it had better support for CSS2.1 but the same DOCTYPE rendering mode switch.

That’s why many of these websites had rendering issues in IE7 and had to fix their CSS/JS for IE7 specifically. Conditional comments and conditional compilation came to the rescue. However, with the XP SP3 update IE6 started reporting a @jscript_version equal to that of IE7, and conditional compilation became pretty useless.

In IE7, old websites in quirks mode continued to be rendered by the IE5 rendering engine, and the standards compliancy engine was just an updated version of the IE6 one with some bugs fixed. And if IE7, a minor update to IE6, caused so much trouble, one can imagine what IE8 with proper CSS2.1 support would do to websites. Basically, all the websites with DOCTYPEs that had just been fixed for IE7 Standards compliancy mode would break again in IE8.

Other browsers didn’t have a similar problem, and there are many reasons for this. First of all, by the time the first IE7 beta shipped, other browsers had a really insignificant market share (so their developers could change anything and wouldn’t be blamed for “breaking the web”), were updated really often, and were mostly installed by IT-related people who knew how to update their browser. So the DOCTYPE rendering switch was and still is fine for non-IE browsers.

And Microsoft had to solve an interesting problem in IE8 – support CSS2.1, support IE7 standards compliancy mode and support IE5 quirks mode. But DOCTYPE can only toggle standards compliancy on or off! They couldn’t drop support for very old but still existing sites built in quirks mode, couldn’t drop IE6 and IE7 support, and couldn’t leave IE8 without proper CSS2.1 support.

So we see that if any browser is locked at some version and this version gains a noticeable market share, DOCTYPE switching will not work, as the vendor will have to choose what to support – older sites or newer features.

And Microsoft found a way – they gave web-developers and users a way to control in which mode sites are rendered.

IE8 Rendering Modes Theory

In IE8 there’re 6 rendering modes:

  • “Edge” Standard compliancy mode
  • IE8 Standards mode
  • “Emulate IE8” mode
  • IE7 Standards mode
  • “Emulate IE7” mode
  • IE5 – Quirks mode

Edge Standards compliancy mode basically tells the browser to use its most recent rendering mode – in IE8 that’s “IE8 Standards mode”, in IE9 it will hopefully be “IE9 Standards mode”.

IE8 Standards mode makes the browser render a page using the IE8 standards compliancy parser and renderer.

Emulate IE8 mode – the DOCTYPE switch is used to determine whether to use IE8 Standards compliancy mode or IE5 mode.

IE7 Standards mode – makes the browser render a page using the IE7 standards compliancy parser and renderer.

Emulate IE7 mode – the DOCTYPE switch is used to determine whether to use IE7 Standards compliancy mode or IE5 mode. If you have a site that was built for IE7 Standards compliancy mode and you don’t want to change anything there, just add this META tag or HTTP header, and IE8 (and all future versions of IE) will emulate IE7 behaviour, i.e. use the IE7 standards-compliant renderer when a DOCTYPE is present and IE5 mode when there’s no DOCTYPE. So you virtually “lock” your website to the current browser version’s behaviour. “Emulate IE7” mode even sets the User Agent string to the IE7 string and reports the Version Vector as IE7 (so conditional comments are evaluated as for IE7).

IE5 mode – basically, it’s quirks mode. If we compare this to the DOCTYPE switch approach, Edge mode is similar to the mode when a DOCTYPE is set, and IE5 mode to the mode when no DOCTYPE is present.

To explain the modes in more detail, I need to show how they are applied.

IE8 Rendering Modes Practice

As I’ve shown above, the whole idea of using DOCTYPE as a rendering mode switch failed because there were more than two types of documents on the web – those that should work in quirks mode, those that should work in IE7 standards mode, those that should work in IE8 standards compliancy mode, and those that should work even with future versions of IE. So instead of two types we have three and a half :)

To give developers a proper way to control this, Microsoft and WaSP invented the X-UA-Compatible HTTP header: you can either configure your server to send this header with the corresponding value or add it as a META HTTP-EQUIV to your page. For example:

<meta http-equiv="X-UA-Compatible" content="IE=8">

This declaration tells IE8 to use IE8 Standards compliancy mode to render the page.

Here’s a list of modes:

  • Edge Standards compliancy – IE=edge
  • IE8 Standards compliancy – IE=8
  • IE8 Emulation – IE=EmulateIE8
  • IE7 Standards compliancy – IE=7
  • IE7 Emulation – IE=EmulateIE7
  • IE5 Quirks – IE=5
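If you’d rather send the header than touch the markup, here’s a toy sketch of a server doing it (Python’s standard http.server, purely for illustration – in real life you’d set the header in your web server or application config):

from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<!DOCTYPE html><html><head></head><body>hello</body></html>"
        self.send_response(200)
        # lock the page to IE7 Standards compliancy behaviour in IE8 and later
        self.send_header("X-UA-Compatible", "IE=EmulateIE7")
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("", 8000), Handler).serve_forever()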


IE8 Rendering Modes Switches Priority

The first and main rule – if you specify the X-UA-Compatible header or meta tag, IE8 will use the rendering engine you set.

If it’s EmulateIE7, the DOCTYPE switch is used to determine whether to use IE7 Standards mode or IE5 quirks mode. See the test case with DTD and without DTD.

If it’s EmulateIE8, the DOCTYPE switch is used to determine whether to use IE8 Standards mode or IE5 quirks mode. See the test case with DTD and without DTD.

If it’s IE=8, IE8 Standards compliancy mode is used regardless of the presence or absence of a DOCTYPE. See the test case with DTD and without DTD.

If it’s IE=7, IE7 Standards compliancy mode is used regardless of the presence or absence of a DOCTYPE. See the test case with DTD and without DTD.

If it’s IE=5, IE5 quirks mode is used even if there is a DOCTYPE (actually, it didn’t matter for IE5 whether there was a DOCTYPE or not – quirks mode was the only mode it had). See the test case.

If it’s Edge, the most modern rendering engine is used. For example, IE8 uses IE=8 mode regardless of whether there’s a DOCTYPE, and IE9 will use IE=9. See the test case with DTD and without DTD.

It’s worth noting that the X-UA-Compatible META tag overrides the X-UA-Compatible HTTP header, so you can set the X-UA-Compatible HTTP header for the whole website and then have an X-UA-Compatible META with a different value on some pages.

If a website doesn’t have an X-UA-Compatible META tag or HTTP header, IE8 uses EmulateIE8 mode, i.e. the DOCTYPE is used to switch between IE8 Standards compliancy mode and IE5 quirks mode.

This means that everybody needs to add an X-UA-Compatible header or META to their websites.

Accepting the fact that web developers wouldn’t update all the websites that were built around the IE6/IE7 standards compliancy modes, Microsoft’s decision to make the standards compliancy mode the default would certainly “break the web”.

So Microsoft had to find a way around this which they successfully did by introducing a bunch of Compatibility View settings.

Compatibility View

As I’ve already said, when the developer didn’t care to add an X-UA-Compatible header or META, IE8 works in EmulateIE8 mode and uses the DOCTYPE switch to determine which rendering mode to use. If the site has no DOCTYPE set, IE5 quirks mode is used, so backwards compatibility is not an issue. But if there is a DOCTYPE, which standards-compliant mode should be used? IE7? IE8? IE9 Standards compliancy mode when IE9 ships? There’s no automatic way to choose.

And as IE8 Standards mode is the default one, there might be issues on sites that only supported the IE6/IE7 standards modes. Apparently, only the user can see whether the site he’s viewing has rendering issues. So Microsoft gave users an easy way to switch back to IE7 Standards compliancy mode if required – by clicking the Compatibility View button (located to the left of the refresh button).


IE8 will then switch to EmulateIE7 mode and save the setting for the domain, so all pages from this domain will be rendered using the IE7 engine.

The Compatibility View button is shown only when the X-UA-Compatible HTTP header or META is not set.

Also, during the IE8 install a user can specify whether he wants to use the Compatibility View List feature. Basically, when this feature is turned on and the user clicks the Compatibility View button while browsing, IE8 reports to Microsoft that the website on this domain required the user to press the button.

If statistics show that users tend to press the Compatibility View button for a site, it’s included in the iecompatdata.xml file, a fresh copy of which is distributed with Windows Update. It can be viewed by typing res://iecompat.dll/iecompatdata.xml into the address bar.

So when the Compatibility View List feature is enabled, there’s no X-UA-Compatible HTTP header or META, and a DOCTYPE is present, IE8 checks whether the site is listed in the res://iecompat.dll/iecompatdata.xml file and, if it is, makes the Compatibility View button pressed by default.

As soon as the site gets an X-UA-Compatible header or META tag, IE8 will use the rendering mode from its value and won’t even check the Compatibility View List; the Compatibility View button won’t be shown at all either.

Local Intranet security zone

An intranet differs from the internet a lot.

While the Compatibility View List feature works fine for public internet websites, intranet sites should not be reported anywhere. So which rendering engine should be used when there’s no X-UA-Compatible value and a DOCTYPE is present? IE7 Standards compliancy? IE8 Standards compliancy?

An intranet is a controlled environment where, to decrease support costs, IT pros usually set everybody up with the same set of software. As most successful companies use Microsoft technologies as the core of their IT infrastructure, and because only IE has been providing a decent platform for building powerful intranet web applications, IE has always been the de-facto standard browser in companies.

Many intranet web applications targeted IE5.5/6.0, and nobody cared to refactor them for standards compliancy – why bother, if there was always only one browser supporting a well-known list of technologies, so cross-browser interoperability was never an issue? We can argue whether this approach is good or not – but the IE8 team had thousands of intranet applications they had to support. That’s why Compatibility View is the only option for sites from the Local Intranet security zone.

If a site is in the Local Intranet zone and there’s no X-UA-Compatible header or META tag, it’s rendered in EmulateIE7 mode by default: the DOCTYPE switch determines which rendering engine to use – IE7 Standards compliancy or IE5 quirks mode.

The Compatibility View button is not displayed at all.

But as soon as a developer adds the X-UA-Compatible header or META, the site will start working in the specified mode.

Developer Tools

With the great set of developer tools Microsoft provided for IE8, you can also see which rendering mode is being used and change it if required. This is for developers only and overrides any other settings. It’s very useful for testing – you don’t need IE7 in a virtual machine to check how a page looks in IE7 – just press F12 to open the developer tools and play with Browser modes or Document modes :)

Hope I managed to cover all the modes. In future posts I’ll write more on this topic.

P.S. As SelenIT pointed out in the comments, it’s very important that the X-UA-Compatible META tag be put before any other elements in the HEAD section, otherwise it might not work!



Apple submitted HTTP Live Streaming spec to IETF

Posted in http by sharovatov on 2 May 2009

As I’ve blogged recently, nearly a year ago Microsoft proposed an approach to adaptive video streaming over HTTP – Smooth Streaming. As Microsoft didn’t apply for a patent on this technology, I was hoping to see the same beautiful approach implemented in modules for other web servers, or even as web applications – as it’s really easy to implement.

The mistake Microsoft made was not submitting this technology to the IETF to make it an RFC – and that’s exactly what Apple is doing at the moment.

Yes, I’m not mistaken – Apple copied the whole idea, called it HTTP Live Streaming and submitted it to the IETF.

Yes, there’re differences, but they are absolutely insignificant:

  • Apple spec suggests extending M3U format for a playlist – Microsoft uses SMIL-compliant ISMC client-manifest file (i.e. playlist)
  • Apple spec defines that the server creates the playlist – in Microsoft approach the encoder creates the playlist
  • Apple spec defines encryption for media files – Microsoft doesn’t

And the whole proposed specification is weird – I think they just wanted to submit it as soon as possible, before the Microsoft Smooth Streaming approach gained popularity and became the de-facto standard.

Here’s what jumped out at me when I was reading the spec:

  1. Section 6.2.3, “Reloading the Playlist file” – why specify the expiration time of the playlist separately when HTTP/1.1 already has flexible methods for setting the expiration time of a resource?
  2. Encryption – what’s the purpose of encrypting media files when there’s HTTPS? And if there is a purpose, HTTP already provides a place where encryption could be “plugged in” – Transfer-Encoding – so why didn’t Apple just register another transfer-coding with IANA?
  3. EXT-X-ALLOW-CACHE – why add this if HTTP already gives flexible tools to control caching?

So as I see it, Apple was just a little bit in a hurry to propose this “standard” – it looks like they took the Microsoft idea, added some proprietary bits and bobs without thinking them through, ignored what HTTP natively provides, but bravely called the draft “HTTP Live Streaming”.

Awesome.



HTTP Chunked Encoding

Posted in browsers, http by sharovatov on 30 April 2009

Have you noticed that some pages on the internet start rendering in your browser incrementally, block after block, while on other sites you have to sit and look at a white screen and then get the full page all at once?

There’re two main problems that can make your browser wait for the whole page to load before it starts parsing it:

  1. if you’re using IE6 and the page you’re viewing has a table-based layout without COLGROUPs/COLs specifying widths or a table-layout: fixed CSS rule;
  2. if the page is not being served using chunked encoding.

The first issue is really simple – IE6 has to know the exact widths of the columns before it starts displaying a table, so unless you have a table-layout: fixed rule for the table or COLs with specified widths, it will wait for the whole content to load, calculate the widths and only then display the table. Other browsers (such as Opera, Firefox, Google Chrome) and newer versions of IE don’t have this issue and start displaying content as soon as they get at least a piece of it.

So while the first issue is really simple, the second is definitely more interesting!

Normally, when an HTTP client receives a response, it parses the HTTP headers and then tries to read from the input exactly the number of bytes specified in the Content-Length header. So if it takes 3 seconds for the server-side script to prepare the page, the HTTP client (and the user!) will just be waiting with an open connection for all those seconds.
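Here’s a sketch of that “normal” framing mode (Python, with example.com as a placeholder): read the status line and headers, then read exactly Content-Length bytes of body:

import socket

s = socket.create_connection(("example.com", 80))
s.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")

f = s.makefile("rb")
f.readline()                          # status line, e.g. HTTP/1.1 200 OK
headers = {}
while True:
    line = f.readline().strip()
    if not line:                      # empty line ends the header section
        break
    name, _, value = line.partition(b":")
    headers[name.strip().lower()] = value.strip()

# the client knows exactly how many bytes to read - and when to stop
body = f.read(int(headers[b"content-length"]))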

What was OK at the time when HTTP/1.0 was invented and the web had almost only static content, the authors of HTTP/1.1 considered unacceptable for the era of web applications.

And the HTTP/1.1 spec introduced the concept of “chunked” transfer encoding:

The chunked encoding modifies the body of a message in order to transfer it as a series of chunks, each with its own size indicator, followed by an OPTIONAL trailer containing entity-header fields. This allows dynamically produced content to be transferred along with the information necessary for the recipient to verify that it has received the full message

The main goal of HTTP Chunked Encoding is to allow clients to parse and display data immediately after the first chunk is read!

Here’s a sample of HTTP response with chunked encoding:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: text/html

c
<h1>go!</h1>

1b
<h1>first chunk loaded</h1>

2a
<h1>second chunk loaded and displayed</h1>

29
<h1>third chunk loaded and displayed</h1>

0
 

As you may see, there’s no Content-Length field in the beginning of the message, but there’s a hexadecimal chunk-size before every chunk. And 0 with two CRLFs specifies the end of the payload.

So the server doesn’t need to calculate Content-Length before it starts serving data to the client. This is amazing functionality! It means the server can start sending the first part of the response while still processing the other parts of it.

Say you have a dynamic page with two elements, both of which are queried from the database. You can either wait for both queries to finish, populate your template with the results and send the whole page to the client, or you can take the first query’s result, send it in one chunk to the client, then run the next query and send its results in another chunk. In most cases you may not notice the difference between chunked and normal serving modes – but if the page is assembled from different sources or it takes significant time to prepare the data, the user experience can be seriously improved.
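Here’s a toy chunked-encoding server sketch illustrating exactly that (raw sockets, Python, nothing production-grade) – it serves a page in two chunks, simulating a slow second query:

import socket, time

def send_chunk(conn, data):
    # each chunk: hex size, CRLF, the data itself, CRLF
    conn.sendall(b"%x\r\n%s\r\n" % (len(data), data))

srv = socket.socket()
srv.bind(("", 8080))
srv.listen(1)
while True:
    conn, _ = srv.accept()
    conn.recv(4096)                    # read (and ignore) the request
    conn.sendall(b"HTTP/1.1 200 OK\r\n"
                 b"Content-Type: text/html\r\n"
                 b"Transfer-Encoding: chunked\r\n\r\n")
    send_chunk(conn, b"<h1>first part</h1>")   # can render immediately
    time.sleep(2)                      # pretend the second query takes time
    send_chunk(conn, b"<h1>second part</h1>")
    conn.sendall(b"0\r\n\r\n")         # zero-sized chunk ends the response
    conn.close()

Point your browser at http://localhost:8080/ and you’ll see the first heading appear two seconds before the second one.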

Before the widespread popularisation of AJAX (another Microsoft-invented technology), chunked encoding was used as the core of the so-called “server push” approach for building web chats and for other streaming purposes. The idea was simple – the server didn’t close the HTTP connection and kept sending chunk after chunk with new messages or other data. This approach had serious drawbacks – e.g. the server had to keep a separate connection open for each client (which eats resources), browsers had a limit on waiting time so the page had to be reloaded once in a while, and so on. But chunked encoding was widely used anyway.

In my company we use chunked encoding to show a loading progress bar in our online train ticket ordering system – we serve the first chunk with a nicely styled <div id="loading"></div>, and when the data for the main table is ready, we serve it in the second chunk. And after the document has fully loaded, a javascript routine hides the <div id="loading"></div> :) Simple and nice.


Silverlight Smooth Streaming and Akamai

Posted in http by sharovatov on 29 April 2009

Just noticed that smoothhd.com, which I mentioned in my previous post, serves media from Akamai CDN servers! I also found that Akamai has a contract with Microsoft to deliver Smooth Streaming content – and that’s just great. It means that if you have webcasts or any other video you want to deliver to the maximum audience, on any bandwidth and CPU, Akamai and Silverlight Smooth Streaming would be an ideal solution – you won’t even need to host the video files on your servers! Or you can start with streaming from your own server and later, if required, seamlessly switch to Akamai.

And here are some nice videos from MIX09 (about Silverlight) that I’ve found today:

As soon as I get my IIS7, I’ll definitely try streaming something :)



Silverlight smooth streaming and HTTP

Posted in http by sharovatov on 28 April 2009

I’ve read about the Smooth Streaming technology, and I must say, I just love the way it works. It automatically and smoothly adjusts video quality, allowing clients to watch smooth video online regardless of their bandwidth and without having to sit staring at a “buffering” message for ages – what it does is dynamically change the video quality based on network bandwidth, CPU load and other factors.

Its idea and implementation are so simple and beautiful that I wonder why nobody invented it earlier. These are the steps you have to follow to make it work:

  1. encode your video file in Expression Encoder 2 Service Pack 1
  2. upload it to IIS7 web server with IIS7 Smooth Streaming extension
  3. point your Silverlight player to the appropriate URL

That’s it. Here’s a showcase of how this technology works – awesome!

Looks simple, right? It is simple, but there’s a huge amount of work hidden behind this simplicity. Let me dive into the technical details a little bit :)

First of all, let me give you some background. Originally there were basically two types of streaming – stateful streaming and progressive download.

A good example of stateful streaming is RTSP. The client initiates a connection to the server and sends commands like PAUSE or PLAY; the server sends back video stream packets; the client waits for its playback buffer to be filled with data and starts the playback. RTSP works both over UDP and TCP (port 554 is used).

Progressive download – the client sends a traditional HTTP GET request, the server responds with video data sent using HTTP chunked encoding, and the client starts the playback as soon as its playback buffer has enough data to play.

Both approaches had serious issues – RTSP couldn’t work for clients behind proxies or firewalls without extra effort (that’s why Apple had to spend time reinventing the wheel by tunnelling RTSP over HTTP), and progressive download couldn’t work well when the bandwidth wasn’t good enough – have you ever had a wonderful time sitting and staring at a “Buffering” message?

So if you want to give the highest video quality to users on high bandwidth but still want to show users with low bandwidth at least something, you create several versions of the same video and give users a way to choose and watch what they want.

But what if a user doesn’t know what bandwidth he’s got? What if the player itself could automatically select which video to download – high-res or low-res? What if the player could change the bitrate during playback if network conditions or CPU load change? What if the player could instantly start playback from any point in the movie? And what if pure HTTP was used, so there would be no issues with proxies? What if each chunk of video could be perfectly cached by any HTTP agent, such as a proxy?

That’s precisely how Microsoft Silverlight Smooth Streaming works.

First of all, Microsoft decided to switch from their ASF format to MP4. There were many reasons for that, but the main point is that the MP4 container specification allows content to be organised internally as a series of fragments, so-called “boxes”. Each box contains data and metadata, so if the metadata is written before the data, the player can have the required information about the video before it plays it.

So what does Expression Encoder do? It allows you to easily create multiple versions of the same video for different bitrates in this fragmented MP4 format. You get up to 10 versions of the same video file at different resolutions – from 320×200 up to 1080 or 720p. Each file is internally split into 2-second chunks, and each chunk has its own metadata, so the required chunk can be identified programmatically. Plus, Expression Encoder creates two complementary files (both following the SMIL XML standard): *.ISM, the server manifest file, which basically just describes to the server which file versions have which bitrates; and *.ISMC, which tells the client which bitrates are available and how many fragments the files have.

Can you see the idea? The IIS Smooth Streaming extension just maps a URL to a chunk in a file. You do an HTTP GET request to a URL like this:

http://test.ru/mov.ism/QualityLevels(400000)/Fragments(video=61024)

And the IIS Smooth Streaming extension checks the “mov.ism” manifest file to find the name of the file with the requested quality level (400000), then opens and parses that file to get the chunk with the requested time offset (61024). This chunk is then returned to you in a normal HTTP response.

So you can query for any chunk of any one of your video files with the requested time offset.

Let me repeat it – you encoded your original video file into 10 fragmented video files with different bitrates, and you have a way to query for any chunk in any of these files.

So to play 10 seconds of video, the player makes 5 consecutive HTTP requests. As we have versions of the same video at different bitrates, we can get the first chunk in the worst quality to see how it renders and how long it takes to download, and then, if CPU load is low and the network is fast, query the next 4 chunks at a higher bitrate.

And that’s exactly what the Silverlight media player component does – it requests chunk after chunk from the server and changes the “QualityLevels” parameter in the URL if conditions change. For example, if the player sees that CPU load is too high and it’s dropping frames, or the network becomes too slow, it changes the “QualityLevels” parameter to a lower value, so the IIS Smooth Streaming extension serves the next chunks from the smaller file with lower video quality.

Actually, when the user starts the playback, the first thing the Silverlight media player does is request the ISMC file to find out how many different bitrate versions the server has (and how to identify the fragments). Only then does it compose the URL to get the first chunk of video. Simple and beautiful technology.
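Here’s an illustrative sketch of that client logic (not the actual Silverlight code – the quality levels and the offset arithmetic are simplified assumptions): fetch fragments one by one and step the bitrate up or down depending on how long the previous fragment took to download:

import time
import urllib.request

BITRATES = [300000, 400000, 600000, 900000]  # assumed quality levels
BASE = "http://test.ru/mov.ism"              # manifest URL from the example above

level, offset = 0, 0
for _ in range(5):                           # five 2-second fragments = ~10s of video
    url = "%s/QualityLevels(%d)/Fragments(video=%d)" % (BASE, BITRATES[level], offset)
    started = time.time()
    fragment = urllib.request.urlopen(url).read()
    elapsed = time.time() - started
    if elapsed < 1.0 and level < len(BITRATES) - 1:
        level += 1        # downloading faster than playback: raise the quality
    elif elapsed > 2.0 and level > 0:
        level -= 1        # falling behind: drop the quality
    offset += 2000        # next fragment (offsets assumed to be in milliseconds)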

So what do we have? Video plays smoothly – in lower quality on old slow internet channels, and in full HD on fast internet with a good CPU. As HTTP is used as the transport, there are no issues with proxies or firewalls; and as each chunk is identified by a unique URL, every single chunk can be perfectly cached by proxies or other HTTP agents.

And as this technology is quite simple, there’s no doubt that there will be similar modules for other web servers, or even web applications achieving similar functionality!

Yes, as the video is encoded in multiple bitrate versions, it takes up to 6 times more space per movie/clip, but if that’s what it takes to provide users with smooth playback in any conditions – I’m all for it!

Thanks for another great technology, IIS Team!


McAfee scanalert img src

Posted in http by sharovatov on 23 April 2009

Recently I blogged about the URL Common Internet Scheme Syntax, and today, when I was adding an ADDTHIS bookmark button to our corporate website, I noticed that McAfee Secure uses this scheme when specifying the source URL for the McAfee Secure image:

<img width="94" height="54" border="0" src="//images.scanalert.com/meter/{companyname}/13.gif" alt="McAfee Secure sites help keep you safe from identity theft, credit card fraud, spyware, spam, viruses and online scams" oncontextmenu="alert('Copying Prohibited by Law - McAfee Secure is a Trademark of McAfee, Inc.'); return false;">

Here we can see the best use case for the URL notation I’ve been talking about – it doesn’t matter whether the client visits a page with this code over HTTPS or HTTP, the image will always be there. No overhead when clients are in HTTP mode, and no mixed content security warnings when the page is accessed over HTTPS.

Well done, McAfee!



Common Internet Scheme Syntax – detailed post

Posted in browsers, http by sharovatov on 17 April 2009

As a follow-up to my previous post about the Common Internet Scheme Syntax, I’d like to describe in detail why and when the abovementioned approach is extremely useful.

Problem

If you serve CSS/JS or images from a domain that’s different from your page’s domain, and the page must be accessible over both HTTP and HTTPS, you must already have been thinking about this – which protocol scheme should you set for these links? HTTP or HTTPS?

If you set your links’ URLs with the HTTP scheme and the page is accessed over HTTPS, all the resources are suddenly in a non-secure zone. Browsers behave differently, but they all warn the user in some way that the page contains non-secure content. Here’s the testcase. As you can see, the testcase link points to an HTTPS resource on the allrussiantrains.com domain (one of my sites). The testcase has IMG, LINK type="text/css", SCRIPT and A elements pointing to HTTP locations on the sharovatov.ru domain.

So if we have HTTP URLs on a page that’s served through SSL, we face the problem of “mixed content” security warnings.

IE7 shows a Security Information warning asking the user if he wants to display the non-secure content:

non-secure content security warning in IE

If the user presses Yes, all the elements are loaded.

If the user presses No, none of the elements are loaded.

Firefox 3.0.8 silently loads the HTTP-referenced content, but shows a small icon in the right-hand corner:

security warning in FF

Firefox also changes the address bar as if the connection weren’t secured by SSL, indicating to the user that the browser is displaying mixed content:

Compare it to the normal address bar interface when a secure page is shown:

Opera 9.62 silently loads the HTTP-referenced content, but shows a question mark icon in the address bar:

Compare it to the normal address bar when a secure page is shown:

Google Chrome does a similar thing – it displays the non-secure content, but shows an icon in the address bar:

Compare to the normal secure address bar:

So all browsers alert the user in a very obvious way that the page has mixed content, and IE even fires an alert. This is clearly not suitable for public websites.

Popular solutions

Usually people solve this problem by setting all the links to be HTTPS. So whichever way the page is accessed – either by HTTP or HTTPS, all the content is served through HTTPS channel.

This is generally OK, but a couple of issues still bother me:

  • HTTPS loads the server’s CPU, as all the traffic must be encrypted
  • HTTPS content is not cached by default

So though setting all links to HTTPS won’t cause clients any problems, it will increase server load.

Another way around it is to change the scheme in the URLs dynamically with a server-side language, based on the current scheme of the requested page. But what if you have a static HTML file? Then you have to edit the links’ URLs in javascript. Either way, changing link schemes is quite a kerfuffle! :)

And if you @import some CSS files or serve background images from a different domain, you’ll have to dynamically parse the CSS in order to change the URL scheme in all @import rules and background-image URLs. This isn’t always a bad thing, but as your CSS file will be dumped into the response stream by your favourite scripting language, the default HTTP conditional GET caching will stop working (while it’s supported and works perfectly fine for static files in all web servers). So you will have to either reinvent the wheel and support caching in your CSS-parsing script, or live with the fact that your CSS is going to be fetched every time your page is loaded.

Proposed solution

We’ve got a better option!

RFC 1738 Common Internet Scheme Syntax section states the following:

While the syntax for the rest of the URL may vary depending on the particular scheme selected, URL schemes that involve the direct use of an IP-based protocol to a specified host on the Internet use a common syntax for the scheme-specific data:
  //<user>:<password>@<host>:<port>/<url-path>

And RFC 3986 follows:

A relative reference that begins with two slash characters is termed a network-path reference; such references are rarely used. A relative reference that begins with a single slash character is termed an absolute-path reference. A relative reference that does not begin with a slash character is termed a relative-path reference.

So you don’t need to specify the HTTP or HTTPS scheme – you just put two slashes and the browser adds the current scheme automatically!

I tested this URL notation in the following browsers – IE3, IE4, IE5.0, IE5.01, IE5.5, IE6, IE7, IE8, FF2, FF3.0.8, Opera 8.5, Opera 9, Opera 10, Google Chrome (current version) – and it works fine in all of them!

You can test it yourself – here’s the testcase. As you can see, the URL is set without a scheme, and your browser silently adds the current scheme of the loaded page, be it HTTP or HTTPS! If you change HTTP to HTTPS in your address bar, you’ll see the scheme in the dummy.html page URL change to HTTPS!

Conclusion

So, if you use the “Common Internet Scheme Syntax” URL notation, you’ll achieve the following:

  1. you won’t have to care about URL schemes in CSS @import and background-image rules – you just put

    @import "//externalsite.com/stylesheet.css"

    or

    #someElement { background-image: url(//externalsite.com/someBG.jpg); }

    and your browser automatically applies the current URL scheme and fetches the corresponding resource.

  2. you won’t have to parse CSS to change the URL scheme, and therefore you won’t break the default HTTP conditional GET caching – your CSS files will be perfectly cached as they remain static
  3. you won’t have to mess with URL schemes in your JS files, so you won’t have to parse them and change URLs in them either
  4. you won’t need to rewrite IMG URLs on all your pages!

Plus, Google said their robot would happily parse, index and follow such links (with the HTTP scheme, of course). I’ve also asked the MSN Live Search team about it – hope they reply soon – I’ll update the post.

So – use this approach whenever you have a page that is accessed over both HTTP and HTTPS and you need to reference a resource from a different host on that page. Naturally, that host must support both HTTP and HTTPS :)



Common Internet Scheme Syntax

Posted in browsers, http by sharovatov on 17 April 2009

Recently I read an extremely interesting post on bolknote.ru about the “Common Internet Scheme Syntax”.

You may have already faced the quite common problem of setting absolute URIs for a resource on a page that must be accessible over both HTTPS and HTTP schemes.

RFC 1738 Common Internet Scheme Syntax section states the following:

While the syntax for the rest of the URL may vary depending on the particular scheme selected, URL schemes that involve the direct use of an IP-based protocol to a specified host on the Internet use a common syntax for the scheme-specific data:
  //<user>:<password>@<host>:<port>/<url-path>

So you don’t actually have to specify the HTTP or HTTPS scheme – you just put two slashes and the browser adds the current scheme automatically!

I tested this URL notation in the following browsers – IE3, IE4, IE5.0, IE5.01, IE5.5, IE6, IE7, IE8, FF2, FF3.0.8, Opera 8.5, Opera 9, Opera 10, Google Chrome (current version) – and it works fine in all of them!

You can test it yourself – here’s the testcase. As you can see, the URL is set without a scheme, and your browser silently adds the current scheme! If you change http to https in your address, you’ll see that the scheme in the dummy.html page URL changes to https!

It’s interesting to note that RFC 3986 (URI Generic Syntax) says that the scheme part is required:

Each URI begins with a scheme name that refers to a specification for assigning identifiers within that scheme. As such, the URI syntax is a federated and extensible naming system wherein each scheme’s specification may further restrict the syntax and semantics of identifiers using that scheme.

, but it mentions the Common Internet Scheme Syntax notation in the Relative Reference section:

A relative reference that begins with two slash characters is termed a network-path reference; such references are rarely used. A relative reference that begins with a single slash character is termed an absolute-path reference. A relative reference that does not begin with a slash character is termed a relative-path reference.

However, I don’t think any browser vendor will drop support for this functionality, as it’s quite useful and there’s no problem in supporting it.

Bolk, thanks for sharing this!

UPDATE: Google and Nigma.ru said their robots would follow and index such links.

