Tuesday, July 26, 2011

The speed of light is too slow


Here's an article I wrote in 2006, referring to another article I wrote in 1998. I thought it was worth posting because the issues haven't gone away.

The speed of light is too slow, again!


In 1998, I wrote an article stating that the speed of light was too slow and that, until we fixed it, users would suffer poor web performance due to the inefficiencies of the Internet protocols.  Some people said “greater bandwidth will solve the issue” and promptly forgot about it.

Well, here we are, eight years later.  We still haven’t increased the speed of light, available WAN bandwidth has grown many times over and yet those of us remote from the data we need are still waiting for information; if anything the situation has got worse.

More users than ever are working remotely from corporate data; recent research from Nemertes Research states that “fewer than 10% of workers work at headquarters in the average company”[1]. At the same time, IT departments are consolidating servers to ease the management burden and comply with backup regulations; for example, Hewlett Packard has announced it is cutting back from 85 worldwide data centers to 6[2].

The last eight years have also changed the way applications are delivered to users: web-based applications are now the norm (often using SSL for encryption), along with streaming data for training and a wealth of rich content distributed around the typical organization.  Web-based applications consume at least ten times more bandwidth than traditional client-server applications.

Greater bandwidth is not equal to faster throughput


There's no doubt that adding bandwidth helps delivery of data up to a point, and the more users at the remote office, the greater the benefit from adding more bandwidth.

A simple analogy is a 65-mile length of freeway with a speed limit of 65MPH: when empty, a single car can drive the distance in one hour.  If development plans show that the freeway will be used by twice as many cars in eight years, then doubling the number of lanes will provide enough width for the new traffic.  But what of the individual sitting in his or her car: does that individual car get there any quicker?  The speed limit is still 65MPH, so even though we have doubled the total number of cars, each individual car still takes the same hour to drive the distance.

To take this analogy further, if a car-owner was moving house eight years ago and it took him two trips to carry his belongings along this same road, the total time to move house would be four hours (two round-trips).  Today he is moving house again and has ten times as many belongings; unless he hires a truck it will take twenty round-trips, a total of twenty hours, and the total number of lanes on the road is irrelevant for that individual.

The enemy of applications – distance


If the enemy of application delivery is not bandwidth, what is it?  It is distance.  To be more exact, the enemy is round-trip time.  And round-trip time is defined by the following:

            The speed of light.
            The real distance the data needs to travel (cables don’t go direct from source to destination).
            Any delays added by routers, firewalls and other network equipment.
            The server and PC delays at each end.
            The amount of data that can be transmitted at one time, defined by the protocol being used.

Our protocols are inefficient over the WAN


Now for some mathematics.  Don't hide; it's not that bad.

The original design goal of TCP/IP was to create a protocol that was reliable over almost any network.  A sender transmits small packets of information (a maximum of 64K at a time) and then waits for ACKs (acknowledgements of data) from the recipient before sending more.  The equivalent on the freeway is taking one box of belongings at a time along the 65-mile route, then driving back empty to collect another box.

To make matters worse, other protocols reduce this maximum (for example MAPI, used by Microsoft Exchange, uses a maximum of 32K).

So, a single 5MB file needs a minimum of 78 round-trips (or 156 if using MAPI).
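As a quick check of those numbers, here's the arithmetic in a few lines of Python (assuming, as above, a 5MB file and one chunk per round-trip):

# Rough check of the round-trip counts quoted above: a 5MB file moved in
# 64K (TCP) or 32K (MAPI) chunks, one chunk per round-trip.

FILE_BYTES = 5 * 1000 * 1000

for label, chunk_bytes in [("TCP, 64K window", 64 * 1000),
                           ("MAPI, 32K window", 32 * 1000)]:
    trips = FILE_BYTES / chunk_bytes
    print(f"{label}: about {trips:.0f} round-trips")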

Even this assumes that TCP uses its largest window size.  In practice the window size is negotiated and adjusted between the devices based on response time, and TCP rarely reaches a 64KB window on long-latency links.  There have been a number of articles and papers on this; search for "bandwidth delay product" and you'll see, for example, that without optimising using window scaling or other techniques it is not possible to transfer more than about 1Mbit/sec over a satellite link.  This is also a good discussion: http://packetlife.net/blog/2010/aug/4/tcp-windows-and-window-scaling/
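To see why the window size caps throughput, here is a minimal sketch of the bandwidth-delay arithmetic; the round-trip times are illustrative figures I've assumed, not measurements:

# With a fixed 64KB window, TCP throughput is capped at window / round-trip
# time, no matter how fat the pipe is. The RTT figures below are assumptions
# chosen for illustration.

WINDOW_BYTES = 64 * 1024          # classic maximum TCP window without scaling

def max_throughput_mbit(rtt_seconds):
    """Upper bound on throughput (Mbit/s) for a given round-trip time."""
    return WINDOW_BYTES * 8 / rtt_seconds / 1_000_000

for label, rtt in [("LAN (1ms)", 0.001),
                   ("Boston-London (~130ms)", 0.130),
                   ("Geostationary satellite (~550ms)", 0.550)]:
    print(f"{label}: at most {max_throughput_mbit(rtt):.1f} Mbit/sec")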

Isn’t the speed of light so fast that this is all still only a theoretical problem?


OK, I admit, the speed of light in a vacuum is pretty fast – 299,792 Km/second or 186,282 miles/second.  However, the speed of light in fiber or copper is around 70% of that in a vacuum[3], roughly 210,000 Km/s.

So, to go back to our 5MB file that requires a minimum of 78 round-trips.  Let’s assume the server is in Boston, Massachusetts and the user is in London, a distance of 5279Km[4]. A single round-trip is double the distance: 10,558Km.  78 round-trips is therefore 78 * 10,558 or 823,524Km.  Divide that distance by 210,000 and you have a minimum of 4 seconds to retrieve the file.
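The same sum in a few lines of Python, if you want to experiment with your own distances:

# The Boston-London calculation above: 78 round-trips of 10,558Km each,
# at roughly 210,000Km/s (the speed of light in fibre).

SPEED_IN_FIBRE_KM_S = 210_000     # ~70% of the speed of light in a vacuum
ONE_WAY_KM = 5_279                # Boston to London
ROUND_TRIPS = 78                  # 5MB file in 64K chunks

total_km = ROUND_TRIPS * 2 * ONE_WAY_KM      # 823,524Km
seconds = total_km / SPEED_IN_FIBRE_KM_S     # ~3.9 seconds

print(f"Total distance: {total_km:,}Km, minimum time: {seconds:.1f} seconds")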

But this is all theoretical and assumes a direct link from the user to the server, no routing delays, no congestion and the optimal TCP window size. 

You can calculate it yourself – it’s twice as bad as you think!


Most PCs have a utility called PING, which can be used to see the real round-trip time between devices across WAN links and the Internet.  Before you start, make sure you are really testing to the destination you think you are; there are online utilities that will tell you where a server is hosted[5].

In theory, our round-trip time between Boston and London could be as short as 50 milliseconds (10,558 divided by 210,000), but try it and you'll find it is always at least double that.  While writing this near London, I tested the round-trip to three websites hosted near Boston[6] (while most of the USA was asleep, for minimum congestion) and received average round-trip times of 129 milliseconds.  Now that 5MB file will take a minimum of ten seconds to reach me, and this still assumes no server or firewall delays, no congestion on the line, no slow-starts, the maximum window size and no additional packets to request the content and deliver approval from the server.
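If you'd rather script the test than read ping output, timing a TCP connection gives a rough approximation of one round-trip (the handshake).  The host below is only a placeholder; substitute a server you know is far away:

# Measure one round-trip by timing a TCP connect, then estimate the minimum
# transfer time for 78 round-trips. www.example.com is a placeholder - use a
# server you know is hosted far away.

import socket
import time

HOST, PORT = "www.example.com", 80
ROUND_TRIPS = 78                  # 5MB file in 64K chunks, as before

start = time.perf_counter()
with socket.create_connection((HOST, PORT), timeout=5):
    rtt = time.perf_counter() - start

print(f"Measured round-trip: {rtt * 1000:.0f}ms")
print(f"Minimum time for {ROUND_TRIPS} round-trips: {rtt * ROUND_TRIPS:.1f} seconds")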

Let's remember, the one-way distance between Boston and London is only 5,279Km.  The circumference of the earth is nearly eight times that and the greater the distance, the worse the situation.  Some examples using other round-trip times for the same 5MB file:

            San Francisco – London                     16 seconds
            San Francisco – Sydney                      23 seconds
            Dallas – Beijing                                   21 seconds
            Paris – New Delhi                               12.5 seconds

(Don't even think about using a satellite – geostationary satellites orbit about 35,800 Km above the earth, introducing even greater delays.)

So, to show the real problem, sometimes there's only one option: people based at HQ need to jump on an aeroplane and work in a remote office for a week, accessing all the same data that they do at HQ!

What can be done?


In simple terms we need to reduce the number of round-trips that data needs to take to get from a server to a user.  To go back to our analogy of moving house, we could:

1. Throw out some of our unwanted stuff, reducing the number of trips.
2. Optimize our delivery mechanism: hire a truck instead of using a car and get more items in one journey.
3. Prioritize what gets sent first.  Which is more important, the refrigerator or the curling tongs?

In the data world there are also a number of techniques that can work together to achieve faster data delivery.

Object or file caching

Keep a copy of each object at the remote site.  When a user requests an object that has already been requested by another user, it can be delivered from the local cache (after checking with the server that the cached copy is still up to date).  This reduces the WAN bandwidth used and cuts the latency to almost zero.
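As a rough illustration of the idea (not how any particular product implements it), a cache that revalidates with the origin server before serving a local copy might look like the sketch below; it assumes an HTTP origin that returns ETags and 304 Not Modified responses:

# A minimal sketch of object caching with revalidation. Real branch-office
# caches do far more (expiry rules, eviction, partial objects); this only
# shows the principle of "serve locally, but check freshness first".

import urllib.error
import urllib.request

cache = {}   # url -> (etag, body)

def fetch(url):
    headers = {}
    if url in cache:
        etag, _ = cache[url]
        headers["If-None-Match"] = etag          # ask the origin "has it changed?"
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()               # full transfer over the WAN
            etag = response.headers.get("ETag")
            if etag:
                cache[url] = (etag, body)
            return body
    except urllib.error.HTTPError as err:
        if err.code == 304 and url in cache:     # not modified: serve the local copy
            return cache[url][1]
        raise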

Byte caching

When an object is not fully cached, techniques that recognize repeated patterns in the data can send short tokens instead of the repeated bytes.  A few bytes cross the WAN instead of large blocks, increasing the apparent bandwidth and reducing the time to deliver the content.
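A toy version of the technique, assuming fixed-size chunks for simplicity (real products use rolling hashes and variable-length chunks):

# Byte caching in miniature: both ends keep a dictionary of chunk hashes, so
# a repeated chunk crosses the WAN as a ~20-byte token instead of 4KB of data.

import hashlib

CHUNK = 4096
sender_seen = set()        # hashes the sender believes the far end already holds
receiver_store = {}        # hash -> chunk bytes, held at the remote site

def encode(data):
    """Yield ('token', digest) for repeated chunks, ('raw', chunk) otherwise."""
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha1(chunk).digest()
        if digest in sender_seen:
            yield ("token", digest)
        else:
            sender_seen.add(digest)
            yield ("raw", chunk)

def decode(stream):
    """Rebuild the original data at the remote site."""
    out = bytearray()
    for kind, payload in stream:
        if kind == "raw":
            receiver_store[hashlib.sha1(payload).digest()] = payload
            out += payload
        else:
            out += receiver_store[payload]
    return bytes(out)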

Protocol Optimization

Hide the inefficiencies of the protocols by sending large blocks of data before waiting for acknowledgements, fast-starting protocols that are slow to build up transmissions, and even anticipating user requests for data (if a user requests the start of a file, these devices can anticipate that the user will request the rest of it).
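The read-ahead part is easy to sketch; fetch_block() below is a hypothetical stand-in for whatever actually reads data from the far server:

# A sketch of read-ahead: when the first block of a file is requested, fetch
# the remaining blocks in the background so later requests never cross the WAN.

import threading

prefetched = {}                     # (filename, block) -> data

def fetch_block(filename, block):
    """Hypothetical placeholder for the real read from the distant server."""
    return b"<block data>"

def read_block(filename, block, total_blocks):
    if (filename, block) in prefetched:
        return prefetched.pop((filename, block))   # served locally, no round-trip
    if block == 0:                                 # first access: anticipate the rest
        def prefetch():
            for b in range(1, total_blocks):
                prefetched[(filename, b)] = fetch_block(filename, b)
        threading.Thread(target=prefetch, daemon=True).start()
    return fetch_block(filename, block)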

Compression

Use compression technologies between the sites to reduce the bandwidth and round-trips needed.
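Even the standard library shows the effect on round-trip counts.  The sample data below is deliberately repetitive, so treat the ratio as illustrative only:

# Compression's effect on the number of 64K round-trips, using zlib.

import zlib

data = b"quarterly sales figures, region by region\n" * 20_000   # ~840KB, very repetitive
compressed = zlib.compress(data, 9)

window = 64 * 1024

def round_trips(n):
    return -(-n // window)    # ceiling division

print(f"Uncompressed: {len(data):,} bytes, {round_trips(len(data))} round-trips")
print(f"Compressed:   {len(compressed):,} bytes, {round_trips(len(compressed))} round-trips")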

Bandwidth Management

To make sure the systems use the available bandwidth effectively, set priorities by user group, by server, by application etc.

Remove inappropriate traffic

Let’s not forget that business traffic is often competing with non-business traffic.  Deploy devices that implement policies to block requests for inappropriate traffic.

Conclusion:  Latency – the application killer


Bandwidth is not enough - distance is the real killer.  Even with unlimited bandwidth, data still travels from server to user slowly due to the repeated trips taken before the full data arrives; we still have to wait.  Organizations need to investigate solutions to solve this problem or applications will be unusable in remote offices. 
