I was working with a client, admittedly a couple of years ago, on how to scale a service to 350 million connected users. They wanted to be able to send messages and updates between them all. They had a platform, but it was not working as expected, and they wanted fresh, external brains to help sort out what the issues might be and how to achieve their desired scale.
So I was casting around for how to architect systems at “Internet Scale”, and I found a great presentation by Jeff Dean of Google. In it he describes designing for failure, for efficiency when performing large numbers of small jobs, and so on.
In this presentation he had a page called “Numbers Everyone Should Know”, which is the image below.
However, nanoseconds are not very close to human scale, so let's scale them up 1,000,000,000 times so that we can think of them as seconds, because seconds are something we can understand. Perhaps then you'll start to appreciate the disparity between the smallest and largest numbers shown.
L1 cache is the memory closest to the CPU, on the silicon, and a reference to it takes half a nanosecond, so think of this as half a second.
L1, can I have my data please… thanx. Half a second, not bad.
Now a read from main memory (the 4 GB or so you have in your machine) takes 100 nanoseconds, which scales to 1 minute 40 seconds. Not too bad, a bit like using instant messaging for a conversation.
Now though, consider the read from disk at 20 million seconds: this is over 230 days, or nearly 8 months.
8 months, to get something from the hard disk. This, I think, puts some of the power of modern computers into terms we can understand. The last entry, sending a packet over the internet, comes to 150 million seconds, which is just over 4 years 9 months. A long, long time to wait.
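If you want to check the arithmetic yourself, here is a rough sketch in Python. The four latencies are the ones quoted above; the function names, the labels and the 365.25-day year are my own assumptions for illustration, not anything from the original slide.

```python
# Latencies in nanoseconds, as quoted in this post.
LATENCIES_NS = {
    "L1 cache reference": 0.5,
    "main memory reference": 100,
    "read from disk": 20_000_000,
    "send packet over the internet": 150_000_000,
}

NS_PER_SECOND = 1_000_000_000
SCALE = 1_000_000_000  # scale factor: one nanosecond becomes one second


def scaled_seconds(ns):
    """Scale a latency in nanoseconds up to 'human' seconds."""
    return ns * SCALE / NS_PER_SECOND


def humanize(seconds):
    """Render seconds in the largest unit that fits (assumes 365.25-day years)."""
    units = [("years", 365.25 * 86400), ("days", 86400), ("minutes", 60)]
    for name, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {name}"
    return f"{seconds:.1f} seconds"


for label, ns in LATENCIES_NS.items():
    print(f"{label}: {humanize(scaled_seconds(ns))}")
```

Running it prints one line per entry, so you can add other rows from the slide to the dictionary and see where they land on the human scale.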
Just in case you're thinking about it: one nanosecond is to one second as one second is to 31.7 years.
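That 31.7 figure is simply a billion seconds expressed in years. A quick check, assuming a 365.25-day year (the constant name is mine):

```python
# A billion seconds expressed in years, assuming a 365.25-day year.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

years = 1_000_000_000 / SECONDS_PER_YEAR
print(f"{years:.1f} years")  # prints "31.7 years"
```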
So now all you have to wonder about is just what your computer is dreaming of whilst waiting for us slow humans to do things, or for the internet to respond. Electric sheep, anyone?
…And as for that project: well, we (IBM) provided them with architecture patterns and lots of advice, but alas, other issues meant that the project was stopped long before they had millions of users, let alone hundreds of millions. Never mind.
Posted on October 5, 2012