The following text is copyright 1998 by
Network World. Permission is hereby given for reproduction, as long as attribution
is given and this notice is included.
How many 9s are enough?
By Scott Bradner
If there is one thing
that the phone people are insistent on, it is reliability. At least they are in
their demands on equipment. The common belief among phone people who are trying
to build data networks is that equipment needs to be "five nines"
(99.999%) reliable in order to be useful in a network that they might want to
build. I think they are wrong to want this level of reliability in data
networking equipment and I fear that their insistence on this level is
inhibiting their deployment of useful data networks.
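To put numbers on the "nines", here is a minimal sketch of the arithmetic. It assumes availability means the fraction of a 365-day year that a piece of equipment is up; the function name is made up for illustration and does not come from any phone company specification.

```python
# Assumption: a 365-day year, with availability given as a fraction (0.99999 = "five nines").
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability):
    """Return the minutes per year a device may be down at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, availability in [("three nines", 0.999),
                                ("four nines", 0.9999),
                                ("five nines", 0.99999)]:
        minutes = downtime_minutes_per_year(availability)
        print(f"{label}: {minutes:.1f} minutes of downtime per year")
```

Under these assumptions "five nines" allows a little over five minutes of downtime a year, while "three nines" allows almost nine hours.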
It will be interesting
to see if they maintain this belief in the future, when they will have to
compete for customers against other providers. It is currently easy for the traditional
phone company to insist on reliability at great cost since they live in a world
where increased costs mean increased revenues authorized by the local
utility commissions. But utility-commission-distorted economics aside, I think
the problem here is that the people who are insisting on "five
nines" do not understand data networking.
Back in 1964 Paul Baran,
then at the Rand Corporation, produced a series of articles proposing the idea
of packet switching networks. (These papers were recently put on-line at http://www.rand.org/publications/RM/baran.list.html) He was working at a time when
there was considerable worry about destruction of the US communications
infrastructure by enemy action. He proposed a network design which would
survive large-scale node or link destruction. His design was for a distributed
network with many small, cheap packet switches and many redundant links between
them instead of the then-common network design, which had a few large phone
circuit switches. He showed that when reliability was measured end to end, a
distributed network would exhibit very high reliability even in the face of the
failure of a number of the switches or links in the network. He concluded:
"From the user's viewpoint, the system appears to be virtually noise- and
error-free when handling data." He was describing the current Internet
architecture quite a bit ahead of its time.
A key reason to use a
distributed network is to minimize the reliance on any single network
component. The network will route around link or switch failures. In this type
of environment "five nines" reliability is way overkill. But it is
not a surprise that the phone types think in terms of the need for extreme
reliability - they generally do not have distributed networks with redundant
paths.
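The effect of redundant paths can be sketched with a little probability. This is an illustration under strong assumptions that are mine, not Baran's model: every switch fails independently, a path works only if every switch on it works, the redundant paths share no switches, and rerouting is instant with enough spare capacity. The function names and the three-hop example are hypothetical.

```python
def path_availability(switch_availability, hops):
    """Availability of one path: all switches in series must be up (independent failures assumed)."""
    return switch_availability ** hops


def redundant_availability(path_avail, paths):
    """Availability when any one of several disjoint, independent paths is enough."""
    return 1.0 - (1.0 - path_avail) ** paths


# Hypothetical comparison: one path of three "five nines" switches versus
# two disjoint paths, each made of three cheaper "three nines" switches.
single_five_nines = path_availability(0.99999, 3)
dual_three_nines = redundant_availability(path_availability(0.999, 3), 2)

if __name__ == "__main__":
    print(f"one path, five-nines switches:   {single_five_nines:.6f}")
    print(f"two paths, three-nines switches: {dual_three_nines:.6f}")
```

Under these assumptions the two paths built from "three nines" switches come out ahead of the single path built from "five nines" switches. The sketch does not model shared components, such as a single link to the customer site, or congestion after a reroute.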
There are places in many
Internet service provider (ISP) networks where redundancy is not as rich as it
might be, the link to the customer site for example, and in many ISPs the level
of traffic is such that routing around a failure will cause congestion and data
loss. But Internet-style networks are not the same as telephone-style ones, and
the reliability demanded from each component should not have to be as high
because the network will route around a failure in most cases. Less expensive, reasonably
reliable switches need not result in less reliable service to the customer.
Disclaimer: Harvard
spends more time understanding the reliability of people than electronic
components so the above postulation is mine.