[essay from Mike Goodin]
I've been pondering the root Y2K problem for many years, searching for a concise way to describe the true nature of the potential threat. This week, aided by the phraseology of a scientist, I've constructed this question:
"What is the fault tolerance of our globally-distributed specialization network?"
This is the relevant Y2K question. Remember, it's not the compliance of home appliances that matters (and why polls keep asking people about home appliances is an unfortunate mystery...), and failures somewhere on the planet are all but certain. Failures are going to occur, without a doubt.
The question concerns the ability of our globally-distributed specialization network to survive faults. If the global system is highly fault tolerant, it will survive intact, with few disruptions. If the global system has low fault tolerance, we're in for a very rough ride. Perhaps even a multi-year shutdown of civilization as we know it.
FAULT TOLERANCE HAS NEVER BEEN TESTED
Recognize that the fault tolerance of our "new" global community has never been tested. In the days of World War II, America was relatively isolated. We could build our own planes, trains and automobiles (tanks, too). We had factories, and we had relatively short, U.S.-based assembly lines staffed by skilled U.S.-based workers. The network of specialization was much smaller and, therefore, more fault tolerant. Everybody knows the fewer pieces you have in an engine, the less likely it is to fail. Simplicity leads to reliability. Complexity results in low fault tolerance.
Today, the manufacturing base of America is nearly extinct, and the supply lines for building products stretch across oceans, involving a half-dozen countries for parts. This is the "globally-distributed" specialization network to which I refer, and it is a relatively young system.
It's been driven by economics, by specialization, by efficient ocean-going transports and air deliveries. It's enabled by international telecommunications: e-mails, faxes, phone calls, even video conferencing. International banks allow the moving of funds from buyer to seller through trusted international clearinghouse networks. This is, indeed, a "network" of a thousand parts, and each part of the machine must work at near-perfect efficiency for the whole system to operate correctly.
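The engine analogy and the "thousand parts" point both follow from simple series reliability: if a system works only when every one of its n parts works, and each part independently works with probability p, the whole system works with probability p^n. The minimal sketch below assumes independent failures, which is a simplification (real failures are often correlated), and the function name is hypothetical.

```python
def series_reliability(part_reliability: float, num_parts: int) -> float:
    """Probability that a system works when ALL of its parts must work.

    Assumes (as a simplification) that each part fails independently and
    works with probability `part_reliability`.
    """
    if not 0.0 <= part_reliability <= 1.0:
        raise ValueError("part_reliability must be between 0 and 1")
    if num_parts < 0:
        raise ValueError("num_parts must be non-negative")
    # Every part must work, so the per-part probabilities multiply.
    return part_reliability ** num_parts


if __name__ == "__main__":
    # Identical 99%-reliable parts; only the number of parts changes.
    for n in (10, 100, 1000):
        print(f"{n:5d} parts: {series_reliability(0.99, n):.6f}")
```

With 99%-reliable parts, a 10-part system works about 90% of the time, a 100-part system about 37% of the time, and a 1,000-part system almost never. A pure series system like this has no fault tolerance at all; the real network tolerates some failures, which is exactly what the rest of the essay asks about.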
WE ALREADY KNOW THE SYSTEM CAN HANDLE A 1% FAILURE RATE
So what is the fault tolerance of this system, anyway? That's the debate; that's the big question. Clearly, the people who say that systems fail all the time -- with no big deal -- are missing the point. Yes, power plants fail on a daily basis. Phone lines go down somewhere on the planet on a daily basis. Banks mess up transactions with frightening regularity. So we know this global network has a fault tolerance of at least 1%. But that's not the right question. Y2K isn't a local hurricane. It isn't a local power outage or a local bank error. It's a simultaneous, global slam-dunk event, and it may raise the failure rate of this network to 10%. And *that* is the big question: is our globally-distributed specialization network able to withstand a simultaneous failure of 10% of its parts?
See, isolated failures always rely on the non-failing services, and on an excess of available resources, to complete repairs. When a power plant fails, all the power experts get called on the phone lines, and they rush to the scene to fix this lone failing power plant. They use credit cards to buy plane tickets, gas, food, you name it. And when they're done, they go home and wait for the next power emergency. This demonstrates the 1% fault tolerance of our current system. But what if ten power plants go down at once? Suddenly each plant gets only 1/10th of the available resources. Then what if the telecom is down? You can't reach the people qualified to repair the power plants, and they can't use their credit cards to get there. Then what if the airlines aren't flying? You've got delays, and people have to drive. So now they depend on oil, but what if the oil tanker shipments are delayed?
AT WHAT POINT IS THE FAILURE UNIVERSAL?
See, at some point, somewhere between 1% and 100%, you get a total failure of the network. The real Y2K question, when you boil it down, concerns this number. What percentage of simultaneous failure can the network withstand without collapsing?
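One way to make this question concrete is a toy threshold-cascade model, sketched below. Everything in it is a hypothetical assumption for illustration, not a model of the real economy: each node depends on a few randomly chosen other nodes, and a working node fails once the fraction of its dependencies that have failed exceeds a tolerance. The parameters (`num_nodes`, `deps_per_node`, `tolerance`) are arbitrary, so the printed numbers say nothing about the real world; the sketch only shows how a small change in the initial failure rate can tip a dependency network from a contained outage into a near-total collapse.

```python
import random


def build_network(num_nodes: int, deps_per_node: int, seed: int) -> list[list[int]]:
    """Hypothetical random network: node i depends on `deps_per_node`
    distinct other nodes chosen uniformly at random."""
    rng = random.Random(seed)
    network = []
    for node in range(num_nodes):
        others = [n for n in range(num_nodes) if n != node]
        network.append(rng.sample(others, deps_per_node))
    return network


def cascade(network: list[list[int]], initially_failed: set[int], tolerance: float) -> set[int]:
    """Spread failures until nothing changes: a working node fails once the
    fraction of its dependencies that have failed exceeds `tolerance`."""
    failed = set(initially_failed)
    changed = True
    while changed:
        changed = False
        for node, deps in enumerate(network):
            if node in failed:
                continue
            broken = sum(1 for d in deps if d in failed)
            if broken / len(deps) > tolerance:
                failed.add(node)
                changed = True
    return failed


def final_failure_fraction(network: list[list[int]], order: list[int],
                           initial_fraction: float, tolerance: float) -> float:
    """Fail the first `initial_fraction` of nodes in `order` simultaneously,
    let the cascade run, and return the fraction of nodes that end up failed."""
    k = round(initial_fraction * len(network))
    return len(cascade(network, set(order[:k]), tolerance)) / len(network)


if __name__ == "__main__":
    net = build_network(num_nodes=1000, deps_per_node=5, seed=1)
    order = list(range(len(net)))
    random.Random(2).shuffle(order)  # one fixed order, so larger shocks include smaller ones
    for pct in (1, 5, 10, 20, 30, 50):
        frac = final_failure_fraction(net, order, pct / 100, tolerance=0.3)
        print(f"{pct:3d}% initial failures -> {frac:.1%} final failures")
```

Using a single shuffled order means each larger initial shock contains every smaller one, so the final failure fraction can only rise as the initial rate rises, and any sharp jump in the output marks the tipping point for this particular made-up network.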
Clearly, it's something lower than 80% and something higher than 1%. Perhaps the network could withstand a 5% failure; that's debatable. Imagine if 5% of all financial transactions were bad. That would clobber the financial institutions: busy signals forever. Imagine Wall Street with a 5% transaction failure rate. The whole system would shut down. A 10% failure rate would seemingly bring most networks down. Imagine if 10% of the parts in a power plant didn't work correctly. That's an off-line plant in short order. Imagine if 10% of the parts didn't show up at the Chrysler plant. That's a sure-thing shutdown. Imagine if 10% of the water treatment plants in the country failed. It would be a Red Cross nightmare just attempting to supply water to 10% of the population.
In my opinion, the world probably can't withstand a 10% failure rate without severe and long-term consequences. A 20% failure rate would be, I think, a fatal economic event. It would thrust the world into a depression with all the resulting costs in dollars and lives. At a 20% failure rate, the efficiencies break down: the food production and deliveries, the oil, power, banking, telecommunications, and so on.
80% ISN'T GOOD ENOUGH
This is why, when people tell you that 80% of the systems are going to be ready, remember that's not nearly good enough. If you believe my analysis, 80% of the systems working is still a disaster. 20% of the systems failing could break the global network's back. In fact, a 95% "working" ratio isn't good enough, either. Even a 5% failure rate could have long-term, painful consequences. In order to avoid the worst effects of the Millennium Bug, systems need to operate at 99% or better. We need to have no more than one failure per hundred systems. At that rate, I'm confident the network's fault tolerance is sufficient.