Tuesday, October 29, 2013

Rattus rattus

From Chapter 2: The Mythical Man-Month

Excerpt
What are the alternatives facing the manager?...The...only alternatives are to trim [the task] formally and carefully, to reschedule, or to watch the task get silently trimmed by hasty design and incomplete testing. (page 24)
Scenographia Systematis Copernicani
(The Copernican System, 1660 engraving)

From A Distant Mirror1

Excerpt
Doctors struggling with the evidence could not break away from the terms of astrology, to which they believed all human physiology was subject. (page 107)



Brooks is recounting the unenviable remedies available to a development manager after a late milestone is missed. Nearly every experienced software development manager has had to negotiate these options.

A couple of things are worth noting: Brooks' remedies go from bad to worse, and they are not mutually exclusive, but they are exhaustive. So, as a rule, the most damaging effect, "hasty design and incomplete testing," is typically part of the package.

The recent news abounds with a shining existence proof of Brooks' Law at work: Healthcare.gov. As of this writing, the Administration is pulling out all the stops and bringing in the 'best and brightest' to help resolve the problems. But, according to Brooks' Law, there will be months of delays and handwringing and, in the end, the remedies will be those plainly called out 40 years ago by the good Dr. Brooks.

Bear this in mind: the challenge of building Healthcare.gov pales in comparison to building a high-reliability, autonomous, affordable, flight-ground space system. After all, the architecture for web-based data systems, like Healthcare.gov, is well understood. I don't mean to trivialize the intellectual challenge of architecting a large system, but we have several decades of experience building variants of the client/server architecture. By comparison, we do not, as yet, have a proven, or even a viable, approach for building an affordable, high-reliability, autonomous, flight-ground space system.

If I learned anything during my years managing software efforts, it's just this: when you miss a milestone late in the schedule, it's too late. The fix must happen much earlier in the lifecycle. But what exactly should be fixed? What are the root causes of missing a late milestone?

"Ah..." you say, "simple enough?" Not at all. Determining true root causes is not so simple. No matter how urgent the need for a deep understanding, the bias of accepted fact will cloud our observations. Even when the stakes are high. Even when the survival of the human race hangs in the balance.

In October of 1347, a Genoese trading vessel pulled into the harbor of the Sicilian port of Messina. The ship was carrying a cargo from the Black Sea. The crew was dying with large black swellings around the armpits and groin; they were dying of the plague. This was the start of the European pandemic called the Black Death.

Victims of the disease suffered terribly. Most died within three to five days of showing the first symptoms. In some cases the sick went to bed healthy and died in their sleep--they were the most fortunate. While the suffering lasted, there was little to ameliorate the agony. The treatments included bloodletting and exotic medicines like powders made from stag horns, gold, pearls and emeralds. None helped.

Prevention was everywhere an urgent priority. Based on millennia of medical know-how going back to Hippocrates (460-377 BCE) and Galen (130-201 CE), medical experts understood that the disease was spread by corrupt air, or miasma. As the pestilence spread, physicians prescribed burning incense, smoking tobacco or carrying posies as a way to purify the air and stave off the disease. But the scourge spread unabated.

As the situation grew more critical, Philip VI,2 King of France, sent an urgent request to the medical faculty at the University of Paris for a report. The University of Paris was the leading academic institution of the day; these men were the best and the brightest. The subsequent report confirmed that the disease was spread by a miasma and, according to the medical theories of the day, identified an astral alignment as the event that triggered the miasma. They were very specific. The miasma was caused by the "conjunction of Saturn, Jupiter and Mars in the 40th degree of Aquarius said to have occurred on March 20, 1345."3 The report from the University of Paris was copied and circulated. It became the accepted scientific explanation across the Christian and Muslim worlds.

(Bill of Mortality)

But the ordinary person, being a devout Christian, was skeptical of the scientists. Most felt they were suffering from the wrath of God caused by the indulgences of the church and the sins of society. Popular movements sprang up to appease their maker. At first there were penitent processions. When those proved insufficient, there was a scramble to obtain sacred relics, by stealing if needed. Soon, marauding mobs of flagellants traveled from town to town fomenting a hysterical and desperate religious fervor. All this was accompanied by vicious pogroms against Jews, Muslims and any group who might be responsible for the divine wrath. Still the pestilence spread.

The Black Death would ravage Europe for nearly 50 years. By the end of the 14th century, it is estimated that 40-50 percent of Europe's population had died from plague. That wasn't the end of it.

In the winter of 1664 a comet appeared in the heavens portending another disaster. An epidemic broke out in London in the fall of 1664. By the following September, the London death rate was over 7,000 per week. By that time, there had been some scientific advances and physicians had come to believe the disease was transmitted by animals as well as miasma. During the height of the epidemic, the Mayor of London ordered that great bonfires be lit to cleanse the air and that all cats, dogs and pigeons be killed. Killing the cats would prove to be an error that worsened the epidemic.


(Comparison of the black rat and the brown rat)

Plague pandemics continued for another 500 years. In time it was learned that the disease was spread by fleas and rats--Rattus rattus, the common black rat. The microbe was initially spread by flea or rat bite in the form of the bubonic plague. Once a person was infected, the disease would spread as a highly contagious respiratory infection called pneumonic plague.

The actual plague bacterium was finally isolated in 1894.4 The first plague vaccine was tested in 1897. The identification of the bacillus and development of the vaccine were possible only after a series of hard-won, 19th century scientific breakthroughs. Development of the vaccine required identification of the bacillus. Identification of the bacillus was possible because, in 1870, contagious diseases were linked to bacteria.5 Bacteria became a respectable candidate for contagion only after the theory of spontaneous generation was dismissed and germ theory became widely accepted as the most likely means of contagion.6 Each breakthrough was resisted by the establishment of the day.

The history of plague prevention seems like a particularly compelling example. Here we have a dire need for a means of prevention. The survival of all that's near and dear is at stake and yet there is a glaring inability to put aside assumptions and examine the observable evidence that might have led to the discovery of the actual means of contagion.7

Shaking off assumptions is fundamentally hard. History is rife with tragic examples like those made by the physicians of the 14th century. It's our nature. We are reassured in the knowledge that we know how to do the right thing and that we do it. We are inclined to interpret our observations so they fit our accepted views. The result is an undeserved and complacent confidence that fills the mind with a false sense of reality.

So it is with large-scale software projects. I've often heard (and sometimes asserted) the claim that "...if only [thus-and-such] had been done, the project would have succeeded." Frankly these claims make me uneasy. They ring of naiveté or arrogance. They fail to acknowledge our history of failures. We still see the same botched commitments that Brooks described 40 years ago. In the simplest terms, there's a failure to recognize that the development approach for large software systems is not a settled matter.

The lessons of history suggest that, if we want a different result, we should be aggressive in challenging our assumptions. In other words, if we hope to build large-scale, software-intensive space systems, we need to question our approaches to budgeting, scheduling, requirement collection, design, and testing. No doubt this is a project for a generation or two.

But what the heck? Why not start now?

Here's a list of claims I frequently overheard during my years at NASA. For the most part, they were frustrated utterances of assumed immutable facts of life as a NASA developer. See if they don't start the gears turning about why software projects have faltered for the same reasons they did when Brooks managed the IBM 360 project in the mid-60s.
  • Projects overrun because the NASA business model encourages underbidding and the promise of unrealistic expectations.
  • Sponsors and upper management should not be exposed to development details even when those details drive cost and schedule.
  • Development methods that work on small projects scale up to meet the needs of large projects.
  • Software development and software maintenance are fundamentally the same activity and can be funded and managed the same.
  • Additional process obligations may be levied on teams without impacting cost and schedule because "they have to do that anyway."
  • Processes can be successfully deployed without tool support or field testing.
  • Review success depends on signaling a message of smooth sailing.
  • The org chart (and not the system function) drives system architecture.
Each assertion stirs a debate in my mind on the whys and wherefores. I'm quite sure there's a rat in each, but I'm not sure just where. I plan to kick around a few suggestions in subsequent posts.

Closing note:
It was never proved that astral events weren't a cause of contagion. As a measure of modest caution, I checked the internet to see if there was an upcoming conjunction of Jupiter, Saturn and Mars in Aquarius anytime soon. I'm glad to report there are only two conjunctions of Jupiter and Saturn in the next sixty years and neither is in Aquarius. I suppose that's one less thing to worry about.

1. Tuchman, B.W., "A Distant Mirror: The Calamitous 14th Century." Alfred A. Knopf. 1978.
2. Philip VI was known as the Fortunate. He was the first king of the House of Valois and reigned 1328-1350.
3. Tuchman, p. 107
4. This discovery occurred during an outbreak in Hong Kong that killed 50-100 thousand people. The plague bacillus, Yersinia pestis, was identified separately by two scientists, Alexandre Yersin and Kitasato Shibasaburō. The discoveries happened within days of each other. There has been a long-running controversy because Kitasato was slow to receive the credit he deserved.
5. Robert Koch was the first to link a microbe (anthrax) with a contagious disease and proved the germ theory.
6. Spontaneous generation postulated that some life, like fleas, could arise from inanimate matter. The theory was eventually disproved by Louis Pasteur with his "swan-neck flask" experiment.
7. On occasion, the actual cause was reported, but mostly ignored. For example, in 1498, a renowned physician reported the disease was "communicated by means of air breathed out and in." (Tuchman, p. 106)
