Thursday, August 15, 2013

The Humpty-Dumpty Effect

From Chapter 2: The Mythical Man-Month

Excerpt
Hence the man-month as a unit for measuring the size of a job is a dangerous and deceptive myth... Men and months are interchangeable commodities only when a task can be partitioned among many workers with no communication among them (page 16)

From Computers in Spaceflight: The NASA Experience [1]

Excerpt
NASA lessened the difficulties by making several early decisions that were crucial for the program's success. NASA separated the software contract from the hardware contract, closely managed the contractors and their methods, chose a high-level language, and maintained conceptual integrity. (Chapter 4, page 114)

From Humpty Dumpty

Excerpt
All the king's horses and all the king's men, Couldn't put Humpty together again.
This is the second of the 'fallacious thought mode[s]' Brooks describes as a cause of common schedule "disasters." The mistake of equating "men and months" arises from the assumption that a man-month of effort is always equivalent to a man-month of progress. In other words, "men and months" are interchangeable only when the addition of a team member has zero impact on the rest of the team. That almost never happens. Here's why:
  • Some tasks cannot be subdivided
    In a famous analogy, Brooks points out that nine women cannot have a baby in one month. Only one woman can contribute meaningfully to the effort. Similarly, adding programmers to an indivisible development task provides no benefit. For example, very bad things can happen when two architects simultaneously provide design guidance or when two programmers simultaneously change the same code. Adding 'men' (or women) only makes things worse, much worse.
  • If a task is partitioned, there must be intercommunication between subtasks
    After the work has been divided, the separate parts must be assembled into a system: the parts must fit and be ready on time. Interfaces, integration and test strategies, progress against schedule, and defect repair all must be coordinated. This coordination requires "intercommunication" among the subteams.

    Here's the rub: intercommunication effort must be added to the workload, and intercommunication is not cheap. In fact, it is very expensive. Each intercommunication path adds overhead, and, as Brooks observes, when every part of a task must be coordinated with every other part, a team of n workers has n(n-1)/2 paths. Overhead therefore grows roughly with the square of team size.
  • If 3 developers are added to a staff of 5, the number of interconnections nearly triples (from 10 to 28). If 5 developers are added, that number jumps by better than a factor of 4 (from 10 to 45). Note that the additional effort to support the new developers can quickly overwhelm any benefit from adding staff to a team. And that's after the ramp-up, when everyone is trained! [2]
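The figures in the last bullet follow from Brooks' pairwise count, n(n-1)/2. Here is a minimal Python sketch that reproduces them, assuming every developer must coordinate with every other developer; the function name `communication_paths` is hypothetical and used only for illustration.

```python
def communication_paths(n: int) -> int:
    """Pairwise intercommunication paths among n workers: n(n-1)/2.

    Assumes (as in Brooks' worst case) that everyone must coordinate
    with everyone else.
    """
    return n * (n - 1) // 2


base = 5
base_paths = communication_paths(base)  # 10 paths for the original staff

for added in (3, 5):
    team = base + added
    paths = communication_paths(team)
    # Growth factor relative to the original team's path count.
    print(f"{base} + {added} developers: {paths} paths, "
          f"{paths / base_paths:.1f}x the original {base_paths}")
```

Running it prints 28 paths (2.8x) for a team of 8 and 45 paths (4.5x) for a team of 10. Staff grew by only 60% and 100%, respectively, which is why the coordination load outpaces the added capacity.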

For a very big system, like an end-to-end space system, the work must be subdivided. As a general practice in aerospace development, subtasks are allocated to teams across both intra- and inter-institutional boundaries. When that happens, the cost of intercommunication escalates.

Here are a few factors that might be added to Brooks' Law when subtasks cross institutional boundaries:
  • Intellectual property must be protected. Licensing, patent filings and technology reports must be completed prior to any collaboration.
  • Documentation must be approved. The information must be vetted by a document review group that typically has no background in any technical discipline.
  • Export concerns must be checked by an export control group that oversees compliance with ITAR and EAR. Depending on which export guru you get, each technical element may need to be categorized, and some team members may be required to obtain export licenses from the State Department.
  • Sharing a development and integration environment may require all the above steps plus a hole in a firewall. Firewall access often requires cooperation with the institutional security organization, whose primary concern is preventing access.
  • Each of the above steps must be reviewed and vetted by management, often by several lines of management. In some cases exchanges will be vetoed because of political concerns driven by inter-center or intra-center competition.

In my experience, the impact of these additional factors dwarfs the impacts of Brooks' basic law.

Consider the early decision to split Shuttle hardware and software development, as described by Tomayko [1]. The Shuttle decision was made in the 1970s, when the bureaucracy was not so onerous. Nonetheless, the complexity of the system led to considerable technical challenges (many due to limitations in the hardware architecture), a 10X cost overrun, and an ongoing maintenance cost of $100M per year just for the onboard software [3]. While the job was too big for one organization, it seems debatable that separating hardware from software development contributed to the Shuttle's success. Substantial government financial support would seem a more likely explanation.

A similar decision to split development among institutions was made for the Constellation program. The effects were more devastating. In Constellation, the channels of intercommunication became cluttered; many team members spent most of their time on WebEx telecons. It became impossible to make any progress. As a matter of necessity, each group, in each center, in each project, flailed madly at its own subtask in order to make published deadlines. There was simply not enough time, budget or organizational infrastructure to coordinate the hundreds of subtasks.

Intercommunication may not be the biggest challenge that results from partitioning work. During the decomposition process (a standard engineering process), the concept of the whole system is lost. When you decompose a complex software-intensive system into parts, it becomes extremely difficult to reason about how the parts will behave together. Our traditional systems engineering methods do nothing to alleviate the technical surprises that lead to requirements creep and a daunting integration bow wave.

[Illustration: Peter Newell, Through the Looking-Glass and What Alice Found There, 1902, page 110]

Perhaps it is fortunate the Constellation program was cancelled before anyone seriously attempted system integration. In my view, it is doubtful that all the 'king's horses and king's men' could have assembled the planned pieces into a system. The effort was just too big, too complex and too dispersed.

Call it the Humpty-Dumpty effect.


[1] Tomayko, J., Computers in Spaceflight: The NASA Experience, Chapter Four, "Computers in the Space Shuttle Avionics System," section "Developing Software for the Space Shuttle." NASA Contractor Report, 1988.
[2] Brooks also says that training is needed, but I plan to discuss that in a later posting.
[3] Leveson, N., "Software and the Challenge of Flight Control" (chapter), in Space Shuttle Legacy: How We Did It and What We Learned, Launius, R. (editor), 2013.
