Tuesday, August 20, 2013

Decomposing Software

More on the Humpty-Dumpty Effect

From Chapter 2:

Excerpt
Men and months are interchangeable commodities only when a task can be partitioned among many workers with no communication among them (page 16)

From "On the Criteria To Be Used in Decomposing Systems into Modules"1

Excerpt
...it should be possible to study the system one module at a time. The whole system can therefore be better designed because it is better understood.
Resting place for collateral damage from decomposition

Brooks' Law describes how the number of intercommunication paths grows quadratically as staff is added. Each communication path adds overhead and eats away at productive programming time. Past a point, adding staff will slow progress. A schedule problem cannot be solved by simply increasing the number of man-months.

Note that these intercommunication paths imply that the system has been decomposed into understandable design chunks that can be developed independently.

By and large, decomposition is considered necessary and inherently good. Decomposition is a fundamental systems-engineering method.2 And, since Parnas1, software engineers have obligingly decomposed code into functional parts in pursuit of "modularity and information hiding." If there's any controversy about decomposition, it concerns type: functional vs. object-oriented, hierarchical vs. flat, efficient-and-fast vs. elegant-and-slow.

Perhaps a little controversy is warranted. It was during the Constellation program that I noticed the secondary effects of decomposition. These secondary effects may have far greater consequences than 'just' overrunning budget and schedule.

Consider this sketch of what happens when NASA builds a software system for a mission.

The work begins by splitting the system into flight and ground systems. The flight system might be decomposed into the following subsystems: Command and Data Handling, Attitude Control, Propulsion, Thermal, Power, Structural and Avionics. The ground system might be decomposed into the following subsystems: planning, commanding, space communication, telemetry, data management and science processing. Each function in each subsystem is processing data about the state of the system. In many, many cases the same underlying state is sensed, evaluated and represented by the different subsystems. For example, a voltmeter on the power bus will report changes in voltage, but the same changes will also be sensed (sometimes indirectly) by each device that uses power.

Furthermore, when the system is decomposed into subsystems, the physical relationships (i.e., the connections inherent in nature) are lost. For example, when Propulsion turns on a catbed heater, Power will register a voltage drop on the bus and Thermal will sense a temperature change in nearby structures. Recreating the relationships between states is necessary for effective fault management and autonomous flight. The current generation of spacecraft capture only the most rudimentary physical relationships. If conventional decomposition approaches were used, developing a system that replicated nature's links would be prohibitively expensive and risky.
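A toy sketch (in Python, with invented names and numbers) of the coupling described above: one physical action produces three independent observations, and nothing in the decomposed design records that they share a single cause.

```python
# Toy model of one physical event seen by three decomposed subsystems.
# All names and values are invented for illustration. Each subsystem
# senses only its own slice; the shared cause -- "the catbed heater
# turned on" -- is not represented anywhere in the decomposition.

class Spacecraft:
    def __init__(self):
        self.bus_voltage = 28.0   # volts, as Power sees it
        self.strut_temp = -40.0   # deg C, structure near the heater

    def catbed_heater_on(self):
        # One physical action...
        self.bus_voltage -= 0.7   # ...Power sees a voltage drop,
        self.strut_temp += 15.0   # ...Thermal sees nearby heating,
        return "heater on"        # ...Propulsion sees a commanded state.

sc = Spacecraft()
prop_view = sc.catbed_heater_on()   # Propulsion's view
power_view = sc.bus_voltage         # Power's view: 27.3 V
thermal_view = sc.strut_temp        # Thermal's view: -25.0 deg C
# Three telemetry points, one cause. Fault management must
# re-derive the link that the decomposition threw away.
```

The point of the sketch is that the link lives only in the physics; each subsystem team would have to rediscover and re-encode it on its own.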

A similar loss of physical relationships occurs in the data model when the end-to-end software in the flight/ground system is partitioned. Each subsystem team defines the domain-specific representations of the information it needs. As the data is passed around, it undergoes many transformations. In the transformation process, temporal and semantic information about the actual system state is lost, and the information must be cobbled together from the pieces.
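A toy illustration (invented names and fields) of that loss: each hand-off to a subsystem-specific representation strips context the receiving team doesn't need, until the original measurement can only be approximated from the pieces.

```python
# Illustrative only: each transformation drops temporal or semantic
# context from a sensor reading as it crosses subsystem boundaries.
from dataclasses import dataclass

@dataclass
class Measurement:            # what the sensor actually produced
    value: float              # bus voltage, volts
    time: float               # spacecraft clock, seconds
    source: str               # which sensor, on which bus

def to_power_record(m: Measurement) -> tuple:
    return (m.time, m.value)          # source identity lost

def to_display_value(rec: tuple) -> float:
    return round(rec[1], 1)           # timestamp lost, precision lost

m = Measurement(value=27.34, time=1234.5, source="bus_A/voltmeter_2")
final = to_display_value(to_power_record(m))
# 'final' is 27.3: reconstructing when and where this was measured
# now requires cobbling information back from other records.
```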

Why does this matter? Here are a few of the things I've seen:
  • Isolation of state information makes it impossible for anyone to know how all the parts will interact.
  • Unintended relationships are spread around the system like seeds at a watermelon stand. These relationships are very difficult to track down.
  • Change must be approached with great caution because code breaks in surprising places.
  • Testing becomes arbitrarily narrow, obvious and monstrously expensive. Thorough testing is simply infeasible.
  • Maintenance costs grow like kudzu and overwhelm any available funds that might be used for new approaches.
But here's the real gotcha: the effect decomposition has on the institution.

Just as the design is partitioned, the development work is split up and the funding divvied among organizations. These funds are the life blood of each organization, and managers will fight like the dickens to keep a task alive--even if the technology they own is moribund. What began as an engineering decision soon becomes a fixture in the political landscape that, once established, must be protected. And since the political divisions mirror the system decomposition, a new mission system will be architected along existing organizational lines; i.e., the organization becomes the system architecture--an ossified one at that.

Can you imagine what would happen if a methodology or technology came along that would perturb the subsystem boundaries? Not pretty. As the architecture becomes annealed to the organization, the organization becomes resistant to technology that threatens a charter. Likewise, criticisms or advances in technology that come from outside the organization must be defended against, since the claim of unique expertise is required for survival. (Otherwise, why not go elsewhere?)

Add it up. Once THE decomposition is in place, the software becomes brittle and expensive to change just as the organization's very survival starts depending on the status quo. The result is a fusion of system and institution that is highly resistant to change and nearly impermeable by new ideas. When creative ideas and developments appear, they tend to languish and die on the shelf--a damning claim for an organization like NASA that's chartered to be a world leader in technology.

I would be the last to advocate that we stop decomposing our systems. However, if there's a scintilla of truth to this posting, decomposition inhibits change. Reconciling the need to decompose with the need to change may present the next generation of software engineers with one of their greatest challenges.

1Parnas, D., "On the Criteria To Be Used in Decomposing Systems into Modules," Communications of the ACM, 1972.

2NASA Systems Engineering Handbook

Thursday, August 15, 2013

The Humpty-Dumpty Effect

From Chapter 2: The Mythical Man-Month

Excerpt
Hence the man-month as a unit for measuring the size of a job is a dangerous and deceptive myth... Men and months are interchangeable commodities only when a task can be partitioned among many workers with no communication among them (page 16)

From Computers in Spaceflight: The NASA Experience1

Excerpt
NASA lessened the difficulties by making several early decisions that were crucial for the program's success. NASA separated the software contract from the hardware contract, closely managed the contractors and their methods, chose a high-level language, and maintained conceptual integrity. (Chapter 4, page 114)

From Humpty Dumpty

Excerpt
All the king's horses and all the king's men, Couldn't put Humpty together again.

This is the second of the 'fallacious thought mode[s]' Brooks describes as a cause of common schedule "disasters." The mistake of equating "men and months" arises from the assumption that a man-month of effort is always equivalent to a man-month of progress. In other words, "men and months" are equivalent only when the addition of a team member has zero impact on the team. That almost never happens. Here's why:
  • Some tasks cannot be subdivided
    In a famous analogy, Brooks points out that nine women cannot have a baby in one month. After all, only one woman can contribute meaningfully to the effort. Similarly, adding programmers to an indivisible development task provides no benefit. For example, very bad things can happen when two architects are simultaneously providing design guidance or when two programmers are simultaneously changing the same code. Adding 'men' (or women) only makes things worse--much worse.
  • If a task is partitioned, there must be intercommunication between subtasks
    After the work has been divided, the separate parts will need to be assembled into a system--the parts must fit and be ready on time. Interfaces, integration and test strategies, progress against schedule, and defect repair all must be coordinated. This coordination requires "intercommunication" between the sub teams.

    Here's the rub: intercommunication effort must be added to the workload. Intercommunication is not cheap. In fact, it's very expensive. Each intercommunication path adds overhead according to Brooks' Law.
  • If 3 developers are added to a staff of 5, the number of interconnections nearly triples. If 5 developers are added, that number jumps to better than a factor of 4. Note that the additional effort to support the new developers can quickly overwhelm any benefit from adding staff to a team. And that's after the ramp-up, when everyone is trained!2
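The arithmetic behind those ratios is the pairwise-path count from Brooks' Law: a team of n members has n(n-1)/2 intercommunication paths. A quick sketch (Python, purely illustrative):

```python
# Brooks' Law: a team of n members has n*(n-1)/2 pairwise
# intercommunication paths, so overhead grows quadratically.

def paths(n: int) -> int:
    """Number of pairwise communication paths for a team of n."""
    return n * (n - 1) // 2

base = paths(5)   # staff of 5  -> 10 paths
big = paths(8)    # add 3       -> 28 paths, 2.8x ("nearly triples")
bigger = paths(10)  # add 5     -> 45 paths, 4.5x ("better than a factor of 4")
```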

For a very big system, like an end-to-end space system, the work must be subdivided. As a general practice in aerospace development, the subtasks will be allocated to teams across both intra- and inter-institutional boundaries. When that happens, the cost of intercommunication escalates.

Here are a few factors that might be added to Brooks' Law when subtasks cross institutional boundaries:
  • Intellectual property must be protected. Licensing, patent filing and technology reports must be completed prior to any collaboration.
  • Documentation must be approved. The information must be vetted by a document review group that typically has no background in any technical discipline.
  • Export concerns must be checked by an export control group who oversees compliance with ITAR and EAR. Depending on which export guru you get, each technical element may need to be categorized, and some team members may be required to obtain export licenses from the State Department.
  • Sharing a development and integration environment may require all the above steps plus a hole in a firewall. Firewall access often requires cooperation with the institutional security organization, whose primary concern is preventing access.
  • Each of the above steps must be reviewed and vetted by management--often several lines of management. In some cases exchanges will be vetoed because of political concerns driven by inter-center or intra-center competition.

In my experience, the impact of these additional factors dwarfs the impacts of Brooks' basic law.

Consider the early decision to split Shuttle hardware and software development as described by Tomayko1. The Shuttle decision was made in the 1970s, when the bureaucracy was not so onerous. Nonetheless, the complexity of the system led to considerable technical challenges (many due to limitations in the hardware architecture), a 10X cost overrun, and an ongoing maintenance cost of $100M per year just for the onboard software.3 While the job was too big for one organization, it seems debatable that separating hardware from software development contributed to Shuttle success. Substantial government financial support would seem a more likely explanation.

A similar decision to split development among institutions was made for the Constellation program. The effects were more devastating. In Constellation, the channels of intercommunication became cluttered; many team members spent most of their time on WebEx telecons. It became impossible to make any progress. As a matter of necessity, each group, in each center, in each project, flailed madly at its own subtask in order to make published deadlines. There was simply no time, budget, or organizational infrastructure to coordinate the hundreds of subtasks.

Intercommunication may not be the biggest challenge that results from partitioning work. During the decomposition process (a standard engineering process) the concept of the whole system is lost. When you decompose a complex software-intensive system into parts, it becomes extremely difficult to reason about how the parts will behave together. Our traditional systems-engineering methods do nothing to alleviate the technical surprises that lead to requirements creep and a daunting integration bow wave.

[Illustration: Peter Newell, Through the Looking-Glass and What Alice Found There, 1902, page 110]

Perhaps it is fortunate the Constellation program was cancelled before anyone seriously attempted system integration. In my view, it is doubtful that all the 'king's horses and king's men' could have assembled the planned pieces into a system. The effort was just too big, too complex and too dispersed.

Call it the Humpty-Dumpty effect.


1Tomayko, J., Computers in Spaceflight: The NASA Experience, Chapter Four-Computers in the Space Shuttle Avionics System-Developing software for the space shuttle. NASA contractor report, 1988.
2Brooks also says that training is needed, but I plan to discuss that in a later posting.
3Leveson, N., Software Challenge of Flight Control (chapter) in Shuttle Legacy: How We Did It/What We Learned, Launius, R. (editor), 2013

Saturday, August 10, 2013

Castles on the ground

From Chapter 2: The Mythical Man-Month

Excerpt
For the human makers of things, the incompletenesses (sic) and inconsistencies of our ideas become clear only during implementation. (page 15)

Brooks points out that since a programmer works with "thought stuff," and not physical things, there's a tendency to be optimistic1 about the effort required to deliver code. Physical things constrain a creation and keep the creator honest. For example, "Lumber splits. Paint smears." In other words, you cannot create a house from playing cards, but you can build castles in the air.2 Software can seemingly do anything that can be imagined. That's the risk. It's only in implementation that we can know what's truly possible.

From my perspective, Brooks is right as rain. It's only in the doing that the necessary discoveries occur. And since the activity happens in a government-sponsored, bureaucratic context, these discoveries are both technical and programmatic.3

Of course, planning must be done in advance of the discoveries. If you had to make a plan, you would probably weigh the following technical considerations:
  • Skill of the available programmers
  • Experience with the selected development tools
  • Familiarity with the development and deployment platforms
  • Familiarity with the application design
  • Adaptability of the architecture
  • Intelligibility of the code
However, if you want to build a castle on the ground, here's a few programmatic considerations to factor into the effort estimate:
  • Mismatch of available funds and customer needs
  • Delinquent or non-existent requirements
  • Staff availability
  • Onerous institutional process obligations
  • Meetings, meetings, and more meetings
  • Funding and proposal cycles that inopportunely draw off the key staff
  • Vacillating management support
  • Staff shuffling when the institution is between large projects
During my stint at NASA, I needed to progressively increase the schedule allotment for programmatics. The programmatic tax on a key developer can be as high as 50%. In other words, most of your best talent may be consumed supporting non-technical work. Ironic. But the programmatics are as real and consequential as the funding that supports the entire effort. Ignore the programmatics and you will underestimate. (Typically this cost must be hidden since it's "just part of the job." But I digress.)

Back to Brooks' assertion that the discovery is in the doing. Remember, all this discovery is happening on the clock. Decisions are made on the fly. An unwitting decision may (and often does) impact a generation of developers. Recognizing what is and what isn't an important design consideration is the realm of talent.

I have been most fortunate to have worked with a few programmers with the vision to see when the obvious led to trouble. Hiring one of these folks requires luck. No amount of punctuation or letters after a name indicates talent. Over the years I came to believe that hiring practices restricted to grads, especially recent grads, made it nearly impossible to find the best talent.

Experience is the best teacher and the best antidote to naïve optimism.

1I'm reminded of that old chestnut. What's the difference between an optimist and a pessimist? An optimist believes we live in the best of all possible worlds. A pessimist is sure of it.
2Another chestnut. What's the difference between a neurotic and a psychotic? A neurotic builds castles in the air. A psychotic lives in them.
3Programmatic refers to the activities required to obtain and keep government funding.

Wednesday, August 7, 2013

The zen of pragmatic realism

From Chapter 2: The Mythical Man-Month

Excerpt
first false assumption... is that all will go well, i.e., that each task will hike (sic) only as long as it "ought" to take.. (page 14)

This is the first of the 'fallacious thought mode[s]' Brooks describes to explain why schedule "disasters" are so common. He argues that programmers are inherently optimistic and will underestimate effort because they assume that a task will go well. He explains that this optimism is born of the idealistic view that there will be few difficulties--a view that is unjustified because programming, like all creative activity, must submit to an unforgiving "medium of execution." The difficulties can be overcome only with the hard work needed to ferret out the lurking details that cause most of the evils facing a development effort.


Brooks was writing at a time when most programmers were young, "younger than computers." But that was then. We're now two generations removed. Those young programmers of yore are now grey-headed veterans. Young programmers may still be optimistic, but those vets are not. Pessimistic is a more apt description. A few years' worth of 100-hour work weeks will do that.

"How long will that take?" I must have asked that question a zillion times. My mental model for the answer was a bi-modal distribution with a direct correlation between experience and conservatism. The more experience, the greater the influence of pessimism, the greater the estimate. In part that was because delivery commitments are negotiated, and everyone knew that the schedule would not grow until the project was in trouble. In part it was because you learn there's never enough time. If Brooks were to update this first fallacy, perhaps he might say we should weight effort estimates according to an individual's optimism/pessimism rating.

Brooks' "fallacious thought mode(s)" gets at something deeper--a fundamental pragmatic realism. There are non-technical forces that profoundly impact the development of software. For years I worked with a very talented engineer who just called it nature. In his view, nature was the cause of circumstances that were inevitable. We'd often sit in my office and lament that nature tended to thwart our view of the good. For example, when we'd learn that another round of scarce funds had been awarded to a buzzword proposal, he would say that people would rather take a pill than do the hard work to get healthy. That was nature. When we'd see a poorly conceived design lauded by a politically motivated manager, we'd think, "That's nature."

Here are a few examples of what happens in nature:
  • Programmers will program. If there are no requirements, they will make them up.
  • Every programmer develops his or her own concept of what and how things should be improved. Given the least opportunity, they will do it.
  • Programmers will want to use what they already know and will resist using new tools when faced with a deadline.
  • A baseless rumor can kill a good idea. A manager will be hard pressed to discern what's a baseless rumor and what's a legitimate technical concern.
  • A budget that is too large is just as bad as a budget that is too small. The additional resources open the door to unwanted improvements whose real cost is only discovered late in the schedule.
  • A manager without software development experience cannot be persuaded by an appeal based on the technical details of the software.
In time I grew to understand nature as the way things would happen despite the appearance that they could be otherwise. The causes didn't matter. What mattered was the simple acceptance of nature as the starting point on the road to meeting an objective.

To the best of my knowledge, nature, or whatever you might call it, has never been the subject of serious study. Yet it would seem that more than the lines of code, system complexity, or lack of physical constraints, these forces of nature do more to shape software disasters than any other cause.

Over the years I noticed that the institution rewarded those managers whose instincts were in harmony with nature. I never mastered nature's pragmatic realism or accepted it as ineluctable, probably because I felt a duty to do better. But then, I was never enlightened in the ways of a large, government-funded bureaucracy.

Friday, August 2, 2013

Stagnation through Competition


Continued from the previous post "Sliding into the inferno"

Excerpt
From a comment on In the beginning... there is the estimate Matt says:
"...way to compress [cost and schedule] is to deploy two development teams in competition with the same requirements"

In response to the July 19th posting, Matt suggested that competition might do away with "the faux rigor... [that keeps] the BOE hounds away."

I'm skeptical.

In its way, NASA is very competitive. Centers must compete for missions. Contractors1 compete on RFPs. The NASA engineering cadre competes with contractors to keep the work in house. Researchers from a broad community compete for scarce NASA research funds. JPL engineers pass Ames and Goddard engineers in the halls of JSC and KSC competing for some human flight business. The smaller NASA centers like Glenn, Langley, Stennis, IV&V and Dryden all compete for a piece of the shrinking NASA budget in an epic struggle to survive. All the while, JSC, MSFC and KSC sit atop the crowded human-flight hill in an uncomfortable alliance nervously watching the fracas on the Hill.

But competition does not a free market make.
  • Headquarters must keep all its center-children alive or incur congressional wrath.
  • Funds are allocated with all the hard-nosed selectivity of a jobs program.
  • Headquarters is highly attuned to influences on Congress, and the contractors have hired lobbyists to assist in the selection of this year's program.
  • Centers must be mindful of the perception of favoritism. Everyone at the contractual table gets a piece of the pie.
  • Contractors (i.e., the private sector) compete on price, ensuring the new thing will be the same as the last thing.
  • In-house engineers must match contractor prices, so they too budget to build the same thing they built last time.

It's a competition for influence. Funds are awarded in a politically charged, bureaucratic milieu that is rife with ambition. The actual factors in a decision may never be known, but every decision will be carefully justified; a paper trail will secure a record of sound judgment. How could it be otherwise? There are billions of dollars in play before the watchful eyes of powerful competing interests and a cadre of auditors ready to throw a flag.

The leveling forces of free-market competition do not work in this environment. Faux rigor is beneficial. It provides a sufficient foundation to ensure that the laws of 'business as usual' will produce a predictable outcome. And, if engineers like anything, they like a predictable outcome.

The consequence?

There is precious little incentive to break new ground. Rewards go to the proposals that preserve the past. Despite NASA's reputation as a technology leader, sellers cannot expect to win with a proposal that pushes technology boundaries and buyers cannot reasonably select one. If the past forecasts the future, we should expect NASA to continue to build the same basic system that was deployed in the Apollo era. The method of competition ensures it.

1For example, Lockheed, Boeing, Orbital, Raytheon, BAE, Ball, Northrop Grumman, etc.