Wednesday, July 31, 2013

Sliding into the inferno

From Chapter 2: The Mythical Man-Month

Excerpt
...when schedule slippage is recognized, the natural (and traditional) response is to add manpower. Like dousing a fire with gasoline, this makes matters worse, much worse. More fire requires more gasoline, and thus begins a regenerative cycle which ends in disaster. (page 14)

The problem is that modern charcoal, manufactured under strict consumer-safety guidelines, is one of the least-flammable substances on Earth. (from "Need a light" by Dave Barry, The Miami Herald, June 25, 1995.)

In a volume full of indispensable insights about software development, this is perhaps Brooks' most important. When the schedule is slipping, adding staff will make things worse.

How do you know the schedule is slipping? It may not be obvious. Can you rely on reports from programmers? Maybe. How do you know that the separate dependent pieces of code will integrate when delivered? You don't!

Teams that use modern development processes have a better idea of how progress matches plan, but that knowledge is imperfect. In my experience, yesterday's imminent disaster is inevitably replaced with today's catastrophe and tomorrow's surprise1. There's always something that may be nothing.

Since a reported slip will likely trigger 'help' from project management, a savvy NASA software manager will delay reporting schedule problems until they are an inescapable certainty. Software is not unique in this regard; the same holds true for the other engineering disciplines. The first manager to report a slip does so at her own hazard. However, once the slip is official, everyone gets breathing room. The key is to hold on until someone else caves. The result: the schedule doesn't visibly slip as the project slides into trouble.

When the slip comes, it is expensive and consumes valuable political capital. At that point schedule margin is gone and the work must be accelerated. That's when gas gets poured on. But it's a different kind of fire than Brooks describes, one with more smoke and less heat. Programmers are rarely added. Here are a few of the obstacles a NASA software manager might confront:
  • You can't just drive over to Programmer Depot and find staff. The tools NASA uses are specialized.2
  • It takes a while to come to grips with the domain. You have to grok the existing code base, the hundreds of requirements and the stack of standards volumes from organizations like CCSDS. A programmer's ramp-up to productivity takes many months--too late to even consider hiring by the time the gas starts flowing.
  • The hiring bureaucracy is stubborn and slow. For example, it takes at least 6 weeks for a very pushy manager to bring in a contractor. The paperwork is daunting--the acquisition organization is a leviathan.
  • Lead developers are rarer than hen's teeth. The culture does not provide the opportunity for programmers of talent to acquire the necessary design or leadership skills. Worse, the good ones lose interest and move on to Google or Microsoft. (I hope to take up this topic when looking at Chapter 3, "The Surgical Team.")
Nonetheless, there is a staff-up. Instead of programmers, project management will add testers and systems engineers to close out 'problem paper.' There's a lot of problem paper. Management places a very high priority on closing out problem paper prior to launch. If there is a mission failure, open paper is evidence of negligent management. The closure process becomes a large, if short-lived, cottage industry.

The closure process is intensive and the closure team requires a lot of interaction with the developers. In effect, the development team is pulled into the closure processes, and, aside from fixing a few of the most significant defects, development grinds to a halt.

However, all is not lost; the schedule can be met. Most of the remaining work will be postponed until the operations phase where the development costs are hidden. These upgrades may continue for years. In the end, the team is left with a piecemeal system that is brittle and costly to maintain. But since there's no competition, what-you-get is what-you-get. This process is considered normal.

If you happen to be a forward-thinking, inspired software development person with a penchant for building the next generation of spacecraft, you've just slid into the inferno.

But...
What if there was actual competition where quality mattered? Would the result be different?



1 Engineers loathe the unexpected.
2 The shuttle engine software is written in Jovial.* The pool of skilled Jovial programmers dried up in the early 80's. Apparently, a Shuttle Engine programmer was required to train for 2 years before being allowed to touch the actual flight code.

*This footnote is incorrect. The shuttle software was written in HAL/S. I discovered the error while preparing a subsequent posting many months later (April '14). I had believed that shuttle software was written in Jovial since I'd heard that in briefings during the Constellation Program. Interestingly, Jovial was used in defense systems that were developed during this era. For example, the initial B-2 software was written in Jovial (see http://www.semdesigns.com/Products/Services/NorthropGrummanB2.html).

1 comment:

  1. The following exchange occurred on Facebook. I've copied it here (with Ron's permission) because it might be of wider interest.

    Ron Dockal
    Good article. A better response to the situation is to have pre-defined "off ramps", i.e., content that can be completed within cost/schedule, that will achieve some of the goals/requirements, and use legacy capability from there to perform the mission. As the article says, during operations you can finish the project. The worst thing is to leave a project in a state that achieves nothing, while you throw good $ (programmers) after bad.

    Kenny Meyer
    The downside of doing development during flight is that you can get stuck with an architecture that is expensive to maintain. It can drain the coffers for decades. No money left for a fix. Any ideas for escaping the maintenance money pit?

    Ron Dockal
    Make sure that new code always ends up fully replacing legacy code, not just partially. Otherwise you end up sustaining both new and old, and your software sustaining costs just continue to increase. Painful lesson learned in my area. Also, watch out for FOSS. It is wonderful during development, but is turning out to be expensive to sustain. Finally, make each new project come in with a business case. The business case should be reviewed each year to make sure it is still a plus.

    Kenny Meyer
    I think the real challenge is that neither NASA software budgets nor schedules are realistic. The impact of politics and unnecessary risk aversion makes it all but impossible to do the very sensible things you suggest. In other words, if we had had the proper political and managerial support, we could replace the legacy code. If only...

    Ron Dockal
    Agree with your comments on the budget and schedule. The problem I keep seeing is that many folks have never had any training. Just guessing at the cost instead of developing a Basis of Estimate (BOE) is one of the problems.

    Kenny Meyer
    Question... What's FOSS?

    Ron Dockal
    FOSS is Free and Open Source Software. Many developers will find a piece of FOSS, download it, and integrate it. This is done instead of writing the custom code or buying a piece of COTS code. It gets the function quickly, but... Over time, there are security issues with the FOSS, upgrades, minimal support for it if you encounter problems later, multiple versions to sustain, and so forth.
