Wednesday, July 31, 2013

Sliding into the inferno

From Chapter 2: The Mythical Man-Month

Excerpt
...when schedule slippage is recognized, the natural (and traditional) response is to add manpower. Like dousing a fire with gasoline, this makes matters worse, much worse. More fire requires more gasoline, and thus begins a regenerative cycle which ends in disaster. (page 14)

The problem is that modern charcoal, manufactured under strict consumer-safety guidelines, is one of the least-flammable substances on Earth. (from "Need a light" by Dave Barry, The Miami Herald, June 25, 1995.)

In a volume full of indispensable insights about software development, this is perhaps Brooks' most important. When the schedule is slipping, adding staff will make things worse.

How do you know the schedule is slipping? It may not be obvious. Can you rely on reports from programmers? Maybe. How do you know that the separate dependent pieces of code will integrate when delivered? You don't!

Teams that use modern development processes have a better idea of how progress matches plan, but that knowledge is imperfect. In my experience, yesterday's imminent disaster is inevitably replaced with today's catastrophe and tomorrow's surprise.1 There's always something that may be nothing.

Since a reported slip will likely trigger 'help' from project management, a savvy NASA software manager will delay reporting a schedule problem until it is an inescapable certainty. Software is not unique in this regard; the same holds true for the other engineering disciplines. The first manager to report a slip does so at her own hazard. However, once the slip is official, everyone gets breathing room. The key is to hold on until someone else caves. The result: the schedule doesn't visibly slip as the project slides into trouble.

When the slip comes, it is expensive and consumes valuable political capital. At that point schedule margin is gone and the work must be accelerated. That's when gas gets poured on. But it's a different kind of fire than Brooks describes, one with more smoke and less heat. Programmers are rarely added. Here are a few of the obstacles a NASA software manager might confront:
  • You can't just drive over to Programmer Depot and find staff. The tools NASA uses are specialized.2
  • It takes a while to come to grips with the domain. You have to grok the existing code base, the hundreds of requirements and the stack of standards volumes from organizations like CCSDS. A programmer's ramp-up to productivity takes many months--too late to even consider hiring by the time the gas starts flowing.
  • The hiring bureaucracy is stubborn and slow. For example, it takes at least six weeks for a very pushy manager to bring in a contractor. The paperwork is daunting--the acquisition organization is a leviathan.
  • Lead developers are rarer than hen's teeth. The culture does not provide the opportunity for programmers of talent to acquire the necessary design or leadership skills. Worse, the good ones lose interest and move on to Google or Microsoft. (I hope to take up this topic when looking at Chapter 3, "The Surgical Team.")
Nonetheless, there is a staff-up. Instead of programmers, project management will add testers and systems engineers to close out 'problem paper.' There's a lot of problem paper. Management places a very high priority on closing out problem paper prior to launch. If there is a mission failure, open paper is evidence of negligent management. The closure process becomes a large, if short-lived, cottage industry.

The closure process is intensive and the closure team requires a lot of interaction with the developers. In effect, the development team is pulled into the closure process, and, aside from fixing a few of the most significant defects, development grinds to a halt.

However, all is not lost; the schedule can be met. Most of the remaining work will be postponed until the operations phase, where the development costs are hidden. These upgrades may continue for years. In the end, the team is left with a piecemeal system that is brittle and costly to maintain. But since there's no competition, what-you-get is what-you-get. This process is considered normal.

If you happen to be a forward-thinking, inspired software development person with a penchant for building the next generation of spacecraft, you've just slid into the inferno.

But...
What if there were actual competition where quality mattered? Would the result be different?



1 Engineers loathe the unexpected.
2 The shuttle engine software is written in Jovial.* The pool of skilled Jovial programmers dried up in the early 80's. Apparently, a Shuttle Engine programmer was required to train for 2 years before being allowed to touch the actual flight code.

*This footnote is incorrect. The shuttle software was written in HAL/S. I discovered the error while preparing a subsequent posting many months later (April '14). I had believed that shuttle software was written in Jovial since I'd heard that in briefings during the Constellation Program. Interestingly, Jovial was used in defense systems that were developed during this era. For example, the initial B-2 software was written in Jovial (see http://www.semdesigns.com/Products/Services/NorthropGrummanB2.html)

Saturday, July 27, 2013

And what would you like with that mythical man-month?

From Chapter 2: The Mythical Man-Month

Excerpt
....because we are uncertain of our estimates, software managers often lack the courteous stubbornness of Antoine's chef. (page 14)
Brooks has grabbed the colors of uncompromising quality from the chef at Antoine's, the famous restaurant in New Orleans. Brooks argues that software, like Oysters Rockefeller, Pompano en Papillote, Eggs Sardou or Pigeonneaux Paradis, can't be rushed. He conjectures that if software managers had reliable estimation methods, they, too, could argue that quality is worth the wait and confidently admonish the customers to cool their heels.

If he worked for NASA, Brooks would learn that even a 100%-certain estimate could not buy him schedule--even if he were as stubborn as a Tabasco stain on a white tablecloth. If he tried, he'd soon be in the scullery watching Ronald McDonald's apprentice running the kitchen. Why? The sponsors (i.e. customers) are using someone else's money to buy stuff whose quality is difficult to measure. What's more, both buyers and sellers operate in a bureaucratic context under the failure-is-not-an-option mandate.

Of course quality matters, but it is not the primary concern, especially at the beginning of the project when the plans are set. An experienced and dispassionately rational software manager in NASA knows that there will be a magic moment in the project when the management flinches and the credit card comes out. If we need Moët & Chandon Dom Pérignon, bring it on! Until that time, you'd better stick to even the most fantastic schedules and budgets or it'll soon be time to pass the toque.

There's hardly a better recipe for cooking up the mythical man-month. However, if you consider the NASA business model, these unrealistic schedule and budget plans make sense.
For example, the following factors might weigh on the mind of a NASA program or project manager:
  • Launch windows are based on physics. Missing a launch window is really expensive. For example, the 2-year slip of MSL cost hundreds of millions of dollars.
  • Government funding is fickle. What's provided today will be taken away tomorrow and restored the day after.
  • In the culture of consensus, missing a deadline or overrunning a budget will rarely cost a management job. Responsibility is shared.
  • Total cost of ownership is never a factor. There is always more money for software development after the spacecraft is launched.
  • Success in NASA is ostensibly determined by meeting science goals. However, as the government's most visible investment in the country's future, NASA provides an administration (arguably NASA's real customer) with one of the best sources of positive PR. Stage a good space drama with a happy ending, the customer is happy and the cost will be forgotten.

From a project management perspective, what's needed is the right estimate--the one that gets the job. If you get the job, the extra money will come. No amount of courteous stubbornness or better estimation will convince a project manager to accept a plan that kills a job--especially when the funding can be captured in the later stages of the project. Besides, NASA will always have money. If the budget balloons, the consequence is just the delay or cancellation of other projects. Life goes on.

As for being the chef, the most senior NASA software managers might be likened to a kitchen assistant. Until or unless the sponsor chooses a different objective, NASA will remain a science-focused, hardware-centric organization. Meanwhile, the chef is king in the kitchen; software does not control the menu.

And what, sir, would you like with that mythical man-month?

Monday, July 22, 2013

Is it done yet?

From Chapter 2: The Mythical Man-Month

Excerpt
...our estimating techniques fallaciously confuse effort with progress hiding the assumption that men and months are interchangeable. (page 14)

It's official now. The fallacy of confusing effort with progress has been deemed a good management practice. It's called "Earned Value Management" or EVM. EVM is now required for DoD and NASA projects. In fact, both DoD and NASA have entire divisions devoted to the practice and enforcement of EVM.

In a nutshell, EVM is a schedule- and cost-tracking technique that measures planned scope and planned cost against actual progress and actual cost. The overall scope of a project is called its planned value (PV). The PV is the sum of the PVs for the individual tasks that compose the project. Progress towards completion is called earned value (EV). EV is acquired according to management-defined earning rules. If a task is reported as 50% complete, it acquires half its EV. If that task represents 25% of the entire project, then the project acquires 12.5% EV. In other words, the project is 1/8th done. When the project EV equals 100% of the PV, the project is, by definition, done.
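The arithmetic above can be sketched in a few lines. This is only an illustration of the bookkeeping; the task values and percentages are the invented figures from the paragraph, not data from any real project.

```python
# Minimal sketch of EVM bookkeeping using the figures from the text.
# Each task is (planned value, reported fraction complete).

def project_ev(tasks):
    """Sum earned value across tasks: EV = planned value x fraction complete."""
    return sum(pv * fraction for pv, fraction in tasks)

# A project with total planned value (PV) of 100 units: one task holds
# 25% of the PV and is reported 50% complete; the rest has not started.
tasks = [(25.0, 0.5), (75.0, 0.0)]

pv = sum(pv for pv, _ in tasks)   # 100.0
ev = project_ev(tasks)            # 12.5
print(f"Project is {ev / pv:.1%} 'done'")  # 12.5%, i.e. 1/8th
```

By definition, when `ev` reaches `pv` the project is "done"--which is exactly the assumption the next paragraph questions.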

The EVM notion of progress is based on the tacit assumption that there is a fixed relationship between completed scope and budget.1 Consider what happens if there's a scope change. If there is more scope, have you suddenly earned less? If there's less scope, have you already earned more?

In essence, the EVM measure of progress begs the essential question: What does it mean to be done?

It so happens that determining when a software product is done is a really hard problem. Is it done when the requirements are met? What if the requirements change? Is it done when a version is released? What if the product is missing a key feature or has key defects? Is it done when the product goes into maintenance? How do you decide what's a fix and what's a feature? Is it done when the product is no longer in maintenance?2 How would you differentiate between done and dead? Or is it just done when the money runs out?

Good definitions of scope are usually expressed as requirements.3 During the planning phase of a task, a good software team will define the system requirements and then select a subset that fits schedule and budget as the definition of done.

Sound like a good basis for EVM? Not in my view. Here are a few reasons why:
  • Requirements change. Why? You get smarter as the system is built. Ambiguities need to be clarified. Design considerations drive compromise. Estimates are wrong. Priorities change. Customer needs change. Budgets change. Schedules change. Surprises are a fact of life.
  • Estimation of development effort based on requirements is very difficult to get right. (And, not only for the reasons described in the previous post called In the beginning....) Gauging the effort takes a strong intuition, deep technical knowledge and lots of practice.
  • Not all development methods are equal. 'The last 10% takes 90% of the time,' was the by-word when the waterfall development model was the standard. The newer iterative/incremental models, like Agile, provide a much better indication of project progress because the system is continually integrated and tested. Surprises come early; you know where you are in the schedule and can adapt accordingly. Unfortunately, that means the initial PV and the recorded EV are meaningless.
  • Earning rules that allow EV for partial progress invite schedule surprise. In my view, software is not done until it has been tested successfully--until that point, you don't really know how close a task is to completion. It's 100% or nothing. However, in an EVM-managed environment, if you don't provide an estimate of partial progress, you won't show any EVM progress until you are done. That doesn't look good to the bean counters. Consequently, almost everyone reports partial progress, leading to highly unreliable estimates of actual progress and last-minute schedule slips.
  • EVM inhibits needed schedule changes in response to the actual state of development. Scope change invalidates PV and previously reported EV. It also triggers EVM rework by the administrative team and unwelcome review by a harried project management weary of the surprises coming from the software development teams. Consequently, an experienced software development manager may maintain one schedule for EVM reporting and one schedule for tracking the real work. Evil lurks.
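The earning-rule problem in the bullets above can be illustrated numerically. This is an invented toy example, not real project data: five equal tasks report optimistic percent-complete figures, while a 0/100 rule (credit only after a successful test) tells a much bleaker story.

```python
# Toy comparison of two EVM earning rules over the same five tasks.
# Each task: (planned value, reported fraction complete, passed test?).
# All numbers are made up for illustration.
tasks = [
    (20, 0.9, False),
    (20, 0.8, False),
    (20, 1.0, True),
    (20, 0.5, False),
    (20, 0.0, False),
]

# Partial-credit rule: take the reported percentages at face value.
partial_credit_ev = sum(pv * fraction for pv, fraction, _ in tasks)

# 0/100 rule: a task earns its value only after testing succeeds.
zero_hundred_ev = sum(pv for pv, _, tested in tasks if tested)

print(partial_credit_ev, zero_hundred_ev)  # 64.0 vs 20
```

The partial-credit chart shows a project nearly two-thirds done; the tested-and-integrated view says one-fifth. The gap between the two is exactly the "last minute schedule slip" waiting to be discovered.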

From what I have observed, EVM is not just an innocuous drain of resources. A manager whose EV falls below the curve may lose funding. In a famous incident, the Navy's A-12 Avenger program was cancelled because EVM data showed the project was in trouble. Even worse are the mendacious survival behaviors that have evolved, and will continue to evolve, as EVM workarounds.

I am not advocating that software development be done without accountability, but EVM institutionalizes a technique that "fallaciously confuse(s) effort with progress." Ironically, progress can be accurately and intuitively understood by simply using iterative/incremental development methods--methods that don't fit well with EVM.

EVM is just one of the many new institutionalized obligations that have taken hold in the agency during the last decade. In an effort to develop policies that ensure efficiency and rigor, NASA has added cost and spawned management techniques that operate in the shadows.

1 Without this assumption, EVM would only report how much budget was left or how much time was left on the schedule.

2 Ironically, at that point the product has very little value because no sensible user would invest in an orphaned software product unless they were planning to continue development and maintenance on their own nickel.

3 Where requirements could be expressed as 'shall' statements, use cases, test cases, scenarios, story points or any form that describes what the software will do when it's done.

Wednesday, July 17, 2013

In the beginning... there is the estimate

From Chapter 2: The Mythical Man-Month

Excerpt
...our techniques of estimating are poorly developed. (page 14)

Adapted from "Creation" by Phillip Medhurst
In NASA, software managers obtain funding for software projects by writing proposals. The proposals are high-level descriptions of the task that include both cost and schedule estimates that will govern the job.1 These estimates are typically insufficient and will need augmentation. Enter the mythical man-month.

Underestimation is not a new problem. In chapter 8, 'Calling the Shot,' Brooks suggests a remedy. He outlines an engineering approach to estimation using a parametric cost model. A lot of progress has been made since then. We now have sophisticated analytic cost-estimation tools like COCOMO and SEER-SEM. Both are commonly used to gauge proposed NASA and DoD budgets. Both use parametric models based on software size and modifying parameters, and both consult a substantial dataset derived from 50 years of carefully scrubbed project data. For example, COCOMO allows the user to dial in parameters for the estimated lines of source code (often called SLOCs), required reliability, anticipated complexity, performance constraints, and team experience. You feed in the parameter values and the tool spits out a cost.
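To give a flavor of the parametric form, here is a sketch using the published Basic COCOMO equation (effort = a * KLOC^b) with Boehm's 1981 coefficients. This is a deliberate simplification: the real tools mentioned above (COCOMO II, SEER-SEM) layer many more cost drivers and calibrated datasets on top of this shape; the 50 KSLOC example is hypothetical.

```python
# Basic COCOMO (Boehm, 1981): effort in person-months = a * KLOC**b,
# where (a, b) depend on the project class. "Embedded" is the class
# closest to flight software: tight constraints, rigid requirements.
COEFFICIENTS = {
    "organic":       (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded":      (3.6, 1.20),
}

def basic_cocomo_effort(kloc, mode="embedded"):
    """Estimated effort in person-months for a project of `kloc` KSLOC."""
    a, b = COEFFICIENTS[mode]
    return a * kloc ** b

# A hypothetical 50 KSLOC embedded system:
print(round(basic_cocomo_effort(50)), "person-months")
```

Note the exponent greater than 1: effort grows faster than size, which is the model's way of admitting that men and months don't trade evenly. The cost drivers in the modern tools (reliability, complexity, team experience, reuse) multiply this base figure up or down.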

I've used COCOMO and SEER-SEM -- at least I've worked with an expert who knew how to properly set the parameters. The results were credible; they matched my thumb-jump estimates. And, since the models used historic project data, there was a plausible basis-of-estimate.

Sadly, the use of COCOMO was mere makework. In practice, the submitted estimates were based on institutional conventions that relied on in-house cost models. In my view, these in-house models codify the underestimation that characterized earlier projects.

Why would an organization adopt an approach that leads to chronic underestimation? There are more reasons than you can shake a stick at, but they are all the rational outcome of the cultural, financial and engineering forces native to a large government-funded bureaucracy. Here's an illustration of how those forces might work:
  • A proposal manager will strive to contain cost. Government cost guidelines or cost caps must be met or the proposal is a goner.
  • The proposed system will be assumed to have the same requirements as the last system. This will be perceived as cheaper and less risky.
  • The separate engineering disciplines will compete for a funding wedge in a zero-sum game. A favorite will emerge that adds science value.
  • A high degree of software inheritance will be assumed. Since nothing new is needed, the estimate becomes the cost of the last project.
  • Since the inherited software already exists, the proposed software costs will be expected to be cheaper than the last project cost.
  • The estimation process will repeat in a cost-cutting spiral until the engineering teams agree to a budget that is uncomfortable. The staff must be fed.
  • Cost models will be used, but only to demonstrate that a reduced cost is credible. (Skillful use of the parameters, particularly the reuse parameter, is valuable.)
  • Experience shows that approved government contracts come with a wink and a nod. The government sponsor is more likely to add budget late in the project when program success is on the line.
If the proposal wins, the events leading to the appearance of the mythical man-month might go like this:
  • The project manager keeps reserves as long as possible in the likely event of last minute cost surprises.
  • The project customers make changes that drive requirement changes. Some changes are important; some are frivolous. All must be addressed.
  • Salary and overhead costs will have increased since proposal submission. The budget remains fixed, so the work must be done by a smaller workforce.
  • Government cuts lead to project cuts. Then each discipline will fight to push the cuts elsewhere in the project. The management must account for the hard fact that without hardware there is no system. The choices are limited. Software and operations budgets are the best candidates.
  • The software for the new project will not be the same as the last project. The platforms will be different. The new requirements will impact the design. Additional resources are needed.
  • The schedule remains fixed to meet the launch date. The engineering teams do their best to make up the differences with long hours.
  • As the crisis becomes apparent, reserves become available.
  • Management invokes the obvious remedy: add staff.
I've heard different costing experts say that proposed estimates are typically off the actual costs by 50% or more.2 Is it any wonder? In my view, good cost estimation is a black art--the problem has too many unconstrained degrees of freedom. The bigger the project, the greater the cultural and financial forces, the more degrees of freedom, the blacker the art.

I have little faith that better tools will lead to better estimates. The tools merely serve the institution. However, available budgets would be allocated more effectively if project management operated as if the mythical man-month was not a myth.

Easier said than done.

1 In some sense, cost and schedule are two sides of the same coin. In practice they are not. Each estimate is constructed against different constraints.
2 The actual hours worked are not reported and the actual man-months are not known.

Thursday, July 11, 2013

Brooks' Law

Brooks's Law:
Adding manpower to a late software project makes it later (page 25)

From Chapter 2: "The Mythical Man-Month"

The law is pithy and clear; the graph illustrative. In the text, Brooks provides a cursory sketch of causes and some descriptive math to depict the relationship between labor and time. It's a tidy summary of what I've repeatedly experienced. It reaches into the heart of what traditionally has been called "the software crisis".1

There's no point in summarizing the chapter for the umpteenth time. The 'mythical man-month' has been a topic of research and discussion for decades. There are a zillion blogs and journal articles. Rather, I plan to focus on my experiences coping with budget and schedule constraints, the underlying reasons why they are what they are, and how I managed overruns and slips.

A personal note... I can't read the chapter without a tightening of the gut. I was no match for the forces that drive the law. As a colleague often suggested (quoting a poster from Despair.com), "Perhaps the purpose of your life is only to serve as a warning to others." Sadly, I was succored by the struggles around me; I saw little objective evidence that any of my colleagues or acquaintances were doing better. (However, the most successful were much better at managing perceptions--a topic for another day.)

I'll close this entry with an admonition for any software development manager with budget authority: you owe it to your team to read Chapter 2.

1 If you have a chance, I recommend also reading Dijkstra's 1972 Turing Lecture: "The Humble Programmer".

Tuesday, July 9, 2013

Entering the Pnyx

The Pnyx was the place in ancient Greece where Athenians gathered for popular assemblies. It is credited as one of the most important sites in the creation of Athenian democracy.

For the past few weeks I've been seeking feedback for the blog from about a dozen ex-colleagues. Since these postings refer to events as I recall them, it seemed prudent to get feedback from people who were also there. After all, like anyone, my memory is imperfect, my interpretations biased, my judgment flawed. If the postings are more like a lunatic ramble than a measured reflection, better to find out sooner than later.

Not surprisingly, not a single comment was entered in the blog. These people are busy. However, the responses in email, phone calls and during lunches were mostly positive. Some expressed concern about repercussions. Some felt there was much more to say on each topic. Some felt the focus on space systems was too limited. Others offered radically different (albeit thoughtful) interpretations. No one admonished me with alarmed concern about my mental health. On the contrary, all said there was a need for a forum to discuss software in NASA.

I figure this is as good a place to start as any. As of today, I opened up Reflecting on the 'Mythical Man-Month' in the Space Age to public access. Maybe the ecclesia will be there.

Next: Reflections on Chapter 2: The Mythical Man Month

Tuesday, July 2, 2013

Reflecting on the Tar Pit

From Chapter 1:


So far, this blog has just been focused on the familiar and weary tale of struggle in the Agency tar pit. As yet, no remedies or palliatives have been uttered. Nonetheless, our lofty goal is to suggest remedies or, at least, alternatives that might help the next guys build the large-scale, affordable, reliable software needed for the next generation space system.

It's only remotely possible that any suggestion made here would be of true or lasting merit. But there is a remote possibility--in fact, an inchoate idea or two is lurking in the dark recesses as a muddle. Nothing new. Nothing fancy. Nothing elaborate. Nothing easy. If the answers were easy, we would have already escaped the tar pit.

Before moving on, I'd encourage all to ask why so little has changed. And it's not just NASA.

I'm reminded of a snippet from a book Amory Lovins wrote in the '70s. He was urging the power utilities to consider conservation as an alternative to new power plants as a means of meeting projected energy needs (a radical idea at the time): "...Belief in economies of scale leads to ever larger and more centralized facilities... Such an energy future, in short, is merely the past writ large."

It seems we're facing the same mindset. Given the current course, we plan to meet our software engineering future by doing the same thing, only more of it. If that worked, would the Mythical Man-Month still be a relevant description of contemporary software engineering problems?

Lovins had a suggestion, a theory, a tractable alternative. I don't think the same could be said for any new approach that would facilitate the engineering of large-scale, affordable, reliable software for the next generation space system. Not yet anyway.

Monday, July 1, 2013

Producing yesterday's product today

From Chapter 1:

Excerpt
The last woe, and sometimes the last straw, is that the product over which one has labored so long appears to be obsolete upon (or before) completion. (page 9)

Brooks completes his list of "inherent woes" with the inevitable fact of obsolescence. While the developers were heads-down grinding out the software, the competition has hopscotched ahead with new features, and the platforms and tools have evolved to the next thing. It's just a fact that software is always out of date the moment it's completed. In the fast moving world of software development, nothing stings the programmer like hearing there's already something better out there and they are working in the past. In closing, Brooks offers a mature (but unsatisfying) managerial perspective: better is better, but the product has got to get out the door and provide "real solutions to real problems."

I have repeatedly experienced this very woe. The temptations are great. There's always a new shiny thing promised by Sun/Oracle, Microsoft, Apple or Wind River. And then there's that great idea the customer wants which could easily be crammed in (maybe without telling management.) Why not just get that in now? The pursuit of perfect software can become a race on a gerbil wheel. At some point a developer will weary and be done with the software, but the software is never done.

Perhaps it will be a surprise to learn that a NASA software engineer is hardly, if ever, plagued by obsolescence. Developing for a mission is a heads-down business with little time for the temptations of shiny new things. It's life in a technology bubble. Competition is not a factor--only a need to demonstrate unique requirements. The newest flight software runs on decade-old processors (rad-hard design rules), and flight-software design is governed by strict inheritance. For ground systems, maintenance is the watchword since improvements are prohibitively expensive due to brittle, legacy architectures. But, most importantly, there is a conventional wisdom that new is risky. (It's a convention that deserves scrutiny.) So, if you happen to be a developer working on mission software, there will be diminished prospects for meaningful innovation--obsolescence is a fact of life, not a problem.

In the "Aristocracy, Democracy and System Design" chapter, Brooks argues that programming is creative. In fact, programmers will create. That's what they do. So what happens when programmer creativity meets insularity and a risk-averse management? Programmers will create anyway, but in the context of this insular technical milieu. The result: there is a cadre of very talented people busily reinventing technology 5 years behind the curve--innovation on the trailing edge.

As a development manager, I often tried to advance the use of new technologies that would help us build better systems. Most attempts were thwarted for the reasons described. But what troubled me most was that my peers, who shared my frustrations, accepted the status quo as the way things would always be.