Thursday, September 12, 2013

Orders of Complexity

From Chapter 2: The Mythical Man-Month

Excerpt
...testing is usually the most mis-scheduled part of programming... Failure to allow enough time for system test, in particular, is peculiarly disastrous. Since the delay comes at the end of the schedule, no one is aware of schedule trouble until almost the delivery date. Bad news, late and without warning, is unsettling to customers and to managers. (page 20)

From A Spiral Model of Software Development and Enhancement1

"Stop the life cycle-! want to get off!"
"Life-cycle Concept Considered Harmful."
"The waterfall model is dead."
"No, it isn't, but it should be."
On to Brooks' third, and last, 'fallacious thought mode': unwarranted optimism leads to inadequate test schedules.

As a remedy he suggests the development schedule be allocated as follows:
  • Planning: ~30% of the schedule
  • Programming: ~20%
  • Testing: ~50%
This may be one area where Brooks' commentary is dated, or at least peculiar to the development of a new computing platform. Three objections come to mind: 1) the implied use of the waterfall development model, 2) the suggested length of the schedule, and 3) underestimation of the testing challenge. I'll try to tackle each in turn.

Waterfall development

The idea behind the waterfall model is simple. First you do analysis (i.e., requirements), then design, then implementation, then test. After that you deploy and keep the system alive and healthy with maintenance.

On the surface, the waterfall model is very desirable. It's easy to comprehend. Budgeting and scheduling are straightforward. Progress is obvious. Best of all, the approach lends itself to the currently fashionable, CMMI-fueled metrics craze--especially EVM. (See Is it done yet?)

The only problem with the waterfall model is that it doesn't work. Software development simply does not function that way. There's too much to discover in the doing. It was the waterfall model that gave rise to that famously captious platitude, "The last 10% of the schedule takes 90% of the time."2

In 1986, Barry Boehm published a highly influential article1 that suggested a spiral model for software development.3 In a nutshell, you might think of the spiral as a bunch of mini-waterfalls wrapped in a waterfall. The point is that analysis, design, implementation, and test are integrated in a repeating cycle. This allows the inevitable discoveries to be incorporated into the development process. Boehm's article changed things forever in the software community--I do not personally know a skilled software developer who currently prefers the waterfall model to one of the spiral models.

From a budgeting and scheduling perspective, spiral models do not isolate test as a distinct phase. A good team will incorporate testing into every part of the design and implementation effort, as well as a dedicated test phase prior to delivery. Testing is just part of the development mélange. There will not be a testing milestone that carves out 50% of the schedule. Developers who work for managers who insist on milestones representing a 50% test schedule will need to keep up their handwaving exercises.

Schedule length
Starting and completing a development cycle is expensive. A good programming team will develop momentum, so it is advisable to provide a programming phase that gives the team a chance to work up a head of steam and make substantial progress. However, the programming phase (and ultimate product release) should not be so long that customer requirements change or sponsor interest waffles. For a major release of an infrastructure or embedded application, I found a 16-week programming phase to be the sweet spot for a 6-month release cycle.4

For the sake of argument, assume a 16-week programming phase. Using Brooks' recommendations, the breakdown would look like this: 24 weeks of planning, 16 weeks of development and 40 weeks of testing. That would be an overall schedule of 80 weeks! Heck, the Empire State Building was built in less time than that. If my experience is indicative, a development effort that goes a year and a half without a major release is doomed. Why? Platforms change, security requirements change, customer requirements change and, most importantly, the sponsor's priorities change. In other words, a schedule based on the '50% test rule' is unrealistic.
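
For anyone who wants the arithmetic spelled out, here is a minimal sketch in Python. The 30/20/50 percentages and the 16-week programming phase come from above; the function itself is just my own illustration:

    def schedule_from_programming_weeks(programming_weeks=16,
                                        planning_frac=0.30,
                                        programming_frac=0.20,
                                        testing_frac=0.50):
        """Back out the full schedule from the programming phase length,
        assuming the 30/20/50 allocation discussed above."""
        total = programming_weeks / programming_frac      # 16 / 0.20 = 80 weeks
        return {"planning": total * planning_frac,        # 24 weeks
                "programming": programming_weeks,         # 16 weeks
                "testing": total * testing_frac,          # 40 weeks
                "total": total}                           # 80 weeks

    print(schedule_from_programming_weeks())
    # {'planning': 24.0, 'programming': 16, 'testing': 40.0, 'total': 80.0}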

Testing Challenge
To first order, there are two types of testing: tests that check if a product meets requirements and tests that check if the product performs correctly. The former is known as validation; the latter is known as verification (V&V in the common parlance). I do not know of a single case where a non-trivial software product was completely validated or verified.

Consider what's entailed.

Validation requires the testers to develop tests that prove a requirement is met. For example, a typical requirement may say that 'the system shall authenticate a user.' Simple enough, but imagine all the unstated possibilities. What institutional infrastructures should be supported? Is it authentication using LDAP? Active Directory? A custom adaptation layer? In theory, functional requirements and design specifications will spell out all the ambiguities; in practice they never will. As a result, the process of test design is interpretive and raises two key questions: Is the interpretation correct? What is omitted?
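
To make the fan-out concrete, here is a hypothetical sketch of what a validation suite for that single requirement might start to look like. The authenticate() stub, the backend names, and the test cases are all invented for illustration; they are not drawn from any real product:

    import unittest

    # Hypothetical sketch: the authenticate() stub and the backend names are invented
    # for illustration. The point is how one requirement fans out into many
    # interpretation-dependent test cases.

    _FAKE_DIRECTORY = {("ldap", "alice"): "correct-password",
                       ("active_directory", "alice"): "correct-password"}

    def authenticate(user, password, backend):
        """Stand-in for the product's real authentication entry point."""
        return _FAKE_DIRECTORY.get((backend, user)) == password

    class TestAuthenticationRequirement(unittest.TestCase):
        def test_valid_user_via_ldap(self):
            self.assertTrue(authenticate("alice", "correct-password", backend="ldap"))

        def test_valid_user_via_active_directory(self):
            self.assertTrue(authenticate("alice", "correct-password", backend="active_directory"))

        def test_wrong_password_is_rejected(self):
            self.assertFalse(authenticate("alice", "wrong-password", backend="ldap"))

        # ...and the list keeps growing: locked accounts, expired credentials, the
        # custom adaptation layer, network outages. None of these appear in the
        # one-line requirement.

    if __name__ == "__main__":
        unittest.main()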

There's a greater test challenge. The tester is asked to determine if the program can enter an illegal state. From a purely software perspective, the test might check to see if the program runs out of resources, corrupts memory, or violates a process deadline5. Checking all the possible system states is simply impossible with current practice.
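
A back-of-the-envelope calculation suggests why exhaustive state checking is hopeless. The numbers below (64 boolean state variables, a million tests per second) are assumptions of mine, chosen only to show the scale of the problem:

    # Illustrative numbers only (my assumptions, not measurements): a modest system
    # with 64 boolean state variables, tested at a generous one million states per second.
    state_bits = 64
    states = 2 ** state_bits                     # ~1.84e19 distinct states
    tests_per_second = 1_000_000
    seconds_per_year = 60 * 60 * 24 * 365

    years_to_cover = states / tests_per_second / seconds_per_year
    print(f"{states:.2e} states, roughly {years_to_cover:,.0f} years to enumerate them")
    # prints: 1.84e+19 states, roughly 584,942 years to enumerate them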

For space systems, there is a much greater challenge: the tester must check that the software will never put the entire system into an illegal physical state, i.e., a state that causes mission failure. For example, the tests should check that the software will not allow a power failure, uncontrolled tumbling of the spacecraft, or the ramming of an instrument platform into a solar array. There are a very large number of permutations in the antecedent conditions that might lead to an illegal system state. If that weren't a big enough challenge, the space system must be tested in simulated conditions, and the simulations themselves are subject to the same test challenges.
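
As a rough illustration, here is a hypothetical sketch of an invariant monitor run against simulated telemetry. The field names, thresholds, and scenario interface are my own inventions, not drawn from any particular mission or simulator:

    # Hypothetical invariant checks; the telemetry fields and limits are illustrative only.
    def check_invariants(telemetry):
        """Return a list of illegal-state violations for one simulated time step."""
        violations = []
        if telemetry["bus_power_w"] <= 0.0:                    # assumed power invariant
            violations.append("power failure")
        if abs(telemetry["tumble_rate_dps"]) > 2.0:            # assumed rate limit, deg/sec
            violations.append("uncontrolled tumble")
        if telemetry["platform_to_array_clearance_m"] < 0.1:   # assumed keep-out distance
            violations.append("instrument platform too close to solar array")
        return violations

    def run_scenario(simulated_steps):
        """simulated_steps: an iterable of telemetry dicts produced by the simulator."""
        for step, telemetry in enumerate(simulated_steps):
            problems = check_invariants(telemetry)
            assert not problems, f"illegal system state at step {step}: {problems}"

    # Each scenario exercises one narrow slice of the antecedent conditions; the
    # combinatorics of everything that could precede a bad state go untested.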

In other words, no matter what the schedule allocation, completely testing a space system is impossible. As a practical matter, testers focus on a few well-defined operational scenarios and ensure the system works in just those limited conditions. Given the highly constrained NASA software budgets, testing is really just a best effort.

For the current generation of space systems, our development methods, scheduling and test practices are good enough. But these practices severely limit what can be accomplished. Autonomy is primitive. Operations are expensive. Reliability is suspect. However, if we are going to build smart systems with capabilities that can accomplish complex operational goals, we will need to develop methods for handling additional orders of complexity.


1. Boehm, B. "A Spiral Model of Software Development and Enhancement." ACM SIGSOFT Software Engineering Notes, Volume 11, Issue 4, August 1986, pp. 14-24.

2. Sometimes known as the ninety-ninety rule.

3. 'Spiral' was just the start. Since then there has been a string of refinements: Objectory, Rational Unified Process, Team Software Process, Extreme Programming and Agile. Each with a different emphasis; each offering a panacea. Over the years, I never met an experienced developer who preferred the waterfall to these later developments.

4. Shorter release cycles are very desirable for minor upgrades on well-understood systems. However, if system failure can cause mission failure, more rigor is required. A topic for another day.

5. For a real-time system, meeting a process deadline is necessary for program correctness.
