Friday, September 12, 2014

I will return

The Sierra hiking season is over, for me at least. I'm not one of those hardy souls who think little of marching over the passes in all conditions. It's good to know your limitations.

I'm looking forward to wending my way through "Aristocracy, Democracy, and System Design," the next chapter of the M-MM, but not before tackling a couple of other projects, including a blog about this summer's Sierra hiking adventures. Till then...

Tuesday, July 1, 2014

Smart guys

From Chapter 3: The Surgical Team

Excerpt

The success of the scaling-up process depends upon the fact that the conceptual integrity of each piece has been radically improved — that the number of minds determining the design has been divided by seven. So it is possible to put 200 people on a problem and face the problem of coordinating only 20 minds, those of the surgeons...Let it suffice here to say that the entire system also must have conceptual integrity, and that requires a system architect to design it all, from the top down.
(Ch 3. p. 36-7)

...Conceptual integrity in turn dictates that the design must proceed from one mind, or from a very small number of agreeing resonant minds. (Ch 4. p. 44)

From The Republic1

Excerpt
...neither cities nor States nor individuals will ever attain perfection until the small class of philosophers...are providentially compelled, whether they will or not, to take care of the State, and until a like necessity be laid on the State to obey them...
— Plato, around 380 BC, Book VI. (499b)
2
Building a large-scale software system, like a spacecraft, requires a lot of people. That's a problem. According to Brooks' Law, the number of communication paths grows with the square of the number of programmers. Hence the project manager's grand challenge: ensure that hundreds of engineers from different disciplines are all working with the same principal goals in mind. Good luck. Nothing is more dismaying than another-vague-management-mandate while the clock is ticking and the schedule is evaporating. Fact is, most engineers don't grok the deep concerns outside of their domain and don't care so long as they have sufficient guidance and schedule to meet their deadlines.

Brooks has a plan to remedy the imminent communication ills predicted by Brooks' Law: put a smart guy in charge. (See Surgical Team.) Why? Because the smart guy becomes the sole decision maker and the system becomes the "product of one mind." Except for the odd case of a multiple personality, this simple reduction declaws the exponent.
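The arithmetic behind the remedy is easy to check. Brooks puts the number of intercommunication paths among n people at n(n-1)/2, which grows with the square of n. A minimal Python sketch, assuming every pair talks directly (the function name `communication_paths` is just an illustrative label), compares one surgical team, the 20 surgeons, and the 200 people in the Chapter 3 excerpt:

```python
def communication_paths(n: int) -> int:
    """Pairwise channels among n people when every pair talks directly: n(n-1)/2."""
    return n * (n - 1) // 2


# 10 = one surgical team, 20 = the surgeons, 200 = Brooks' scaled-up head count.
for people in (10, 20, 200):
    print(f"{people:>4} people -> {communication_paths(people):>6,} paths")
```

Collapsing 200 voices into 20 takes the count from 19,900 paths to 190, roughly a hundredfold cut, which is the reduction Brooks is counting on.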

There's a long-standing precedent for putting a smart guy in charge. Ancient Greece had its Archons3 and the Roman Republic had its Dictators4. Consolidation of power has its place; it also has its risks.

Plato witnessed consolidation of power at its worst. He was a student of Socrates during the rule of the 30. The 30 were a group of aristocrats installed by the Spartan general Lysander after he defeated the Athenian navy and forced Athens to surrender (404 BC), a defeat that finally ended the Peloponnesian War. The 30 were charged with dismantling the Athenian democracy and replacing it with an oligarchy. They were brutal, merciless and corrupt. They confiscated property, exiled opponents and executed others. By some accounts the 30 executed 5% of the Athenian population.

The 30 lasted but a year. Democracy was restored to Athens (403 BC), but there were scores to settle. No one was immune. Socrates, who had quarreled with nearly everyone, had his enemies5. Every Athenian citizen had the right to initiate criminal proceedings. A poet named Meletus brought the charge of 'corrupting youth' against Socrates. Socrates would be tried by a jury of 500 farmers and receive a death sentence.

Title page from the oldest manuscript of The Republic (Codex Parisinus graecus 1807)
These tragic events would stick with Plato. Twenty years later, he would write The Republic, his sweeping treatise describing a just and virtuous utopian state ruled by a philosopher-king who "loved truth above all things."
Until philosophers are kings, or the kings and princes of this world have the spirit and power of philosophy, and political greatness and wisdom meet in one, and those commoner natures who pursue either to the exclusion of the other are compelled to stand aside, cities will never have rest from their evils, nor the human race, as I believe, and then only will this our State have a possibility of life and behold the light of day. (Plato, The Republic, Book V, 473c)

Plato saw these philosopher kings as the product of a new, improved society. They would be the offspring of carefully selected parents. They would be sent away from their families at an early age to be rigorously educated in all the arts and sciences. They would learn to resist temptation and stand firm. They would hold military office. They would serve out of duty, not glory, and distinguish themselves in all things. They would be fully dedicated to the well-being of the state and the public good. Above all, they would be guided by philosophy. If they should survive to 50, they would be ready to govern.

The idea of the philosopher king has animated the thinking of political theorists from Cicero to Nietzsche. For good reason. Concentrated authority tends to be efficient, effective, focused. Dispersed authority tends to vacillate and respond to crises without conviction.

That's the theory. In practice, utopian-minded governments tend to be the cruelest of the cruel. A lineup of 20th-century tyrants could man a parade of evils. Back in the days before political correctness, when I studied the Republic, philosopher kings were out of favor. Oh yes, there was the amazing Marcus Aurelius, but his accomplishments pale in comparison with idealist tyrants like Franco, Lenin and Mao. I was coached in the admonitions of Karl Popper.

Consider this passage:
Karl Popper, Viennese Philosopher (1902-1994)

From The Open Society and Its Enemies

Excerpt
What a monument of human smallness is this idea of the philosopher king. What a contrast between it and the simplicity and humaneness of Socrates, who warned the statesman against the danger of being dazzled by his own power, excellence, and wisdom, and who tried to teach him what matters most — that we are all frail human beings. What a decline from ...Plato's kingdom of the sage whose magical powers raise him high above ordinary men; although not quite high enough to forgo the use of lies, or to neglect the sorry trade of every shaman — the selling of spells, of breeding spells, in exchange for power over his fellow-men. (Karl Popper, Chapter 8, "The Philosopher King.")

So why all this high-minded Plato and Popper? They point to a signal failure in Brooks' remedy. He grossly underestimates the challenge of preparing and selecting the right sort of person for a position of leadership. Of the dozens of engineering leaders I met and worked for at NASA, I can count on one hand, with a couple of fingers left over, those who had the wisdom and maturity for the positions they held. But that will always be the case when the politics of "selling spells" governs the allocation of authority.

So what then might we hope for in chiefs equipped with the magical powers of enlightened leadership and worthy of the surgeon's scrub? Here's my short list:
  • They are versed in history and philosophy. They've internalized the biographies of famous leaders. (OK, this is a strong personal bias.)
  • They trust their team.
  • They operate from a deeply understood set of first principles. They do not have to look in the book to decide what they think.
  • They have an intuition about the implications of a decision without requiring a detailed understanding.
  • They are not constrained by convention.
  • They muck around in the details selectively, preferring a small set of key concepts to the scrutiny of minutiae.
  • They inspire the best talent.
  • They create an atmosphere of the possible, not the impossible.

It's a tall order, but people of this caliber exist. I've met them. They worked on DoD-related projects. But the DoD culture has an element of candor; lives depend on it. NASA, on the other hand, is an underfunded bureaucracy with a 'failure is not an option' ethos. Risk takers and free thinkers are dangerous to a fragile status quo, and they don't thrive at NASA. Over the years I saw the most talented engineers pushed to the sidelines while the competent-but-unimaginative and politically-focused soared to positions of technical leadership. Plato's suggestion that "commoner natures...stand aside" is long lost to history.

I prefer to think of it this way... It's NASA! Isn't NASA the very emblem of what's good and forward-thinking in the US? Is it out of line to have aspirations for the US space agency that run contrary to those of a conventional bureaucracy?

While there are no utopias, there are times, accidents really, when talented and right-minded people are elevated to positions of authority. Perhaps we should take a page from Plato and contemplate how we would train leaders of this stature. DoD trains soldiers to be leaders. NASA trains engineers to be bureaucrats. Universities teach business management skills but don't bother with technical leadership. Perhaps it's time.

Before heading off to the next chapter (after a lengthy interlude traipsing in the Sierras), I've got a final knock on Brooks' assertion that the Surgical Team enables scaling: it doesn't. It merely pushes the scaling problem up a level. For if you have 20 tasks, each led by a Surgeon, those Surgeons will ultimately have to coordinate, and the communication paths will, according to Brooks' Law, proliferate.
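A rough count shows where the paths go. Suppose, as a simplifying assumption, that inside each ten-person team everyone talks only to the Surgeon (a star with team_size - 1 paths), while the Surgeons must coordinate pairwise among themselves. The helper names below are hypothetical; this is a sketch of the argument, not a model from the book:

```python
def pairwise_paths(n: int) -> int:
    """Every pair talks directly: n(n-1)/2 paths."""
    return n * (n - 1) // 2


def surgical_hierarchy_paths(teams: int, team_size: int = 10) -> tuple[int, int]:
    """Return (within-team, between-surgeon) path counts.

    Assumes each team is a star centered on its Surgeon and that
    the Surgeons form a fully connected group of their own.
    """
    within = teams * (team_size - 1)
    between = pairwise_paths(teams)
    return within, between


for teams in (20, 100):
    within, between = surgical_hierarchy_paths(teams)
    total = within + between
    print(f"{teams} teams: {within} within, {between} between surgeons "
          f"({between / total:.0%} of {total})")
```

The within-team term grows linearly with the number of teams, but the Surgeon-to-Surgeon term is the same n(n-1)/2 one level up: with 20 teams it is already about half of all paths, and with 100 teams it is roughly 85%.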

The way I see it, you won't succeed on a large project by attempting to reduce the number of communication paths. They come and go as the nature of the work demands. A system that limits communication is worse than no system at all. A different concept is needed.

For my nickel, the solution lies in an architectural approach (or style, as Garlan6 puts it) that provides a common language that facilitates genuine communication and a set of constraints that gives each member of the team the direction they need without constant, direct consultation. It's a tall order, but not without precedent.7

Perhaps the real challenge is to find a means of raising standards in an organization that prizes the status quo.



1. Lectio Divina provides four links for the entire text of the Republic: Part 1, Part 2, Part 3, Part 4.
2. For over a century, the Loeb Library has provided English translations of the original text for amateur readers. Plato, The Republic. Loeb Classical Library, Books 1-5 (#237) and Books 6-10 (#276). 1935.
3. The word anarchy, meaning 'no ruler,' is derived from archon.
4. In dire times, as when Hannibal was lurking on the outskirts of Rome, the Senate appointed a Dictator to a six-month term. In peacetime, Rome was ruled by two Consuls in a power-sharing arrangement that required consensus. This arrangement led to one of Rome's worst military disasters, the battle of Cannae, where Hannibal crushed a vastly superior Roman army.
5. Critias, a key leader of the 30, had once been a student of Socrates.
6. See An Introduction to Software Architecture
7. I once worked on an architecture and software framework that established a viable multi-discipline taxonomy and architecture. See Mission Data System.

Thursday, June 12, 2014

The peculiar arrangement

From Chapter 3: The Surgical Team

Excerpt
In the surgical team, there are no differences of interest, and differences of judgment are settled by the surgeon unilaterally. These two differences—lack of division of the problem and the superior-subordinate relationship—make it possible for the surgical team to act uno animo. (p. 35)

It's summer, at least for all practical purposes. My attention has drifted north to the trails, cirques and passes of the Sierra. The thought of alpine air and mountain vistas reminds me of the exalting promise I once felt when developing software for space applications, before realizing that the actual work takes place behind communication barriers in a bureaucratic miasma. So today, my thoughts have returned to Brooks' Law and the Mythical Man-Month.

Brooks has a solution for the communication problem: the "Surgical Team." (See "A fly in the ointment" for a description of the communication problem described by Brooks' Law.)


If you've been reading along, you'll remember there are just two technical contributors on the 10-person "Surgical Team": the "Surgeon" and the "Co-pilot." (If not, see "Picking the Archetypes for the Surgical Team".) The Surgeon is the chief. The Co-pilot is #2, the Surgeon's "alter ego." As #2, he or she shares the same responsibilities and skills as the Chief. However, #2's role is restricted to that of a sidekick, a "thinker, discussant" who knows all the code and maybe even writes some.

This strikes me as a peculiar arrangement. What exactly is #2 authorized to do on the Surgical Team? In Brooks' scheme: nothing.

For example, what if #2 meanders over to the Editor's cubicle and drops off a few changes to an ICD (Interface Control Document)? Would it be prudent for the Editor to make alterations without being confident they meet the Chief's approval? Hardly.
The co-pilot is only supposed to tap into team communications (figure from The Mythical Man-Month, p. 38, 1975 edition)
And what if the Editor has questions? Should she rest easy that #2 has the authoritative answers? Only at her own hazard, and possibly the task's, since without a clear line of technical authority there is a risk that each team member will develop an independent concept of the interface. The same goes for the Administrator, the Programming Clerk and the rest of the crew.

So, in fact, if #2 talks to anyone, his presence adds a second line of authority to the team's communications, pushing it back toward the N-squared growth the Surgical Team was designed to avoid and defeating its very intent. That was certainly my experience: I can't recall a time when a reliable decision came out of an audience with a #2.
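Counting channels makes the concern concrete. The sketch below treats the surgical team as a graph: with the Surgeon as the only hub, each member has one line to the Surgeon; if #2 also becomes a point of contact, every member gains a second line. The role names follow Brooks' ten-person team, and counting any hub-to-member link as a path is an assumption of this sketch:

```python
from itertools import combinations
from typing import Optional

# Brooks' ten-person surgical team (role labels chosen for illustration).
TEAM = [
    "surgeon", "copilot", "administrator", "editor", "secretary_1",
    "secretary_2", "program_clerk", "toolsmith", "tester", "language_lawyer",
]


def count_paths(team: list[str], hubs: Optional[set[str]] = None) -> int:
    """Count the unordered pairs of team members who talk to each other.

    hubs=None models a full mesh (every pair talks). Otherwise a pair is
    counted only if at least one member is a hub, i.e. a point of contact.
    """
    pairs = combinations(team, 2)
    if hubs is None:
        return sum(1 for _ in pairs)
    return sum(1 for a, b in pairs if a in hubs or b in hubs)


print("surgeon only:  ", count_paths(TEAM, {"surgeon"}))             # 9
print("surgeon and #2:", count_paths(TEAM, {"surgeon", "copilot"}))  # 17
print("full mesh:     ", count_paths(TEAM))                          # 45
```

On these assumptions a second hub takes the team from 9 paths to 17, still short of the 45 in a full mesh; the larger cost lies less in the raw count than in having two sources of authority for the same question.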

So what could reasonably be expected from #2?

It's a relevant question for NASA. There is a pandemic of #2's. Typically, they go by names like "Deputy," "Assistant," or "Associate." There are "Deputy Section Managers", "Deputy Architects" and "Assistant Division Managers." That's not to mention "Deputy Division Chiefs", "Assistant Project Managers", "Deputy Directors", and "Associate Administrators". There are even "Deputy Associates" and "Associate Deputies." The list goes on. Any Chief with a scintilla of actual authority is likely to get a #2.

Some chiefs have limited authority
However, the provisioning of authority does not always rest with the Chief. In fact, the relationship between the Chief and his #2 may be the best indicator of the Chief's standing in the woof and warp of power and organizational influence.

For example, a chief may get stuck with a #2 whose real job is to guard the interests of a higher authority and ensure the chief toes the party line. Or he may be stuck with a fallen manager who can no longer land a portfolio and must be parked in a desk job. Or, worse, he may be assigned an-anointed-young-turk who carpools with an executive VP.

On the other hand, the chief may be a "made man" with actual authority. In this case he may select his own #2 and use him as he sees fit. Here are a few examples:
Some chiefs have clout
(Meyer Lansky did)
Attack dog
The best defense is a good offense, preferably one that preserves the appearance of impartiality. #2 can keep anyone with an opposing view safely on the defensive. For example, just as a Vice Presidential candidate is expected to relentlessly hector the opposition, a worthy #2 will undercut technical initiatives that threaten the Chief.
Henchman
Every job has its dirty deeds. For example, since the Agency does not tolerate failure (see "Failure is not an option"), remedial measures, like blame displacement, are frequently needed. #2 can do that dirty work so that the Chief's hands remain bloodless.
Parade sweeper
Leadership is not merely a matter of opining with one's feet on the desk, especially in a paperwork-hungry bureaucracy. Someone must take on the stultifying job of feeding the beast meaningless paper. Here's where #2 may make his most meaningful contribution.
Spy
Knowledge is power. A smart Chief must know what the troops are thinking. But a Chief can have no real friends; unfiltered information seldom penetrates the management bubble. There's too much to be gained by flattery and it's too tempting to rail against leadership. #2 can help. She can be a friend, a trusted advocate for the working stiff, and a reliable reporter of the sentiments that lie beneath the surface.
Hand holder

Two heads are better than one, at least when it comes to propping up the boss. With #2 at his side, a Chief makes a much more commanding presence. Best of all, the Chief can be assured of close support should he be faced with dissent. There's no better hedge against those unsettling moments of insecurity.
Note to Chief: Effective hand-holding requires occasional dissension to preserve the appearance of honest and considered support. Don't be alarmed.


Gaius Gracchus, tribune of the people, presiding over the Plebeian Council
A short digression
It was different once. Back in the Roman Republic1 Tribunes2 could only exercise their authority in person.

Tribunes were the elected representatives of the plebeians (i.e. non-aristocrats that held property). They were charged with protecting plebeian interests. However, a Tribune was not a magistrate. A Tribune could not create laws, render judicial decisions or have any of the other powers of a government official.

Rather, Tribunes had the peculiar power of being able to veto any magisterial decision, including decisions made by Rome's highest elected official, the Consul. No one could stop a Tribune. But there was a rub. The Tribune had to be physically present while the act was occurring. For example, if Marius ordered the execution of a political opponent, a Tribune could step in and stop the process. But the intervention had to be in person. No deputy. No Associate Administrator. No house guards. No #2 of any label. Only the Tribune's physical presence would do!

In recognition of the threat to life and limb, Tribunes were sacrosanct. Sacrosanctity meant that no one was allowed to touch a Tribune; any interference was punishable by death. Still, there was a limit. The minute the Tribune left, the act could be completed. The veto had no lasting legal effect.

Imagine how that might change things today.


And then there's the politically motivated selection of #2. This was the case when Napoleon appointed General Jean Victor Marie Moreau to command his Rhine Army.

General Jean Victor Moreau (1763-1813)
Moreau was a popular member of the French aristocracy and a distinguished general of proven merit. In 1799, the Directory removed Moreau from his command—Moreau's loyalty had been unfairly questioned. It was a period of political instability and guillotine-fueled paranoia. But, the French armies were suffering defeats in the Austrian war; the Directory was desperate for talented field commanders. Moreau was re-appointed, but ill feelings and mutual distrust persisted.

Enter Napoleon. On his return from Egypt in 1799, Napoleon staged a successful coup d'état that overthrew the Directory.3 Moreau played a key role. As a reward, Napoleon named Moreau commander of the Army of the Rhine, which lent credibility to Napoleon's fledgling reign as First Consul.4

The good feelings would not last. Just months after receiving his commission, Moreau refused to execute Napoleon's brilliant plan to invade Italy by crossing the Alps.5 Moreau told Napoleon that he should "follow the routes of earlier campaigns." Bad mistake. Napoleon would later write, "It was impossible to overcome the obstinacy of Moreau...He at first refused to command under me...he objected to my plans, pretending that passage of Schaffhausen was dangerous." For the moment, Napoleon did nothing in response to Moreau's insubordination. According to Napoleon, "I was not yet sufficiently firm in my position to come to an open rupture."

Three years later, Napoleon would find the opportunity to dispose of Moreau by associating him with a Royalist plot. Moreau was forced to flee to the United States, but returned about ten years later to help the Swedes and Russians defeat Napoleon.6

Note to Chief: Don't forget that #2 might reappear as part of an antithetical cause. Be nice or be merciless.

Louis Alexandre Berthier (1753-1815)
The tale of Moreau contrasts sharply with that of Louis Alexandre Berthier. Berthier was also a highly respected member of the aristocracy who served with Napoleon from the initial campaigns in Italy until the 1813 defeat. Berthier was brilliant, a quick study of military necessity and a master of detail. He was by far Napoleon's most able staff officer. No one surpassed him in understanding and executing the Emperor's commands.

After Napoleon was shipped off to Elba (1814), Berthier retired. When Napoleon returned for the Hundred Days, Berthier refused to rejoin the Grande Armée. Two weeks before the Battle of Waterloo, Berthier died in a fall from a window.7,8

Note to #2: Your fate may depend on your chief. Enlist with care.



Being #2 is no picnic. For some it's a rite of passage. For others it's a last resort. But whether a deputy is motivated by ambition or self-preservation, he is not his own master. So observe your #2. Read the tea leaves. There's a lot that can be learned about the real politics of your organization. If you aspire to be Chief, your future may depend on it.

Speaking of the future, I see an indigo sky and a few Sierra mountain passes up ahead.



1. The Roman Republic is traditionally dated from ~500 BCE to 27 BCE, when Augustus founded the Empire.
2. Tribune is derived from tribe.
3. Napoleon's coup was a close call that required a good bit of intrigue.
4. In the beginning Napoleon shared power with two other men: Jean-Jacques-Régis de Cambacérès and Charles-François Lebrun. Within a year, there was a new constitution that consolidated all power in the hands of the First Consul. In 1804, Napoleon would declare himself Emperor.
5. The strategy would lead to the victory at Marengo, one of Napoleon's greatest achievements.
6. He died at the Battle of Dresden.
7. Many historians surmise that Berthier was assassinated. Throwing someone out a window is called defenestration and is a time-honored form of execution.
8. Chandler's "The Campaigns of Napoleon" was the source for these anecdotes. Chandler, D. "The Campaigns of Napoleon: The Mind and Method of History's Greatest Soldier." Scribner, 1966, pp. 269, 308.

Tuesday, April 29, 2014

Buyer's guide for the NASA software catalog

From NASA's Software Catalog

Excerpt from catalog cover letter
From the rudimentary but effective Apollo Guidance and Navigation System that landed the first humans on the lunar landscape to the 500,000 lines of code used to put the Mars Curiosity Rover on the surface of the Red Planet, software has always been at the core of NASA’s mission successes…The technologies featured in this catalog represent NASA’s best solutions to a wide array of complex problems, and they are on offer here to the public for use.
—David Lockney, the Technology Transfer Program Executive
caveat emptor
Last month the NASA Technology Transfer Program announced its new Software Catalog. It's a listing of NASA software that's available for the asking.

This is an impressive list. According to the Press Release (14-102), "NASA is making available to the public, at no cost, more than 1,000 codes ..."

I am not sure what's meant by "codes." Could they mean source code? Libraries? Executables? One thing seems certain: the release was written by someone who doesn't know much about software. I digress.

The catalog lists over 160 pages of software products. The products are organized into 15 categories including "Data Servers Processing and Handling," "Crew and life support," "Autonomous Systems," and "Vehicle Management" just to name a few. Some of the items look pretty interesting. Here's a sampling:
  • Station/Orbiter Multibody Berthing/Docking Analysis Tool (SOMBAT).
    U.S. Government Purpose Release, Page 109
    Described as a "multibody dynamics and control system simulation tool." If this is anything like the DARTS/DSHELL product developed at JPL, there must be a lot of math in this package. I wonder if SOMBAT was ever used for flight. I wish I'd been at the team meetings where serious-minded engineers were gravely proclaiming the importance of SOMBAT. It's a perfect subject for Vi Hart. She sings about a Laserbat. Could there be a connection? (I strongly recommend viewing Vi Hart's marvelous Twelve Tones video.)
  • Station Spacewalk Game App
    General Public Release, Page 109
    Described as simulations of Extravehicular Activities (EVAs) conducted by NASA astronauts on missions to the International Space Station. I really wanted to try this one out. To my disappointment, the game crashed two different browsers. Bummer.
  • Real-Time Kidney Stone Tracking Algorithm
    U.S. Government Purpose Release, Page 116
    Described as an algorithm that uses focused ultrasound to clear stones from a kidney. The software is a component of the "Rolling Stones prototype"; i.e. you need the Rolling Stones software. (Not in the catalog.) I'd give this offering high marks for its low pun.
  • Rover Software (RoverSW), Version 1
    Open Source Release, Page 116
    Described as a modular, extensible framework for exploration robots. The core provides the basis for building a service-oriented architecture that uses middleware. Hard to believe this code has flown because 1) it's open source1 and 2) no current flight system I know of would waste the onboard cycles on middleware. One day. Still, I bet it is a fun research project and might be just the thing for a graduate student.
  • MAVEN Flight and Ground Software
    U.S. Government Purpose Release, Page 116
    Described as a flight/ground package. Here's some substance. The products include simulation models, regression tests and even the flight code that runs on the spacecraft's flight processor. This is a proprietary product of Lockheed Martin Space Systems that's built on the technology developed for JUNO and MRO. It may not be the latest technology, but it's the real deal. I wonder what would happen if SpaceX wanted to license it.
Notice the release categories. Every item in the catalog has been assigned a release category. The categories define access requirements and restrictions. If you want some of this "no cost" software, these constraints on release are important.

Here's the description of the release categories from the catalog:
  • General Public Release—For codes with a broad release and no nondisclosure or export control restrictions
  • Open Source Release—For collaborative efforts in which programmers improve upon codes originally developed by NASA and share the changes
  • U.S. Release Only—For codes available to U.S. persons only, with no further transfer of the software allowed without the prior written approval of NASA
  • U.S. and Foreign Release—For codes that are available to U.S. persons and (under special circumstances) persons outside of the U.S.
  • U.S. Government Purpose Release—For codes that are to be used on behalf of the U.S. government
    • Project Release—For use under a contract, grant, or agreement
    • Interagency Release—For use by U.S. government agencies
    • NASA Release—For use only by NASA personnel and contractors
A quick search of the catalog produced the following breakdown of categories:2
  • Items categorized as General Public Release: 27 (3%)
  • Items categorized as Open Source Release: 111 (12%)
  • Items categorized as U.S. Release Only: 213 (22%)
  • Items categorized as U.S. and Foreign Release: 22 (2%)
  • Items categorized as U.S. Government Purpose Release: 590 (61%)
  • Combined total of the above: 963
In other words, more than 60% of the offerings are available to the "public" only when they are working on a government contract. Makes me wonder what's meant by "public." Is it my imagination, or does it seem that the promise of "making available to the public, at no cost, more than 1,000 codes ..." is an oversell?
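The shares in the breakdown can be recomputed from the raw counts. A short Python check, using the figures from the list (which, per footnote 2, come from a text search of the catalog and are approximate):

```python
# Item counts per release category, from the catalog search.
counts = {
    "General Public Release": 27,
    "Open Source Release": 111,
    "U.S. Release Only": 213,
    "U.S. and Foreign Release": 22,
    "U.S. Government Purpose Release": 590,
}

total = sum(counts.values())
print(f"total: {total}")

# Share of the counted items in each category, to one decimal place.
for category, n in counts.items():
    print(f"{category:<33} {n:>4} {n / total:6.1%}")
```

U.S. Government Purpose Release alone comes to about 61% of the items found, so well over half of the catalog is restricted to government work.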
NASA Press Release (14-102)

It's worth noting that under Title 37, any software developed with government funds is available for free, under a Government Use License, to any business or institution doing government business.3 After all, the government doesn't want to pay to develop the same product twice.

More interesting perhaps is the NASA software that is not listed in the catalog. The excluded software should also be available under a government use license. Take, for example, the software on the Mars Opportunity Rover; it is not in the catalog. Isn't that ironic, since Mars rover software is the centerpiece of the catalog's cover letter written by the Technology Transfer Program Executive?4 (See the excerpt at the top of this post.)

So let's say you find some software in the catalog that you want. Here are a few things to bear in mind:
  • This is not commercial software. The software has probably not been "packaged" for use by a 3rd-party user. i.e. there may not be sufficient information about installation or use. (Not for lack of professionalism on the part of the development team.)
  • The software may not be immediately available. Someone may have to tar it up and do a bit of head scratching to decide what's in the package. Budgets are limited and the engineer who does the packaging may do the work on their own time.
  • Many of the items are "research grade." The software is the product of talented domain experts who work with limited requirements and have little or no budget for independent testing.
  • Technical support may be iffy. If you need support or bug fixes, you will probably have to sign a NASA Space Act Agreement in order to transfer funds to NASA to pay the engineers who would provide support. (No fun here.)
  • Don't expect a dedicated programming team. NASA funding is erratic. Engineers have to move from project to project and task to task to stay employed.
  • Your needs may not be a priority. First dibs on engineering time will always go to mission needs. Unless you have management connections, you may be paying to bring a new developer up to speed or the response may not be timely.
  • Don't count on getting an exclusive license for government projects. The law mandates that NASA must provide the software to anyone doing government work.
  • As a rule, software that's been on a mission has been subjected to much more testing than non-mission software. However, this is no guarantee of quality.
  • Some of the items in the catalog are decades old. The code may be very brittle.

So, before deciding to bet your project on a free NASA product, here are a few questions you might ask:
  • Is the code under maintenance? What is the annual maintenance budget? (NASA budgets are not secret)
  • Is there customer support? Who do I need to contract with to get the service?
  • What's the bug reporting and repair cycle? How do I figure the cost of bug repair?
  • How often are new versions released? How are patches handled; are they coupled with system-wide releases?
  • Are the original developers still available?
  • Are there any restrictions on the licensing? Can the license be exclusive?
  • Was the software used on a mission?
  • Is there a replacement product in the works?
  • What kind of documentation is available?
  • Is there a user community?
  • Are there export restrictions (ITAR or EAR) of any kind?

NASA has a charter obligation to see that its "products and processes...benefit the lives of ordinary Americans and the U.S. economy." This makes sense. It's a good idea to get the maximum benefit from a government investment.

However, is the oversell necessary? At times it seems that excellence in engineering has been replaced by excellence in public relations. At the very least, the catalog reflects an ignorance about software. After all, software is not like a hardware piece-part that still has value after sitting on the shelf a few years. Software only maintains its value if there's a team actively working on the product. If funding for the product is cut, the product dies. If that filter were applied, the catalog might have a hundred items instead of a thousand.

Despite the fact that "software has always been at the core of NASA’s mission successes," NASA does not act like it is in the software business. If it did, the NASA culture would be radically different.

It could be otherwise, you know.


1. The International Traffic in Arms Regulations (ITAR) prohibit NASA from distributing flight software without careful review by the State Department. Foreign licensees of flight software must obtain an export license and make regular reports about protective measures to the State Department. Not fun.
2. The figures in the list were obtained by doing a case-sensitive search of the catalog for the text of each restriction type; e.g., a search for "Open Source" produced 112 instances, including 1 instance in the text. The total may be low because the naming of the restrictions is inconsistent; for example, not all items include the word "release," so "General Public" may be used instead of "General Public Release." While not exact, the figures are indicative.
3. See Title 37: Patents, Trademarks, and Copyrights PART 404—LICENSING OF GOVERNMENT-OWNED INVENTIONS §404.4 Authority to grant licenses. Note: No license is required when sharing occurs for work that is being done under the same task order.
4. Despite the Federal Law, the Curiosity software is closely held. I worked on one project where the MSL management would not permit sharing of the code with a development team doing work under the same task order. I addressed the reasons for this in a previous posting: Sharing is no panacea.

Wednesday, April 23, 2014

Sasquatch, El Dorado and Bug-free Software

Digression from the previous post: A fly in the ointment

From They Write the Right Stuff 1

Excerpt
What['s]...remarkable is how well the software works. This software never crashes. It never needs to be re-booted. This software is bug-free. It is perfect, as perfect as human beings have achieved....
Once upon a time...
there was perfect software
...That's the culture: the on-board shuttle group produces grown-up software, and the way they do it is by being grown-ups. It may not be sexy, it may not be a coding ego-trip -- but it is the future of software.— Charles Fishman, Fast Company, 1996

From Software and the Challenge of Flight Control2

Excerpt
A mythology has arisen about the Shuttle software with claims being made about it being “perfect software” and “bug-free” or having “zero-defects,” all of which are untrue. — Nancy Leveson, 2013
After reading the last posting, a friend from my NASA days sent me a link to a 1996 Fast Company article by Charles Fishman about the on-board Shuttle team at Johnson Space Center. The article, They Write the Right Stuff, describes how the team managed their work. For the remainder of this posting, I'll refer to the article simply as the "...Write...Stuff."

My friend, who is very familiar with how Shuttle software was developed and maintained, was pointing out that the "...Write...Stuff" described the shuttle software team as having a 10-to-1 ratio of programmers to testers—roughly the same ratio proposed for Brooks' Surgical team.3, 3a That's an exception. The NASA software teams I saw were typically staffed with a 1-to-1 ratio of programmers to a combination of system engineers and testers. However, if you included management and process oversight staff (i.e. those who do not contribute directly to developing the product), the ratio was more like 4 to 1.

I found the "...Write...Stuff" disturbingly wrongheaded. It portrayed the Shuttle software development methodology as the wave of the future: the very culmination of software development as it should and will eventually be for all time. Fishman is unapologetic about this claim.
As the rest of the world struggles with the basics, the on-board shuttle group edges ever closer to perfect software. — Charles Fishman

Sure, the "...Write...Stuff" was written over 15 years ago. It's a feel-good piece. It's written by a journalist (i.e., not a software engineer) for a for-profit publication whose sales depend on stirring the heart of your wistful science romantic who adores tales of shiny things that are technical.

So why take the piece seriously? Because, like Sasquatch or El Dorado, the best tall tales could be true. After all, who knows what's really beyond the campfire. A monster? A city of gold waiting for someone with the will to find it? When understanding is limited, the unlikely seems credible and impossible to disprove. Who's to say a tall tale of software perfection is a fiction? Well-meaning people who haven't lived in the woods or traveled the West can be taken in, especially when they want to believe.

Consider the plight of your typical techno-political bureaucratic manager. His efforts to rein in those software people fail repeatedly. He has endured scores of broken commitments. He now believes but a fraction of what he hears. He is flummoxed. He is living the misery of the 'software problem'4. He hears of a cure-all. A new tool; a better process; a silver bullet; a perfect team building the perfect software. He is captivated. He has a course of action; a fix is in reach. Shuffle a few budgets; levy a new approval process, demand a new document, an additional review, a few extra process steps.5 Voilà! No more problems. What could be easier?

In other words, articles like the "...Write...Stuff" foster a management expectation that there is a quick fix. When these fixes are imposed on development teams pressing to meet a delivery, they can do real damage. Improvement is hard. It's expensive. It's disruptive. It's risky. Perfection is the stuff of stories.

Space Shuttle Main Engine hoisted into the A-2 Test Stand, Stennis Space Center  (1979)

An abridged history of shuttle software6
The on-board shuttle group worked on the code that controlled the Shuttle rocket engines. The "...Write...Stuff" portrays them as a highly regulated, strictly managed group who worked regular hours. These constraints were necessary because the software had to work; a failure could cost human lives.

While that is true, the interesting thing about on-board software is how it came to be. No account of the team's success is complete without accounting for the quarter century of prior effort. To understand why that matters, the history of shuttle software might be instructive.
SSME controller
Space Shuttle Main Engine Controller (2005)
The controller was attached directly to the engine body
In July 1971, NASA contracted Rocketdyne to build the Space Shuttle Main Engine (SSME). Each shuttle main engine had a pair of dedicated, redundant controllers that were attached directly to the engine. These controllers managed the low-level functions of the main engines like servo control, command data conversion, sensor data transmission and fault response. In the initial implementation, these controllers were Honeywell HDC-601s and the software was written in assembler. By the early '80s it was clear an update was needed. Rocketdyne updated the SSME controllers to the Motorola 68000 and rewrote the software in C.

In March 1973, NASA contracted with IBM to build the Primary Avionics Software System (PASS). PASS consisted of two parts: the Flight Computer Operating System (FCOS) and the application software. FCOS handled engine sequencing, steering and redundancy management by providing the controlling function for the software built by Rocketdyne. The application software included guidance, navigation and systems management. PASS is the software that is discussed in the "...Write...Stuff."

PASS ran on five 'general purpose computers.' IBM selected the AP-101/S processor, which was based on the same architecture as the IBM 360. Earlier versions of the AP-101/S had been flown on the B-52 and the B-1B. The processor used a variety of word sizes depending on the function. For example, instructions could be either 16 or 32 bits. Floating point could be 32, 40 or 64 bit words. The average speed for math operations was about half a MIPS. Program state for all processes was preserved in a 64-bit status word that was updated with every instruction cycle. As if that weren't complicated enough, the processor was capable of handling 61 interrupts at 20 priority levels while preserving real-time constraints.

The five AP-101/S computers were divided into two separate systems: a quad-redundant, fault-tolerant 'primary' system and a Backup Flight System (BFS). PASS controlled both. All the computers processed the same data. This meant all the primary computers had to stay in sync. Reliability and safety in the primary system were based on a voting scheme that checked all the computers to ensure they were producing the same result. If one computer had a different result, it was isolated.
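The voting idea can be illustrated with a toy majority voter. This is a minimal Python sketch, not the actual PASS redundancy-management logic: it assumes each computer's result for one cycle can be reduced to a single comparable value, and the function name vote is hypothetical. A computer whose result disagrees with the strict majority is flagged for isolation; when there is no strict majority, the sketch refuses to guess.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def vote(outputs: Sequence[Hashable]) -> Tuple[Hashable, List[int]]:
    """Return the majority output and the indices of computers that disagree.

    Hypothetical simplification: each computer's result for one cycle is a
    single comparable value. Raises ValueError if no strict majority exists.
    """
    if not outputs:
        raise ValueError("no outputs to vote on")
    # most_common(1) returns [(value, count)] for the most frequent result
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no strict majority; cannot decide which computer failed")
    # Every computer that disagrees with the majority is a candidate for isolation
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


# Four primary computers; computer 2 produces a different result.
result, isolated = vote([42, 42, 17, 42])
print(result, isolated)  # 42 [2]
```

With four voters, a 2-2 split gives no basis for deciding which pair is faulty, which is why the sketch raises an error rather than picking a winner.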

If the primary system failed, the BFS kicked in. That meant the BFS also had to be synchronized with the primary. Synchronizing the primary and backup systems would prove to be very difficult and led to a major defect that delayed the initial shuttle launch. The synchronization problem is a fascinating story captured in John Garman's article, "The Bug Heard 'Round the World." 7

Shuttle software loads.
(from Tomayko, Developing software for the space shuttle)
Each processor had access to about 100K words (36-bit words) of shared memory. Memory was addressed by using a 16-bit word plus an extension for the last 4-bits from the status word. Since the memory was limited, software loads had to be swapped out at different mission phases. The largest loads were about 100K words for Ascent and Entry. Swapping software loads while maintaining system state must have been a bit tricky.

HAL/S code snippet
from http://history.nasa.gov/computers/Appendix-II.html 
PASS was going to be (and still would be) a difficult programming job. After coding the Apollo software in assembler, the engineering team knew PASS would be too complicated to write in assembler. After much debate, they decided they needed a high-level programming language. In 1969, NASA contracted Intermetrics8 to develop a new programming language, High-order Assembly Language/Shuttle, or HAL/S.9

The software was expensive; its cost was vastly underestimated. Originally, NASA thought development of the Shuttle software would cost around $20M. In the end, NASA spent $200M for the original development (and that's 1970 dollars). Not surprisingly, over 50% of the PASS modules changed during the first 12 flights in response to requested enhancements. By 1991, the Agency had spent a total of $324 million. In 1992 NASA estimated it was spending $100M a year on maintenance of the onboard software.10

It's easy to understand why the maintenance costs were astronomical. The system was very complex and difficult to understand, and each release had to be free of any mission-ending errors. Both factors drove up testing budgets, not only because large testing teams running extensive test protocols were necessary, but also because the hardware and software required for development and test were not commercially available. Hardware had to be stockpiled, salvaged or reengineered from scratch at significant cost. Similarly, all the software development tools had to be custom built and all programmers had to be trained at Agency expense. (No university was churning out cadres of HAL/S programmers.)

Despite all that expense, the software was not bug free. According to Leveson, during the first ten years, 16 'severity 1' errors were found. Of those, eight remained in the flight code; operators simply avoided commands that might trigger these errors. In addition, there were 12 errors of lower severity that occurred during flight.

Not all errors occurred in the early days. In April 2009, a serious software communications error occurred in flight a few minutes after Endeavour reached orbit. As it happens, that bug was introduced in 1989 when a warning about code misalignment was inserted in the code. In other words, despite Fishman's panegyric, the shuttle software was never bug free.

The "...Write...Stuff" is not entirely false. It has elements of truth which, to my mind, only serve to lend credibility to the otherwise misleading and potentially pernicious claims about "perfect software." It's no mere detail that the "bug free" software had been under maintenance for 25 years. The "...Write...Stuff" team was not doing development. It did not even have to deal with the major maintenance bug-a-boos of migrating to new hardware, operating systems, or compilers.

In my experience, managers who haven't programmed are unable to grasp the fundamental differences between development and maintenance tasks. They tend to be content with the idea that one size fits all. (see The Maintenance Mindset.) In other words, since the approach used by the on-board team produced "perfect software," there was and is a tendency to believe that the on-board shuttle development methods should be applied to all NASA software tasks.

I experienced fallout from this mindset while working on the Constellation Program (Cx). At the beginning of the project, we were full of hope. We were developing plans to use modern software techniques like software architecture and product-line concepts, an advanced computing system in the avionics package, and modern fault analysis techniques. These had all been used successfully in DOD or European projects. But our efforts were in vain. The project leaders had adopted a rigid philosophy like the one described in the "...Write...Stuff." All the new approaches were vetoed. It would be business-as-usual. The only software engineering team on the project was defunded.

I often wonder how things might have been different if the Cx Project Manager had absorbed the basic truth about software development that was captured so handily by John Garman in his article about the development of the shuttle software.
"If there are lessons to be learned from "the bug", they must be in how we view ourselves and our task. Building software in a large system against fixed schedules is not conducive to "bugfree" products. We can minimize the errors, and we can minimize the flight criticality of the ones that remain...but we can't treat it like a problem with a methodological solution." — Jack Garman, "The Bug Heard 'Round the World" (1981)
Despite all the tumult that the cancellation of the Cx Program created in the Agency, from a software perspective, it's a good thing it was stopped. The management did not have a realistic understanding of the software challenge they were facing. By contemporary standards, the Shuttle's PASS software would not be considered a 'big' package. Estimates at the time sized the Cx software in the hundreds of millions of lines of code. Using the methods described in the "...Write...Stuff," those software development efforts would not have converged, even with an infinite amount of money. An unimaginable failure was in the making.

If you are working on the development of a large software system, you can only hope that your manager doesn't read the "...Write...Stuff" or, if she does, that she knows it's irrelevant. Better she should read the account of the team that built the HAL/S compiler. It's much better advice:
Software is a very unusual industry. You can run an assembly line with a whip, although American managers are belatedly realizing that there are better ways...Formal devices and management tricks can aid or impair it, but the impetus must come from within. The collective mind and will of the technical staff is the essence...If management strays too far from the will of the workers, tries to shape it into something that it is not, the only possible outcomes are chaos or outright rebellion. — Tony Flanders, A History of Intermetrics
But then, who knows? Maybe one day someone in senior NASA management will grasp the Agency's own lessons learned, so that, once again, NASA can do the right stuff.


1. Fishman, C. "They Write the Right Stuff." Fast Company. 1996
2. Leveson, N. "Software and the Challenge of Flight Control" (Chapter 7), Space Shuttle Legacy: How We Did It and What We Learned. AIAA. Edited by Roger Launius, James Craig, and John Krige. 2013.
3. He tells me that out of a staff of 400 only 50 were allowed to write code.
3a. Remember that Brooks's proposal mostly included non-technical staff. See "A fly in the ointment."
4. First described in the 1968 publication "Software Engineering, Report on a conference sponsored by the NATO SCIENCE COMMITTEE."
5. The additional effort is typically mandated with the empty reassurance that "you have to do the work anyway." Actually, we didn't.
6. The following is based on material found in four excellent sources:

a) Tomayko's Computers in Spaceflight: The NASA Experience, Chapter 4
b) Leveson's Software and the Challenge of Flight Control
c) Lethbridge's Spaceline.org. http://www.spaceline.org/rocketsum/shuttle-program.html
d) Mattox and White's Space Shuttle Main Engine Controller

7. Garman was Deputy Chief of the Spacecraft Software Division at JSC and played a key leadership role in shuttle software development. See Garman, J. "The Bug Heard 'Round the World," Software Engineering Notes. Vol. 6, No. 5. October 1981.
8. Intermetrics was formed by a group from Draper Labs who had worked on the Apollo software. The story of Intermetrics is fascinating. One of the principals, Tony Flanders, has posted a very interesting history. See http://www.whysheep.com/i2/daf-history2.html
9. According to Tomayko, 2001: A Space Odyssey was playing in theaters at the time the new language was contracted. Perhaps Kubrick's film influenced the name.
10. Id. Leveson.

Thursday, April 3, 2014

A fly in the ointment

From Chapter 3: The Surgical Team

Excerpt
...one does the cutting and the others give him every support that will enhance his effectiveness and productivity.(p. 35)
Brooks says that some jobs are too large for a small team. However, if you have a large team, Brooks' Law kicks in adding significant communication overhead, bogging down progress and bloating cost and schedule. What's the manager of a large software project to do?
Brooks' Law

Brooks has a grand plan. In subsequent chapters (4, 5, 6 and 7), he will offer a solution for managing the large project. This remedy is built on the Surgical Team concept.

What is the Surgical Team exactly? Brooks spells it out in Chapter 3 (see Picking the Archetypes for the Surgical Team for a summary).

There are two key concepts behind the Surgical Team:
  • The job gets divided up into tasks that can be handled by a "10-man programming team"
  • Each task produces a system that is the "product of one mind," the Chief Programmer. As Brooks put it, "one does the cutting."
To make this work, Brooks prescribes the following:
  • The Chief Programmer will be supported by 7 specialists (plus 2 secretaries) who are each responsible for some part of the "design and implementation"
  • Each specialist communicates with the rest of the team through the Chief.1 This cuts the number of communication paths by roughly 80% (for a team of ten, from 45 paths to 9).
  • Of the 7 specialists, only one, the co-pilot, is a programmer and he "is not responsible for any part of the code."
In other words, it appears that Brooks is suggesting that for each team of 10 there be only ONE programmer. To first order this means that a staff of 100 is needed to support 10 programmers. As Brooks puts it, "So it is possible to put 200 people on a problem and face the problem of coordinating only 20 minds..." (page 37.)
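The arithmetic behind the communication-path figure is easy to check. Here is a minimal Python sketch, assuming a team of ten (the Chief, seven specialists and two secretaries) and counting one path per pair of people who talk to each other: a fully connected team has n(n-1)/2 paths, while a star in which everyone talks only through the Chief has n-1. The function names are illustrative.

```python
def full_mesh_paths(n: int) -> int:
    """Pairwise communication paths when everyone talks to everyone."""
    return n * (n - 1) // 2


def star_paths(n: int) -> int:
    """Paths when every member talks only to one hub (the Chief Programmer)."""
    return n - 1


team_size = 10  # assumed: Chief + 7 specialists + 2 secretaries
mesh = full_mesh_paths(team_size)  # 45
star = star_paths(team_size)       # 9
reduction = 1 - star / mesh        # fraction of paths eliminated
print(f"{mesh} paths -> {star} paths ({reduction:.0%} fewer)")
```

The same formula shows what Brooks is buying at scale: 200 people all talking to one another would have 19,900 paths, while 20 surgeons coordinating among themselves have 190. That holds only if the surgeons' teams never need to talk to each other directly.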

Admittedly, the Surgical Team as described is anachronistic. Many of the duties that previously required staff, like the 'Program Clerk,' are now done with software tools. But the main point is that the Chief is king and the co-pilot is prince; together they rule a small kingdom of specialists who do not program. A variation of the Chief-Copilot tuple is promoted by devotees of Agile Programming as pair programming.

But remember, the purpose of the surgical team is to provide an organizational remedy that solves the 'big team' problem for projects that require 10's if not 100's of programmers. (see Small is beautiful, but not applicable.) Here's where the Surgical Team salve gets sticky.

What happens in the common case that the job cannot be partitioned into a thin layer of one-man tasks? Does the Surgical Team recurse into a hierarchy of Surgical Teams? Would you restrict a Chief Programmer on one leaf task from talking to a Chief Programmer on another leaf task? If not, the communication paths would multiply geometrically, defeating the purpose of the surgical team.

On the other hand, if all communication were routed up the Surgical tree, there would most certainly be a communication and decision bottleneck that would be worse than any cure. As Brooks puts it, "'Schedule disaster, functional misfits, and system bugs all arise because the left hand doesn't know what the right hand is doing.' Teams drift apart in assumption" (page 235).

We had just that sort of bottleneck on my last NASA project. Our chief engineer insisted on being part of every decision. In just a month, our work ground to a dead standstill; our obligations did not. In order to make the progress needed to meet our commitments, our experienced programmers simply completed the work without discussing it with the technical leadership. It was the only rational thing to do.

You won't find a Chief Programmer in the group
And, then there's the question of how to staff a large project. Brooks was clear that the product should only be programmed by Chief Programmers with possible small contributions from the Copilot. (So, you might say the Surgical Team has 1.1 programmers.) He describes the Chief as a developer with "great talent, ten years experience, and considerable systems and application knowledge..." (page 32). How would you staff such a team? Where would you find 10's or 100's of men and women with that skill level? And, if you did find them, how could you retain them between assignments?

It's possible that Brooks does not really mean there should only be 1.1 programmers on the surgical team. In chapter 7, Brooks discusses how senior programmers, development programmers, project programmers, junior programmers and trainees would fit into an organization. Why bother with these Indians if all you need is Chiefs? Of course, if he's not serious about 1.1 programmers per team, then the bug-a-boo of Brooks' Law reasserts itself.

Here's the rub. There is no easy way around Brooks' Law. The Surgical Team does not alleviate the large-team problem. There's a fly in the ointment.

I suspect when the Mythical Man-Month was written, a Surgical Team with 1.1 programmers made sense. Today, it's a flawed concept. So for purposes of this blog, I'll stretch the meaning of the Surgical Team to include as many programmers as the Chief Programmer needs to complete the task on time. Of course, that means there may be no escape from Brooks' Law.



1. For a diagram, see the online version, text page 36 (PDF page 50).
2. Brooks defines architecture as "the complete and detailed specification of the user interface." (p. 45) In this case "user interface" refers to a programming interface. Today, software architecture refers to a much broader scope of concerns.

Friday, March 28, 2014

Sharing is no panacea

From Chapter 3: The Surgical Team

Excerpt
Absolutely vital to Mills's concept is the transformation of programming "from private art to public practice" by making all the computer runs visible to all team members and identifying all programs and data as team property, not private property. (p. 32)

From You and Your Research1

[Newton] said, "If I have seen further than others, it is because I've stood on the shoulders of giants." These days we stand on each other's feet!2 (Richard Hamming)

Brooks demands "public practice" because hiding code impedes communication. There is great merit in the suggestion because many development problems stem from the lack of information about the evolving design. In fact, Brooks suggests that the team include a full-time "Program Clerk" whose primary task is to keep the code in an open library where it can be seen by all team members.

That was before source control systems made sharing code routine. But code sharing is not the paradise Brooks envisioned. Like an ice-cream-based weight-loss cure, effortless sharing is wonderful in theory, but too good to be true.

Why? First of all sharing qua sharing is no panacea. What good is there in sharing artifacts that are unintelligible?

The current generation of programming technologies can make code very hard to understand, much harder than 40 years ago. If a programmer uses object-oriented inheritance, generic programming, aspect-oriented programming or dynamic binding, the actual behavior of the program may not be apparent from just looking at a few source code files. If you find yourself trying to grok a non-trivial program, chances are good that you'll be studying the interaction of code in a few dozen source files, pondering hundreds of pages of cryptic web docs, and spending a few weeks of runtime debugging in order to really understand what the code is doing.

Brooks recognized most code is not built to be used or understood by other programmers. He explained that if code is intended to be used by others, the development costs may jump by an order of magnitude. The multiplier depends on the type of system product being built. Here's how he classifies those different product types:


Brooks' breakdown of software product types
(from page 5, Mythical Man-Month )
  1. A Program
    • The simplest of the product types.
    • Built to be run by the programmer who wrote the program—typically a domain expert.
    • "Commonly produced in a garage."
    • Vast majority of products developed to support NASA missions.
  2. A Programming System
    • Collection of interacting programs that work together.
    • Assemblage consists of an entire facility.
    • Requires careful control of interfaces.
    • So long as the interfaces are controlled, the system will work without code sharing.
    • Brooks estimates the cost of a 'Programming System' is 3 times as much as just 'A Program.' However, the cost grows geometrically with the number of interacting programs.
    • Grows brittle with age: changes to one program may break many others. (see Maintenance Mindset)
    • NASA ground systems are 'Programming Systems.' Most are several decades old, brittle, antiquated and very expensive to maintain.
  3. A Programming Product
    • Intended to be fixed and used by anyone.
    • Written in a generalizable fashion.
    • Well documented.
    • Includes effective set of test programs.
    • Works on many different operating environments and with different data sets.
    • Costs 3 times as much as just 'A Program.'
    • Typical of a commercial product. For financial and cultural reasons, NASA does not produce programming products.
  4. A Programming Systems Product
    • Combines the capabilities of the 'Programming System' and the 'Programming Product.'
    • Today a 'Programming Systems Product' is called a framework.
    • A 'Programming Systems Product' costs 9 times as much as a 'Program,' since the two 3x factors compound (3 × 3 = 9). (Ouch!)

Aside from the significant increase in the cost of building a product that's intended to be shared, there are other reasons why sharing is no cure all. Consider the plight of the poor programmer:
  • Programmers work in an engineering culture which is intolerant of errors.
  • Any non-trivial program will have bugs.
  • A programmer's code can be broken by the work of another programmer. That 'other' programmer could work on the same team or another team.
  • Broken code has consequences. It often creates a lot of extra work that may threaten the schedule.
  • Programmers tend to delay exposing a flaw in their own code as long as possible because in most cases it will be fixed in time. The interim can be nerve wracking.
  • Programmers are quick to expose a flaw in someone else's code because a quick fix may help with the assignment at hand or because it helps buy more coding time.
  • It is often difficult to identify who was responsible for a specific break.
  • The specific cause of a breakage is seldom black and white. In a large bureaucracy, like NASA, there can be political consequences for the programmer or programming team considered the offender.
By and large, programmers are a very bright bunch who quickly learn how to protect themselves. One of the best methods of self-defense is a good offense. This is especially true when a harried manager must rely on his 'go-to' programmer for an appraisal of coding progress and any potential threat to his primary concerns: cost and schedule.

Cultivating a manager's trust is an art; but not necessarily a Machiavellian one. Top-notch programmers are a highly-competitive, self-confident lot.3 They know how to inspire confidence.

During the years I worked on development projects, I came to recognize a few traits common to the most talented coders:
  • They have a better command of programming languages and thoroughly understand the operation of the compiler and/or the interpreter.
  • They are great debuggers. Parsing a 50-line output from stderr is a piece of cake.
  • They have an uncanny mastery of multi-threaded programming which by any account is an arcane art.
  • They have the ability to grasp the long-term consequences of code organization and data structures.
  • They are often highly critical of other programmers and are adept at finding fault.
  • They believe they can do most any job and do it better than it was done by the last guy.
Would you want to share your code with someone like that?! Isn't it reasonable to ask why anyone would want to make it easy for an outsider to erode the confidence of upper management? Is it any wonder that teams are reluctant to share?4

This reluctance to share is standard practice among flight software teams. It doesn't matter if you have every imagined credential, you will not get access to the code unless you are on that flight software team.

As it happens, government-funded projects are required by law to share the code with others doing government work.5 It is illegal to withhold the code. Yet in practice, projects commonly withhold it. No one in the project complains. It's a very effective way to keep the flight software from the prying eyes of engineers who might expose fundamental design flaws.

Don't get me wrong. Code sharing is good, and much better than keeping code private. However, it's not enough to simply make it available. It must be made available in a form that can be readily understood. But intelligibility costs, and THAT is the real challenge. How do we affordably create programs for a complex system with code that can be readily understood? This is, and has been, an area of active research. Progress has been very slow. We have a long way to go.

On the other hand, competition among top-notch programmers and programming teams will always be intense, especially in the space business, where government funding is a zero-sum game. So, if you share code, stay alert. Someone might be looking for a good pair of feet to stand on.



1. Hamming, R., "You and Your Research." Transcription of the Bell Communications Research Colloquium Seminar, 7 March 1986. J. F. Kaiser, Bell Communications Research.
2. Another version of this quote, "Physicists make progress by standing on one another's shoulders; programmers make progress by standing on one another's feet," is often attributed to Watts Humphrey.
3. I never considered myself to be better than a middling programmer.
4. There's an interestingly sharp contrast with unfunded open-source development where programmers are eager to share. The competition for funding makes all the difference because funds can't be shared without a cost.
5. The code can be shared with any business or institution doing government work under a government-use license (Title 37: Patents, Trademarks, and Copyrights, PART 404—LICENSING OF GOVERNMENT-OWNED INVENTIONS, §404.4 Authority to grant licenses). No license is required when sharing occurs for work that is being done under the same task order.

Friday, March 21, 2014

Picking the Archetypes for the Surgical Team

From Chapter 3: The Surgical Team

Excerpt
...each segment of a large job be tackled by a team, but that the team be organized like a surgical team rather than a hog-butchering team. That is, instead of each member cutting away on the problem, one does the cutting and the others give him every support that will enhance his effectiveness and productivity. (p. 32)

From The Good Soldier Švejk1

Excerpt
When Švejk subsequently described life in the lunatic asylum, he did so in exceptionally eulogistic terms: 'I really don't know why those loonies get so angry when they're kept there. You can crawl naked on the floor, howl like a jackal, rage and bite. If anyone did this anywhere on the promenade people would be astonished, but there it's the most common or garden thing to do. There's a freedom there which not even Socialists have ever dreamed of.' (Opening sentences of Chapter 4)

At the beginning of chapter 3, Brooks argues convincingly that you cannot do a big job with a small team. He explains that when the team is big, a raft of inefficiencies is introduced. In fact, unless care is taken, the inefficiencies will soon outweigh, and even nullify, the added productivity that motivated hiring additional staff in the first place.

The large team problem is relevant to the space biz because developing a space system is an inherently large job (see "Small is beautiful, but not applicable").

Brooks recommends a solution that was originally proposed by Harlan Mills2. During the 1970s and 1980s, Dr. Mills was one of the most influential figures in computer science. Among other things, he was chairman of the NSF Computer Science Research Panel, editor of IEEE Transactions on Software Engineering, governor of the IEEE Computer Society, chairman of the Air Force Computer Science Panel, and an IBM Fellow. In addition to these impressive credentials, Dr. Mills was the originator of the Cleanroom software development process. By all accounts he was a true visionary.

Mills suggests that each part of a job be assigned to a team that's organized like a surgical team.3 This team would be organized according to the driving principle that there be "well differentiated and specialized roles."

Here's a rehearsal of the Mills/Brooks surgical team roles with a few observations of how they play out at NASA.
The Surgical Team
Chief Programmer
In keeping with the surgical team metaphor, Brooks calls the chief programmer the surgeon. The surgeon is responsible for the design. She or he also codes, tests and writes the documentation. On top of that, the chief is also responsible for code management and process tools.4
 
Brooks recommends that the chief have at least 10 years in the saddle and plenty of systems and application knowledge. He doesn't mention that this doyen will need the endurance of a sled dog, because he or she will be facing the prospect of a 100-hour work week.
Copilot
Number 2 on the surgical team is the co-pilot (the surgical team has lifted off). The co-pilot is the chief programmer's "alter ego." He or she has the same skills as the chief. However, the co-pilot's responsibility is restricted to that of a sidekick, a "thinker discussant" who knows all the code and maybe even writes some. In some cases the co-pilot serves with plenipotentiary powers when the chief is otherwise occupied. Finally, the co-pilot is a hedge against any rogue bus that might flatten the chief.
 
Brooks suggests the co-pilot need not be as experienced as the chief. He doesn't mention that the co-pilot will most likely be a cypher, since a dissenting voice at NASA is likely to be traded to another team.
Administrator
The administrator "handles the money" and the other duties that make up 95% of the typical management responsibilities found in a large bureaucracy. But Brooks is emphatic: the "surgeon is boss."
 
Brooks skips lightly over what it means to handle the money. If he means writing proposals, writing budgets, allocating funds, hiring staff, and lobbying funding sources, he's described a full-time role that preempts any technical work. What's more, he's given the administrator the source of all political power, including the ability to hire a new surgeon. Perhaps this role should really be called manager.
Editor
Brooks recognizes that the engineering team must document how the system is built and how it should be used. For unity of conception, many of these documents must be written by the chief, who is one horrendously busy person. Hence, the editor is needed to gather the recorded wisdom of the chief and turn it into a thing of beauty and devotion.
Two secretaries
In Brooks' world of the 1970s, documents were not generated automatically. Someone had to type them. As a practical matter, he suggests that the editor and the administrator get secretaries.
 
Those were the days. On my last job at NASA, a secretary (now called an administrator) was shared by 20 engineers. They were our experts on the bureaucracy; they instructed us on how we needed to complete those endless administrative tasks. And while a trip to the secretary usually resulted in another work assignment, their instruction was necessary for survival in the maze of institutional rules.
Program clerk
Any programming effort produces a mountain of artifacts. Managing all the programming products is not for the faint-hearted. Consequently, Brooks adds a program clerk to his team. Nowadays, the job is almost entirely handled by software.5
Toolsmith
Brooks knows that just because you have tools, it doesn't mean that they will always work or that the team will know how to work with them. As a matter of necessity, Brooks recommends the team include a toolsmith.
 
The need for a toolsmith is as great as ever. Ironically, this position is seldom funded and usually lands as an extra duty for one of the better programmers. As a practical matter, this means one of your best team members is making only a small contribution to the actual product.
Tester
Brooks knows that it's not going to be right just because the programmer said so. He recommends that a team include a tester. Nowadays we'd call this an independent tester. The role is now considered de rigueur.
Language lawyer
From the earliest days, high-level programming languages have been as vague and suggestive as a sacred text. If a team is to work against a common understanding, official interpreters are necessary. Brooks calls them language lawyers.
 
The term is now derisive. Nobody likes being told what to think, but most everyone recognizes commonality is better than anarchy.
Over the years the Surgical Team has been the target of pejorative commentary. However, it is still considered a viable team organization, no doubt because no suggestion by Brooks is to be taken lightly. But the surgical team is now archaic, except for one thing: the existence of a chief programmer. In my experience, most of the better-functioning teams have their most talented engineer working as chief programmer. NASA is a notable exception. Believe it or not, I was prohibited from naming a chief programmer because such a role did not appear in the institutional rules. Perhaps this is typical of a large bureaucracy.

Brooks does not provide all the guidance that's needed to staff the Surgical Team. He has failed to account for personal proclivities and career ambitions. This is especially important for a NASA software manager because the Agency is not your typical workplace. Despite its recent lackluster record, NASA remains the object of adoration for talented space romantics who imagine their own greatness and are eager to make a mark in the annals of space history. In particular, a manager must be wary that, by the modest exercise of authority, he might convert a team member, one destined to rise to levels of influence and power, into an adversary. What a manager needs is a scale that helps predict whether a team member is destined for bureaucratic glory.

I've never been one to pass up the chance to remedy this sort of omission. After weeks of undisciplined research, I found inspiration in Jaroslav Hasek's classic study of the Austrian military6 and pulled together a new qualitative measure: Team Archetypes Scaled for Potential Glory (TASPG). It appears here for the first time.

TASPG seeks to predict how individual Surgical Team members will fare. The scale establishes a set of archetypes and then predicts the likely ascension of each (see Figure 1 below).
The Archetypes
Straight-shooter
The straight-shooter is tone-deaf or indifferent to the impact that actions have on the keepers of the treasury. Blessed with a clear and constant vision of what could and should be, the straight-shooter cannot see that, in reality, the organization cares little for the end result. While there is little risk that the straight-shooter can hit a political target, there is a genuine risk of ricochet, which adds a creative aspect to management, since a fertile imagination is needed to keep the monthly reports sounding worry-free.
Techno-purist
The techno-purist lives in isolation, always purported to be close to, but never-ever reaching, a holy grail. She rhapsodizes on the object of her quest in an incomprehensible language. And, despite never seeing the work used in actual practice, the techno-purist has an undaunted, irrepressible, laser-like determination. She is convinced that funding is a birthright and the hereditary responsibility of management. While her babbling7 will never be understood by those who pose a threat in upper management, this fundamental bond of dependency ensures that neither the techno-purist nor the manager will ever be lonely.
Pollyanna
The Pollyanna has suffered many setbacks, but is confident that this time things are different. At core he is an irrepressible and persistent optimist who eagerly brightens the day with hope despite all evidence to the contrary. This enviable trust in good things stems from a perennial inability to derive the future from the past. And, "since those without hope are wretched,"8 he makes an important contribution. There is little risk from the Pollyanna because, to paraphrase Prospero, hope is not the stuff that funding is made on.
Sycophant
The Sycophant is a master of knowing agreement. He knows how to agree in just the right measure. As a patron's fervent protector, he knows with whom he should disagree and when. In ancient Rome, a triumphant general rode in a four-horse chariot past cheering hordes while a slave whispered, "memento mori."9 Today's bureaucratic proconsul is more likely to bring the sycophant along to handle the messy business of quarreling with naysayers. However, like a fickle house cat, he is warm and affectionate only so long as there's tasty fare. If a neighbor has better table scraps, danger lurks for any creature that resembles a rat or a bug. Be wary of the Sycophant.
Politician
The politician shifts adroitly with the winds of influence and fashion. Look behind your local Grand Poobah and you will find the politician in his wake. He travels light, unburdened by principle or unneeded loyalties. And he's agile, ready to jump Poobahs as the occasion calls. So, be on the lookout: the politician may soon jump into the express lane and pass you on the road to glory.
Foot Dragger
The Foot Dragger is the embodiment of mature engineering judgment. Possessed of a domineering judgment, she will have climbed to the rank of senior technical lead as a reward for holding back progress. She can often be recognized by her habit of rendering a considered technical judgment with the words, "I don't understand." So, if you are stymied and your budget and schedule are evaporating, there is likely a Foot Dragger in the lead.
Perfectionist
The Perfectionist knows the dreaded consequences of compromise. He stands alone as the standard bearer of quality and the last bastion against the tide of disaster. While the Perfectionist is unable to bring a task to conclusion, he is able to maintain technical ascendancy by subduing dissenters with disapproval. But you need not worry about the Perfectionist; the tide of events will surge past him.
Awfulizer
In the Awfulizer's assessment, failure lies at every turn. The broken interface, the overlooked requirement, the architectural mismatch, and the woeful ignorance of managers all foretell impending budget or schedule debacles. Success is always a miracle. The Awfulizer poses no direct threat, for, like the beeping horns of Manhattan, the warnings are just part of the background. However, pay attention, for a warning may arm a hostile team with budget-stealing rhetoric.

NASA HQ in background.  This photo is in the public domain in the United States because it is a work prepared by an officer or employee of the United States Government as part of that person’s official duties under the terms of Title 17, Chapter 1, Section 105 of the US Code.
Figure 1: Team Archetypes scaled for potential greatness
Note that the archetypes are not orthogonal. For example, a single team member, like the chief programmer, may be a combination of the Foot Dragger and Politician archetypes. The NASA data confirms this particular prediction. More sophisticated results that combine weighted assessments are pending. For example, a team member, like the co-pilot, may best be characterized as part Sycophant, part Politician and part Pollyanna with relative weights assigned to each.

Regrettably, the TASPG project is unfunded, so the results may be slow to reach publication. In the meantime, perhaps this preliminary result will serve as a guide for the harried manager who's trying to assemble just the right team.



1. Hasek, J., "Good Soldier Svejk." Penguin Modern Classics, 1983, p. 31.
2. Mills, H., "Chief programmer teams, principles, and procedures," IBM Federal Systems Division Report FSC 71-5108, Gaithersburg, Md., 1971.
3. I'm not certain whether the term "surgical team" came from Mills' papers or from Brooks' description of Mills' suggestion. If the former, there's a nice symmetry with Mills' use of "Cleanroom" for his development methodology. After all, you certainly want your surgical team to work in a clean room.
4. Brooks was writing in the era before commercial products for configuration management and document generation were widely available. This is an extrapolation that may not reflect his intent.
5. For projects of a couple of hundred people, there will often be a project librarian who manages electronic document libraries. The programming artifacts are typically multi-tiered, but in the development phase a build manager is usually responsible for the source code repository.
6. Hasek, J., "Good Soldier Svejk." Penguin Modern Classics, 1983.
7. The word 'barbarian' was coined by the ancient Greeks to describe people who spoke with the 'babbling' sounds of a foreign language.
8. Paraphrase from William Hazlitt, English critic (1778-1830)
9. "...remember that you will die..."