The resilience error and technical debt

Last updated on October 4th, 2018 at 02:09 pm

I’ve mentioned the reification error in a previous post (see “Metrics for technical debt management: the basics”), but I haven’t explored its dual, the resilience error. Let me correct that oversight now.

The future USS Zumwalt (DDG 1000) in sea trials, December 2015
The future USS Zumwalt (DDG 1000) is underway for the first time conducting at-sea tests and trials in the Atlantic Ocean Dec. 7, 2015. The first of the Zumwalt class of US Navy guided missile destroyers, it is designed to be stealthy, and to be supported by a minimal crew. After the program experienced explosive cost growth, the class has been downsized from 32 ships to three, and complement increased from 95 to over 140 to reduce capital costs. The three vessels on order now have significantly reduced missions. As one might expect, the causes of these troubles are much debated. But it’s possible that the resilience error plays a role. Before the first of a new class of ships goes to sea, it exists as an abstraction—a collection of concepts, plans, promises, and technologies, tried and untried. Many elements of this collection have never inter-operated with other elements. The first ship represents the first opportunity to see how all the elements work together. Although troubles often appear even before the ship is fully assembled, anticipating all troubles is extraordinarily difficult.

Reification risk is the risk that an error of reasoning known as the reification error might affect decisions—in this case, decisions regarding technical debt. The reification error [Levy 2009] [Gould 1996] (also called the reification fallacy, concretism, or the fallacy of misplaced concreteness [Whitehead 1948]) is an error of reasoning in which we treat an abstraction as if it were a real, concrete, physical thing. Reification is useful in some applications, such as object-oriented programming and design.

But when we reify in the domain of logical reasoning, troubles can arise. For example, we can encounter trouble when we think of “measuring” technical debt. Strictly speaking, we cannot measure technical debt. It isn’t a real, physical thing that can be measured. What we can do is estimate the cost of retiring technical debt, but estimates are only approximations. And in the case of technical debt, the approximations are usually fairly rough—they have wide uncertainty bands. That’s one way for trouble to enter the scene. When we regard the estimate as if it were a measurement, we tend to think of it as more certain than it actually is. Technical debt retirement projects then overrun their budgets and schedules, and chaos reigns.

For example, if we think we’ve measured the MPrin of a class of technical debt, rather than that we’ve estimated it, we’re more likely to believe that one measurement will suffice, and that it will be valid for a long time (or indefinitely). On the other hand, if we think we’ve estimated the MPrin of a class of technical debt, we’re more likely to believe that obtaining a second independent estimate would be wise, and that the estimate we do have might not be valid for long. These are just some of the consequences of the reification error.

The resilience error

If the reification error is risky because it entails regarding an abstraction as a real, physical thing, we might postulate the existence of a resilience error that’s risky because it entails regarding an abstraction as more resilient, pliable, adaptable, or extensible than it actually is.

When we commit the resilience error with respect to an abstraction, we adopt the belief—usually without justification, and possibly outside our awareness—that if we make changes in the abstraction without fully investigating the consequences of those changes, we can be certain that the familiar properties of the abstraction we modified will apply, suitably modified, to the new form of the abstraction.  Or we assume incorrectly that the abstraction will accommodate any changes we make to its environment.

Sometimes we benefit when we modify abstractions; usually we encounter unintended and unpleasant consequences. For example, unless we examine our modifications carefully, it’s possible that the implications of a modification might conflict with one or more of the fundamental assumptions of the abstraction.

Examples of the resilience error

Perhaps a (ahem) concrete example will illustrate. Consider the steel hull of an ocean liner. We can manufacture it more cheaply if we can devise a way to use less steel. So one approach to that goal is to remove a small portion of the bottom of the hull, say, a circular hole one meter in diameter. We send some people into the ship to do the work, and they return with panicky reports of water coming in. But the ship seems fine, so we reject the reports. Even a day later, all seems well. But by the end of the second day, the trouble is obvious. The ship is sinking.

The problem in our example is that the circular hole in the hull violated a fundamental assumption about how ship hulls work: they work by keeping all water out of the ship. We had extended the idea of hull to make it lighter, but in doing so, we encountered some unintended consequences because our extension violated a fundamental property of hulls.

Now for a less fanciful example.

Consider the fictitious company Alpha Properties LLC, which manages small condominium associations (from 25 to 100 units). Things have been going swimmingly at Alpha Properties, and they’ve decided to expand to handle large condominium associations. Their financial accounting software  has worked well, and their employees have become quite expert in its use. Alpha management has heard good reports from other management companies that deal with large client associations. So Alpha decides to use the same software for its larger accounts too. But things don’t work out so well.

The software is fine, but the processes used by the staff are cumbersome and slow. For example, setting up a new association requires too much manual data entry. For a 100-unit association, client setup wasn’t a burden, but for a 900-unit association the problem is just unmanageable.

This is a fine example of the resilience error. When we make this error, we fail to appreciate how an abstraction can encapsulate assumptions that make for difficulties when we try to extend it or apply it in a new or altered context. In this example, Alpha’s data flow processes are the abstraction. The context is signing up a new client association. When the context (signing up a large new client) is different, it violates an internal assumption of the abstraction (the data flow process for signing up a new client).

How the resilience error leads to technical debt

In many cases, the resilience error is at the heart of the causes of technical debt. It works like this. We have an asset that works perfectly well for one set of applications or in one set of contexts. We want to apply that asset in a new way, which might (or might not) require some minor extensions. When we try it, we find that the asset incorporates some assumptions about the application or the context, and one or more of those assumptions are violated by the new application or the new context. Scrambling, we find some quick fixes that can get things working again, but those fixes usually aren’t well designed or easily maintained. The result is a trail of technical debt.

Acquiring companies is like that. Before the acquisition, we think we’ll be able to merge the IT operations to save some expenses in operations. When we actually try it, though, merging them proves to be far more expensive than we imagined. Ah, the resilience error.

What makes this situation so difficult is that often we’re unable to anticipate what assumptions we might be about to violate. That’s why we make the resilience error.

Spotting difficulties with adapting to new applications and new contexts isn’t so difficult with physical entities. For example, we can see in advance that a square peg won’t fit into a round hole. But with abstractions, we can’t always see the problems in advance. Piloting, prototypes, games, and simulations can help us avoid some trouble, but not all.

References

[Gould 1996] Stephen Jay Gould. The mismeasure of man (Revised & Expanded edition). W. W. Norton & Company, 1996.

Order from Amazon

Cited in:

[Levy 2009] David A. Levy, Tools of Critical Thinking: Metathoughts for Psychology (second edition). Long Grove, Illinois: Waveland Press, Inc., 2009.

Cited in:

[Whitehead 1948] Alfred North Whitehead. Science and the Modern World. New York: Pelican Mentor (MacMillan), 1948 [1925].

Cited in:

Other posts in this thread