Policymakers whose areas of expertise have little or no overlap with software engineering might be at a bit of a disadvantage when the conversation turns to refactoring. That can happen when the matter at hand involves software assets that carry technical debt, because the engineers are likely to argue for resources and time to be set aside for refactoring. Other “re” words are also likely to pop up: restructuring, re-architecting, rewriting, replacing, repairing, retiring, retreating, and re-engineering are examples. Some of these are clear—and clearly unaffordable—but some are less clear. What’s needed is a lucid explanation of refactoring for policymakers.
An LED traffic light. This type of signal is more efficient and more cheaply and easily maintained than incandescent signals. But in terms of traffic control, LED signals and incandescent signals are equivalent.
To refactor [1] a software asset is to improve its internal structure without altering its external behavior [Fowler 1999]. The improvements usually relate to maintainability or extensibility, and for software, that usually requires improving the readability of its code (for engineers), though it might entail some minor changes of other kinds. Instance by instance, these improvements are usually small in scale. Even so, a refactoring effort might involve small changes throughout the entire asset or throughout an entire suite of assets.
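For readers who'd like to see what this looks like in practice, here's a minimal sketch in Python. The invoice routine, its names, and the 1.08 tax multiplier are all invented for illustration; they don't come from any real system. The first version works but is hard to read. The second is restructured for readability, and the final check confirms that a caller can't tell the two apart.

```python
# Before: a hypothetical invoice routine that works but is hard to read.
def total_before(items):
    t = 0
    for i in items:
        if i[2]:
            t = t + i[0] * i[1] * 1.08
        else:
            t = t + i[0] * i[1]
    return round(t, 2)


# After: same external behavior, clearer internal structure.
TAX_MULTIPLIER = 1.08  # assumed value, for illustration only


def line_total(price, quantity, taxable):
    """Return the cost of one invoice line, including tax when it applies."""
    subtotal = price * quantity
    return subtotal * TAX_MULTIPLIER if taxable else subtotal


def total_after(items):
    """Return the invoice total, rounded to cents."""
    return round(sum(line_total(*item) for item in items), 2)


# Each item is (price, quantity, taxable).
sample = [(10.0, 2, True), (5.0, 1, False), (3.0, 4, True)]

# The refactoring is acceptable only if callers see identical results.
assert total_before(sample) == total_after(sample)
```

Nothing about the invoice totals changed. What changed is how quickly the next engineer can understand and safely modify the code.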
Although we usually regard refactoring as a software-related activity, refactoring, like technical debt, is a concept that can apply to any technological asset. To render the refactoring concept useful for assets other than software, we must be a bit more precise about the effects of the changes involved in refactoring.
A more general definition of refactoring
Refactoring an asset inherently changes that asset; what distinguishes refactoring from other kinds of changes is the observability of the changes. For the definition of refactoring used in software engineering, the changes are observable only to the software engineers who maintain or enhance the asset.
Here’s a definition of refactoring that’s somewhat more widely applicable:
To refactor a technological asset is to apply a series of small, behavior-preserving changes to improve the structure of the asset in ways that have effects that aren’t ordinarily observable externally. When effects are observable externally, they’re very specific, usually related to attributes such as quality and usability.
For example, after a municipality replaces incandescent traffic lights with LED traffic lights, there’s no effect on traffic control. To the untrained eye, or to the trained eye that’s otherwise preoccupied, the change isn’t noticeable. But those responsible for signal maintenance or for monitoring operating costs will notice significant advantages. With respect to traffic flow, we can therefore regard the change to LED traffic lights as a refactoring of the traffic control system.
Refactoring in manufactured consumer items can be more difficult to recognize, because the useful life of the item so often ends while the item is still in the hands of the consumer. For example, we might ask how to refactor a certain subassembly of an automobile that’s already in service. Some writers have identified the vehicle recall as a kind of refactoring [Shroyer 2016]. But I prefer to regard successive models of manufactured items as containing refactorings of earlier models.
For example, in robot vacuum cleaners, the iRobot Roomba is now available in a ninth-generation “series,” though the exact number of the generations depends on what one counts as first-generation. In laptop computers, most manufacturers’ offerings do change from one model to the next version of that model. Some of these changes are more significant than what we might consider to be refactoring, such as Apple’s removal of the MagSafe power connector [Spence 2018]. For laptops, more likely to be a refactoring would be a change to a slightly more efficient internal fan.
Other applications of the refactoring concept
The refactoring concept can also apply to processes. Indeed, failure to refactor business processes is sometimes a cause of needless complexity, high maintenance costs, and other difficulties in technological assets that must interact with processes that need refactoring [Distante 2014]. The refactoring concept—that is, to improve internal structures while preserving external behavior—might even find use in organizational restructuring and debt restructuring.
Endnote
[1] One might wonder why this process is called refactoring. Martin Fowler, the author of the classic 1999 book about refactoring, has investigated the etymology of the word and concludes that it likely arose in the Forth and Smalltalk communities in the 1980s [Fowler 2003].
References
[Distante 2014] Damiano Distante, Alejandra Garrido, Julia Camelier-Carvajal, Roxana Giandini, and Gustavo Rossi. “Business processes refactoring to improve usability in E-commerce applications.” Electronic Commerce Research 14:4 (2014): 497-529.
[Fowler 1999] Martin Fowler, Kent Beck (Contributor), John Brant (Contributor), William Opdyke, Don Roberts, Erich Gamma (Foreword). Refactoring: Improving the Design of Existing Code. Boston: Addison-Wesley Professional; first edition (July 8, 1999).
From time to time, people ask me about the wisdom of outsourcing technical debt retirement projects. Because the answer depends so strongly on the particulars of the situation, there’s no general answer. But there are general guidelines—factors to consider when making the decision. Let’s refine the question first, in the form of a case:
Our organization uses an array of software and hardware assets to execute our mission. We developed some of these systems so long ago that the original developers have departed. They left here for other companies, or they left in spinoffs, or they moved on to other parts of our company. Some of these moves were due to reorganizations, some to promotions, and some to personal career decisions.
Most of the people who are now maintaining these assets have learned by doing. This has been necessary because we haven’t kept the documentation current enough to be a reliable reference. We know that the systems harbor significant levels of technical debt, and the documentation itself carries debt. So we want to retire all that debt, but it’s a big job. Should we hire contractors? Or a vendor who specializes in large scale technical debt retirement projects?
This is a typical situation, but many variables are unspecified. And typically, even more variables are unknown. Those unspecified or unknown variables make the decision tricky. To illustrate, I’ve listed below seven issues that would affect decisions about outsourcing technical debt retirement projects.
In-house staff probably has useful knowledge
If the in-house staff has much undocumented information about the current configuration of the assets, they have an enormous advantage over contractors or an outside vendor trying to do the same work. And even though the in-house staff wasn’t involved in initial development, they probably have valuable knowledge of the asset if they’ve been engaged in maintenance or enhancement to any significant degree. And they probably know more about the assets than any outsider would. So if the ultimate decision is to outsource the work, try to devise an arrangement in which the most knowledgeable in-house staff are acting in a reference role.
Debt retirement effectiveness depends on knowledge of enterprise strategy
Knowledge of enterprise strategy is useful in technical debt retirement projects. For example, suppose we know that a future project will be rendering some or all of a given asset irrelevant. We can use that knowledge to focus the debt retirement effort.
However, in some cases, revealing strategy to outside vendors is risky, even with ironclad NDAs in place. So some asset owners avoid revealing strategy information. They accept that the outside vendor might perform otherwise-wasteful tasks. This approach can be a low-cost way to manage the risks that arise from revealing strategy. Others choose to perform the work in-house. Working in-house enables them to use their knowledge of strategic direction when allocating effort in debt retirement or when deciding what the transformed asset should look like.
Detailed knowledge of the debt retirement effort is itself valuable
Knowledge of the what and why of the actual debt retirement work can be helpful in resolving any difficulties that surface after completion. That knowledge is also helpful in future work on similar assets.
With outsourcing, after the work is done, any unreported information about what the vendor did and why they did it departs with them. If in-house staff perform the work, that information remains in-house. This can be very helpful if the asset is a critical asset, or if you expect further future enhancement work or debt retirement work on that asset or similar assets.
Debt retirement work almost inevitably generates new knowledge
When people work on debt retirement, they usually have specific objectives. Even so, as they work, they generally uncover issues they hadn’t anticipated. Both in-house staff and contractors experience these aha’s. The difference between them is what happens after the work is done.
If in-house staff does the work, they can use this newfound knowledge in other projects, including new development. Not necessarily so with the outside vendor. If the same vendor is employed again for another effort, they can apply that knowledge if doing so is in scope for the next contract. But if that vendor doesn’t return, or the scope of subsequent efforts doesn’t permit it, then they can’t apply that knowledge. Moreover, the vendor might not even report what they found, though most would because they hope it will lead to more work. If they do report it, the in-house contract monitor should be sophisticated enough to recognize how valuable that kind of information is. Sadly, many are not.
Asset service disruptions can be problematic
Another difficulty with outsourcing technical debt retirement projects relates to asset service disruptions. In some debt retirement efforts, some assets must be taken out of service for periods that are moderately disruptive or worse. In-house staff likely have relationships of long standing that make cooperation, negotiation, and consideration relatively easy.
If negotiation difficulties arise, the lowest level executive or manager who’s responsible for all parties can facilitate resolution. And over time, with practice, all parties learn to work out these issues more effectively. With outside vendors, this process can be more difficult, because of the absence of existing relationships, the termination of relationships when vendors exit the scene, and the lack of formal authority of some specific executive or manager.
If in-house staff can’t do the work, consider hiring
If the in-house staff is overloaded, or if they lack the skills necessary to take on the technical debt retirement effort, outsourcing can seem like the only workable approach. Not so fast though. If a stream of debt retirement projects is in your future, consider the advantages of building a debt retirement function with a long-term agenda. Examine again the factors cited above to determine the scale of the advantages of building such a team.
Outsourcing probably works well for refactoring
The one activity for which outsourcing can be a big win is refactoring. Refactoring doesn’t usually require much knowledge of company strategy. And it doesn’t require much “non-localizable” knowledge. That is, the requirement that the refactoring not cause changes in asset behavior enables the asset owner to write a very tight contract with the debt retirement team. They can then perform their work with confidence because they can test the asset’s behavior incrementally. Also, with refactoring, asset service disruptions are usually minimal.
One last suggestion. With outsourcing, the vendor might have significantly more experience with technical debt retirement efforts than does the client. This asymmetry gives the vendor an advantage at every stage. For technical debt retirement efforts, they know more about contracting, devising statements of work, defining acceptance criteria, and managing risk. Most important, they have experience dealing with the many speed bumps that can occur in these projects. To manage the risks of that advantage, consider retaining a consultant experienced in these situations. This person’s role is to monitor communications between enterprise and vendor to ensure fairness. The mere presence of such an individual can deter the vendor from some of the abuses that can be so tempting in these asymmetric situations when trouble arises.
When we transform assets in order to retire some of the technical debt they carry, service disruptions are sometimes necessary. To minimize service disruptions while technical debt retirement efforts are underway, it’s advantageous to automate some procedures. Automation-assisted technical debt retirement provides two important benefits: limited disruption of operations and error avoidance.
I’m using the concept of automation a bit loosely here. I don’t mean to imply that these procedures are autonomous. What I mean is that engineers working on technical debt retirement projects have available an array of tools wide enough to enable them to perform many operations with a minimum of thought. For example, when a test is automated in this sense of automated, an engineer can issue a command such as, “Test Module Alpha Using Test Suite Delta,” which results in the execution of a set of predefined tests. Following execution, the appropriate engineers are notified of the results, and the results are recorded in the proper places. If the results are anomalous, engineers can then take appropriate action.
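To make this sense of "automated" concrete, here's a rough sketch in Python of what a command like "Test Module Alpha Using Test Suite Delta" might do behind the scenes. Everything in it is hypothetical: the `run_suite` function, the `SuiteReport` record, the three stand-in tests, and the in-memory lists that play the roles of a results archive and a notification channel. A real organization would substitute its own test runner, storage, and messaging tools.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SuiteReport:
    """Outcome of running one predefined suite against one module."""
    module: str
    suite: str
    passed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)

    @property
    def anomalous(self) -> bool:
        return bool(self.failed)


def run_suite(module: str, suite: str,
              tests: Dict[str, Callable[[], bool]],
              record: Callable[[SuiteReport], None],
              notify: Callable[[SuiteReport], None]) -> SuiteReport:
    """Run every predefined test, then record and announce the results."""
    report = SuiteReport(module, suite)
    for name, test in tests.items():
        try:
            ok = test()
        except Exception:
            ok = False  # a test that crashes counts as a failure
        (report.passed if ok else report.failed).append(name)
    record(report)  # results are recorded in the proper places
    notify(report)  # the appropriate engineers are notified
    return report


# Stand-ins for a results archive and a notification channel.
archive: List[SuiteReport] = []
messages: List[str] = []

# Hypothetical contents of "Test Suite Delta".
delta_tests = {
    "starts_cleanly": lambda: True,
    "handles_empty_input": lambda: True,
    "rejects_bad_config": lambda: 1 / 0 == 0,  # deliberately broken
}

report = run_suite(
    "Alpha", "Delta", delta_tests,
    record=archive.append,
    notify=lambda r: messages.append(
        f"{r.module}/{r.suite}: {len(r.passed)} passed, {len(r.failed)} failed"),
)
```

The engineer issues one command. Which tests to run, where to store the results, and whom to tell are all decided in advance, which is the point.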
Benefits of automation-assisted technical debt retirement
The I-35W Bridge collapse, day 4, Minneapolis, Minnesota, August 5, 2007. The proximate cause of the collapse was underweight gusset plate design, which made the bridge vulnerable to the increased static load due to concrete road surfacing additions over the years, and to the weight of construction equipment and supplies during a repair project that was then underway [NTSB 2008]. When we conduct maintenance or technical debt retirement projects involving assets that must remain operational during project execution, we risk stressing the asset in ways that extend beyond its safe operation envelope. The National Transportation Safety Board found that this occurred in the case of the I-35W bridge collapse. These effects are more difficult to imagine in software systems, but they can occur when load is shifted from the systems undergoing modification to other systems that can then become overloaded. Or these effects can occur when load is shifted not from one asset to another, but from one time window to another on the same asset, resulting in high loads in some time windows. Photo by Kevin Rofidal, United States Coast Guard, courtesy Wikimedia Commons.
The more obvious benefit of automated procedures is speed. For example, an asset removed from service for testing can be returned to service more quickly if the testing is automated. And if trouble erupts during operations when a newly transformed asset is placed into service, the untransformed asset can be swapped back into its place quickly if insertion procedures and removal procedures (roll-out and roll-back) are automated. Tools for releasing newly transformed assets, and rolling back to the previous release if necessary, provide another example of automation assist. These tools are just a few of the many elements of a set of practices collectively known as continuous delivery [Humble 2010].
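As an illustration of automated roll-out and roll-back, consider the following Python sketch. It isn't drawn from any particular continuous delivery tool. The `Service` class, the `roll_out` function, the version labels, and the health check are assumptions made up for this example. The idea it shows is simply that the procedure remembers the previous version and restores it without human intervention when the new one misbehaves.

```python
from typing import Callable, List


class Service:
    """Tracks which version of an asset is in service (hypothetical)."""

    def __init__(self, current: str):
        self.current = current
        self.history: List[str] = [current]

    def switch_to(self, version: str) -> None:
        self.current = version
        self.history.append(version)


def roll_out(service: Service, new_version: str,
             healthy: Callable[[str], bool]) -> bool:
    """Place new_version into service; restore the old one if checks fail."""
    previous = service.current
    service.switch_to(new_version)
    if healthy(new_version):
        return True
    service.switch_to(previous)  # automated roll-back
    return False


billing = Service("v1-untransformed")

# Assumed health check: pretend the transformed asset misbehaves in service.
succeeded = roll_out(billing, "v2-transformed", healthy=lambda version: False)
```

In this run, `succeeded` is `False` and the untransformed asset is back in service. The disruption lasts only as long as the health check takes.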
The second benefit of this kind of automation is error avoidance. For example, inconsistent or incomplete testing can fail to find errors and defects, and that leads to rework and further disruptions. Another way to generate trouble: performing tests incorrectly, and therefore finding “defects” that aren’t there. Automated procedures are much less prone to error, if they’re periodically maintained, tested, and certified. For example, if testing a module at a certain level requires running a suite of tests, engineers needn’t remember (or take time to look up) how to prepare the asset for tests, how to run the tests, or what the members of the test suite are. Long advocated as an essential element of sound engineering practice, test automation can avoid some of these problems. But it’s far short of a panacea [Bach 1999].
Other automation opportunities
Sometimes debt retirement itself can be automated. When we can retire instances of the technical debt in question by performing an automated transformation on an asset, the transformation is faster and more reliable.
The most important form of automation associated with technical debt retirement is automation-assisted regression testing. Investments in thorough and focused regression testing have potentially shockingly high returns in the debt retirement context, and in the contexts of development and routine maintenance.
To perform a regression test on an asset that has undergone some kind of change (or whose context has undergone change) is to operate, employ, measure, or inspect the asset under a specified set of conditions to determine whether those changes caused the asset to fail to meet some standard that it met before the change. That is, a regression test is a procedure that determines whether the asset has regressed as a result of the change. Automated or automation-assisted regression tests enable the members of the debt retirement project team to detect problems in assets that they’ve transformed before the business units that depend on those assets encounter problems during their potentially expensive operations [Ge 2014].
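Here's a small Python sketch of that definition at work. The shipping-fee functions and the chosen weights are hypothetical. The pattern is what matters: record how the asset behaves under specified conditions before the change, then check the transformed asset against that record. The transformed version below contains an off-by-one mistake introduced on purpose so that the test has something to find.

```python
from typing import Callable, Dict, List, Tuple


def record_baseline(asset: Callable[[int], int],
                    conditions: List[int]) -> Dict[int, int]:
    """Capture the asset's behavior before any change is made."""
    return {condition: asset(condition) for condition in conditions}


def find_regressions(asset: Callable[[int], int],
                     baseline: Dict[int, int]) -> List[Tuple[int, int, int]]:
    """Return (condition, expected, actual) for each standard no longer met."""
    found = []
    for condition, expected in baseline.items():
        actual = asset(condition)
        if actual != expected:
            found.append((condition, expected, actual))
    return found


def shipping_fee_legacy(weight_kg: int) -> int:
    """Hypothetical asset before debt retirement."""
    return 5 if weight_kg <= 10 else 5 + (weight_kg - 10) * 2


def shipping_fee_transformed(weight_kg: int) -> int:
    """Restructured version with a deliberate off-by-one mistake."""
    return 5 + max(0, weight_kg - 9) * 2


baseline = record_baseline(shipping_fee_legacy, [1, 10, 11, 20])
problems = find_regressions(shipping_fee_transformed, baseline)
```

The project team sees three failing conditions (weights 10, 11, and 20) before any business unit is charged the wrong fee.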
Many of these same regression tests can also be useful during enhancement and ongoing maintenance of the asset. In many instances, investing in automated regression tests well in advance of the debt retirement project can enhance development and maintenance performance relative to those assets. Later, when the debt retirement project begins, the previously obtained results of regression tests will already be available.
For some debt retirement projects, specially created automated regression tests might be beneficial. Assigning engineers to automation tool development for debt retirement projects is probably the best way to support these needs.
Last words
These automation capabilities are unlikely to be available commercially, because they’re so specialized to the asset being tested. And because general applicability is unnecessary, building them in-house is both practical and economical, if the necessary skills are available. These investments can be justified economically if we take into account the savings resulting from reduced service disruptions for the debt-bearing assets.
References
[Bach 1999] James Bach. “Test Automation Snake Oil!” (1999).
[Fowler 1999] Martin Fowler, Kent Beck (Contributor), John Brant (Contributor), William Opdyke, Don Roberts, Erich Gamma (Foreword). Refactoring: Improving the Design of Existing Code. Boston: Addison-Wesley Professional; first edition (July 8, 1999).
[Ge 2014] Xi Ge and Emerson Murphy-Hill. “Manual Refactoring Changes with Automated Refactoring Validation,” Proceedings of the 36th International Conference on Software Engineering. ACM, 2014.
[Humble 2010] Jez Humble and David Farley. Continuous delivery: reliable software releases through build, test, and deployment automation, Pearson Education, 2010.
All technical debt in enterprise assets is either incremental technical debt or legacy technical debt. Incremental technical debt is technical debt newly incurred. It can be newly incurred exogenous technical debt, or it can be endogenous technical debt incurred either in projects currently underway, or projects just recently completed. Legacy technical debt is technical debt that is associated with an asset and that either wasn’t incurred recently or already existed, in any form, before work on that asset began. All legacy technical debt was at some point incremental technical debt. The vast amounts of legacy technical debt most organizations now carry are nothing more than the accumulation of incremental technical debt. The path to managing legacy technical debt therefore begins with controlling incremental technical debt.
A sinking rowboat provides a useful metaphor for illustrating the effects of incremental technical debt. The enterprise is the rowboat; the leaks are the properties of the enterprise and its environment that lead to creating incremental technical debt; water entering the boat through leaks is incremental technical debt; the accumulated water in the bottom of the boat is legacy technical debt.
Organizations are more likely to gain control of their legacy technical debt portfolio if they begin by controlling the formation of incremental technical debt, and its transformation into legacy technical debt. A metaphor might make this clear:
If you find yourself in a sinking rowboat, bailing out at least some of the water is a good idea, and it might be necessary in the short term. But at some point, fixing the leaks where the water comes in is advisable. Unless you address the leaks that already exist, and prevent new ones from forming as the rowboat ages, your fate is sealed. You’ll spend increasing portions of your time, energy, and resources bailing out your leaky rowboat, and declining portions of your time, energy, and resources rowing the boat towards your objective. And when you do devote some time and energy to rowing towards your objective, you’ll find the rowing surprisingly difficult, because the boat is lower in the water, and because you must propel not only the mass of the boat and its payload, but also the dead weight of the water in the bottom of the boat.
In this metaphor, legacy technical debt is the water in the bottom of the boat, and incremental technical debt is the water coming in through the leaks. The leaks are the proximate “causes” of technical debt. The root causes of the leaks are the root causes of technical debt.
If the enterprise is in the midst of a legacy technical debt emergency, retiring some of it is necessary in the short term. But unless the enterprise addresses incremental technical debt and its root causes, a new burden of legacy technical debt will accumulate. That accumulation is then likely to eliminate the benefits of having retired the current burden of legacy technical debt.
So after the legacy technical debt emergency has passed (or, if resources permit, during the emergency), establishing measures, procedures, and practices for controlling incremental technical debt would be prudent.
This change might be less challenging than it sounds. With respect to endogenous incremental technical debt, the teams that incurred it are either still at work, or just recently dispersed. Their understanding of the incremental technical debt is still fresh in their minds. If their projects are still underway, and if budget and schedule permit, retiring the incremental technical debt in the context of those projects is a superior strategy. For projects that have already delivered their work products, a somewhat less preferable—but still practical—approach involves re-assembling some of the team to retire the incremental technical debt as soon as possible, while memories are still fresh. Other approaches might be needed for incremental exogenous technical debt.
For the most part, the problem of controlling incremental technical debt isn’t a technical one. It usually reduces to a problem of finding time and resources to undertake the task.
Why resources aren’t available to retire incremental technical debt
The immediate reason why most teams don’t have enough resources to retire their incremental technical debt is that the organization, as a whole, doesn’t plan for retiring incremental technical debt incrementally. This immediate reason, though, isn’t fundamental. The lack of resources is a symptom of deeper dysfunctions in the organization. The real question is this: Why do so many organizations fail to allocate time and resources to retire incremental technical debt incrementally? Here are three reasons.
Misunderstanding (or no understanding) of the concept of technical debt
The organization is unlikely to be able to manage any kind of technical debt unless its people understand the concept. They must understand that technical debt isn’t necessarily the result of engineering malpractice. Much technical debt arises either as a natural result of working with technology, or as a result of organizational forms that compel people to behave in ways that lead to generating technical debt. Unless the people of the organization accept these truths, allocating sufficient resources to managing incremental technical debt is unlikely.
Underestimating the cost of carrying technical debt
Decisions regarding technical debt management ultimately reduce to a choice between allocating precious resources to technical debt retirement, and allocating them elsewhere. To make this choice responsibly, it’s necessary to fully appreciate the cost of carrying technical debt. Most believe that these costs appear in the form of lost engineering productivity. While that is indeed a factor, other factors can be far more important.
For example, if entry into an important market is delayed by even as little as 30 days due to debt-depressed engineering productivity, the financial consequences can be enormous and insurmountable. Or delays in diagnosing and repairing a fault in a product can produce financial liabilities that can actually sink the company. When one considers all possible financial consequences of carrying technical debt, it becomes clear that managing technical debt effectively is actually a strategy for survival. The decision to allocate appropriate resources to incremental technical debt retirement does require modeling these costs—calculations that few organizations actually undertake.
Miscalculating projections of returns on investments
Failing to estimate MICs with sufficient precision is problematic, as noted, because it reduces the quality of decisions regarding short-term resource allocations. But it also affects long-term projections, which depend on estimating returns on investments. For example, to choose between investing in retiring incremental technical debt from an asset and investing in new capabilities for the same asset, one must compare the projected value of each choice. If the decision-maker’s understanding of the technical debt concept is deficient, or if the calculations of MICs now or in the future are incomplete or underestimated, the investment decision is likely to be biased in favor of investing in new capabilities. Incremental technical debt retirement is thereby systematically deferred or avoided altogether.
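A toy calculation shows how the bias arises. The Python sketch below uses invented figures (in thousands of dollars) and a deliberately simple model: the value of retiring the debt is the MICs avoided over the planning horizon, minus the cost of retirement (MPrin). It ignores discounting and uncertainty, both of which a real analysis would need. The function names are likewise assumptions for this example.

```python
from typing import Sequence


def net_value_of_retirement(mprin: float,
                            projected_mics: Sequence[float]) -> float:
    """MICs avoided over the horizon, minus the cost of retiring the debt."""
    return sum(projected_mics) - mprin


def preferred_investment(mprin: float,
                         projected_mics: Sequence[float],
                         new_capability_value: float) -> str:
    """Pick whichever option has the higher projected net value."""
    retire = net_value_of_retirement(mprin, projected_mics)
    return "retire debt" if retire > new_capability_value else "new capability"


# All figures are hypothetical, in thousands of dollars.
MPRIN = 300                  # cost of retiring the debt now
NEW_CAPABILITY_VALUE = 200   # projected net value of the new capability

# Counting only lost engineering productivity, year by year.
narrow_mics = [60, 80, 100]

# Also counting delayed market entry and slower fault repair.
fuller_mics = [150, 200, 250]

narrow_choice = preferred_investment(MPRIN, narrow_mics, NEW_CAPABILITY_VALUE)
fuller_choice = preferred_investment(MPRIN, fuller_mics, NEW_CAPABILITY_VALUE)
```

With the narrow estimate, retirement appears to lose money (240 avoided against 300 spent), so the new capability wins. With the fuller estimate, retirement is worth 300 net and wins. The asset and the debt are the same in both cases; only the completeness of the MICs estimate differs.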
Policy recommendations for controlling incremental technical debt
The simplistic approach to controlling incremental technical debt is to provide more money to projects and to engineering functions. While that approach will be somewhat helpful, its results will likely be disappointing when compared to approaches that combine resource augmentation with changes in enterprise policy, processes, and culture.
Let’s begin with an example of a needed cultural change. Nearly anyone who makes or influences decisions might occasionally bear some responsibility for incurring incremental technical debt. To achieve effective control of technical debt requires that all such people understand how to change their behavior, whether they’re acting individually or in collaboration with others. Any guiding principle offered to them must be simple to state and easy to understand, because we must communicate it to nearly everyone in the enterprise. Here’s a sample statement of a useful such principle:
Those whose decisions cause the enterprise to incur technical debt are accountable for securing the resources needed to retire that debt, and for supplying compensating resources to those within the enterprise, or among its customers, who suffer depressed operational effectiveness during the period in which that technical debt is outstanding.
I call this the Principle of Accountability. It’s a corollary of what Weinberg calls “Ford’s Fundamental Feedback Formula” [Weinberg 1985], which captures the idea that people make better choices when they must live with the consequences of those choices.
General guiding principles are necessary, but not sufficient. Here are five examples of changes that help in controlling incremental technical debt [Brenner 2017b].
1. Adopt a shared concept vocabulary
There must be general agreement among all parties about the meanings of concepts that relate to incremental technical debt formation.
Examples: the definitions of “done” vis-à-vis projects, strategic technical debt, reckless technical debt, unethical technical debt, exogenous technical debt, endogenous technical debt, MICs, MPrin, and more. An enterprise-wide education program, including on-line reference material and new-employee orientation components, is probably also necessary.
2. Accept that technical debt is a fact of technological life
There is a widespread belief that most technical debt results from engineering malpractice. Although some technical debt does arise this way, most does not. For examples of other causes, see “Non-technical precursors of non-strategic technical debt.” Some technical debt arises because of advances external to the enterprise, beyond its control. Development-induced or field-revealed discoveries are especially difficult to avoid. In many instances, technical debt is an inevitable result of using technology.
3. Track the cost of carrying technical debt
The cost of retiring a particular class of technical debt (its MPrin) is significant only in the context of planning or setting priorities for resource allocation. In all other contexts, knowing that cost has little management value. What does matter, at all times, is the cost of carrying that technical debt—the MICs, or metaphorical interest charges. (See “The Principal Principle: Focus on MICs.”) MICs can fluctuate wildly [Garnett 2013].
Build and maintain expertise for estimating and tracking the costs of incurring and carrying each class of technical debt. Know how much each kind of technical debt contributes to these costs, now and for the next few years.
4. Assign accountability for kinds of technical debt
Some kinds of incremental technical debt result from actions (or inactions) within the enterprise; some do not. To control the kinds of incremental technical debt that arise from internal causes, hold people accountable for the debt their actions generate. Use Fowler’s Technical Debt Quadrant [Fowler 2009] as the basis for assessing and distributing internal financial accountability for debt retirement costs (MPrin) and metaphorical interest charges (MICs). For example, in Fowler’s terms, a Reckless/Inadvertent incident could carry a rating that would lead to imposing higher assessed charges for the debt originators than would a Prudent/Deliberate incident, the charges for which might be zero.
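One way to picture such an assessment scheme is the following Python sketch. The charge rates are assumptions chosen only to respect the ordering described above (zero for Prudent/Deliberate, highest for Reckless/Inadvertent); the two intermediate values and the function name are hypothetical.

```python
# Hypothetical fraction of debt costs charged back to the originating unit,
# keyed by (prudence, intent) in Fowler's quadrant. The values are invented;
# only the ordering of the first and last entries comes from the text.
CHARGE_RATE = {
    ("prudent", "deliberate"): 0.0,
    ("prudent", "inadvertent"): 0.25,
    ("reckless", "deliberate"): 0.75,
    ("reckless", "inadvertent"): 1.0,
}


def assessed_charge(prudence, intent, mprin, mics):
    """Return the amount charged to the originators of a debt incident.

    mprin is the estimated retirement cost and mics the carrying cost
    for the period being assessed.
    """
    rate = CHARGE_RATE[(prudence, intent)]
    return rate * (mprin + mics)
```

Under these assumed rates, an incident rated Prudent/Deliberate carries no charge, while a Reckless/Inadvertent incident is charged the full retirement and carrying cost.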
5. Require that deliberately incurred technical debt be secured
In the financial realm, secured debt is debt for which repayment is guaranteed by specifically pledged assets. By analogy, secured technical debt is technical debt for which resources have been allocated (possibly in a forward time period) to guarantee the debt’s retirement and possibly its associated MICs.
This policy implies that deliberately incurred technical debt, whether incurred strategically or recklessly, must be secured. If anyone involved in a development or maintenance effort feels that technical debt has been incurred, a dispassionate third party, unaligned with any function involved in the effort, reviews project deliverables for the presence of technical debt. Securing the debt, however, might require commitments of resources for fiscal periods beyond the current one. For many organizations, such forward commitments might require modifying the management accounting system.
Last words
Controlling incremental technical debt requires changes well beyond the behavior and attitudes of engineering staff, or the technologies they employ. Achieving control of incremental technical debt formation requires engagement with enterprise culture to alter the behavior and attitudes of most of the people of the enterprise.
References
[Bach 1999] James Bach. “Test Automation Snake Oil!” (1999).
[Fowler 1999] Martin Fowler, Kent Beck (Contributor), John Brant (Contributor), William Opdyke, Don Robert, Erich Gamma (Foreword). Refactoring: Improving the Design of Existing Code. Boston: Addison-Wesley Professional; first edition (July 8, 1999).
[Ge 2014] Xi Ge and Emerson Murphy-Hill. “Manual Refactoring Changes with Automated Refactoring Validation,” Proceedings of the 36th International Conference on Software Engineering. ACM, 2014.
[Humble 2010] Jez Humble and David Farley. Continuous delivery: reliable software releases through build, test, and deployment automation, Pearson Education, 2010.
When technical debt appears as discrete chunks—that is, when it’s localizable—we can often retire the debt incrementally, system-by-system, module-by-module, or even instance-by-instance. These approaches offer great flexibility, both technically and financially, which makes retiring localizable technical debt a particularly manageable challenge.
Localizable technical debt
Electricity pylons, Hamilton Beach, Ontario, Canada: a small part of the AC power grid, which seems destined one day to manifest a great deal of non-localizable technical debt. Photo by Ibagli courtesy Wikimedia Commons. Pylons in the same line are visible in Google Street View.
In “Technical debt in a rail system,” I explored the case of Amtrak’s Acela Express. In that example, I explained that Acela’s passenger cars are designed to tilt to compensate for centrifugal forces that appear when the train rounds curves. The technical debt is in the form of tracks that were too close together to permit the trains to tilt as much as they’re designed to, which limits the trains’ speed rounding curves. The instances of this debt are the curves in which the tracks are too close together. These instances are thus inherently localizable.
In “How technical debt can create more technical debt,” I described an example in which an organization is unable to upgrade its desktop computers from Windows 8 to Windows 10. In this case, each computer running Windows 8 is an instance of this form of technical debt.
Both of these examples illustrate the sorts of technical debt in which the instances are localizable—each instance is self-contained, and we can “point” to it as an instance of the debt in question. But localizable technical debt need not be associated with hardware. In software systems, for example, localizable technical debt can exist in a module interface, which might have been designed to meet a requirement that’s no longer relevant. That module and any other modules that interact with that interface manifest that technical debt.
Non-localizable technical debt
Non-localizable technical debt is debt for which the instances are amorphous or system-wide, or span the bulk of the system, if not all of it. Retiring non-localizable technical debt typically requires extensive re-engineering of the assets that manifest it.
For the most part, non-localizable technical debt arises at the level of system architecture or above. One can easily imagine this occurring in software systems, where physical constraints have little meaning, but let’s consider a hardware system to illustrate the importance of this concept.
Until relatively recently in the United States, most electric power consumers used power for incandescent lamps, heating, or for electric motors in elevators, refrigeration, home appliances, pumps, and so on. These applications are compatible with an alternating current power distribution system (AC grid). The AC grid is more efficient than an equivalent direct-current architecture (DC grid) when power generation plants are few and relatively distant from power load sites, because transmission losses are lower for AC than for DC.
However, advances in electronics and in distributed power generation technologies are eroding the advantages of the AC grid [Dragičević 2016]. Most electronic devices, including phones, computers, rechargeable power tools, LED lighting, and electric vehicles, use DC internally. To access the AC grid, they use converters that change AC power into DC power, which entails efficiency losses due to the conversion. Moreover, solar power generation systems such as photovoltaics generate DC inherently. Modern wind turbines generate AC at a frequency determined by wind power conversion efficiency, but they then convert it to DC before a second conversion to AC at the frequency the grid requires. And because solar and wind power generators are geographically dispersed, they’re often situated near their load sites, as, for example, a photovoltaic array on the roof of a home would be. Therefore, the losses involved in transmission from generation site to load site are much less important than they would be if the generation sites were few, concentrated, and at great distances from the loads they serve.
Our current AC grid architecture is likely to become a net disadvantage in the not-too-distant future. If that does happen, we could come to regard the current AC grid, and the devices that are designed for it, as manifesting technical debt. However, localizing that debt in each device and each component of the AC grid would make little sense. The technical debt in question would reside in the grid architecture, as a whole. It would be inherently non-localizable.
Addressing localizable technical debt
As noted above, we can often retire localizable technical debt incrementally—instance-by-instance. In many cases, this enables engineers to address the debt at times and in sequences that are compatible with organizational priorities and within the organization’s resource capacity in any given fiscal period. Although this isn’t always possible for localizable technical debt, and although engineers are often justifiably averse to the temporary non-uniformity that results from incremental debt retirement, exploiting localizability when planning debt retirement is often a useful strategy for retiring technical debt economically.
Incremental retirement of localizable technical debt does present some problems. During the retirement process, for any given instance, it might be necessary to install temporary structures to enable continued operation with minimal service disruption. For example, with the Acela tracks, an alternate line might be needed while the new track is installed, or the new track might need to be installed at some distance from the existing track while trains continue using the existing track. Either approach requires investment beyond the investment required for the new track itself. Some managers have little appetite for such temporary investments. But temporary investments are in a real sense part of the MICs on that debt. They’re unusual, as MICs go, in the sense that they’re incurred as part of the debt retirement effort, but they’re still MICs. In a way, they’re analogous to the charges that might appear when terminating an auto lease.
Another consideration when addressing localizable technical debt is its entanglement with other forms of technical debt. With respect to the effort to retire one kind of localizable technical debt, these other forms of technical debt are what I’ve called auxiliary technical debt (ATD). Consider carefully the time order of efforts to retire the localizable technical debt and one or more forms of its ATD. Because retiring localizable technical debt can seem deceptively straightforward, the temptation to deal with it before addressing some of the ATD can be difficult to resist. But dealing with some of the ATD first might actually be the wiser course when, for example, doing so eliminates numerous instances of the localizable technical debt.
One note of caution
Within the category of localizable technical debt are some kinds of debt that are so widespread in the asset that retiring them affects a large part of the asset, if not all of the asset. While it’s true that each instance of such debt is identifiable and localized, the instances are so widespread that they collectively have the properties of non-localizable debt as far as retirement efforts are concerned. Incremental retirement might still be possible, but a more global retirement effort might be safer and less disruptive. One approach, usually favored by the technologists, is to suspend all other work while the debt in question undergoes retirement. While that approach might indeed be safest, all stakeholders must accept and understand the technical issues, and the technologists must understand the concerns of all stakeholders. A joint decision about the retirement strategy among all stakeholders, including technologists, is recommended.
Last words
In the context of debt retirement projects, localizable technical debt provides needed flexibility. Often, the non-uniformity that results from retiring localizable technical debt instance-by-instance can be reduced before the debt retirement project is completed. In the meantime, the team can be relatively free to retire the localizable debt in whichever order is most fitting.
References
[Bach 1999] James Bach. “Test Automation Snake Oil!” (1999).
[Dragičević 2016] Tomislav Dragičević, Xiaonan Lu, Juan C. Vasquez, and Josep M. Guerrero. “DC Microgrids–Part II: A Review of Power Architectures, Applications and Standardization Issues,” IEEE Transactions on Power Electronics, vol 31:5, 3528-3549, 2016.
[Fowler 1999] Martin Fowler, Kent Beck (Contributor), John Brant (Contributor), William Opdyke, Don Robert, Erich Gamma (Foreword). Refactoring: Improving the Design of Existing Code. Boston: Addison-Wesley Professional; first edition (July 8, 1999).
[Ge 2014] Xi Ge and Emerson Murphy-Hill. “Manual Refactoring Changes with Automated Refactoring Validation,” Proceedings of the 36th International Conference on Software Engineering. ACM, 2014.
[Humble 2010] Jez Humble and David Farley. Continuous delivery: reliable software releases through build, test, and deployment automation, Pearson Education, 2010.
Decisions to retire the legacy technical debt carried by irreplaceable assets are not to be taken lightly. As decision makers gather information and recommendations from all around the organization, most will discover that information and recommendations aren’t sufficient for making sound decisions about technical debt retirement. The issues are complex. Education is also needed. It’s entirely possible that in some organizations, confronted with a set of decisions regarding legacy technical debt retirement in irreplaceable assets, the existing executive team might be out of its depth. To understand how this situation can arise, let’s explore the nature of legacy technical debt retirement decisions.
A common technical debt retirement scenario
What compels the leaders of a large enterprise to consider retiring the technical debt encumbering one of its irreplaceable assets is fairly simple: cost. Decision makers usually begin by investigating the cost of replacing the asset—the option I’ve oh-so-cleverly called “Replace the Asset.” They then typically conclude that replacement isn’t affordable. At this point, many decision makers choose the option I’ve called “Do nothing.” Time passes. A succession of incidents occurs, in which repairs to the asset or enhancements of the asset are required. And I use the term required here to mean “essential to the viability of the business.”
Two alternatives to retiring legacy technical debt in irreplaceable assets. Neither one works very well.
Engineers then do their best to meet the need, but the cost is high, and the work takes too long. The engineers explain that the problems are due, in part, to the heavy burden of technical debt in this particular asset. Eventually the engineers are asked to estimate the cost of “cleaning things up.” Decision makers receive the estimates and conclude that it’s “unaffordable right now.” They ask the engineers to “make do.” In other words, they stick with the Do Nothing option.
After a number of cycles repeating this pattern, decision makers finally agree to provide time and resources for technical debt retirement, but only because it’s the least bad alternative. The other alternatives—Replace the Asset, and Do Nothing—clearly won’t work and haven’t worked, respectively.
So there we are. The organization has been forced by events to address the technical debt problems in this irreplaceable asset. And that’s where the trouble begins.
Decisions about retiring legacy technical debt
In scenarios like the one above, the fundamental decision has already been made: the enterprise will be retiring legacy technical debt from an irreplaceable asset. But that’s just the first ripple of waves of decisions to come, made by many people in a variety of roles throughout the enterprise. Let’s now have a look at a short catalog of what’s in store for such an enterprise.
Recall that most large technical debt retirement projects probably exhibit a high degree of wickedness in the sense of Rittel and Webber [Rittel 1973]. One consequence of this property is the need to avoid do-overs. That is, once we make a decision about how to proceed to the next bit of the work, we want that decision to be correct, or at least, good enough. It should not leave the enterprise in a state that’s more difficult to resolve than the state in which we found it. Since another property of wicked problems is the prevalence of surprises, most decisions must be made in a collaborative context, which affords the greatest possibility of opening the decision process to diverse perspectives. We must therefore regard collaborative decision-making at every level as a highly valued competency.
What follows is the promised catalog of decision types.
Strategic decisions
This decision category leads the list because it provides the highest leverage potential for changing enterprise behavior vis-à-vis technical debt. Organizations that are confronting the problem of technical debt retirement from irreplaceable assets would do well to begin by acknowledging that although they might be able to devise tactics for dealing with the debt burdening these assets right now, they must make a strategic change if they want to avoid a recurrence. Accumulating debt to a level sufficient to compel chartering a major debt retirement project took time. It took years of deferring the inevitable. A significant change of enterprise strategy is necessary.
When changing complex social systems, applying the concept of leverage provides a critical advantage. In this instance, following the work of Meadows [Meadows 1997] [Meadows 1999] [Meadows 2008], we can devise interventions at several points that can have great impact on both the level of technical debt and its rate of accumulation. The leverage points of greatest interest are Feedback Loops, Information Flows, Rules, and Goals. For example, the enterprise can set a strategic goal of a specific volume of incremental technical debt incurred per project, normalized by project budget, as I discussed in the post, “Leverage points for technical debt management.”
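To show how a goal of that kind might be checked, here is a small Python sketch. It assumes the enterprise can estimate, per project, the cost of the incremental technical debt that project incurred; the function names and the goal ratio are hypothetical, not part of any established method.

```python
def debt_ratio(incremental_debt_cost, project_budget):
    """Incremental technical debt incurred by a project, normalized by its budget."""
    if project_budget <= 0:
        raise ValueError("project_budget must be positive")
    return incremental_debt_cost / project_budget


def exceeds_goal(incremental_debt_cost, project_budget, goal_ratio):
    """Report whether a project incurred more debt per budget unit than the goal allows."""
    return debt_ratio(incremental_debt_cost, project_budget) > goal_ratio
```

For instance, a project with a budget of 1,000,000 that incurs an estimated 50,000 of incremental debt has a ratio of 0.05, which would exceed an assumed goal of 0.03. Normalizing by budget lets projects of very different sizes be compared against the same goal.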
One might reasonably ask why enterprise strategy must change; wouldn’t a change in technology strategy suffice? Changing how engineers go about their work would help—indeed in most cases it’s necessary. But because the conditions and processes that lead to technical debt formation and persistence transcend engineering activities, additional changes are required to achieve the objective of controlling technical debt.
Some technical debt is strategic—it’s incurred as the result of a conscious business decision. But some is non-strategic. We might even be unaware of how it occurred. However, both kinds of technical debt can arise as a result of non-technical factors. Read a review of non-technical precursors of non-strategic technical debt.
Organizational decisions
Before chartering a technical debt retirement project (DRP) for an irreplaceable asset, or a group of irreplaceable assets, it’s wise to consider how to embed that project in the enterprise.
The default organizational form for debt retirement projects concerned with an asset A is usually the same form that would be used for major projects focused on asset A. If the Information Technology (IT) unit would normally address issues in A, the debt retirement effort usually would be organized under IT. If A is a software product normally attended to in a product group, that same group would likely have responsibility for the DRP for asset A.
Although these default organizational structures are somewhat sensible, both technically and politically, there’s an alternative approach worth investigating. It entails establishing a technical debt retirement function that becomes a center of excellence for executing technical debt retirement projects, and for developing and injecting sound technical debt management practice into the enterprise. Such an approach is especially useful if multiple debt retirement projects are needed.
The fundamental concept that makes the center-of-excellence approach necessary is the wickedness of the technical debt retirement problem. To address the problem at scale requires capabilities beyond what IT can provide; beyond what product units can provide; indeed, beyond what any of the conventional organizational elements can provide. The reason for this is that the explosion of technical debt in most organizations is an emergent phenomenon. Every organizational unit contributed to the formation of the problem. And every organizational unit must contribute to its resolution.
A technical debt center of excellence is an approach that might be capable not only of synthesizing the expertise of all elements of the enterprise, but also of bringing new approaches into the enterprise from external sources.
Engineering decisions
Engineers have a tendency to identify and classify technical debt items on technical grounds. Further, they tend to set technical debt retirement priorities on a similar basis. That is, they’re inclined to set priorities highest for those debt items that they (a) recognize as debt items and (b) see as imposing high levels of MICs charged to engineering accounts. Engineers are less likely to assign high priorities to technical debt that generates MICs that are charged to revenue, or to other accounts, because those MICs are less evident—and in many cases invisible—to engineers.
Decisions regarding recognition of technical debt items and setting priorities for retiring them must take technological imperatives into account, but they must also account for MICs of all forms. Priorities must be consistent with enterprise imperatives.
Decisions about pace
Paraphrasing Albert Einstein, technical debt retirement projects should be executed as rapidly as possible, and no faster. The tendency among non-engineers and non-technical decision-makers is to push for rapid completion of debt retirement projects, for three reasons. First, everyone, like the engineers, wants the results that debt retirement will bring. Second, everyone, like the engineers, wants an end to the inevitable disruptions debt retirement projects cause. And finally, the longer the project is underway, the more it might cost.
For these reasons, once the decision to retire the debt is firmly in hand, the enterprise might have a tendency to apply financial resources at a rate that exceeds the ability of the project team to execute the project responsibly. When that happens, rework results. And for wicked problems like debt retirement, rework is the path to catastrophe.
Decisions about pace and team scale need to be regarded as tentative. Regular reviews can ensure that the resource level is neither too low nor too high. Even when the engineers are given control over these decisions, they must be reviewed, because pressures for rapid completion can be so severe that they can compromise the judgment of engineers about how well they can manage the resources applied to the project.
Resource decisions
Debt retirement projects concerned with legacy irreplaceable assets are different from most other projects the enterprise undertakes. Estimates of the labor hours required are more likely to be incorrect on the low side than are analogous estimates for other projects, because so much of the work involves pieces of assets with which few engineering staff have any experience. But with respect to resources, underestimating labor requirements isn’t the real problem. Non-labor resources are the real problem.
Because the assets are irreplaceable, it’s likely that they’re needed for ongoing operations. In some cases, the assets are needed continuously. Many organizations have kept such assets operational by exploiting hours of downtime during periods of low demand, usually scheduled and announced in advance. While these practices are likely sufficient for the relatively minor and infrequent changes usually associated with routine maintenance and enhancement, debt retirement imposes much more severe burdens on the organization than these short access windows can support. Effective debt retirement projects need far more access to the asset—a level of access that continuous delivery practices can provide [Humble 2010].
However, assets whose designs predate the widespread use of modern practices such as continuous delivery might not be compatible with the infrastructure that these practices require. And in organizations that haven’t yet adopted such practices, staff familiar with them might be in short supply. For these reasons, we must regard as developmental any early projects whose objectives are retiring technical debt from irreplaceable assets. They’re retiring the technical debt, of course, but they’re also developing the practices and infrastructure needed to support technical debt retirement projects. This dual purpose is what drives the surprisingly high non-labor costs and investments associated with early technical debt retirement projects.
The investments required might include such “items” as a staging environment, which “is a testing environment identical to the production environment” [Humble 2010]; extensive test automation, including results analysis; blue-green deployment infrastructure; automation-assisted rollback; and zero-downtime release infrastructure. Decisions to make investments require an appreciation of their value to the enterprise. They enable the enterprise to deal effectively with the wicked problem of technical debt retirement.
Last words
Because every situation and every organization is unique, few general guidelines are available for making these decisions. The criteria most organizations have been using for dealing with (or avoiding) the issue of technical debt have produced the problems they now face. So, to succeed from this point, whatever criteria they use in the future must be different. My own view is that short-term thinking is at the heart of the problem, but it’s a wicked problem. The long-term solution will not be simple.
References
[Bach 1999] James Bach. “Test Automation Snake Oil!” (1999).
[Dragičević 2016] Tomislav Dragičević, Xiaonan Lu, Juan C. Vasquez, and Josep M. Guerrero. “DC Microgrids–Part II: A Review of Power Architectures, Applications and Standardization Issues,” IEEE Transactions on Power Electronics, vol 31:5, 3528-3549, 2016.
[Fowler 1999] Martin Fowler, Kent Beck (Contributor), John Brant (Contributor), William Opdyke, Don Robert, Erich Gamma (Foreword). Refactoring: Improving the Design of Existing Code. Boston: Addison-Wesley Professional; first edition (July 8, 1999).
[Ge 2014] Xi Ge and Emerson Murphy-Hill. “Manual Refactoring Changes with Automated Refactoring Validation,” Proceedings of the 36th International Conference on Software Engineering. ACM, 2014.
[Humble 2010] Jez Humble and David Farley. Continuous delivery: reliable software releases through build, test, and deployment automation, Pearson Education, 2010.
As noted in an earlier post, a technical debt retirement project (DRP) is a project whose primary objective is retirement of one or more particular kinds of technical debt from a specified set of assets. But those assets might also carry other kinds of technical debt. With respect to a given DRP, we can call these other kinds of technical debt Auxiliary Technical Debt (ATD). Because the presence of ATD can defocus debt retirement projects, it presents a risk that must be anticipated and well understood, if it is to be mitigated.
This post explores concepts and approaches for mitigating the risks associated with the auxiliary technical debt (ATD) of a given technical debt retirement project (DRP). As might already be evident, these initialisms (ATD, DRP, and one more to come) can be difficult to keep straight. Here’s a quick guide: T always means Technical, D always means Debt, R always means Retirement, and P always means Project.
The temptation to retire auxiliary technical debt
Guardrails in a track bed as a rail line crosses a bridge. The guardrails are the inner pair of rails. The rails outside the inner pair are the running rails. Guardrails (also known as check rails) function to keep the wheels of derailed cars from straying too far from their proper locations. This is a useful risk mitigation function in high-risk geometries such as curves. It’s also advantageous even if the probability of risk events is low, as in this straight section of track. It’s a worthwhile measure when the consequences of risk events are extremely costly, as in this case. A derailment on a railway bridge or in steep terrain can result in rail vehicles falling to the earth below, which can cause them to pull other vehicles with them. Derailments under highway overpasses can also be problematic. Such derailments can result in damage to rail or highway bridge structures, resulting in loss of service for periods extending far beyond the time needed to clear the derailment. For this same reason, guardrails are also used in tunnels and tunnel approaches. Because uncontrolled scope expansion can have such devastating effects, we need policy guardrails to control scope expansion when retiring technical debt from assets that contain auxiliary technical debt.
I’ve been using the term TDIQ—Technical Debt In Question—to denote the kinds of technical debt whose retirement is the objective of a given DRP. The ATD of that DRP, then, is the collection of instances of any other kinds of technical debt, of types differing from the TDIQ of the DRP, and which are present in the assets being modified by the DRP. Notice that the property of being auxiliary technical debt is relative. It’s relative to the objectives of a given DRP. A particular instance of technical debt might be ATD for one DRP, and TDIQ for another DRP, depending on the respective objectives of each DRP. Notice also that the ATD of a given DRP can include several different kinds of technical debt.
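Because the TDIQ/ATD distinction is relative to a project, a short Python sketch may help make it concrete. The class names, field names, and the example debt kinds are all hypothetical; the "out-of-scope" label for debt in assets the DRP doesn't modify is an assumption added so that the function has an answer for every instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtInstance:
    asset: str  # the asset that carries this instance of debt
    kind: str   # the kind of technical debt it represents


@dataclass(frozen=True)
class DRP:
    name: str
    tdiq_kinds: frozenset  # kinds of debt this project is chartered to retire
    assets: frozenset      # assets this project modifies


def classify(instance, drp):
    """Classify a debt instance relative to one DRP.

    Returns "TDIQ" or "ATD" for debt in the project's assets, and
    "out-of-scope" (an assumed label) for debt in assets it doesn't touch.
    """
    if instance.asset not in drp.assets:
        return "out-of-scope"
    return "TDIQ" if instance.kind in drp.tdiq_kinds else "ATD"
```

The same instance can be ATD for one project and TDIQ for another, which is why the classification takes the DRP as an argument rather than being stored on the instance itself.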
Let’s now examine a scenario in which ATD can generate risk for a DRP. In this scenario, we’ll consider only one kind of ATD; call it ATD0.
Suppose that several members of the DRP team undertake work to retire the DRP’s TDIQ in a portion of one of the debt-bearing assets. In performing this work, they encounter some instances of ATD0. Studying these instances of ATD0 carefully, they conclude that “fixing” the ATD0 along with the TDIQ in that portion of the asset would be easier and less risky than leaving the ATD0 in place and attending only to the TDIQ. Let’s call their approach the ATD approach. And let’s say that the TDIQ approach is one in which the team addresses only the TDIQ, and leaves in place the ATD0 and all other ATD it finds.
The advantages of the ATD approach over the TDIQ approach are fairly clear. After the work is complete, in either approach, the asset must be tested and re-certified. In the TDIQ approach, when a subsequent DRP is chartered to retire ATD0, that second DRP team will need to modify the asset again, and then test and re-certify it again when it completes its work. In the ATD approach, if we’ve already retired all instances of ATD0 from the asset, we can avoid that second round of modification, testing, and re-certification.
Risks associated with retiring auxiliary technical debt
But the ATD approach also has some serious disadvantages.
Enterprise assets might be left in a mixed state
Unless the team plans to retire all instances of ATD0, then upon completion of the DRP, enterprise assets will be in a mixed state. Some will be free of both the TDIQ and ATD0; some will be free of the TDIQ but continue to harbor ATD0. This non-uniformity can create complications for subsequent maintenance, documentation, testing, training, enhancement, automation assist development, and so on.
Complications in testing and re-certification
If test results for the modified assets indicate the possibility of new defects, the cause might be associated with the TDIQ work, or the ATD work, or both. Resolving any issues in the test results is thus more complicated under the ATD approach than it is under the TDIQ approach. Similar considerations affect re-certification. Thus, there is a risk that the ATD approach will complicate interpretation of test and re-certification results.
Questions about the reliability of technical debt inventory data
As noted in an earlier post, for any given DRP, the DRP team needs to know which assets bear that project’s TDIQ. In the TDIQ approach, any data previously or concurrently gathered about the location of instances of ATD0 remains valid, because the TDIQ approach doesn’t retire any instances of ATD0. However, in the ATD approach, such inventory data must be corrected to account for the retirement of whatever instances of ATD0 are retired in the ATD approach. Thus, if ATD0 inventory data has already been collected, or if it’s being collected in parallel with the DRP, the DRP team must take steps to adjust the inventory data regarding locations of ATD0 as it retires instances thereof. There is of course a risk that this will not occur as needed, which can create problems for any subsequent DRP for which the ATD0 is contained in its TDIQ. This can be especially challenging if there are multiple DRPs in process simultaneously, each working on different TDIQs, potentially in different debt-bearing assets, but all encountering and retiring instances of ATD0.
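The bookkeeping this implies can be sketched in a few lines of Python. The inventory layout (a dictionary keyed by debt kind and asset, holding a set of locations) and the function name are assumptions made for illustration; an actual inventory would likely live in a shared database.

```python
def record_retirement(inventory, kind, asset, retired_locations):
    """Remove retired debt instances from a shared inventory.

    inventory maps (kind, asset) to the set of locations known to carry
    that kind of debt. Returns the set of locations actually removed,
    so callers can spot retirements the inventory never knew about.
    """
    key = (kind, asset)
    known = inventory.get(key, set())
    removed = known & set(retired_locations)
    remaining = known - removed
    if remaining:
        inventory[key] = remaining
    else:
        # No instances left for this kind in this asset; drop the entry.
        inventory.pop(key, None)
    return removed
```

If every DRP that retires instances of ATD0 calls something like this as part of its work, a later DRP whose TDIQ includes ATD0 starts from inventory data that still matches the assets.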
Unconstrained scope creep
Suppose there is a DRP whose objective is retiring its TDIQ, and that it has decided to also retire some (or all) instances of a particular kind of ATD, say ATD0. Although that activity would represent an expansion of scope beyond retiring the TDIQ, it might be acceptable and it might even be prudent. But as the team undertakes to retire ATD0, it might confront a similar quandary relative to the relationship between the ATD0 and yet another kind of ATD, which we might call ATD1. The DRP team might then decide to expand scope again. And so on. In general, there is no self-evident stopping point for such a chain of scope expansion. In these circumstances, scope creep can become an unmitigated risk, threatening the coherence and focus of the DRP, with consequences for its budget and schedule.
Last words
In some cases, some of the ATD might be so intertwined with the TDIQ that retiring some instances of the TDIQ necessarily retires some of the ATD. And in other cases, leaving the ATD in place severely complicates retiring the TDIQ. In still other cases, leaving the ATD in place leaves the assets in a complex state that makes ongoing maintenance or enhancement work more difficult. In these cases, what I called the ATD approach above is plainly the wiser course, compared to the TDIQ approach.
Policymakers have a role to play here. They can develop guidance for DRP teams to apply as they come upon these difficult situations to help them decide whether to take the ATD approach or the TDIQ approach. The military calls this guidance “rules of engagement,” while politicians call it “guardrails.”
Deciding between the ATD and TDIQ approaches on a whim, or on what feels right at the moment, inevitably leads to a chaos of inconsistency and scope creep. The safest course is to adopt wise policy—rules of engagement—and to adjust them as the organization learns more and more about retiring technical debt from its assets.
References
[Bach 1999] James Bach. “Test Automation Snake Oil!” (1999).
[Dragičević 2016] Tomislav Dragičević, Xiaonan Lu, Juan C. Vasquez, and Josep M. Guerrero. “DC Microgrids–Part II: A Review of Power Architectures, Applications and Standardization Issues,” IEEE Transactions on Power Electronics, vol 31:5, 3528-3549, 2016.
[Fowler 1999] Martin Fowler, Kent Beck (Contributor), John Brant (Contributor), William Opdyke, Don Robert, Erich Gamma (Foreword). Refactoring: Improving the Design of Existing Code. Boston: Addison-Wesley Professional; first edition (July 8, 1999).
[Ge 2014] Xi Ge and Emerson Murphy-Hill. “Manual Refactoring Changes with Automated Refactoring Validation,” Proceedings of the 36th International Conference on Software Engineering. ACM, 2014.
[Humble 2010] Jez Humble and David Farley. Continuous delivery: reliable software releases through build, test, and deployment automation, Pearson Education, 2010.
When we first set out to plan a large technical debt retirement project (DRP), a question that arises very early in the planning process is this: Which assets are carrying the kind of technical debt we want to retire? And a second question is: Which operations will be affected—and when—by the debt retirement work? Although these questions are clear, and easily expressed, the answers might not be. And the answers are important. So where is the technical debt?
Determining which of the enterprise’s many technological assets might be carrying the Technical Debt In Question (TDIQ) can be a complex exercise in itself, because inspecting the asset might be necessary. Inspection might require temporarily suspending operations, or determining windows of time during which inspection can be performed safely and without interfering with operations. Further, inspection might require knowledge of the asset that the DRP team doesn’t possess. Moreover, access to the asset might be restricted in some way. In these cases, staff from the unit responsible for the asset must be available to assist with the inspection.
Although asset inspection might be necessary or preferable, it might not be sufficient for determining which assets are carrying the TDIQ. This is easy to understand for physical assets. Consider, say, determining the release version of the firmware of the hydraulic controller electronics of a tunnel boring machine. But asset inspection might also be insufficient for purely software assets. For determining the presence of the TDIQ in software assets, reading source code might not be sufficient or efficient. It might be easier, faster, and more accurate to operate the asset under special conditions. For example, an inspector might want to provide specific inputs to the asset and then examine its responses. As a second example, we might use automation assistance to examine the internal structure of the asset, searching for instances of the TDIQ. And as with other assets, the assistance of the staff of the unit responsible for the asset might be necessary for the inspection.
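As a rough illustration of what automation assistance might look like for a software asset, here is a minimal sketch. It assumes that the TDIQ can be recognized by a textual pattern, which is often not the case. The function name `find_tdiq_instances`, the file names, and the call `legacy_connect` are hypothetical, chosen only for the example.

```python
import re


def find_tdiq_instances(sources, pattern):
    """Return (file name, line number, line text) for each line matching pattern.

    `sources` maps a file name to that file's text. A real inventory tool
    would read the files from the asset's repository instead.
    """
    regex = re.compile(pattern)
    hits = []
    for name in sorted(sources):
        for lineno, line in enumerate(sources[name].splitlines(), start=1):
            if regex.search(line):
                hits.append((name, lineno, line.strip()))
    return hits


# Hypothetical TDIQ: calls to an outdated connection routine.
sources = {
    "billing.py": "import db\nconn = legacy_connect('billing')\n",
    "reports.py": "import db\nconn = db.connect('reports')\n",
}
hits = find_tdiq_instances(sources, r"\blegacy_connect\(")
```

A pattern search like this finds only the instances that leave a textual trace. Debt that appears in architecture or behavior would still call for the other inspection techniques mentioned above.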
Which enterprise operations depend on debt-bearing assets?
Knowing which assets bear the TDIQ is useful to the DRP team as it plans the work to retire the TDIQ. But part of that plan could include service disruptions. If so, it’s also necessary to determine how those disruptions might affect operations, to enable the team to control the effects of the disruptions, and negotiate with affected parties. Thus for each asset that bears the TDIQ, we need to determine what operations would be affected if the asset is removed from service temporarily.
Observing actual operations in conditions in which the asset is out of service in whole or in part might be the only economical way to discover which enterprise functions depend on the assets that carry the TDIQ. Other techniques include examining historical data such as trouble reports and outstanding defect lists, and correlating them across multiple asset histories and operations histories.
In some cases, these investigations produce results that have a limited validity lifetime, owing to ongoing evolution of the debt-bearing assets and the assets that interact with them. For that reason, the actual work of retiring the TDIQ must begin as soon as possible after the inventory is complete, and possibly even before that. This suggests that the size of the DRP team is a critical success factor, because size enables the team to complete the inventory inspections rapidly, before the end of the validity lifetime of the team’s research results.
Managing teams of great size is a notoriously difficult problem. For this reason, delegating some of the DRP research effort directly to the business units that own the assets in question can provide the labor hours and expertise needed for the research. In this way, the DRP can deploy a team-of-teams structure, known as a Multi-Team System (MTS) [Mathieu 2001] [Marks 2005]. The DRP team can then bring to bear a large force in a way that renders the overall MTS manageable.
References
[Bach 1999] James Bach. “Test Automation Snake Oil!” (1999).
[Dragičević 2016] Tomislav Dragičević, Xiaonan Lu, Juan C. Vasquez, and Josep M. Guerrero. “DC Microgrids–Part II: A Review of Power Architectures, Applications and Standardization Issues,” IEEE Transactions on Power Electronics, vol 31:5, 3528-3549, 2016.
[Fowler 1999] Martin Fowler, Kent Beck (Contributor), John Brant (Contributor), William Opdyke, Don Robert, Erich Gamma (Foreword). Refactoring: Improving the Design of Existing Code. Boston: Addison-Wesley Professional; first edition (July 8, 1999).
[Ge 2014] Xi Ge and Emerson Murphy-Hill. “Manual Refactoring Changes with Automated Refactoring Validation,” Proceedings of the 36th International Conference on Software Engineering. ACM, 2014.
[Humble 2010] Jez Humble and David Farley. Continuous delivery: reliable software releases through build, test, and deployment automation, Pearson Education, 2010.
[Marks 2005] Michelle A. Marks, Leslie A. DeChurch, John E. Mathieu, Frederick J. Panzer, and Alexander Alonso. “Teamwork in multiteam systems,” Journal of Applied Psychology 90:5, 964-971, 2005.
[Mathieu 2001] John E. Mathieu, Michelle A. Marks and Stephen J. Zaccaro. “Multi-team systems”, in Neil Anderson, Deniz S. Ones, Handan Kepir Sinangil, and Chockalingam Viswesvaran, eds., Handbook of Industrial, Work, and Organizational Psychology Volume 2: Organizational Psychology, London: Sage Publications, 2001, 289–313.
Before designing a project to retire some portion of the technical debt borne by a critical, irreplaceable asset, it’s best to acknowledge that the project design problem is very likely a wicked problem in the sense of Rittel and Webber [Rittel 1973]. (See my post “Retiring technical debt can be a wicked problem.”) In the series of posts of which this is the first, I suggest some basic preparations that form a necessary foundation for success in approaching the problem of designing projects to retire technical debt in irreplaceable assets.
A map of the U.S. Interstate Highway System. The map shows primary roadways, omitting most of the urban loop and spur roads that are actually part of the system. In 2016, the total length of highways in the system was about 50,000 miles (about 80,000 km). About 25% of all vehicle miles driven in the U.S. are driven on this system. The cost to build it was about USD 500 billion in 2016 currency. Given the advances since the 1950s in technologies such as rail, electronics, data management, and artificial intelligence, and given the effects of petroleum combustion on global climate, one wonders whether such a system would be the right choice if construction were to begin today. If alternatives would be better, then this system might be regarded as technical debt. But replacing it might not be practical. Finding a way to retire the technical debt without replacing the entire asset might be the most viable solution. Image by SPUI courtesy Wikipedia.
As I’ve noted in previous posts, the problems associated with retiring technical debt can be wicked problems. And if some of these problems aren’t strictly wicked problems, they can possess many of the attributes of wicked problems in degrees sufficient to challenge the best of us. That’s why approaching a technical debt retirement project as you would any other project is a high-risk way to proceed.
For convenience and to avoid confusion, in my last post I adopted the following terminology:
DRP is the Debt Retirement Project
DDRP is the effort to design the DRP
DBA is the set of Debt Bearing Assets undergoing modification in the context of the DRP
IA is the set of assets, excluding the DBA assets, that interact directly or indirectly with assets in the DBA
In the posts in this thread, convenience demands that we add at least one more shorthand term:
TDIQ is the Technical Debt In Question. That is, it’s the kind of technical debt we’re trying to retire from the assets among the DBA. Other instances of the TDIQ might also be found elsewhere, in other assets, but retiring those instances of the TDIQ is beyond the scope of the DRP.
Know when and why we need to retire technical debt
For those technical debt retirement projects that exhibit a high degree of wickedness, clearly communicating the mission of the DRP is essential to success. The DRP team will be dealing with many stakeholders who are in the early stages of familiarity with the term technical debt. Some of them might be cooperating reluctantly. Expressing the objectives and benefits of the DRP in a clear and inspiring way will be very helpful. With that in mind, I offer the following reminder of the reasons for tackling such a large and risky project that produces so few results immediately visible to customers.
Examining alternatives to retiring the TDIQ is a good place to begin. One alternative is simply letting the TDIQ remain in place. Call this alternative “Do Nothing.” A second alternative to retiring the TDIQ is replacing the debt-bearing asset with something fresh and clean and debt-free. Call this alternative “Replace the Asset.” The problem many organizations face is that they cannot always rely on these alternatives. In some circumstances, the only viable option is debt retirement. And because these two alternatives to debt retirement aren’t always practical, some organizations must develop the expertise and assets necessary to retire widespread technical debt in large, critical, irreplaceable systems. Below is a high-level discussion of these alternatives to debt retirement.
Do Nothing
The first alternative is to find ways to accept that the DBA will continue to operate in their current condition, bearing the technical debt that they now bear. This alternative might be acceptable for some assets, including those that are relatively static and which need no further enhancement or extension. This category also includes those assets the organization can afford to live without.
One disadvantage of the “Do Nothing” approach is that technology moves rapidly. What seems acceptable today might be, in the very near future, old-fashioned, behind the times, or non-compliant with future laws or regulations. Styles, fashions, technologies, laws, regulations, markets, and customer expectations all change rapidly. And even if the asset doesn’t change what it does, the organization might need to enhance it in ways that become very expensive to accomplish due to the technical debt the asset carries.
For these reasons, Do Nothing can be a high-risk strategy.
Replace the Asset
The second alternative to retiring the TDIQ is to replace the entire asset. For this option, the question of affordability arises. In some instances this alternative is practical, but for many assets, the organization simply cannot afford to purchase or design and construct replacements. And for those assets that “learn”, and which contain data gathered from experience over a long period of time, retiring the asset can require developing some means of recovering the experience data and migrating it to the replacement asset—a potentially daunting effort in itself.
Replacement is especially problematic when the asset is proprietary. If the organization created the asset itself, it might have constructed the asset over an extended period of time. Replacement with commercial products will require extensive adaptation of those products, or adaptation of organizational processes. Replacement with assets of its own making will likely be costly.
Thus, when organizations depend on assets that they must enhance or extend, and which they cannot afford to replace in their entirety, they must develop the expertise and resources needed to address the technical debt that such assets inevitably accumulate.
This series of posts explores the issues that arise when an organization undertakes to retire the technical debt that its irreplaceable assets are carrying. Below, I’ll be inserting links to the subsequent posts in this series.
Several properties of the problem of designing technical debt retirement projects tend to make those design problems more likely to be wicked problems—that is, more likely to satisfy all ten of the criteria of Rittel and Webber [Rittel 1973]. I call these properties indicators of wickedness.
The interchange between Interstate 35 and U.S. Route 30, just outside Ames, Iowa. The new flyover ramp is indicated in red. It replaces the cloverleaf ramp in the upper right quadrant of the cloverleaf. A construction error has forced a delay, while the piers of the flyover ramp bridge are corrected. The cloverleaf, which was designed with curves a bit too tight, was a high-accident area, and thus constituted a technical debt, with the ongoing vehicle accidents comprising the metaphorical interest charges. The construction error in the flyover ramp piers necessitated a rollback. Rollbacks often indicate wickedness in the projects to design technical debt retirement projects, but in this case, the indicated wickedness is not much greater than was anticipated by the project designers. Construction drawing by Iowa Department of Transportation [Iowa DOT 2016].
Although we usually have some notion of the degree of wickedness of a given design effort for a technical debt retirement project, actually executing the debt retirement project can reveal unanticipated issues and complexity. Some of what’s revealed can cause us to adjust our estimate of the degree of wickedness of the design effort. If we know in advance what kinds of revelations are most likely to cause such adjustments, we can reduce the incidence of unanticipated revelations.
As I noted in my post, “Degrees of wickedness,” we can regard all problems as lying on a Tame/Wicked spectrum, with wicked problems lying at the extreme Wicked end of the spectrum, and the tamest of the tame lying at the opposite end. As for the ten criteria of wickedness developed by Rittel and Webber, I proposed that they could be satisfied in degrees, with the most wicked problems satisfying all ten criteria absolutely.
As a quick review, here are the attributes of wicked problems as Rittel and Webber see them [Rittel 1973], rephrased for brevity:
There is no clear problem statement
There’s no way to tell when you’ve “solved” it
Solutions aren’t right/wrong, but good/bad
There’s no ultimate test of a solution
You can’t learn by trial-and-error
There’s no way to describe the set of possible solutions
Every problem is unique
Every problem can be seen as a symptom of another problem
How you explain the problem determines what solutions you investigate
The planner (or designer) is accountable for the consequences of trying a solution
Below is a sample of conditions or situations that tend to increase the wickedness of the problem of designing a technical debt retirement project. I have no data to support these conjectured effects. But the principles I used to generate them are three:
If a phenomenon expands the set of stakeholders in a debt retirement project, it tends to enhance the wickedness of the design problem.
If a phenomenon increases the number or heterogeneity of the assets or processes that must be considered, it tends to enhance the wickedness of the design problem.
If a phenomenon creates a need for a rollback of work performed as part of the debt retirement project, and that rollback creates a need to re-design the debt retirement project, it tends thereby to enhance the wickedness of the design problem.
In what follows, I use the term “DRP” to indicate the Debt Retirement Project itself, as distinguished from the effort to design the DRP, which I refer to as “DDRP.” The problem whose wickedness we’re considering is not the DRP itself, but the DDRP. Also, let DBA (for debt-bearing assets) be the set of assets undergoing modification in the context of the DRP, and let IA (for interacting assets) be the set of assets, excluding the DBA assets, that interact directly or indirectly with assets in the DBA.
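The definition of IA can be made concrete with a small sketch. It assumes that interactions among assets are known and can be listed as pairs, and it treats interaction as symmetric, so that "directly or indirectly" becomes reachability in a graph. The function name `interacting_assets` and the asset names are hypothetical.

```python
from collections import deque


def interacting_assets(interactions, dba):
    """Return IA: assets outside the DBA reachable from it via interactions.

    `interactions` is an iterable of (asset, asset) pairs, treated here as
    symmetric. `dba` is the set of debt-bearing assets.
    """
    adjacency = {}
    for a, b in interactions:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Breadth-first search outward from every asset in the DBA.
    seen = set(dba)
    queue = deque(dba)
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - set(dba)


interactions = [
    ("orders", "billing"),
    ("billing", "ledger"),
    ("ledger", "reports"),
    ("hr", "payroll"),
]
ia = interacting_assets(interactions, dba={"billing"})
```

In a real enterprise, following indirect interactions without limit can pull in nearly every asset. A practical version would probably bound the search depth or weight interactions by the severity of the risk, and those choices are themselves part of the DDRP.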
With all this in mind, I offer the following nine examples of indicators of wickedness of the DDRP.
1. A previous attempt to retire this debt was abandoned
Perhaps the most significant indicator of the wickedness of the DDRP is either the failure of a previous attempt to execute a DRP with similar objectives, or the failure of a previous attempt to execute a DDRP for a DRP with those objectives. There are two reasons why such failures are significant indicators of wickedness.
First, it’s reasonable to assume that these previous attempts weren’t founded on any recognition of the wickedness of the DRP or the DDRP. Few such efforts are. (A Google search for the two phrases “technical debt” and “wicked problem” yielded fewer than 1000 results; as of 12 Nov 2018, it yielded 1160.) Consider first the DDRP. If it is a wicked problem, proceeding as if it were not would very likely fail. If the designers of the previous DDRP did assume that it was a wicked problem, investigating their approach could prove invaluable, and save much time and effort. An analogous argument applies for the DRP itself.
Second, if the previous attempt to execute a DRP with similar objectives has left traces of itself in the DBA, and if those traces must be taken into account while executing the DDRP, they might complicate the DRP, and they might be incompletely addressed in the DDRP. To the extent that these conditions prevail, Criterion 5 is satisfied, and the DDRP exhibits wickedness.
2. Some revenue streams need to be interrupted
If the work of the DRP entails temporary interruption of revenue streams, executing the DRP can have significant and long lasting effects on the organization. In estimating the cost of the DRP, it’s clearly necessary to account for the financial impact of any revenue shifted into the future, and any revenue irretrievably lost as well. And in some cases, market share might also suffer. All of these factors tend to increase the wickedness of the DDRP.
When these effects are expected, political opposition to the DRP can develop. Senior management can prevent this opposition from halting the DDRP inappropriately by requiring that the business case for the DRP include these financial factors and demonstrate clearly the need to proceed despite them. Involving potential political opponents of the DRP in business case development can be an effective means of ensuring the strength of the business case.
The ability to model all these financial effects is an important organizational asset that can be developed and maintained, for deployment across multiple DDRPs. The organization can monitor DRPs, gathering actual experience data for comparison to the effects projected in the respective business cases of the DRPs. Those comparisons are useful for enhancing the modeling capability.
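A very simple version of such a model might look like the sketch below. It assumes that deferred revenue loses value only through discounting at a fixed annual rate, and that lost revenue is lost in full. The function names, the rate, and the figures are all hypothetical, and a real business case would need a far richer model that includes effects on market share.

```python
def interruption_cost(deferred, lost, annual_rate, delay_years):
    """Estimate the cost of interrupting a revenue stream.

    deferred:     revenue shifted into the future by the interruption
    lost:         revenue irretrievably lost
    annual_rate:  assumed discount rate (for example, 0.08 for 8%)
    delay_years:  how long the deferred revenue is delayed
    """
    discount = (1 + annual_rate) ** delay_years
    # Deferred revenue still arrives, but is worth less in present terms.
    deferral_cost = deferred * (1 - 1 / discount)
    return deferral_cost + lost


def projection_error(projected, actual):
    """Relative error of a projection, for refining the model over time."""
    return (actual - projected) / projected


# Hypothetical business-case figures for one DRP.
projected = interruption_cost(
    deferred=1_000_000, lost=200_000, annual_rate=0.08, delay_years=0.5
)
# After the DRP completes, compare with the cost actually experienced.
error = projection_error(projected, actual=260_000)
```

Tracking `error` across multiple DRPs is one way to carry out the comparisons described above. A consistent bias in one direction suggests which assumption in the model most needs revisiting.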
3. Some assets not directly touched need to be re-tested
The DDRP is more likely to be a wicked problem if, as a result of the changes executed in the DRP, any of the assets in IA need to be re-tested after or during DRP execution. The need to re-test any assets in IA typically arises when there’s some risk that the DRP’s changes in the DBA could somehow affect the performance of the assets in IA, and when the consequences of such a risk event are severe.
This scenario enhances the wickedness of the DDRP for at least five reasons.
Baseline testing of IA is necessary to enable the DRP team to recognize the effects of the DRP on IA behavior. But this baseline testing can reveal pre-existing and unaddressed faults. Because leaving those faults in place can seriously complicate interpretation of anomalies that appear in IA assets after DRP work has begun, the DDRP team might insist that the owners of the IA assets in question address some of these faults. With regard to these issues, political differences between the DDRP team and the owners of IA assets are possible.
The additional testing of IA assets tends to expand dramatically the set of stakeholders affected by the DRP, to include the owners, users, and maintainers of the IA assets.
The additional testing of IA assets can increase the need to interrupt revenue streams temporarily, and increase the number, duration, and frequency of such interruptions.
The additional testing of IA assets can require expertise and staffing beyond the DRP project team, which can disrupt other elements of the organization as the people needed are temporarily assigned to IA testing.
The additional testing of IA assets can reveal unanticipated consequences of the DRP alterations, which can trigger re-planning or re-design of the DRP during its execution. That re-planning or re-design, in turn, can trigger alterations in the DDRP.
The need to re-test assets not directly touched in the DRP is more likely when the DRP alters the external behavior of any of the DBA assets. The goal of many DRPs is improvement of the internals of assets without altering their external behavior, except possibly for performance improvements. This goal is desirable because it limits the need for re-testing and re-certification of IA assets. However, some kinds of technical debt appear in the externals of the DBA assets, including their architecture, behavior, appearance, or interfaces. Compared to DRPs that do not alter the externals of DBA assets, retiring the kinds of technical debt that alter the externals of DBA assets is inherently more difficult and more risky because it requires more extensive re-testing and re-certification of both DBA and IA assets.
4. Multiple sites are directly touched
When the DRP entails modification of technological assets of geographically dispersed organizations, one consequence is a tendency to increase the wickedness of the DDRP. This comes about because of factors including the following:
Sites might be dispersed not only geographically, but they might also be separated by language boundaries, legal jurisdictions, cultural divides, time zones, and much more. The required work can vary from site to site for technical reasons and because of variations arising from these non-technical factors.
The multiple sites might have different landlords, with different lease agreements governing the organization’s occupancy of the property. This is just one of many factors that increase the numbers of stakeholders and exacerbate their heterogeneity. And the leases might constrain the kind of work that can be performed according to the day of the week or time of day.
If local vendors provide services such as communications or Internet connections to some of the sites, and if the work of the DRP involves these technologies and the local vendors, the task of coordinating all the different players can be complex and riddled with unanticipated obstacles.
For example, if the work involves networking hardware and software, work that we might prefer to perform at night or on a weekend might need to be carried out during business hours for some of the sites. For a global enterprise, there might not be a time of day when no sites are conducting regular business.
As a second example, consider a network upgrade for the retail branch offices of a global bank. If that upgrade requires trenching for new cable connections, the project design must take into account local regulations governing the trenches, including how they must be permitted, dug, re-filled, covered, and marked while still open. These regulations vary with national and sometimes local jurisdiction. The complexity causes most organizations to rely on local vendors, but even then, the vendor selection process must include reliable vendor assessment and evaluation. Scheduling becomes a complex and risky endeavor.
For these reasons, a DDRP that involves technological assets housed at multiple, geographically dispersed sites has an elevated probability of exhibiting the properties of a wicked problem.
5. Government agencies and/or industrial standards organizations must re-certify assets
Another driver of stakeholder expansion is the need for re-certification of assets after they’ve been modified in the course of executing the DRP. The certification agencies can range from local and municipal regulators to national regulators and pan-industrial standards organizations. The sheer number of possibilities is itself a contributor to increased wickedness, but the nature of the operating style of these organizations merits special notice.
For the most part, these agencies operate without competitors, whether they are government elements or private organizations. Perhaps for this reason, “customer service” might not be their strength, and gaining timely cooperation from them might be a challenging undertaking. Even though re-certification might be a small part of the DRP, it can easily become a blocking obstacle. Researching these requirements and their associated lead times, and maintaining a current knowledge base about them, can be a non-trivial task of the DDRP.
6. Non-technical stakeholders must change their behavior
Generally, people don’t like to change how they do their jobs. There are exceptions, of course, if they recognize a benefit that arrives in some immediate and visible way. But unless there is a recognizable benefit, requiring people to change their work patterns as part of a DRP is likely to increase the wickedness of the DDRP. And the difficulty is more problematic if the people affected are technically unsophisticated, because they’re less likely to appreciate the value of managing technical debt, and less likely to accept explanations of that value when those explanations are offered.
DDRP wickedness increases in this case because, in addition to retiring the technical debt, it must address the tasks of motivating and training the affected population. That requires preparing materials, scheduling and accounting for the time spent in training, and monitoring training effectiveness. The business case must also address these issues, but in addition, it must provide evidence required to defuse any political opposition that might otherwise develop.
7. Discovery of major unanticipated complexity triggers re-design
Unanticipated complexity happens in almost every project of almost any kind. But for DDRPs, unanticipated complexity serious enough to trigger significant adjustment or re-design of the DRP during execution of the DRP is another matter. Such a discovery can mean that the DBA assets or their connections to the IA assets have changed since the plan was developed. Or it can mean that the design team had an incomplete or incorrect understanding of the problem they were trying to solve. These events can occur for a number of reasons.
Technical causes are perhaps more easily imagined, so I’ll focus on non-technical causes, which can actually be more serious. For example, suppose that a political alliance enabled the VP of Sales and the VP of Engineering to reach a deal that enabled the DRP team to work on some DBA assets critical to the Sales function, by taking them off line for defined periods. If that political alliance weakens, or if the deal between the two VPs collapses for some other reason, the scheduled downtime of those assets might vanish or be shortened dramatically. This pattern is more likely to arise in situations in which the DDRP team is not a party to such agreements. The DDRP team must be a party to any agreements regarding access to assets by the DRP team.
As a second example, consider what happens when the enterprise undertakes an acquisition that wasn’t revealed to the DDRP team during their design effort. Because chances are good that the DDRP would have a significant amount of re-work to do in such cases, the DDRP team must be kept informed of any organizational changes that could affect the DRP, for the active life of the DRP.
Re-designing the DDRP can take time. Elements of the DDRP that have short shelf lives must be revisited during re-design. And the need to re-design can also indicate gaps in the DDRP team’s understanding of the problem.
All of these conditions tend to move the DDRP in the direction of increased wickedness.
8. Weekend or middle-of-the-night work periods are required
The need to perform critical operations on weekends or in nighttime hours suggests three things. First, the work is risky in the sense that undetected faults that go into production can lead to costly operational errors. Second, the organization lacks a simulated operating environment that emulates the actual operating environment faithfully enough to enable detection of errors before deployment. Third, and finally, the organization lacks a rapid rollback mechanism that can restore the original state of an asset if the new modified state proves problematic when deployed.
These last two factors (the lack of a simulated operating environment and the lack of a rapid rollback mechanism) should be corrected if multiple DRPs are anticipated. Cost is usually the blocking issue. However, that cost must be compared against the cost of slowing all future DRPs, and the cost of any operational failures arising from deploying faulty systems.
Continued refusal to provide a simulated operating environment with rapid rollback increases the wickedness of this and any future DDRPs.
9. Rollback(s) of attempted changes triggers re-design
A rollback is an incident in which, in the course of executing the DRP, it has become necessary to revert some (or all) of the work that has already been performed. Minor rollbacks do happen. But a major rollback, or a rollback the necessity of which is discovered long after completion of the work in question, could be an indication of a deep misunderstanding of the consequences of the work involved. Because that misunderstanding could have consequences not yet recognized, such a rollback could suggest that the wickedness of the DDRP had been underestimated.
Let DBAf (faulty DBA) be the set of assets in the DBA that formerly contained some of the debt being retired, and which had been altered as part of the DRP, and whose alterations contained or led to exposure of some kind of fault(s) that forced a rollback after they were deployed. Let DBAfw (wicked-faulty DBA) represent the subset of DBAf for which that rollback did trigger a re-design of the DDRP. Then wickedness of the DDRP is correlated with the size of DBAfw and the extent of the DDRP re-design that the rollback triggered.
For example, let Efw be a member of DBAfw. And suppose that Efw is a modular element of a system that monitors the click-through behavior of users of a Web site. It records data for later analysis, and because of the fault it does so incorrectly. When the errors are discovered, the module is withdrawn and replaced by the original, unaltered, debt-bearing form. Because Efw contaminated the original database, data rollback is impossible. The error was discovered, but there is no way to re-capture the data that has been lost. That’s why the DDRP (and after that, possibly the DRP) must be re-designed. This scenario is an example of Criterion 5. If there are political consequences for the loss of data, it could be an example of Criterion 10.
This example suggests how the frequency of incidents that trigger re-design of the DDRP can be an indicator of the wickedness of the DDRP.
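The set definitions above can be made concrete with a small sketch. It assumes the organization keeps a log of rollback incidents. The record type Rollback, its fields, and the asset names are hypothetical, introduced only for illustration. The sketch derives DBAf and DBAfw from such a log; it does not compute a wickedness score, because the text claims only a correlation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollback:
    """One rollback incident (hypothetical record layout)."""
    asset: str                # altered DBA asset that had to be rolled back
    triggered_redesign: bool  # True if the rollback forced a DDRP re-design


def faulty_assets(rollbacks):
    """DBAf: assets whose alterations forced a rollback after deployment."""
    return {r.asset for r in rollbacks}


def wicked_faulty_assets(rollbacks):
    """DBAfw: the subset of DBAf whose rollback triggered a DDRP re-design."""
    return {r.asset for r in rollbacks if r.triggered_redesign}


# Illustrative incident log; the asset names are invented.
rollbacks = [
    Rollback("click_tracker", True),
    Rollback("report_formatter", False),
    Rollback("click_tracker", False),
]

dba_f = faulty_assets(rollbacks)
dba_fw = wicked_faulty_assets(rollbacks)
```

Tracking the size of dba_fw over successive projects is one way to turn the indicator into data that can inform the design of later DDRPs.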
Example of a non-indicator: the I-35 U.S. 30 interchange near Ames, Iowa
Just outside Ames, Iowa, is an interchange between Interstate 35 (a four-lane, divided, limited-access roadway) and U.S. Route 30 (also four-lane, divided, but not limited-access). The interchange is a conventional cloverleaf design. The “leaves” are rather tight, though, and consequently, there have been numerous rollovers and crashes at this interchange. We can regard these tight cloverleaf ramps as technical debt in the highway system, and the rollovers and crashes as metaphorical interest charges on that debt.
In 2016, construction began on a new “flyover” exit ramp from northbound Interstate 35 onto westbound U.S. Route 30. The objective was to reduce the number of accidents at the interchange by replacing the current tight-curvature cloverleaf ramp with a flyover exit ramp with a longer radius of curvature. We can regard this project as a Debt Retirement Project (DRP). The project that planned that DRP was an effort to Design a Debt Retirement Project (DDRP).
Completion of the DRP was scheduled for November 2018. When completed, the new ramp will replace the northeast leaf of the cloverleaf. Like most civil engineering projects, this project does have some elements of wickedness, but they were dealt with effectively. Nevertheless, a construction error is delaying completion [Magel 2018] [Iowa DOT 2018]. The error involves the height and position of the bolt anchors where steel bridge beams will connect to the concrete piers of the new flyover ramp. Six piers have been constructed to support the flyover. Those piers are being corrected by jackhammering the concrete tops, leaving the steel reinforcement in place. Then the beam anchors are positioned correctly, and concrete re-poured. The length of the delay in completion hasn’t yet been announced.
This effort, which constitutes a rollback and re-deployment, is a significant project in itself. It requires scheduling the work to be performed, but it also requires scheduling highway lane closures and lane shifts, working around high-volume traffic periods, and possibly pouring concrete in winter conditions. And after the piers are corrected, the bridge beam placement and bridge roadbed work must proceed on a new schedule.
Consequently, the construction error triggered a re-design of the flyover project’s DRP. But it probably did not trigger a significant re-design of the DDRP. The construction error is therefore unlikely to be an indicator of significant additional wickedness for the DDRP.
Last words
You can become better managers of the risk of unanticipated wickedness. If your organization is embarking upon a long-term program of technical debt retirement, you’ll be executing many DDRPs and DRPs. Gathering data about incidents of unanticipated wickedness in DDRPs can be a useful practice, if you use that data when you design new technical debt retirement projects.
References
[Bach 1999] James Bach. “Test Automation Snake Oil!” (1999).
[Dragičević 2016] Tomislav Dragičević, Xiaonan Lu, Juan C. Vasquez, and Josep M. Guerrero. “DC Microgrids–Part II: A Review of Power Architectures, Applications and Standardization Issues,” IEEE Transactions on Power Electronics, vol 31:5, 3528-3549, 2016.
[Fowler 1999] Martin Fowler, Kent Beck (Contributor), John Brant (Contributor), William Opdyke, Don Roberts, Erich Gamma (Foreword). Refactoring: Improving the Design of Existing Code. Boston: Addison-Wesley Professional; first edition (July 8, 1999).
[Ge 2014] Xi Ge and Emerson Murphy-Hill. “Manual Refactoring Changes with Automated Refactoring Validation,” Proceedings of the 36th International Conference on Software Engineering. ACM, 2014.
[Humble 2010] Jez Humble and David Farley. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation, Pearson Education, 2010.
[Iowa DOT 2018] “Work Continues on the Northbound I-35 Flyover Ramp at U.S. 30 Near Ames,” Iowa Department of Transportation News Release, June 27, 2018.
[Marks 2005] Michelle A. Marks, Leslie A. DeChurch, John E. Mathieu, Frederick J. Panzer, and Alexander Alonso. “Teamwork in multiteam systems,” Journal of Applied Psychology 90:5, 964-971, 2005.
[Mathieu 2001] John E. Mathieu, Michelle A. Marks, and Stephen J. Zaccaro. “Multi-team systems,” in Neil Anderson, Deniz S. Ones, Handan Kepir Sinangil, and Chockalingam Viswesvaran, eds., Handbook of Industrial, Work, and Organizational Psychology Volume 2: Organizational Psychology, London: Sage Publications, 2001, 289–313.