When we transform assets in order to retire some of the technical debt they carry, service disruptions are sometimes necessary. To minimize service disruptions while technical debt retirement efforts are underway, it’s advantageous to automate some procedures. Automation-assisted technical debt retirement provides two important benefits: limited disruption of operations and error avoidance.
I’m using the concept of automation a bit loosely here. I don’t mean to imply that these procedures are autonomous. What I mean is that engineers working on technical debt retirement projects have available an array of tools wide enough to enable them to perform many operations with a minimum of thought. For example, when a test is automated in this sense, an engineer can issue a command such as, “Test Module Alpha Using Test Suite Delta,” which results in the execution of a set of predefined tests. Following execution, the appropriate engineers are notified of the results, and the results are recorded in the proper places. If the results are anomalous, engineers can then take appropriate action.
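To make this sense of automation concrete, here is a minimal sketch of what a command like “Test Module Alpha Using Test Suite Delta” might do behind the scenes. Everything in it is an assumption made for illustration: the names `run_suite`, `SuiteResult`, and `registry`, the convention that a test is a callable returning True on success, and the use of simple in-memory lists to stand in for the results record and the notification channel.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical convention: a test is a callable that returns True when it passes.
Test = Callable[[], bool]


@dataclass
class SuiteResult:
    module: str
    suite: str
    passed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)

    @property
    def anomalous(self) -> bool:
        """Results are treated as anomalous if any test failed."""
        return bool(self.failed)


def run_suite(
    module: str,
    suite: str,
    registry: Dict[Tuple[str, str], Dict[str, Test]],
    record: Callable[[SuiteResult], None],
    notify: Callable[[SuiteResult], None],
) -> SuiteResult:
    """Run every predefined test for (module, suite), then record and notify."""
    result = SuiteResult(module, suite)
    for name, test in registry[(module, suite)].items():
        try:
            ok = bool(test())
        except Exception:
            ok = False  # a test that raises counts as a failure
        (result.passed if ok else result.failed).append(name)
    record(result)  # stand-in for writing results to the proper places
    notify(result)  # stand-in for notifying the appropriate engineers
    return result


# Usage: "Test Module Alpha Using Test Suite Delta"
registry = {
    ("alpha", "delta"): {
        "starts_up": lambda: True,
        "rejects_bad_input": lambda: False,
    }
}
results_log: List[SuiteResult] = []
notifications: List[str] = []
outcome = run_suite(
    "alpha",
    "delta",
    registry,
    record=results_log.append,
    notify=lambda r: notifications.append(
        f"{r.module}/{r.suite}: {len(r.failed)} failed"
    ),
)
```

The engineer supplies only the module and suite names. The knowledge of which tests belong to the suite, where results go, and who hears about them lives in the tooling rather than in anyone’s memory.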
Benefits of automation-assisted technical debt retirement
The I-35W Bridge collapse, day 4, Minneapolis, Minnesota, August 5, 2007. The proximate cause of the collapse was underweight gusset plate design, which made the bridge vulnerable to the increased static load due to concrete road surfacing additions over the years, and to the weight of construction equipment and supplies during a repair project that was then underway [NTSB 2008]. When we conduct maintenance or technical debt retirement projects involving assets that must remain operational during project execution, we risk stressing the asset in ways that extend beyond its safe operation envelope. The National Transportation Safety Board found that this occurred in the case of the I-35W bridge collapse. These effects are more difficult to imagine in software systems, but they can occur when load is shifted from the systems undergoing modification to other systems that can then become overloaded. Or these effects can occur when load is shifted not from one asset to another, but from one time window to another on the same asset, resulting in high loads in some time windows. Photo by Kevin Rofidal, United States Coast Guard, courtesy Wikimedia Commons.
The more obvious benefit of automated procedures is speed. For example, an asset removed from service for testing can be returned to service more quickly if the testing is automated. And if trouble erupts during operations when a newly transformed asset is placed into service, the untransformed asset can be swapped back into its place quickly if insertion procedures and removal procedures (roll-out and roll-back) are automated. Tools for releasing newly transformed assets, and rolling back to the previous release if necessary, provide another example of automation assist. These tools are just a few of the many elements of a set of practices collectively known as continuous delivery [Humble 2010].
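As a rough illustration of automated roll-out and roll-back, the sketch below swaps a newly transformed asset into service and swaps the previous one back if a health check fails. It is a sketch under stated assumptions, not a description of any particular deployment tool: the `router` dictionary stands in for whatever mechanism directs traffic to a release, and `roll_out` and `healthy` are hypothetical names.

```python
from typing import Callable, Dict


def roll_out(
    router: Dict[str, str],
    new_release: str,
    healthy: Callable[[str], bool],
) -> bool:
    """Put new_release into service; restore the previous release if it is unhealthy.

    Returns True if new_release remains in service, False if it was rolled back.
    """
    previous = router["live"]      # remember what to roll back to
    router["live"] = new_release   # roll-out: insert the transformed asset
    try:
        ok = bool(healthy(new_release))
    except Exception:
        ok = False                 # a failing health check counts as unhealthy
    if not ok:
        router["live"] = previous  # roll-back: swap the untransformed asset back in
    return ok
```

Because the previous release is captured before the switch, rolling back requires no decisions from the engineer on duty, which is where most of the speed comes from.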
The second benefit of this kind of automation is error avoidance. For example, inconsistent or incomplete testing can fail to find errors and defects, and that leads to rework and further disruptions. Another way to generate trouble: performing tests incorrectly, and therefore finding “defects” that aren’t there. Automated procedures are much less prone to error, if they’re periodically maintained, tested, and certified. For example, if testing a module at a certain level requires running a suite of tests, engineers needn’t remember (or take time to look up) how to prepare the asset for tests, how to run the tests, or what the members of the test suite are. Long advocated as an essential element of sound engineering practice, test automation can avoid some of these problems. But it’s far short of a panacea [Bach 1999].
Other automation opportunities
Sometimes debt retirement itself can be automated. When we can retire instances of the technical debt in question by performing an automated transformation on an asset, the transformation is faster and more reliable.
The most important form of automation associated with technical debt retirement is automation-assisted regression testing. Investments in thorough and focused regression testing can yield surprisingly high returns in the debt retirement context, and in the contexts of development and routine maintenance.
To perform a regression test on an asset that has undergone some kind of change (or whose context has undergone change) is to operate, employ, measure, or inspect the asset under a specified set of conditions to determine whether those changes caused the asset to fail to meet some standard that it had met before the change. That is, a regression test is a procedure that determines whether the asset has regressed as a result of the change. Automated or automation-assisted regression tests enable the members of the debt retirement project team to detect problems in assets that they’ve transformed before the business units that depend on those assets encounter problems during their potentially expensive operations [Ge 2014].
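The definition above can be sketched in a few lines: run the same checks against the asset before and after the change, and report any check that passed before but fails afterward. The pricing functions, the check names, and the helpers `run_checks` and `find_regressions` are all hypothetical, invented only to show the shape of the procedure.

```python
from typing import Callable, Dict, List

Asset = Callable[[int], int]
Check = Callable[[Asset], bool]


def legacy_total(quantity: int) -> int:
    # Hypothetical asset before the change: 10 per unit, 10% off at 100 units or more.
    total = quantity * 10
    return total * 9 // 10 if quantity >= 100 else total


def transformed_total(quantity: int) -> int:
    # Hypothetical asset after the change, with a deliberate off-by-one
    # error in the discount threshold (> 100 instead of >= 100).
    total = quantity * 10
    return total * 9 // 10 if quantity > 100 else total


# The "specified set of conditions" and the standard the asset must meet.
CHECKS: Dict[str, Check] = {
    "single_unit": lambda f: f(1) == 10,
    "discount_at_threshold": lambda f: f(100) == 900,
    "discount_above_threshold": lambda f: f(200) == 1800,
}


def run_checks(asset: Asset, checks: Dict[str, Check]) -> Dict[str, bool]:
    """Evaluate every check against the asset."""
    return {name: bool(check(asset)) for name, check in checks.items()}


def find_regressions(before: Dict[str, bool], after: Dict[str, bool]) -> List[str]:
    """Names of checks the asset met before the change but fails to meet now."""
    return sorted(
        name for name, ok in before.items() if ok and not after.get(name, False)
    )


before = run_checks(legacy_total, CHECKS)
after = run_checks(transformed_total, CHECKS)
regressed = find_regressions(before, after)
```

Here `regressed` contains only `discount_at_threshold`, so the project team learns about the threshold error before any business unit places a 100-unit order.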
Many of these same regression tests can also be useful during enhancement and ongoing maintenance of the asset. In many instances, investing in automated regression tests well in advance of the debt retirement project can enhance development and maintenance performance relative to those assets. Later, when the debt retirement project begins, the previously obtained results of regression tests will already be available.
For some debt retirement projects, specially created automated regression tests might be beneficial. Assigning engineers to automation tool development for debt retirement projects is probably the best way to support these needs.
Last words
These automation capabilities are unlikely to be available commercially, because they’re so specialized to the asset being tested. And because general applicability is unnecessary, building them in-house is both practical and economical, if the necessary skills are available. These investments can be justified economically if we take into account the savings resulting from reduced service disruptions for the debt-bearing assets.
References
[Bach 1999] James Bach. “Test Automation Snake Oil!” (1999).
[Ge 2014] Xi Ge and Emerson Murphy-Hill. “Manual Refactoring Changes with Automated Refactoring Validation,” Proceedings of the 36th International Conference on Software Engineering. ACM, 2014.
[Humble 2010] Jez Humble and David Farley. Continuous delivery: reliable software releases through build, test, and deployment automation, Pearson Education, 2010.
Decisions to retire the legacy technical debt carried by irreplaceable assets are not to be taken lightly. As decision makers gather information and recommendations from all around the organization, most will discover that information and recommendations aren’t sufficient for making sound decisions about technical debt retirement. The issues are complex. Education is also needed. It’s entirely possible that in some organizations, confronted with a set of decisions regarding legacy technical debt retirement in irreplaceable assets, the existing executive team might be out of its depth. To understand how this situation can arise, let’s explore the nature of legacy technical debt retirement decisions.
A common technical debt retirement scenario
What compels the leaders of a large enterprise to consider retiring the technical debt encumbering one of its irreplaceable assets is fairly simple: cost. Decision makers usually begin by investigating the cost of replacing the asset—the option I’ve oh-so-cleverly called “Replace the Asset.” They then typically conclude that replacement isn’t affordable. At this point, many decision makers choose the option I’ve called “Do nothing.” Time passes. A succession of incidents occurs, in which repairs to the asset or enhancements of the asset are required. And I use the term required here to mean “essential to the viability of the business.”
Two alternatives to retiring legacy technical debt in irreplaceable assets. Neither one works very well.
Engineers then do their best to meet the need, but the cost is high, and the work takes too long. The engineers explain that the problems are due, in part, to the heavy burden of technical debt in this particular asset. Eventually the engineers are asked to estimate the cost of “cleaning things up.” Decision makers receive the estimates and conclude that it’s “unaffordable right now.” They ask the engineers to “make do.” In other words, they stick with the Do Nothing option.
After a number of cycles repeating this pattern, decision makers finally agree to provide time and resources for technical debt retirement, but only because it’s the least bad alternative. The other alternatives—Replace the Asset, and Do Nothing—clearly won’t work and haven’t worked, respectively.
So there we are. The organization has been forced by events to address the technical debt problems in this irreplaceable asset. And that’s where the trouble begins.
Decisions about retiring legacy technical debt
In scenarios like the one above, the fundamental decision has already been made: the enterprise will be retiring legacy technical debt from an irreplaceable asset. But that’s just the first ripple of waves of decisions to come, made by many people in a variety of roles throughout the enterprise. Let’s now have a look at a short catalog of what’s in store for such an enterprise.
Recall that most large technical debt retirement projects probably exhibit a high degree of wickedness in the sense of Rittel and Webber [Rittel 1973]. One consequence of this property is the need to avoid do-overs. That is, once we make a decision about how to proceed to the next bit of the work, we want that decision to be correct, or at least, good enough. It should not leave the enterprise in a state that’s more difficult to resolve than the state in which we found it. Since another property of wicked problems is the prevalence of surprises, most decisions must be made in a collaborative context, which affords the greatest possibility of opening the decision process to diverse perspectives. We must therefore regard collaborative decision-making at every level as a highly valued competency.
What follows is the promised catalog of decision types.
Strategic decisions lead the list because they provide the highest leverage potential for changing enterprise behavior vis-à-vis technical debt. Organizations that are confronting the problem of technical debt retirement from irreplaceable assets would do well to begin by acknowledging that although they might be able to devise tactics for dealing with the debt burdening these assets right now, they must make a strategic change if they want to avoid a recurrence. Accumulating debt to a level sufficient to compel chartering a major debt retirement project took time. It took years of deferring the inevitable. A significant change of enterprise strategy is necessary.
When changing complex social systems, applying the concept of leverage provides a critical advantage. In this instance, following the work of Meadows [Meadows 1997] [Meadows 1999] [Meadows 2008], we can devise interventions at several points that can have great impact on both the level of technical debt and its rate of accumulation. The leverage points of greatest interest are Feedback Loops, Information Flows, Rules, and Goals. For example, the enterprise can set a strategic goal of a specific volume of incremental technical debt incurred per project, normalized by project budget, as I discussed in the post, “Leverage points for technical debt management.”
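One way to picture a goal of this kind is as a simple ratio. The sketch below assumes, purely for illustration, that the “volume” of incremental technical debt is expressed as an estimated cost to retire it, in the same currency as the project budget. The function names and the example figures are hypothetical.

```python
def debt_intensity(incremental_debt_cost: float, project_budget: float) -> float:
    """Incremental technical debt incurred by a project, normalized by its budget."""
    if project_budget <= 0:
        raise ValueError("project_budget must be positive")
    return incremental_debt_cost / project_budget


def meets_goal(incremental_debt_cost: float, project_budget: float, goal: float) -> bool:
    """True if the project's normalized incremental debt is at or below the goal."""
    return debt_intensity(incremental_debt_cost, project_budget) <= goal


# Hypothetical project: 30,000 of new debt on a 600,000 budget, against a goal of 4%.
intensity = debt_intensity(30_000, 600_000)
within_goal = meets_goal(30_000, 600_000, goal=0.04)
```

Normalizing by budget lets large and small projects be compared against the same goal, although the result is only as good as the estimate of the debt incurred.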
One might reasonably ask why enterprise strategy must change; wouldn’t a change in technology strategy suffice? Changing how engineers go about their work would help—indeed in most cases it’s necessary. But because the conditions and processes that lead to technical debt formation and persistence transcend engineering activities, additional changes are required to achieve the objective of controlling technical debt.
Some technical debt is strategic—it’s incurred as the result of a conscious business decision. But some is non-strategic. We might even be unaware of how it occurred. However, both kinds of technical debt can arise as a result of non-technical factors. Read a review of non-technical precursors of non-strategic technical debt.
Organizational decisions
Before chartering a technical debt retirement project (DRP) for an irreplaceable asset, or a group of irreplaceable assets, it’s wise to consider how to embed that project in the enterprise.
The default organizational form for debt retirement projects concerned with an asset A is usually the same form that would be used for major projects focused on asset A. If the Information Technology (IT) unit would normally address issues in A, the debt retirement effort usually would be organized under IT. If A is a software product normally attended to in a product group, that same group would likely have responsibility for the DRP for asset A.
Although these default organizational structures are somewhat sensible, both technically and politically, there’s an alternative approach worth investigating. It entails establishing a technical debt retirement function that becomes a center of excellence for executing technical debt retirement projects, and for developing and injecting sound technical debt management practice into the enterprise. Such an approach is especially useful if multiple debt retirement projects are needed.
The fundamental concept that makes the center-of-excellence approach necessary is the wickedness of the technical debt retirement problem. To address the problem at scale requires capabilities beyond what IT can provide; beyond what product units can provide; indeed, beyond what any of the conventional organizational elements can provide. The reason for this is that the explosion of technical debt in most organizations is an emergent phenomenon. Every organizational unit contributed to the formation of the problem. And every organizational unit must contribute to its resolution.
A technical debt center of excellence is an approach that might be capable not only of synthesizing the expertise of all elements of the enterprise, but also of bringing new approaches into the enterprise from external sources.
Engineering decisions
Engineers have a tendency to identify and classify technical debt items on technical grounds. Further, they tend to set technical debt retirement priorities on a similar basis. That is, they’re inclined to set priorities highest for those debt items that they (a) recognize as debt items and (b) see as imposing high levels of metaphorical interest charges (MICs) charged to engineering accounts. Engineers are less likely to assign high priorities to technical debt that generates MICs that are charged to revenue, or to other accounts, because those MICs are less evident (and in many cases invisible) to engineers.
Decisions regarding recognition of technical debt items and setting priorities for retiring them must take technological imperatives into account, but they must also account for MICs of all forms. Priorities must be consistent with enterprise imperatives.
Decisions about pace
Paraphrasing Albert Einstein, technical debt retirement projects should be executed as rapidly as possible, and no faster. The tendency among non-engineers and non-technical decision-makers is to push for rapid completion of debt retirement projects, for three reasons. First, everyone, like the engineers, wants the results that debt retirement will bring. Second, everyone, like the engineers, wants an end to the inevitable disruptions debt retirement projects cause. And finally, the longer the project is underway, the more it might cost.
For these reasons, once the decision to retire the debt is firmly in hand, the enterprise might have a tendency to apply financial resources at a rate that exceeds the ability of the project team to execute the project responsibly. When that happens, rework results. And for wicked problems like debt retirement, rework is the path to catastrophe.
Decisions about pace and team scale need to be regarded as tentative. Regular reviews can ensure that the resource level is neither too low nor too high. Even when the engineers are given control over these decisions, they must be reviewed, because pressures for rapid completion can be so severe that they can compromise the judgment of engineers about how well they can manage the resources applied to the project.
Resource decisions
Debt retirement projects concerned with legacy irreplaceable assets are different from most other projects the enterprise undertakes. Estimates of the labor hours required are more likely to be incorrect on the low side than are analogous estimates for other projects, because so much of the work involves pieces of assets with which few engineering staff have any experience. But with respect to resources, underestimating labor requirements isn’t the real problem. Non-labor resources are the real problem.
Because the assets are irreplaceable, it’s likely that they’re needed for ongoing operations. In some cases, the assets are needed continuously. Many organizations have kept such assets operational by exploiting hours of downtime during periods of low demand, usually scheduled and announced in advance. While these practices are likely sufficient for the relatively minor and infrequent changes usually associated with routine maintenance and enhancement, debt retirement imposes much more severe burdens on the organization than these short access windows can support. Effective debt retirement projects need far more access to the asset—a level of access that continuous delivery practices can provide [Humble 2010].
However, assets whose designs predate the widespread use of modern practices such as continuous delivery might not be compatible with the infrastructure that these practices require. And in organizations that haven’t yet adopted such practices, staff familiar with them might be in short supply. For these reasons, we must regard as developmental any early projects whose objectives are retiring technical debt from irreplaceable assets. They’re retiring the technical debt, of course, but they’re also developing the practices and infrastructure needed to support technical debt retirement projects. This dual purpose is what drives the surprisingly high non-labor costs and investments associated with early technical debt retirement projects.
The investments required might include such “items” as a staging environment, which “is a testing environment identical to the production environment” [Humble 2010]; extensive test automation, including results analysis; blue-green deployment infrastructure; automation-assisted rollback; and zero-downtime release infrastructure. Decisions to make investments require an appreciation of their value to the enterprise. They enable the enterprise to deal effectively with the wicked problem of technical debt retirement.
Last words
Because every situation and every organization is unique, few general guidelines are available for making these decisions. The criteria most organizations have been using for dealing with (or avoiding) the issue of technical debt have produced the problems they now face. So, to succeed from this point, whatever criteria they use in the future must be different. My own view is that short-term thinking is at the heart of the problem, but it’s a wicked problem. The long-term solution will not be simple.