Automation-assisted technical debt retirement

When we transform assets in order to retire some of the technical debt they carry, service disruptions are sometimes necessary. To minimize service disruptions while technical debt retirement efforts are underway, it’s advantageous to automate some procedures. Automation-assisted technical debt retirement provides two important benefits: limited disruption of operations and error avoidance.

I’m using the concept of automation a bit loosely here. I don’t mean to imply that these procedures are autonomous. What I mean is that engineers working on technical debt retirement projects have available an array of tools wide enough to enable them to perform many operations with a minimum of thought. For example, when a test is automated in this sense of automated, an engineer can issue a command such as, “Test Module Alpha Using Test Suite Delta,” which results in the execution of a set of predefined tests. Following execution, the appropriate engineers are notified of the results, and the results are recorded in the proper places. If the results are anomalous, engineers can then take appropriate action.
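To make this sense of "automated" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `SUITES` registry, the `DeltaSmokeTests` case, and the `record` and `notify` callbacks are hypothetical stand-ins for whatever test inventory, results store, and messaging channel an organization actually uses.

```python
import unittest
from datetime import datetime, timezone


class DeltaSmokeTests(unittest.TestCase):
    """Hypothetical member of 'Test Suite Delta'."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)


# Hypothetical registry mapping suite names to their predefined test cases.
SUITES = {"delta": [DeltaSmokeTests]}


def run_suite(module_name, suite_name, record, notify):
    """Run a predefined suite, record the outcome, and notify engineers."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite(
        loader.loadTestsFromTestCase(case) for case in SUITES[suite_name]
    )
    result = unittest.TestResult()
    suite.run(result)
    summary = {
        "module": module_name,
        "suite": suite_name,
        "finished_at": datetime.now(timezone.utc).isoformat(),
        "tests_run": result.testsRun,
        "failures": len(result.failures),
        "errors": len(result.errors),
        "passed": result.wasSuccessful(),
    }
    record(summary)  # e.g. append to the results log
    notify(summary)  # e.g. message the responsible engineers
    return summary
```

With this in place, an engineer issues one command, for example `run_suite("alpha", "delta", record=results.append, notify=print)`, and doesn't need to recall which tests belong to the suite or where the results go.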

Benefits of automation-assisted technical debt retirement

Collapse of the I-35W bridge in Minneapolis, Minnesota
The I-35W Bridge collapse, day 4, Minneapolis, Minnesota, August 5, 2007. The proximate cause of the collapse was the inadequate load capacity of the bridge’s gusset plates, the result of a design error. That weakness made the bridge vulnerable to the increased static load due to concrete road surfacing additions over the years, and to the weight of construction equipment and supplies during a repair project that was then underway [NTSB 2008].
When we conduct maintenance or technical debt retirement projects involving assets that must remain operational during project execution, we risk stressing the asset in ways that extend beyond its safe operating envelope. The National Transportation Safety Board found that this occurred in the case of the I-35W bridge collapse. These effects are more difficult to imagine in software systems, but they can occur when load is shifted from the systems undergoing modification to other systems that can then become overloaded. Or these effects can occur when load is shifted not from one asset to another, but from one time window to another on the same asset, resulting in high loads in some time windows.
Photo by Kevin Rofidal, United States Coast Guard, courtesy Wikimedia Commons.
The more obvious benefit of automated procedures is speed. For example, an asset removed from service for testing can be returned to service more quickly if the testing is automated. And if trouble erupts during operations when a newly transformed asset is placed into service, the untransformed asset can be swapped back into its place quickly if insertion procedures and removal procedures (roll-out and roll-back) are automated. Tools for releasing newly transformed assets, and rolling back to the previous release if necessary, provide another example of automation assist. These tools are just a few of the many elements of a set of practices collectively known as continuous delivery [Humble 2010].
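The roll-out and roll-back idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular release tool: the `Slot` class and the `is_healthy` callback are hypothetical, and a real health check would probe the running asset.

```python
class Slot:
    """Hypothetical holder for the release currently in service."""

    def __init__(self, live):
        self.live = live
        self.previous = None


def roll_out(slot, candidate, is_healthy):
    """Swap the candidate into service; swap the prior release back on trouble."""
    slot.previous, slot.live = slot.live, candidate
    if is_healthy(slot.live):
        return True
    # Trouble erupted: restore the untransformed asset immediately.
    slot.live, slot.previous = slot.previous, None
    return False
```

Because the swap and its reversal are both single steps, the time the asset spends out of service depends on the health check, not on how quickly someone can follow a manual procedure.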

The second benefit of this kind of automation is error avoidance. For example, inconsistent or incomplete testing can fail to find errors and defects, and that leads to rework and further disruptions. Another way to generate trouble: performing tests incorrectly, and therefore finding “defects” that aren’t there. Automated procedures are much less prone to error, if they’re periodically maintained, tested, and certified. For example, if testing a module at a certain level requires running a suite of tests, engineers needn’t remember (or take time to look up) how to prepare the asset for tests, how to run the tests, or what the members of the test suite are. Long advocated as an essential element of sound engineering practice, test automation can avoid some of these problems. But it’s far short of a panacea [Bach 1999].

Other automation opportunities

Sometimes debt retirement itself can be automated. When we can retire instances of the technical debt in question by performing an automated transformation on an asset, the transformation is faster and more reliable.
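As one hedged illustration of an automated transformation, suppose the debt consists of calls to a deprecated function scattered through Python source code. The sketch below uses the standard library's `ast` module to rewrite those calls. The names `old_fetch` and `new_fetch` in the usage note are hypothetical, and the approach assumes Python 3.9 or later for `ast.unparse`.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rewrite calls to one plain function name into calls to another."""

    def __init__(self, old_name, new_name):
        self.old_name = old_name
        self.new_name = new_name

    def visit_Call(self, node):
        self.generic_visit(node)  # handle nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.old_name:
            replacement = ast.Name(id=self.new_name, ctx=ast.Load())
            node.func = ast.copy_location(replacement, node.func)
        return node


def retire_deprecated_call(source, old_name, new_name):
    """Return source text with every call to old_name renamed to new_name."""
    tree = RenameCall(old_name, new_name).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

For example, `retire_deprecated_call("x = old_fetch(1)", "old_fetch", "new_fetch")` yields `x = new_fetch(1)`. Note that `ast.unparse` discards comments and original formatting, so a production-grade transformation would need a tool that preserves them.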

The most important form of automation associated with technical debt retirement is automation-assisted regression testing. Investments in thorough and focused regression testing can yield remarkably high returns in the debt retirement context, and in the contexts of development and routine maintenance.

To perform a regression test on an asset that has undergone some kind of change (or whose context has undergone change) is to operate, employ, measure, or inspect the asset under a specified set of conditions to determine whether those changes caused the asset to fail to meet some standard that it had met before the change. That is, a regression test is a procedure that determines whether the asset has regressed as a result of the change. Automated or automation-assisted regression tests enable the members of the debt retirement project team to detect problems in assets that they’ve transformed before the business units that depend on those assets encounter problems during their potentially expensive operations [Ge 2014].
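The definition above can be sketched as a baseline comparison: record what the asset does before the change, then check the transformed asset against that record. The functions `legacy_total` and `transformed_total` are hypothetical examples I've invented for illustration, with a boundary error planted deliberately.

```python
def capture_baseline(asset, cases):
    """Record the asset's output for each input before the change."""
    return {case: asset(case) for case in cases}


def find_regressions(asset, baseline):
    """Return the inputs whose outputs no longer match the baseline."""
    regressions = {}
    for case, expected in baseline.items():
        actual = asset(case)
        if actual != expected:
            regressions[case] = (expected, actual)
    return regressions


def legacy_total(quantity):
    """Hypothetical pre-change asset: discount applies from 10 units up."""
    return quantity * 5 if quantity < 10 else quantity * 4


def transformed_total(quantity):
    """Hypothetical post-change asset with a boundary error at 10 units."""
    unit_price = 5 if quantity <= 10 else 4
    return quantity * unit_price


baseline = capture_baseline(legacy_total, [1, 10, 11])
regressions = find_regressions(transformed_total, baseline)
```

Here `regressions` contains a single entry, for a quantity of 10, pairing the expected total of 40 with the actual total of 50. The project team sees the discrepancy before any business unit does.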

Many of these same regression tests can also be useful during enhancement and ongoing maintenance of the asset. In many instances, investing in automated regression tests well in advance of the debt retirement project can enhance development and maintenance performance relative to those assets. Later, when the debt retirement project begins, the previously obtained results of regression tests will already be available.

For some debt retirement projects, specially created automated regression tests might be beneficial. Assigning engineers to automation tool development for debt retirement projects is probably the best way to support these needs.

Last words

These automation capabilities are unlikely to be available commercially, because they’re so specialized to the asset being tested. And because general applicability is unnecessary, building them in-house is both practical and economical, if the necessary skills are available. These investments can be justified economically if we take into account the savings resulting from reduced service disruptions for the debt-bearing assets.

References

[Bach 1999] James Bach. “Test Automation Snake Oil!” (1999).

Retrieved: January 2, 2019

[Ge 2014] Xi Ge and Emerson Murphy-Hill. “Manual Refactoring Changes with Automated Refactoring Validation,” Proceedings of the 36th International Conference on Software Engineering. ACM, 2014.

Retrieved: January 1, 2019

[Humble 2010] Jez Humble and David Farley. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation, Pearson Education, 2010.

[NTSB 2008] National Transportation Safety Board. “Board Meeting Executive Summary: Collapse of I-35W Highway Bridge, Minneapolis, Minnesota, August 1, 2007,” November 13, 2008.

Retrieved: January 3, 2019

Retiring localizable technical debt

When technical debt appears as discrete chunks—that is, when it’s localizable—we can often retire the debt incrementally, system-by-system, module-by-module, or even instance-by-instance. These approaches offer great flexibility, both technically and financially, which makes retiring localizable technical debt a particularly manageable challenge.

Localizable technical debt

Electricity pylons, Hamilton Beach, Ontario, Canada: a small part of the AC power grid, which seems destined one day to manifest a great deal of non-localizable technical debt. Photo by Ibagli courtesy Wikimedia Commons
Pylons in the same line are visible in Google Street View.

In “Technical debt in a rail system,” I explored the case of Amtrak’s Acela Express. In that example, I explained that Acela’s passenger cars are designed to tilt to compensate for centrifugal forces that appear when the train rounds curves. The technical debt takes the form of tracks that are too close together to permit the trains to tilt as much as they’re designed to, which limits the trains’ speed when rounding curves. The instances of this debt are the curves in which the tracks are too close together. These instances are thus inherently localizable.

In “How technical debt can create more technical debt,” I described an example in which an organization is unable to upgrade its desktop computers from Windows 8 to Windows 10. In this case, each computer running Windows 8 is an instance of this form of technical debt.

Both of these examples illustrate the sorts of technical debt in which the instances are localizable—each instance is self-contained, and we can “point” to it as an instance of the debt in question. But localizable technical debt need not be associated with hardware. In software systems, for example, localizable technical debt can exist in a module interface, which might have been designed to meet a requirement that’s no longer relevant. That module and any other modules that interact with that interface manifest that technical debt.

Non-localizable technical debt

Non-localizable technical debt is debt for which the instances are amorphous or system-wide, or span the bulk of the system, if not all of it. Retiring non-localizable technical debt typically requires extensive re-engineering of the assets that manifest it.

For the most part, non-localizable technical debt arises at the level of system architecture or above. One can easily imagine this occurring in software systems, where physical constraints have little meaning, but let’s consider a hardware system to illustrate the importance of this concept.

Until relatively recently in the United States, most electric power consumers used power for incandescent lamps, for heating, or for electric motors in elevators, refrigeration, home appliances, pumps, and so on. These applications are compatible with an alternating current power distribution system (AC grid). The AC grid has historically been more economical than an equivalent direct-current architecture (DC grid) when power generation plants are few and relatively distant from power load sites, because transformers can readily step AC up to the high voltages that keep transmission losses low, and step it back down for use.

However, advances in electronics and in distributed power generation technologies are eroding the advantages of the AC grid [Dragičević 2016]. Most electronic devices, including phones, computers, rechargeable power tools, LED lighting, and electric vehicles use DC internally. To access the AC grid, they use converters that change AC power into DC power, which entails efficiency losses due to the conversion. Moreover, solar power generation systems such as photovoltaics generate DC inherently. Modern wind turbines generate AC at a frequency determined by wind power conversion efficiency, but they then convert it to DC before a second conversion to AC at the frequency the grid requires. And because solar and wind power generators are geographically dispersed, they’re often situated near their load sites, as for example, a photovoltaic array on the roof of a home would be. Therefore, the losses involved in transmission from generation site to load site are much less important than they would be if the generation sites were few, concentrated, and at great distances from the loads they serve.

Our current AC grid architecture is likely to become a net disadvantage in the not-too-distant future. If that does happen, we could come to regard the current AC grid, and the devices that are designed for it, as manifesting technical debt. However, localizing that debt in each device and each component of the AC grid would make little sense. The technical debt in question would reside in the grid architecture, as a whole. It would be inherently non-localizable.

Addressing localizable technical debt

As noted above, we can often retire localizable technical debt incrementally—instance-by-instance. In many cases, this enables engineers to address the debt at times and in sequences that are compatible with organizational priorities and within the organization’s resource capacity in any given fiscal period. Although this isn’t always possible for localizable technical debt, and although engineers are often justifiably averse to the temporary non-uniformity that results from incremental debt retirement, exploiting localizability when planning debt retirement is often a useful strategy for retiring technical debt economically.
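One way to picture instance-by-instance planning is as a simple packing exercise: take the instances in priority order and assign them to fiscal periods without exceeding the capacity available in any one period. The sketch below is only an illustration under assumptions of my own. The cost units, the priority numbers, and the greedy rule itself are hypothetical, and real planning would weigh many more factors.

```python
def plan_retirement(instances, capacity_per_period):
    """Assign debt instances to periods in priority order.

    instances: list of (name, cost, priority) tuples; a lower priority
    number means the instance is retired earlier.
    Returns a list of periods, each a list of instance names.
    """
    periods = [[]]
    remaining = capacity_per_period
    for name, cost, _priority in sorted(instances, key=lambda item: item[2]):
        if cost > capacity_per_period:
            raise ValueError(f"{name} cannot fit in any single period")
        if cost > remaining:
            # This period is full; continue in the next one.
            periods.append([])
            remaining = capacity_per_period
        periods[-1].append(name)
        remaining -= cost
    return periods
```

For instance, with a capacity of 5 units per period, `plan_retirement([("curve-A", 3, 2), ("curve-B", 2, 1), ("curve-C", 4, 3)], 5)` places curve-B and curve-A in the first period and curve-C in the second.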

Incremental retirement of localizable technical debt does present some problems. During the retirement process, for any given instance, it might be necessary to install temporary structures to enable continued operation with minimal service disruption. For example, with the Acela tracks, an alternate line might be needed while the new track is installed, or the new track might need to be installed at some distance from the existing track while trains continue using the existing track. Either approach requires investment beyond the investment required for the new track itself. Some managers have little appetite for such temporary investments. But they are in a real sense part of the metaphorical interest charges (MICs) on that debt. They’re unusual, as MICs go, in the sense that they’re incurred as part of the debt retirement effort, but they’re still MICs. In a way, they’re analogous to the charges that might appear when terminating an auto lease.

Another consideration when addressing localizable technical debt is its entanglement with other forms of technical debt. With respect to the effort to retire one kind of localizable technical debt, these other forms of technical debt are what I’ve called auxiliary technical debt (ATD). Consider carefully the time order of efforts to retire the localizable technical debt and one or more forms of its ATD. Because retiring localizable technical debt can seem deceptively straightforward, the temptation to deal with it before addressing some of the ATD can be difficult to resist. But dealing with some of the ATD first might actually be the wiser course when, for example, doing so eliminates numerous instances of the localizable technical debt.

One note of caution

Within the category of localizable technical debt are some kinds of debt that are so widespread in the asset that retiring them affects a large part of the asset, if not all of the asset. While it’s true that each instance of such debt is identifiable and localized, the instances are so widespread that they collectively have the properties of non-localizable debt as far as retirement efforts are concerned. Incremental retirement might still be possible, but a more global retirement effort might be safer and less disruptive. One approach, usually favored by the technologists, is to suspend all other work while the debt in question undergoes retirement. While that approach might indeed be safest, all stakeholders must accept and understand the technical issues, and the technologists must understand the concerns of all stakeholders. A joint decision about the retirement strategy among all stakeholders, including technologists, is recommended.

Last words

In the context of debt retirement projects, localizable technical debt provides needed flexibility. Often, the non-uniformity that results from retiring localizable technical debt instance-by-instance can be reduced before the debt retirement project is completed. In the meantime, the team can be relatively free to retire the localizable debt in whichever order is most fitting.

References

[Dragičević 2016] Tomislav Dragičević, Xiaonan Lu, Juan C. Vasquez, and Josep M. Guerrero. “DC Microgrids–Part II: A Review of Power Architectures, Applications and Standardization Issues,” IEEE Transactions on Power Electronics, vol 31:5, 3528-3549, 2016.

Legacy technical debt retirement decisions

Last updated on January 5th, 2019 at 10:32 am

Decisions to retire the legacy technical debt carried by irreplaceable assets are not to be taken lightly. As decision makers gather information and recommendations from all around the organization, most will discover that information and recommendations aren’t sufficient for making sound decisions about technical debt retirement. The issues are complex. Education is also needed. It’s entirely possible that in some organizations, confronted with a set of decisions regarding legacy technical debt retirement in irreplaceable assets, the existing executive team might be out of its depth. To understand how this situation can arise, let’s explore the nature of legacy technical debt retirement decisions.

A common technical debt retirement scenario

What compels the leaders of a large enterprise to consider retiring the technical debt encumbering one of its irreplaceable assets is fairly simple: cost. Decision makers usually begin by investigating the cost of replacing the asset—the option I’ve oh-so-cleverly called “Replace the Asset.” They then typically conclude that replacement isn’t affordable. At this point, many decision makers choose the option I’ve called “Do nothing.” Time passes. A succession of incidents occurs, in which repairs to the asset or enhancements of the asset are required. And I use the term required here to mean “essential to the viability of the business.”

Two alternatives to retiring legacy technical debt in irreplaceable assets. Neither one works very well.

Engineers then do their best to meet the need, but the cost is high, and the work takes too long. The engineers explain that the problems are due, in part, to the heavy burden of technical debt in this particular asset. Eventually the engineers are asked to estimate the cost of “cleaning things up.” Decision makers receive the estimates and conclude that it’s “unaffordable right now.” They ask the engineers to “make do.” In other words, they stick with the Do Nothing option.

After a number of cycles repeating this pattern, decision makers finally agree to provide time and resources for technical debt retirement, but only because it’s the least bad alternative. The other alternatives—Replace the Asset, and Do Nothing—clearly won’t work and haven’t worked, respectively.

So there we are. The organization has been forced by events to address the technical debt problems in this irreplaceable asset. And that’s where the trouble begins.

Decisions about retiring legacy technical debt

In scenarios like the one above, the fundamental decision has already been made: the enterprise will be retiring legacy technical debt from an irreplaceable asset. But that’s just the first ripple of waves of decisions to come, made by many people in a variety of roles throughout the enterprise. Let’s now have a look at a short catalog of what’s in store for such an enterprise.

Recall that most large technical debt retirement projects probably exhibit a high degree of wickedness in the sense of Rittel and Webber [Rittel 1973]. One consequence of this property is the need to avoid do-overs. That is, once we make a decision about how to proceed to the next bit of the work, we want that decision to be correct, or at least, good enough. It should not leave the enterprise in a state that’s more difficult to resolve than the state in which we found it. Since another property of wicked problems is the prevalence of surprises, most decisions must be made in a collaborative context, which affords the greatest possibility of opening the decision process to diverse perspectives. We must therefore regard collaborative decision-making at every level as a highly valued competency.

What follows is the promised catalog of decision types.

Strategic decisions

This decision category leads the list because it provides the highest leverage potential for changing enterprise behavior vis-à-vis technical debt. Organizations that are confronting the problem of technical debt retirement from irreplaceable assets would do well to begin by acknowledging that although they might be able to devise tactics for dealing with the debt burdening these assets right now, they must make a strategic change if they want to avoid a recurrence. Accumulating debt to a level sufficient to compel chartering a major debt retirement project took time. It took years of deferring the inevitable. A significant change of enterprise strategy is necessary.

When changing complex social systems, applying the concept of leverage provides a critical advantage. In this instance, following the work of Meadows [Meadows 1997] [Meadows 1999] [Meadows 2008], we can devise interventions at several points that can have great impact on both the level of technical debt and its rate of accumulation. The leverage points of greatest interest are Feedback Loops, Information Flows, Rules, and Goals. For example, the enterprise can set a strategic goal of a specific volume of incremental technical debt incurred per project, normalized by project budget, as I discussed in the post, “Leverage points for technical debt management.”
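A goal of this kind becomes actionable only when it can be computed and compared. The sketch below shows one possible reading of it, assuming that incremental debt is estimated in the same currency units as the project budget. That assumption, the sample figures, and the function names are mine, for illustration only.

```python
def normalized_incremental_debt(debt_incurred, project_budget):
    """Incremental technical debt incurred, as a fraction of project budget."""
    if project_budget <= 0:
        raise ValueError("project budget must be positive")
    return debt_incurred / project_budget


def projects_over_goal(projects, goal):
    """Return names of projects whose normalized debt exceeds the goal.

    projects: list of (name, debt_incurred, project_budget) tuples.
    """
    return [
        name
        for name, debt, budget in projects
        if normalized_incremental_debt(debt, budget) > goal
    ]


# Hypothetical figures: estimated debt incurred and budget per project.
sample_projects = [("billing", 30_000, 600_000), ("portal", 90_000, 900_000)]
flagged = projects_over_goal(sample_projects, goal=0.08)
```

With a goal of 0.08, the billing project (0.05) meets the goal and the portal project (0.10) is flagged. Normalizing by budget keeps large projects from looking worse than small ones merely because of their size.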

One might reasonably ask why enterprise strategy must change; wouldn’t a change in technology strategy suffice? Changing how engineers go about their work would help—indeed in most cases it’s necessary. But because the conditions and processes that lead to technical debt formation and persistence transcend engineering activities, additional changes are required to achieve the objective of controlling technical debt.

Some technical debt is strategic—it’s incurred as the result of a conscious business decision. But some is non-strategic. We might even be unaware of how it occurred. However, both kinds of technical debt can arise as a result of non-technical factors. Read a review of non-technical precursors of non-strategic technical debt.

Organizational decisions

Before chartering a technical debt retirement project (DRP) for an irreplaceable asset, or a group of irreplaceable assets, it’s wise to consider how to embed that project in the enterprise.

The default organizational form for debt retirement projects concerned with an asset A is usually the same form that would be used for major projects focused on asset A. If the Information Technology (IT) unit would normally address issues in A, the debt retirement effort usually would be organized under IT. If A is a software product normally attended to in a product group, that same group would likely have responsibility for the DRP for asset A.

Although these default organizational structures are somewhat sensible, both technically and politically, there’s an alternative approach worth investigating. It entails establishing a technical debt retirement function that becomes a center of excellence for executing technical debt retirement projects, and for developing and injecting sound technical debt management practice into the enterprise. Such an approach is especially useful if multiple debt retirement projects are needed.

The fundamental concept that makes the center-of-excellence approach necessary is the wickedness of the technical debt retirement problem. To address the problem at scale requires capabilities beyond what IT can provide; beyond what product units can provide; indeed, beyond what any of the conventional organizational elements can provide. The reason for this is that the explosion of technical debt in most organizations is an emergent phenomenon. Every organizational unit contributed to the formation of the problem. And every organizational unit must contribute to its resolution.

A technical debt center of excellence might be capable not only of synthesizing the expertise of all elements of the enterprise, but also of bringing new approaches into the enterprise from external sources.

Engineering decisions

Engineers have a tendency to identify and classify technical debt items on technical grounds. Further, they tend to set technical debt retirement priorities on a similar basis. That is, they’re inclined to set priorities highest for those debt items that they (a) recognize as debt items and (b) see as imposing high levels of MICs charged to engineering accounts. Engineers are less likely to assign high priorities to technical debt that generates MICs that are charged to revenue, or to other accounts, because those MICs are less evident—and in many cases invisible—to engineers.

Decisions regarding recognition of technical debt items and setting priorities for retiring them must take technological imperatives into account, but they must also account for MICs of all forms. Priorities must be consistent with enterprise imperatives.

Decisions about pace

Paraphrasing Albert Einstein, technical debt retirement projects should be executed as rapidly as possible, and no faster. The tendency among non-engineers and non-technical decision-makers is to push for rapid completion of debt retirement projects, for three reasons. First, everyone, like the engineers, wants the results that debt retirement will bring. Second, everyone, like the engineers, wants an end to the inevitable disruptions debt retirement projects cause. And finally, the longer the project is underway, the more it might cost.

For these reasons, once the decision to retire the debt is firmly in hand, the enterprise might have a tendency to apply financial resources at a rate that exceeds the ability of the project team to execute the project responsibly. When that happens, rework results. And for wicked problems like debt retirement, rework is the path to catastrophe.

Decisions about pace and team scale need to be regarded as tentative. Regular reviews can ensure that the resource level is neither too low nor too high. Even when the engineers are given control over these decisions, they must be reviewed, because pressures for rapid completion can be so severe that they can compromise the judgment of engineers about how well they can manage the resources applied to the project.

Resource decisions

Debt retirement projects concerned with legacy irreplaceable assets are different from most other projects the enterprise undertakes. Estimates of the labor hours required are more likely to be incorrect on the low side than are analogous estimates for other projects, because so much of the work involves pieces of assets with which few engineering staff have any experience. But with respect to resources, underestimating labor requirements isn’t the real problem. Non-labor resources are the real problem.

Because the assets are irreplaceable, it’s likely that they’re needed for ongoing operations. In some cases, the assets are needed continuously. Many organizations have kept such assets operational by exploiting hours of downtime during periods of low demand, usually scheduled and announced in advance. While these practices are likely sufficient for the relatively minor and infrequent changes usually associated with routine maintenance and enhancement, debt retirement imposes much more severe burdens on the organization than these short access windows can support. Effective debt retirement projects need far more access to the asset—a level of access that continuous delivery practices can provide [Humble 2010].

However, assets whose designs predate the widespread use of modern practices such as continuous delivery might not be compatible with the infrastructure that these practices require. And in organizations that haven’t yet adopted such practices, staff familiar with them might be in short supply. For these reasons, we must regard as developmental any early projects whose objectives are retiring technical debt from irreplaceable assets. They’re retiring the technical debt, of course, but they’re also developing the practices and infrastructure needed to support technical debt retirement projects. This dual purpose is what drives the surprisingly high non-labor costs and investments associated with early technical debt retirement projects.

The investments required might include such “items” as a staging environment, which “is a testing environment identical to the production environment” [Humble 2010]; extensive test automation, including results analysis; blue-green deployment infrastructure; automation-assisted rollback; and zero-downtime release infrastructure. Decisions to make investments require an appreciation of their value to the enterprise. They enable the enterprise to deal effectively with the wicked problem of technical debt retirement.

Last words

Because every situation and every organization is unique, few general guidelines are available for making these decisions. The criteria most organizations have been using for dealing with (or avoiding) the issue of technical debt have produced the problems they now face. So, to succeed from this point, whatever criteria they use in the future must be different. My own view is that short-term thinking is at the heart of the problem, but it’s a wicked problem. The long-term solution will not be simple.

References

[Humble 2010] Jez Humble and David Farley. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation, Pearson Education, 2010.

[Meadows 1997] Donella H. Meadows. “Places to Intervene in a System,” Whole Earth, Winter 1997.

Retrieved: June 28, 2018

[Meadows 1999] Donella H. Meadows. “Leverage Points: Places to Intervene in a System,” Hartland VT: The Sustainability Institute, 1999.

Retrieved: June 2, 2018

[Meadows 2008] Donella H. Meadows and Diana Wright. Thinking in Systems: A Primer. White River Junction, VT: Chelsea Green Publishing, 2008.

[Rittel 1973] Horst W. J. Rittel and Melvin M. Webber. “Dilemmas in a General Theory of Planning,” Policy Sciences 4, 1973, 155-169.

Retrieved: October 16, 2018

Rules of engagement for auxiliary technical debt

As noted in an earlier post, a technical debt retirement project (DRP) is a project whose primary objective is retirement of a particular kind of technical debt (or particular kinds of technical debt) from a specified set of assets. But those assets might also carry other kinds of technical debt. With respect to a given DRP, we can call these other kinds of technical debt Auxiliary Technical Debt (ATD). Because the presence of ATD can defocus debt retirement projects, it presents a risk that must be anticipated and well understood if it is to be mitigated.

This post explores concepts and approaches for mitigating the risks associated with the auxiliary technical debt (ATD) of a given technical debt retirement project (DRP). As might already be evident, these initialisms (ATD, DRP, and one more to come) can be difficult to keep straight. Here’s a quick guide: T always means Technical, D always means Debt, R always means Retirement, and P always means Project.

The temptation to retire auxiliary technical debt

Guardrails in a track bed as a rail line crosses a bridge. The guardrails are the inner pair of rails. The rails outside the inner pair are the running rails. Guardrails (also known as check rails) function to keep the wheels of derailed cars from straying too far from their proper locations. This is a useful risk mitigation function in high-risk geometries such as curves. It’s also advantageous even if the probability of risk events is low, as in this straight section of track. It’s a worthwhile measure when the consequences of risk events are extremely costly, as in this case. A derailment on a railway bridge or in steep terrain can result in rail vehicles falling to the earth below, which can cause them to pull other vehicles with them. Derailments under highway overpasses can also be problematic. Such derailments can result in damage to rail or highway bridge structures, resulting in loss of service for periods extending far beyond the time needed to clear the derailment. For this same reason, guardrails are also used in tunnels and tunnel approaches. Because uncontrolled scope expansion can have such devastating effects, we need policy guardrails to control scope expansion when retiring technical debt from assets that contain auxiliary technical debt.

I’ve been using the term TDIQ (Technical Debt In Question) to denote the kinds of technical debt whose retirement is the objective of a given DRP. The ATD of that DRP, then, is the collection of instances of any other kinds of technical debt that are present in the assets the DRP modifies. Notice that the property of being auxiliary technical debt is relative to the objectives of a given DRP. A particular instance of technical debt might be ATD for one DRP, and TDIQ for another DRP, depending on the respective objectives of each DRP. Notice also that the ATD of a given DRP can include several different kinds of technical debt.

Let’s now examine a scenario in which ATD can generate risk for a DRP. In this scenario, we’ll consider only one kind of ATD; call it ATD0.

Suppose that several members of the DRP team undertake work to retire the DRP’s TDIQ in a portion of one of the debt-bearing assets. In performing this work, they encounter some instances of ATD0. Studying these instances of ATD0 carefully, they conclude that “fixing” the ATD0 along with the TDIQ in that portion of the asset would be easier and less risky than leaving the ATD0 in place and attending only to the TDIQ. Let’s call their approach the ATD approach. And let’s say that the TDIQ approach is one in which the team addresses only the TDIQ, and leaves in place the ATD0 and all other ATD it finds.

Compared to the TDIQ approach, the advantages of the ATD approach are fairly clear. After the work is complete, in either approach, the asset must be tested and re-certified. In the TDIQ approach, when a subsequent DRP is chartered to retire ATD0, that second DRP team will need to test and re-certify the asset again when it completes its work. In the ATD approach, if we’ve already retired all instances of ATD0 from the asset, we avoid that second round of modification, testing, and re-certification.

Risks associated with retiring auxiliary technical debt

But the ATD approach also has some serious disadvantages.

Enterprise assets might be left in a mixed state

Unless the team plans to retire all instances of ATD0, then upon completion of the DRP, enterprise assets will be in a mixed state. Some will be free of both the TDIQ and ATD0; some will be free of the TDIQ but continue to harbor ATD0. This non-uniformity can create complications for subsequent maintenance, documentation, testing, training, enhancement, automation assist development, and so on.
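To make the mixed state concrete, here is a minimal sketch that groups assets by the combination of debt kinds they still carry once a DRP has finished. The asset names and the snapshot itself are hypothetical, invented only for illustration; a real inventory would come from whatever tooling the organization uses.

```python
from collections import defaultdict

# Hypothetical snapshot after the DRP completes: asset name -> debt kinds still present.
# Every asset is free of the TDIQ, but only some are free of ATD0.
assets = {
    "billing": set(),
    "catalog": {"ATD0"},
    "checkout": {"ATD0"},
}


def group_by_state(assets):
    """Group asset names by the exact combination of debt kinds they still carry."""
    groups = defaultdict(list)
    for name, kinds in sorted(assets.items()):
        groups[frozenset(kinds)].append(name)
    return dict(groups)


states = group_by_state(assets)

# More than one group means the assets are in a mixed state.
is_mixed = len(states) > 1
```

A report like this does not remove the non-uniformity, but it gives maintainers, testers, and trainers a way to see which assets fall into which state.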

Complications in testing and re-certification

If test results for the modified assets indicate the possibility of new defects, the cause might be associated with the TDIQ work, or the ATD work, or both. Resolving any issues in the test results is thus more complicated under the ATD approach than it is under the TDIQ approach. Similar considerations affect re-certification. Thus, there is a risk that the ATD approach will complicate interpretation of test and re-certification results.

Questions about the reliability of technical debt inventory data

As noted in an earlier post, for any given DRP, the DRP team needs to know which assets bear that project’s TDIQ. In the TDIQ approach, any data previously or concurrently gathered about the location of instances of ATD0 remains valid, because the TDIQ approach doesn’t retire any instances of ATD0. However, in the ATD approach, such inventory data must be corrected to account for the retirement of whatever instances of ATD0 are retired in the ATD approach. Thus, if ATD0 inventory data has already been collected, or if it’s being collected in parallel with the DRP, the DRP team must take steps to adjust the inventory data regarding locations of ATD0 as it retires instances thereof. There is of course a risk that this will not occur as needed, which can create problems for any subsequent DRP for which the ATD0 is contained in its TDIQ. This can be especially challenging if there are multiple DRPs in process simultaneously, each working on different TDIQs, potentially in different debt-bearing assets, but all encountering and retiring instances of ATD0.
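The bookkeeping described above can be sketched as a small shared inventory that every DRP team updates whenever it retires an instance of ATD0. This is only an illustration under stated assumptions: the DebtInventory class, its methods, and the file paths are hypothetical, not part of any particular tool.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DebtInventory:
    """Hypothetical record of where each kind of technical debt is known to reside."""

    # Maps a debt kind (for example "ATD0") to the set of locations that carry it.
    locations: Dict[str, Set[str]] = field(default_factory=dict)

    def record(self, kind: str, location: str) -> None:
        """Note that an instance of the given debt kind exists at a location."""
        self.locations.setdefault(kind, set()).add(location)

    def retire(self, kind: str, location: str) -> bool:
        """Remove one instance; return False if the inventory never listed it."""
        known = self.locations.get(kind, set())
        if location not in known:
            return False
        known.remove(location)
        return True


inventory = DebtInventory()
inventory.record("ATD0", "billing/invoice.py")
inventory.record("ATD0", "billing/tax.py")

# A DRP team taking the ATD approach retires one instance and reports it,
# so the inventory stays valid for any later DRP whose TDIQ includes ATD0.
inventory.retire("ATD0", "billing/invoice.py")
remaining = sorted(inventory.locations["ATD0"])
```

Returning False for an unlisted instance is a deliberate choice in this sketch: it surfaces cases where the inventory and the actual assets have already drifted apart, which is the risk when several DRPs work in parallel.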

Unconstrained scope creep

Suppose there is a DRP whose objective is retiring its TDIQ, and that it has decided to also retire some (or all) instances of a particular kind of ATD, say ATD0. Although that activity would represent an expansion of scope beyond retiring the TDIQ, it might be acceptable and it might even be prudent. But as the team undertakes to retire ATD0, it might confront a similar quandary relative to the relationship between the ATD0 and yet another kind of ATD, which we might call ATD1. The DRP team might then decide to expand scope again. And so on. In general, there is no self-evident stopping point for such a chain of scope expansion. In these circumstances, scope creep can become an unmitigated risk, threatening the coherence and focus of the DRP, with consequences for its budget and schedule.
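One way to give such a chain a stopping point is to cap the number of auxiliary debt kinds a DRP may take on. The sketch below assumes a simple numeric cap; the ScopeGuard class and the cap itself are hypothetical, and a real policy might weigh cost, risk, or schedule instead.

```python
class ScopeGuard:
    """Hypothetical guard that limits how many auxiliary debt kinds a DRP may adopt."""

    def __init__(self, tdiq, max_auxiliary_kinds):
        self.tdiq = set(tdiq)                    # debt kinds the DRP was chartered to retire
        self.auxiliary = set()                   # auxiliary kinds approved so far
        self.max_auxiliary_kinds = max_auxiliary_kinds

    def request_expansion(self, kind):
        """Return True if retiring this kind is (or becomes) in scope, else False."""
        if kind in self.tdiq or kind in self.auxiliary:
            return True
        if len(self.auxiliary) >= self.max_auxiliary_kinds:
            return False                         # the chain of expansion stops here
        self.auxiliary.add(kind)
        return True


guard = ScopeGuard(tdiq={"TDIQ"}, max_auxiliary_kinds=1)
first = guard.request_expansion("ATD0")    # approved: within the cap
second = guard.request_expansion("ATD1")   # refused: the cap is already used
```

With the cap in place, the decision to pursue ATD1 has to go back to whoever owns the DRP charter rather than being made in the middle of the work.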

Last words

In some cases, some of the ATD might be so intertwined with the TDIQ that retiring some instances of the TDIQ necessarily retires some of the ATD. And in other cases, leaving the ATD in place severely complicates retiring the TDIQ. In still other cases, leaving the ATD in place leaves the assets in a complex state that makes ongoing maintenance or enhancement work more difficult. In these cases, what I called the ATD approach above is plainly the wiser course, compared to the TDIQ approach.

Policymakers have a role to play here. They can develop guidance for DRP teams to apply as they come upon these difficult situations to help them decide whether to take the ATD approach or the TDIQ approach. The military calls this guidance “rules of engagement,” while politicians call it “guardrails.”
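As an illustration only, guidance of this kind could be written down as an explicit decision rule. The sketch below encodes the three situations described above in which the ATD approach is the wiser course; the Encounter fields and the choose_approach function are hypothetical names, and an actual policy would likely weigh more factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encounter:
    """What a DRP team observes when it meets auxiliary debt (hypothetical fields)."""

    intertwined_with_tdiq: bool            # retiring the TDIQ necessarily retires the ATD
    complicates_tdiq_retirement: bool      # leaving the ATD severely complicates the TDIQ work
    leaves_asset_harder_to_maintain: bool  # leaving the ATD makes later work more difficult


def choose_approach(encounter: Encounter) -> str:
    """Return "ATD" when any of the three conditions holds, otherwise "TDIQ"."""
    if (
        encounter.intertwined_with_tdiq
        or encounter.complicates_tdiq_retirement
        or encounter.leaves_asset_harder_to_maintain
    ):
        return "ATD"
    return "TDIQ"


default_case = choose_approach(Encounter(False, False, False))
intertwined_case = choose_approach(Encounter(True, False, False))
```

Writing the rule down, even this crudely, means that two teams facing the same situation reach the same decision, and that the rule can be revised in one place as the organization learns.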

Deciding between the ATD and TDIQ approaches on a whim, or on what feels right at the moment, inevitably leads to a chaos of inconsistency and scope creep. The safest course is to adopt wise policy—rules of engagement—and to adjust them as the organization learns more and more about retiring technical debt from its assets.
