Designing a project to retire some portion of the technical debt from a critical, irreplaceable asset can be a daunting task. It’s best to acknowledge that the project design problem is very likely a wicked problem in the sense of Rittel and Webber [Rittel 1973]. See my post “Retiring technical debt can be a wicked problem” for more. In this thread, of which this is the first post, I suggest some basic preparations for dealing with irreplaceable assets. They form a necessary foundation for success in approaching the debt retirement problem for irreplaceable assets.
As I’ve noted in previous posts, the problems associated with retiring technical debt can be wicked problems. And if some of these problems aren’t strictly wicked problems, they can possess many of the attributes of wicked problems in degrees sufficient to challenge the best of us. That’s why approaching a technical debt retirement project as you would any other project is risky.
For convenience and to avoid confusion, in my last post I adopted the following terminology:
DRP is the Debt Retirement Project
DDRP is the effort to design the DRP
DBA is the set of Debt Bearing Assets undergoing modification in the context of the DRP
IA is the set of assets, excluding the DBA assets, that interact directly or indirectly with assets in the DBA
In the posts in this thread, convenience demands that we add at least one more shorthand term:
TDIQ is the Technical Debt In Question. That is, it’s the kind of technical debt we’re trying to retire from the DBA assets. Other instances of the TDIQ might also be found elsewhere, in other assets, but retiring those instances of the TDIQ is beyond the scope of the DRP.
Know when and why we must retire technical debt
For those technical debt retirement projects (DRPs) that exhibit a high degree of wickedness, a critical success factor is clear communication of the mission of the DRP. Clear communication is important because the DRP team must deal with many stakeholders who are in the early stages of familiarity with the concept of technical debt. Some of them might be cooperating reluctantly. Expressing the objectives and benefits of the DRP in a clear and inspiring way is very helpful. With that in mind, I offer the following reminder of the reasons for tackling such a large and risky project that produces so few results immediately visible to customers.
Examining alternatives to retiring the TDIQ is a good place to begin. One alternative is simply letting the TDIQ remain in place. Call this alternative “Do Nothing.” A second alternative to retiring the TDIQ is replacing the debt-bearing asset with something fresh and clean and debt-free. Call this alternative “Replace the Asset.” The problem many organizations face is that they cannot always rely on these alternatives. And because these two alternatives to debt retirement aren’t always practical, some organizations must develop the expertise and assets necessary to retire widespread technical debt in large, critical, irreplaceable systems. Below is a high-level discussion of these two alternatives to debt retirement.
Do Nothing
The first alternative is to find ways to accept that the DBA will continue to operate in their current condition, carrying the technical debt that they now bear. This alternative might be acceptable for some assets, including those that are relatively static and which need no further enhancement or extension. This category also includes those assets the organization can afford to live without.
One disadvantage of the “Do Nothing” approach is that technology moves rapidly. What seems acceptable today might not be acceptable in the very near future. It might become old-fashioned, behind the times, or non-compliant with future laws or regulations. Styles, fashions, technologies, laws, regulations, markets, and customer expectations all change rapidly. And even if the asset doesn’t change what it does, the organization might need to enhance the asset. The enhancements might become very expensive to accomplish due to the technical debt the asset carries.
An especially troubling scenario takes shape when the DBA contains portions that are severely out of date. When that happens the organization might no longer be able to find qualified candidates who can perform needed work on the DBA. This situation can also arise when portions of the DBA were developed in-house. In that case, there might not be any qualified candidates outside the organization. When everyone who understands the DBA has departed the organization, work can proceed only if the DBA is properly documented and a training and mentoring program is healthy and current.
For these reasons, Do Nothing can be a high-risk strategy.
Replace the Asset
The second alternative to retiring the TDIQ is to replace the entire asset. For this option, the question of affordability arises. In some instances this alternative is practical, but for many assets, the organization simply cannot afford to purchase or design and construct replacements.
Pay special attention to those assets that “learn.” They might contain data gathered from experience over a long period of time. Retiring the asset can require developing some means of recovering the experience data and migrating it to the replacement asset. That task is a potentially daunting effort in itself.
Replacement is especially problematic when the asset is proprietary. If the organization created the asset itself, they might have constructed it over an extended period of time. Replacement with commercial products could require extensive adaptation of those products, or adaptation of organizational processes. Worse yet, replacement with assets of its own making will likely be costly.
When organizations depend on assets that they must enhance or extend, and which they cannot afford to replace, they face a daunting problem. They must develop the expertise and resources needed to address the technical debt that such assets inevitably accumulate.
This series of posts explores the issues that arise when an organization undertakes to retire the technical debt that its irreplaceable assets are carrying. Below, I’ll be inserting links to the subsequent posts in this series.
Several properties of the problem of designing technical debt retirement projects tend to make those design problems more likely to be wicked problems. These properties make these projects more likely to satisfy all ten of the criteria of Rittel and Webber [Rittel 1973]. I call these properties indicators of wickedness.
We usually have some notion of the degree of wickedness of a given design effort for a technical debt retirement project. But actually executing the debt retirement project can reveal unanticipated issues and complexity. Some of what’s revealed can cause us to adjust our estimate of the degree of wickedness of the design effort. If we know in advance what kinds of revelations are most likely to cause such adjustments, we can reduce the incidence of unanticipated revelations.
Criteria for wickedness of wicked problems
In my post, “Degrees of wickedness,” I noted that we can regard all problems as lying on a Tame/Wicked spectrum, with wicked problems lying at the extreme Wicked end of the spectrum, and the tamest of the tame lying at the opposite end. As for the ten criteria of wickedness developed by Rittel and Webber, I proposed that they could be satisfied in degrees, with the most wicked problems satisfying all ten criteria absolutely.
As a quick review, here are the attributes of wicked problems as Rittel and Webber see them [Rittel 1973], rephrased for brevity:
There is no clear problem statement
There’s no way to tell when you’ve “solved” it
Solutions aren’t right/wrong, but good/bad
There’s no ultimate test of a solution
You can’t learn by trial-and-error
There’s no way to describe the set of possible solutions
Every problem is unique
Every problem can be seen as a symptom of another problem
How you explain the problem determines what solutions you investigate
The planner (or designer) is accountable for the consequences of trying a solution
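To make the idea of satisfying the criteria in degrees concrete, here is a minimal sketch. It assumes that each criterion receives a subjective degree between 0.0 (tame) and 1.0 (wicked), and that a plain average summarizes them. Both the scale and the averaging are my assumptions for illustration; neither comes from Rittel and Webber. The names CRITERIA and degree_of_wickedness are hypothetical.

```python
from statistics import mean

# The ten criteria, abbreviated from the list above.
CRITERIA = (
    "no clear problem statement",
    "no way to tell when it is solved",
    "solutions are good/bad, not right/wrong",
    "no ultimate test of a solution",
    "no learning by trial and error",
    "no describable set of possible solutions",
    "every problem is unique",
    "symptom of another problem",
    "explanation determines solutions investigated",
    "planner is accountable for consequences",
)


def degree_of_wickedness(degrees):
    """Average the per-criterion degrees, each from 0.0 (tame) to 1.0 (wicked).

    Averaging is an assumption made for illustration only.
    """
    if len(degrees) != len(CRITERIA):
        raise ValueError("expected one degree per criterion")
    if any(not 0.0 <= d <= 1.0 for d in degrees):
        raise ValueError("each degree must lie between 0.0 and 1.0")
    return mean(degrees)


# The two ends of the Tame/Wicked spectrum.
fully_wicked = degree_of_wickedness([1.0] * len(CRITERIA))
fully_tame = degree_of_wickedness([0.0] * len(CRITERIA))
```

A problem that satisfies all ten criteria absolutely sits at the Wicked end of the spectrum, and one that satisfies none of them sits at the Tame end. Everything in between depends on judgment calls that a sketch like this can record but can't make for you.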
Conditions or situations that tend to increase wickedness
Below is a sample of conditions or situations that tend to increase the wickedness of the problem of designing a technical debt retirement project. I have no data to support these conjectured effects. But the principles I used to generate them are three. If a condition tends to…
…expand the set of stakeholders in a debt retirement project, it tends to enhance the wickedness of the design problem.
…increase the number or heterogeneity of the assets or processes that we must consider, it tends to enhance the wickedness of the design problem.
…create a need for a rollback of work performed as part of the debt retirement project, and that rollback creates a need to redesign the debt retirement project, it tends to enhance the wickedness of the design problem.
In what follows, I use the term “DRP” to indicate the Debt Retirement Project itself. The effort to design the DRP is the “DDRP.” The problem whose wickedness we’re considering isn’t the DRP itself. Rather, it is the DDRP. Also, let DBA (for debt-bearing assets) be the set of assets undergoing modification in the context of the DRP. And let IA (for interacting assets) be the set of assets, excluding the DBA assets, that interact directly or indirectly with the DBA assets.
With all this in mind, I offer the following nine examples of indicators of wickedness of the DDRP.
1. A previous attempt to retire this debt was abandoned
Two indicators of the wickedness of the DDRP are perhaps most significant. The first is the failure of a previous attempt to execute a DRP with similar objectives. And the second is the failure of a previous attempt to execute a DDRP for a DRP with those objectives. There are two reasons why such failures are significant indicators of wickedness.
First, it’s reasonable to assume that these previous attempts weren’t founded on any recognition of the wickedness of the DRP or the DDRP. Few such efforts are. (A Google search for the two phrases “technical debt” and “wicked problem” yielded fewer than 1000 results. Update 12 Nov 2018: 1160 results; 24 May 2021: 246,000 results.) Consider first the DDRP. If it is a wicked problem, proceeding as if it were not would very likely fail. If the designers of the previous DDRP did assume that it was a wicked problem, investigating their approach could prove invaluable, and save much time and effort. An analogous argument applies for the DRP itself.
Second, if the previous attempt to execute a DRP with similar objectives has left traces of itself in the DBA, and if those traces must be taken into account while executing the DDRP, they might complicate the DRP, and they might be incompletely addressed in the DDRP. To the extent that these conditions prevail, Criterion 5 is satisfied, and the DDRP exhibits wickedness.
2. The Debt Retirement Project (DRP) will interrupt some revenue streams
If the work of the DRP entails temporary interruption of revenue streams, executing the DRP can have significant and long lasting effects on the organization. In estimating the cost of the DRP, it’s clearly necessary to account for the financial impact of any revenue shifted into the future, and any revenue irretrievably lost as well. And in some cases, market share might also suffer. All of these factors tend to increase the wickedness of the DDRP.
When these effects are expected, political opposition to the DRP can develop. Senior management can prevent this opposition from halting the DDRP inappropriately by requiring that the business case for the DRP include these financial factors and demonstrate clearly the need to proceed despite them. For example, including these factors might entail adjusting revenue targets downward to account for the interruptions due to the DRP. Involving potential political opponents of the DRP in business case development can be an effective means of ensuring the strength of the business case.
The ability to model all these financial effects is an important organizational asset that can be developed and maintained, for deployment across multiple DDRPs. The organization can monitor DRPs, gathering actual experience data for comparison to the effects projected in the respective business cases of the DRPs. Those comparisons are useful for enhancing the modeling capability.
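As a rough sketch of the comparison described above, the following code records the projected and actual revenue effects of one DRP and reports the differences. The class RevenueImpact, the function projection_error, and all of the figures are hypothetical. A real model would need much more, such as timing of the deferred revenue and effects on market share.

```python
from dataclasses import dataclass


@dataclass
class RevenueImpact:
    """Revenue effects of one DRP, in any consistent currency unit."""
    deferred: float  # revenue shifted into the future
    lost: float      # revenue irretrievably lost

    @property
    def affected(self):
        """Total revenue touched by the DRP, deferred or lost."""
        return self.deferred + self.lost


def projection_error(projected, actual):
    """Return actual minus projected, per component and overall.

    Positive values mean the business case underestimated the effect.
    """
    return {
        "deferred": actual.deferred - projected.deferred,
        "lost": actual.lost - projected.lost,
        "affected": actual.affected - projected.affected,
    }


# Hypothetical figures for a single DRP.
projected = RevenueImpact(deferred=200_000, lost=50_000)
actual = RevenueImpact(deferred=260_000, lost=40_000)
errors = projection_error(projected, actual)
```

Collected across several DRPs, differences like these show where the business case projections tend to run high or low.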
3. We need to re-certify some assets the DRP doesn’t directly touch
A DDRP is more likely to be a wicked problem if, as a result of the changes executed in the DRP, any of the assets in IA need to be re-tested after or during DRP execution. The need to re-test any assets in IA typically arises when one of two conditions occurs. One condition occurs when there’s some risk that the DRP’s changes in the DBA could somehow affect the performance of the assets in IA. The second is when the consequences of such a risk event are severe.
Five ways this need to re-certify increases wickedness
This scenario enhances the wickedness of the DDRP for at least five possible reasons.
Baseline testing of IA is necessary to enable the DRP team to recognize the effects of the DRP on IA behavior. But this baseline testing can reveal pre-existing and unaddressed faults. Leaving those faults in place can seriously complicate interpreting anomalies that appear in IA assets after DRP work has begun. That’s why the DDRP team might insist that the owners of the IA assets in question address some of these faults. With regard to these issues, political differences between the DDRP team and the owners of IA assets are possible. The additional testing of IA assets can…
…expand dramatically the set of stakeholders affected by the DRP, to include the owners, users, and maintainers of the IA assets.
…increase the need to interrupt revenue streams temporarily, and increase the number, duration, and frequency of such interruptions.
…require expertise and staffing beyond the DRP project team, which can disrupt other elements of the organization as the people needed are temporarily assigned to IA testing.
…reveal unanticipated consequences of the DRP alterations, which can trigger re-planning or redesign of the DRP during its execution. That re-planning or redesign, in turn, can trigger alterations in the DDRP.
The need to re-test assets not directly touched in the DRP is more likely when the DRP alters the external behavior of any of the DBA assets. The goal of many DRPs is improvement of the internals of assets without altering their external behavior, except possibly for performance improvements. This goal is desirable. It limits the need for re-testing and re-certification of IA assets.
Two classes of debt affect re-testing and re-certifying
The need to re-test and re-certify IA assets distinguishes two classes of debt in the DBA assets. Externally detectable debt in the DBA assets is debt that can be detected in the externals of the DBA assets. It includes their architecture, behavior, appearance, or interfaces. Externally undetectable debt in the DBA assets is any other debt not facially evident in the DBA assets. Retiring externally undetectable debt from the DBA assets is relatively straightforward. Only the DBA assets require re-testing and re-certification. Retiring externally detectable debt from the DBA assets is inherently more difficult and riskier. It requires more extensive re-testing and re-certification of both DBA and IA assets.
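The rule in the preceding paragraph can be sketched in a few lines. The function retest_scope and the asset and debt names below are hypothetical, and the sketch assumes that someone has already classified each debt item as externally detectable or not.

```python
def retest_scope(debt_items, dba, ia):
    """Return the set of assets that need re-testing and re-certification.

    debt_items maps each debt item to True if it is externally detectable.
    Retiring only externally undetectable debt limits the scope to the DBA.
    Retiring any externally detectable debt extends the scope to IA as well.
    """
    scope = set(dba)
    if any(debt_items.values()):
        scope |= set(ia)
    return scope


# Hypothetical assets and debt items.
dba = {"billing-engine"}
ia = {"reporting-portal", "ledger-export"}

internal_only = {"duplicated-tax-logic": False}
with_interface_change = {"duplicated-tax-logic": False, "legacy-api-fields": True}

narrow = retest_scope(internal_only, dba, ia)
wide = retest_scope(with_interface_change, dba, ia)
```

Here narrow contains only the DBA asset, while wide also includes both IA assets. A single externally detectable debt item is enough to widen the scope.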
4. The DRP directly touches multiple sites
DRPs that entail modifying technological assets of geographically dispersed organizations tend to be more wicked. This comes about because of factors including the following:
Sites might be geographically dispersed. But they might also be separated by language boundaries, legal jurisdictions, cultural divides, time zones, financial reporting practices, and much more. The required work of actually retiring the debt can vary from site to site for both technical and nontechnical reasons.
The multiple sites might have different landlords, with different lease agreements governing the organization’s occupancy of the property. This is just one of many factors that increase the numbers of stakeholders involved. It also exacerbates their heterogeneity. And the leases might constrain the kind of work that is permissible according to the day of the week or time of day.
If local vendors provide services such as communications or Internet connections to some of the sites, the DRP can be more complicated. If the work of the DRP involves these technologies and the local vendors, the task of coordinating all the different players can be complex and can encounter unanticipated obstacles.
Examples of unanticipated obstacles
For example, consider a case in which the work of the DRP involves networking hardware and software. That is work that we might prefer to perform at night or on a weekend, when it is less disruptive for users. For a global enterprise, there might not be a suitable time of day for such work.
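A small calculation illustrates the point. The sketch below looks for UTC hours during which every site is inside a local overnight window. It assumes whole-hour UTC offsets, ignores daylight saving time and weekends, and uses a 22:00 to 06:00 window that I chose arbitrarily. The site lists and function names are hypothetical.

```python
def local_hour(utc_hour, utc_offset):
    """Convert a UTC hour (0-23) to local time using a whole-hour offset."""
    return (utc_hour + utc_offset) % 24


def is_quiet(hour, start=22, end=6):
    """True if the local hour falls in the overnight window [start, end)."""
    return hour >= start or hour < end


def shared_quiet_hours(site_offsets, start=22, end=6):
    """Return the UTC hours during which every site is in its quiet window."""
    return [
        utc_hour
        for utc_hour in range(24)
        if all(
            is_quiet(local_hour(utc_hour, offset), start, end)
            for offset in site_offsets.values()
        )
    ]


# Hypothetical site lists with standard-time UTC offsets.
regional = {"Boston": -5, "Chicago": -6}
global_sites = {"Boston": -5, "London": 0, "Singapore": 8}

regional_window = shared_quiet_hours(regional)
global_window = shared_quiet_hours(global_sites)
```

Under these assumptions the two regional sites share seven overnight hours, and the three global sites share none.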
As a second example, consider a network upgrade for retail branch offices of a global bank. If that upgrade requires trenching for new cable connections, the project design must take into account local regulations governing the trenches. Factors to consider include the permitting process and trench requirements. Trench requirements include specifications for filling, covering, and marking while still open. These regulations vary with national and sometimes local jurisdiction. The complexity causes most organizations to rely on local vendors. But even then, the vendor selection process must include reliable vendor assessment and evaluation. Scheduling becomes a complex and risky endeavor.
For these reasons, a DDRP that involves technological assets housed at multiple geographically dispersed sites has an elevated probability of exhibiting the properties of a wicked problem.
5. Government agencies and/or industrial standards organizations must re-certify assets
Another driver of stakeholder expansion is the need for re-certifying assets after the DRP has modified them. The certification agencies can range from local and municipal regulators to national regulators and pan-industrial standards organizations. The number of possible agencies itself contributes to increased wickedness. But the operating style of these organizations merits special notice.
Many of these agencies operate without competitors. Perhaps for this reason, “customer service” might not be their strength. Gaining timely cooperation from them might be a challenging undertaking. Even though re-certification might be a small part of the DRP, it can become a blocking obstacle. Researching these requirements and their associated lead times, and maintaining a current knowledge base about them, can be an important task of the DDRP.
To acquire experience and information about their performance, consider using a pilot approach. Try to gain certification for an asset similar to the target of the DRP.
6. Nontechnical stakeholders must change their behavior
Generally, people don’t like to change how they work. There are exceptions, of course, if they recognize a benefit that arrives in some direct way. But unless there is a direct benefit, requiring people to change how they work as part of a DRP is likely to increase the wickedness of the DDRP. And the difficulty is more problematic if the people affected are technically unsophisticated. They’re less likely to appreciate the value of managing technical debt, and less likely to accept explanations of that value.
DDRP wickedness increases in this case for another reason. In addition to retiring the technical debt, the DRP must address the tasks of motivating and training the affected population. That requires preparing materials, scheduling and accounting for the time spent in training, and monitoring training effectiveness. The business case must also address these issues. It must also provide the evidence required to defuse any political opposition that might otherwise develop.
7. Major unanticipated complexity triggers redesign of the DRP
Unanticipated complexity happens in almost every project of almost any kind. But for DDRPs, unanticipated complexity that triggers adjustment of the DRP is especially unpleasant. Such a discovery can mean that the DBA assets or their connections to the IA assets have changed since the design team devised the plan. Or it can mean that the design team had an incomplete or incorrect understanding of the problem at hand. These events can occur for a number of reasons.
Examples of nontechnical causes
Imagining technical causes might be easier. So I’ll focus on nontechnical causes, which can actually be more serious. For example, suppose a political alliance enabled the VP of Sales and the VP of Engineering to reach a deal. They agreed that the DRP team would work on some important DBA assets, taking them off line for defined periods. If that political alliance weakens, or if the deal between the two VPs collapses for other reasons, the scheduled downtime of those assets might vanish. This pattern is more likely to arise in situations in which the DDRP team isn’t a party to such agreements. The DDRP team must be a party to any agreements regarding access to assets by the DRP team.
As a second example, consider what happens when the enterprise undertakes an acquisition of another enterprise. And suppose the acquisition team doesn’t inform the DDRP team during their design effort. Because chances are good that the DDRP team would have a significant amount of rework to do for the acquisition, this scenario illustrates a problem. The DDRP team must be aware of any organizational changes that could affect the DRP, for the active life of the DRP.
Redesigning the DDRP can take time. The DDRP team must periodically revisit elements of the DDRP that have short shelf lives during the design period. And the need to redesign can also indicate gaps in the DDRP team’s understanding of the problem.
All of these conditions tend to move the DDRP in the direction of increased wickedness.
8. The DRP requires weekend or middle-of-the-night work periods
The need to perform critical operations on weekends or in nighttime hours suggests three things. First, the work is risky in the sense that undetected faults that go into production can lead to costly operational errors. Second, the organization lacks a simulated operating environment that emulates the actual operating environment faithfully enough to enable defect detection before deployment. Such environments are also known as staging environments. Third, and finally, the organization lacks a rapid rollback mechanism that can restore the original state of an asset if the new modified state proves problematic when deployed.
If you anticipate multiple DRPs, it’s wise to construct a staging environment and devise a rollback mechanism before undertaking any of them. Cost is usually the blocking issue. But compare that cost to the cost of retarding all future DRPs, and the cost of any operational failures arising from deploying faulty systems. Staging and rollback capabilities are usually good investments.
Continued refusal to provide a staging environment with rapid rollback increases the wickedness of this and any future DDRPs.
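To show the shape of a rapid rollback mechanism, here is a toy sketch in which the asset is an in-memory dictionary. That stand-in is an assumption made purely for illustration. Real assets such as databases, services, or network gear need their own snapshot and restore machinery. All names below are hypothetical.

```python
import copy
from contextlib import contextmanager


class CheckFailed(Exception):
    """Raised when a post-deployment check rejects the modified asset."""


@contextmanager
def rollback_on_failure(asset):
    """Snapshot a dict-like asset and restore it if the block raises."""
    snapshot = copy.deepcopy(asset)
    try:
        yield asset
    except Exception:
        asset.clear()
        asset.update(snapshot)
        raise


def deploy(asset, change, check):
    """Apply change, then run check. Return True if the change was kept."""
    try:
        with rollback_on_failure(asset):
            change(asset)
            if not check(asset):
                raise CheckFailed("post-deployment check failed")
    except CheckFailed:
        return False
    return True


def drop_legacy_flags(asset):
    """A hypothetical debt-retiring change that turns out to be faulty."""
    del asset["legacy_flags"]
    asset["schema_version"] = 2


def flags_present(asset):
    """A hypothetical check: downstream users still need legacy_flags."""
    return "legacy_flags" in asset


config = {"schema_version": 1, "legacy_flags": ["x"]}
kept = deploy(config, drop_legacy_flags, flags_present)
```

After this run, kept is False and config is back in its original state. The same pattern applies in a staging environment, where a failed check costs nothing operationally.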
9. Rollback of attempted changes triggers redesign
In the course of executing the DRP, if reverting some (or all) of the work performed becomes necessary, we say that we’re ordering a rollback. Minor rollbacks do happen. But if we discover the need for a rollback long after completion of the work in question, the damage can be catastrophic. When these incidents occur, they can indicate a deep misunderstanding of the consequences of the work of the DRP. Because that misunderstanding could have consequences not yet recognized, such a rollback could suggest that the DDRP team underestimated the wickedness of the DDRP.
Let DBAf (faulty DBA) be the set of assets in the DBA that formerly contained some of the debt being retired. And suppose the DRP alterations contained or led to exposure of some kind of fault(s). Suppose further that the faults forced a rollback after they were deployed. Let DBAfw (wicked-faulty DBA) represent the subset of DBAf for which that rollback did trigger a redesign of the DDRP. Then wickedness of the DDRP is correlated with the size of DBAfw and the extent of the DDRP redesign that the rollback triggered.
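The definitions above translate directly into set operations. In this minimal sketch the asset names are hypothetical, and I assume the DRP team records which rollbacks triggered a redesign of the DDRP.

```python
# Hypothetical debt-bearing assets undergoing modification in the DRP.
dba = {"click-monitor", "session-store", "report-builder", "auth-gateway"}

# DBAf: DBA assets whose alterations exposed faults that forced a rollback
# after deployment.
dba_f = {"click-monitor", "session-store"}

# Assets whose rollback triggered a redesign of the DDRP.
redesign_triggers = {"click-monitor"}

# DBAfw: the subset of DBAf for which the rollback triggered a redesign.
dba_fw = dba_f & redesign_triggers

# Sanity checks on the subset relationships in the definitions.
assert dba_f <= dba
assert dba_fw <= dba_f

dba_fw_size = len(dba_fw)
```

The size of DBAfw is only one of the two factors mentioned above. The extent of the redesign that each rollback triggered is a matter of judgment, and this sketch doesn't attempt to measure it.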
An illustration of the effect of defects
For example, let Efw be a member of DBAfw. And suppose Efw is a modular element of a system that monitors the clicks of users of a Web site. It records data for later analysis, and because of the fault it does so incorrectly. When the site operators discover the errors, the DRP team orders the rollback of Efw. They replace Efw with its original, unaltered, debt-bearing form. Because Efw contaminated the original database, data rollback is impossible. The site operators did discover the error, but they can’t re-capture lost data. That’s why the DDRP team must re-design the DRP. This scenario is an example of Criterion 5. If there are political consequences for the loss of data, this scenario could be an example of Criterion 10.
This example suggests how the frequency of incidents that trigger redesign of the DDRP can be an indicator of the wickedness of the DDRP.
Example of a non-indicator: the I-35/U.S. 30 interchange near Ames, Iowa
Just outside Ames, Iowa, is an interchange between Interstate 35 (a four-lane, divided, limited-access roadway) and U.S. Route 30 (four-lane, divided, not limited-access). The interchange is a conventional cloverleaf design. The “leaves” are rather tight, though, and consequently, there have been numerous rollovers and crashes at this interchange. We can regard these tight cloverleaf ramps as technical debt in the highway system, and the rollovers and crashes as metaphorical interest charges on that debt.
In 2016, construction began on a new “flyover” exit ramp from northbound Interstate 35 onto westbound U.S. Route 30. The objective was to reduce the number of accidents at the interchange by replacing the current tight-curvature cloverleaf ramp with a flyover exit ramp with a longer radius of curvature. We can regard this project as a Debt Retirement Project (DRP). The project that planned that DRP was an effort to Design a Debt Retirement Project (DDRP).
Much went well, but an error occurred
Completion of the DRP was scheduled for November 2018. When completed, the new ramp will replace the northeast leaf of the cloverleaf. Like most civil engineering projects, this project does have some elements of wickedness. But the project dealt with those elements effectively. Nevertheless, a construction error is delaying completion [Magel 2018] [Iowa DOT 2018].
The error involves the height and position of the bolt anchors where steel bridge beams will connect to the concrete piers of the new flyover ramp. The contractor constructed six piers to support the flyover. Correcting the piers involves jackhammering the concrete tops, leaving the steel reinforcement in place. After positioning the beam anchors correctly, and re-pouring the concrete, the piers will be ready to support the beams. At this writing, the contractor has not yet announced the new completion date.
How they’re correcting the error
This effort, which includes a rollback and re-deployment, is a significant project in itself. It requires scheduling the work. But it also requires scheduling highway lane closures and lane shifts, and working around high-volume traffic periods. Depending on the schedule, they will possibly pour concrete in winter conditions. And after correcting the piers, the bridge beam placement and bridge roadbed work must proceed on a new schedule.
Consequently, the construction error triggered a redesign of the flyover project’s DRP. But it probably did not trigger a significant redesign of the DDRP. The construction error is therefore unlikely to be an indicator of significant additional wickedness for the DDRP.
You can become better managers of the risk of unanticipated wickedness. If your organization is embarking upon a long-term program of technical debt retirement, you’ll be executing many DDRPs and DRPs. Gathering data about incidents of unanticipated wickedness in DDRPs can be a useful practice, if you use that data when you design new technical debt retirement projects.
[Iowa DOT 2016] “Construction drawing for the Northbound I-35 Flyover Ramp at U.S. 30 Near Ames,” Iowa Department of Transportation, February 2, 2016.