When we transform assets to retire some of the technical debt they carry, service disruptions are sometimes necessary. To minimize service disruptions while technical debt retirement efforts are underway, it’s advantageous to automate some procedures. Automation-assisted technical debt retirement provides two important benefits: reduced disruption of operations and reduced incidence of errors.
The meaning of automation
I’m using the concept of automation a bit loosely here. I don’t mean to imply that these procedures are autonomous. What I mean is that engineers have available tools for performing many operations with a minimum of thought. For example, in this sense of automated, an engineer can issue a command such as, “Test Module Alpha Using Test Suite Delta.” That command executes a predefined set of tests. Following execution, the appropriate engineers receive the results. The tool also archives those results appropriately. If the results are anomalous, engineers can then take appropriate action.
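To make the idea concrete, here is a minimal sketch of what such a command might do behind the scenes. Everything in it is an assumption for illustration: the names `run_suite`, `anomalies`, `CheckResult`, and `SUITE_DELTA` are hypothetical, and a real tool would also notify the appropriate engineers, which this sketch leaves out.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    """Outcome of one member of a test suite."""
    name: str
    passed: bool
    detail: str = ""


def run_suite(module: str, suite: Dict[str, Callable[[], None]],
              archive_dir: Path) -> List[CheckResult]:
    """Run every check in the suite, archive the results, and return them."""
    results = []
    for name, check in suite.items():
        try:
            check()
            results.append(CheckResult(name, True))
        except Exception as exc:  # a raised exception marks the result anomalous
            results.append(CheckResult(name, False, repr(exc)))
    # Archive the results as JSON, one file per module (hypothetical layout).
    archive_dir.mkdir(parents=True, exist_ok=True)
    archive_file = archive_dir / f"{module}.json"
    archive_file.write_text(json.dumps([asdict(r) for r in results], indent=2))
    return results


def anomalies(results: List[CheckResult]) -> List[CheckResult]:
    """Select the results that engineers need to look at."""
    return [r for r in results if not r.passed]


# Hypothetical stand-ins for the members of "Test Suite Delta".
def check_arithmetic() -> None:
    assert 1 + 1 == 2


def check_case_folding() -> None:
    assert "alpha".upper() == "alpha"  # deliberately fails


SUITE_DELTA = {"arithmetic": check_arithmetic, "case_folding": check_case_folding}
```

Calling `run_suite("module_alpha", SUITE_DELTA, Path("archive"))` runs both checks and writes `archive/module_alpha.json`. Passing the returned list to `anomalies` yields only the failing `case_folding` check. The engineer doesn't need to remember which checks belong to the suite or where the results go.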
Benefits of automation-assisted technical debt retirement
The more obvious benefit of automated procedures is speed. For example, an asset removed from service for testing can be returned to service more quickly if the testing is automated. And if trouble erupts during operations of a newly transformed asset, engineers can swap the untransformed asset back into place quickly. So-called roll-out and roll-back tools are just a few of the many elements of a set of practices collectively known as continuous delivery [Humble 2010].
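As a rough illustration of roll-out and roll-back, consider the sketch below. It isn't taken from any particular continuous delivery tool. The `Deployment` class and the `health_check` callback are hypothetical names, and the model assumes that swapping versions is a single step.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Deployment:
    """Tracks which version of an asset is in service (hypothetical model)."""
    active: str
    history: List[str] = field(default_factory=list)

    def roll_out(self, new_version: str,
                 health_check: Callable[[str], bool]) -> bool:
        """Put new_version into service; roll back if the health check fails."""
        self.history.append(self.active)
        self.active = new_version
        if health_check(new_version):
            return True
        self.roll_back()
        return False

    def roll_back(self) -> None:
        """Swap the previously active version back into place."""
        if not self.history:
            raise RuntimeError("no earlier version to roll back to")
        self.active = self.history.pop()
```

With `d = Deployment(active="v1")`, a call such as `d.roll_out("v2", check)` leaves `v2` in service only if `check("v2")` returns `True`. Otherwise the untransformed `v1` returns to service without anyone having to recall the steps.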
The second benefit of this kind of automation is error avoidance. For example, inconsistent or incomplete testing can fail to find errors and defects, and that leads to rework and further disruptions. Performing tests incorrectly, finding “defects” that aren’t there, is another way to generate trouble. Automated procedures are much less prone to error if we maintain, test, and certify them periodically. For example, consider subjecting a module to a particular test suite. With automation assistance, engineers needn’t remember (or take time to look up) how to prepare the asset for tests. They needn’t remember (or take time to look up) how to run the tests, or what the members of the test suite are. Long advocated as an essential element of sound engineering practice, test automation can avoid some of these problems. But it’s far short of a panacea [Bach 1999].
Other automation opportunities
In some situations, we can automate debt retirement itself. When we can retire instances of the technical debt in question by performing an automated transformation on an asset, the transformation is faster and more reliable.
One of the most important practices associated with automation-assisted technical debt retirement is automation-assisted regression testing. Investments in thorough and focused regression testing can yield surprisingly high returns in the debt retirement context. They can be just as valuable during development and routine maintenance.
To perform a regression test on an asset that has undergone a change is to examine its behavior under a specified set of conditions. Such investigations can determine whether those changes caused the asset to misbehave. So a regression test determines whether the asset has regressed as a result of the change. Automated or automation-assisted regression tests help the project team detect problems in assets that they’ve transformed. And that’s much better than having the business units that depend on those assets encounter problems during operations [Ge 2014].
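One simple way to sketch a regression test is to record the asset's behavior before the change and compare it with the behavior afterward. The example below assumes the asset can be modeled as a function of its input. The pricing functions and the helper names `record_baseline` and `find_regressions` are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def record_baseline(asset: Callable[[Any], Any],
                    inputs: Iterable[Any]) -> Dict[Any, Any]:
    """Capture the asset's behavior under a specified set of conditions."""
    return {x: asset(x) for x in inputs}


def find_regressions(asset: Callable[[Any], Any],
                     baseline: Dict[Any, Any]) -> List[Tuple[Any, Any, Any]]:
    """Return (input, expected, actual) for every case where behavior changed."""
    regressions = []
    for x, expected in baseline.items():
        actual = asset(x)
        if actual != expected:
            regressions.append((x, expected, actual))
    return regressions


def legacy_price(quantity: int) -> int:
    """Untransformed asset: unit price 10, with 5 off for orders of 10 or more."""
    if quantity >= 10:
        return quantity * 10 - 5
    return quantity * 10


def refactored_price(quantity: int) -> int:
    """Transformed asset with an off-by-one slip in the discount threshold."""
    discount = 5 if quantity > 10 else 0
    return quantity * 10 - discount


baseline = record_baseline(legacy_price, [1, 9, 10, 11])
regressions = find_regressions(refactored_price, baseline)
```

Here `regressions` holds the single entry `(10, 95, 100)`. The project team sees the slip at the discount threshold before any business unit does.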
Many of these same regression tests can also be useful during enhancement and ongoing maintenance of the asset. Often, investing in automated regression tests in advance of the debt retirement project can enhance development and maintenance performance relative to those assets. Later, when the debt retirement project begins, the previously obtained results of regression tests will already be available.
Last words
For some debt retirement projects, specially created automated regression tests might be beneficial. Assign engineers to continual automation tool development for debt retirement projects. That’s probably the best way to support these needs.
These automation capabilities are unlikely to be available commercially, because they’re so specialized to the asset being tested. Because general applicability is unnecessary, building them in-house is both practical and economical. If people with the necessary skills are unavailable, acquire them. We can justify these investments economically if we take into account the savings from reduced service disruptions during technical debt retirement projects.
References
[Bach 1999] James Bach. “Test Automation Snake Oil!” (1999).
[Ge 2014] Xi Ge and Emerson Murphy-Hill. “Manual Refactoring Changes with Automated Refactoring Validation,” Proceedings of the 36th International Conference on Software Engineering. ACM, 2014.
[Humble 2010] Jez Humble and David Farley. Continuous delivery: reliable software releases through build, test, and deployment automation, Pearson Education, 2010.
Managing technical debt is something few organizations now do. And fewer do well. Several issues make managing technical debt difficult and they’re discussed elsewhere in this blog. This thread explores tactics for dealing with those issues from a variety of initial conditions. For example, tactics that work well for an organization that already has control of its technical debt, and which wants to keep it under control, might not work at all for an organization that’s just beginning to address a vast portfolio of runaway technical debt. The needs of these two organizations differ. The approaches they must take might then also differ.
What’s in this thread
The first three posts in this thread illustrate the differences among organizations in different stages of developing technical debt management practices. In “Leverage points for technical debt management,” I begin to address the needs of strategists working in an organization just beginning to manage its technical debt. They ask the question, “Where do we begin?” In “Undercounting nonexistent debt items,” I offer an observation about a risk that accompanies most attempts to assess the volume of technical debt. Such assessments are frequently undertaken in organizations at early stages of the technical debt management effort. In “Crowdsourcing debt identification,” I discuss a method for maintaining the contents of a database of technical debt items. Data maintenance is something we might undertake in the context of a more advanced technical debt management program.
Obstacles we must address
Whatever approach is adopted, it must address factors that include technology, business objectives, politics, culture, psychology, and organizational behavior. So what you’ll find in this thread are insights, observations, and recommendations that address one or more of the issues related to these fields. “Demodularization can help control technical debt” considers mostly technical strategies. “Undercounting nonexistent debt items” is an exploration of a psychological phenomenon. “Leverage points for technical debt management” considers the organization as a system and discusses tactics for altering it. And “Legacy debt incurred intentionally” explores how existing technical debt can grow as long as it remains outstanding.
Adopting a technical debt management program entails significant organizational change. The problem can seem so daunting that we don’t know where to begin. The places to begin are the places where the change agents have greatest leverage, which systems analysts call leverage points. Consider this scenario:
You’re sitting in the kickoff meeting of the new Technical Debt Management Task Force. The CEO is talking about how she realized that the company had a technical debt problem. It was when the Marigold project went through delay after delay, and was finally declared done, with multiple objectives waived. She’s saying something about, “we were trying to do backflips with millstones around our necks. So I want this task force to show us how to get rid of the millstones, and then get rid of them.”
OK, you think. But how? We’re a global enterprise with thousands of engineers and operations on every continent. Except maybe Antarctica. No wait, we’re there, too. McMurdo I think. We have software we don’t even know much about, acquired long ago along with the companies that built it. And we’re building new systems or modifying old ones all the time, trying to move everything to the cloud while enhancing data security. Where do we begin to look for the millstones of technical debt?
Have you been in that meeting? If not, can you imagine being in that meeting? Meetings like that are happening around the globe. We’re all in the same soup.
Leverage points: how to get rid of the millstones
It turns out that the answers to the millstone questions are available, but the pioneers and deep thinkers who have shown the way aren’t working on technical debt. Their field is called systems analysis. They work on problems like the collapse of the North Atlantic fishery, urban deterioration, unemployment, poverty, climate change, and the causes of the Great Recession of 2008—really difficult problems. Although the technical debt problem isn’t quite that challenging, it’s challenging enough to justify taking a look at the methods of systems analysis.
And when we do that, we immediately encounter a concept many call leverage points.
What are leverage points?
Leverage points are places in complex systems where a small change in one thing can produce big changes in system behavior. In a brilliant 1997 article, Donella Meadows describes what she calls “places to intervene in a system” [Meadows 1997]. She followed this article with revised versions in 1999 [Meadows 1999] and 2008 [Meadows 2008], making improvements each time. Let me summarize Meadows’ work here.
To alter the behavior of a complex system, intervene at one or more of 12 categories of leverage points. For example, one category is called “Rules.” It consists of the incentives, punishments, and constraints that govern the behavior of the people and institutions that comprise the system. By adjusting the system’s rules, we can alter overall system behavior.
One more thing: the leverage points form an ordered hierarchy, ordered by effectiveness. Acting at a higher-level leverage point is more effective than acting at a lower-level leverage point. And more difficult, too. The ordering of the categories is a bit fuzzy, because every situation has its own quirks, but generally, the order is as given in the list below.
The twelve leverage points
In a moment I’ll give an example of using leverage point #9, Delays, to bring about change in the way the enterprise deals with technical debt. But first, here’s a brief summary of the leverage points in increasing order of leverage; not enough to truly understand what they are, but probably enough to pique your interest. As I write posts that illustrate interventions at these leverage points, I’ll link to them from here.
Numbers: Constants and parameters such as subsidies, taxes, and standards
Buffers: The sizes of stabilizing stocks relative to their flows
Stock-and-Flow Structures: Physical systems and their nodes of intersection
Delays in feedback loops
Balancing Feedback Loops: The strength of the feedbacks relative to the impacts they are trying to correct
Reinforcing Feedback Loops: The strength of the gain of driving loops
Information Flows: The structure of who does and does not have access to information
Rules: Incentives, punishments, and constraints
Self-Organization: The power to add, change, or evolve system structure
Goals: The purpose or function of the system
Paradigms: The mind-set out of which the system—its goals, structure, rules, delays, parameters—arises
Transcending Paradigms
Delays in feedback loops
When we use feedback to control systems, and there are delays in the feedback, we can potentially create destructive system behavior. And that can happen when we try to control technical debt.
Whenever we try to control a quantity in an enterprise process, we must (a) set a target value for that quantity; then (b) measure its current value; and then (c) take action as appropriate to move the current value toward the target value. Systems analysts (and control theorists) call that arrangement a feedback loop. The action taken to move the current value to the target value is sometimes called the control signal. Under certain conditions, the feedback works as expected.
For example, to control the profitability of the enterprise, we can examine its net income, say, quarterly. And at the end of each quarter we can make adjustments if net income isn’t in the target range.
Feedback loops generally work pretty well, but under some conditions, oscillations can develop. One of those troublesome situations occurs when there’s a delay in the loop that’s of the same order as (or longer than) the time the system takes to respond to adjustments. Meadows uses the example of adjusting the water temperature of a shower when there’s a long delay between making the adjustment and feeling its effects. Overcorrection is almost inevitable, and that’s what causes system oscillation.
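A toy simulation can show the effect. The sketch below isn't Meadows' model. It assumes a very simple loop in which each step moves the controlled quantity by a fixed fraction (`gain`) of the gap between the target and a measurement that is `delay` steps old. All names and numbers are illustrative.

```python
from typing import List


def simulate(delay: int, gain: float = 0.5, target: float = 1.0,
             start: float = 0.0, steps: int = 12) -> List[float]:
    """Adjust a quantity toward target using measurements `delay` steps old."""
    values = [start]
    for t in range(steps):
        observed = values[max(t - delay, 0)]     # stale measurement
        correction = gain * (target - observed)  # the control signal
        values.append(values[-1] + correction)
    return values


prompt_feedback = simulate(delay=0)
late_feedback = simulate(delay=3)
```

With no delay, the values rise steadily toward the target of 1.0 and never pass it. With a delay of three steps, the same loop keeps correcting on the basis of old measurements. It climbs to 2.25, more than twice the target, and then swings below zero. That's the overcorrection in the shower example.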
How controlling technical debt can create feedback loops
So let’s suppose that we’re trying to control the rate of accumulation of technical debt. One approach is to set a target for TDnew, the new technical debt generated in a project. To be fair to all projects, we decide to normalize this quantity according to the project budget B. So we set targets for each project’s N = TDnew/B, and we require that projects estimate N, on an ongoing basis, with a goal of having N in some target range when the project is complete.
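As a small worked example, here's how N might be computed and checked against a target range. The figures and the range are made up for illustration. The function names are hypothetical, and the sketch assumes that TDnew and B are expressed in the same currency units.

```python
def normalized_new_debt(td_new: float, budget: float) -> float:
    """Return N = TDnew / B, the new technical debt per unit of project budget."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return td_new / budget


def within_target(n: float, low: float, high: float) -> bool:
    """Check whether N falls inside the target range [low, high]."""
    return low <= n <= high


# Hypothetical projects: (estimated new technical debt, project budget).
projects = {"small": (30_000, 600_000), "large": (120_000, 800_000)}
TARGET_RANGE = (0.0, 0.10)  # assumed target range for N

report = {
    name: within_target(normalized_new_debt(td, b), *TARGET_RANGE)
    for name, (td, b) in projects.items()
}
```

The small project has N = 30,000/600,000 = 0.05, inside the assumed range. The large project has N = 120,000/800,000 = 0.15, outside it. Normalizing by budget lets us compare the two even though the large project's budget is bigger.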
Identifying technical debt isn’t straightforward
One problem with this approach is that we rarely identify accurately all the technical debt we’ve incurred until some time has passed after project delivery. With time, as the newly produced assets go into production and learning accumulates, we acquire the wisdom needed to identify more of the technical debt we created. This is one source of delay in this feedback loop.
So let’s assume that this happens for several projects, and management decides that delayed recognition of incurred technical debt is a common occurrence. To account for this, management lowers the target ranges for N for future projects. This causes project managers and project sponsors to include in their project plans additional effort directed at retiring more of their incremental technical debt before their projects complete, to enable them to project lower values of N. They must therefore identify as much of the incremental technical debt as they can, and retire it, to meet the lower targets for N.
How oscillations set in
But recall that technical debt identification sometimes requires time and experience using the newly produced asset. And the reverse process also occurs. Technical artifacts that we thought were technical debt prove to be useful in unexpected ways, and actually turn out not to be debt items after all. As a result, some of the incremental technical debt that got retired before the project was completed actually should not have been retired. Eventually, people realize that this happens with uncomfortable frequency, and so the targets for N are raised once more.
Oscillations thus set in. Long delays inevitably cause them. To prevent oscillations, shorten the delays.
How to shorten delays in feedback controlling technical debt
When we use feedback to control a system, delays in that feedback can lead to instability. Trying to control technical debt is no exception. With technical debt we can shorten delays in several ways.
If the asset is meant for human use, involve representatives of the user population in the development and design process as soon as practical. Have them exercise the asset, or prototypes, early. Listen to their suggestions. Observe how they use the asset.
If the asset must interact with non-human assets, exercise it early and often. Don’t think of this as testing, though it might look very much like testing. What you’re actually doing is searching an asset that already works for shortcomings in its design and implementation, specifically in how it interacts with non-human assets.
Subject the asset to multiple reviews all along the development trajectory. Don’t wait for final release to review it.
These practices expose technical debt items early—potentially, during initial design—thereby reducing delays in identifying what is and what isn’t technical debt. They help to advance the date at which we uncover missing capabilities or capabilities designed or implemented in awkward ways. No surprise, I’m sure, but these practices are consistent with Agile approaches to technological development.
Indirect effects can add to delayed recognition of technical debt
Most of the argument above assumed that the incremental technical debt associated with the project was incurred within the asset undergoing development or maintenance. But technical debt can occur in other assets as well. When the development team is unaware of such “remote” or “indirect” incremental technical debt, recognition of that new incremental technical debt can be significantly delayed. The project’s N (the ratio of incremental technical debt to project budget) will appear to be smaller than it actually is, until that remote incremental technical debt is recognized.
This form of delay is likely to occur when the debt incurred is asset-exogenous. Recall the example of line extension of mobile phones. In that example, the enterprise incurs technical debt in one set of products as a result of the introduction of a different product. In some cases, the newly incurred technical debt is immediately evident. When it is not, delays can be substantial.
This effect is by no means rare. Any organizational change can potentially add to the technical debt portfolio—reorganizations, acquisitions, expansions, wholly new products, and much more.
Last words
Interventions at the leverage points of an organization can produce the changes we want with a minimum of effort. Some subtlety is involved, because Meadows’ leverage points are expressed at a high level of abstraction. But applying them to the problem of technical debt management is a promising approach.
Bookmark this post. I’ll be linking to more examples of using leverage points to manage technical debt.
Many an enterprise culture includes, perhaps tacitly, an unrealistic definition of done for projects. Some enterprise cultures assume definitions of done that fail to adequately acknowledge attributes related to sustainability. For such cultures, technical debt expands inexorably. In most organizations, the definition of done includes meeting the attributes that most internal customers understand and care about. These attributes might not include sustainability [Guo 2011]. Indeed, even among technologists, the definition of done might not enjoy precise consensus [Wake 2002].
Why retiring technical debt isn’t included in “done”
Internal customers understand less well the attributes of deliverables related to sustainability. It’s therefore perhaps unsurprising that sustainability might not receive the attention it needs. Applying scarce resources to enhance attributes the customer doesn’t understand, and cares about less, will always be difficult.
To gain control of technical debt, we must redefine done to include addressing sustainability of deliverables. Although there may be many ways to accomplish this, none will be easy. Resolution will necessarily involve educating internal customers to understand enough about sustainability to enable them to justify paying for it.
Redefining “done”
The typical definition of done for most projects ensures only that the deliverables meet the requirements. Because requirements usually omit reference to retiring newly incurred nonstrategic technical debt, we often declare projects complete with incremental technical debt still in place. A similar problem prevails with respect to legacy technical debt.
A more insidious form of this problem is intentional shifting of the definition of done. This can happen when the organization has adopted a reasonable definition of done that allows for addressing sustainability. But under severe time pressure, the definition is “temporarily” amended to allow the team to declare the effort complete, even though sustainability issues remain unaddressed.
For most projects, three conditions conspire to create steadily increasing levels of nonstrategic technical debt. First, for most tasks, the definition of done is that the deliverables meet the project objectives, or at least, they meet them well enough. Second, typical project objectives don’t restrict levels of newly incurred nonstrategic technical debt, nor do they demand retirement of incidentally discovered legacy technical debt. Third, budget authority usually terminates upon acceptance of delivery. These three conditions, taken together, restrain engineering teams from immediately retiring any debt they incur. Nor can they retire—or document or report—any legacy technical debt they encounter while fulfilling other requirements.
For example, consider one kind of incremental technical debt, which Fowler [Fowler 2009] calls Inadvertent/Prudent (“Now we know how we should have done it”). The realization that we’ve incurred new debt of this kind often occurs after the task is “done.” If budget authority has terminated, no resources, financial or human, are available to retire that form of technical debt.
Last words
Unless team members document the technical debt they create or encounter, there is risk of lost knowledge. After team members move on to their next assignments the enterprise is likely to lose track of the location and nature of that debt. A more realistic definition of done would enable the team to continue working post-delivery to retire or document any newly incurred nonstrategic technical debt. They could also note any incidentally encountered legacy technical debt. Moreover, teams most likely leave in place any strategic technical debt—technical debt incurred intentionally for strategic reasons. Although the enterprise must eventually address such debt as well, the widespread definition of done doesn’t address it.
Policymakers are well positioned to advocate for the culture transformation needed to redefine done.
References
[Bach 1999] James Bach. “Test Automation Snake Oil!” (1999).
[Ge 2014] Xi Ge and Emerson Murphy-Hill. “Manual Refactoring Changes with Automated Refactoring Validation,” Proceedings of the 36th International Conference on Software Engineering. ACM, 2014.
[Guo 2011] Yuepu Guo, Carolyn Seaman, Rebeka Gomes, Antonio Cavalcanti, Graziela Tonin, Fabio Q. B. Da Silva, André L. M. Santos, and Clauirton Siebra. “Tracking Technical Debt: An Exploratory Case Study,” 27th IEEE International Conference on Software Maintenance (ICSM), 2011, 528-531.
[Humble 2010] Jez Humble and David Farley. Continuous delivery: reliable software releases through build, test, and deployment automation, Pearson Education, 2010.
The behavior of internal customers and users of enterprise technological assets can contribute to technical debt formation and persistence. Because of these contributions, introducing effective technical debt management practices requires widespread behavioral changes on the part of those internal customers and users. Accepting these changes, and the initiative and creativity they require, is possible only if people understand the technical debt concept. When they do, they can appreciate the benefits of controlling technical debt, and the consequences of failing to control it. Similarly, when they do not understand or accept the technical debt concept, progress toward effective technical debt management is unlikely. Policymakers can contribute to the planning and execution of the required organizational transformation.
Even when engineering teams are aware of the technical debt concept, and when they do try to manage technical debt, progress can be elusive. Significant progress requires the support and understanding of engineering management, internal customers, and customers’ managements. Everyone must understand that controlling technical debt—and retiring it—is a necessary engineering activity that has a business purpose. Everyone must understand that technical debt arises as a result of everyone’s behavior—not just the behavior of technologists.
Part of the job of Management is to ensure that engineers have what they need to avoid incurring technical debt unnecessarily. Management must also ensure that they have what they need to retire elements of legacy technical debt on a regular basis. Internal customers must understand that communicating their long-term business strategies to Engineering is essential for limiting unnecessary creation of artifacts that become nonstrategic technical debt. Only by understanding the technical debt concept can internal customers learn to avoid the behaviors that lead to nonstrategic technical debt, and adopt behaviors that limit new technical debt.
The tensegrity structure metaphor for technical debt management
Tensegrity structures provide a metaphor for organizations that have mastered the technical debt concept. Tensegrity structures use isolated rigid components in compression, held by a network of strings or cables in tension. The rigid components are usually struts or masts, and they aren’t in contact with each other.
The struts correspond to the users or customers of technological assets. The cables correspond to the engineering activities required to support the customers. The organization is stable relative to technical debt only when the two kinds of elements (struts and cables) work together, each playing its own role, but each appreciating the role of the other.
Advocating for cultural transformation
Advocates of any change to organizational culture are often seen as acting in their own self-interest. That’s a common risk associated with cultural transformation. It’s a risk that can lead to failure when inserting practices related to technical debt management into the culture. The risk is greatest when advocates for change are drawn exclusively from the technical elements of the enterprise. The ideal advocates for these ideas and practices are the internal customers of the technical organizations, and senior management.