In my previous posts on architectural debt, I focused on the engineering side. This post focuses more on the management side, or the organisational “gearing” factors that prevent systematic technical debt reduction. You don’t have to know what distributed system architecture debt is to get any value out of this post, but if you want to know more, read this post first.
When you reduce architectural debt, you are not doing a simple down payment with a currency like dollars. You need to do the right thing technically given your future plans. If you don’t, you add to the debt rather than reduce it. You can also end up wasting time reducing debt that you could have lived with. So, you need three things to do debt reduction: a target for where you need to go (based on your business objectives), a plan for how to get there, and the ability to stay the course, while also adjusting both target and plan as you go.
In my experience, there is no lack of good ideas for where you need to go architecturally. This can be in the form of short-term refactoring, an architectural model (borrowed or built), or longer-term standardisation work. The key problem is that, by its nature, distributed system architecture debt can rarely be fixed with a quick fix. You need a plan for how to evolve the architecture over multiple iterations, releases, and possibly years. The more steps you need to execute, the more resilient to change you have to make the target architecture and the plan. Things change all the time, you have to adjust to achieve what you set out to do, and you will therefore never end up exactly where you thought you would be.
Let’s have a look at the typical steps you go through:
- Identify key business objectives
- Set long-term architectural targets that allow execution towards objectives
- Make a stepwise plan for how to get to the targets (both transitioning for old products and a prioritised backlog for new)
- Operationalise the plan with details on step 1 of the plan, and do feasibility validation of step 2, maybe also step 3
- Execute step 1
- Adjust target and plan for changes by starting at the top of this list…
You can choose your methodology, agile or waterfall, and you can choose your timeline, short-term or long-term. The steps are typically the same; they are just more or less formal depending on your approach.
So, what can possibly go wrong?
1. Lack of clear business objectives and priorities, so architectural decisions cannot be made.
Often, management and product management will be able to define the high-level business objectives. But once these have been set, there are thousands of decisions that shape the overall architecture target (and thus the end product or service). Start-ups have typically been getting better at homing in on one set of customers and solving their problems (customer development, product/market fit, and lean startup are terms you may be familiar with). However, in a big company, there is a tendency to want to do too much in step 1. There is an inability to agree on a shared priority and say: “hey, forget about those big customers now, in step 1 we only want to solve the problem for this 20% of our customers, and then we focus on the next 40% of the customers in step 2.”
I often refer to the people doing this systematic planning and operationalisation of business objectives as the “translation layer”. Top-level managers lack the detailed knowledge, while the product managers and product architects typically lack the overall picture as they focus on their part of the business. The directors and technical leaders need to take ownership of the overall picture and establish clear priorities and objectives that both fit into the overall business objectives and make sense for a specific product. Weak leadership in this “translation layer” leads to a lack of guidance and constant in-flight adjustments when people discover that something is going in the wrong direction. You get the feeling of a ship making constant course adjustments: it loses speed, and if it gets bad enough, it will not have enough speed to even steer anymore.
Very often, the organisation fails to identify what I call the “pivotal assumptions” and make explicit decisions on these. When translation does not happen, people need to make their own assumptions in order to prioritise. Different teams will have different assumptions, and often the assumptions are not compatible. By identifying these underlying, pivotal assumptions and making them explicit and shared, the organisation will both execute in one direction and be able to quickly change course if an assumption is no longer true and a different assumption must be made.
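To make the idea of explicit, shared assumptions a bit more concrete, here is a minimal sketch of how incompatible assumptions across teams could be surfaced. Everything in it is hypothetical and for illustration only: the team names, the assumption topics, and the `find_conflicts` helper are my own assumptions, not a tool or process described above.

```python
from collections import defaultdict


def find_conflicts(team_assumptions):
    """Return {topic: {value: [teams]}} for topics where teams assume different things.

    `team_assumptions` maps a (hypothetical) team name to a dict of
    topic -> assumed value.
    """
    by_topic = defaultdict(lambda: defaultdict(list))
    for team, assumptions in team_assumptions.items():
        for topic, value in assumptions.items():
            by_topic[topic][value].append(team)

    # A topic is in conflict when more than one distinct value is assumed.
    return {
        topic: {value: sorted(teams) for value, teams in values.items()}
        for topic, values in by_topic.items()
        if len(values) > 1
    }


# Hypothetical example: two teams that agree on the first customers to serve
# but silently disagree on the deployment model.
teams = {
    "client": {"deployment": "cloud-first", "first_customers": "mid-market"},
    "backend": {"deployment": "on-premises", "first_customers": "mid-market"},
}

conflicts = find_conflicts(teams)
for topic, values in conflicts.items():
    print(topic, values)
```

The point of the sketch is only that an assumption has to be written down per team before anybody can see that two teams disagree; the decision on which assumption wins still belongs to the “translation layer”.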
2. Lack of a strong architectural framework and principles.
We want each product manager, architect, and engineer to be able to make local decisions. In order to do that, they need an architectural framework where some of the key technology choices have already been made and where a set of very explicit principles guide the decisions. Of course, without operationalisation of business objectives, it is difficult to establish a clear framework and principles. However, technical leaders have a responsibility to work with the commercial side to do this operationalisation. In this process, trade-offs are crucial. Foundational architecture work takes time and resources, and you don’t build the architecture before you deliver features. You typically need to build both architecture and features at the same time, or with as little time lag as possible. Thus, the architectural foundation needs to be prioritised in such a way that the desired features can be delivered in step 1. Also, when doing agile, you typically start out with a fairly loosely defined framework and principles, and then you evolve them.
When making these architectural priorities, you open up or close off the ability to deliver certain features, and the trade-offs in what you will get need to be put in front of company leadership. (I have worked on a light-weight approach to this that I have called “rapid priority evaluation”. I might blog about that in another post.)
I have found that in a large organisation, a typical hindrance to architectural frameworks and principles is that more than one team delivers products or services within the same architectural function. There is thus no clear accountability and ownership of the type: “you own that particular customer problem or technical function, go run with it”. I have championed the establishment of internal “centres of excellence” for specific architectural functions. In an organisation doing acquisitions, this is particularly important because you lose a lot of development speed (and add to architectural debt) by allowing multiple teams to own the same function. How to organise a centre of excellence and what authority to give the teams could probably be a good topic for another post.
3. Focus on the target, not the step-wise plan.
Engineers have a prioritised backlog. The order of what you code first is extremely important to the end result, especially if you have many teams (e.g. using scrum of scrums) that are distributed across time zones and have dependencies between their deliverables. In order to get a good result, business leaders also have to live and breathe the backlog. Too often, though, business leaders focus on the target and forget about the management of the backlog. If a leader orders some juicy spareribs in a restaurant, the spareribs will be bought, marinated (maybe for days), and put on the barbecue 3-4 hours before serving time, and a similar “backlog” of work items is created for the rest of the meal. If the business leader declares two hours before the meal that he wants fish as a main course, everything breaks down and most likely the meal has to be postponed. An engineering organisation will start hedging when its leadership team changes its mind at the wrong time, over and over again. It will buy both spareribs and fish, do foundational preparation for both, and waste a lot of resources just in case the leaders change their mind. Engineering and architecture work in exactly the same way. But for distributed architectural debt, debt reduction is not comparable to buying the spareribs for one meal; it’s comparable to buying spareribs for a whole chain of restaurants! Imagine the impact when suddenly leadership wants a fish restaurant, not a steakhouse…
4. Inability to plan beyond horizon 1.
When focusing on targets, and not the backlog, leaders will “collapse” the organisation’s planning model from a multi-step, iterative process into a single target: what do we sell in 9 months!?
This focus on horizon 1 is typically supported in larger organisations by internal funding and project approval processes where planning is focused on projects with time-limited scope as opposed to funding of teams with responsibilities. In an organisation like this, people will have limited time and focus on what is beyond the next 9-12 months, and it gets very difficult to create that multi-step plan that will get you to a certain architectural target or reduce your architectural debt.
This lack of activities and formal planning beyond horizon 1 is a vicious cycle. When you don’t get time to validate the feasibility of steps 2 and 3 of your plan (or worse, you don’t have a step 2 or 3), bad decisions will be made and work will have to be re-done in the next phase. Also, very often you are not able to finish what you planned in one go, so you will need to finish it in the next phase. If you didn’t plan for and communicate that need, you disappoint customers and you may not get the time allocated in the next phase to finish. The typical management response is to tighten the deadlines, cut features, and manage the horizon 1 project “better”, maybe by adding a program manager to drive the project through.
5. Lack of transparent processes when adjusting course.
In our industry there is constant change. So, any plan you make will be incomplete and wrong. But this doesn’t mean that a plan is not necessary. A clearly communicated and understood plan is a wonderful tool to drive a transparent process for making change. This is regardless of the size of the team or the project. If everybody understands what we are supposed to do, we can easily discuss the implications of a change and make a more informed decision on how to make it. And equally important, once we decide, everybody understands the impact of the decision, and execution will be faster and less error-prone.
Some engineering organisations (including Cisco) have two separate career tracks: a management track (you become a director, senior director, and then vice president) and a technical track (you become a technical leader, principal engineer, and distinguished engineer). This is great for engineers as they are not forced into management roles to get career advancement. However, in order to progress in a technical career, you need to accomplish something as an individual contributor. So, if a technical leadership career is about technology and individual contributions, a management career is about managing teams and projects, and making decisions. The problem is that while the non-technical processes and decisions are carefully managed by MBAs and engineers turned commercial, the very technical decisions are often made by a group of individual contributors plus some managers who don’t understand the implications. The technical leaders who are competent on the technical side of the decision do not drive a process; they focus on contributing their knowledge and facts so that managers can make decisions. However, the managers who are good at driving processes and getting to a decision are uncomfortable and do not have enough detailed knowledge to make the decision. Then there are some technical leaders who are both competent and have the authority to make a decision, and they do. This works great if just a decision was needed, but is less successful if what you needed was involvement from many stakeholders, evaluation of pros and cons, and bringing people with you to a conclusion that they are comfortable with and can execute on.
6. Inability to stay the course.
Building great products and services takes time. Building great teams takes time. If key people in the organisation lack confidence in the strategy, in the ability to execute, or in whether key decisions were right, it is difficult to get the commitment and funding to stay the course when step 1 is done. Managers who lack technical insight mistake the need to adjust the course for a reason to change the course every time one phase has been delivered. In the restaurant metaphor, a single meal the first night is comparable to the first phase. However, a great restaurant needs to build up relationships with suppliers, develop its menus and recipes, hire chefs that are good at a certain type of food, and so on. That is how you excel. The same goes for products and services. You are rarely done when you have completed a feature. You refine, you iterate, you listen to your customers and users, you track and you measure.
Too often, when phase 2 is next, what was done in phase 1 is forgotten, even the things we had to postpone because we never got to them. And if this is true for a feature, what then for steps needed to reduce architectural debt? When somebody makes a customer promise, or a competitor comes out with a new feature, management jump up and re-prioritise. The more they do that, the less debt is reduced, more debt is added, and feature velocity goes down.
7. Inability to recognise commonalities and patterns and a desire to start from scratch.
In our industry, we are always transitioning: from an old product to a new one, from an old protocol to a new one, from an old licensing model to a new one, or from buying products to buying services. There is this rule: engineers always underestimate the time and resources needed to build something existing in a new way, and they love the opportunity to start with a clean slate. The truth is that the messy code we wrote to fix thousands of bugs and implement tiny features embodies the accumulated experience of many developers solving a real-life problem delivered in a certain very specific way. If you want to solve the same problem, but in a very different way, you could start from scratch (though it would be wise to use the same engineers, possibly with some fresh blood). However, if you are transitioning your customers from one product or service to another and they expect continuity, you need to bring them with you. If you do that by coding from scratch, it is a very painful process.
However, through isolation of functionality, modularisation, and layering, you can isolate pieces of code, leave some of it alone, and then replace functionality piece by piece. A carefully planned transition is beautiful to watch (but rarely gets much attention). The failures, though, get far more attention. A success requires a well-functioning organisation with patience and the ability to stay the course.
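As a rough illustration of replacing functionality piece by piece behind a stable layer (an approach sometimes called the strangler fig pattern), here is a minimal sketch. The billing domain, the class and function names, and the operations are all hypothetical; the only point is that callers keep talking to one facade while individual operations move from the old code to the new.

```python
from typing import Callable, Dict


class LegacyBilling:
    """Stand-in for the existing, battle-tested code we want to leave alone at first."""

    def invoice(self, customer: str) -> str:
        return f"legacy invoice for {customer}"

    def refund(self, customer: str) -> str:
        return f"legacy refund for {customer}"


def new_invoice(customer: str) -> str:
    """Hypothetical re-implementation of a single operation."""
    return f"new invoice for {customer}"


class BillingFacade:
    """Stable layer that callers depend on while the internals are swapped out."""

    def __init__(self, legacy: LegacyBilling) -> None:
        self._legacy = legacy
        self._replacements: Dict[str, Callable[[str], str]] = {}

    def replace(self, operation: str, handler: Callable[[str], str]) -> None:
        # Route one operation to new code; everything else stays on legacy.
        self._replacements[operation] = handler

    def call(self, operation: str, customer: str) -> str:
        handler = self._replacements.get(operation)
        if handler is None:
            handler = getattr(self._legacy, operation)
        return handler(customer)


facade = BillingFacade(LegacyBilling())
before = facade.call("invoice", "acme")    # served by legacy code
facade.replace("invoice", new_invoice)     # step 1: migrate one operation
after = facade.call("invoice", "acme")     # now served by new code
untouched = facade.call("refund", "acme")  # still served by legacy code
```

Each call to `replace` corresponds to one step in the transition plan, and customers of the facade see continuity throughout.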
8. Inability to execute well.
In larger projects with many teams and products/services involved, and where you throw in multiple time zones and languages, the actual execution and coordination of your phase 1 may also fail. Lack of clarity (and thus multiple interpretations across engineers), limited understanding of overall direction, too late and too little integration testing, limited availability of lead engineers and product owners who are experienced in cross-site development, and a number of other factors can lead to poor execution. This reduces confidence in the ability of the organisation to deliver, increases frustration, and increases the sense of failure. The patience and room for doing proper planning and adjustment are reduced, and we go further into a downward spiral.
In sum, these organisational “gearing factors” magnify the impact of distributed system architectural debt. These factors are important in any software or hardware project, not only when you are reducing architectural debt, but the larger the organisation, the more projects, the more magnification you get. You have here the potential of substantial value destruction. But there is also substantial room for value creation if you use architectural debt reduction not only as a way to optimise your architecture, but also as a tool to transition your customers from today’s solutions and services towards your next generation of solutions and services. Or you can just forget about architecture and try to get things done…