Engineers want to be more efficient, and teams try to ship faster. While all of this makes sense, it often comes with a push for speed that ends up being counterproductive over the long run. The saying “slow is smooth, and smooth is fast” can be very true for software development.
It’s really easy to get into a mode where the only thing that matters is new features. While that provides a short-term speed boost, over time it ends up slowing everything down. Most of the time and money spent on software goes to maintaining it, not building it in the first place.
All of this means that the ideal software team, and the product it builds, look much more like a Land Cruiser than an F1 car. In theory, the F1 car can go far faster than the Land Cruiser, but it needs an extensive support system and near-perfect conditions to actually maintain that speed. The Land Cruiser is slower but can go anywhere. Unfortunately, the actual practice of shipping software looks much more like the backcountry than the racetrack.
Software teams and products are complex systems, and their many feedback loops can derail the best-laid plans. Deliberately pushing teams to just develop features can therefore cause the system as a whole to lose velocity. To actually increase speed, there are two real options: limiting focus, and decreasing the time it takes to get through the loops that exist in the system.
Decreasing time through loops is the more tempting of the two because it seems to mean not giving anything up. That is true over the long run, but in the short term there can be a lot of pain associated with creating or improving a process. It usually results in a much slower rate of development at the beginning. Even worse, if it’s happening in a legacy system, error rates can go up for a while until the new process is worked out and fully implemented. Over the long run these improvements are usually worth it, but the payoff isn’t instantaneous.
Narrowing focus, on the other hand, does offer rapid speed increases. For it to actually happen, though, it takes hard choices and buy-in from senior leadership. Narrowing the focus of what a team does requires empowering them to say no. It also means that priorities can’t shift all the time: the most important thing has to stay the most important thing. This is hard for most companies to achieve, but if your priorities are set correctly, you can really knock it out of the park.
Keep It Simple, Stupid
When I set out to write this section, I was originally only going to talk about code. Simple code is a large part of maintaining velocity over the long term, but it’s not the only thing. Making operational choices throughout the system that keep it easy to maintain is critical. Systems that are supposed to last should be built around ease of maintenance rather than initial speed of development.
Clever code and the latest and greatest systems let you avoid feeling like you are missing out on what the rest of the industry is doing. However, that can easily come at the expense of building reliable systems. For something to exist over the long term, it has to be usable by the people who maintain it.
Remember, the goal is to build stuff that is useful for people. This means that while there is value in going fast, the much more important question is how far you’ll go, not your velocity at a given point in time. With that as the goal, things like documentation and automation become much more valuable, since over the long run they can make large classes of problems go away.
Building for the long term also increases the value of simple code. Murphy’s law means that things will break when other things in your life are going wrong. We all have bad days and rough patches, and it’s worth making stuff simple enough that we can still understand and debug it when we’re in one.
Systems as a whole can be complicated and hard to reason about. Usually, the whole system isn’t truly understood, and lots of stuff ends up working only because the people who operate it know what’s going wrong (source). While complex systems almost always end up in this state, spending the time to make as much of the system observable as possible lowers the amount of guessing that takes place.
Making things operationally simple is also important because the only place that really looks like production is production. Things are going to break when you go to prod, even if you’ve got plenty of tests and a staging environment. This doesn’t mean giving up on testing, which does have real value. However, there should also be a focus on what it’s like to actually operate your system in production.
Operating a system well in production requires making it easy to understand what is going on inside it. Since we can’t see inside a computer, that means systems must have instrumentation. Actually implementing instrumentation is a significant time sink. In the Google SRE book, they describe SRE teams typically having one or two members whose primary assignment is building and maintaining monitoring for their service. This investment is worthwhile even if it slows things down over the short term, because it makes it possible to run systems at a significant scale.
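To make the idea of instrumentation a bit more concrete, here is a minimal sketch in Python of one possible starting point: a decorator that records call counts, error counts, and elapsed time for an operation. The names `instrumented`, `METRICS`, and `parse_order` are made up for illustration, and the in-memory dictionary is a stand-in for whatever monitoring backend a real system would report to.

```python
import functools
import logging
import time
from collections import defaultdict

logger = logging.getLogger("instrumentation_sketch")

# Illustrative in-memory metrics store, keyed by operation name.
# Assumption: a real system would export these to a monitoring backend.
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def instrumented(name):
    """Record calls, errors, and elapsed time for the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                # Count the failure and log the traceback, then re-raise
                # so callers still see the original error.
                METRICS[name]["errors"] += 1
                logger.exception("operation %s failed", name)
                raise
            finally:
                # Runs on both success and failure.
                elapsed = time.perf_counter() - start
                METRICS[name]["calls"] += 1
                METRICS[name]["total_seconds"] += elapsed
                logger.info("operation %s took %.6f s", name, elapsed)
        return wrapper
    return decorator


@instrumented("parse_order")
def parse_order(raw):
    # Hypothetical operation: parse "item,quantity" into a dict.
    item, quantity = raw.split(",")
    return {"item": item, "quantity": int(quantity)}
```

Calling `parse_order("widget,3")` returns the parsed order and bumps the call count, while malformed input such as `parse_order("bad")` raises as usual and also shows up in the error count. Even something this small means an operator can ask how often an operation runs, how often it fails, and how long it takes, instead of guessing.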
Ultimately, no matter how automated your system is, there will be people in the loop. This means that mistakes are going to happen, and system design must account for human operators. Automation is great, but acting as if the entire system is automated causes long-term problems and doesn’t account for the complexity that exists in most systems.
Moving quickly has real advantages in the marketplace. However, the goal should be to go as far as possible, not to go as fast as possible at a given moment in time. Engineer systems to be resilient and to last for the long term.
To do this, people need to be at the center of any process, and systems should be built for their operators. This means making choices that allow people to understand complex systems and give them the ability to monitor and change them.
Software systems can be running for a long time. It’s worth making decisions based on the idea that what you built is actually going to last.