Maintenance can mean a lot of different things to an engineer: reducing technical debt, future-proofing code through testing and quality design patterns, and keeping libraries and tools up to date, certainly, but also something more fundamental.
According to The Maintainers, “a global research network interested in the concepts of maintenance, infrastructure, repair, and the myriad forms of labor and expertise that sustain our human-built world,” maintenance, at its core, is about keeping society and its processes going. What that means for us as software engineers is, of course, less downtime. It also means building tools that are useful, a pleasure to use, and not liable to fail when we need them the most.
To value maintenance is to value “the mundane work that keeps society going” over “fanciful ideas—which we call innovation-speak—that are unsupported by evidence about how technology works, about the role new things play in society, and about how humans will benefit.” In other words, maintenance as an engineering philosophy means building things that are useful and dependable. But enough abstractions! Here is what valuing maintenance looks like in practice.
Adoption of DevOps Practices
The widespread adoption of DevOps practices is a promising development for software engineers interested in building more dependable systems. By combining the practices of developing software with maintaining its infrastructure, DevOps helps us better maintain software for two reasons:
- DevOps encourages understanding at all levels of the software development and deployment lifecycle. Without some insight into how the pieces of the system work together, planning for maintenance becomes extremely difficult. This doesn’t mean that everyone has to be an expert on all aspects of the system, but seeing how they fit next to each other is key for planning to prevent problems and developing a proactive attitude towards maintenance.
- These insights are enabled by simpler interfaces that are easy to understand and connect. In software terms, DevOps has built software with better abstractions and interchangeable interfaces.
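To make "interchangeable interfaces" a little more concrete, here is a minimal sketch in Python. The names (`ArtifactStore`, `InMemoryStore`, `PrefixedStore`, `publish`) are hypothetical and invented for illustration; they do not come from any particular DevOps tool. The idea is only that calling code depends on a small interface, so a backend can be swapped without touching the caller.

```python
from typing import Protocol


class ArtifactStore(Protocol):
    """Hypothetical interface: anything that can save and load build artifacts."""

    def save(self, name: str, data: bytes) -> None: ...

    def load(self, name: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend, e.g. for local development or tests."""

    def __init__(self) -> None:
        self._items = {}

    def save(self, name: str, data: bytes) -> None:
        self._items[name] = data

    def load(self, name: str) -> bytes:
        return self._items[name]


class PrefixedStore:
    """A second backend that namespaces its keys, standing in for a remote bucket."""

    def __init__(self, prefix: str) -> None:
        self._prefix = prefix
        self._items = {}

    def save(self, name: str, data: bytes) -> None:
        self._items[f"{self._prefix}/{name}"] = data

    def load(self, name: str) -> bytes:
        return self._items[f"{self._prefix}/{name}"]


def publish(store: ArtifactStore, name: str, data: bytes) -> bytes:
    """Caller code depends only on the interface, not on a specific backend."""
    store.save(name, data)
    return store.load(name)
```

Because `publish` only knows about `save` and `load`, replacing one store with another is a one-line change at the call site, which is the kind of substitution that makes later maintenance cheaper.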
DevOps culture isn’t without its risks to maintenance. For example, automated tooling can make it easier to take shortcuts. Companies can be tempted to over-burden engineers with “vertically integrated” knowledge, relying on fewer engineers to do the job because new tools free up their time. This would be an example of a failure to apply the maintenance lens to team processes. But I’ll get to that later. For now, let’s take a quick tour through the development-to-deployment lifecycle and look at things through a maintenance mindset.
Be Consistent with Code
For as long as developers have been writing software, we’ve been searching for ways to make our code easier to read, easier to reuse, and less prone to bugs. Writing good tests, linting, adopting stronger design patterns, and choosing between monolithic apps and microservices are just a few of the ways in which programmers try to simplify and standardize how we build useful and pleasing applications. By doing so, we hopefully reduce the likelihood of bugs, surface them faster, and allow them to be fixed more quickly.
With so many solutions to writing maintainable code, perhaps the most important thing to maintain is consistency. If we can’t agree on the patterns we’re using, or teach them properly to each other, it becomes more likely that we generate a mess of code that is harder to maintain. Often code problems are a manifestation of a process problem, a team failing to live up to or set its own standards.
Then there is the code outside of our own, which inevitably forms a large part of the project. Too often, third-party libraries are chosen for their ability to rapidly prototype a set of features. A maintenance mindset demands that we question the inclusion of these libraries and evaluate the risks to maintenance that they might pose. Are they frequently updated? Are we locked in to certain development patterns that will limit us? Can we easily substitute another library should this one become obsolete? And what about the underlying dependencies of this library: are they maintainable? With modern web architectures depending on a confluence of open source libraries, deciding if and when to use a third-party library is one of the most important decisions a team can make.
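These questions can be turned into a lightweight, repeatable check. The sketch below is only an illustration: the `Dependency` fields, the `maintenance_risks` function, and the default thresholds (one year without a release, twenty direct dependencies) are assumptions chosen for the example, not established rules, and the facts about each library would have to be gathered by the team.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Dependency:
    """Hypothetical record of what a team knows about a third-party library."""

    name: str
    last_release: date
    direct_dependencies: int
    has_drop_in_alternative: bool


def maintenance_risks(dep, today, max_age_days=365, max_dependencies=20):
    """Return a list of human-readable maintenance concerns for one library.

    The thresholds are illustrative defaults, not recommendations.
    """
    risks = []
    age_days = (today - dep.last_release).days
    if age_days > max_age_days:
        risks.append(f"{dep.name}: no release in {age_days} days")
    if dep.direct_dependencies > max_dependencies:
        risks.append(f"{dep.name}: {dep.direct_dependencies} direct dependencies to track")
    if not dep.has_drop_in_alternative:
        risks.append(f"{dep.name}: no easy substitute if it becomes obsolete")
    return risks
```

An empty list does not prove a library is safe. The point is to make the team ask the same questions every time a dependency is added, rather than only when one breaks.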
Stress Testing Tools for Development and Deployment
Too often, when we think about tooling in a professional software context, we prioritize only productivity, i.e. what will enable our team to get this done the fastest? And indeed, many tools are written to speed up development time and to automate away boring tasks. However, when we think about the right tools for building scalable software, we must balance immediate expediency against keeping things online over the long term. Luckily, a lot of the tooling today was built with maintenance in mind, even if it’s advertised to us as a faster way to get things done. This table lists some common tooling and what each tool brings to maintenance.
| Tool type | Maintenance view |
| --- | --- |
| Version control | Provides historical insight into how code developed, allowing us to find where things went wrong or easily roll back to a historically functional version of a code base |
| CI/CD | Gives us confidence that we are releasing software without breaking changes; speeds up rolling back a deployment |
| Logging and metrics | Enables automatic alerting when things are not performing as expected; provides details that let us debug more quickly |
| Containerization and orchestration | Consistency between deployments, ability to switch deployment environments, simplified scaling, potential carbon savings |
| Infrastructure as code | Interoperability between cloud platforms, explicit declaration of relationships between infrastructure |
When we bring in tools to help in the development and deployment processes, it’s important to think about what they bring to maintenance and to configure them to be useful in those situations. If we’re only learning how to use our tools in the best-case development-deployment scenario, we’re probably not using them as the creators intended.
Balancing Data Collection & Long-Term Sustainability
As we develop and deploy, one thing we are certain to generate and depend on is data. The proliferation of complex ETL pipelines and the quantity of headache-inducing integration work show us that maintenance of data often leaves a lot to be desired. With the rise of NoSQL databases, cheap cloud storage, and the ubiquity of formats like JSON, it’s tempting to gather as much data as possible and punt structuring it to later, but that would be a mistake. This is not to say that these tools don’t have a place (quite the contrary!), but they should be chosen for qualities that allow us to sustain and maintain infrastructure, not for what’s easiest when rolling out a new feature. Whenever we are generating data, it’s important to think about whether or not it should be normalized, how standards will be enforced, and of course what access to it will look like (who needs it? how often? how much of it? how quickly do they need it?).
Similarly, maintenance demands preparation for worst-case scenarios. Automated backups are a given, of course. But things happen, and if a data center floods or catches fire, what then? If there is a data breach, how will it be handled? The maintenance mind treats contingency planning as an active part of all development processes, not an afterthought.
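One way to avoid punting structure to later is to enforce a standard at the moment data enters the system. The following is a minimal sketch using only the Python standard library. The `PageView` record, its fields, and the `parse_page_view` function are hypothetical, made up to illustrate the idea; a real project might prefer a dedicated schema or validation library.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PageView:
    """Hypothetical normalized record for one event."""

    user_id: int
    path: str
    viewed_at: datetime


REQUIRED_FIELDS = {"user_id", "path", "viewed_at"}


def parse_page_view(raw: dict) -> PageView:
    """Validate and normalize one raw event, raising ValueError on bad input."""
    missing = REQUIRED_FIELDS - raw.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    path = str(raw["path"]).strip()
    if not path.startswith("/"):
        raise ValueError(f"path must start with '/': {path!r}")
    return PageView(
        user_id=int(raw["user_id"]),  # int() raises ValueError on non-numeric text
        path=path,
        viewed_at=datetime.fromisoformat(raw["viewed_at"]),  # expects ISO 8601 text
    )
```

Rejecting or repairing malformed records at the boundary is more work up front, but it means everything downstream can rely on a single shape instead of re-checking it in every pipeline.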