Refactoring — a long story cut short
“You have a piece of functionality that you need to add to your system. You see two ways to do it, one is quick to do but is messy — you are sure that it will make further changes harder in the future. The other results in a cleaner design, but will take longer to put in place.” Martin Fowler
I doubt there are many coders out there who have not been asked by a customer, at least once in their career, whether refactoring or testing is really necessary. They will also have encountered another question: “Can we do it later?”, where “later” in practice usually means “never”. It is not (that) hard to justify test coverage to the customer, yet the case for refactoring seems less tangible because, as the story goes, “if it works and is tested, why touch it?”. The typical arguments against refactoring are higher development cost and delays in delivery.
What customers are often not aware of is that these arguments may simply not be true, and that skipping refactoring may actually incur so-called Technical Debt, laying the foundation for costs and delays further down the road. This debt will have to be paid sooner or later, unless development of the project stops altogether. Technical Debt that keeps building up in the system has multiple high-level consequences, to name just a few:
- new features delivered at a much slower pace
- higher costs of new developers joining the project
- inaccurate estimations resulting in missed deadlines
- vendor lock-in where it becomes impossible to change the software vendor without a thorough system rewrite
The thing is that refactoring can be done with different purposes in mind and with different effects on technical debt. Some types of refactoring can, in fact, be postponed or even ice-boxed without significant consequences, yet managing technical debt and mitigating its effects plays a big role in keeping our projects healthy. A generic “Paying technical debt” item should never appear in your backlog. If it does, the debt behind it was either a tolerated mess or was incurred unintentionally. Even debt that was not taken on consciously should be named specifically once it has been identified.
Refactoring for Quality
“When you decide to take on a technical debt, you had better make sure that your code stays squeaky clean. Keeping the system clean is the only way you will pay down that debt.” Uncle Bob
Refactoring for quality is the only type of refactoring that should never be singled out, or even considered, as a candidate for extraction into a separate task. This type of refactoring does not inherently prevent technical debt; it prevents something much worse: a mess. Such refactoring activities include, but are not limited to:
- keeping code style consistent
- following established best practices
- keeping security in mind
- keeping code testable
- keeping code readable
- conscious use of libraries and design patterns
Why should we not treat refactoring for quality as something special? Because it needs to be an integral part of our development process. The more complex the code is, the more readable it should be, and the more time should be spent on planning, research and refactoring. Not only is refactoring the last step of the Red-Green-Refactor loop in TDD, but it is also much easier to apply immediately than to pick up later. Here are a couple of reasons why:
- it is more efficient to refactor while one is still immersed in the problem
- if refactoring is skipped, testability and architecture may suffer once other developers integrate with the code
- it leaves no room for procrastinating or deferring the task forever
- refactoring tasks are difficult to hand over to other developers
- refactoring early saves time spent on code review and communication
“The decision to make a mess is never rational, is always based on laziness and unprofessionalism, and has no chance of paying off in the future. A mess is always a loss.” Uncle Bob
Hence keeping quality high should not be a matter of making a decision. It should be a matter of just doing it. It adds some overhead when the project starts but pays off many times over when development progresses.
Refactoring for Speed
“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered.” — Donald Knuth
This is the first type of refactoring that might be considered for postponement. It might not even count as refactoring, because performance tuning sometimes comes at the cost of readability or code cleanliness, and may even be regarded as counterproductive.
If the cost of a performance optimization is low, it falls under the Refactoring for Quality category and should be addressed immediately. For instance, using database capabilities to search through a set of records instead of doing it manually outside the database costs almost nothing in development effort, yet it may improve performance significantly.
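As a minimal sketch of the database example above, assuming a hypothetical `orders` table in an in-memory SQLite database (the table, columns, sample rows and function names are all invented for illustration), the two approaches might look like this in Python:

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 35.5), ("alice", 80.0), ("carol", 210.0)],
)


def large_orders_in_app(connection, minimum):
    """Manual approach: load every row, then filter in application code."""
    rows = connection.execute(
        "SELECT id, customer, total FROM orders ORDER BY id"
    ).fetchall()
    return [row for row in rows if row[2] >= minimum]


def large_orders_in_db(connection, minimum):
    """Database approach: only the matching rows ever leave the database."""
    query = (
        "SELECT id, customer, total FROM orders "
        "WHERE total >= ? ORDER BY id"
    )
    return connection.execute(query, (minimum,)).fetchall()
```

Both functions return the same rows. The second one is no harder to write, and it avoids transferring and scanning the whole table in application code, which is where the difference tends to show once the table grows.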
In turn, when we identify significant performance problems, resolving them might involve refactoring tasks (as opposed to scaling) that need to be scheduled and prioritized separately.
Refactoring for Reusability
Building code in a decomposed way with consciously applied design patterns inherently leads to a codebase that is reusable. Sometimes, though, making a given solution reusable requires a significant time investment. This may collide with the business goal of delivering the feature fast. Even when we know the code is intended to be reused in the future, shipping it in a non-reusable form may be a conscious and justified technical debt that can be addressed later if necessary.
Refactoring for Extraction
This situation is similar to refactoring for reusability, but it is performed on a larger scale. Extracting libraries, dividing monolithic applications into microservices (or augmenting them with microservices), and introducing an architecture that relies on plugins will in all likelihood require significant refactoring of the codebase, often well before the actual extraction happens. As the extraction itself is usually a task of its own, it is reasonable to treat the related refactoring that makes the extraction possible in the same way.
Refactoring for Metrics
Refactoring for metrics is a type of refactoring that should be applied with caution. Depending on the project and technology there might be multiple metrics measuring a variety of aspects of our code. They are often useful and indicate, for instance, missing code coverage, unhealthy code due to issues with its complexity or size, etc. In most cases, such matters can be addressed immediately, but if the time investment needed is significant, such refactoring can be postponed as well.
Refactoring for metrics also has its dark side. Sometimes the only result of such refactoring is a degradation of code quality, e.g. through needless decomposition or unnecessary tests. We should always remember that automated code analysis may not capture the intentions behind our conscious design decisions. In such cases, we should not only refrain from refactoring such code but also add proper exceptions that will stop those false alarms from harassing us.
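What such an exception looks like depends entirely on the tooling. As one hedged illustration, assuming a Python project checked with flake8 and its bundled mccabe complexity check (warning code C901), with a configured threshold that the function below would exceed, a deliberate design decision could be recorded like this. The function and its business rules are hypothetical:

```python
# The rules are deliberately kept as one flat chain so that they read top to
# bottom; splitting the function only to satisfy the complexity threshold
# would scatter them. The "noqa" marker silences that one warning here only.
def shipping_band(weight_kg, express, international):  # noqa: C901
    """Return a (hypothetical) shipping band for a parcel."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if international and express:
        return "intl-express"
    if international:
        return "intl-standard"
    if express and weight_kg > 20:
        return "freight-express"
    if express:
        return "express"
    if weight_kg > 20:
        return "freight"
    return "standard"
```

Scoping the exception to a single line and a single warning code, with the reasoning written next to it, keeps the rest of the codebase under the metric while documenting why this spot is exempt.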
Refactoring Legacy Code
Refactoring legacy code almost always means refactoring to raise code quality and reusability. While sometimes considered fun, it is usually the most tedious chore. Sometimes we can treat a chunk of code as legacy even if it was just added to the codebase. This applies, for example, to low-quality code that was not refactored in an ongoing fashion and piled up in a pull request that nobody wants to approve. This is the scenario where we may lose the most time compared to refactoring on the spot. Something that could be written in n hours with refactoring incorporated all along the way may take much longer when refactored later in the process. Still, it needs to be done if compromising code quality is the only alternative.
Refactoring genuine legacy code is a different story. If we take over an existing codebase of questionable quality with outdated libraries, the refactoring definitely needs to be managed and usually requires a lot of planning. Refactoring everything in one run or rewriting the whole codebase is rarely an option. Instead, an iterative approach, coupled with a focus on the most problematic areas and a balance of refactoring costs against benefits, should guide us in the process. There is usually no quick and simple way here.
Sooner or later we will be forced to provide a hotfix, where delivering a solution apparently has greater value than refactoring. In many cases, hotfixes are small changes that will not need any refactoring after all, but sometimes the situation is different.
It might be a good idea to have some protocol for such circumstances, e.g. opening a Pull Request geared towards refactoring right after integrating the hotfix with the codebase. People tend to spot long-running Pull Requests more easily than low-priority chores in the project management software. Such Pull Requests, tagged properly, might be mentioned during daily stand-ups until the necessary refactoring is applied and merged. This is just one example of a solution, and each team might address this problem in a different way.
Refactoring, when applied on time, can help keep our code healthy and easy to work with. It will also benefit the team velocity, which will not degrade due to the accumulation of technical debt. On the one hand, optimizing for speed or reusability, improving our metrics or just handling old legacy code might all be managed separately and intentionally postponed. On the other hand, apart from some well-justified exceptions, refactoring aimed at keeping code quality high should be applied as part of a regular, ongoing development process and not treated as some special, optional step that needs to be handled after delivering the feature. After all: “Preventing technical debt is what allows development to be agile in the long run”. Dan Radigan, Senior Agile Evangelist, Atlassian
Błażej has been the architect and leader of the development of most SCM applications we have provided. In doing so, he has found lots of opportunity to apply his expertise in application integration and leveraging.