
The Rewrite Decision

Whether you rewrite or refactor is a bet you're asking a room to fund — and most of us pitch it like it's a matter of taste.

Serhii Malyshev · 8 min read · Jul 3, 2026

#SoftwareEngineering #SoftwareArchitecture #Refactoring #TechnicalDebt #EngineeringLeadership

You can swap one brick — or gamble the whole wall.

We open the legacy repo, read four files, and decide it's unsalvageable. We estimate the rewrite at three months and, if we're honest about the ones we've lived through, ship it in eleven. We promise ourselves the new version will be clean — then spend the first quarter reimplementing bug fixes nobody documented, because we didn't know they were load-bearing until production reminded us.

Here's the part we skip: by the time two engineers are arguing rewrite-versus-refactor in a thread, the decision has already stopped being about the code. It's about risk, money, and whether the room believes you can land the thing. Ugly-versus-clean is the argument we have. Bet-we-can-afford is the argument that actually gets funded.

The code is rarely the hard part. The pitch is.

The Ugliness Trap

The most common reason we reach for a rewrite is that the code is ugly. And the most reliable thing about that judgment is that it's wrong more often than we admit.

Reading code is harder than writing it. That asymmetry is the whole trap. When you write code, you have the model in your head and the syntax is just transcription. When you read someone else's — or your own from two years ago — you're reverse-engineering a model that was never written down. Spolsky made this point a quarter century ago and it hasn't aged a day: programmers think old code is a mess mostly because it's hard to read, not because it's actually bad.

The crufty branches are the tell. That weird if-statement nobody can explain, the retry with the oddly specific backoff, the special case for one customer: those aren't sloppiness. They're scar tissue. Each one is a bug someone hit in production and fixed under pressure. A rewrite throws all of it away and quietly signs you up to rediscover every one.

So before "rewrite or refactor" is even a real question, you have to answer a smaller one honestly: is this code bad, or is it just unread?

Refactor Is Not "Leave It Alone"

The other half of the confusion is that most people don't know what refactoring actually is.

Refactoring is a change to the internal structure of software that makes it easier to understand and cheaper to change — without altering its observable behavior. That last clause is the whole discipline. Same inputs, same outputs, same side effects. You're rearranging the room, not moving the house. Do it in small steps and the system stays working the entire time.

"Let's clean it up" is not that. Neither is "let's rewrite this module while we're in here." Both change behavior, usually by accident, usually at 4 p.m. on a Friday.

And then there's the objection that ends most refactor conversations before they start: we can't touch it, there are no tests. This is where I used to give up too. It turns out to be exactly backwards. Legacy code is just code without tests — which means it's un-refactored, not un-refactorable. You find a seam, a place where you can change behavior without editing the code in that place, and you pin the current behavior down with characterization tests: tests that assert what the code actually does today, bugs included, so you'll notice the instant you change it. Now the scary code has a safety net. Now you can move.

No tests isn't a reason to rewrite. It's the first thing a rewrite lets you avoid learning how to do.
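
To make the characterization-test idea concrete, here is a minimal Python sketch. The function legacy_shipping_fee and its quirks are hypothetical, invented for this example; the point is only the shape of the test: record what the code returns today, odd cases included, and assert exactly that.

```python
import unittest


def legacy_shipping_fee(order_total, country):
    """Hypothetical stand-in for untested legacy code, quirks included."""
    if country == "US":
        # Unexplained special case: an order of exactly 100 ships free.
        if order_total == 100:
            return 0.0
        return 5.0 if order_total < 50 else 2.5
    return 12.0


class ShippingFeeCharacterization(unittest.TestCase):
    """Pins what the code does today, not what a spec says it should do."""

    def test_pins_current_behavior(self):
        # Expected values were recorded by running the code,
        # not derived from requirements.
        observed = {
            (20, "US"): 5.0,
            (50, "US"): 2.5,
            (100, "US"): 0.0,  # looks like a bug; pinned anyway
            (100, "DE"): 12.0,
        }
        for (total, country), fee in observed.items():
            with self.subTest(total=total, country=country):
                self.assertEqual(legacy_shipping_fee(total, country), fee)
```

Run it with python -m unittest. If a later refactoring step changes any pinned value, the test fails, and you get to decide deliberately whether that change was intended.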

The Rewrite Tax

Every rewrite bills you for things nobody puts on the slide.

You pay to relearn the domain: all those edge cases the first team bled for, now waiting to be rediscovered one incident at a time. You pay the feature-parity tax, where "done" doesn't mean better; it means "finally does what the old thing already did." And you pay in market position: while you rebuild what you had, whoever you're competing with keeps shipping new things.

Early in my career I thought a rewrite was the bold, senior move — the one that showed you weren't afraid of hard problems. Now I read the same instinct as the expensive way to relearn what the old system already knew.

The bold move is usually the boring one.

And the tax has a nasty compounding clause. A rewrite starts as reduced output today in exchange for capacity tomorrow — a reasonable trade, until "tomorrow" slips and the old system is still in production because you can't turn it off yet. Now you're running two systems, paying two maintenance bills, and holding two mental models in every engineer's head.

That's not a transition. It's a tax with no end date.

None of this means never rewrite. It means the rewrite carries a bill you have to price before you walk into the room — because the room will find it whether you priced it or not.

The Strangler Default

Rewrite-versus-refactor is a false binary. The useful option lives between the two poles, and it should be your default.

You don't replace the system. You surround it. You put a façade in front of the old code, build the replacement piece next to it, and move behavior across one slice at a time — routing each request to old or new until the old paths are dead. Fowler named it the strangler fig, after the plant that grows around a host tree and gradually takes its place. The point isn't the botany. The point is that risk is bounded to the one slice you're currently moving, and every slice ships value on its own.

Make it concrete. Say the login path is the mess everyone wants to burn down. The strangler version: stand up a new auth service, put a proxy in front, and route read checks to the new service first while writes still go through the old path. Flip one endpoint. Watch it for a week. The routing flag stays reversible, so a bad slice rolls back in one deploy instead of one war room. Then flip the next endpoint.

You're rewriting the module — never the program.
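
The routing described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: StranglerFacade, legacy_auth, new_auth, and the endpoint names are hypothetical, and in a real system the façade would usually be a proxy or gateway with the flag set living in configuration rather than in memory.

```python
from dataclasses import dataclass, field
from typing import Callable, Set

Handler = Callable[[dict], dict]


def legacy_auth(request: dict) -> dict:
    """Hypothetical stand-in for the old login path."""
    return {"handled_by": "legacy", "endpoint": request["endpoint"]}


def new_auth(request: dict) -> dict:
    """Hypothetical stand-in for the replacement auth service."""
    return {"handled_by": "new", "endpoint": request["endpoint"]}


@dataclass
class StranglerFacade:
    """Routes each request to old or new code, one endpoint at a time."""

    old: Handler
    new: Handler
    migrated: Set[str] = field(default_factory=set)

    def flip(self, endpoint: str) -> None:
        # Send this endpoint's traffic to the new implementation.
        self.migrated.add(endpoint)

    def roll_back(self, endpoint: str) -> None:
        # Reversible by design: removing the flag restores the old path.
        self.migrated.discard(endpoint)

    def handle(self, request: dict) -> dict:
        use_new = request["endpoint"] in self.migrated
        target = self.new if use_new else self.old
        return target(request)
```

Calling flip("check_session") moves only that endpoint to the new service; every other endpoint still reaches the old code, and roll_back("check_session") undoes the move. That is the bounded blast radius in miniature: the set of migrated endpoints is the only thing that changes per slice.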

The catch is real, so name it: incremental replacement is slower, less satisfying, and it means running two things at once for a while. Treat it like a product, not a chore — derisk the approach, build the tooling to move the easy ninety percent, then actually finish. Strangler migrations are famous for stalling at ninety percent because nobody set a hard date to kill the old system.

Half a migration is worse than none. It's just the two-system tax with better intentions.

When Rewrite Actually Wins

"Never rewrite" is as lazy as "just rewrite it." So here's the honest short list — the cases where a full rewrite is the right call. Notice the burden of proof runs one direction: incremental is the default, and a rewrite has to argue its way up to one of these. If your reason isn't on the list, you don't have a reason yet.

The runtime is actually dead — the language version is unsupported, the framework's abandoned, the vendor's gone. Incremental has a floor, and you've hit it.

You're not rewriting, you're reinventing — building a genuinely different product, not a cleaner version of the same one. The anti-rewrite argument is about throwing away working software to rebuild the same thing; it doesn't apply when the destination is somewhere new.

The slice is small, well understood, and well tested — a four-hundred-line service you own completely, where the rewrite is genuinely the cheap option. Don't let the anti-rewrite reflex turn a two-day job into a two-month migration out of principle.

Notice what these have in common. None of them is "the code is ugly." They're all statements about scope and knowledge, not taste. That's the tell that you're making the decision on the right axis.

Making the Case Without Losing the Room

You can be completely right about the technical call and still lose it. Because the rewrite isn't decided in the codebase — it's decided in the room, and the room doesn't buy purity. It buys bets it can afford.

So pitch it as one. Four moves.

Name the bet. Not "the auth code is a mess." A number and an outcome: "Six weeks to move login behind a new service, so we stop paying two days a sprint fighting it." Budget owners fund outcomes, not adjectives.

Bound the blast radius. Say exactly how much breaks if you're wrong. "Scoped to the login path, behind a flag — everything else is untouched" is a fundable sentence. "We'll rewrite the platform" is a request for blind faith, and skeptical rooms are skeptical because they've extended that faith before.

Buy an exit. Reversibility is the cheapest credibility you can bring. "If a slice misbehaves, we flip the flag back in one deploy" tells the room the downside is capped. A rewrite with no rollback is asking them to bet the quarter on your estimate — and they've seen your estimates.

Set a kill criterion. Name, in advance, what would make you stop. "If error rate on the new path isn't down by week three, we roll back and rethink." Nothing earns trust like showing up already willing to be wrong on a schedule.

That's the same script whether you're pitching a rewrite or talking someone out of one. The engineer who says "here's my bet, here's the blast radius, here's the exit, here's when I'll admit I'm wrong" wins the room. The one who says "trust me, it'll be clean" loses it — and deserves to.

Rewrite or refactor was never really the question. The question is whether you can turn a hunch about ugly code into a bet the room can afford to lose — and if you can't, the answer is refactor.
