When to rescue and when to rebuild: a decision framework

Denis Sheremetov

CTO · October 4, 2022


Strategy

About half of our healthcare engagements start as rescues. A vendor went silent. A team shipped a non-compliant MVP and the audit is in 12 weeks. An EHR migration stalled and the executive sponsor is asking when, exactly, this will be done.

Founders ask us the same question every time: do we fix what's here, or burn it down and start over?

Almost always, the answer is neither, at least not the way the question is framed. After running 30+ healthcare rescues, we've found that the decision follows a repeatable structure. Here's the framework we use.

The 48-hour rule: don't decide before then

The first instinct on a rescue is to make a decision fast. Don't. We've watched teams sign multi-month rebuild contracts on day three of an engagement, only to discover on day fifteen that 80% of the existing code was actually fine — and the real problem was a single bottleneck nobody had isolated yet.

Our rule: no rescue-or-rebuild decision before 48 hours of audit time, regardless of how obvious it looks. The first 48 hours buy you the chance to see what's actually broken vs. what's loud.

The three questions that drive the decision

Every rescue audit answers three questions in this order. They map to three different remediation paths.

Is the architecture salvageable? — If the data model and trust boundaries are wrong, no amount of code cleanup fixes it. This is the only true rebuild trigger we've ever seen.
Is the team that built it still in place? — If yes, rescue is twice as fast (they have the context). If no, factor in 3-4 weeks of code archaeology to get to the same place.
Is the deadline real? — A 'real' deadline (regulator, contract, funding round) means scope must compress to what's achievable. A 'soft' deadline (internal target) means scope can stay broader.
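
Read in order, the three questions amount to a small decision procedure. The sketch below is one illustrative way to write it down in Python, not a definitive implementation: the names `AuditAnswers`, `Plan`, and `plan_remediation` are hypothetical, the 3.5-week figure is simply an assumed midpoint of the 3-4 week range, and applying the archaeology cost to both paths is an assumption as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditAnswers:
    """Answers to the three audit questions (hypothetical field names)."""
    architecture_salvageable: bool  # data model and trust boundaries are sound
    original_team_in_place: bool    # the people who built it are still available
    deadline_is_real: bool          # regulator, contract, or funding round


@dataclass(frozen=True)
class Plan:
    path: str                 # "rescue" or "rebuild"
    scope: str                # "compressed" or "broad"
    archaeology_weeks: float  # extra ramp-up time when context is missing


def plan_remediation(answers: AuditAnswers) -> Plan:
    """Apply the three questions in the order the audit asks them."""
    # Question 1: a wrong architecture is the only true rebuild trigger.
    path = "rescue" if answers.architecture_salvageable else "rebuild"
    # Question 2: without the original team, budget for code archaeology.
    # 3.5 is an assumed midpoint of the 3-4 week range.
    archaeology_weeks = 0.0 if answers.original_team_in_place else 3.5
    # Question 3: a real deadline compresses scope; a soft one does not.
    scope = "compressed" if answers.deadline_is_real else "broad"
    return Plan(path, scope, archaeology_weeks)
```

For example, `plan_remediation(AuditAnswers(True, False, True))` describes a rescue with compressed scope and an archaeology budget, which is the shape most of our engagements take.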

Rescue indicators

Most projects that look unsalvageable are actually rescuable. The signals that point toward rescue:

Bug rates are high, but the bugs cluster — there's a single broken subsystem dragging everything down, not pervasive rot.
Tests exist (even partial coverage). Tests are an architectural signal: they mean the code was at some point shaped to be testable.
Auth and data model are intact. These are the two things that are genuinely hard to refactor; if they're correct, the rest is moveable.
Customer-facing UI is fine. Backend rot is fixable behind a stable UI; UI rot during an active customer relationship is much harder.
One or two of the original engineers are still reachable. Even one hour of context from an original author cuts rescue time by 30% or more.
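
The first signal, whether bugs cluster, can be checked from an issue-tracker export. A minimal sketch, assuming each bug carries a subsystem label: the function name `top_subsystem_share`, the sample data, and the 0.5 threshold are illustrative assumptions, not figures from our audits.

```python
from collections import Counter


def top_subsystem_share(bug_subsystems: list[str]) -> tuple[str, float]:
    """Return the subsystem with the most bugs and its share of all bugs."""
    if not bug_subsystems:
        raise ValueError("no bugs to analyse")
    counts = Counter(bug_subsystems)
    name, count = counts.most_common(1)[0]
    return name, count / len(bug_subsystems)


# Hypothetical export: one subsystem label per open bug.
bugs = ["auth"] * 7 + ["billing"] * 2 + ["ui"]
name, share = top_subsystem_share(bugs)
# Assumed threshold: more than half of all bugs in one place suggests
# a single broken subsystem rather than pervasive rot.
clustered = share > 0.5
```

A high share for one subsystem does not prove the rest is healthy, but it tells you where to spend the first audit hours.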

Rebuild indicators

Genuine rebuild triggers are rare, but specific. When we see them, we recommend rebuild without ambiguity:

Wrong data model at the root. If PHI is co-mingled with non-PHI, or the patient/encounter relationship is inverted, no surface fix works.
Compliance posture is structurally absent. No audit logging, no access control framework, no encryption strategy — this is rebuild territory.
Stack choice fights the requirements. A serverless app that needs long-running ML inference, or a monolith that needs HIPAA tenant isolation — patches won't get there.
Original team is fully gone AND there's no documentation. The code archaeology cost approaches rebuild cost, and you don't get a better outcome.

The hybrid path — what we usually recommend

In practice, ~80% of our rescues end up as a hybrid: keep the existing surface, replace one or two backend subsystems, harden the compliance posture, and ship. We call this the strangler-fig pattern, borrowing Martin Fowler's term: you incrementally replace parts of the system behind a stable interface, so users never see a rewrite happen.

The hybrid path lets you ship to the deadline (regulator, contract, raise) on the existing surface while replacing the problem subsystems. Customers don't see a rewrite. The team has a real roadmap. And the surface that's actually fine doesn't get re-litigated.
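
In code, the pattern usually comes down to a stable facade that routes each capability to either the legacy implementation or its replacement. The following is a minimal sketch under assumptions: the class `StranglerFacade`, the capability names, and the handler functions are all hypothetical, and a real system would typically do this routing at an API gateway or service boundary rather than in-process.

```python
from typing import Callable

Handler = Callable[[dict], dict]


class StranglerFacade:
    """Stable entry point; callers never know which backend answered."""

    def __init__(self, legacy: dict[str, Handler]) -> None:
        self._legacy = dict(legacy)
        self._replaced: dict[str, Handler] = {}

    def replace(self, capability: str, handler: Handler) -> None:
        """Cut one capability over to its rebuilt implementation."""
        if capability not in self._legacy:
            raise KeyError(f"unknown capability: {capability}")
        self._replaced[capability] = handler

    def handle(self, capability: str, request: dict) -> dict:
        # Prefer the replacement when one exists, else fall back to legacy.
        handler = self._replaced.get(capability, self._legacy[capability])
        return handler(request)


# Hypothetical handlers standing in for real subsystems.
def legacy_login(request: dict) -> dict:
    return {"ok": True, "backend": "legacy"}


def legacy_records(request: dict) -> dict:
    return {"ok": True, "backend": "legacy"}


def rebuilt_login(request: dict) -> dict:
    return {"ok": True, "backend": "rebuilt"}


facade = StranglerFacade({"login": legacy_login, "records": legacy_records})
facade.replace("login", rebuilt_login)  # auth is rebuilt; records stay as-is
```

Callers keep using `facade.handle(...)` throughout; only the routing table changes as each subsystem is replaced.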

The right question isn't 'rescue or rebuild?'. It's 'what's the smallest fix that lets us ship to the real deadline — and what's the medium-term path that gets us off the bad parts?'

One real example

A provider network came to us 4 months into a stalled portal build. The vendor had gone dark. The audit was 12 weeks out. The founder asked us in the first call: 'Do we burn this down and start over?'

We ran a five-day audit and sorted each subsystem into keep, rescue, or rebuild. Auth was broken (rebuild). The data model was correct (keep). The FHIR integration was 60% built and roughly right (rescue). The audit logging was performative rather than functional (rebuild).

We rebuilt the auth and the audit logging. We finished the FHIR integration. We left the rest. Shipped in 9 weeks. Support tickets dropped by 95%, and that drop came from the auth fix alone: the rest of the code wasn't actually broken, it was just downstream of broken auth.

What we tell teams in the first call

Don't decide rescue-or-rebuild before the audit. The instinct to decide fast is the instinct to be wrong fast.
Map the deadline to the scope, not the other way around. A real deadline shrinks what's possible — but it also shrinks what's worth arguing about.
If the auth and data model are correct, you almost certainly have a rescue, not a rebuild.
If the team is intact, rescue is twice as fast. Keep them in the room.
The hybrid path is the most common right answer — but only after you've actually looked at the code.

We've seen $400K of rebuild work scoped on a project that turned out to need $80K of targeted rescue. We've also seen $80K of rescue work scoped on a project that genuinely needed a rebuild. The difference is the audit — and being willing to not decide in the first 48 hours.
