Routing is not a fallback

Date: April 16, 2026
By: Randy Aries Saputra


Most systems treat routing as a safety mechanism:

try provider A
if it fails ➜ use provider B

This model breaks under production conditions, because failure is not an exception. It is part of the system.
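That naive model, written out (the `Provider` type and names here are illustrative, not the system's real API):

```typescript
// The naive fallback loop most systems start with.
type Provider = (input: string) => Promise<string>;

async function naiveFallback(
  providers: Provider[],
  input: string
): Promise<string> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return await provider(input); // "success" = the call returned
    } catch (err) {
      lastError = err; // "failure" = the call threw; try the next one
    }
  }
  throw lastError;
}
```

Note the two assumptions baked in: a thrown error is the only failure signal, and a returned value is the only success signal. The rest of this post is about why neither holds.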

A generation request is not a function call

A typical assumption:

input ➜ provider ➜ output

What actually happens:

  • request is authenticated and rate-limited
  • input is validated against model-specific schemas
  • provider order is resolved (user-defined or model default)
  • providers are filtered by capability
  • execution order is reordered based on health
  • credits are reserved before execution
  • state is persisted before any provider call
  • provider is executed (async, polling, webhook, or hybrid)
  • output is reconciled across storage and database
  • billing is finalized
  • webhook is delivered

This is not a call. It is a stateful execution system.
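One way to make that explicit is to model each request as a record moving through named phases. The phase names below mirror the steps above but are a sketch, not the system's actual schema:

```typescript
// Hypothetical lifecycle phases for one generation request.
// The ordering encodes the invariant that credits are reserved and
// state is persisted before any provider is called.
const PHASES = [
  'authenticated',
  'validated',
  'routed',            // provider order resolved, filtered, health-reordered
  'credits_reserved',
  'persisted',
  'executing',
  'reconciled',        // output reconciled across storage and database
  'billed',
  'webhook_delivered',
] as const;

type Phase = (typeof PHASES)[number];

// A phase transition is only legal if it moves one step forward.
function canAdvance(from: Phase, to: Phase): boolean {
  return PHASES.indexOf(to) === PHASES.indexOf(from) + 1;
}
```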

Routing is decided before execution

In this system, routing is not reactive. It is computed before the first provider runs.

TypeScript
// providerOrder: comma-separated provider names,
// user-defined or taken from the model default
const baseSequence = providerOrder
  .split(',')
  .filter(p => p in providerLookup)  // drop unknown providers
  .map(p => providerLookup[p]);

// reorder by live provider health before anything executes
const { reordered: sequence } =
  await reorderByHealth(baseSequence);

Two important properties:

  1. Provider order is model-aware
  2. Execution order is health-aware

This means:

the "first provider" is not fixed

Routing is already dynamic before execution begins.
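`reorderByHealth` is internal to the system; a minimal sketch of what health-aware reordering could look like (the health score and threshold are illustrative assumptions):

```typescript
interface ProviderEntry {
  name: string;
  // Hypothetical health score in [0, 1], e.g. a recent success rate.
  health: number;
}

// Stable partition: healthy providers keep their user/model-defined
// order; degraded providers are pushed to the back, also in order.
async function reorderByHealth(
  sequence: ProviderEntry[],
  threshold = 0.5
): Promise<{ reordered: ProviderEntry[] }> {
  const healthy = sequence.filter((p) => p.health >= threshold);
  const degraded = sequence.filter((p) => p.health < threshold);
  return { reordered: [...healthy, ...degraded] };
}
```

The key property: base order (model-aware) and execution order (health-aware) are computed separately, so the "first provider" is a runtime decision.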

Execution is not linear

A naive system assumes:

A ➜ fail ➜ B ➜ fail ➜ C ➜ success

The real system does not work like this.

Execution is interleaved with external signals:

  • provider responses
  • webhook deliveries
  • storage writes
  • cancellation acknowledgements

A provider can:

  • fail locally
  • succeed remotely
  • deliver output after timeout

So the system introduces a grace period:

TypeScript
// give the "failed" provider time to deliver output out-of-band
await sleep(GRACE_PERIOD);

const { data: files } = await storage.list(path);

if (files.length > 0) {
  // previous provider actually succeeded
}

This is critical.

success is not determined by the API response
it is determined by state convergence

Storage becomes the source of truth

Providers are inconsistent. They disagree on:

  • status
  • timing
  • completion signals

So the system does not trust them. Instead, it treats:

storage + database = ground truth

Before failing over, it checks:

  • storage for output files
  • DB for final status

Only if both confirm failure does it continue. This prevents:

  • double execution
  • overwriting valid outputs
  • inconsistent state transitions
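The "both must confirm" check can be sketched as a single guard, where either source of truth can veto failover (interfaces and status strings here are illustrative):

```typescript
// Hypothetical interfaces for the two sources of truth.
interface Storage {
  list(path: string): Promise<{ data: string[] }>;
}
interface Db {
  getStatus(id: string): Promise<string>;
}

// Failover is allowed only when BOTH storage and the database
// agree that nothing succeeded.
async function confirmedFailed(
  storage: Storage,
  db: Db,
  outputPath: string,
  generationId: string
): Promise<boolean> {
  const { data: files } = await storage.list(outputPath);
  if (files.length > 0) return false; // output exists: not a failure

  const status = await db.getStatus(generationId);
  if (status === 'completed') return false; // DB says done: not a failure

  return true; // both confirm failure; safe to try the next provider
}
```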

Failover is part of execution, not after it

Before moving to the next provider, the system does:

  • attempt cancellation of current prediction
  • log cost outcome
  • mark failover state in DB
  • wait for webhook resolution

TypeScript
const cancelResult = await cancelProviderPrediction(
  provider,
  predictionId,
  model
);

And:

TypeScript
await update({
  generation_failover_in_progress: true
});

That flag is not cosmetic. It prevents webhook handlers from:

  • marking the job as failed
  • refunding credits prematurely
  • firing incorrect events
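A webhook handler that respects the flag might look like this (the handler itself is a sketch; the field names come from the state described in this post):

```typescript
interface GenerationRow {
  generation_failover_in_progress: boolean;
  generation_status: string;
}

// While failover is in progress, a provider's "failed" webhook must
// not finalize the job: the next provider may still succeed.
function handleProviderFailureWebhook(row: GenerationRow): GenerationRow {
  if (row.generation_failover_in_progress) {
    // Ignore: the failover logic owns the state transition right now.
    return row;
  }
  return { ...row, generation_status: 'failed' };
}
```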

Failover is coordinated across:

  • execution
  • storage
  • billing
  • webhooks

Cost is tightly coupled with routing

Before execution:

TypeScript
const { reserved } = await reserveCredits(...);

After execution:

  • success ➜ chargeCredits
  • failure ➜ refundCredits

During execution:

  • each provider attempt is logged
  • outcomes are resolved:
    • used
    • failed
    • cancelled

Routing directly affects:

  • how many providers run
  • how long they run
  • whether they are cancelled in time

routing decisions are financial decisions
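That coupling can be sketched as a reserve/settle pair around execution (the ledger shape and function bodies are illustrative, not the real billing code):

```typescript
interface Ledger {
  reserved: number;
  charged: number;
  refunded: number;
}

// Reserve before execution; settle exactly once after.
function reserve(ledger: Ledger, amount: number): Ledger {
  return { ...ledger, reserved: ledger.reserved + amount };
}

function settle(ledger: Ledger, amount: number, succeeded: boolean): Ledger {
  const next = { ...ledger, reserved: ledger.reserved - amount };
  return succeeded
    ? { ...next, charged: next.charged + amount }    // success ➜ charge
    : { ...next, refunded: next.refunded + amount }; // failure ➜ refund
}
```

The invariant worth testing: every reservation is settled exactly once, no matter how many providers ran or failed over in between.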

Providers are fundamentally different systems

Each provider requires a different execution model (image generation, for example):

  • Black Forest Labs ➜ async job + polling + signed URL retrieval
  • BytePlus ➜ synchronous response + manual finalization
  • Cloudflare ➜ edge-based synchronous execution + direct response
  • Fal ➜ queue submit + polling + webhook-backed completion
  • OpenAI ➜ synchronous generation/edit response + manual finalization
  • Replicate ➜ async submit + polling + webhook-backed completion
  • Runway ➜ async task + polling + file download

Even basic concepts differ:

  • what "completed" means
  • how output is delivered
  • how errors are structured

So routing is not switching endpoints. It is:

mapping multiple execution models into a single lifecycle
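Concretely, that mapping usually means forcing every provider behind one adapter contract (the shape below is a sketch, not the system's real types):

```typescript
// Each provider implements the same lifecycle contract, however
// different its native model (sync, polling, webhook, hybrid) is.
interface ProviderAdapter {
  name: string;
  // Start execution; returns the provider-native prediction/task id.
  submit(input: Record<string, unknown>): Promise<string>;
  // Resolve the provider-native notion of "completed" into one status.
  status(predictionId: string): Promise<'running' | 'completed' | 'failed'>;
  // Normalize output delivery (signed URL, inline bytes, file download).
  fetchOutput(predictionId: string): Promise<Uint8Array>;
  // Best-effort cancellation; not all providers support it.
  cancel?(predictionId: string): Promise<void>;
}

// A trivial synchronous provider wrapped behind the async contract.
const syncProvider: ProviderAdapter = {
  name: 'sync-example',
  submit: async () => 'job-1',
  status: async () => 'completed',
  fetchOutput: async () => new Uint8Array([1]),
};
```

A synchronous provider pretends to be async; a webhook-driven provider resolves `status` from persisted state. The lifecycle above never has to know the difference.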

The real abstraction is execution

The system exposes:

  • generation_id
  • generation_status
  • generation_provider_order
  • generation_prediction_id
  • generation_output_file

Not:

  • provider APIs
  • provider schemas
  • provider lifecycle

The user interacts with:

a single execution state machine

Everything else is internal.
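A sketch of that single state machine, built on the `generation_status` field above (the exact status values and transition set are illustrative):

```typescript
type GenerationStatus =
  | 'pending'
  | 'processing'
  | 'completed'
  | 'failed'
  | 'cancelled';

// The only transitions the exposed state machine permits.
// Everything provider-specific happens inside 'processing'.
const TRANSITIONS: Record<GenerationStatus, GenerationStatus[]> = {
  pending: ['processing', 'cancelled'],
  processing: ['completed', 'failed', 'cancelled'],
  completed: [],
  failed: [],
  cancelled: [],
};

function transition(
  from: GenerationStatus,
  to: GenerationStatus
): GenerationStatus {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`illegal transition ${from} -> ${to}`);
  }
  return to;
}
```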

Why fallback logic fails

If routing is implemented as fallback:

  • providers execute without coordination
  • cancellation cannot be enforced consistently
  • cost leaks across retries
  • valid results are overwritten by later failures
  • duplicate or incorrect events are emitted
  • idempotency breaks under real load

Fallback assumes providers behave predictably. They do not.

What this system actually is

Not a proxy. Not a router. It is a stateful execution engine coordinating unreliable providers. Routing is one function inside that system.

Closing

Routing is not:

what happens when something fails

It is:

how execution is defined under uncertainty

If a system depends on:

  • a primary provider
  • a clean failure signal
  • a simple retry

it will fail in production. Because the real system is not:

request ➜ response

It is:

multiple inconsistent signals converging into a final state

And routing is the mechanism that makes that convergence possible.