Routing is not a fallback

Date: April 16, 2026
By: Randy Aries Saputra


Most systems treat routing as a safety mechanism:

try provider A
if it fails ➜ use provider B

This model breaks under production conditions, because failure is not an exception. It is part of the system.
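That naive model, written out (the `Provider` type and names here are illustrative, not the system's real API):

```typescript
// The naive fallback loop most systems start with.
type Provider = (input: string) => Promise<string>;

async function naiveFallback(
  providers: Provider[],
  input: string
): Promise<string> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return await provider(input); // "success" = the call returned
    } catch (err) {
      lastError = err; // "failure" = the call threw; try the next one
    }
  }
  throw lastError;
}
```

Note the two assumptions baked in: a thrown error is the only failure signal, and a returned value is the only success signal. The rest of this post is about why neither holds.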

A generation request is not a function call

A typical assumption:

input ➜ provider ➜ output

What actually happens:

  • request is authenticated and rate-limited
  • input is validated against model-specific schemas
  • provider order is resolved (user-defined or model default)
  • providers are filtered by capability
  • execution order is reordered based on health
  • credits are reserved before execution
  • state is persisted before any provider call
  • provider is executed (async, polling, webhook, or hybrid)
  • output is reconciled across storage and database
  • billing is finalized
  • webhook is delivered

This is not a call. It is a stateful execution system.
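One way to make that explicit is to model each request as a record moving through named phases. The phase names below mirror the steps above but are a sketch, not the system's actual schema:

```typescript
// Hypothetical lifecycle phases for one generation request.
// The ordering encodes the invariant that credits are reserved and
// state is persisted before any provider is called.
const PHASES = [
  'authenticated',
  'validated',
  'routed',            // provider order resolved, filtered, health-reordered
  'credits_reserved',
  'persisted',
  'executing',
  'reconciled',        // output reconciled across storage and database
  'billed',
  'webhook_delivered',
] as const;

type Phase = (typeof PHASES)[number];

// A phase transition is only legal if it moves one step forward.
function canAdvance(from: Phase, to: Phase): boolean {
  return PHASES.indexOf(to) === PHASES.indexOf(from) + 1;
}
```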

Routing is decided before execution

In this system, routing is not reactive. It is computed before the first provider runs.

TypeScript
// providerOrder: comma-separated provider names,
// user-defined or taken from the model default
const baseSequence = providerOrder
  .split(',')
  .filter(p => p in providerLookup)  // drop unknown providers
  .map(p => providerLookup[p]);

// reorder by live provider health before anything executes
const { reordered: sequence } =
  await reorderByHealth(baseSequence);

Two important properties:

  1. Provider order is model-aware
  2. Execution order is health-aware

This means:

the "first provider" is not fixed

Routing is already dynamic before execution begins.
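`reorderByHealth` is internal to the system; a minimal sketch of what health-aware reordering could look like (the health score and threshold are illustrative assumptions):

```typescript
interface ProviderEntry {
  name: string;
  // Hypothetical health score in [0, 1], e.g. a recent success rate.
  health: number;
}

// Stable partition: healthy providers keep their user/model-defined
// order; degraded providers are pushed to the back, also in order.
async function reorderByHealth(
  sequence: ProviderEntry[],
  threshold = 0.5
): Promise<{ reordered: ProviderEntry[] }> {
  const healthy = sequence.filter((p) => p.health >= threshold);
  const degraded = sequence.filter((p) => p.health < threshold);
  return { reordered: [...healthy, ...degraded] };
}
```

The key property: base order (model-aware) and execution order (health-aware) are computed separately, so the "first provider" is a runtime decision.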

Execution is not linear

A naive system assumes:

A ➜ fail ➜ B ➜ fail ➜ C ➜ success

The real system does not work like this.

Execution is interleaved with external signals:

  • provider responses
  • webhook deliveries
  • storage writes
  • cancellation acknowledgements

A provider can:

  • fail locally
  • succeed remotely
  • deliver output after timeout

So the system introduces a grace period:

TypeScript
// give the "failed" provider time to deliver output out-of-band
await sleep(GRACE_PERIOD);

const { data: files } = await storage.list(path);

if (files.length > 0) {
  // previous provider actually succeeded
}

This is critical.

success is not determined by the API response
it is determined by state convergence

Storage becomes the source of truth

Providers are inconsistent. They disagree on:

  • status
  • timing
  • completion signals

So the system does not trust them. Instead, it treats:

storage + database = ground truth

Before failing over, it checks:

  • storage for output files
  • DB for final status

Only if both confirm failure does it continue. This prevents:

  • double execution
  • overwriting valid outputs
  • inconsistent state transitions
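The "both must confirm" check can be sketched as a single guard, where either source of truth can veto failover (interfaces and status strings here are illustrative):

```typescript
// Hypothetical interfaces for the two sources of truth.
interface Storage {
  list(path: string): Promise<{ data: string[] }>;
}
interface Db {
  getStatus(id: string): Promise<string>;
}

// Failover is allowed only when BOTH storage and the database
// agree that nothing succeeded.
async function confirmedFailed(
  storage: Storage,
  db: Db,
  outputPath: string,
  generationId: string
): Promise<boolean> {
  const { data: files } = await storage.list(outputPath);
  if (files.length > 0) return false; // output exists: not a failure

  const status = await db.getStatus(generationId);
  if (status === 'completed') return false; // DB says done: not a failure

  return true; // both confirm failure; safe to try the next provider
}
```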

Failover is part of execution, not after it

Before moving to the next provider, the system does:

  • attempt cancellation of current prediction
  • log cost outcome
  • mark failover state in DB
  • wait for webhook resolution

TypeScript
const cancelResult = await cancelProviderPrediction(
  provider,
  predictionId,
  model
);

And:

TypeScript
await update({
  generation_failover_in_progress: true
});

That flag is not cosmetic. It prevents webhook handlers from:

  • marking the job as failed
  • refunding credits prematurely
  • firing incorrect events
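A webhook handler that respects the flag might look like this (the handler itself is a sketch; the field names come from the state described in this post):

```typescript
interface GenerationRow {
  generation_failover_in_progress: boolean;
  generation_status: string;
}

// While failover is in progress, a provider's "failed" webhook must
// not finalize the job: the next provider may still succeed.
function handleProviderFailureWebhook(row: GenerationRow): GenerationRow {
  if (row.generation_failover_in_progress) {
    // Ignore: the failover logic owns the state transition right now.
    return row;
  }
  return { ...row, generation_status: 'failed' };
}
```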

Failover is coordinated across:

  • execution
  • storage
  • billing
  • webhooks

Cost is tightly coupled with routing

Before execution:

TypeScript
const { reserved } = await reserveCredits(...);

After execution:

  • success ➜ chargeCredits
  • failure ➜ refundCredits

During execution:

  • each provider attempt is logged
  • outcomes are resolved:
    • used
    • failed
    • cancelled

Routing directly affects:

  • how many providers run
  • how long they run
  • whether they are cancelled in time

routing decisions are financial decisions
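That coupling can be sketched as a reserve/settle pair around execution (the ledger shape and function bodies are illustrative, not the real billing code):

```typescript
interface Ledger {
  reserved: number;
  charged: number;
  refunded: number;
}

// Reserve before execution; settle exactly once after.
function reserve(ledger: Ledger, amount: number): Ledger {
  return { ...ledger, reserved: ledger.reserved + amount };
}

function settle(ledger: Ledger, amount: number, succeeded: boolean): Ledger {
  const next = { ...ledger, reserved: ledger.reserved - amount };
  return succeeded
    ? { ...next, charged: next.charged + amount }    // success ➜ charge
    : { ...next, refunded: next.refunded + amount }; // failure ➜ refund
}
```

The invariant worth testing: every reservation is settled exactly once, no matter how many providers ran or failed over in between.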

Providers are fundamentally different systems

Each provider requires a different execution model (image generation, for example):

  • Black Forest Labs ➜ async job + polling + signed URL retrieval
  • BytePlus ➜ synchronous response + manual finalization
  • Cloudflare ➜ edge-based synchronous execution + direct response
  • Fal ➜ queue submit + polling + webhook-backed completion
  • OpenAI ➜ synchronous generation/edit response + manual finalization
  • Replicate ➜ async submit + polling + webhook-backed completion
  • Runway ➜ async task + polling + file download

Even basic concepts differ:

  • what "completed" means
  • how output is delivered
  • how errors are structured

So routing is not switching endpoints. It is:

mapping multiple execution models into a single lifecycle
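Concretely, that mapping usually means forcing every provider behind one adapter contract (the shape below is a sketch, not the system's real types):

```typescript
// Each provider implements the same lifecycle contract, however
// different its native model (sync, polling, webhook, hybrid) is.
interface ProviderAdapter {
  name: string;
  // Start execution; returns the provider-native prediction/task id.
  submit(input: Record<string, unknown>): Promise<string>;
  // Resolve the provider-native notion of "completed" into one status.
  status(predictionId: string): Promise<'running' | 'completed' | 'failed'>;
  // Normalize output delivery (signed URL, inline bytes, file download).
  fetchOutput(predictionId: string): Promise<Uint8Array>;
  // Best-effort cancellation; not all providers support it.
  cancel?(predictionId: string): Promise<void>;
}

// A trivial synchronous provider wrapped behind the async contract.
const syncProvider: ProviderAdapter = {
  name: 'sync-example',
  submit: async () => 'job-1',
  status: async () => 'completed',
  fetchOutput: async () => new Uint8Array([1]),
};
```

A synchronous provider pretends to be async; a webhook-driven provider resolves `status` from persisted state. The lifecycle above never has to know the difference.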

The real abstraction is execution

The system exposes:

  • generation_id
  • generation_status
  • generation_provider_order
  • generation_prediction_id
  • generation_output_file

Not:

  • provider APIs
  • provider schemas
  • provider lifecycle

The user interacts with:

a single execution state machine

Everything else is internal.
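A sketch of that single state machine, built on the `generation_status` field above (the exact status values and transition set are illustrative):

```typescript
type GenerationStatus =
  | 'pending'
  | 'processing'
  | 'completed'
  | 'failed'
  | 'cancelled';

// The only transitions the exposed state machine permits.
// Everything provider-specific happens inside 'processing'.
const TRANSITIONS: Record<GenerationStatus, GenerationStatus[]> = {
  pending: ['processing', 'cancelled'],
  processing: ['completed', 'failed', 'cancelled'],
  completed: [],
  failed: [],
  cancelled: [],
};

function transition(
  from: GenerationStatus,
  to: GenerationStatus
): GenerationStatus {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`illegal transition ${from} -> ${to}`);
  }
  return to;
}
```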

Why fallback logic fails

If routing is implemented as fallback:

  • providers execute without coordination
  • cancellation cannot be enforced consistently
  • cost leaks across retries
  • valid results are overwritten by later failures
  • duplicate or incorrect events are emitted
  • idempotency breaks under real load

Fallback assumes providers behave predictably. They do not.

What this system actually is

Not a proxy. Not a router. It is a stateful execution engine coordinating unreliable providers. Routing is one function inside that system.

Closing

Routing is not:

what happens when something fails

It is:

how execution is defined under uncertainty

If a system depends on:

  • a primary provider
  • a clean failure signal
  • a simple retry

it will fail in production. Because the real system is not:

request ➜ response

It is:

multiple inconsistent signals converging into a final state

And routing is the mechanism that makes that convergence possible.