Routing is not a fallback
Date: April 16, 2026
By: Randy Aries Saputra
Most systems treat routing as a safety mechanism:
try provider A
if it fails ➜ use provider B
This model breaks under production conditions, because failure is not an exception. It is part of the system.
A generation request is not a function call
A typical assumption:
input ➜ provider ➜ output
What actually happens:
- request is authenticated and rate-limited
- input is validated against model-specific schemas
- provider order is resolved (user-defined or model default)
- providers are filtered by capability
- execution order is adjusted based on provider health
- credits are reserved before execution
- state is persisted before any provider call
- provider is executed (async, polling, webhook, or hybrid)
- output is reconciled across storage and database
- billing is finalized
- webhook is delivered
This is not a call. It is a stateful execution system.
Routing is decided before execution
In the system, routing is not reactive. It is computed before the first provider runs.
const baseSequence = providerOrder
  .split(',')
  .filter(p => p in providerLookup)
  .map(p => providerLookup[p]);
const { reordered: sequence } =
  await reorderByHealth(baseSequence);

Two important properties:
- Provider order is model-aware
- Execution order is health-aware
This means:
the "first provider" is not fixed
Routing is already dynamic before execution begins.
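A minimal sketch of what health-aware reordering could look like. The article does not show the body of reorderByHealth, so everything below is hypothetical: it assumes each provider object carries a health field ('healthy', 'degraded', or 'down'), and it is named reorderByHealthSketch and kept synchronous to avoid suggesting it is the real function. Healthier providers move forward, and the configured order acts as the tie-breaker.

```javascript
// Hypothetical health ranking: lower rank runs earlier.
const HEALTH_RANK = { healthy: 0, degraded: 1, down: 2 };

// Unknown health values are treated as 'degraded' (an assumption).
const rankOf = provider => HEALTH_RANK[provider.health] ?? 1;

// Returns a new array; the input is not mutated. Providers with equal
// health keep their configured relative order.
function reorderByHealthSketch(sequence) {
  const reordered = sequence
    .map((provider, index) => ({ provider, index }))
    .sort((a, b) => {
      const byHealth = rankOf(a.provider) - rankOf(b.provider);
      return byHealth !== 0 ? byHealth : a.index - b.index;
    })
    .map(entry => entry.provider);
  return { reordered };
}

const example = reorderByHealthSketch([
  { name: 'a', health: 'degraded' },
  { name: 'b', health: 'healthy' },
  { name: 'c', health: 'degraded' },
]);
console.log(example.reordered.map(p => p.name)); // [ 'b', 'a', 'c' ]
```

The configured first provider 'a' is no longer first, which is the point: the order the user wrote is an input to routing, not the routing itself.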
Execution is not linear
A naive system assumes:
A ➜ fail ➜ B ➜ fail ➜ C ➜ success
The real system does not work like this.
Execution is interleaved with external signals:
- provider responses
- webhook deliveries
- storage writes
- cancellation acknowledgements
A provider can:
- fail locally
- succeed remotely
- deliver output after timeout
So the system introduces a grace period:
await sleep(GRACE_PERIOD);
const { data: files } = await storage.list(path);
if (files.length > 0) {
  // previous provider actually succeeded
}

This is critical.
success is not determined by the API response
it is determined by state convergence
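The grace-period check can be wrapped in a small helper. This is a sketch under stated assumptions: the storage client is a hypothetical object whose list(path) resolves to { data }, matching the shape used in the snippet above, and the grace period value is illustrative because the article does not state the real one.

```javascript
// Illustrative value only; not taken from the article.
const GRACE_PERIOD_MS = 50;

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Resolves to true when output files exist after the grace period,
// meaning the provider that looked failed actually succeeded.
async function succeededLate(storage, path, gracePeriodMs = GRACE_PERIOD_MS) {
  await sleep(gracePeriodMs);
  const { data: files } = await storage.list(path);
  // Treat a missing listing the same as an empty one.
  return Array.isArray(files) && files.length > 0;
}

// In-memory stand-in for a storage client, used only for this example.
function fakeStorage(filesByPath) {
  return { list: async path => ({ data: filesByPath[path] ?? [] }) };
}

const storage = fakeStorage({ 'gen/123': ['output.png'] });
succeededLate(storage, 'gen/123').then(late => {
  console.log(late ? 'keep existing output' : 'continue to next provider');
});
```

Guarding against a missing listing is a choice made for this sketch; a real client may report errors separately, and those should not be read as "no output".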
Storage becomes the source of truth
Providers are inconsistent. They disagree on:
- status
- timing
- completion signals
So the system does not trust them. Instead, it treats:
storage + database = ground truth
Before failing over, it checks:
- storage for output files
- DB for final status
Only if both confirm failure does it continue. This prevents:
- double execution
- overwriting valid outputs
- inconsistent state transitions
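The "both must confirm failure" rule can be written as a small decision function. This is a sketch, not the system's actual code: the status strings ('succeeded', 'failed') and the return values are assumptions chosen for illustration.

```javascript
// Decide what to do with the current attempt using only the two trusted
// sources. `files` is a storage listing, `dbStatus` the persisted status.
function decideNextStep(files, dbStatus) {
  const hasOutput = Array.isArray(files) && files.length > 0;
  if (hasOutput || dbStatus === 'succeeded') {
    return 'finalize'; // a valid result exists; never run another provider
  }
  if (dbStatus === 'failed') {
    return 'failover'; // storage is empty AND the DB says failed
  }
  return 'wait'; // unresolved; failing over now could double-execute
}

console.log(decideNextStep(['out.png'], 'failed')); // finalize
console.log(decideNextStep([], 'failed'));          // failover
console.log(decideNextStep([], 'processing'));      // wait
```

The first case is the interesting one: the database says failed, but storage already holds an output, so the job is finalized rather than retried.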
Failover is part of execution, not after it
Before moving to the next provider, the system:
- attempts to cancel the current prediction
- logs the cost outcome
- marks the failover state in the DB
- waits for webhook resolution
const cancelResult = await cancelProviderPrediction(
  provider,
  predictionId,
  model
);

And:
await update({
  generation_failover_in_progress: true
});

That flag is not cosmetic. It prevents webhook handlers from:
- marking the job as failed
- refunding credits prematurely
- firing incorrect events
Failover is coordinated across:
- execution
- storage
- billing
- webhooks
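A sketch of how a webhook handler might respect the failover flag. Only generation_failover_in_progress and generation_status are names from the article; the handler, the job shape, and the last_provider_signal and credits_refunded fields are hypothetical.

```javascript
// Hypothetical handler for a provider's "failed" webhook.
// `job` stands in for a database row; a new object is returned instead of
// mutating it, so the example stays free of side effects.
function handleFailureWebhook(job) {
  if (job.generation_failover_in_progress) {
    // The orchestrator is already moving to the next provider:
    // record the signal, but do not fail the job or touch credits.
    return { ...job, last_provider_signal: 'failed' };
  }
  return {
    ...job,
    generation_status: 'failed',
    credits_refunded: true,
    last_provider_signal: 'failed',
  };
}

const duringFailover = handleFailureWebhook({
  generation_status: 'processing',
  generation_failover_in_progress: true,
});
console.log(duringFailover.generation_status); // processing
```

Without the flag, the same webhook would mark the job as failed and refund credits while the next provider is still running.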
Cost is tightly coupled with routing
Before execution:
const { reserved } = await reserveCredits(...)

After execution:
- success ➜ chargeCredits
- failure ➜ refundCredits
During execution:
- each provider attempt is logged
- outcomes are resolved: used, failed, cancelled
Routing directly affects:
- how many providers run
- how long they run
- whether they are cancelled in time
routing decisions are financial decisions
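The reserve, charge, and refund flow can be sketched with an in-memory ledger. The function names echo reserveCredits, chargeCredits, and refundCredits from the article, but their signatures, the ledger itself, and the settle helper are assumptions made for this example.

```javascript
// In-memory ledger used only for illustration.
function createLedger(balance) {
  let available = balance;
  let reserved = 0;
  return {
    reserveCredits(amount) {
      if (amount > available) return { reserved: false };
      available -= amount;
      reserved += amount;
      return { reserved: true };
    },
    chargeCredits(amount) {
      reserved -= amount; // the reservation becomes a real charge
    },
    refundCredits(amount) {
      reserved -= amount; // the reservation is released
      available += amount;
    },
    snapshot: () => ({ available, reserved }),
  };
}

// Settle one generation: charge if any attempt was 'used', refund otherwise.
function settle(ledger, amount, attempts) {
  const succeeded = attempts.some(a => a.outcome === 'used');
  if (succeeded) ledger.chargeCredits(amount);
  else ledger.refundCredits(amount);
  return succeeded;
}

const ledger = createLedger(10);
ledger.reserveCredits(4);
settle(ledger, 4, [
  { provider: 'a', outcome: 'failed' },
  { provider: 'b', outcome: 'used' },
]);
console.log(ledger.snapshot()); // { available: 6, reserved: 0 }
```

The user is charged once even though two providers ran, which is why each attempt's outcome has to be recorded separately from the final charge.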
Providers are fundamentally different systems
Each provider requires a different execution model (shown here for image generation):
- Black Forest Labs ➜ async job + polling + signed URL retrieval
- BytePlus ➜ synchronous response + manual finalization
- Cloudflare ➜ edge-based synchronous execution + direct response
- Fal ➜ queue submit + polling + webhook-backed completion
- OpenAI ➜ synchronous generation/edit response + manual finalization
- Replicate ➜ async submit + polling + webhook-backed completion
- Runway ➜ async task + polling + file download
Even basic concepts differ:
- what "completed" means
- how output is delivered
- how errors are structured
So routing is not switching endpoints. It is:
mapping multiple execution models into a single lifecycle
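One way to picture that mapping is an adapter per execution model that translates raw provider state into one shared vocabulary. The adapter names, raw payloads, and unified status names below are invented for illustration; they are not the status values of any real provider API.

```javascript
// Unified lifecycle states exposed to the rest of the system (assumed names).
const UNIFIED = ['queued', 'running', 'succeeded', 'failed', 'cancelled'];

// One adapter per execution model, each reading a different raw shape.
const adapters = {
  pollingProvider: raw =>
    ({ PENDING: 'running', READY: 'succeeded', ERROR: 'failed' })[raw.state],
  syncProvider: raw => (raw.error ? 'failed' : 'succeeded'),
  webhookProvider: raw =>
    ({ queued: 'queued', done: 'succeeded', canceled: 'cancelled' })[raw.status],
};

function normalizeStatus(providerName, raw) {
  const adapter = adapters[providerName];
  if (!adapter) throw new Error(`no adapter for ${providerName}`);
  const status = adapter(raw);
  // Unknown raw values are surfaced instead of silently guessed.
  if (!UNIFIED.includes(status)) {
    throw new Error(`unmapped status from ${providerName}`);
  }
  return status;
}

console.log(normalizeStatus('pollingProvider', { state: 'READY' })); // succeeded
console.log(normalizeStatus('webhookProvider', { status: 'done' })); // succeeded
```

Throwing on an unmapped value is deliberate in this sketch: a provider inventing a new status should stop the mapping, not be treated as success or failure by default.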
The real abstraction is execution
The system exposes:
- generation_id
- generation_status
- generation_provider_order
- generation_prediction_id
- generation_output_file
Not:
- provider APIs
- provider schemas
- provider lifecycle
The user interacts with:
a single execution state machine
Everything else is internal.
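A single execution state machine can be sketched as a transition table. The state names are assumptions; the article only names the generation_status field, not its values.

```javascript
// Allowed transitions of a hypothetical generation state machine.
const TRANSITIONS = {
  queued: ['running', 'cancelled'],
  running: ['succeeded', 'failing_over', 'failed', 'cancelled'],
  // A late success from the previous provider is still accepted here.
  failing_over: ['running', 'succeeded', 'failed'],
  succeeded: [],
  failed: [],
  cancelled: [],
};

// Returns the updated job, or the same job if the transition is not allowed.
function transition(job, next) {
  const allowed = TRANSITIONS[job.generation_status] ?? [];
  if (!allowed.includes(next)) {
    // Terminal states never change, so a late failure signal
    // cannot overwrite a success.
    return job;
  }
  return { ...job, generation_status: next };
}

let job = { generation_id: 'gen_1', generation_status: 'running' };
job = transition(job, 'succeeded');
job = transition(job, 'failed'); // ignored: 'succeeded' is terminal
console.log(job.generation_status); // succeeded
```

Every provider signal, whatever its source, has to pass through the same table before it can change what the user sees.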
Why fallback logic fails
If routing is implemented as fallback:
- providers execute without coordination
- cancellation cannot be enforced consistently
- cost leaks across retries
- valid results are overwritten by later failures
- duplicate or incorrect events are emitted
- idempotency breaks under real load
Fallback assumes providers behave predictably. They do not.
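To make the duplicate-event problem concrete, here is a sketch of at-most-once event emission keyed by generation and event type. It is illustrative only: the Set stands in for a durable store (for example a unique database key), and the function names and event type strings are hypothetical.

```javascript
// Emits each (generation, event type) pair at most once.
function createEventEmitter(deliver) {
  const delivered = new Set();
  return function emitOnce(generationId, eventType, payload) {
    const key = `${generationId}:${eventType}`;
    if (delivered.has(key)) return false; // duplicate signal, ignore
    delivered.add(key);
    deliver({ generationId, eventType, payload });
    return true;
  };
}

const sent = [];
const emitOnce = createEventEmitter(event => sent.push(event));
emitOnce('gen_1', 'generation.failed', {});
emitOnce('gen_1', 'generation.failed', {}); // second provider reports the same failure
console.log(sent.length); // 1
```

Plain fallback code has no shared key like this, so two providers reporting on the same job each fire their own event.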
What this system actually is
Not a proxy. Not a router. It is a stateful execution engine coordinating unreliable providers. Routing is one function inside that system.
Closing
Routing is not:
what happens when something fails
It is:
how execution is defined under uncertainty
If a system depends on:
- a primary provider
- a clean failure signal
- a simple retry
it will fail in production, because the real system is not:
request ➜ response
It is:
multiple inconsistent signals converging into a final state
And routing is the mechanism that makes that convergence possible.