How BabySea uses Cloudflare to secure and control execution at scale

Date: April 19, 2026
By: Randy Aries Saputra

BabySea is execution infrastructure for generative media. That means we don't just run requests. We control how they execute across inference providers, with routing, failover, and lifecycle management.

As workloads scale, the problem is not just routing requests across providers. It's ensuring invalid, abusive, or malformed requests never reach execution, where every mistake has a real cost.

We use Cloudflare as the edge control layer to enforce correctness, security, and observability before requests reach our system.

This post breaks down how that works in production.

Architecture: edge as the first execution boundary

BabySea runs across three regions:

api.us.babysea.ai
api.eu.babysea.ai
api.jp.babysea.ai

Each region exposes the same API surface, backed by different infrastructure.

Cloudflare sits in front of all entry points and acts as a:

pre-execution control plane

Before a request reaches our application, it must pass:

request validation ➜ protocol enforcement
WAF ➜ execution guardrail
rate limiting ➜ pre-execution control
managed rules ➜ threat protection
API Shield ➜ contract enforcement
session intelligence ➜ beyond IP

These controls operate as independent layers, each enforcing a different constraint before execution begins.

What we are preventing is not theoretical:

malformed requests reaching execution and wasting compute
repeated invalid calls exhausting system capacity
leaked API keys being used across distributed clients
schema drift between providers causing runtime failures

Each layer exists to eliminate a specific failure mode before it reaches the control plane.

1. Request validation as protocol enforcement

Before applying any higher-level rules, we enforce basic protocol correctness:

block non-HTTPS API calls
block unsupported HTTP methods
enforce JSON-only POST bodies

This ensures:

only structurally valid requests are processed further

Without this layer, invalid requests would propagate deeper into the system and fail after resources have already been allocated.
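
To make the three checks above concrete, here is a minimal sketch of the same logic as a plain function. It is an illustration only: in production these checks run as Cloudflare rules, and the `EdgeRequest` shape, the `ALLOWED_METHODS` list, and the reason strings are assumptions made for this example.

```typescript
// Hypothetical request shape for illustration; not a Cloudflare or BabySea type.
interface EdgeRequest {
  scheme: 'http' | 'https';
  method: string;
  contentType?: string;
}

// Assumed allow-list; the real set of supported methods is not stated here.
const ALLOWED_METHODS = new Set(['GET', 'POST', 'DELETE']);

type ProtocolVerdict = { ok: true } | { ok: false; reason: string };

function checkProtocol(req: EdgeRequest): ProtocolVerdict {
  // 1. Block non-HTTPS API calls.
  if (req.scheme !== 'https') return { ok: false, reason: 'non_https' };

  // 2. Block unsupported HTTP methods.
  const method = req.method.toUpperCase();
  if (!ALLOWED_METHODS.has(method)) {
    return { ok: false, reason: 'unsupported_method' };
  }

  // 3. Enforce JSON-only POST bodies (parameters such as charset are ignored).
  if (method === 'POST') {
    const mediaType = ((req.contentType ?? '').split(';')[0] ?? '')
      .trim()
      .toLowerCase();
    if (mediaType !== 'application/json') {
      return { ok: false, reason: 'non_json_body' };
    }
  }

  return { ok: true };
}
```

Each check returns as soon as it fails, which mirrors the intent of this layer: reject cheaply before anything downstream is allocated.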

2. WAF as execution guardrail

We don't treat WAF as a security add-on. We treat it as:

execution constraint enforcement

We explicitly reject malformed or suspicious patterns:

path traversal attempts
injection patterns
malformed headers

This ensures:

only valid execution paths reach the system

Without this layer, malformed requests would propagate into execution, where failures become expensive instead of cheap.

Abuse and scanning protection

We aggressively block automated scanning:

traffic from known scanning tools (sqlmap, nuclei, nmap, ffuf, burpsuite) is blocked
high-threat browser traffic is challenged

This removes:

background noise before it becomes load

Edge hardening

We added rules for:

oversized Authorization headers
duplicate Transfer-Encoding headers (request smuggling)

Even if upstream systems can handle these, we block them at the edge:

fail fast, before origin cost is incurred
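
As a rough sketch of what the two hardening rules above check, the function below flags an oversized Authorization header and repeated Transfer-Encoding headers. The `MAX_AUTHORIZATION_LENGTH` threshold, the `HeaderList` type, and the violation names are assumptions for illustration. The actual rules are expressed in Cloudflare, not in application code.

```typescript
// Assumed threshold in characters; the real limit is not stated in this post.
const MAX_AUTHORIZATION_LENGTH = 4096;

// Headers as an ordered list of [name, value] pairs so duplicates are preserved.
type HeaderList = Array<[string, string]>;

function hardeningViolations(headers: HeaderList): string[] {
  const violations: string[] = [];
  let transferEncodingCount = 0;

  for (const [name, value] of headers) {
    const lower = name.toLowerCase(); // header names are case-insensitive
    if (lower === 'authorization' && value.length > MAX_AUTHORIZATION_LENGTH) {
      violations.push('oversized_authorization');
    }
    if (lower === 'transfer-encoding') transferEncodingCount += 1;
  }

  // More than one Transfer-Encoding header is a classic request smuggling signal.
  if (transferEncodingCount > 1) violations.push('duplicate_transfer_encoding');
  return violations;
}
```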

3. Rate limiting as pre-execution control

Rate limiting is not just protection. It shapes system behavior.

We apply multiple layers:

Global API limit

x requests per minute per IP (pre-auth)

Auth protection

x failed auth attempts ➜ temporary block

Endpoint-specific limits

playground ➜ burst control
webhook ingestion ➜ flood protection
cron endpoints ➜ strict limits

These rules ensure:

untrusted clients cannot dominate execution capacity

Without rate control at the edge, invalid or abusive traffic would compete directly with legitimate execution workloads.
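
For readers who want to see the mechanics, below is a minimal fixed-window limiter sketch. It is not how Cloudflare implements rate limiting. The class name, the choice of a fixed window, and the numeric limits in the usage note are all assumptions for illustration. The key can be an IP address for the pre-auth limit or an endpoint-specific key for the per-endpoint limits.

```typescript
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  // limit is assumed to be at least 1.
  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if the key is over its limit.
  allow(key: string, nowMs: number): boolean {
    const current = this.windows.get(key);

    // No window yet, or the previous window has expired: start a new one.
    if (!current || nowMs - current.start >= this.windowMs) {
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }

    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

With `new FixedWindowLimiter(2, 60_000)`, the first two calls for a key inside one minute are allowed and the third is rejected, while other keys are unaffected.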

4. Managed rules as threat protection

We leverage Cloudflare managed protections:

OWASP rule sets
Cloudflare threat intelligence
automated bot detection

This layer handles:

known attack patterns and global threat signals

It allows us to absorb common threats without pushing that complexity into the application layer.

5. API Shield: enforcing contract at the edge

The most important layer is API Shield.

We define our API surface using OpenAPI:

15 endpoints per region
deployed across 3 regions
45 operations total

Cloudflare validates incoming requests against this schema.

What this gives us

invalid parameters ➜ detected immediately
unknown endpoints ➜ logged as anomalies
malformed requests ➜ visible before the application layer

Without schema validation at the edge, invalid requests would only fail inside the application layer, after resources have already been consumed.
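
The sketch below shows the idea of contract enforcement in miniature: match the method and path against a table of known operations, then check required body fields. It is a simplification under stated assumptions. The two entries in `OPERATIONS` and their paths are hypothetical, and real schema validation is driven by the full OpenAPI document rather than a hand-written table.

```typescript
// Hypothetical operation table; a real deployment derives this from OpenAPI.
interface Operation {
  method: string;
  path: string; // path template, e.g. "/v1/generations/{id}"
  requiredBodyFields: string[];
}

const OPERATIONS: Operation[] = [
  { method: 'POST', path: '/v1/generations', requiredBodyFields: ['generation_prompt'] },
  { method: 'GET', path: '/v1/generations/{id}', requiredBodyFields: [] },
];

// Turn a path template into an anchored regular expression.
function templateToRegExp(template: string): RegExp {
  const pattern = template
    .split('/')
    .map((segment) =>
      /^\{[^}]+\}$/.test(segment)
        ? '[^/]+' // a {param} segment matches any single path segment
        : segment.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
    )
    .join('/');
  return new RegExp(`^${pattern}$`);
}

type ContractVerdict = 'ok' | 'unknown_endpoint' | 'invalid_parameters';

function checkContract(
  method: string,
  path: string,
  body: Record<string, unknown> = {},
): ContractVerdict {
  const op = OPERATIONS.find(
    (o) => o.method === method.toUpperCase() && templateToRegExp(o.path).test(path),
  );
  if (!op) return 'unknown_endpoint';

  const missing = op.requiredBodyFields.filter((field) => !(field in body));
  return missing.length > 0 ? 'invalid_parameters' : 'ok';
}
```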

6. Session-based intelligence (beyond IP)

Traditional rate limiting is IP-based. That breaks in real systems:

shared corporate NAT
distributed attackers
leaked API keys

We track sessions using:

Authorization header as identity

This enables:

per-key behavioral profiling
anomaly detection
request sequence tracking

Result:

volumetric abuse detection
enumeration detection
per-customer visibility

All enforced at the edge.

IP-based control alone cannot reliably identify abuse in distributed systems.

Session-level tracking allows us to enforce behavior constraints per identity, not per network.
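
A small sketch of per-identity tracking follows. It keys a profile on the Authorization header value and flags two of the signals mentioned above: one key seen from many IPs, and a high count of 404 responses as a rough enumeration signal. The `SessionTracker` class, the flag names, and the thresholds are assumptions for illustration, and a real system would key on a hash of the header rather than the raw value.

```typescript
interface KeyProfile {
  requests: number;
  ips: Set<string>;
  notFound: number; // count of 404 responses, a rough enumeration signal
}

class SessionTracker {
  private readonly profiles = new Map<string, KeyProfile>();

  // Record one request for the identity carried in the Authorization header.
  record(authorization: string, ip: string, status: number): KeyProfile {
    let profile = this.profiles.get(authorization);
    if (!profile) {
      profile = { requests: 0, ips: new Set<string>(), notFound: 0 };
      this.profiles.set(authorization, profile);
    }
    profile.requests += 1;
    profile.ips.add(ip);
    if (status === 404) profile.notFound += 1;
    return profile;
  }

  // Thresholds are assumed values, chosen only to make the example readable.
  anomalies(authorization: string, maxIps = 5, maxNotFound = 20): string[] {
    const profile = this.profiles.get(authorization);
    if (!profile) return [];
    const flags: string[] = [];
    if (profile.ips.size > maxIps) flags.push('key_used_from_many_ips');
    if (profile.notFound > maxNotFound) flags.push('possible_enumeration');
    return flags;
  }
}
```

Because the profile is keyed on identity, two customers behind the same corporate NAT stay separate, while one leaked key used from many networks shows up as a single anomalous profile.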

Layered execution control model

Client Request
  ↓
Cloudflare Edge
  ├─ Request Validation (Protocol Enforcement)
  ├─ WAF (Execution Guardrail)
  ├─ Rate Limiting (Pre-Execution Control)
  ├─ Managed Rules (Threat Protection)
  ├─ API Shield (Contract Enforcement)
  ├─ Session Intelligence (Beyond IP)
  ↓
BabySea Control Plane
  ├─ Access Control
  ├─ Credit Lifecycle
  ├─ Protocol Translation
  ├─ Policy Routing
  ├─ Failover Orchestration
  ├─ Failure Handling
  ├─ Artifact Pipeline
  ├─ Event System
  ↓
Execution Layer

The control plane defines how execution should happen. The execution layer carries it out.

Each layer removes a different class of risk before it propagates.

No single failure allows invalid or abusive execution to reach the system.

Execution is stateful, not just request-based

Once a request passes edge validation, it enters a lifecycle that must remain consistent across providers, retries, and failures.

We treat execution as a controlled state machine:

request accepted ➜ generation created
credits reserved before execution
provider execution begins
result confirmed inline or via webhook
credits finalized as charge or refund

This prevents critical failure classes:

double execution across providers during failover
double charge when multiple providers complete
orphaned jobs when upstream systems fail
inconsistent state between storage, database, and providers

Execution is not just calling a provider.

It is maintaining correctness across asynchronous, multi-provider systems.
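
One way to picture the lifecycle above is as an explicit transition table. The sketch below is an illustration, not our production code: the state names other than `pending` and `succeeded`, the `TRANSITIONS` table, and the `transition` helper are assumptions chosen to match the steps listed above.

```typescript
type GenerationState =
  | 'pending' // generation record created
  | 'reserved' // credits reserved
  | 'executing' // a provider is running the job
  | 'succeeded'
  | 'failed'
  | 'charged' // terminal: credits finalized as a charge
  | 'refunded'; // terminal: credits finalized as a refund

// Allowed next states for each state. Terminal states allow nothing.
const TRANSITIONS: Record<GenerationState, GenerationState[]> = {
  pending: ['reserved'],
  reserved: ['executing', 'refunded'],
  executing: ['succeeded', 'failed'],
  succeeded: ['charged'],
  failed: ['executing', 'refunded'], // retry on the next provider, or give up
  charged: [],
  refunded: [],
};

// Returns the new state, or throws if the move is not allowed.
function transition(from: GenerationState, to: GenerationState): GenerationState {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```

With a table like this, a double charge or a refund after a charge is not a bug to be caught in review. It is a transition that does not exist.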

A key part of that lifecycle is ordering. We create the generation record before reserving credits, so billing and state transitions are attached to a single durable generation ID from the start.

TypeScript

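// Create the generation record first so every later event shares one durable ID.
const { data: record } = await adminClient
  .from('file_assets')
  .insert({
    account_id: accountId,
    generation_data: {
      generation_provider_order: toProviderArray(providerOrder),
      generation_status: 'pending',
    },
  })
  .select('id, account_id, generation_id')
  .single();

// Stop before touching credits if the record was not created.
if (!record) {
  throw new Error('Failed to create generation record');
}

// Reserve credits against the generation ID created above.
const { reserved } = await reserveCredits(
  accountId,
  model,
  record.generation_id,
  undefined,
  generationResolution,
);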

That ordering matters. It avoids reserve-then-cleanup ambiguity and makes every economic event traceable to the same lifecycle record.

Billing correctness is enforced as an invariant

In a multi-provider execution system, billing correctness cannot depend on good luck or a single success path.

We enforce a simple invariant:

one generation can reserve once, charge once, and refund once

That rule is backed by the database, not just application logic.

SQL

CREATE UNIQUE INDEX IF NOT EXISTS idx_credit_ledger_charge_idempotent
  ON public.credit_ledger (generation_id) WHERE type = 'charge';

CREATE UNIQUE INDEX IF NOT EXISTS idx_credit_ledger_refund_idempotent
  ON public.credit_ledger (generation_id) WHERE type = 'refund';

Credits are also reserved atomically. That prevents concurrent requests from both passing a balance check and spending the same balance twice.

SQL

UPDATE public.credits
SET tokens = tokens - p_tokens
WHERE account_id = p_account_id
  AND tokens >= p_tokens;

This is what turns billing into a system property instead of a best effort.
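
To show how the two database rules behave from the caller's side, here is an in-memory sketch. It is a model of the behavior, not our implementation: the `CreditStore` class and its method names are hypothetical. In the database, atomicity comes from the single conditional UPDATE and idempotency from the partial unique indexes. In this sketch, both are imitated by synchronous checks.

```typescript
type LedgerType = 'charge' | 'refund';

class CreditStore {
  private readonly balances = new Map<string, number>();
  private readonly ledger = new Set<string>(); // entries are "<generationId>:<type>"

  deposit(accountId: string, tokens: number): void {
    this.balances.set(accountId, this.balanceOf(accountId) + tokens);
  }

  balanceOf(accountId: string): number {
    return this.balances.get(accountId) ?? 0;
  }

  // Imitates the conditional UPDATE: decrement only if the balance covers it.
  // Returns false when no row would have been updated.
  reserve(accountId: string, tokens: number): boolean {
    const balance = this.balanceOf(accountId);
    if (balance < tokens) return false;
    this.balances.set(accountId, balance - tokens);
    return true;
  }

  // Imitates the partial unique indexes: at most one charge and one refund
  // per generation. Returns false for a duplicate.
  record(generationId: string, type: LedgerType): boolean {
    const key = `${generationId}:${type}`;
    if (this.ledger.has(key)) return false;
    this.ledger.add(key);
    return true;
  }
}
```

The caller never checks the balance separately and then spends. It attempts the reservation and reads the outcome, which is the property the SQL statement above provides under concurrency.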

Failover is controlled, not optimistic

Failover is usually presented as a simple fallback story. In practice, it is a consistency problem.

A provider can fail late. A webhook can arrive after the next provider has already been considered. Storage may already contain a completed output even when the local request path thinks it failed.

We account for that explicitly.

First, we reorder providers by health, so degraded providers move to the back without changing the public execution contract.

TypeScript

const { reordered: sequence } = await reorderByHealth(baseSequence);
providerOrder = sequence.map((s) => s.provider).join(', ');
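
The internals of `reorderByHealth` are not shown in this post. As a rough sketch of the idea only, a stable partition keeps healthy providers in their original order and moves degraded ones to the back. The function name `reorderByHealthSketch`, the `ProviderStep` shape, and the provider names are assumptions for illustration.

```typescript
interface ProviderStep {
  provider: string;
  providerModelId: string;
}

// Stable partition: healthy providers first, degraded providers last,
// with the original relative order preserved inside each group.
function reorderByHealthSketch(
  sequence: ProviderStep[],
  degraded: ReadonlySet<string>,
): ProviderStep[] {
  const healthy = sequence.filter((step) => !degraded.has(step.provider));
  const unhealthy = sequence.filter((step) => degraded.has(step.provider));
  return [...healthy, ...unhealthy];
}
```

No provider is dropped, so a degraded provider can still serve as a last resort.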

Then, after a failed attempt, we check storage and database state before spending money on the next provider.

TypeScript

const { data: storageFiles } = await adminClient.storage
  .from('file')
  .list(`${record.account_id}/${record.generation_id}`);

if (storageFiles && storageFiles.length > 0) {
  result = {
    provider: step.provider,
    predictionId: 'webhook-delivered',
    status: 'succeeded',
    providerModelId: step.providerModelId,
  };
  break;
}

This is the difference between "trying another provider" and controlling a distributed execution lifecycle.
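
Put together, the control flow looks roughly like the loop below. This is a simplified sketch under assumptions, not our orchestration code: `executeWithFailover`, `FailoverStep`, and the injected `run` and `outputExists` functions are hypothetical names, and the real flow also consults database state.

```typescript
interface FailoverStep {
  provider: string;
}

interface FailoverDeps {
  // Runs the job on one provider; resolves true on success.
  run: (provider: string) => Promise<boolean>;
  // Checks whether output already exists, e.g. via a storage listing.
  outputExists: () => Promise<boolean>;
}

// Returns the provider credited with the result, or null if all attempts failed.
async function executeWithFailover(
  steps: FailoverStep[],
  deps: FailoverDeps,
): Promise<string | null> {
  for (const step of steps) {
    let ok = false;
    try {
      ok = await deps.run(step.provider);
    } catch {
      ok = false; // treat a thrown error as a failed attempt
    }
    if (ok) return step.provider;

    // A late webhook may have delivered output even though this attempt
    // looked failed. Check before paying for the next provider.
    if (await deps.outputExists()) return step.provider;
  }
  return null;
}
```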

Provider abstraction is a contract, not a passthrough

A multi-provider system degrades quickly if provider-specific fields leak into the public API.

We keep the public contract unified and intentionally normalize provider differences into BabySea's own schema model.

That includes rules like:

ratio-based sizing instead of exposing raw width and height
intersection of supported formats across active providers
pricing dimensions modeled as core execution fields, not provider-specific knobs

A simplified example:

TypeScript

// width: z.number().int().min(256).max(1440).optional(),
// excluded - uses ratio-based sizing, not pixel dimensions

And at the schema level, core execution fields are ordered and enforced consistently across models:

TypeScript

generation_prompt
generation_ratio
generation_output_format
generation_output_number
generation_input_file
generation_duration
generation_resolution
generation_generate_audio
generation_provider_order

This is what lets multiple providers behave like one execution system instead of many inconsistent APIs.
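
The "intersection of supported formats" rule mentioned above can be sketched in a few lines. The function name `supportedFormats`, the provider names, and the format lists are assumptions for illustration. The point is only that the public contract exposes what every active provider can deliver.

```typescript
// Returns the formats supported by every active provider,
// in the order they appear for the first provider.
function supportedFormats(activeProviders: Record<string, string[]>): string[] {
  const lists = Object.values(activeProviders);
  const first = lists[0] ?? [];
  const rest = lists.slice(1);
  return first.filter((format) => rest.every((list) => list.includes(format)));
}
```

For example, if one provider supports png, jpg, and webp, a second supports png and webp, and a third supports webp, png, and gif, the public contract would offer only png and webp.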

Edge and tenant boundaries matter as much as model execution

Execution correctness is not only about providers. It is also about where requests are allowed to exist.

Our middleware enforces API-only domains, separates marketing and dashboard surfaces, and strips headers that should never influence account identity.

TypeScript

if (!pathname.startsWith('/v1')) {
  return new NextResponse(
    JSON.stringify({
      status: 'error',
      error: {
        code: 'BSE1001',
        type: 'not_found',
        message: 'Restricted access',
      },
    }),
    { status: 404, headers: { 'Content-Type': 'application/json' } },
  );
}

const requestHeaders = new Headers(request.headers);
requestHeaders.delete('x-account-id');

return NextResponse.next({
  request: { headers: requestHeaders },
});

That matters because execution systems are also multi-tenant systems. The edge has to enforce those trust boundaries before requests reach the control plane.

Execution must be protected before it begins

In generative systems, execution is the most expensive part of the pipeline.

Every invalid request that reaches execution is not just an error. It is wasted compute, wasted time, and unnecessary cost.

Our design ensures that:

invalid requests are rejected early
abusive behavior is constrained before execution
only valid, well-formed requests reach the control plane

This is why edge enforcement is not optional. It is part of execution itself.

Why this matters for generative media

Generative workloads are fundamentally different:

long-running execution
high cost per request
unpredictable input
multi-provider dependency

Without strict edge control:

invalid requests waste compute
abuse becomes expensive quickly
provider failures cascade

Cloudflare allows us to enforce correctness before execution.

Closing

At BabySea, execution infrastructure is not just routing and failover.

It is:

controlling what enters the system
validating it before execution
observing behavior at scale
enforcing constraints at the edge

Cloudflare is not just a security layer for us.

It is where execution constraints begin.

By the time a request reaches our system, it has already been validated, constrained, and shaped.

Execution is not just what happens inside the system.

It starts at the edge.