How BabySea built strict request normalization with JSON Schema and TypeScript

Date: May 7, 2026
By: Randy Aries Saputra

Multi-provider AI is not a collection of SDK wrappers. It is translation.

That distinction matters because image and video providers do not agree on names, defaults, formats, safety controls, duration semantics, asset limits, or payload shape. One provider calls a field aspect_ratio; another calls it aspect; another wants width and height. One provider accepts jpg; another expects jpeg. One provider exposes moderation as an enable flag; another exposes it as a disable flag; another wants a tolerance level. One provider exposes audio as a boolean; another makes audio part of a model tier.

If those differences leak into the public API, the product becomes a provider wrapper instead of a control plane.

At BabySea, the customer-facing request shape is kept separate from provider-native payloads. Provider contracts are recorded, refined into one strict public schema, normalized once, and only then converted into provider dialects. That boundary is the pattern we open-sourced as rosetta-bridge: a JSON + TypeScript normalization primitive for teams building one API across many inference providers.

Schema drift is the first reliability bug

Provider failover usually breaks before the network call. It breaks when a request that was valid for the first provider is not actually valid for the fallback provider.

Text

Customer request
  -> Provider A accepts 16:9 + jpg + moderation=false
  -> Provider A fails
  -> Provider B only supports 1:1 + jpeg + a different moderation field
  -> fallback changes semantics or fails after dispatch

The bad version of this system discovers the mismatch after the request has already passed validation. The better version refuses to expose the mismatch in the first place.
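Here is a minimal sketch of the better version. It assumes each provider's capabilities can be recorded as plain lists of accepted values. The `ProviderCapabilities` shape, the capability values, and `providersThatCannotServe` are hypothetical illustrations, not rosetta-bridge APIs. The check runs over the whole fallback order before dispatch, so the mismatch in the diagram above is caught while the caller can still receive a validation error.

```typescript
// Hypothetical sketch: capability records modeled on the diagram above.
interface ProviderCapabilities {
  aspectRatios: string[];
  formats: string[];
}

interface PublicRequest {
  request_aspect_ratio: string;
  request_output_format: string;
}

const capabilities: Record<string, ProviderCapabilities> = {
  provider_a: { aspectRatios: ["1:1", "16:9"], formats: ["png", "jpg"] },
  provider_b: { aspectRatios: ["1:1"], formats: ["png", "jpeg"] },
};

// Returns every provider in the fallback order that could NOT honor the request.
function providersThatCannotServe(req: PublicRequest, order: string[]): string[] {
  return order.filter((provider) => {
    const caps = capabilities[provider];
    if (caps === undefined) return true; // unknown provider: treat as unable
    return (
      !caps.aspectRatios.includes(req.request_aspect_ratio) ||
      !caps.formats.includes(req.request_output_format)
    );
  });
}

const blocked = providersThatCannotServe(
  { request_aspect_ratio: "16:9", request_output_format: "jpg" },
  ["provider_a", "provider_b"],
);
// blocked is ["provider_b"]: reject now instead of failing after dispatch.
console.log(JSON.stringify(blocked));
```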

Request normalization is not a cleanup step. It is the invariant that makes failover, pricing, persistence, and debugging possible.

The mistake is to treat provider schemas as customer schemas. Provider docs describe what a vendor accepts. A product schema describes what the product promises. Those are related, but they are not the same contract.

Provider docs are raw material, not the public API

BabySea's production schema pipeline starts with a simple rule: the raw provider shape must stay raw. Provider names, defaults, bounds, enum values, and exclusions are recorded from provider documentation without pretending they are already customer-safe.

Then we define the refined customer-facing schema. Internally, that schema uses BabySea generation_* fields, strict validation, declared defaults, capability intersections, and explicit conversion into provider-native payloads. The same discipline appears across the production refine-schema.ts files: provider-specific details are preserved, but customers see one stable contract.

Conceptually, the production path looks like this:

Text

Provider docs / API reference
  -> raw provider schema notes
  -> strict BabySea refine schema
  -> canonical generation input
  -> toProviderInput(...) + provider-specific option mapping
  -> provider-native payload

rosetta-bridge generalizes that boundary with neutral, public-facing terms:

BabySea production concept	Public OSS concept	Preserved invariant
Raw provider schema	Provider contract notes	Provider-native names stay outside customer input.
Strict `generation_*` refine schema	Strict `request_*` field specs	Unknown fields fail before dispatch.
`createModelConverter(...)`	Bridge definition	One product schema owns defaults and supported values.
Shared input fields	`fields.core` + `mapCore(...)`	Product-level fields are visible before provider mapping.
Provider-specific `specificSchema` mapping	`fields.options` + `mapOptions(...)`	Tuning knobs map after core fields.
Structured provider exception	`mapStructured(...)`	Nested payload assembly is explicit, not accidental.
Canonical generation envelope	`normalization-result.v1` envelope	Debug/test output has one comparable shape.

The names are different because the OSS package is meant for the community, not only for BabySea. BabySea can use generation_* internally. A portable primitive should expose neutral request_* fields.

The open-source stack is JSON + TypeScript

The public stack is deliberately small:

JSON Schema Draft 2020-12 defines the versioned, portable formats for bridge manifests and normalization-result envelopes.
TypeScript declares field specs, validates requests, applies defaults, and runs adapter functions.
Your backend owns authentication, persistence, billing, queues, rate limits, provider SDK calls, webhooks, and routing.

That last line is part of the contract. rosetta-bridge is not a hosted inference gateway, not a provider router, not a billing system, and not BabySea's private model catalog. It is the schema-hardening boundary that should run after your application has authenticated the caller and before your application dispatches to a provider.

Text

Application backend
  │  auth, billing, persistence, queues, routing, provider SDKs live here
  ▼
rosetta-bridge
  │  strict JSON + TypeScript normalization boundary
  ▼
Provider-native payload
  │
  ▼
Application-owned provider client

If a behavior is outside the JSON + TypeScript normalization boundary, it is outside the rosetta-bridge package.

That scoping is what makes the primitive reusable. Different teams will choose different databases, billing providers, queues, SDKs, and routing systems. The normalization invariant is the part they should not have to rediscover.

One public request, many provider dialects

The public request should be boring:

JSON

{
  "request_prompt": "a glass penguin on a bridge",
  "request_aspect_ratio": "16:9",
  "request_output_format": "jpg"
}

Provider payloads are allowed to be weird:

JSON

{
  "prompt": "a glass penguin on a bridge",
  "aspect_ratio": "16:9",
  "format": "jpg"
}

JSON

{
  "text": "a glass penguin on a bridge",
  "aspect": "16:9",
  "output_format": "jpeg"
}

The product promise is not that every provider speaks the same dialect. The product promise is that customers do not need to know which dialect will run their request.
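To make that concrete, here is a sketch that maps the public request above into both provider payloads shown earlier. The adapter names and the `Adapter` type are hypothetical. The field names and the jpg-to-jpeg alias come from the example payloads. Each adapter is a plain function from the public shape to a provider shape, so adding a third dialect never changes what customers send.

```typescript
// Hypothetical sketch: the single public request shape customers send.
interface PublicImageRequest {
  request_prompt: string;
  request_aspect_ratio: "1:1" | "16:9";
  request_output_format: "png" | "jpg";
}

type Adapter = (input: PublicImageRequest) => Record<string, string>;

// Dialect 1: same concepts, provider-specific field names.
const dialectOne: Adapter = (input) => ({
  prompt: input.request_prompt,
  aspect_ratio: input.request_aspect_ratio,
  format: input.request_output_format,
});

// Dialect 2: renamed fields plus a format alias (public jpg -> provider jpeg).
const dialectTwo: Adapter = (input) => ({
  text: input.request_prompt,
  aspect: input.request_aspect_ratio,
  output_format:
    input.request_output_format === "jpg" ? "jpeg" : input.request_output_format,
});

const adapters: Record<string, Adapter> = {
  provider_a: dialectOne,
  provider_b: dialectTwo,
};

const publicRequest: PublicImageRequest = {
  request_prompt: "a glass penguin on a bridge",
  request_aspect_ratio: "16:9",
  request_output_format: "jpg",
};

// One public shape in; the selected provider decides the dialect out.
for (const [provider, adapt] of Object.entries(adapters)) {
  console.log(provider, JSON.stringify(adapt(publicRequest)));
}
```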

In rosetta-bridge, the public request vocabulary uses lowercase snake_case request_* fields. Common core fields include:

Public field	Why it is core
`request_prompt`	The creative prompt is provider-agnostic.
`request_aspect_ratio`	Customers choose product ratios, not provider pixel formulas.
`request_output_format`	Public formats stay stable while adapters handle provider aliases.
`request_output_count`	Output count affects product behavior and may affect price.
`request_input_assets`	Asset limits must be safe before dispatch.
`request_duration_seconds`	Duration can affect video price and routing.
`request_resolution`	Resolution can affect price, policy, or provider eligibility.
`request_audio`	Audio mode can affect cost and provider payload shape.
`request_provider_order`	The request can carry a public provider-order preference or sentinel.

Option fields are public tuning knobs, such as request_seed, request_negative_prompt, request_enhance_prompt, and request_moderation. They are still public fields, but they map after the core product fields.

Defaults are part of the product contract

Provider defaults are dangerous because they vary by vendor and can change without regard to your product policy.

If one provider defaults to permissive moderation and another defaults to strict moderation, the same customer request can produce different behavior depending on failover. If one provider defaults to high resolution and another defaults to low resolution, the same request can change cost. If one provider defaults to audio and another does not, the customer may not understand what they bought.

So the default belongs in the public field spec, not in the adapter.

JSON

{
  "schema_version": "bridge-definition.v1",
  "model_id": "example/media-model",
  "supported_providers": ["provider_a", "provider_b"],
  "core_fields": {
    "request_prompt": {
      "type": "string",
      "required": true,
      "minLength": 1
    },
    "request_aspect_ratio": {
      "type": "enum",
      "values": ["1:1", "16:9"],
      "default": "1:1"
    },
    "request_output_format": {
      "type": "enum",
      "values": ["png", "jpg"],
      "default": "png"
    }
  }
}

That JSON manifest is portable because it contains the public contract, not executable provider functions. Provider adapters live in trusted TypeScript or JavaScript modules because a mapping function is code.
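To illustrate the rule that the default belongs in the public field spec, here is a minimal sketch. It reads a `core_fields` object shaped like the manifest above and fills declared defaults before any adapter runs. The `FieldSpec` type and `applyDeclaredDefaults` are hypothetical and cover only the `string` and `enum` field types shown. This is not rosetta-bridge's implementation.

```typescript
// Hypothetical sketch: only the two field types used in the example manifest.
type FieldSpec =
  | { type: "string"; required?: boolean; minLength?: number }
  | { type: "enum"; values: string[]; default?: string };

// Mirrors the "core_fields" block of the example manifest.
const coreFields: Record<string, FieldSpec> = {
  request_prompt: { type: "string", required: true, minLength: 1 },
  request_aspect_ratio: { type: "enum", values: ["1:1", "16:9"], default: "1:1" },
  request_output_format: { type: "enum", values: ["png", "jpg"], default: "png" },
};

// Fills omitted fields from the spec so adapters never fall back to provider defaults.
function applyDeclaredDefaults(
  specs: Record<string, FieldSpec>,
  input: Record<string, string>,
): Record<string, string> {
  const out: Record<string, string> = { ...input };
  for (const [field, spec] of Object.entries(specs)) {
    if (out[field] !== undefined) continue; // caller supplied a value
    if (spec.type === "enum" && spec.default !== undefined) {
      // Refuse a default outside the declared values: that is a definition bug.
      if (!spec.values.includes(spec.default)) {
        throw new Error(`invalid default for ${field}: ${spec.default}`);
      }
      out[field] = spec.default;
    } else if (spec.type === "string" && spec.required) {
      throw new Error(`missing required field: ${field}`);
    }
  }
  return out;
}

const withDefaults = applyDeclaredDefaults(coreFields, {
  request_prompt: "a glass penguin on a bridge",
});
// withDefaults.request_aspect_ratio === "1:1"
// withDefaults.request_output_format === "png"
```

Because the default lives in the spec, every provider path sees the same value; a provider whose own default is 16:9 never gets the chance to apply it.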

Intersections beat unions

The most tempting schema is the union of every provider capability. It is also the schema most likely to break failover.

Text

Provider A supports: 1:1, 16:9, 9:16, png, jpg
Provider B supports: 1:1, 16:9, png, jpeg

Unsafe public union:        1:1, 16:9, 9:16, png, jpg, jpeg
Safer public intersection:  1:1, 16:9, png, jpg after product-level aliasing

The public schema should expose what every registered provider path can deliver for that bridge. If a provider cannot deliver a capability, the bridge should not accept that capability for a workload that may fall back to that provider.
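The intersection above can be reproduced with two small helpers. The sketch assumes the product has already decided that `jpeg` and `jpg` mean the same public format. `intersectAll`, `canonicalFormat`, and the alias table are hypothetical. rosetta-bridge ships its own intersection helpers, whose names and signatures are not shown here.

```typescript
// Hypothetical product-level aliases: provider spelling -> public spelling.
const FORMAT_ALIASES: Record<string, string> = { jpeg: "jpg" };

function canonicalFormat(value: string): string {
  return FORMAT_ALIASES[value] ?? value;
}

// Keeps only values every list contains, preserving the first list's order.
function intersectAll(lists: string[][]): string[] {
  if (lists.length === 0) return [];
  const [first, ...rest] = lists;
  return first.filter((value) => rest.every((list) => list.includes(value)));
}

// Capabilities copied from the comparison above.
const providerA = { aspectRatios: ["1:1", "16:9", "9:16"], formats: ["png", "jpg"] };
const providerB = { aspectRatios: ["1:1", "16:9"], formats: ["png", "jpeg"] };

const publicAspectRatios = intersectAll([providerA.aspectRatios, providerB.aspectRatios]);
const publicFormats = intersectAll([
  providerA.formats.map(canonicalFormat),
  providerB.formats.map(canonicalFormat),
]);
console.log(JSON.stringify(publicAspectRatios)); // ["1:1","16:9"]
console.log(JSON.stringify(publicFormats)); // ["png","jpg"]
```

Aliasing runs before the intersection. Without it, `jpg` and `jpeg` would not match, and the public schema would lose a format that both providers actually support.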

rosetta-bridge includes helper functions for simple intersections and aspect-ratio ordering, but they do not enforce the invariant automatically. The adapter author still has to understand provider semantics and choose the product contract deliberately.

A public schema is not a list of everything any provider can do. It is a list of what the product can safely promise.

That is why request_input_assets, request_duration_seconds, request_resolution, and request_audio are core fields. They are not just provider options; they can affect price, policy, routing eligibility, and fallback safety.

Pricing-sensitive dimensions stay visible before dispatch

rosetta-bridge does not bill customers. It still preserves the production invariant that cost-sensitive choices must be visible before provider dispatch.

In BabySea's production schema pipeline, video duration, resolution, generated audio, input files, and output count are not hidden inside provider-specific mappers when they affect execution policy or pricing. They are part of the canonical request shape that the surrounding application can inspect.
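Here is a minimal sketch of what being visible before dispatch buys the surrounding application. It computes a cost estimate from canonical core fields only, before any adapter or provider SDK runs. The `VideoCore` type, `estimateCredits`, and every rate in it are hypothetical placeholders, not BabySea pricing.

```typescript
// Hypothetical sketch: the pricing-relevant slice of a canonical video request.
interface VideoCore {
  request_duration_seconds: 5 | 10;
  request_resolution: "720p" | "1080p";
  request_audio: boolean;
  request_output_count: number;
}

// Placeholder rates invented for this sketch; not real pricing.
const CREDITS_PER_SECOND: Record<VideoCore["request_resolution"], number> = {
  "720p": 1,
  "1080p": 2,
};
const AUDIO_SURCHARGE_PER_SECOND = 0.5;

// Reads only canonical fields, so the quote does not depend on which provider runs.
function estimateCredits(core: VideoCore): number {
  const perSecond =
    CREDITS_PER_SECOND[core.request_resolution] +
    (core.request_audio ? AUDIO_SURCHARGE_PER_SECOND : 0);
  return perSecond * core.request_duration_seconds * core.request_output_count;
}

const estimate = estimateCredits({
  request_duration_seconds: 10,
  request_resolution: "1080p",
  request_audio: true,
  request_output_count: 1,
});
console.log(estimate); // 25, i.e. (2 + 0.5) credits/s * 10 s * 1 output
```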

The OSS version follows the same rule:

TypeScript

import {
  RosettaBridge,
  enhancePromptToMode,
  moderationToDisableBoolean,
  sharedMediaFields,
} from 'rosetta-bridge';

const bridge = new RosettaBridge({
  schemaVersion: 'bridge-definition.v1',
  modelId: 'example/video-model',
  supportedProviders: ['provider_a', 'provider_b'],
  providerOrder: ['provider_a', 'provider_b'],
  fields: {
    core: sharedMediaFields({
      supportedAspectRatios: ['1:1', '16:9'],
      supportedFormats: ['mp4'],
      durationValues: [5, 10],
      supportedResolutions: ['720p', '1080p'],
      defaultResolution: '720p',
      supportsAudio: true,
      maxInputAssets: 1,
      providerOrders: [
        'fastest',
        'provider_a,provider_b',
        'provider_b,provider_a',
      ],
      defaultProviderOrder: 'fastest',
    }),
    options: {
      request_enhance_prompt: {
        type: 'enum',
        values: ['off', 'standard', 'fast'],
        default: 'off',
      },
      request_moderation: { type: 'boolean', default: false },
      request_seed: { type: 'integer' },
    },
  },
  providers: {
    provider_a: {
      mapCore: (input) => ({
        prompt: input.request_prompt,
        aspect_ratio: input.request_aspect_ratio,
        duration: `${String(input.request_duration_seconds)}s`,
        resolution: input.request_resolution,
        generate_audio: input.request_audio,
      }),
      mapOptions: (input) => ({
        prompt_optimizer: enhancePromptToMode(input),
        disable_safety_checker: moderationToDisableBoolean(input),
        seed: input.request_seed,
      }),
    },
    provider_b: {
      mapCore: (input) => ({
        text: input.request_prompt,
        ratio: input.request_aspect_ratio,
        seconds: input.request_duration_seconds,
        quality: input.request_resolution,
        audio: input.request_audio,
      }),
      mapOptions: (input) => ({
        enhancement_mode: enhancePromptToMode(input),
        safety: input.request_moderation ? 'enabled' : 'permissive',
        seed_value: input.request_seed,
      }),
    },
  },
});

The surrounding application can price, authorize, queue, or route from the canonical core fields before it ever calls a provider SDK. The adapter can still rename or type-convert those fields for each provider, but it does not hide the product decision.

Core mapping and option mapping are separate on purpose

Provider adapters have two normal mapping functions:

Text

provider_payload = {
  ...mapCore(full_normalized_request),
  ...mapOptions(full_normalized_request),
}

mapCore(...) handles product-level fields. mapOptions(...) handles tuning fields. Both receive the full normalized request because provider payloads often combine concepts. For example, a provider detail string may depend on both aspect ratio and quality; a safety setting may depend on a public moderation flag plus a provider-specific convention.

The separation is not about limiting what code can read. It is about making the adapter readable and reviewable.
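The composition rule above can be sketched in a few lines. The `ProviderAdapter` interface and `buildProviderPayload` are hypothetical, not rosetta-bridge's internal types. One design choice is worth noting. A plain object spread lets an option key silently overwrite a core key, so this sketch rejects collisions instead. It also drops `undefined` values, which is an assumption about how omitted options should be handled.

```typescript
// Hypothetical sketch of flat core + option mapping.
type NormalizedRequest = Record<string, unknown>;
type Payload = Record<string, unknown>;

interface ProviderAdapter {
  mapCore: (input: NormalizedRequest) => Payload;
  mapOptions?: (input: NormalizedRequest) => Payload;
}

function buildProviderPayload(adapter: ProviderAdapter, input: NormalizedRequest): Payload {
  // Both mappers receive the full normalized request.
  const core = adapter.mapCore(input);
  const options = adapter.mapOptions ? adapter.mapOptions(input) : {};
  for (const key of Object.keys(options)) {
    if (key in core) {
      throw new Error(`option mapper overwrites core key "${key}"`);
    }
  }
  const merged: Payload = { ...core, ...options };
  // Omitted options (undefined values) are not sent to the provider.
  for (const key of Object.keys(merged)) {
    if (merged[key] === undefined) delete merged[key];
  }
  return merged;
}

const exampleAdapter: ProviderAdapter = {
  mapCore: (input) => ({
    prompt: input.request_prompt,
    // A provider detail string that combines two public concepts.
    detail: `${String(input.request_aspect_ratio)}@${String(input.request_resolution)}`,
  }),
  mapOptions: (input) => ({ seed: input.request_seed }),
};

const examplePayload = buildProviderPayload(exampleAdapter, {
  request_prompt: "a glass penguin on a bridge",
  request_aspect_ratio: "16:9",
  request_resolution: "1080p",
});
console.log(JSON.stringify(examplePayload));
// {"prompt":"a glass penguin on a bridge","detail":"16:9@1080p"}
```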

Text

Core fields
  -> prompt, ratio, format, input assets, duration, resolution, audio

Option fields
  -> seed, negative prompt, enhance prompt, moderation, guidance scale

When a provider requires a nested object or array, the adapter uses mapStructured(...) as an explicit exception:

TypeScript

const providerC = {
  mapStructured: (input) => ({
    content: [
      { type: 'text', text: input.request_prompt },
      {
        type: 'asset',
        url: Array.isArray(input.request_input_assets)
          ? input.request_input_assets[0]
          : undefined,
      },
    ],
  }),
};

That exception is important because nested payload assembly is a different shape of work. It should be visible in review, not buried inside a flat mapper.

Prompt enhancement and moderation are semantic fields

Prompt enhancement and moderation are classic schema-drift traps because providers expose the same product idea through incompatible switches.

One provider may want:

JSON

{ "enhance_prompt": true }

Another may want:

JSON

{ "prompt_optimizer_mode": "standard" }

Another may want:

JSON

{ "prompt_optimizer": { "mode": "fast" } }

The customer should not have to learn those differences. The public schema can expose request_enhance_prompt: "off" | "standard" | "fast", and adapters translate it through helper functions.
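Here is a sketch of how one public enum can be translated into the three provider shapes above. The `EnhancePrompt` type and the three `to*` functions are hypothetical. rosetta-bridge's own helpers, such as `enhancePromptToMode`, may behave differently. Two choices in this sketch are assumptions that an adapter author would need to confirm per provider: a boolean-only provider receives `true` for any enabled mode, and the mode-based providers omit the field when enhancement is `"off"`.

```typescript
// Hypothetical sketch: one public enum, three provider dialects.
type EnhancePrompt = "off" | "standard" | "fast";

// Boolean-only provider: any enabled mode becomes true (assumption).
function toEnhanceBoolean(mode: EnhancePrompt): { enhance_prompt: boolean } {
  return { enhance_prompt: mode !== "off" };
}

// Flat mode string; the field is omitted when enhancement is off (assumption).
function toOptimizerMode(mode: EnhancePrompt): { prompt_optimizer_mode?: "standard" | "fast" } {
  return mode === "off" ? {} : { prompt_optimizer_mode: mode };
}

// Nested object form of the same idea.
function toNestedOptimizer(
  mode: EnhancePrompt,
): { prompt_optimizer?: { mode: "standard" | "fast" } } {
  return mode === "off" ? {} : { prompt_optimizer: { mode } };
}

const mode: EnhancePrompt = "fast";
console.log(JSON.stringify(toEnhanceBoolean(mode))); // {"enhance_prompt":true}
console.log(JSON.stringify(toOptimizerMode(mode))); // {"prompt_optimizer_mode":"fast"}
console.log(JSON.stringify(toNestedOptimizer(mode))); // {"prompt_optimizer":{"mode":"fast"}}
```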

Moderation has the same problem. Some providers expose disable_safety_checker; others expose enable_safety_checker; others expose a tolerance field whose strict and permissive settings sit at opposite ends of an enum. rosetta-bridge keeps the public concept as request_moderation: boolean and provides adapter helpers for the common translations.

A boolean is not always semantically the same boolean. Provider adapters need to encode whether the provider field enables safety, disables safety, or maps into a tolerance scale.
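The three conventions can be made explicit in the adapter instead of left implicit. In this sketch, `request_moderation: true` means moderation is on. The `enable_safety_checker` and `disable_safety_checker` names follow the conventions mentioned above. The `SafetyConvention` type, the `safety_tolerance` field, its 1-to-5 scale, and `moderationToProviderField` are hypothetical. A real provider's scale and direction must be read from its documentation.

```typescript
// Hypothetical sketch: how a provider field relates to the public boolean.
type SafetyConvention =
  | { kind: "enable"; field: string }
  | { kind: "disable"; field: string }
  | { kind: "tolerance"; field: string; strictest: number; mostPermissive: number };

// Translates request_moderation into exactly one provider field.
function moderationToProviderField(
  moderation: boolean,
  convention: SafetyConvention,
): Record<string, boolean | number> {
  switch (convention.kind) {
    case "enable":
      return { [convention.field]: moderation };
    case "disable":
      // Inverted: moderation on means the "disable" flag is false.
      return { [convention.field]: !moderation };
    case "tolerance":
      return {
        [convention.field]: moderation ? convention.strictest : convention.mostPermissive,
      };
  }
}

// The customer turned moderation off; each provider hears it in its own dialect.
console.log(JSON.stringify(moderationToProviderField(false, { kind: "enable", field: "enable_safety_checker" })));
// {"enable_safety_checker":false}
console.log(JSON.stringify(moderationToProviderField(false, { kind: "disable", field: "disable_safety_checker" })));
// {"disable_safety_checker":true}
console.log(JSON.stringify(moderationToProviderField(false, { kind: "tolerance", field: "safety_tolerance", strictest: 1, mostPermissive: 5 })));
// {"safety_tolerance":5}
```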

That is the same reason BabySea production keeps prompt enhancement and moderation as canonical public concepts before provider conversion.

Canonical envelopes make debugging possible

Normalization should produce something inspectable. rosetta-bridge can emit a normalization-result.v1 envelope with the schema version, model ID, provider, provider order, canonical input, provider payload, and timestamp.

JSON

{
  "schema_version": "normalization-result.v1",
  "model_id": "example/media-model",
  "provider": "provider_b",
  "provider_order": ["provider_a", "provider_b"],
  "canonical_input": {
    "request_prompt": "a glass penguin on a bridge",
    "request_aspect_ratio": "16:9",
    "request_output_format": "jpg",
    "request_moderation": null
  },
  "provider_payload": {
    "text": "a glass penguin on a bridge",
    "aspect": "16:9",
    "output_format": "jpeg"
  },
  "generated_at": "2026-05-07T00:00:00.000Z"
}

That envelope is useful in tests, CLI smoke checks, docs, and application logs. It also makes omitted option fields explicit when you want complete canonical shapes. A compact object is possible, but complete envelopes make comparison easier across providers and versions.
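Here is a minimal sketch of building a complete envelope. Every declared field appears in `canonical_input`, and omitted option fields are set to `null`, as in the example above. The argument shape and `buildNormalizationEnvelope` are hypothetical and are not the package's emitter. The timestamp is passed in rather than read from the clock inside the function, so fixtures stay deterministic.

```typescript
// Hypothetical sketch of a normalization-result.v1 emitter.
interface NormalizationEnvelope {
  schema_version: "normalization-result.v1";
  model_id: string;
  provider: string;
  provider_order: string[];
  canonical_input: Record<string, unknown>;
  provider_payload: Record<string, unknown>;
  generated_at: string;
}

function buildNormalizationEnvelope(args: {
  modelId: string;
  provider: string;
  providerOrder: string[];
  declaredFields: string[];
  canonicalInput: Record<string, unknown>;
  providerPayload: Record<string, unknown>;
  generatedAt: Date;
}): NormalizationEnvelope {
  // Complete shape: every declared field is present; omitted ones become null.
  const complete: Record<string, unknown> = {};
  for (const field of args.declaredFields) {
    const value = args.canonicalInput[field];
    complete[field] = value === undefined ? null : value;
  }
  return {
    schema_version: "normalization-result.v1",
    model_id: args.modelId,
    provider: args.provider,
    provider_order: [...args.providerOrder],
    canonical_input: complete,
    provider_payload: args.providerPayload,
    generated_at: args.generatedAt.toISOString(),
  };
}

const envelope = buildNormalizationEnvelope({
  modelId: "example/media-model",
  provider: "provider_b",
  providerOrder: ["provider_a", "provider_b"],
  declaredFields: [
    "request_prompt",
    "request_aspect_ratio",
    "request_output_format",
    "request_moderation",
  ],
  canonicalInput: {
    request_prompt: "a glass penguin on a bridge",
    request_aspect_ratio: "16:9",
    request_output_format: "jpg",
  },
  providerPayload: { text: "a glass penguin on a bridge", aspect: "16:9", output_format: "jpeg" },
  generatedAt: new Date("2026-05-07T00:00:00.000Z"),
});
console.log(JSON.stringify(envelope, null, 2));
```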

BabySea's hosted product can persist canonical generation inputs inside its own systems. The OSS primitive does not ship that persistence layer; it emits the normalized shape so your application can decide what to store.

Fail before dispatch

The bridge should run before queues, provider SDKs, and webhooks enter the picture. If a request is malformed, the failure should happen while the application can still return a clean validation error.

rosetta-bridge rejects:

missing required fields
unknown public fields
unsupported enum or number values
invalid defaults in bridge definitions
invalid integer or numeric bounds
unsupported providers
bad URL values in url and url-array fields

URL validation is intentionally HTTP(S)-only at the schema boundary. That is not a full SSRF defense; it is the local type and scheme invariant. A production backend should still enforce trusted origins, redirect handling, private-network protections, file-size limits, content-type checks, and fetch isolation when it dereferences customer-supplied URLs.

Text

normalize(input)
  -> reject unknown request field
  -> reject unsupported enum
  -> reject non-HTTP(S) URL
  -> apply declared defaults
  -> emit canonical input
  -> map to provider payload

Provider dispatch should consume a normalized request, not discover whether a request was valid.
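The rejection steps in the flow above can be combined with a TypeScript branded type, so that dispatch code cannot accept a request that skipped normalization. The `Normalized` brand, `FieldRule`, `normalizeOrReject`, and `dispatch` are hypothetical. Defaults and provider mapping are left out to keep the sketch focused on rejection. The URL rule checks only the HTTP(S) scheme and is not an SSRF defense.

```typescript
// Hypothetical sketch: a brand that only normalizeOrReject is meant to produce.
declare const normalizedBrand: unique symbol;
type Normalized = Record<string, unknown> & { readonly [normalizedBrand]: true };

// Hypothetical rule kinds covering the checks in the flow above.
type FieldRule =
  | { type: "string" }
  | { type: "enum"; values: string[] }
  | { type: "url-array" };

// Scheme check only: parses the URL and accepts http: or https:.
function isHttpUrl(value: unknown): boolean {
  if (typeof value !== "string") return false;
  try {
    const parsed = new URL(value);
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch {
    return false;
  }
}

function normalizeOrReject(rules: Record<string, FieldRule>, input: Record<string, unknown>): Normalized {
  for (const [field, value] of Object.entries(input)) {
    // Own-property check, so inherited keys like "toString" count as unknown.
    if (!Object.prototype.hasOwnProperty.call(rules, field)) {
      throw new Error(`unknown field: ${field}`);
    }
    const rule = rules[field];
    if (rule.type === "string" && typeof value !== "string") {
      throw new Error(`${field} must be a string`);
    }
    if (rule.type === "enum" && !rule.values.includes(String(value))) {
      throw new Error(`unsupported value for ${field}: ${String(value)}`);
    }
    if (rule.type === "url-array" && !(Array.isArray(value) && value.every(isHttpUrl))) {
      throw new Error(`${field} must be an array of http(s) URLs`);
    }
  }
  // The cast is justified only because every check above passed.
  return input as Normalized;
}

// Dispatch accepts only the branded type, so raw input fails to compile.
function dispatch(request: Normalized): string {
  return `dispatching ${Object.keys(request).length} fields`;
}

const rules: Record<string, FieldRule> = {
  request_prompt: { type: "string" },
  request_output_format: { type: "enum", values: ["png", "jpg"] },
  request_input_assets: { type: "url-array" },
};

const normalizedOk = normalizeOrReject(rules, {
  request_prompt: "a glass penguin on a bridge",
  request_output_format: "png",
});
console.log(dispatch(normalizedOk)); // dispatching 2 fields
// dispatch({ request_prompt: "x" }) is a type error: the brand is missing.
```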

That is the same systems lesson as credit settlement and adaptive routing: put the invariant at the boundary where failure is still cheap.

The `fastest` sentinel is not routing by itself

request_provider_order: "fastest" is a public sentinel. In BabySea production, the surrounding execution control plane can resolve that idea through private routing layers before dispatch. In the OSS primitive, the bridge does not include those routing layers.

Instead, the bridge can expand the sentinel to the configured concrete provider order, or your backend can replace it with its own provider decision before calling toProviderInput(...).

Text

request_provider_order = "fastest"
  -> host application may resolve provider order
  -> bridge validates the resulting provider choice
  -> adapter emits provider-native payload

That keeps the public request vocabulary compatible with smarter host applications without pretending the normalization package is a ranking service.
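Here is a sketch of the host-side step. It assumes the application keeps its own latency estimates per provider. `resolveProviderOrder`, the configuration constants, and the latency numbers are hypothetical. As described above, the bridge itself only validates the resulting concrete order. When no estimates exist, this sketch falls back to the configured order, which matches the option of expanding the sentinel to the configured concrete provider order.

```typescript
// Hypothetical host configuration; a real backend would load its own.
const SUPPORTED_PROVIDERS: string[] = ["provider_a", "provider_b"];
const CONFIGURED_ORDER: string[] = ["provider_a", "provider_b"];

// Host-owned step: turns the "fastest" sentinel or an explicit comma list
// into a concrete provider order, then validates it.
function resolveProviderOrder(
  requested: string,
  latencyMs: Record<string, number>, // hypothetical host-side latency estimates
): string[] {
  let order: string[];
  if (requested === "fastest") {
    // Providers without an estimate rank last.
    const rank = (provider: string): number => latencyMs[provider] ?? Number.MAX_SAFE_INTEGER;
    // Array.prototype.sort is stable, so ties keep the configured order.
    order = [...CONFIGURED_ORDER].sort((a, b) => rank(a) - rank(b));
  } else {
    order = requested.split(",").map((provider) => provider.trim());
  }
  // Mirrors the bridge-side check: every resolved provider must be supported.
  for (const provider of order) {
    if (!SUPPORTED_PROVIDERS.includes(provider)) {
      throw new Error(`unsupported provider: ${provider}`);
    }
  }
  return order;
}

console.log(JSON.stringify(resolveProviderOrder("fastest", { provider_a: 900, provider_b: 400 })));
// ["provider_b","provider_a"]
console.log(JSON.stringify(resolveProviderOrder("fastest", {})));
// ["provider_a","provider_b"]
```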

Why this became OSS

Every multi-provider AI company eventually builds some version of this layer.

They start with one provider. Then they add a fallback. Then they add a second model family. Then they discover that defaults differ, safety flags invert, output formats alias, duration fields change type, input asset limits do not match, nested payloads break flat mappers, and public API fields have accidentally become provider-specific.

By the time the system works, the important part is not the provider SDK wrapper. The important part is the normalization contract.

That is why we open-sourced rosetta-bridge under Apache 2.0. The repository packages the reusable part of BabySea's schema-hardening pattern: JSON Schema contracts, TypeScript field specs, strict unknown-field rejection, declared defaults, provider adapter boundaries, prompt and moderation helper semantics, URL validation, canonical normalization envelopes, CLI smoke paths, and fixture-driven examples.

The pattern should be reusable. The provider catalog, customer data, routing intelligence, billing system, credentials, and operating graph remain outside the primitive.

The boundary is deliberate. We are not publishing BabySea's private provider catalog or hosted execution code. We are publishing the shape of a schema discipline that should not need to be rebuilt from scratch by every team integrating multiple AI providers.

The broader point

As AI infrastructure matures, the winning systems will not be the ones with the most provider wrappers. They will be the ones with the clearest contracts between customer intent, product policy, provider capability, execution routing, and settlement.

rosetta-bridge owns one of those contracts: customer intent becomes a strict canonical request, and that request becomes a provider-native payload only after validation.

That sounds small, but it is the layer that prevents silent schema drift from turning into broken failover, inconsistent pricing, unpredictable moderation, and impossible debugging. JSON Schema gives the public contract a versioned shape. TypeScript gives adapters a safe executable boundary. Together, they turn provider chaos into one request the product can actually promise.