Moderation is not one field
Date: April 1, 2026
By: Randy Aries Saputra
When people discuss moderation in generative media, they often describe it as if there were a single standard control: turn safety on, turn safety off, maybe adjust one tolerance value.
That is not how real model interfaces work.
In practice, moderation is exposed through a scattered set of fields that vary from model to model. One model may expose disable_safety_checker. Another may expose enable_safety_checker. Another may use safety_tolerance as a number. Another may use the same field as a string enum. Some models add safety_filter_level. Others expose a separate moderation mode. Some combine several of these. Some expose only one.
The result is that moderation is not a clean capability. It is a fragmented interface.
That matters because the inconsistency is not in the documentation. It is in the interface itself.
The real interface is inconsistent
A model can expose moderation through fields like these:
disable_safety_checker: boolean
enable_safety_checker: boolean
safety_tolerance: number
safety_tolerance: string
safety_filter_level: string
moderation: string
The inconsistency is not superficial. It exists at multiple levels at once.
First, the field names conflict. disable_safety_checker and enable_safety_checker represent opposite polarity. A strict moderation intent may require false in one model and true in another.
Second, the types conflict. safety_tolerance may be numeric in one model and a string enum in another.
Third, the semantics conflict. A boolean switch is not equivalent to a tolerance range, and neither is equivalent to a policy level such as block_low_and_above or a moderation mode such as auto.
Fourth, availability is inconsistent. Some models expose one moderation field, some expose several, and some expose combinations that have to be aligned manually.
This means there is no stable moderation surface at the model layer.
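To make the conflict concrete, here is a minimal sketch of two raw input shapes. The names `ModelAInput`, `ModelBInput`, and `checkerIsOn` are hypothetical and exist only for illustration; the field names are the ones listed above.

```typescript
// Hypothetical raw input shapes for two models (illustrative names only).
interface ModelAInput {
  disable_safety_checker?: boolean; // true means the checker is OFF
  safety_tolerance?: number;        // numeric tolerance
  safety_filter_level?: string;
  moderation?: string;
}

interface ModelBInput {
  enable_safety_checker?: boolean;  // true means the checker is ON
  safety_tolerance?: string;        // same field name, string enum
}

// The same "strict" intent written against each surface.
const strictA: ModelAInput = { disable_safety_checker: false, safety_tolerance: 1 };
const strictB: ModelBInput = { enable_safety_checker: true, safety_tolerance: '1' };

// Reports whether the safety checker is effectively on, whatever the polarity.
function checkerIsOn(raw: ModelAInput | ModelBInput): boolean | undefined {
  if ('enable_safety_checker' in raw) {
    return raw.enable_safety_checker;
  }
  if ('disable_safety_checker' in raw) {
    const disabled = raw.disable_safety_checker;
    return disabled === undefined ? undefined : !disabled;
  }
  return undefined; // this surface exposes no checker switch
}
```

Both `strictA` and `strictB` describe the same intent, yet they share no boolean field name and disagree on the type of `safety_tolerance`.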
Why direct exposure fails
A pass-through design looks simple: expose every raw field and let developers choose what they want.
That design fails immediately in a multi-model system. It pushes all inconsistency upward. The application now has to know which model expects inverted booleans, which one expects a string tolerance, which one wants a numeric range, and which one needs multiple moderation fields coordinated together.
At that point, the API is no longer abstracting anything.
It is just forwarding fragmentation.
A useful abstraction has to preserve intent. The same request should mean the same thing regardless of which model is ultimately selected.
The BabySea abstraction
BabySea defines a single control for moderation intent:
generation_moderation: boolean
The meaning is intentionally simple.
true  // moderation ON ➜ strict/safe
false // moderation OFF ➜ permissive
This is the only moderation input that matters at the BabySea layer. Everything else becomes derived behavior.
That choice is important. The goal is not to expose every moderation knob a model happens to have. The goal is to define one stable contract for the user, then translate it into model-specific configuration behind the scenes.
Mapping opposite boolean semantics
The most obvious inconsistency is polarity.
Some models represent moderation through a disabling field. In that case, strict moderation must invert the value:
export function moderationToDisableBoolean(input: SharedInputSchema): boolean | undefined {
  const val = input.generation_moderation as boolean | undefined;
  // No intent given: leave the raw field unset.
  if (val === undefined) return undefined;
  // Strict moderation (true) means the checker must NOT be disabled.
  return !val;
}
Other models represent moderation through an enabling field. In that case, the mapping is direct:
export function moderationToEnableBoolean(input: SharedInputSchema): boolean | undefined {
  return input.generation_moderation as boolean | undefined;
}
This looks trivial, but it is exactly the kind of inconsistency that creates subtle bugs when raw fields are exposed directly. A user should not need to know whether a model thinks in terms of enabling safety or disabling it.
Moderation is often multi-field, not single-field
The harder problem is that moderation often cannot be expressed through one value alone. A single strict or permissive intent may need to drive several fields at once.
In BabySea, the same generation_moderation input can map into:
{
  disable_safety_checker: moderationToDisableBoolean(input),
  safety_tolerance: moderationToSafetyTolerance(input, rawField),
  safety_filter_level: moderationToSafetyFilterLevel(input),
  moderation: moderationToModerationString(input),
}
Or, for another model surface:
{
  enable_safety_checker: moderationToEnableBoolean(input),
  safety_tolerance: moderationToSafetyToleranceString(input, rawField),
}
This is the key point: moderation is not normalized by renaming one field. It is normalized by coordinating several unrelated fields so they all represent the same user intent.
If moderation is strict, all derived values must point in the strict direction. If moderation is permissive, all derived values must point in the permissive direction. That coordination is the actual complexity of moderation.
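One way to keep that coordination in a single place is a per-model table of converters, all fed from the same input. The sketch below is an illustration under assumptions, not BabySea's actual code: `SharedInput`, `whenSet`, `MODEL_A_FIELDS`, `MODEL_B_FIELDS`, and `buildRawInput` are hypothetical names, and dropping fields when no intent is given is an assumed policy.

```typescript
// Shared input: the single public moderation control.
type SharedInput = { generation_moderation?: boolean };

// A converter derives one raw field value from the shared input.
type Converter = (input: SharedInput) => unknown;

// Apply `pick` only when an intent was actually provided.
const whenSet = <T>(pick: (strict: boolean) => T): Converter =>
  (input) =>
    input.generation_moderation === undefined
      ? undefined
      : pick(input.generation_moderation);

// Hypothetical field maps for two model surfaces (illustrative only).
const MODEL_A_FIELDS: Record<string, Converter> = {
  disable_safety_checker: whenSet((strict) => !strict),
  safety_filter_level: whenSet((strict) =>
    strict ? 'block_low_and_above' : 'block_only_high'),
  moderation: whenSet((strict) => (strict ? 'auto' : 'low')),
};

const MODEL_B_FIELDS: Record<string, Converter> = {
  enable_safety_checker: whenSet((strict) => strict),
};

// Build the raw payload so every moderation field derives from one intent.
function buildRawInput(
  fields: Record<string, Converter>,
  input: SharedInput,
): Record<string, unknown> {
  const raw: Record<string, unknown> = {};
  for (const [name, convert] of Object.entries(fields)) {
    const value = convert(input);
    // Assumption: when no intent is given, the raw field is omitted.
    if (value !== undefined) raw[name] = value;
  }
  return raw;
}
```

Because every field in a table reads the same boolean, one model cannot end up with a strict checker and a permissive filter level at the same time.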
Tolerance cannot be hardcoded
Tolerance is where many implementations become fragile. It is tempting to hardcode something like "strict means 1, permissive means 6."
That only works until a model uses a different range, or exposes tolerance as an enum instead of a number.
BabySea avoids that by deriving bounds from the schema itself:
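import { z } from 'zod';
// Reads the strict (min) and permissive (max) tolerance bounds from a field schema.
function extractSafetyBounds(schema: z.ZodTypeAny): { min: number; max: number } {
  // Peel off optional/default wrappers to reach the underlying type.
  let inner: z.ZodTypeAny = schema;
  while (inner instanceof z.ZodOptional || inner instanceof z.ZodDefault) {
    inner =
      inner instanceof z.ZodOptional
        ? inner.unwrap()
        : (inner as z.ZodDefault<z.ZodTypeAny>)._def.innerType;
  }
  // Numeric field: use its declared min/max checks, defaulting to 1..6.
  if (inner instanceof z.ZodNumber) {
    let min = 1;
    let max = 6;
    for (const check of (inner._def as { checks: Array<{ kind: string; value: number }> }).checks) {
      if (check.kind === 'min') min = check.value;
      if (check.kind === 'max') max = check.value;
    }
    return { min, max };
  }
  // String enum of numeric levels: parse them and take the smallest and largest.
  if (inner instanceof z.ZodEnum) {
    const nums = (inner._def.values as string[])
      .map(Number)
      .filter((n) => Number.isFinite(n)) // skip non-numeric members instead of returning NaN
      .sort((a, b) => a - b);
    if (nums.length > 0) return { min: nums[0]!, max: nums[nums.length - 1]! };
  }
  // Unknown shape or non-numeric enum: fall back to the default range.
  return { min: 1, max: 6 };
}
This matters because it makes moderation schema-driven rather than assumption-driven. When the field definition declares bounds, the converter reads the strict and permissive values from it instead of guessing. The 1 to 6 default applies only when the schema declares none.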
From there, the mapping becomes deterministic:
export function moderationToSafetyTolerance(input: SharedInputSchema, rawField: z.ZodTypeAny): number | undefined {
  const val = input.generation_moderation as boolean | undefined;
  if (val === undefined) return undefined;
  const { min, max } = extractSafetyBounds(rawField);
  return val ? min : max;
}
And for models that expect the same concept as a string:
export function moderationToSafetyToleranceString(input: SharedInputSchema, rawField: z.ZodTypeAny): string | undefined {
  const val = input.generation_moderation as boolean | undefined;
  if (val === undefined) return undefined;
  const { min, max } = extractSafetyBounds(rawField);
  return val ? String(min) : String(max);
}
That is a much stronger design than hardcoding model assumptions into business logic. It lets the schema remain the source of truth.
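The same idea can be shown without a schema library. The sketch below swaps the Zod schema for a hypothetical plain descriptor, `FieldSpec`, so it runs on its own; `boundsOf` and `toleranceFor` are illustrative names, and the 1 to 6 fallback mirrors the default used above.

```typescript
// Hypothetical stand-in for a field schema: a numeric range or a string enum.
type FieldSpec =
  | { kind: 'number'; min?: number; max?: number }
  | { kind: 'enum'; values: string[] };

// Fallback range used only when the descriptor declares no bounds.
const DEFAULT_BOUNDS = { min: 1, max: 6 };

// Derive the strictest (min) and most permissive (max) tolerance from the descriptor.
function boundsOf(spec: FieldSpec): { min: number; max: number } {
  if (spec.kind === 'number') {
    return {
      min: spec.min ?? DEFAULT_BOUNDS.min,
      max: spec.max ?? DEFAULT_BOUNDS.max,
    };
  }
  // Enum members are strings such as '1'..'5'; ignore any that are not numeric.
  const nums = spec.values
    .map(Number)
    .filter((n) => Number.isFinite(n))
    .sort((a, b) => a - b);
  if (nums.length === 0) return { ...DEFAULT_BOUNDS };
  return { min: nums[0]!, max: nums[nums.length - 1]! };
}

// Map the boolean intent to a tolerance value in the field's own type.
function toleranceFor(
  moderation: boolean | undefined,
  spec: FieldSpec,
): number | string | undefined {
  if (moderation === undefined) return undefined;
  const { min, max } = boundsOf(spec);
  const value = moderation ? min : max;
  return spec.kind === 'enum' ? String(value) : value;
}
```

With a numeric field declared as 0 to 10, strict resolves to `0`; with an enum of `'1'` through `'5'`, permissive resolves to `'5'`. Neither value appears anywhere in the mapping logic itself.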
Semantic moderation fields also need normalization
Not all moderation controls are numeric. Some are semantic levels.
BabySea maps those too:
export function moderationToSafetyFilterLevel(input: SharedInputSchema): string | undefined {
  const val = input.generation_moderation as boolean | undefined;
  if (val === undefined) return undefined;
  return val ? 'block_low_and_above' : 'block_only_high';
}
export function moderationToModerationString(input: SharedInputSchema): string | undefined {
  const val = input.generation_moderation as boolean | undefined;
  if (val === undefined) return undefined;
  return val ? 'auto' : 'low';
}
These strings are model-level implementation details. They should not leak into the public API as if they were universal moderation concepts. They are just one more representation that has to be normalized.
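If more semantic fields appear over time, the strict and permissive strings can live in one lookup table rather than in one function per field. This is a sketch of that alternative, not the approach shown above; `SEMANTIC_LEVELS` and `semanticValue` are hypothetical names, and the string values are the ones already used in the two converters.

```typescript
// Strict and permissive values for each semantic field, kept as data.
const SEMANTIC_LEVELS = {
  safety_filter_level: { strict: 'block_low_and_above', permissive: 'block_only_high' },
  moderation: { strict: 'auto', permissive: 'low' },
} as const;

type SemanticField = keyof typeof SEMANTIC_LEVELS;

// Resolve the raw string for a field from the boolean intent.
function semanticValue(
  field: SemanticField,
  moderation: boolean | undefined,
): string | undefined {
  if (moderation === undefined) return undefined; // no intent, no value
  const intent = moderation ? 'strict' : 'permissive';
  return SEMANTIC_LEVELS[field][intent];
}
```

The table makes it easy to review every model-specific string in one place, and the `SemanticField` type rejects field names that have no mapping.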
What the public contract becomes
Once all of these mappings are in place, the public moderation surface becomes stable.
The user sees this:
generation_moderation: true
Internally, one model may receive:
{
  "disable_safety_checker": false,
  "safety_tolerance": 1,
  "safety_filter_level": "block_low_and_above",
  "moderation": "auto"
}
Another may receive:
{
  "enable_safety_checker": true,
  "safety_tolerance": "1"
}
The raw fields differ. The types differ. The polarity differs. But the intent is identical.
That is the value of normalization.
Why this matters
This is not just about moderation. It reflects a broader reality of generative media infrastructure: model interfaces are not stable, not uniform, and not designed around shared abstractions.
If an orchestration layer simply forwards raw fields, it inherits that fragmentation and passes it directly to every customer.
A pass-through API is a transport layer for inconsistency.
If the goal is to provide a real platform contract, the inconsistency has to be absorbed above the model layer.
Moderation is one of the clearest examples because the fragmentation is visible immediately: enable versus disable, number versus string, tolerance versus policy mode. But the same pattern shows up in many other parts of the stack.
The correct response is not to document the inconsistency better. The correct response is to normalize it above the model layer.
Conclusion
Moderation is not a field. It is not a toggle. It is part of execution itself.
If safety is defined only at the interface, it will break under real-world conditions: across providers, across models, and across time.
What has to be built is not validation at the edge, but enforcement throughout the lifecycle.
At BabySea, moderation is not something attached to a request.
It is something the system carries through execution.