Moderation is not one field
Date: April 1, 2026
By: Randy Aries Saputra
When people discuss moderation in generative media, they often describe it as if there is a single standard control: turn safety on, turn safety off, maybe adjust one tolerance value.
That is not how real model interfaces work.
In practice, moderation is exposed through a scattered set of fields that vary from model to model. One model exposes disable_safety_checker; another exposes enable_safety_checker. One uses safety_tolerance as a number; another uses the same field name as a string enum. Some models add safety_filter_level; others expose a separate moderation mode. Some combine several of these controls, and some expose only one. The result is that moderation is not a clean capability. It is a fragmented interface.
The inconsistency is not in the documentation; it is in the interface itself.
The real interface is inconsistent
A model can expose moderation through fields like these:
disable_safety_checker: boolean
enable_safety_checker: boolean
safety_tolerance: number
safety_tolerance: string
safety_filter_level: string
moderation: string

The inconsistency is not superficial. It exists at multiple levels at once.
First, the field names conflict. disable_safety_checker and enable_safety_checker represent opposite polarity. A strict moderation intent may require false in one model and true in another.
Second, the types conflict. safety_tolerance may be numeric in one model and a string enum in another.
Third, the semantics conflict. A boolean switch is not equivalent to a tolerance range, and neither is equivalent to a policy level such as block_low_and_above or a moderation mode such as auto.
Fourth, availability is inconsistent. Some models expose one moderation field, some expose several, some expose combinations that have to be aligned manually.
This means there is no stable moderation surface at the model layer.
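To make the fragmentation concrete, here is a sketch of how the same strict intent has to be written against different surfaces. The model groupings are illustrative, not real model schemas; only the field names come from the list above.

```typescript
// Hypothetical raw surfaces, one per fragmentation pattern described above.
type ModelA = { disable_safety_checker?: boolean }; // inverted polarity
type ModelB = { enable_safety_checker?: boolean };  // direct polarity
type ModelC = { safety_tolerance?: number };        // numeric range, e.g. 1..6
type ModelD = { safety_tolerance?: string };        // same concept as a string enum

// The same "strict" intent, written four different ways:
const strictA: ModelA = { disable_safety_checker: false };
const strictB: ModelB = { enable_safety_checker: true };
const strictC: ModelC = { safety_tolerance: 1 };
const strictD: ModelD = { safety_tolerance: '1' };
```

Four payloads, one intent: that is the surface an application would have to target directly without a normalization layer.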
Why direct exposure fails
A pass-through design looks simple: expose every raw field and let developers choose what they want.
That design fails immediately in a multi-model system. It pushes all inconsistency upward. The application now has to know which model expects inverted booleans, which one expects a string tolerance, which one wants a numeric range, and which one needs multiple moderation fields coordinated together.
At that point, the API is no longer abstracting anything. It is just forwarding fragmentation.
A useful abstraction has to preserve intent. The same request should mean the same thing regardless of which model is ultimately selected.
The BabySea abstraction
We collapse all of that into one field:
generation_moderation: boolean

The meaning is intentionally simple.
true // moderation ON → strict / safe
false // moderation OFF → permissive

This is the only moderation input that matters at the BabySea layer. Everything else becomes derived behavior.
That choice is important. We are not trying to expose every moderation knob a model happens to have. We are defining one stable contract for the user, then translating it into model-specific configuration behind the scenes.
Mapping opposite boolean semantics
The most obvious inconsistency is polarity.
Some models represent moderation through a disabling field. In that case, strict moderation must invert the value:
export function moderationToDisableBoolean(input: SharedInputSchema): boolean | undefined {
  const val = input.generation_moderation as boolean | undefined;
  if (val === undefined) return undefined;
  return !val;
}

Other models represent moderation through an enabling field. In that case, the mapping is direct:
export function moderationToEnableBoolean(input: SharedInputSchema): boolean | undefined {
  return input.generation_moderation as boolean | undefined;
}

This looks trivial, but it is exactly the sort of inconsistency that creates subtle bugs when raw fields are exposed directly. A user should not need to know whether a model thinks in terms of enabling safety or disabling it.
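Stripped-down stand-ins for the two converters (taking the boolean intent directly rather than the full SharedInputSchema object) make the polarity hazard easy to check:

```typescript
// Simplified versions of the two polarity mappers above.
const toDisable = (v?: boolean) => (v === undefined ? undefined : !v);
const toEnable = (v?: boolean) => v;

// The same strict intent yields opposite raw values:
toDisable(true);  // → false  (safety checker NOT disabled)
toEnable(true);   // → true   (safety checker enabled)

// Unset intent passes through untouched in both cases:
toDisable(undefined); // → undefined
```

Swapping the two mappers would silently flip a model between strict and permissive, which is why the inversion lives in one converter rather than in application code.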
Moderation is often multi-field, not single-field
The harder problem is that moderation often cannot be expressed through one value alone. A single strict/permissive intent may need to drive several fields at once.
In BabySea, the same generation_moderation input can map into:
{
  disable_safety_checker: moderationToDisableBoolean(input),
  safety_tolerance: moderationToSafetyTolerance(input, rawField),
  safety_filter_level: moderationToSafetyFilterLevel(input),
  moderation: moderationToModerationString(input),
}

Or, for another model surface:
{
  enable_safety_checker: moderationToEnableBoolean(input),
  safety_tolerance: moderationToSafetyToleranceString(input, rawField),
}

This is the key point: moderation is not normalized by renaming one field. It is normalized by coordinating several unrelated fields so they all represent the same user intent.
If moderation is strict, all derived values must point in the strict direction. If moderation is permissive, all derived values must point in the permissive direction. That coordination is the actual complexity of moderation.
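A minimal sketch of that coordination, with everything derived from the single intent. The constant values are the ones used later in this post; the payload shape itself is illustrative:

```typescript
type Payload = {
  disable_safety_checker?: boolean;
  safety_filter_level?: string;
  moderation?: string;
};

// Every derived field is computed from the same boolean; if any one of them
// were set independently, the payload could become internally contradictory.
function toModelPayload(generation_moderation?: boolean): Payload {
  if (generation_moderation === undefined) return {};
  const strict = generation_moderation;
  return {
    disable_safety_checker: !strict,
    safety_filter_level: strict ? 'block_low_and_above' : 'block_only_high',
    moderation: strict ? 'auto' : 'low',
  };
}

toModelPayload(true);
// → { disable_safety_checker: false,
//     safety_filter_level: 'block_low_and_above',
//     moderation: 'auto' }
```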
Tolerance cannot be hardcoded
Tolerance is where many implementations become fragile. It is tempting to hardcode something like “strict means 1, permissive means 6.”
That only works until a model uses a different range, or exposes tolerance as an enum instead of a number.
BabySea avoids that by deriving bounds from the schema itself:
function extractSafetyBounds(schema: z.ZodTypeAny): { min: number; max: number } {
  // Unwrap Optional/Default wrappers to reach the underlying number or enum schema.
  let inner: z.ZodTypeAny = schema;
  while (inner instanceof z.ZodOptional || inner instanceof z.ZodDefault) {
    inner =
      inner instanceof z.ZodOptional
        ? inner.unwrap()
        : (inner as z.ZodDefault<z.ZodTypeAny>)._def.innerType;
  }
  // Numeric tolerance: read min/max from the schema's validation checks.
  if (inner instanceof z.ZodNumber) {
    let min = 1;
    let max = 6;
    for (const check of (inner._def as { checks: Array<{ kind: string; value: number }> }).checks) {
      if (check.kind === 'min') min = check.value;
      if (check.kind === 'max') max = check.value;
    }
    return { min, max };
  }
  // String-enum tolerance: derive bounds from the numeric enum values.
  if (inner instanceof z.ZodEnum) {
    const nums = (inner._def.values as string[]).map(Number).sort((a, b) => a - b);
    return { min: nums[0]!, max: nums[nums.length - 1]! };
  }
  return { min: 1, max: 6 };
}

This matters because it makes moderation schema-driven rather than assumption-driven. The converter does not guess the valid strict and permissive bounds. It reads them from the actual field definition.
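The same idea works for any schema representation, not just zod. A dependency-free sketch over a JSON-Schema-like field shape (the shape is an assumption for illustration, not BabySea's actual metadata format):

```typescript
type NumberField = { type: 'number'; minimum?: number; maximum?: number };
type EnumField = { type: 'enum'; values: string[] };

// Read strict/permissive bounds from the field definition instead of
// hardcoding them; fall back to 1..6 only when the schema is silent.
function extractBounds(field: NumberField | EnumField): { min: number; max: number } {
  if (field.type === 'number') {
    return { min: field.minimum ?? 1, max: field.maximum ?? 6 };
  }
  const nums = field.values.map(Number).sort((a, b) => a - b);
  return { min: nums[0]!, max: nums[nums.length - 1]! };
}

extractBounds({ type: 'number', minimum: 0, maximum: 5 }); // → { min: 0, max: 5 }
extractBounds({ type: 'enum', values: ['1', '2', '6'] });  // → { min: 1, max: 6 }
```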
From there, the mapping becomes deterministic:
export function moderationToSafetyTolerance(input: SharedInputSchema, rawField: z.ZodTypeAny): number | undefined {
  const val = input.generation_moderation as boolean | undefined;
  if (val === undefined) return undefined;
  const { min, max } = extractSafetyBounds(rawField);
  return val ? min : max;
}

And for models that expect the same concept as a string:
export function moderationToSafetyToleranceString(input: SharedInputSchema, rawField: z.ZodTypeAny): string | undefined {
  const val = input.generation_moderation as boolean | undefined;
  if (val === undefined) return undefined;
  const { min, max } = extractSafetyBounds(rawField);
  return val ? String(min) : String(max);
}

That is a much stronger design than hardcoding model assumptions into business logic. It lets the schema remain the source of truth.
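The payoff is that the same intent lands correctly in differently-ranged models. A sketch with the schema-derived bounds passed in directly:

```typescript
// strict → lowest tolerance, permissive → highest, whatever the range is.
const tolerance = (strict: boolean, min: number, max: number) => (strict ? min : max);

tolerance(true, 1, 6);  // → 1  (model with a 1..6 range)
tolerance(true, 0, 5);  // → 0  (model with a 0..5 range; a hardcoded 1 would be wrong here)
tolerance(false, 1, 6); // → 6
```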
Semantic moderation fields also need normalization
Not all moderation controls are numeric. Some are semantic levels.
BabySea maps those too:
export function moderationToSafetyFilterLevel(input: SharedInputSchema): string | undefined {
  const val = input.generation_moderation as boolean | undefined;
  if (val === undefined) return undefined;
  return val ? 'block_low_and_above' : 'block_only_high';
}

export function moderationToModerationString(input: SharedInputSchema): string | undefined {
  const val = input.generation_moderation as boolean | undefined;
  if (val === undefined) return undefined;
  return val ? 'auto' : 'low';
}

These strings are model-level implementation details. They should not leak into the public API as if they were universal moderation concepts. They are just one more representation that has to be normalized.
What the public contract becomes
Once all of these mappings are in place, the public moderation surface becomes stable.
The user sees this:
generation_moderation: true

Internally, one model may receive:
{
  "disable_safety_checker": false,
  "safety_tolerance": 1,
  "safety_filter_level": "block_low_and_above",
  "moderation": "auto"
}

Another may receive:
{
  "enable_safety_checker": true,
  "safety_tolerance": "1"
}

The raw fields differ. The types differ. The polarity differs. But the intent is identical.
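One way to see the equivalence is to invert the mapping: decode each payload back to intent and check that they agree. A sketch, with the payload shapes trimmed down to the polarity fields:

```typescript
// Recover the user-level intent from each raw payload shape.
const intentFromA = (p: { disable_safety_checker: boolean }) => !p.disable_safety_checker;
const intentFromB = (p: { enable_safety_checker: boolean }) => p.enable_safety_checker;

intentFromA({ disable_safety_checker: false }); // → true (strict)
intentFromB({ enable_safety_checker: true });   // → true (strict)
```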
That is the value of normalization.
Why this matters
This is not just about moderation. It reflects a broader reality of generative media infrastructure: model interfaces are not stable, not uniform, and not designed around shared abstractions.
If an orchestration layer simply forwards raw fields, it inherits that fragmentation and passes it directly to every customer. If it wants to provide a real platform contract, it has to absorb that inconsistency itself.
Moderation is one of the clearest examples because the fragmentation is visible immediately: enable versus disable, number versus string, tolerance versus policy mode. But the same pattern shows up in many other parts of the stack.
The correct response is not to document the inconsistency better. The correct response is to normalize it above the model layer.
Conclusion
Moderation is not one field. It is not one toggle. It is not even one concept at the model interface.
It is a cluster of inconsistent controls that vary in polarity, type, semantics, and availability.
At BabySea, we solve that by defining a single public intent, generation_moderation, and translating it into model-specific configuration through schema-aware converters. Boolean inversion, direct passthrough, numeric tolerance, string tolerance, filter levels, and moderation modes are all derived from the same source of truth.
That is what makes moderation predictable.
Not exposing every raw field.
Standardizing the layer above them.