Guardrails (ReactPlanner safety layer)¶
What it is / when to use it¶
PenguiFlow guardrails are a policy enforcement layer that can inspect and constrain:
- user input before the LLM plans (
llm_before) - tool execution (
tool_call_start/tool_call_result) - streamed text output (
llm_stream_chunk)
Use guardrails when you need:
- tool allowlists/denylists enforced outside the prompt,
- secret redaction in streamed output,
- “fail closed” behavior for high-risk actions (STOP / PAUSE),
- a policy pack you can audit and version.
This page focuses on guardrails as used by ReactPlanner. ToolNode also has its own hardening controls; see Tooling.
Non-goals / boundaries¶
- Guardrails are not a replacement for
ToolPolicy/ tool visibility. Use those for primary access control. - Guardrails do not make unsafe tools “safe”. They are a control plane, not a sandbox.
- Guardrails are not a full content moderation product; policy packs are intentionally small and composable.
Contract surface¶
Wiring guardrails into ReactPlanner¶
Guardrails are off by default. Enable them by passing a gateway at construction:
ReactPlanner(..., guardrail_gateway=gateway)- optionally:
guardrail_conversation_history_turns=N(adds memory turns to the guardrail context payload)
When enabled, the planner can enforce these actions:
STOP: terminate (or force-safe finish) with a user-safe messagePAUSE: trigger HITL approval (planner pauses withreason="approval_required")RETRY: re-run the LLM call with corrective instructionsREDACT: redact text from streamed output and tool resultsALLOW: do nothing
Guardrail events and decisions¶
The gateway evaluates a GuardrailEvent and returns a GuardrailDecision.
Key fields:
GuardrailEvent.event_type: e.g.llm_before,tool_call_start,tool_call_result,llm_stream_chunkGuardrailEvent.text_content: user input or streamed text (when applicable)GuardrailEvent.tool_name/tool_args: tool call metadata (when applicable)GuardrailEvent.payload: extra structured context (tool_call_id, action_seq, conversation history, etc.)
Decisions include:
action:ALLOW | REDACT | RETRY | PAUSE | STOPrule_id: stable identifier for policy auditingseverity/confidence- optional action-specific payloads:
redactions: tuple[RedactionSpec, ...]retry: RetrySpec(max_attempts, corrective_message)pause: PauseSpec(scope, approver_roles, prompt, timeout_s)stop: StopSpec(error_code, user_message, internal_reason)
Modes: shadow vs enforce¶
Guardrail gateway config controls enforcement:
GatewayConfig(mode="shadow"): evaluate and log decisions, but do not block executionGatewayConfig(mode="enforce"): apply STOP/PAUSE/RETRY/REDACT behavior
Tip
Start in shadow mode in production to measure rule hit rates, then roll to enforce.
Sync vs async rules (latency control)¶
Rules are classified by cost:
- FAST rules are evaluated synchronously (on the request path)
- DEEP rules can be evaluated asynchronously via a
SteeringGuardInbox
Gateway options you will tune:
sync_timeout_ms(default 15ms)sync_parallel(run sync rules concurrently)sync_fail_open/async_fail_open(availability vs safety tradeoff)
Observation size “guardrail” (reliability safety net)¶
Separately from the guardrail gateway, ReactPlanner applies an observation clamp to prevent context overflow:
ReactPlanner(..., observation_guardrail=ObservationGuardrailConfig(...))- default: enabled with
ObservationGuardrailConfig()
If a tool produces an overly large JSON observation, the planner will:
1) try to store it as an artifact (if an ArtifactStore is available) and return an artifact reference, otherwise
2) truncate it (optionally preserving JSON structure)
This is a reliability guardrail, not a policy decision system.
Operational defaults (recommended)¶
- Use
ToolPolicy/ tool visibility as your primary access control. - Run guardrails in
shadowmode first, then enforce: - enforce tool allowlists (STOP on unauthorized tools)
- enforce secret redaction (REDACT on streamed output/tool results)
- Keep sync rules small and deterministic; push high-latency checks to async rules.
Tradeoff guidance:
sync_fail_open=Falseis safer (a guardrail timeout blocks), but can impact availability.sync_fail_open=Trueimproves availability, but can allow policy bypass on timeouts.
Failure modes & recovery¶
Guardrails “stopping everything”¶
Symptoms
- planner finishes early with a generic safe message, or tool calls return
guardrail_stop:*
Fix
- check which rule triggered (
rule_id) and whether you are inshadoworenforce - verify your allowlist/denylist config (especially when catalogs are tenant-scoped)
Redaction doesn’t apply to streamed output¶
Likely causes
stream_final_response=False(no LLM streaming path)- guardrails aren’t enabled (no
guardrail_gateway) - gateway is in
shadowmode (decision logged but not applied)
Observation clamp triggers unexpectedly¶
Symptoms
- tool observations become
{artifact: ..., preview: ...}or truncated stubs
Fix
- ensure ToolNode artifact extraction is configured correctly (large outputs should be artifacts)
- increase
ObservationGuardrailConfig.max_observation_charsonly if you have token budget headroom
Observability¶
Recommended signals:
- log guardrail decisions (action, rule_id, severity, confidence) without raw content
- record
PlannerEvent(event_type="guardrail_retry")counts (retries can hide a failing model/prompt) - alert on repeated
guardrail_stop:*outcomes for a tenant (misconfig or abuse)
Useful places to look:
PlannerEvent(event_type="tool_call_result").extra["result_json"]may include afailure.guardrailpayload on STOP/PAUSE.- streamed output seen in
PlannerEvent(event_type="llm_stream_chunk")is already post-redaction (when enforced).
Security / multi-tenancy notes¶
- Guardrails should not rely on “prompt compliance”. Enforce the policy in code.
- Prefer per-tenant configuration (tool visibility + allowlists) instead of global rules.
- Never log raw prompts/tool results when debugging guardrails; log decision metadata plus stable correlation ids.
Runnable examples¶
Guardrails examples exist under examples/guardrails/:
uv run python examples/guardrails/huggingface/flow.pyuv run python examples/guardrails/scope_classifier/flow.py
Troubleshooting checklist¶
- Is
ReactPlanner(..., guardrail_gateway=...)set (guardrails are disabled otherwise)? - Is
GatewayConfig.modeset toenforcewhen you expect blocking/redaction? - Do your rules list the correct
supports_event_typesfor the events you care about? - Are timeouts too aggressive (
sync_timeout_ms) causing unexpected STOP (fail-closed) behavior? - Are you confusing policy guardrails with the observation clamp (
ObservationGuardrailConfig)?