Native LLM layer (planner integration)¶
What it is / when to use it¶
PenguiFlow includes a native LLM implementation (penguiflow.llm) and a compatibility adapter (NativeLLMAdapter) that implements the planner’s JSONLLMClient protocol.
Use the native LLM layer when you want:
- provider-specific correctness fixes (message normalization, schema normalization),
- structured output fallback behavior (e.g. schema mode downgrade when a provider rejects a schema),
- optional native reasoning extraction and reasoning streaming callbacks,
- a single internal implementation surface you can instrument and test.
Non-goals / boundaries¶
- This page does not document every provider’s environment variables or auth story.
- The native layer is not required;
ReactPlanner(llm="...")(LiteLLM-backed) remains supported. - The planner still consumes a
JSONLLMClientprotocol; native integration is an implementation detail behind that protocol.
Contract surface¶
Enabling native LLM on ReactPlanner¶
Set:
ReactPlanner(..., use_native_llm=True)
Then pass:
llm="provider/model"orllm={"model": "...", ...}
When enabled, the planner internally builds a native adapter via:
penguiflow.llm.create_native_adapter(...)
create_native_adapter(...) input shape¶
create_native_adapter accepts:
- a string model id:
"openai/gpt-4o"(example format), or - a config mapping:
- required:
{"model": "..."} - optional:
{"api_key": "...", "base_url": "...", ...}
Any extra keys in the mapping are forwarded as provider kwargs.
Structured output and schema mode behavior¶
The planner requests structured output through response_format (commonly json_schema).
The native adapter:
- normalizes schemas for stricter validators (e.g. ensures object roots),
- may downgrade structured mode when a provider rejects a schema (e.g.
json_schema → json_object → text).
If your provider frequently downgrades, you will see increased invalid JSON/repair traffic at the planner layer.
Native reasoning integration¶
Planner configuration:
use_native_reasoning=True(default)reasoning_effort="low" | "medium" | "high" | None(provider/model dependent)
When enabled and supported, the adapter can call the planner’s reasoning callback during streaming. This is used only for observability; the planner action schema remains JSON-only.
Operational defaults (recommended)¶
- Keep
temperature=0.0for planners. - Keep
json_schema_mode=Trueunless your provider cannot handle it. - Set aggressive timeouts (
llm_timeout_s) and bounded retries (llm_max_retries) per SLO. - Enable
stream_final_response=Trueonly when you have an event sink that can handle chunk volume safely.
Failure modes & recovery¶
Provider rejects response_format / JSON schema¶
Symptoms
- repeated invalid JSON repairs
- logs indicating response_format downgrade
Fix
- simplify tool schemas (fewer nested objects, fewer unions)
- reduce catalog size and ambiguity
- if needed, set
json_schema_mode=Falsetemporarily (expect more repair traffic)
System/developer message ordering issues¶
Some providers are sensitive to system messages not being first. The native adapter applies provider-specific normalization, but you should still keep system prompts consistent and avoid dynamic system-message insertion outside the planner.
Reasoning callback never fires¶
Likely causes
- provider/model does not support native reasoning
use_native_reasoning=False- streaming disabled for the call path
Observability¶
At minimum:
- record
PlannerEvent(event_type="llm_call")latency and retry counts - record whether schema mode was requested (
json_schema_mode) and whether streaming was enabled - for incident debugging, log only redacted prompt/response summaries (not raw content)
Security / multi-tenancy notes¶
- Do not place API keys or credentials into
llm_context. - Prefer environment-variable based secrets and inject them into provider config at construction time (or via
tool_context-held factories). - Treat model/provider configuration as tenant-controlled only if you have explicit allowlists; otherwise it becomes an injection vector.
Runnable examples¶
Native LLM usage requires provider credentials. Minimal planner example:
from __future__ import annotations
from pydantic import BaseModel
from penguiflow import ModelRegistry, Node
from penguiflow.catalog import build_catalog, tool
from penguiflow.planner import ReactPlanner, ToolContext
class EchoArgs(BaseModel):
text: str
class EchoOut(BaseModel):
text: str
@tool(desc="Echo", side_effects="pure")
async def echo(args: EchoArgs, ctx: ToolContext) -> EchoOut:
del ctx
return EchoOut(text=args.text)
def build_planner() -> ReactPlanner:
registry = ModelRegistry()
registry.register("echo", EchoArgs, EchoOut)
catalog = build_catalog([Node(echo, name="echo")], registry)
return ReactPlanner(
llm={"model": "openai/gpt-4o"},
catalog=catalog,
use_native_llm=True,
json_schema_mode=True,
temperature=0.0,
)
Troubleshooting checklist¶
- Did you set
use_native_llm=True(otherwise the LiteLLM path is used)? - Are your model ids provider-qualified consistently (
openai/...,anthropic/..., etc.)? - Are you seeing schema downgrade logs (indicating provider incompatibility with the schema)?
- Are you relying on native reasoning content for behavior (don’t; it is observability-only)?