Skip to content

Native LLM layer (planner integration)

What it is / when to use it

PenguiFlow includes a native LLM implementation (penguiflow.llm) and a compatibility adapter (NativeLLMAdapter) that implements the planner’s JSONLLMClient protocol.

Use the native LLM layer when you want:

  • provider-specific correctness fixes (message normalization, schema normalization),
  • structured output fallback behavior (e.g. schema mode downgrade when a provider rejects a schema),
  • optional native reasoning extraction and reasoning streaming callbacks,
  • a single internal implementation surface you can instrument and test.

Non-goals / boundaries

  • This page does not document every provider’s environment variables or auth story.
  • The native layer is not required; ReactPlanner(llm="...") (LiteLLM-backed) remains supported.
  • The planner still consumes a JSONLLMClient protocol; native integration is an implementation detail behind that protocol.

Contract surface

Enabling native LLM on ReactPlanner

Set:

  • ReactPlanner(..., use_native_llm=True)

Then pass:

  • llm="provider/model" or llm={"model": "...", ...}

When enabled, the planner internally builds a native adapter via:

  • penguiflow.llm.create_native_adapter(...)

create_native_adapter(...) input shape

create_native_adapter accepts:

  • a string model id: "openai/gpt-4o" (example format), or
  • a config mapping:
  • required: {"model": "..."}
  • optional: {"api_key": "...", "base_url": "...", ...}

Any extra keys in the mapping are forwarded as provider kwargs.

Structured output and schema mode behavior

The planner requests structured output through response_format (commonly json_schema).

The native adapter:

  • normalizes schemas for stricter validators (e.g. ensures object roots),
  • may downgrade structured mode when a provider rejects a schema (e.g. json_schema → json_object → text).

If your provider frequently downgrades, you will see increased invalid JSON/repair traffic at the planner layer.

Native reasoning integration

Planner configuration:

  • use_native_reasoning=True (default)
  • reasoning_effort="low" | "medium" | "high" | None (provider/model dependent)

When enabled and supported, the adapter can call the planner’s reasoning callback during streaming. This is used only for observability; the planner action schema remains JSON-only.

  • Keep temperature=0.0 for planners.
  • Keep json_schema_mode=True unless your provider cannot handle it.
  • Set aggressive timeouts (llm_timeout_s) and bounded retries (llm_max_retries) per SLO.
  • Enable stream_final_response=True only when you have an event sink that can handle chunk volume safely.

Failure modes & recovery

Provider rejects response_format / JSON schema

Symptoms

  • repeated invalid JSON repairs
  • logs indicating response_format downgrade

Fix

  • simplify tool schemas (fewer nested objects, fewer unions)
  • reduce catalog size and ambiguity
  • if needed, set json_schema_mode=False temporarily (expect more repair traffic)

System/developer message ordering issues

Some providers are sensitive to system messages not being first. The native adapter applies provider-specific normalization, but you should still keep system prompts consistent and avoid dynamic system-message insertion outside the planner.

Reasoning callback never fires

Likely causes

  • provider/model does not support native reasoning
  • use_native_reasoning=False
  • streaming disabled for the call path

Observability

At minimum:

  • record PlannerEvent(event_type="llm_call") latency and retry counts
  • record whether schema mode was requested (json_schema_mode) and whether streaming was enabled
  • for incident debugging, log only redacted prompt/response summaries (not raw content)

See Planner observability.

Security / multi-tenancy notes

  • Do not place API keys or credentials into llm_context.
  • Prefer environment-variable based secrets and inject them into provider config at construction time (or via tool_context-held factories).
  • Treat model/provider configuration as tenant-controlled only if you have explicit allowlists; otherwise it becomes an injection vector.

Runnable examples

Native LLM usage requires provider credentials. Minimal planner example:

from __future__ import annotations

from pydantic import BaseModel

from penguiflow import ModelRegistry, Node
from penguiflow.catalog import build_catalog, tool
from penguiflow.planner import ReactPlanner, ToolContext


class EchoArgs(BaseModel):
    text: str


class EchoOut(BaseModel):
    text: str


@tool(desc="Echo", side_effects="pure")
async def echo(args: EchoArgs, ctx: ToolContext) -> EchoOut:
    del ctx
    return EchoOut(text=args.text)


def build_planner() -> ReactPlanner:
    registry = ModelRegistry()
    registry.register("echo", EchoArgs, EchoOut)
    catalog = build_catalog([Node(echo, name="echo")], registry)

    return ReactPlanner(
        llm={"model": "openai/gpt-4o"},
        catalog=catalog,
        use_native_llm=True,
        json_schema_mode=True,
        temperature=0.0,
    )

Troubleshooting checklist

  • Did you set use_native_llm=True (otherwise the LiteLLM path is used)?
  • Are your model ids provider-qualified consistently (openai/..., anthropic/..., etc.)?
  • Are you seeing schema downgrade logs (indicating provider incompatibility with the schema)?
  • Are you relying on native reasoning content for behavior (don’t; it is observability-only)?