Wire the LLM provider
Bifrost is Harbor's LLM driver: one wire surface that speaks many providers. You don't change Go code to swap providers; you change the llm: block in harbor.yaml. This skill walks through the four common postures, the dev-mock escape hatch, and the model_profiles block the planner needs for context budgeting.
1. The four canonical postures
Posture A — OpenRouter (aggregator, easiest start)
    llm:
      driver: bifrost
      provider: openrouter
      model: anthropic/claude-haiku-4.5
      api_key: env.OPENROUTER_API_KEY
      timeout: 60s
      model_profiles:
        anthropic/claude-haiku-4.5:
          context_window_tokens: 200000

OpenRouter aggregates 100+ models behind one API key. Best for prototyping ("does this model work for my agent?") and for production agents that want provider failover without bespoke wiring. Pricing is per-token, slightly above raw provider list price. Get a key at openrouter.ai.
Posture B — Anthropic direct
    llm:
      driver: bifrost
      provider: anthropic
      model: claude-haiku-4.5
      api_key: env.ANTHROPIC_API_KEY
      timeout: 60s
      model_profiles:
        claude-haiku-4.5:
          context_window_tokens: 200000

Direct API access: usually the cheapest per token, with the lowest latency. Best when you've committed to Anthropic. Get a key at console.anthropic.com.
Posture C — OpenAI direct
    llm:
      driver: bifrost
      provider: openai
      model: gpt-4.1-mini
      api_key: env.OPENAI_API_KEY
      timeout: 60s
      model_profiles:
        gpt-4.1-mini:
          context_window_tokens: 1000000

Same posture for OpenAI's API. Get a key at platform.openai.com.
Posture D — Custom OpenAI-compatible endpoint (NIM, vLLM, ollama, …)
    llm:
      driver: bifrost
      provider: nim
      model: nvidia/nemotron-3-super
      timeout: 60s
      custom_providers:
        - name: nim
          base_url: https://integrate.api.nvidia.com/v1
          api_key_env_var: NVIDIA_API_KEY
          models: [nvidia/nemotron-3-super]
      model_profiles:
        nvidia/nemotron-3-super:
          context_window_tokens: 128000

Use this for any provider that exposes an OpenAI-compatible endpoint: NVIDIA NIM, vLLM serving, ollama for local LLMs, Together AI, Anyscale, Groq, Mistral, Cohere, Bedrock-compatible gateways. The custom_providers block tells Bifrost how to reach it; the provider: field references the entry's name. Multiple custom providers can coexist: pick one as the active provider: and keep the others registered so you can swap to them.
2. model_profiles — the budgeting contract
model_profiles.<model>.context_window_tokens is what the planner consults when it decides how much memory to replay, how much tool output to inline, and when to clip. A profile for your llm.model is effectively REQUIRED: a model with no model_profiles entry has no context-window number, so the first LLM call hard-fails with ErrUnsupportedModel — there is no silent fallback. Give every model you actually use a profile:
    model_profiles:
      anthropic/claude-haiku-4.5:
        context_window_tokens: 200000
      anthropic/claude-sonnet-4.5:
        context_window_tokens: 200000
      gpt-4.1-mini:
        context_window_tokens: 1000000

Look up the official context-window number in the provider's docs; never guess.
Future fields (post-V1.1) will let you set per-model output_max_tokens, pricing_per_input_token, and pricing_per_output_token. For V1.1, only context_window_tokens is honoured.
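The profile rule above can be checked before the runtime ever makes a call. The following is a minimal sketch in Python, not Harbor's implementation: the helper name `missing_profiles` and the plain-dict input are assumptions for illustration, and the function takes the model names and the profile mapping directly rather than assuming where they sit in harbor.yaml.

```python
def missing_profiles(models, model_profiles):
    """Return the models that lack a usable context_window_tokens entry.

    Hypothetical helper: it mirrors the documented rule that every model
    in use needs model_profiles.<model>.context_window_tokens.
    """
    problems = []
    for model in models:
        profile = model_profiles.get(model) or {}
        window = profile.get("context_window_tokens")
        # A missing, non-integer, or non-positive window leaves nothing to budget against.
        is_int = isinstance(window, int) and not isinstance(window, bool)
        if not is_int or window <= 0:
            problems.append(model)
    return problems


profiles = {
    "anthropic/claude-haiku-4.5": {"context_window_tokens": 200000},
    "gpt-4.1-mini": {"context_window_tokens": 1000000},
}
# claude-sonnet-4.5 has no profile here, so it is the one reported.
print(missing_profiles(
    ["anthropic/claude-haiku-4.5", "anthropic/claude-sonnet-4.5"], profiles))
```

Running a check like this in CI catches the missing entry at review time instead of at the first LLM call.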
3. The dev-mock escape hatch
CLAUDE.md §13 forbids stub LLMs as production defaults. The boot path is fail-loud: no key set, no provider configured, the binary exits with ErrMissingAPIKey. There IS a documented escape hatch for first-clone convenience and CI smoke:
    HARBOR_DEV_ALLOW_MOCK=1 harbor dev

When HARBOR_DEV_ALLOW_MOCK=1 is set, the binary boots with a deterministic stub LLM and prints a stderr banner on every boot:

    [DEV-ONLY MOCK LLM — DO NOT USE IN PRODUCTION]

The banner is unmissable: it's bright, it's printed on every request to the LLM endpoint, and it's gated by a single env var. Production deployments NEVER set this var; CI smoke runs do, so we don't burn provider tokens validating the boot path. The flag does NOT degrade silently: a production.yaml with a misconfigured key and HARBOR_DEV_ALLOW_MOCK=0 (the default) is still a fail-loud exit.
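The gate described above reduces to a three-way decision. Here is a hedged Python sketch of that decision, not Harbor's boot code: `choose_llm` and `MissingAPIKeyError` are hypothetical names, and checking the mock flag before the key is an assumption drawn from the statement that the binary boots with the stub whenever the flag is set.

```python
import sys

# Banner text copied from the section above.
MOCK_BANNER = "[DEV-ONLY MOCK LLM — DO NOT USE IN PRODUCTION]"


class MissingAPIKeyError(RuntimeError):
    """Stand-in for the ErrMissingAPIKey exit (hypothetical Python analogue)."""


def choose_llm(env, key_var):
    """Pick "mock" or "provider", or fail loudly when neither is possible."""
    # Assumption: only the exact value "1" enables the mock.
    if env.get("HARBOR_DEV_ALLOW_MOCK") == "1":
        print(MOCK_BANNER, file=sys.stderr)
        return "mock"
    if env.get(key_var):
        return "provider"
    # No key and no opt-in: refuse to start rather than degrade silently.
    raise MissingAPIKeyError(f"env.{key_var} not set")


print(choose_llm({"HARBOR_DEV_ALLOW_MOCK": "1"}, "OPENROUTER_API_KEY"))
```

The point of the shape is that there is no branch where a missing key quietly yields a stub: the mock is reachable only through the explicit flag.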
4. Swap models without redeploying
Models are hot-reloadable. With harbor dev running:
- Edit harbor.yaml, change llm.model: (and add the matching model_profiles entry).
- Save.
- harbor dev's fsnotify watcher drains in-flight runs, re-reads the config, re-binds Bifrost to the new model, and accepts new runs.
You see this in the runtime stderr:
    time=... msg="config reload: llm.model changed" old=claude-haiku-4.5 new=claude-sonnet-4.5

The Console's connection footer reflects the new model on the next Task run.
Provider swap (e.g. OpenRouter → Anthropic direct) is the same flow — edit, save, watcher reloads. Bifrost handles the provider handshake internally.
5. Timeouts + retries
    llm:
      # ... provider + model ...
      timeout: 60s                    # request-level timeout
      network_defaults:               # applies to every provider (native + custom)
        max_retries: 2                # extra attempts after the first try
        retry_backoff_initial: 1s     # backoff before the first retry
        retry_backoff_max: 8s         # backoff ceiling
      model_profiles:
        anthropic/claude-haiku-4.5:
          context_window_tokens: 200000
          max_retries: 4              # per-model override of network_defaults.max_retries

network_defaults sets the retry policy for every provider; a model_profiles.<model>.max_retries entry overrides it for that one model. The timeout applies to each individual attempt, so retries do not share a single deadline. Backoff grows from retry_backoff_initial up to the retry_backoff_max ceiling. Omit any field and it falls through to Bifrost's package-level default. Bifrost honours Retry-After headers from the provider when present.
Long-running models (deep reasoning, large context) sometimes exceed the default 60s; bump to 120s or 240s for those. The Console's Task page surfaces timeout errors with the provider's verbatim response, so you can tune fast.
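To get a feel for how these knobs combine, the sketch below computes a backoff schedule and the worst-case wall time for one LLM call. It is illustrative arithmetic in Python, not Bifrost's code: the doubling growth factor is an assumption (the text only says backoff grows from the initial value to the ceiling), Retry-After handling is not modelled, and both function names are hypothetical.

```python
def backoff_schedule(max_retries, initial_s, max_s, factor=2.0):
    """Delays (seconds) before each retry, capped at max_s.

    Assumption: exponential growth by `factor`; the real growth curve
    is not specified in this document.
    """
    delays = []
    delay = initial_s
    for _ in range(max_retries):
        delays.append(min(delay, max_s))
        delay *= factor
    return delays


def worst_case_seconds(timeout_s, max_retries, initial_s, max_s):
    """Upper bound when every attempt runs into the per-attempt timeout."""
    attempts = max_retries + 1  # max_retries counts extra attempts after the first
    return attempts * timeout_s + sum(backoff_schedule(max_retries, initial_s, max_s))


# network_defaults from the example: 2 retries, 1s initial, 8s ceiling, 60s timeout.
print(backoff_schedule(2, 1.0, 8.0), worst_case_seconds(60.0, 2, 1.0, 8.0))
# Per-model override of max_retries: 4.
print(backoff_schedule(4, 1.0, 8.0), worst_case_seconds(60.0, 4, 1.0, 8.0))
```

Under these assumptions, raising timeout or max_retries multiplies the worst case rather than adding to it, which is worth keeping in mind before bumping a long-running model to 240s.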
6. Embeddings are a separate block (not an llm knob)
Harbor's embedding client is a sibling seam to the chat client — turning text into vectors gets its own provider/model/key, configured at the top-level embeddings: block, never inherited from llm.*:
    embeddings:
      driver: bifrost                  # the default; omit freely
      provider: openai
      model: text-embedding-3-small
      api_key: env.OPENAI_API_KEY      # same env.NAME convention as llm.api_key
      # dimensions: 256                # optional reduced output dimension

You only need it when something consumes embeddings: the opt-in semantic retrieval modes (memory.retrieval: semantic / skills.retrieval: semantic; see configure-memory-and-skills) or your own à-la-carte retrieval (docs/recipes/embed-and-retrieve.md). Enabling a semantic mode without the block fails validation loudly, naming the missing keys; there is no mock embeddings driver and no fallback to the chat provider.
Common failure modes
- harbor dev exits immediately with ErrMissingAPIKey: env.OPENROUTER_API_KEY not set. Source your .env or export the var in the shell that ran harbor dev. Verify with echo $OPENROUTER_API_KEY.
- harbor dev exits with ErrUnknownProvider: "nim". You set provider: nim but forgot the matching custom_providers: entry. Add it.
- Every LLM call times out. Either your timeout: is too low for the model, or the provider is unreachable from the runtime's network. Check with a curl https://openrouter.ai/api/v1/models from the runtime host first.
- llm.context_leak events fire mid-run. A tool returned >32KB inline instead of an ArtifactStub. See add-an-in-process-tool §4.
- harbor dev fails the first LLM call with ErrUnsupportedModel: model has no configured ModelProfile. Your llm.model has no model_profiles.<model> entry, so the runtime has no context-window number for it and refuses the call; there is no fallback. Add the model_profiles.<model>.context_window_tokens entry.
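The first failure mode in the list above can be caught before launching harbor dev. The following Python sketch lists which env.NAME references point at unset variables. It is a hypothetical pre-flight helper, not part of Harbor: the name `unset_key_vars` is invented here, and values that do not use the env.NAME convention are skipped because this document does not say how Harbor treats them.

```python
import os


def unset_key_vars(api_key_values, environ=None):
    """Return the variable names behind env.NAME references that are unset or empty."""
    environ = os.environ if environ is None else environ
    prefix = "env."
    missing = []
    for value in api_key_values:
        if not value.startswith(prefix):
            continue  # not an env.NAME reference; this sketch does not judge it
        name = value[len(prefix):]
        if not environ.get(name):
            missing.append(name)
    return missing


# Check the references used in the postures above against the current shell.
print(unset_key_vars(["env.OPENROUTER_API_KEY", "env.OPENAI_API_KEY"]))
```

An empty list means every referenced variable is visible to the process that runs the check, which must be the same shell that starts harbor dev for the result to be meaningful.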
See also
- define-the-agent-yaml: the llm: block in the context of the full yaml.
- configure-memory-and-skills: memory budgeting against the context window.
- observe-with-the-console: the LLM tab in the Console's Task page shows every prompt/completion.
- Bifrost's full provider matrix: github.com/maximhq/bifrost.
- The CONFIG.md reference: docs/CONFIG.md#llm (and #embeddings for the embedding client).