
Use the Harbor Protocol

The Harbor Protocol is the canonical event/state contract between Runtime and any client. The bundled Console is one consumer; this skill walks the path for building your own. A working chatbot UI is achievable in a day on top of the Protocol — the wire is small, typed, and stable.

Three properties make this practical:

  1. A generated, drift-gated contract reference — the published Protocol adoption track carries four pages (methods / events / errors / types) emitted by cmd/harbor-gen-protocol-docs from the Go single sources and gated in CI by make protocol-docs-gen-check, plus an executed quickstart, five choreography guides, a worked build-a-client walkthrough, and the conformance-certification path. For typed TS wire shapes, vendor the Console's hand-maintained web/console/src/lib/protocol.ts (the D-093 TS generator was deferred per D-132 — protocol.ts is hand-maintained today, kept honest by the Console's own CI).
  2. Capability advertisement: runtime.info.capabilities tells you at attach which Protocol surfaces this Runtime advertises (task_control, events_subscribe, runtime_posture, topology_snapshot). Your UI degrades gracefully on stripped-down runtimes.
  3. Stable Protocol versioning — breaking changes go through a deprecation window; same-major versions are compatible. Pin the major in your client; tolerate additive change. The full adopter contract is the published versioning & compatibility choreography.

The Protocol is what makes Harbor headless. The Runtime never imports Console code; the Console never reads internal Runtime objects. Your UI sits in the same posture as the Console.

1. The wire — base URL, auth, identity

The wire is REST-per-method: each Protocol method is its own route under /v1/, you POST a flat JSON body, and you get a flat JSON response back — there is no JSON-RPC envelope. Every request carries:

http
POST /v1/control/start HTTP/1.1
Host: 127.0.0.1:18080
Content-Type: application/json
Authorization: Bearer <JWT>
X-Harbor-Tenant: <tenant_id>
X-Harbor-User: <user_id>
X-Harbor-Session: <session_id>
  • Bearer JWT: an RS256/RS384/RS512/ES256/ES384/ES512-signed token whose issuer and audience must match the Runtime's identity: block. For harbor dev, use the ephemeral HARBOR_DEV_TOKEN (printed on stderr); see run-the-dev-loop.
  • X-Harbor-Session: the per-request session selector (D-171). The connection JWT verifies the WHO (tenant + user) and the scopes; the session is chosen per-conversation by this header and may differ on every request — the connection token is a per-backend credential, not a single-session pin. A new session id is a new conversation (create-on-first-use on the first start). The token's session claim is a back-compat default used only when the header is absent. X-Harbor-Tenant / X-Harbor-User can never widen the JWT-verified principal. Every storage call still filters by the full (tenant, user, session) triple — no cross-session leakage. Full Console contract: docs/notes/session-model-contract.md.
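The header set above can be assembled per request. A minimal TypeScript sketch follows. `HarborIdentity` and `harborHeaders` are illustrative names, not exports of the Console's protocol.ts. The session is a plain argument because, per D-171, one connection token can serve many conversations.

```typescript
// Illustrative type and helper: not part of the Protocol or the Console client.
interface HarborIdentity {
  tenant: string;
  user: string;
  session: string; // per-conversation selector; a new id starts a new conversation
}

// Build the headers every Protocol call carries. The JWT verifies tenant/user;
// the X-Harbor-* headers select within that principal and can never widen it.
function harborHeaders(token: string, id: HarborIdentity): Record<string, string> {
  return {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${token}`,
    "X-Harbor-Tenant": id.tenant,
    "X-Harbor-User": id.user,
    "X-Harbor-Session": id.session,
  };
}
```

Switching conversations is then a matter of passing a different session id with the same token.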

Routes group by surface family:

  • Task control: start plus the nine steering verbs (cancel / pause / resume / redirect / inject_context / approve / reject / prioritize / user_message) all go to POST /v1/control/{method} (e.g. /v1/control/start, /v1/control/cancel). The read-only posture methods (runtime.info, topology.snapshot) and artifacts.put share this route shape.
  • Event stream: GET /v1/events (SSE; see §4).
  • Read surfaces group by family under their own prefix: POST /v1/tasks/{method} (e.g. /v1/tasks/get), POST /v1/tools/{method}, POST /v1/sessions/{method}, POST /v1/memory/{method}, and so on.
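The route families reduce to a small lookup. This sketch transcribes the control-route set from the list above. For the read surfaces it assumes a dotted `<family>.<method>` method name (so `tasks.get` maps to /v1/tasks/get and `pause.list` to /v1/pause/list); that naming is an illustrative convention, not a documented one, so check names against the generated methods page. `routeFor` is a hypothetical helper.

```typescript
// Methods served under POST /v1/control/{method}, per the route list above.
const CONTROL_METHODS = new Set([
  "start", "cancel", "pause", "resume", "redirect", "inject_context",
  "approve", "reject", "prioritize", "user_message",
  "runtime.info", "topology.snapshot", "artifacts.put",
]);

// Map a method name to its REST route. Read-surface methods are assumed
// (hypothetically) to be named "<family>.<method>", e.g. "tasks.get".
function routeFor(method: string): string {
  if (CONTROL_METHODS.has(method)) return `/v1/control/${method}`;
  const dot = method.indexOf(".");
  if (dot <= 0 || dot === method.length - 1) {
    throw new Error(`cannot route method ${JSON.stringify(method)}`);
  }
  return `/v1/${method.slice(0, dot)}/${method.slice(dot + 1)}`;
}
```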

The body is a flat JSON object — the method's request shape — with an identity object carrying the triple (or the headers above; the body's identity may be left empty when the headers supply it):

json
{ "identity": { "tenant": "dev", "user": "dev", "session": "dev" }, "query": "Hello, agent!" }

The response is the method's flat response shape directly — no result / error wrapper. A failure is an HTTP status plus a {"code": "..."} envelope (e.g. 404 {"code": "unknown_method"}).
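Because there is no result/error wrapper, a client decides success from the HTTP status and, on failure, reads the `code` field. A sketch of that decoding, with a hypothetical `HarborError` class and `toHarborError` helper. The fallback to `http_<status>` covers bodies that are not the envelope, such as a reverse proxy's error page.

```typescript
// Illustrative error type: carries the HTTP status plus the envelope's code.
class HarborError extends Error {
  readonly status: number;
  readonly code: string;
  constructor(status: number, code: string) {
    super(`${status} ${code}`);
    this.name = "HarborError";
    this.status = status;
    this.code = code;
  }
}

// Turn a non-2xx status and its raw body into a HarborError. A body that is not
// the {"code": "..."} envelope falls back to http_<status>.
function toHarborError(status: number, bodyText: string): HarborError {
  let code = `http_${status}`;
  try {
    const parsed: unknown = JSON.parse(bodyText);
    if (parsed !== null && typeof parsed === "object" &&
        typeof (parsed as { code?: unknown }).code === "string") {
      code = (parsed as { code: string }).code;
    }
  } catch {
    // Not JSON: keep the status-derived fallback.
  }
  return new HarborError(status, code);
}
```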

CORS is default-deny. For browser clients, your origin must be in the Runtime's server.allowed_origins. See run-the-dev-loop §2.

2. The handshake — runtime.info first

The first call your client makes:

bash
curl -sS -X POST "$HARBOR_BASE_URL/v1/control/runtime.info" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Harbor-Session: $SESSION" \
  -H "Content-Type: application/json" \
  -d '{"identity": {}}'

A real response from a dev Runtime:

json
{
  "instance_id": "harbor-dev-192.168.1.7",
  "display_name": "harbor dev",
  "build_version": "v0.0.0-dev",
  "build_commit": "dev",
  "build_go_version": "go1.26.3",
  "protocol_version": "0.1.0",
  "capabilities": ["events_subscribe", "runtime_posture", "task_control"],
  "uptime_seconds": 16
}

Two things to read and act on:

  • protocol_version — the wire-contract version (distinct from build_version, the Runtime's own release). Same major ⇒ compatible; on a major mismatch, warn loudly or refuse.
  • capabilities — the advertised Protocol surfaces. Shape your UI on this list: a runtime that doesn't advertise topology_snapshot gets the topology panel disabled, not a crash. A method outside the Runtime's registry returns the canonical 404 {"code": "unknown_method"} envelope — treat it (and 405 / 501) as "not served here, degrade", the same SKIP posture Harbor's own smoke scripts encode.
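The two checks can be made once at attach and turned into feature flags. The sketch below assumes the client was written against Protocol major 0 (matching the sample's 0.1.0) and uses illustrative flag names (`chat`, `topologyPanel`). The not-served rule follows the bullet above, treating a 404 as a degrade signal only when its code is `unknown_method`.

```typescript
// Subset of the runtime.info response, as in the sample above.
interface RuntimeInfo {
  protocol_version: string;
  capabilities: string[];
}

// Assumption: the Protocol major this client was built against (the sample shows 0.1.0).
const CLIENT_PROTOCOL_MAJOR = 0;

function protocolMajor(version: string): number {
  const major = Number.parseInt(version.split(".")[0] ?? "", 10);
  if (Number.isNaN(major)) throw new Error(`unparseable protocol_version: ${version}`);
  return major;
}

// Decide once, at attach, which features to switch on (flag names are illustrative).
function attachPlan(info: RuntimeInfo) {
  const has = (cap: string) => info.capabilities.includes(cap);
  return {
    compatible: protocolMajor(info.protocol_version) === CLIENT_PROTOCOL_MAJOR,
    chat: has("task_control") && has("events_subscribe"),
    topologyPanel: has("topology_snapshot"),
  };
}

// "Not served here, degrade": 404 unknown_method, 405, or 501.
function isNotServed(status: number, code: string): boolean {
  return (status === 404 && code === "unknown_method") || status === 405 || status === 501;
}
```

Checking `unknown_method` specifically keeps an ordinary 404 from an existing method (if the Runtime reports, say, a missing task that way) from silently disabling a feature.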

3. Starting a task — the chat-message equivalent

bash
curl -sS -X POST "$HARBOR_BASE_URL/v1/control/start" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Harbor-Session: $SESSION" \
  -H "Content-Type: application/json" \
  -d '{"identity": {}, "query": "What'\''s the weather in Madrid?", "input_artifact_ids": []}'

The request is the flat StartRequest: identity (the triple — empty here because the headers supply it), the query string, and the optional input_artifact_ids. There is no foreground field — every start mints a task and you observe it on the event stream.

Response is the flat StartResponse:

json
{
  "task_id": "tsk_01HXYZ...",
  "reused": false,
  "protocol_version": "0.1.0"
}

reused is true only when you supplied an idempotency_key that matched an existing task; protocol_version lets you detect a version skew.
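A typed sketch of the start round trip. The interfaces mirror the flat shapes described above; `idempotency_key` is typed as a string by assumption, and `startRequest` / `describeStart` are hypothetical helpers.

```typescript
// Flat StartRequest / StartResponse, as described above.
interface StartRequest {
  identity: Partial<{ tenant: string; user: string; session: string }>;
  query: string;
  input_artifact_ids?: string[];
  idempotency_key?: string; // assumed to be a string
}

interface StartResponse {
  task_id: string;
  reused: boolean;
  protocol_version: string;
}

// Build a start body; an empty identity defers to the X-Harbor-* headers.
function startRequest(
  query: string,
  opts: { artifactIds?: string[]; idempotencyKey?: string } = {},
): StartRequest {
  const req: StartRequest = { identity: {}, query };
  if (opts.artifactIds && opts.artifactIds.length > 0) req.input_artifact_ids = opts.artifactIds;
  if (opts.idempotencyKey !== undefined) req.idempotency_key = opts.idempotencyKey;
  return req;
}

// A retried start with the same key returns the existing task; surface that to the UI.
function describeStart(res: StartResponse): string {
  return res.reused ? `re-attached to ${res.task_id}` : `started ${res.task_id}`;
}
```

Minting the idempotency key once per user turn (for example with crypto.randomUUID()) and reusing it on network retries means a retried start reports reused: true instead of creating a second task.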

For multimodal input, upload artifacts FIRST (artifacts.put, see §6) and pass the returned IDs in input_artifact_ids (D-166). The per-MIME dispatch (image inline, PDF/audio as ArtifactStub) happens inside the planner; your client just passes refs.

To override how an attachment is handed to the model, add the optional input_artifact_dispositions map (Phase 84b, D-189), keyed by artifact id with values ref | inline | provider_native | tool:<name> (e.g. {"art_x": "tool:pdf.extract"} forces the named catalog tool). Your hint is the top precedence layer: hint > the agent's multimodal.disposition config map > the runtime default (image inline, everything else ref). Omitting the map keeps today's behaviour.

tasks.get reflects the hint on input_artifacts[].disposition, and the resolution (including degradations, e.g. an unknown tool:<name>) is observable as task.input_disposition.resolved events. A provider_native hint is honoured end-to-end (Phase 84c, D-190): the LLM driver uploads the attachment to the provider's file surface, and the upload is observable as llm.provider_file.uploaded events (artifact ref, provider, modality, file_id).
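The precedence chain above can be modeled client-side, for instance to label an attachment before the task starts. This is a prediction only: the Runtime's resolver is authoritative and reports its actual choice, degradations included, in task.input_disposition.resolved. The sketch takes the agent-config layer as an already looked-up value because how that map is keyed is not covered here; `expectedDisposition` and `isDisposition` are hypothetical names.

```typescript
type Disposition = "ref" | "inline" | "provider_native" | `tool:${string}`;

// Validate a user-supplied disposition string before putting it in the map.
function isDisposition(v: string): v is Disposition {
  return v === "ref" || v === "inline" || v === "provider_native" ||
    (v.startsWith("tool:") && v.length > "tool:".length);
}

// Client-side model of: per-request hint > agent config > runtime default.
function expectedDisposition(
  artifactId: string,
  mimeType: string,
  hints: Partial<Record<string, Disposition>>,
  agentEntry?: Disposition, // the agent's multimodal.disposition entry, if any
): Disposition {
  const hint = hints[artifactId];
  if (hint !== undefined) return hint;             // 1. your hint wins
  if (agentEntry !== undefined) return agentEntry; // 2. agent config
  return mimeType.startsWith("image/") ? "inline" : "ref"; // 3. runtime default
}
```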

4. The events stream — SSE events.subscribe

The Protocol exposes events as Server-Sent Events:

http
GET /v1/events?access_token=<JWT>
Accept: text/event-stream
X-Harbor-Tenant: <tenant_id>
X-Harbor-User: <user_id>
X-Harbor-Session: <session_id>

The subscription is identity-scoped — it streams the whole session's events — so there is no task_id query param. A client that can set headers narrows server-side with the optional X-Harbor-Run (a task id) and the repeatable X-Harbor-Event-Type headers. A browser EventSource (which can't set custom headers) authenticates via the ?access_token= query-param shim — same JWT, same identity triple, its session claim scoping the stream — and filters client-side on the event payload's task id. The query-param shim is documented in internal/protocol/transports/transports.go.
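The two subscription modes differ only in how identity travels. A sketch of building the GET /v1/events request for each, assuming header mode authenticates with the same Bearer header as the POST routes; `subscribeRequest` and its options are hypothetical. The repeatable X-Harbor-Event-Type filter is left out because fetch's Headers object may merge repeated values into one comma-separated header, which the Runtime may not read as separate filters.

```typescript
interface SubscribeOptions {
  baseUrl: string;
  token: string;
  identity: { tenant: string; user: string; session: string };
  run?: string;          // optional X-Harbor-Run narrowing (a task id)
  browserShim?: boolean; // true for EventSource, which cannot set headers
}

// URL plus headers for GET /v1/events. In shim mode there are no custom headers:
// the JWT rides ?access_token= and its session claim scopes the stream, so task
// filtering has to happen client-side.
function subscribeRequest(o: SubscribeOptions): { url: string; headers: Record<string, string> } {
  const url = new URL("/v1/events", o.baseUrl);
  if (o.browserShim) {
    url.searchParams.set("access_token", o.token);
    return { url: url.toString(), headers: {} };
  }
  const headers: Record<string, string> = {
    "Accept": "text/event-stream",
    "Authorization": `Bearer ${o.token}`,
    "X-Harbor-Tenant": o.identity.tenant,
    "X-Harbor-User": o.identity.user,
    "X-Harbor-Session": o.identity.session,
  };
  if (o.run !== undefined) headers["X-Harbor-Run"] = o.run;
  return { url: url.toString(), headers };
}
```

In shim mode the stream follows the JWT's session claim rather than X-Harbor-Session, so a browser client that picks sessions per conversation should make sure the token's claim matches the session it is chatting in.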

The stream is a sequence of event: <type>\ndata: <JSON>\n\n blocks:

text
event: llm.completion.chunk
data: {"task_id":"tsk_01HXYZ","chunk":"Hello"}

event: llm.completion.chunk
data: {"task_id":"tsk_01HXYZ","chunk":" there!"}

event: tool.invoked
data: {"task_id":"tsk_01HXYZ","tool":"weather.get_current","args":{"city":"Madrid"}}

event: tool.result
data: {"task_id":"tsk_01HXYZ","tool":"weather.get_current","result":{"temperature_c":21.3}}

event: task.completed
data: {"task_id":"tsk_01HXYZ","status":"completed"}

A gotcha: on the wire, the event payload's task ID field is payload.TaskID (capital T, no underscore), not the task_id used in the illustrative samples above. Match it exactly when parsing in JS/TS. The Console's chat panel handler documents this; it is easy to miss when hand-rolling.
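EventSource cannot send X-Harbor-Run, so a header-capable client usually reads /v1/events through fetch and parses the framing itself. A minimal incremental parser for the `event:` / `data:` blocks shown above, plus a TaskID accessor that follows the capital-T gotcha. It assumes LF line endings (as in the sample) and JSON in every data field; `SseParser` and `taskIdOf` are hypothetical names.

```typescript
interface HarborEvent {
  type: string;
  data: unknown;
}

// Incremental parser for "event: <type>\ndata: <JSON>\n\n" frames. Frames may be
// split across chunks; multi-line data fields are joined with "\n" as SSE requires.
// Comment lines (":") and other fields (id, retry) are ignored.
class SseParser {
  private buffer = "";

  push(chunk: string): HarborEvent[] {
    this.buffer += chunk;
    const out: HarborEvent[] = [];
    let sep: number;
    while ((sep = this.buffer.indexOf("\n\n")) !== -1) {
      const frame = this.buffer.slice(0, sep);
      this.buffer = this.buffer.slice(sep + 2);
      let type = "message"; // SSE default when no event: field is present
      const data: string[] = [];
      for (const line of frame.split("\n")) {
        if (line.startsWith(":")) continue;
        const colon = line.indexOf(":");
        const field = colon === -1 ? line : line.slice(0, colon);
        let value = colon === -1 ? "" : line.slice(colon + 1);
        if (value.startsWith(" ")) value = value.slice(1);
        if (field === "event") type = value;
        else if (field === "data") data.push(value);
      }
      if (data.length > 0) out.push({ type, data: JSON.parse(data.join("\n")) });
    }
    return out;
  }
}

// Per the gotcha above, the task id lives on the payload's TaskID (capital T).
function taskIdOf(data: unknown): string | undefined {
  const id = (data as { TaskID?: unknown } | null)?.TaskID;
  return typeof id === "string" ? id : undefined;
}
```

Feed it decoded text, for example from res.body.pipeThrough(new TextDecoderStream()), and dispatch each returned event by type.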

For a chat UI, you'd:

  1. Append a "user turn" bubble to the chat.
  2. POST start, get task_id.
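  3. Filter the session's SSE stream (§4) to events whose payload TaskID matches that task_id. The stream is session-scoped, so keep one open per session, ideally opened before the start so no early chunk is missed.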
  4. Append llm.completion.chunk content to a streaming "assistant turn" bubble.
  5. Render tool.invoked / tool.result as collapsed cards inside the assistant bubble.
  6. Close the bubble on task.completed.
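Steps 4 to 6 amount to folding events into one assistant bubble per task. A sketch of that reducer; apart from TaskID, the field names (chunk, tool, args, result) are taken from the illustrative samples in §4 and should be checked against the generated events page. `AssistantTurn` and `applyEvent` are hypothetical.

```typescript
type ToolCard = { tool: string; args?: unknown; result?: unknown };

interface AssistantTurn {
  taskId: string;
  text: string;      // streamed llm.completion.chunk content
  tools: ToolCard[]; // collapsed tool cards inside the bubble
  done: boolean;     // set on task.completed
}

// Fold one stream event into the bubble for its task; other tasks are ignored.
function applyEvent(turn: AssistantTurn, type: string, data: Record<string, unknown>): AssistantTurn {
  if (data.TaskID !== turn.taskId || turn.done) return turn;
  switch (type) {
    case "llm.completion.chunk":
      return { ...turn, text: turn.text + String(data.chunk ?? "") };
    case "tool.invoked":
      return { ...turn, tools: [...turn.tools, { tool: String(data.tool), args: data.args }] };
    case "tool.result": {
      // Attach the result to the most recent open card for the same tool.
      const tools = turn.tools.slice();
      for (let i = tools.length - 1; i >= 0; i--) {
        const card = tools[i];
        if (card && card.tool === data.tool && card.result === undefined) {
          tools[i] = { ...card, result: data.result };
          break;
        }
      }
      return { ...turn, tools };
    }
    case "task.completed":
      return { ...turn, done: true };
    default:
      return turn;
  }
}
```

Returning a new object on each event suits React/Svelte/Vue-style state updates, and ignoring events once `done` is set keeps a late chunk from reopening a closed bubble.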

5. Pause + steer + resume

The unified pause/resume primitive (RFC §3.3) is one wire choreography for every cause — HITL approval, tool-side OAuth, operator pause. The steering verbs share one route shape, POST /v1/control/{method}, with the run id and your steering scope in the body's identity:

bash
# park the run at the next planner-step boundary
curl -sS -X POST "$HARBOR_BASE_URL/v1/control/pause" \
  -H "Authorization: Bearer $TOKEN" -H "X-Harbor-Session: $SESSION" \
  -H "Content-Type: application/json" \
  -d '{"identity": {"run": "'$TASK_ID'", "scope": "owner_user"}}'

# feed it context while parked, then wake it
curl -sS -X POST "$HARBOR_BASE_URL/v1/control/inject_context" \
  -H "Authorization: Bearer $TOKEN" -H "X-Harbor-Session: $SESSION" \
  -H "Content-Type: application/json" \
  -d '{"identity": {"run": "'$TASK_ID'", "scope": "session_user"}, "payload": {"note": "Actually, make it Barcelona."}}'

curl -sS -X POST "$HARBOR_BASE_URL/v1/control/resume" \
  -H "Authorization: Bearer $TOKEN" -H "X-Harbor-Session: $SESSION" \
  -H "Content-Type: application/json" \
  -d '{"identity": {"run": "'$TASK_ID'", "scope": "owner_user"}}'

The 200 {"accepted": true, …} means enqueued; the effect is narrated on the event stream — pause.requested when the run parks, pause.resumed (with a typed Decision of approve / reject / resume / timeout) when it wakes. The planner sees injected context on its next step.

For HITL: an approval-gated tool emits tool.approval_requested with a pause token; your UI routes the human verdict through POST /v1/control/approve or /reject with "payload": {"token": "<pause-token>", "reason": "…"}. POST /v1/pause/list is the snapshot of everything currently awaiting a human — reconcile against it on every (re)attach; it is authoritative across Runtime restarts. The full wire choreography (including the OAuth callback leg and DecisionTimeout reaps) is the published pause-model choreography.
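All steering calls share one body shape, so one builder covers pause, inject_context, resume and the HITL verdicts. A sketch under the shapes shown above; the scope strings are whatever your deployment grants (the examples use owner_user and session_user), and `steer` / `verdict` are hypothetical helpers.

```typescript
// The steering verbs from the task-control route family.
type SteeringVerb =
  | "cancel" | "pause" | "resume" | "redirect" | "inject_context"
  | "approve" | "reject" | "prioritize" | "user_message";

interface SteeringBody {
  identity: { run: string; scope: string }; // e.g. scope "owner_user"
  payload?: Record<string, unknown>;
}

// Route + body for POST /v1/control/{verb}.
function steer(verb: SteeringVerb, run: string, scope: string, payload?: Record<string, unknown>) {
  const body: SteeringBody = { identity: { run, scope } };
  if (payload !== undefined) body.payload = payload;
  return { route: `/v1/control/${verb}`, body };
}

// HITL verdict for the pause token carried by tool.approval_requested.
function verdict(approve: boolean, run: string, scope: string, token: string, reason?: string) {
  const payload: Record<string, unknown> = { token };
  if (reason !== undefined) payload.reason = reason;
  return steer(approve ? "approve" : "reject", run, scope, payload);
}
```

Send the result with any POST helper (e.g. the `call` function in §9) and flip the UI on pause.requested / pause.resumed rather than on the 200, which only means the request was enqueued.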

The "steer vs queue" UI choice in drive-the-playground §3 maps directly to "POST /v1/control/pause + inject_context + resume" vs "wait for task.completed then POST a new start".

6. Artifact upload — multimodal input

For images / PDFs / audio uploads from your UI, artifacts.put is a control-surface method: POST the bytes (base64-encoded inline on the request leg) and you get back a reference, never an echo of the body:

bash
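# Build the JSON on stdin: a large file's base64 can exceed the OS per-argument
# limit, and GNU base64 line-wraps its output (tr strips the newlines).
{
  printf '{"scope": {"tenant": "dev", "user": "dev", "session": "dev"}, "bytes": "'
  base64 < report.pdf | tr -d '\n'
  printf '", "opts": {"mime_type": "application/pdf", "filename": "report.pdf"}}'
} | curl -sS -X POST "$HARBOR_BASE_URL/v1/control/artifacts.put" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Harbor-Session: $SESSION" \
  -H "Content-Type: application/json" \
  --data-binary @-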

Response carries the canonical ref:

json
{
  "ref": {
    "id": "art_01H...",
    "mime_type": "application/pdf",
    "size_bytes": 142853,
    "filename": "report.pdf"
  },
  "protocol_version": "0.1.0"
}

Pass ref.id in start's input_artifact_ids. The upload bytes ride the request leg only (base64-inline, bounded by the Runtime's max request size — an oversize body is rejected with request_too_large); the response is a reference, and bytes never reach the LLM edge inline.
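From a browser or Node client the same upload is a base64 string inside the JSON body. A sketch that builds the artifacts.put body and rejects it locally when it would exceed the Runtime's request limit, using the 4 MiB default quoted under Common failure modes. `artifactsPutBody` is a hypothetical helper, and the limit is a parameter because deployments can raise it.

```typescript
// Default protocol.max_request_bytes, per Common failure modes (4 MiB).
const DEFAULT_MAX_REQUEST_BYTES = 4 * 1024 * 1024;

// Base64-encode raw bytes with the global btoa (browsers, Node 16+).
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Build the JSON body for artifacts.put; fail fast instead of waiting for a 413.
function artifactsPutBody(
  scope: { tenant: string; user: string; session: string },
  bytes: Uint8Array,
  mimeType: string,
  filename: string,
  maxRequestBytes: number = DEFAULT_MAX_REQUEST_BYTES,
): string {
  const body = JSON.stringify({
    scope,
    bytes: toBase64(bytes),
    opts: { mime_type: mimeType, filename },
  });
  const size = new TextEncoder().encode(body).length;
  if (size > maxRequestBytes) {
    throw new Error(`request_too_large: ${size} > ${maxRequestBytes} bytes`);
  }
  return body;
}
```

Base64 inflates data by 4/3, so under the 4 MiB default the largest attachment is a little under 3 MiB once the JSON wrapper is counted.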

7. Topology snapshot — render the runtime's wiring

bash
curl -sS -X POST "$HARBOR_BASE_URL/v1/control/topology.snapshot" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Harbor-Session: $SESSION" \
  -H "Content-Type: application/json" \
  -d '{"identity": {}}'

Response is a graph of components + edges — Bifrost, tool catalog (with per-tool nodes), memory driver, state driver, artifact store, event bus, skill catalog. The Console's Topology page is one consumer; your custom dashboard could be another.

The gating capability is topology_snapshot in runtime.info.capabilities (V1.1 phase 84a).

8. Typed wire shapes — where they actually come from

Two trustworthy sources, neither of which is hand-rolling:

  • The generated contract reference: the generated types page catalogues every canonical wire struct field-by-field with the snake_case JSON keys, generated by cmd/harbor-gen-protocol-docs from the Go single sources and drift-gated by make protocol-docs-gen-check. Transcribe your client types from it; when the wire changes, the page changes in the same PR by construction.
  • Vendor the Console's client module — copy web/console/src/lib/protocol.ts into your TS client. It carries the wire types + the typed HarborClient. It is hand-maintained (the D-093 TS generator was deferred per D-132 / issue #179), kept honest by the Console's CI rather than by codegen. License is Apache-2.0; attribution required.

Hand-rolling the types from scratch is fine for a quick prototype but you'll drift. Anchor any client you intend to maintain on the generated reference.
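Whichever source you transcribe from, a runtime guard at the boundary catches drift early. A sketch for the artifacts.put response, transcribed from the sample in §6 rather than from the generated reference; treating `filename` as optional is an assumption. The guard ignores unknown extra fields, in line with tolerating additive change.

```typescript
// Transcribed from the artifacts.put sample response (§6).
interface ArtifactRef {
  id: string;
  mime_type: string;
  size_bytes: number;
  filename?: string; // assumed optional
}

interface ArtifactsPutResponse {
  ref: ArtifactRef;
  protocol_version: string;
}

// Check only the fields the client relies on; extra fields are tolerated.
function isArtifactsPutResponse(v: unknown): v is ArtifactsPutResponse {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  if (typeof o.protocol_version !== "string") return false;
  if (typeof o.ref !== "object" || o.ref === null) return false;
  const ref = o.ref as Record<string, unknown>;
  return typeof ref.id === "string" &&
    typeof ref.mime_type === "string" &&
    typeof ref.size_bytes === "number" &&
    (ref.filename === undefined || typeof ref.filename === "string");
}
```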

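9. A minimal client (TS, ~50 LoC)

typescript
// Minimal Harbor CLI client. Assumes an ES module (top-level await) on a runtime
// with global fetch and EventSource, e.g. Node plus an EventSource polyfill
// (browsers have EventSource but no process.stdout).
const baseUrl = "http://127.0.0.1:18080";
const token = "<HARBOR_DEV_TOKEN>";
// The ?access_token= shim scopes the stream by the JWT's session claim, so keep
// identity.session equal to that claim or start and stream land in different sessions.
const identity = { tenant: "dev", user: "dev", session: "dev" };

// One REST call per method: POST /v1/<family>/<method>, flat body in, flat body out.
async function call<T>(route: string, body: object): Promise<T> {
  const res = await fetch(`${baseUrl}${route}`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${token}`,
      "X-Harbor-Tenant": identity.tenant,
      "X-Harbor-User": identity.user,
      "X-Harbor-Session": identity.session,
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    // Failures carry the {"code": "..."} envelope; fall back to the HTTP status.
    const err = await res.json().catch(() => ({ code: `http_${res.status}` }));
    throw new Error(err.code ?? `http_${res.status}`);
  }
  return res.json() as Promise<T>;
}

const info = await call<{ protocol_version: string; capabilities: string[] }>(
  "/v1/control/runtime.info", { identity: {} });
console.log("connected to harbor", info.protocol_version, info.capabilities);

// The stream is session-scoped (no task_id query param): open it BEFORE start so
// no early event is missed, buffer until task_id is known, then filter client-side.
const sse = new EventSource(`${baseUrl}/v1/events?access_token=${encodeURIComponent(token)}`);
let taskId: string | undefined;
const pending: Array<[string, any]> = [];

function handle(type: string, data: any): void {
  if (taskId === undefined) { pending.push([type, data]); return; }
  if (data.TaskID !== taskId) return; // capital-T TaskID, see the gotcha in §4
  if (type === "llm.completion.chunk") process.stdout.write(data.chunk);
  if (type === "task.completed") sse.close();
}
for (const type of ["llm.completion.chunk", "task.completed"]) {
  sse.addEventListener(type, (e: MessageEvent) => handle(type, JSON.parse(e.data)));
}
await new Promise((resolve) => sse.addEventListener("open", resolve, { once: true }));

const { task_id } = await call<{ task_id: string }>("/v1/control/start", { identity: {}, query: "Hello!" });
taskId = task_id;
for (const [type, data] of pending.splice(0)) handle(type, data);

That's a working CLI chatbot in about 50 lines. Wrap the same in React/Svelte/Vue/whatever your stack is, render the chunks into a bubble, and you have a chat UI.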

Common failure modes

  • Every call returns 401. The token expired (24h TTL) or was rotated (harbor dev restarted). Fetch a fresh token and retry.
  • CORS preflight fails. Your origin isn't in server.allowed_origins. Add it in the Runtime's harbor.yaml and restart the Runtime.
  • SSE stream opens but no events. The payload.TaskID capital-T gotcha — your handler is reading payload.task_id (lowercase). Fix the case.
  • A control call returns 404 {"code": "unknown_method"} or 405/501. This runtime doesn't serve that surface. Call runtime.info first, branch on capabilities, and degrade the feature instead of crashing (the versioning & compatibility contract).
  • Artifact upload returns 413 Payload Too Large. The request body exceeded the Runtime's protocol.max_request_bytes (default 4 MiB) and was answered with the canonical {"code": "request_too_large"} envelope. Chunked uploads aren't supported in V1.1; raise protocol.max_request_bytes in the Runtime's harbor.yaml if you need larger inline uploads.
  • Topology snapshot rejected. This Runtime doesn't advertise the topology_snapshot capability — check runtime.info.capabilities before enabling the panel.
  • The Console reads internal Runtime objects. It doesn't — that would be a CLAUDE.md §13 violation. If you suspect leakage, file a bug; the Console reads only what's documented as a Protocol surface.
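The 401 remedy can be automated. A sketch of a single-retry wrapper; `withTokenRefresh`, `getToken` and `isUnauthorized` are hypothetical, and how a fresh token is obtained depends on your identity provider (for harbor dev, it is the HARBOR_DEV_TOKEN printed on stderr). Retrying once, and surfacing a second 401, avoids looping on a token that is actually revoked.

```typescript
// Run a call; on a 401, fetch a fresh token once and retry. A second failure is
// rethrown to the caller rather than retried again.
async function withTokenRefresh<T>(
  call: (token: string) => Promise<T>,
  getToken: (forceRefresh: boolean) => Promise<string>,
  isUnauthorized: (err: unknown) => boolean,
): Promise<T> {
  try {
    return await call(await getToken(false));
  } catch (err) {
    if (!isUnauthorized(err)) throw err;
    return call(await getToken(true));
  }
}
```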


Apache-2.0 licensed — see LICENSE.