Harbor — Master Phase Plan

How to read this file

This is the canonical execution index for Harbor's V1 build. Every individual phase plan (docs/plans/phase-NN-<slug>.md) lives under it and inherits its done-definition, dependency declarations, and coverage discipline.

  • Source of truth: /RFC-001-Harbor.md (referenced as RFC §X.X). Every phase below traces to one or more RFC sections; if a phase plan and the RFC drift, the RFC wins (AGENTS.md §2).
  • Research substrate: the eleven briefs in docs/research/01..11.md (canonical index: docs/research/INDEX.md). Decisions on shape, sharp edges, and Go-flavored types come from there.
  • Numbering: phase-NN-<slug>.md, two-digit zero-padded; lettered suffixes (26a, 33a, 36a, 36b, 53a, 64a, 83a–83e, 85a–85j, 85k, 85m) insert work into an existing band without renumbering. Phases 01–82 + 26a + 33a + 36a + 36b + 53a + 64a are V1; 83–100 + 83a–e + 85a–j + 85k + 85m are post-V1 follow-ups listed for completeness so we don't lose track. The integer phase 85 (Skills Portico provider driver) was removed — Portico is an MCP gateway and speaks MCP like any server, so the generic MCP client driver is its consumer; the 85-band is now MCP client/host compliance (85a–j + 85m; 85k is Harbor agent-builder skills). Per the MCP 2026-07-28 RC re-plan (2026-05-28), phases 85c / 85e / 85h / 85i are Cut and 85m is new; see the 85-band detail block. See brief 14.
  • Done-definition (binding, from AGENTS.md §4.2): (a) all acceptance criteria pass; (b) coverage targets met; (c) scripts/smoke/phase-NN.sh shows OK ≥ count(criteria) and FAIL = 0; (d) prior phases' smoke scripts still pass.
  • Coverage defaults (override per phase): 80% for new packages; 85% for persistence drivers and conformance-tested subsystems; 70% for CLI/tooling.
  • Predecessor name: does not appear in this repository, ever. (AGENTS.md §13.)

Phase index

# | Name | Subsystem | RFC § | Deps | Cov. | Status
00 | Skeleton | repo / hygiene | n/a | | n/a | Shipped
01 | Identity & isolation triple | identity | §4 | 00 | 90% | Shipped
02 | Configuration loader | config | §10 | 00 | 85% | Shipped
03 | Audit redactor | audit | §6.4, §6.15 | 00 | 90% | Shipped
04 | slog Logger + standard attribute set | telemetry | §6.14 | 03 | 85% | Shipped
05 | Event taxonomy + InMem EventBus + isolation | events | §6.13 | 01, 03 | 85% | Shipped
06 | Bus replay + ring buffer + cursor | events | §6.13 | 05 | 85% | Shipped
07 | StateStore iface + InMem + conformance suite | state | §6.11, §9 | 01, 03 | 85% | Shipped
08 | SessionRegistry + lifecycle + GC | sessions | §6.9 | 01, 07 | 85% | Shipped
09 | Envelopes, Headers, Identity quadruple | runtime/messages | §6.1 | 01, 08 | 85% | Shipped
10 | Engine + workers + cycle detection | runtime/engine | §6.1 | 09 | 85% | Shipped
11 | Reliability shell (timeout/retry/validate) | runtime/engine | §6.1 | 10 | 85% | Shipped
12 | Streaming + per-run capacity backpressure | runtime/streaming | §6.1 | 10, 11 | 85% | Shipped
13 | Cancellation + per-run fetch dispatcher | runtime/engine | §6.1 | 10, 12 | 85% | Shipped
14 | Routers + concurrency utils + subflows | runtime/routers | §6.1 | 10, 11 | 85% | Shipped
15 | SQLite StateStore driver | state/sqlite | §6.11, §9 | 07 | 90% | Shipped
16 | Postgres StateStore driver | state/postgres | §6.11, §9 | 07 | 90% | Shipped
17 | ArtifactStore iface + InMem + FS drivers | artifacts | §6.10, §9 | 01, 07 | 85% | Shipped
18 | ArtifactStore SQLite-blob + Postgres-blob | artifacts | §6.10, §9 | 17, 15, 16 | 85% | Shipped
19 | ArtifactStore S3-style driver | artifacts | §6.10 | 17 | 80% | Shipped
20 | TaskRegistry iface + InProcess + lifecycle | tasks | §6.8 | 01, 07 | 85% | Shipped
21 | TaskGroup + retain-turn + patches | tasks | §6.8 | 20 | 85% | Shipped
22 | MessageBus + RemoteTransport contracts | distributed | §6.12 | 09, 20 | 85% | Shipped
23 | MemoryStore iface + InMem + conformance | memory | §6.6 | 01, 07 | 85% | Shipped
24 | Memory strategies (truncation, summary) | memory | §6.6 | 23 | 85% | Shipped
25 | SQLite + Postgres memory drivers | memory | §6.6, §9 | 23, 15, 16 | 90% | Shipped
25a | Durable memory strategies (truncation + rolling_summary on SQL drivers; Summarizer through memory.Open) | memory | §6.6, §9 | 23, 24, 25, 15, 16 | n/a | Shipped (V1.1.x)
26 | Tool catalog core + InProcess registration | tools | §6.4 | 01, 05, 09 | 85% | Shipped
26a | Flow-as-Tool registration + per-flow Budget | runtime/flow + tools | §6.1, §6.4 | 14, 26 | 85% | Shipped
26b | Per-MCP-server + per-tool tool-policy config (policy: / tool_policies:) | tools + config | §6.4 | 26, 28 | n/a | Shipped (V1.1.x)
27 | HTTP tool driver | tools/http | §6.4 | 26 | 85% | Shipped
28 | MCP southbound driver | tools/mcp | §6.4 | 26 | 80% | Shipped
29 | A2A southbound driver (full spec) | tools/a2a | §6.4 | 26, 22 | 80% | Shipped
30 | Tool-side OAuth + HITL via pause/resume | tools/auth | §6.4, §3.3 | 26, 50, 53a | 85% | Shipped
31 | Tool-side approval gates | tools/approval | §6.4, §3.3 | 30 | 80% | Shipped
32 | LLM client core + StreamSink contract | llm | §6.5 | 09 | 85% | Shipped
33 | bifrost integration | llm | §6.5, §11Q3 | 32 | 80% | Shipped
33a | Custom OpenAI-compatible providers + timeouts | llm | §6.5 | 33 | 80% | Shipped
34 | Provider correction layer (one mode, baked) | llm | §6.5 | 33 | 85% | Shipped
35 | Structured output strategies + downgrade | llm | §6.5 | 33, 34 | 85% | Shipped
36 | Retry with feedback | llm | §6.5 | 35 | 85% | Shipped
36a | Cost accumulator + per-identity ceilings | governance | §6.15 | 11, 15, 33 | 85% | Shipped
36b | Per-identity rate limits + per-call MaxTokens | governance | §6.15 | 36a | 85% | Shipped
37 | Skill store + LocalDB driver + FTS5 ladder | skills | §6.7 | 01, 07, 15 | 85% | Shipped
38 | Skill planner tools (search/get/list) | skills/tools | §6.7 | 26, 37 | 85% | Shipped
39 | Virtual directory subsystem | skills | §6.7 | 37 | 80% | Shipped
40 | Skills.md importer (gap-closer) | skills/importer | §6.7 | 37 | 90% | Shipped
41 | In-runtime skill generator with persistence | skills/generator | §6.7 | 37, 38, 03 | 90% | Shipped
42 | Planner iface + Decision sum + RunContext | planner | §6.2, §3.2 | 09, 13, 26, 32 | 90% | Shipped
43 | Trajectory + serialise (fail-loudly contract) | planner/trajectory | §6.2, §3.4 | 42, 07 | 90% | Shipped
44 | Schema repair pipeline | planner/repair | §6.2 | 42, 32 | 85% | Shipped
45 | Reference ReAct planner (minimum viable) | planner/react | §6.2 | 42, 43, 44, 32 | 85% | Shipped
46 | Trajectory compression / summariser | planner | §6.2 | 43, 32 | 80% | Shipped
47 | Parallel-call exec + ReAct emission upgrade | planner+runtime | §6.2 | 45, 14, 42, 20, 21 | 85% | Shipped
48 | Deterministic planner (proves the iface) | planner/deterministic | §6.2, §11Q6 | 42 | 85% | Shipped
49 | Planner conformance pack | planner | §6.2 | 42, 45, 48 | 90% | Shipped
50 | Pause/Resume Coordinator + handle registry | runtime/pauseresume | §6.3, §3.3 | 07, 09, 13 | 90% | Shipped
51 | Pause-state serialise contract (fail-loud) | runtime/pauseresume | §6.3, §3.4 | 50, 43 | 90% | Shipped
52 | Steering inbox + control taxonomy | runtime/steering | §6.3 | 50, 05 | 85% | Shipped
53 | Steering wiring (9 control events) | runtime/steering | §6.3 | 52, 13 | 85% | Shipped
53a | Agent Registry (registration identity + IDs) | runtime/registry | §6.16, §7 | 01, 05, 07, 08 | 85% | Shipped
54 | Protocol task control surface | protocol | §5.2, §6.3 | 50, 53, 20 | 85% | Shipped
55 | OTel traces + propagation conventions | telemetry | §6.14 | 04, 05 | 85% | Shipped
56 | Metrics + OTLP + Prometheus drivers | telemetry | §6.14, §11Q5 | 55, 05 | 85% | Shipped
57 | Durable event log driver (StateStore-backed) | events | §6.13 | 05, 07, 15, 16 | 85% | Shipped
58 | Protocol types/methods/errors single source | protocol | §5, §8 | 01 | 90% | Shipped
59 | Protocol versioning + deprecation policy | protocol | §5.3 | 58 | 85% | Shipped
60 | Protocol wire transport (SSE + REST) | protocol | §5.4, §11Q1 | 58, 05 | 85% | Shipped
61 | Protocol auth + identity-scope enforcement | protocol | §5.5, §4 | 58, 60, 01 | 90% | Shipped
62 | Protocol conformance suite | protocol | §5 | 58, 60, 61 | 85% | Shipped
63 | Harbor CLI skeleton (harbor + cobra) | cmd/harbor | §8 | 60 | 70% | Shipped
64 | harbor dev v1 (boot runtime + protocol) | cmd/harbor | §8 | 63, 60 | 75% | Shipped
64a | Tool catalog OAuth + approval wiring | tools/catalog | §6.4 | 26, 30, 31, 50, 64 | 80% | Shipped
65 | harbor dev hot-reload | cmd/harbor | §8 | 64 | 75% | Shipped
66 | harbor dev draft-save scaffolding | cmd/harbor | §8 | 64 | 75% | Shipped
67 | harbor scaffold | cmd/harbor | §8 | 63 | 70% | Shipped
68 | harbor validate | cmd/harbor | §8 | 63, 02 | 75% | Shipped
69 | harbor inspect-events / inspect-runs | cmd/harbor | §8 | 63, 60 | 70% | Shipped
70 | harbor inspect-topology (ASCII renderer) | cmd/harbor | §8 | 63, 60 | 70% | Shipped
71 | harbortest test kit package | testing | §6.13 | 05, 09, 07 | 85% | Shipped
72 | Console subscription protocol surface | protocol | §5.2, §7 | 60, 05, 06 | 85% | Shipped
72a | events.subscribe filter ext + events.aggregate | protocol+events | §5.2, §6.13 | 60, 61, 72 | 85% | Shipped
72b | IdentityScope admin-impersonation extension | protocol | §5.5, §7 | 60, 61 | 89% | Shipped
72c | search.* cluster (5 methods) | protocol+search | §5.2, §7 | 60, 61, 08, 20, 05 | 85% | Shipped
72d | notification.* event topic + mapper | protocol+events | §5.2, §6.13 | 05, 06, 20 | 85% | Shipped
72e | pause.list snapshot Protocol method | protocol | §5.2, §6.3 | 50, 60, 61, 17 | 90% | Shipped
72f | Runtime posture surface (runtime.*/metrics.snapshot) | protocol | §5.3, §6.15, §7 | 60, 61, 56 | 85% | Shipped
72g | governance.posture + llm.posture | protocol | §5.5, §6.15 | 36a, 36b, 64, 72f | 85% | Shipped
72h | Console DB local schema + SvelteKit scaffold | web/console | §7 | 60 | 85% | Shipped
73 | Console state inspection surface | protocol | §5.2, §7 | 60, 07, 17 | 85% | Shipped*
73l | Console Artifacts page | web/console | §5.2, §6.10, §7 | 73, 75 | 80% | Shipped
73i | Console Flows page (Protocol + UI) | protocol+web/console | §5.2, §6.1, §7 | 73, 75, 26a | 85% | Shipped
73g | Console Events page | web/console | §5.2, §6.13, §7 | 72a, 73, 75 | 80% | Shipped
73c | Console Sessions page (Protocol + UI) | protocol+web/console | §5.2, §6.9, §7 | 08, 60, 61, 72a, 72b, 72c, 75 | 80% | Shipped
74 | Console topology projection events | protocol | §5.2, §6.13 | 05, 09 | 85% | Shipped
75 | Console e2e Playwright harness baseline | testing | §7 | 60, 72 | n/a | Shipped
73k | Console MCP Connections page | web/console | §6.4, §7 | 28, 30, 50, 60, 61, 64a, 72a, 75 | 80% | Shipped
73d | Console Tasks page (kanban + bulk control) | protocol+web/console | §5.2, §6.8, §7 | 20, 21, 54, 60, 61, 72c, 75 | 85% | Shipped
73b | Console Live Runtime page (Protocol + UI) | protocol+web/console | §5.2, §6.3, §6.13, §7 | 60, 61, 72a, 73, 73i, 74, 75 | 85% | Shipped
73n | Console Playground page (Protocol + UI) | protocol+web/console | §5.1, §6.4, §6.13, §7 | 54, 60, 61, 72b, 73l, 74, 75 | 85% | Shipped
73a | Console Overview page (composition-only UI) | web/console | §5.2, §6.13, §6.15, §7 | 54, 60, 61, 72a, 72e, 72f, 73d, 75 | 70% | Shipped
73m | Console Settings page + harbor console subcommand | protocol+web/console+cmd | §5.3, §5.5, §6.15, §7 | 72d, 72f, 72g, 72h, 75 | 75% | Shipped
75a | Console e2e Playwright wave-end suite | testing | §7 | 75, 73a-73n | n/a | Shipped
76 | Cross-tenant isolation conformance harness | testing | §4.3 | 07, 17, 23, 37, 20 | 95% | Shipped
77 | Goroutine leak conformance harness | testing | §5(Go) | 10, 13, 50 | n/a | Shipped
78 | Chaos / fault injection harness | testing | n/a | 76, 77 | n/a | Shipped
79 | Performance benchmarks | testing | n/a | 10, 12, 05 | n/a | Shipped
80 | Documentation hygiene polish (godoc, recipes) | docs | §2 | all V1 | n/a | Shipped
81 | Release engineering (versioning, changelog) | release | §12 | all V1 | n/a | Shipped
82 | V1 cut | release | §1, §12 | 81 | n/a | Shipped
83 | Auto-sequence detection (planner opt.) | planner | §12 | 45 | n/a | Post-V1
83a | ReAct prompt structured sections | planner/react | §6.2 | 45 | 85% | Shipped
83b | ReAct tool schema injection (catalog rendering) | planner/react | §6.2, §6.4 | 83a, 26 | 85% | Shipped
83c | ReAct dynamic repair guidance + planning hints | planner/react | §6.2 | 83a, 44, 05 | 85% | Shipped
83d | ReAct skills + memory injection (UNTRUSTED) | planner/react | §6.2, §6.6 | 83a, 23, 37 | 85% | Shipped
83e | ReAct reasoning channel decoupling | planner/react+llm | §6.2, §6.5 | 45, 32, 33, 44 | 90% | Shipped
83f | Dev RunLoop populates 83-band RunContext | runtime/dev | §6.2, §6.6 | 83c, 83d, 23, 37, 20 | 80% | Shipped
83g | MCP southbound consumer in harbor dev | runtime/dev | §6.4 | 28, 26 | 80% | Shipped
83h | Dev-binary fixes (hot-reload sqlite + LLM Model) | runtime/dev + llm | §6.5, §8 | 83g, 64, 32 | 80% | Shipped
83i | RunContext wiring closure (Catalog/Trajectory/Memory/Emit) | runtime/dev + steering | §6.2, §6.6, §6.8 | 83f, 83g, 83h, 26, 23 | 80% | Shipped
83n | harbor init + tiered yaml + docs/CONFIG.md + built-in tools | cli + tools/builtin | §8, §6.4 | 67, 63, 26 | 85% | Shipped
83o | scaffold reads operator yaml + per-custom-tool Go stubs + --patch | cli/scaffold + config | §8, §6.4 | 67, 83n, 26 | 85% | Shipped
83l | real-bifrost integration tests + snapshot CustomProviders bug fix | test/integration + cli | §6.5 | 33, 33a, 45, 83h, 83i | 80% | Shipped
83m | WARN cleanup band (MCP push id, sqlite watcher, closers, skills kw, llm timeout, scopes, tool_count, reasoning) | cmd/harbor + mcp + llm + tasks + steering + planner | §6.2, §6.4, §6.5, §6.8, §8 | 83g, 83h, 83i, 83l | 85% | Shipped
83k | Console release embed (make build + release pipeline rebuild Console; staleness gate; placeholder copy) | cmd/harbor + Makefile + release pipeline | §5, §8 | 73m, 81, 83n | n/a | Shipped
83p | Settings two-group layout (console-local always; runtime-posture wrapped) — closes walkthrough F1 | web/console + Settings page | §5, §8 | 73m, 73p | n/a | Shipped
83q | Playground sidebar nav + breadcrumb case — closes walkthrough F2 + N1 | web/console + (console) layout | §5, §7 | 73n | n/a | Shipped
83r | Disconnected-state hygiene + isDisconnected() predicate — closes W1/W2/W3 + N4/N5/N8/N9/N10 | web/console (cross-page) | §5 | 73m, 73p, 83p | n/a | Shipped
83s | Saved-views label "Save view" + per-page footer dedup — closes N2 + N7 | web/console (cross-page) | §5 | 73m, 73p | n/a | Shipped
83u | Console DB chicken-and-egg fix — attachConnection() + best-effort DB upsert (closes round-2 F3) | web/console + Settings page | §5, §7 | 73m, 73p, 83p | n/a | Shipped
83v | Runtime CORS allowlist — default-deny + per-origin echo + dev-only escape (closes round-2 F4) | internal/protocol/transports + config + cmd/harbor | §5, §7 | 60 | 90% | Shipped
83w | Wire-surface gaps — friendly unknown_method info banner (F5) + mcp.servers.list (F6) | web/console + cmd/harbor + mcpconsole | §5, §6.4, §7 | 83g, 83m, 73k | n/a | Shipped
83x | Real-data layout polish — W4-W11 + N11-N14 (incl. W6 created_at + W8 session-row Go fixes) | web/console + cmd/harbor + internal/protocol/artifacts | §5, §6.4, §6.6, §6.10, §6.13 | 73m, 73p, 83i, 83m | n/a | Shipped
84 | Reflection / critique loop | planner | §12 | 45 | n/a | Post-V1
84a | Runtime-capability gate + session aggregates (round-8 F1+F8 closeout) | internal/protocol + web/console | §5.3, §6.4, §7 | 72f, 73c, 73d, 72b, 83w | 90% | Shipped
84b | Multimodal attachment disposition policy (mechanism→policy; default ref) | internal/planner + internal/config + internal/protocol + web/console | §6.4, §6.5, §6.10 | F11/D-166, 107c | n/a | Shipped (V1.1.x)
84c | Provider-native multimodal mechanism (image/audio/video first, files/PDF last; opt-in via 84b) | llm/drivers/bifrost + planner | §6.5, §6.10, §11Q3 | 84b, 107, 32 | n/a | Shipped (V1.1.x)
84d | Embedding client (Embedder→bifrost) + semantic memory & skill retrieval (opt-in) | internal/embeddings + internal/memory + internal/skills | §6.5, §6.6, §6.7 | 32, 23, F11/84b | n/a | Shipped (V1.1.x)
84e | Semantic memory consumption in the run loop (SearchTurns recall → <read_only_external_memory>; opt-in via 84d) | internal/runtime/runctx + internal/memory + internal/config + cmd/harbor + harbortest | §6.2, §6.5, §6.6 | 84d, 83d, 83f, 110b, 110c, 107c | n/a | Shipped (V1.1.x)
85a | MCP client core-compliance fixes (roots-empty now permanent) | tools/mcp | §6.4 | 28 | 85% | Ready now
85b | MCP HTTP OAuth (RFC 9728 + 8707 + RC auth SEPs) | tools/mcp+auth | §6.4, §3.3 | 28, 30, 50 | 85% | Ready now (scope ↑)
85c | MCP sampling provider | tools/mcp+llm | §6.4, §6.5 | 28, 32, 50 | | Cut — RC deprecates sampling
85d | MCP elicitation provider (RC InputRequiredResult shape) | tools/mcp | §6.4, §3.3 | 28, 50, 85m | 85% | Revisit after SDK-RC
85e | MCP roots provider | tools/mcp | §6.4 | 28, 85a | | Cut — RC deprecates roots
85f | MCP remaining server features (sans logging) | tools/mcp | §6.4 | 28, 85a | 85% | Ready now (slim)
85g | MCP Apps host (Console ui:// renderer) | web/console | §6.4, §7 | 28, 85a | | Deprecated → superseded by 109a–c (D-172)
85h | MCP Tasks wire types (hand-transcribed) | tools/mcp | §6.4 | 28 | | Cut — RC redesigns Tasks
85i | MCP Tasks client | tools/mcp | §6.4 | 85h, 28 | | Cut — RC redesigns Tasks
85j | MCP client conformance + compliance statement (target: RC) | tools/mcp + docs | §6.4 | 85a, 85b, 85d, 85f, 85g, 85m | 85% | Revisit after RC-final
85m | MCP 2026-07-28 RC adoption (sessions, headers, errors, schema, cache, trace) | tools/mcp | §6.4 | 28, 85a | 85% | Revisit after SDK-RC
85k | Harbor agent-builder skills (adoption surface, ~10 SKILL.md playbooks; MCP wiring is one of them) | docs/skills + scripts | §1, §7, §6.4 | V1.1 closure, 85a (for the MCP skill), sibling Dockyard skills/ | n/a | Pending (V1.1.x)
86 | Durable distributed bus driver | distributed | §6.12, §12 | 22 | 85% | Shipped
87 | Durable TaskService backend | tasks | §6.8, §12 | 20, 22 | 85% | Shipped
86a | Distributed task dispatcher (the MessageBus consumer) + multi-worker deployment | distributed + tasks + runtime | §6.8, §6.12, §12 | 86, 87 | 85% | Post-V1
88 | Episodic memory tier | memory | §6.6, §11Q4 | 24, 25 | n/a | Post-V1
89 | A2A northbound (Harbor as A2A server) | tools/a2a | §6.4, §11Q2 | 29 | n/a | Post-V1
90 | Additional planner concretes | planner | §12 | 49 | n/a | Post-V1
91 | Console-driven key rotation (Protocol) | governance | §6.15 | 36a, 60, 73 | n/a | Post-V1
92 | Console-driven mid-session model swap | governance | §6.15 | 36a, 60, 73 | n/a | Post-V1
92a | Agent-config control plane (extends 91/92) | governance/agentcfg | §6.15, §6.16 | 86, 87, 92, 53a, 37, 110a, 109i | n/a | Post-V1
93 | Failover chains as Harbor policy | governance | §6.15 | 36a, 33 | n/a | Post-V1
94 | Provider circuit breakers (provider, key) | governance | §6.15 | 33, 93 | n/a | Post-V1
95 | LLM cache (exact-match + semantic) | governance/cache | §6.15 | 33 | n/a | Post-V1
96 | PII redaction at the LLM boundary | audit | §6.15 | 03, 33 | n/a | Post-V1
97 | Media-input tool wrappers | tools/media | §6.5, D-021 | 17, 26, 33 | n/a | Post-V1
98 | Media-output tool wrappers | tools/media | §6.5, D-021 | 17, 26, 33 | n/a | Post-V1
99 | Vision-aware memory summarization | memory | §6.6, D-021 | 24, 33, 97 | n/a | Post-V1
100 | Recipe loader (declarative YAML flows) | runtime/flow/recipe | §6.1, D-023 | 26a | n/a | Post-V1
101 | GitHub Actions Node 24 modernisation | .github/workflows | §12 | 81 | n/a | Shipped (V1.1.x)
102 | Godoc hygiene — strip internal phase jargon | internal/ + cmd/ | §1, §12 | (none hard) | n/a | Shipped (V1.1.x)
103 | GitHub Pages docs site (Dockyard parity) | docs/site + workflows | §1, §7, §12 | 85k (102 soft — see D-208) | n/a | Shipped (V1.3)
104 | Composable resilient flows — value proposition | RFC §1 + README + docs/skills | §1, §6.1 | 85k | n/a | Pending (V1.1.x)
105 | Console first-attach UX (zero-clicks-to-attached) | web/console + cmd/harbor + internal/server | §1, §7 | 85k, 73m | n/a | Shipped
106 | Playground displays the real assistant response | cmd/harbor + internal/tasks + web/console | §1, §6.5, §7 | 73 | n/a | Shipped
107 | Streaming completion pipeline (bifrost → events bus → Playground) | internal/llm + internal/planner + cmd/harbor + web/console | §1, §6.5, §7 | 106, 105, 84b, 83e | n/a | Shipped
107a | Reasoning trace projection (tasks.get enricher + Playground accordion) | internal/protocol + internal/tasks + cmd/harbor + web/console | §1, §6.5, §6.8, §7 | 73d, 83e, 106 | n/a | Shipped
107b | Streaming answer extractor (React planner streamAnswerFilter) | internal/planner/react | §1, §6.2, §6.5, §7 | 107, 83a, 83b, 83c, 83d, 83e | n/a | Superseded by 107c (not shipped)
107c | Native tool-calling + deferred tools/skills + search meta-tools (alt to 107b — collapses Path B into one wave) | internal/llm + internal/tools + internal/planner/react + internal/config + cmd/harbor | §1, §6.2, §6.4, §6.5, §6.7, §7 | 107, 83a, 83b, 83c, 83d, 83e, 83n, 37, 26, 32, 33, 33a | n/a | Shipped
107d | Native parallel tool-calls (dev executor CallParallel branch + React CallParallel emission + default flip; closes 107c's serialization carve-out) | cmd/harbor + internal/runtime/parallel + internal/planner/react + internal/config | §6.2, §6.5 | 107c, 47, 83i | n/a | Shipped (V1.1.x)
107e | SpawnTask + AwaitTask dev-executor dispatch (background-task execution; closes the last ErrDecisionShapeUnsupported carve-out) | cmd/harbor + internal/config | §6.2, §6.5, §6.8 | 107c, 47, 83i, 83f | n/a | Pending (V1.1.x)
107f | Session artifact manifest (read-only <session_artifacts> prompt block + provenance canonicalisation) | internal/planner + cmd/harbor + internal/protocol + internal/runtime/flow | §6.2, §6.4, §6.5 | 107c, 17, 33 | n/a | Shipped (V1.1.x)
108 | Playground page polish + Console shell layout (first of 14 page-polish phases) | web/console | §1, §7 | 73n, 105, 106 | n/a | Pending (V1.1.x)
109a | MCP Apps runtime + Protocol surface (_meta.ui.resourceUri parse, ui:// projection, mcp.servers.read_resource, real DisplayMode negotiation, app-tool-call proxy) | internal/tools/drivers/mcp + internal/protocol + cmd/harbor | §6.4, §6.5, §7 | 28, 85a, 84a | n/a | Shipped (V1.1.x)
109b | Console MCP Apps host (sandboxed iframe + CSP + official AppBridge in manual-handler mode + inline DisplayMode) | web/console | §6.4, §7 | 109a, 73n, 108 | n/a | Shipped (V1.1.x)
109c | MCP Apps DisplayMode layout (fullscreen tab + pip 50/50 split + rail toggle) | web/console | §7 | 109b | n/a | Shipped (V1.1.x)
109d | Inline MCP-app discovery (mcp.app_available event + MCPAppRef.server_id + ChatMessage app ref + MessageBubble renderer mount) | internal/tools/drivers/mcp + internal/protocol + web/console | §6.4, §6.5, §7 | 109a, 109b, 109c | 85% | Shipped (V1.1.x)
109e | MCP App discovery reads the tool-DEFINITION _meta.ui (spec-conformance fix — discovery fires against real ext-apps servers; live-test-found) | internal/tools/drivers/mcp | §6.4, §6.5, §7 | 109a, 109d | 85% | Shipped (V1.1.x)
109f | Render heavy MCP App documents (fetch the offloaded artifact) + operator "pop to side-by-side" affordance (live-test-found) | web/console | §6.4, §6.5, §7 | 109a, 109b, 109c, 109d | n/a (Console) | Shipped (V1.1.x)
109g | MCP App documents render inline on every artifact driver (read_resource scopes the heavy threshold out of ui:// app docs; live-test-found) | internal/mcpconsole | §6.5, §7 | 109a | 55% | Shipped (V1.1.x)
109h | MCP Apps UI-host capability advertisement (the driver advertises io.modelcontextprotocol/ui displayModes on the initialize handshake — the write side of negotiateDisplayModes — preserving roots) | internal/tools/drivers/mcp + internal/config | §6.4, §7 | 109a | n/a | Shipped (V1.1.x)
109i | MCP Apps tool-context capture + mcp.apps.tool_context (the Data-Delivery backend — capture input+result behind a declared ui:// app, identity-scoped read) | internal/tools/drivers/mcp + internal/mcpconsole + internal/protocol + cmd/harbor | §6.4, §6.5, §7 | 109a, 109d, 109g | 85% | Shipped (V1.1.x)
109j | Console pushes tool-input/tool-result into the app after ui/initialize (official AppBridge sendToolInput/sendToolResult) — the Data Delivery Console half | web/console | §6.4, §7 | 109i, 109b | n/a (Console) | Reverted (#346 — handshake regression; re-land #347)
109k | MCP Apps spec-conformance hardening (wave-end audit fixes: mimeTypes UI capability not displayModes; server-namespaced app→host tool calls; size-changed + teardown + live-theme host obligations) | internal/tools/drivers/mcp + internal/mcpconsole + internal/protocol + web/console | §6.4, §7 | 109a, 109b, 109h, 109i, 109j | 100% | Shipped (V1.1.x)
110a | Tool-executor promotion (internal/runtime/dispatch + exported answer envelope + tools.NewPlannerView; devstack degraded executor deleted) | internal/runtime/dispatch + internal/planner + internal/tools + cmd/harbor + harbortest | §6.4, §6.5, §6.2 | D-192 fix, 107d, 107e, 83i | 85% | Shipped (V1.1.x)
110b | RunContext population + event-closure promotion (internal/runtime/runctx + events.IdentityStampingEmitter + llm.NewChunkPublisher; devstack Emit/OnChunk/envelope parity) | internal/runtime/runctx + internal/events + internal/llm + cmd/harbor + harbortest | §6.2, §6.5, §6.13 | 110a, 83f, 83i, 83m, 107 | 90% | Shipped (V1.1.x)
110c | Config-projection exporters (five FromConfig + config.Defaults() + ValidateCore + internal/drivers/prod aggregator; fixes live devstack planner drift B3) | internal/llm + internal/memory + internal/skills + internal/planner + internal/governance + internal/config + internal/drivers/prod | §6.5, §6.6, §6.7, §9, §10 | 83l, 83f, 107d, 107e | 95% | Shipped (V1.1.x)
110d | Assembly promotion (exported error-returning assemble.Assemble + MCP attach + auth.BuildProviders + events.OpenWith; D-094 mirror collapses to thin callers; headless recipe) | internal/runtime/assemble + tools/mcp + tools/auth + internal/events + cmd/harbor + harbortest | §6.4, §6.13, §9, §10 | 110a, 110b, 110c, 64, 83g, 30, 57 | 80% | Shipped (V1.1.x)
111a | Governance enforcement assembly (identity_tiers actually enforce; SetFactory's first production caller) | internal/governance + cmd/harbor + harbortest | §6.15, §6.5, §6.11 | 32, 36a, 36b, 110c (soft) | 90% | Shipped (V1.1.x)
111b | Tool-OAuth completion leg (auth.CallbackHandler + full pause→callback→resume choreography E2E) | internal/tools/auth + cmd/harbor | §6.4, §3.3, §6.3 | 30, 50, 31, D-192 fix | 85% | Shipped (V1.1.x)
111c | Durable pauses + pause lifecycle (checkpoint-store wiring, trajectory threading, max-park sweeper → DecisionTimeout's first producer) | internal/runtime/pauseresume + internal/runtime/steering + cmd/harbor | §3.3, §6.3, §6.11 | 50, 51, D-192 fix | 90% | Shipped (V1.1.x)
111d | Skills canonical surface + ingestion (builtin→Phase-38 delegation; harbor skill import/rm; Directory disposition decision) | internal/skills + internal/tools/builtin + cmd/harbor | §6.7, §8 | 37, 38, 39, 40, 41, 107c | 85% | Shipped (V1.1.x)
111e | Trajectory compression consumer (LLM-backed planner.Summariser + RunLoop MaybeCompress + token_budget wiring) | internal/llm/summarizer + internal/planner + internal/runtime/steering + cmd/harbor | §6.2, §6.5 | 46, 35, 107, D-192 fix | 85% | Shipped (V1.1.x)
111f | Telemetry assembly + approval-gate authorizer seam (telemetry.New in production; BridgeBusToTracer; protocolauth out of approval) | internal/telemetry + internal/tools/approval + internal/runtime/steering + cmd/harbor | §6.14, §6.4, §5.1 | 03, 04, 05, 31, 55, 56, D-192 fix | 85% | Shipped (V1.1.x)
112a | The public SDK facade (sdk/ alias-based re-export tree per RFC §3.6) | sdk/ (new top-level) | §3.6, §1 | 110a-d, 111a-f, D-204 | n/a | Shipped (V1.2)
112b | External consumers on the facade + the external-module compile gate (scaffold templates, harbortest vocabulary, recipes/README, the standing gate) | cmd/harbor/scaffold + harbortest + docs | §3.6, §8 | 112a | n/a | Shipped (V1.2)
113a | Protocol adoption track — generated contract reference (cmd/harbor-gen-protocol-docs + protocol-docs-gen-check gate) + the executed quickstart + choreographies 1–3 + nav/README/§18 | cmd/harbor-gen-protocol-docs + docs/site + workflows | §5, §3.6 | 103, 58, 59, 60, 61, 62, 110c | 70% | Shipped
113b | Protocol adoption track — pause + versioning choreographies, build-a-client (worked event-viewer + compile gate), conformance-certification page | docs/site + examples/protocol-clients | §5, §3.3, §6.3 | 113a, 50, 72e, 111b, 111c, 84a | n/a | Shipped
114 | Steering verified-identity authority (control surface derives caller scope + tenant from the verified ctx, not the request body — closes a steering privilege escalation) | internal/protocol | §6.3, §5.5 | 52, 55, 56 | 85% | Shipped (V1.1.x)
115 | Production JWT verification (JWKS) + harbor serve (the JWKSURL/JWKSFile config fields gain a consumer; a production auth path beyond the dev signer) | internal/protocol/auth + cmd/harbor + internal/config | §5.5 | 114, 55, 56 | n/a | Shipped (V1.1.x)
116 | Non-admin session-scoped token contract (lesser-privileged tokens — the steering-authority consumer that makes 114 load-bearing; safe session_user derivation) | internal/protocol/auth + internal/protocol | §5.5, §6.3 | 114, 115 | n/a | Shipped (V1.1.x)
117 | Chat module encapsulation hardening (self-contained theming contract + font-family inheritance + host/theme parameterization per D-091; no Console look-and-feel leakage) | web/console | §7 | 109b, 108 | n/a (Console) | Shipped (V1.1.x)
118 | Protocol TS lockstep gate (cmd/harbor-protocol-ts-lockstep emits a committed wire manifest; protocol-ts-gen-check VERIFIES the hand-maintained per-page TS client field-by-field — D-093's "generate" half deferred, D-223; generator name reserved) | cmd/harbor-protocol-ts-lockstep + web/console + workflows | §5 | 113a | n/a | Shipped (V1.1.x)

V1 critical path: phases 01–82 + 26a + 36a + 36b (85 phases beyond skeleton). Post-V1 follow-ups: phases 83–84, 86–100, plus the lettered bands 83a–e (ReAct prompt depth + reasoning-channel decoupling) and 85a–j + 85m (MCP client/host compliance — the prioritised first post-V1 work; 85k is the separate Harbor agent-builder skills phase). The integer phase 85 (Skills Portico provider driver) was removed; the 85-band is now MCP compliance. Per the MCP 2026-07-28 RC re-plan (2026-05-28) the 85-band re-shapes: 85a / 85b / 85f are ready now; 85d / 85m revisit after SDK-RC (≈ Aug 2026); 85g / 85j revisit after RC-final (2026-07-28); 85c / 85e / 85h / 85i are cut. Governance is 91–96, Multimodal-output 97–99, Recipe loader 100.

The next release tag is V1.1.x — both the hygiene + positioning + UX band (101–104 + 108) and the Playground-depth band (105 + 106 + 107 + 107a + 107c + 107d) roll up under it; the previously-sketched V1.2 / V1.3 splits collapse. Phases 105–107c ship with this release: Console first-attach UX (105), Playground real assistant response (106), the streaming completion pipeline (107), reasoning trace projection (107a), and native tool-calling + deferred tools/skills + search meta-tools (107c) — the four built-in *_search/*_get meta-tools plus the optional declarative_action escape-hatch tool preserving brief 07's prompt-engineered path for weaker models. The 107b streaming answer extractor was deliberately superseded by 107c (one cutover instead of stop-gap-then-replace); the file at docs/plans/phase-107b-streaming-answer-extractor.md is kept as historical context.

Phase 107d (shipped) is the native-tool-calling follow-up that closes 107c's documented serialization carve-out: it wires the already-shipped internal/runtime/parallel.Executor (Phase 47 / D-056) into the dev ToolExecutor, flips the ReAct planner to native CallParallel emission for N>1 tool-calls, and pins the JoinKind-collapses-to-JoinAll-on-native semantic (D-169).
Phase 107e (pending) closes the last ErrDecisionShapeUnsupported carve-out the dev ToolExecutor carries: it wires planner.SpawnTask + planner.AwaitTask dispatch through the already-shipped tasks.TaskRegistry (Phase 47 / D-056) and teaches the per-task RunLoop driver to drive KindBackground tasks (closing the D-097 dead-task gap for the background kind), bounded by a new planner.absolute_max_spawn_depth recursion cap; on the synchronous V1.1.x runloop a retain-turn spawn blocks in-decision and a non-retain-turn spawn is joined by an explicit AwaitTask (eager push wake-on-resolution is a documented steering-runloop follow-up). SpawnTask + AwaitTask dispatch land together per §13 (D-170).

Phase 108 starts a 14-round page-by-page visual-polish series (one phase per Console page, anchored to docs/design/console/page-*.md + docs/design/console/CONVENTIONS.md) and is the largest piece still pending under V1.1.x. Background context for the native-tool-calling cutover: research brief 15.

Immediately after Phase 108, the three-phase "MCP Apps host" wave 109a–c lands (D-172): 109a (MCP Apps runtime + Protocol surface — _meta.ui.resourceUri parse, ui:// projection, mcp.servers.read_resource, real DisplayMode negotiation, app-tool-call proxy), 109b (Console sandboxed-iframe host + the official ext-apps AppBridge in manual-handler mode + inline DisplayMode), 109c (fullscreen-tab + pip-split DisplayMode layout). This wave deprecates and supersedes Phase 85g, pulling MCP Apps forward from the post-V1 85-band: Apps is a stable independent extension (io.modelcontextprotocol/ui), not gated on the July RC, and ships an official host bridge that removes 85g's hand-rolled-bridge risk. The architectural invariant is D-173 — the AppBridge runs in manual-handler mode and every app→host call is Protocol-proxied through the Runtime, never a direct MCP connection, so an in-iframe app stays inside the (tenant, user, session) isolation boundary and the unified approval/OAuth gates.
The 14-round page-polish series continues from the next free integer after the 109 band; the band precedes it in execution order but does not displace it.

Live Runtime reframe (2026-06-01, D-177): after 108d shipped the topology-first Live Runtime page, an operator review found it low-value and Playground-overlapping on the dominant planner/RunLoop runtime (no engine graph). Phase 108e supersedes the topology-first composition (D-126) with a single-runtime capability-adaptive cockpit — the runtime's advertised runtime.info capabilities compose the page (an always-present spine + capability-gated topology / health / cost panels), so it is full on a planner runtime and richer on engine/multi-agent shapes with no rebuild. Plan: docs/plans/phase-108e-live-runtime-capability-cockpit.md.

Protocol auth-hardening sequence (114–116, D-219): a planning + adversarial review of the Protocol surface found a steering-control privilege escalation — dispatchControl derived caller scope + tenant from the request body instead of the verified context identity, so a caller could assert scope:"admin" in the body and the cross-tenant gate could never fire. Phase 114 (shipped) closes it: the control surface now reads authority from identity.From(ctx) + the JWT scope claims, fails closed when no verified identity is present, and a non-admin caller can steer only runs it owns (admin for cross-tenant). 114 is the prerequisite hardening for the lesser-privileged-token work: Phase 115 adds production JWKS verification + a harbor serve auth path (giving the inert JWKSURL/JWKSFile config fields a consumer), and Phase 116 introduces the non-admin session-scoped token contract — the consumer that makes 114's derivation load-bearing and the seam where the session_user tier becomes safe to grant.
Independently, Phase 117 hardens the chat module's encapsulation boundary (D-091) so it renders self-contained — its own theming contract, font-family inheritance, and host/theme parameterization — with no Console look-and-feel leakage, and Phase 118 builds the long-tracked protocol-ts-gen-check gate as a field-level lockstep VERIFICATION of the hand-maintained per-page TS client against a committed, Go-generated wire manifest (cmd/harbor-protocol-ts-lockstep) — a D-093 deviation (D-223): the "generate" half (per-domain generated TS type modules) is a deferred future phase and the cmd/harbor-gen-protocol-ts name stays reserved for it.

Shipped* (Phase 73): the phase was dissolved — its surface was decomposed across the Console page phases that consumed each slice; the methods with no V1 consumer are deferred post-V1. See the Phase 73 detail block and D-133.


Per-phase detail

Format: Phase NN — Name (RFC §X.X). Each entry is the stub the per-PR plan file expands. Acceptance criteria are binding once the phase ships.

01 — Identity & isolation triple (RFC §4)

Goal. Provide the identity package: Identity{TenantID, UserID, SessionID}, From / MustFrom / With(ctx). The triple flows through every layer. Acceptance. MustFrom panics in handler-only paths; From returns ok-bool elsewhere; round-trips through JWT claims and JSON; identity scopes can be derived (admin / console:fleet). Smoke. phase-01.sh asserts the package exists and tests pass; no protocol surface yet. Tests. Unit + property (round-trip). Risks. None significant.

02 — Configuration loader (RFC §10)

Goal. YAML + env + flag layering; per-key annotation restart_required vs live; structured validation errors that point to the offending source. Acceptance. Loader returns typed Config; missing required keys fail with file:line; examples/harbor.yaml round-trips. Smoke. harbor validate --config examples/harbor.yaml returns 0 (subcommand auto-skip until phase 68). Tests. Unit on layering precedence; golden tests on validation errors.

03 — Audit redactor (RFC §6.4, §6.15)

Goal. A single audit.Redactor that summarizes/truncates/redacts payloads before persistence or emission. Used by Logger, EventBus persistence, tool audit. Acceptance. Redactor handles nested maps, byte arrays, secret-shaped strings (bearer/api-key/jwt), and oversize payloads; configurable allowlist/denylist; audit emits audit.redacted events for inspection. Smoke. N/A (library only). Tests. Unit + golden (fixed-input fixed-output).

04 — slog Logger + standard attribute set (RFC §6.14)

Goal. Logger wrapper around log/slog; pinned attribute set (tenant_id, user_id, session_id, run_id, task_id, trace_id, span_id, tool); JSON in production, text in dev; emits a paired runtime.error bus event on Error. Acceptance. Loggers accept WithIdentity(Identity); no log carries unredacted secret payloads (uses phase 03); CLI flag --log-format=text|json selects handler at process start. Smoke. N/A. Tests. Unit; integration with phase 03 redactor. Deps. 03.

05 — Event taxonomy + InMem EventBus + isolation (RFC §6.13)

Goal. Event, EventType (exhaustive sealed enum), EventPayload sealed interface, EventBus.Publish/Subscribe, Filter with server-enforced identity gates. In-memory MPSC ingress + per-subscriber bounded fan-out + drop-oldest with bus.dropped events. Acceptance. Subscribe rejects filters that elide the identity triple unless the caller has admin scope; identity-scope mismatches are audited; cardinality lint check fails CI on RunID/TraceID metric labels. Smoke. phase-05.sh asserts EventType exhaustiveness via go test; protocol smoke skips. Tests. Unit + fan-out + drop-policy + cross-tenant isolation; goroutine leak test. Deps. 01, 03.
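The drop-oldest policy on a bounded per-subscriber queue can be sketched with a buffered channel. `Subscriber`, `Offer`, and `Drain` are hypothetical names, the sketch assumes a single publisher goroutine, and it only counts evictions where the real bus would emit `bus.dropped`.

```go
package main

import "fmt"

// Event is a stand-in for the bus event type; only a sequence number is modelled.
type Event struct{ Seq int }

// Subscriber owns one bounded queue. Dropped counts evictions.
type Subscriber struct {
	ch      chan Event
	Dropped int
}

// NewSubscriber builds a queue with at least one slot.
func NewSubscriber(capacity int) *Subscriber {
	if capacity < 1 {
		capacity = 1
	}
	return &Subscriber{ch: make(chan Event, capacity)}
}

// Offer enqueues e without blocking. When the queue is full it evicts the
// oldest queued event and retries. Assumes a single publisher goroutine.
func (s *Subscriber) Offer(e Event) {
	for {
		select {
		case s.ch <- e:
			return
		default:
		}
		select {
		case <-s.ch:
			s.Dropped++
		default:
			// A consumer drained the queue in between; retry the send.
		}
	}
}

// Drain returns the sequence numbers currently queued, oldest first.
func (s *Subscriber) Drain() []int {
	var out []int
	for {
		select {
		case e := <-s.ch:
			out = append(out, e.Seq)
		default:
			return out
		}
	}
}

func main() {
	s := NewSubscriber(2)
	for i := 1; i <= 5; i++ {
		s.Offer(Event{Seq: i})
	}
	fmt.Println(s.Drain(), s.Dropped) // [4 5] 3
}
```

A slow subscriber therefore loses its oldest events rather than stalling the publisher or other subscribers.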

06 — Bus replay + ring buffer + cursor (RFC §6.13)

Goal. Replay(from Cursor, filter) against an in-memory ring (default 10k events, configurable). Cursor = (SessionID, Sequence); gap-free guarantee within a RunID. Acceptance. Late subscriber resumes cleanly; no duplicates; documented loss when ring overrun (durable log handled in phase 57). Tests. Unit + concurrency (subscribe-during-publish); idle-subscription reaper test. Deps. 05.
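A single-session sketch of cursor-based replay over a fixed ring. `Ring`, `Append`, and `ErrRingOverrun` are hypothetical; surfacing overrun as an error is one assumed way to make the documented loss visible. The filter argument and locking are omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// Cursor mirrors the (SessionID, Sequence) pair from the phase goal.
type Cursor struct {
	SessionID string
	Sequence  uint64
}

type Event struct {
	Cursor  Cursor
	Payload string
}

// ErrRingOverrun signals that events after the cursor were already overwritten.
var ErrRingOverrun = errors.New("replay: cursor older than ring retention")

// Ring keeps the most recent len(buf) events of one session. Not goroutine-safe.
type Ring struct {
	session string
	buf     []Event
	next    uint64 // sequence the next appended event receives; starts at 1
}

func NewRing(session string, capacity int) *Ring {
	if capacity < 1 {
		capacity = 1
	}
	return &Ring{session: session, buf: make([]Event, capacity), next: 1}
}

// Append stores payload and returns the cursor assigned to it.
func (r *Ring) Append(payload string) Cursor {
	c := Cursor{SessionID: r.session, Sequence: r.next}
	r.buf[r.next%uint64(len(r.buf))] = Event{Cursor: c, Payload: payload}
	r.next++
	return c
}

// Replay returns every retained event with Sequence > from.Sequence, in order.
func (r *Ring) Replay(from Cursor) ([]Event, error) {
	if from.SessionID != r.session {
		return nil, errors.New("replay: cursor belongs to another session")
	}
	oldest := uint64(1)
	if appended := r.next - 1; appended > uint64(len(r.buf)) {
		oldest = r.next - uint64(len(r.buf))
	}
	if from.Sequence+1 < oldest {
		return nil, ErrRingOverrun
	}
	var out []Event
	for seq := from.Sequence + 1; seq < r.next; seq++ {
		out = append(out, r.buf[seq%uint64(len(r.buf))])
	}
	return out, nil
}

func main() {
	r := NewRing("s1", 3)
	for _, p := range []string{"a", "b", "c", "d", "e"} {
		r.Append(p)
	}
	evs, err := r.Replay(Cursor{SessionID: "s1", Sequence: 2})
	fmt.Println(len(evs), err) // 3 <nil>
	_, err = r.Replay(Cursor{SessionID: "s1", Sequence: 1})
	fmt.Println(err) // replay: cursor older than ring retention
}
```

Because sequences are assigned monotonically and replay walks them in order, a late subscriber sees no duplicates and no gaps unless the ring has been overrun.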

07 — StateStore iface + InMem + conformance suite (RFC §6.11, §9)

Goal. Single mandatory StateStore interface (no Supports* ceremony). InMem driver. conformance.RunSuite(t, factory) covering save/load/idempotency/identity-mandatory/cross-tenant-isolation/cross-session-isolation/concurrency/leak. Acceptance. InMem passes the suite; the suite is the gate every later driver must pass; documented EventID (ULID) idempotency. Smoke. N/A. Tests. Unit + the conformance suite itself. Deps. 01, 03.

08 — SessionRegistry + lifecycle + GC (RFC §6.9)

Goal. SessionRegistry over phase 07 store. Open/get/touch/close/inspect/GC. Identity triple captured on Open and immutable; reopen-after-close rejected; GC sweeps idle sessions but never reaps RUNNING. Acceptance. Defaults: idle 24 h, hard cap 30 days, sweep 15 min; configurable via GCPolicy. Tests. Unit + integration; cross-tenant isolation test on Open. Deps. 01, 07.

09 — Envelopes, Headers, Identity quadruple (RFC §6.1)

Goal. Envelope{Payload, Headers, RunID, SessionID, Timestamp, DeadlineAt, Meta}. Headers{TenantID, UserID, Topic, Priority}. RunID is the runtime concurrency boundary; TraceID reserved for OTel. Acceptance. WithRunID returns a copy; (Tenant, User, Session, Run) round-trips through JSON; Meta last-write-wins on collision (until merge function lands as RFC follow-up). Tests. Unit + JSON round-trip. Deps. 01, 08.

10 — Engine + workers + cycle detection (RFC §6.1)

Goal. Engine with one goroutine per node, bounded channels per adjacency (default 64), cycle detector at construction (AllowCycle opt-in), Run / Stop / Emit / Fetch. Egress dispatcher always-on. Acceptance. Linear graph end-to-end works; Stop joins all workers; goroutine-leak test passes; cycle detector rejects without AllowCycle. Smoke. harbor dev boots an empty engine; /healthz returns 200 (gated by phase 64). Tests. Unit + integration + leak. Deps. 09.
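Construction-time cycle detection can be illustrated with a three-colour depth-first search over the adjacency map. `HasCycle` and `Validate` are hypothetical names, and the `allowCycle` flag stands in for the `AllowCycle` opt-in; the real engine's graph types are not shown in this plan.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	white = iota // not visited yet
	grey         // on the current DFS path
	black        // fully explored
)

// ErrCycle is returned when the graph loops back on itself without opt-in.
var ErrCycle = errors.New("engine: graph contains a cycle")

// HasCycle reports whether the directed graph described by adj has a cycle.
func HasCycle(adj map[string][]string) bool {
	color := make(map[string]int, len(adj))
	var visit func(n string) bool
	visit = func(n string) bool {
		color[n] = grey
		for _, m := range adj[n] {
			switch color[m] {
			case grey:
				return true // back edge to a node on the current path
			case white:
				if visit(m) {
					return true
				}
			}
		}
		color[n] = black
		return false
	}
	for n := range adj {
		if color[n] == white && visit(n) {
			return true
		}
	}
	return false
}

// Validate rejects cyclic graphs unless the caller opted in.
func Validate(adj map[string][]string, allowCycle bool) error {
	if !allowCycle && HasCycle(adj) {
		return ErrCycle
	}
	return nil
}

func main() {
	linear := map[string][]string{"in": {"work"}, "work": {"out"}}
	looped := map[string][]string{"in": {"work"}, "work": {"in"}}
	fmt.Println(Validate(linear, false)) // <nil>
	fmt.Println(Validate(looped, false)) // engine: graph contains a cycle
	fmt.Println(Validate(looped, true))  // <nil>
}
```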

11 — Reliability shell (RFC §6.1)

Goal. Per-node NodePolicy{Validate, TimeoutMS, MaxRetries, BackoffBase, BackoffMult, MaxBackoff}. RunError{Code, Message, Cause, Metadata}. Errors route to Protocol unconditionally; egress emission is opt-in via engine option. Acceptance. Timeout produces RunError(NodeTimeout); retries respect MaxRetries; validate=both rejects malformed envelopes. Tests. Unit on backoff math; integration per error code. Deps. 10.
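The backoff math that the unit tests target might look like the sketch below. The field names follow `NodePolicy` above; the field types and the formula (base times multiplier to the power of the attempt, capped at `MaxBackoff`) are assumptions, since the plan does not spell them out.

```go
package main

import (
	"fmt"
	"time"
)

// NodePolicy carries only the backoff-related fields; types are assumed.
type NodePolicy struct {
	MaxRetries  int
	BackoffBase time.Duration
	BackoffMult float64
	MaxBackoff  time.Duration
}

// Backoff returns the delay before retry number attempt (0-based):
// BackoffBase * BackoffMult^attempt, never more than MaxBackoff.
func (p NodePolicy) Backoff(attempt int) time.Duration {
	d := float64(p.BackoffBase)
	for i := 0; i < attempt; i++ {
		d *= p.BackoffMult
		if d >= float64(p.MaxBackoff) {
			return p.MaxBackoff // stop early so the float never overflows
		}
	}
	if d > float64(p.MaxBackoff) {
		return p.MaxBackoff
	}
	return time.Duration(d)
}

func main() {
	p := NodePolicy{
		MaxRetries:  5,
		BackoffBase: 100 * time.Millisecond,
		BackoffMult: 2,
		MaxBackoff:  time.Second,
	}
	for attempt := 0; attempt < p.MaxRetries; attempt++ {
		fmt.Println(attempt, p.Backoff(attempt))
	}
}
```

With a 100 ms base, multiplier 2, and a 1 s cap, the delays run 100 ms, 200 ms, 400 ms, 800 ms, then stay at 1 s.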

12 — Streaming + per-run capacity backpressure (RFC §6.1)

Goal. StreamFrame{StreamID, Seq, Text, Done, Meta}. EmitChunk honors per-run capacity waiters keyed by RunID. Backpressure baked in, not bolted on — the seam closes the predecessor's deadlock-under-streaming gap. Acceptance. N parallel runs × K frames each: ordering preserved per StreamID; no cross-run deadlock; goroutine-leak under streaming returns to baseline after Stop. Tests. Integration + concurrency + leak. Deps. 10, 11. Risks. This is Brief 01's "must bake in." Don't accept a "we'll add it later" PR.

13 — Cancellation + per-run fetch dispatcher (RFC §6.1)

Goal. Cancel(runID) is idempotent, drops queued envelopes for that run only, cancels in-flight invocations, drains per-run egress. FetchByRun(runID) demuxes via per-run dispatcher (always-on, no dual mode). Acceptance. Two concurrent runs; cancelling one leaves the other completing; FetchByRun never returns frames from another run. Tests. Concurrency + property (cancel idempotency). Deps. 10, 12.

14 — Routers + concurrency utils + subflows (RFC §6.1)

Goal. PredicateRouter, UnionRouter, RoutePolicy, MapConcurrent, JoinK, Subflow(factory, parent, opts...) (mirrors parent cancellation; runs to first egress payload). Acceptance. Each pattern matches its specified behavior; subflow cancellation mirrors parent. Tests. Integration per pattern. Deps. 10, 11.

15 — SQLite StateStore driver (RFC §6.11, §9)

Goal. modernc.org/sqlite (CGo-free), WAL journal, forward-only migrations under internal/state/sqlite/migrations/. Acceptance. Passes the phase 07 conformance suite end-to-end; clean DB starts cleanly; existing DB at version N migrates to N+1 idempotently. Tests. Conformance suite + migration tests. Deps. 07.

16 — Postgres StateStore driver (RFC §6.11, §9)

Goal. pgx/v5/stdlib-backed state.StateStore, embedded forward-only migrations gated by pg_advisory_lock for safe multi-replica boot, opaque BYTEA payloads (per RFC §6.11 + D-027 — superseding the older brief 05 §1 "JSONB payloads" narrative). Acceptance. Passes the phase 07 conformance suite end-to-end; CI matrix exercises against a containerized Postgres. Tests. Conformance suite + migration tests (clean-start, idempotency, advisory-lock concurrent boot) + Postgres-specific concurrent-reuse stress. Deps. 07.

17 — ArtifactStore iface + InMem + Filesystem drivers (RFC §6.10, §9)

Goal. Mandatory routing above heavy-output threshold (default 32 KB, runtime-configurable, per-tool overridable). ScopedArtifacts facade auto-stamps identity. Content-addressed IDs. Acceptance. Re-uploading identical bytes returns the existing ref; cross-scope reads rejected; NoOp fallback explicitly absent. Tests. Unit + isolation; dedup test. Deps. 01, 07.
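Content addressing with scope-checked reads can be sketched as an in-memory map keyed by scope and SHA-256 digest. `Scope`, `Ref`, `Store`, `Put`, and `Get` are hypothetical names, and the choice of SHA-256 hex as the ID is an assumption; the plan only says IDs are content-addressed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// Scope is the identity triple an artifact is stamped with.
type Scope struct{ Tenant, User, Session string }

// Ref identifies stored bytes by their content hash.
type Ref struct {
	ID   string
	Size int
}

var ErrNotFound = errors.New("artifact: not found in scope")

// Store keeps blobs per scope so one scope can never read another's bytes.
type Store struct {
	mu    sync.Mutex
	blobs map[Scope]map[string][]byte
}

func NewStore() *Store { return &Store{blobs: make(map[Scope]map[string][]byte)} }

// Put stores data under its SHA-256 digest. existed is true when identical
// bytes were already present in the same scope, so the old ref is reused.
func (s *Store) Put(scope Scope, data []byte) (ref Ref, existed bool) {
	sum := sha256.Sum256(data)
	id := hex.EncodeToString(sum[:])
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.blobs[scope] == nil {
		s.blobs[scope] = make(map[string][]byte)
	}
	if _, ok := s.blobs[scope][id]; ok {
		return Ref{ID: id, Size: len(data)}, true
	}
	s.blobs[scope][id] = append([]byte(nil), data...) // defensive copy
	return Ref{ID: id, Size: len(data)}, false
}

// Get returns the bytes for id only when they were stored under scope.
func (s *Store) Get(scope Scope, id string) ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	data, ok := s.blobs[scope][id]
	if !ok {
		return nil, ErrNotFound
	}
	return append([]byte(nil), data...), nil
}

func main() {
	st := NewStore()
	a := Scope{"t1", "u1", "s1"}
	b := Scope{"t2", "u9", "s7"}
	r1, _ := st.Put(a, []byte("report"))
	r2, existed := st.Put(a, []byte("report"))
	fmt.Println(r1.ID == r2.ID, existed) // true true
	_, err := st.Get(b, r1.ID)
	fmt.Println(err) // artifact: not found in scope
}
```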

18 — ArtifactStore SQLite-blob + Postgres-blob (RFC §6.10, §9)

Goal. Persistent artifact lifetimes that survive restart; same conformance suite as InMem + FS. Acceptance. Bytes round-trip; deletion is scope-checked; size enforcement matches thresholds. Tests. Conformance suite. Deps. 17, 15, 16.

19 — ArtifactStore S3-style driver (RFC §6.10)

Goal. S3-compatible driver behind the same interface (suitable for MinIO/AWS/R2/GCS-via-compat). Acceptance. Conformance suite; lifecycle integration; presigned-URL GetRef path. Tests. Conformance + integration against MinIO container. Deps. 17. Risks. V1 stretch — can slip to V1.1 if calendar pressure builds.

20 — TaskRegistry iface + InProcess + lifecycle (RFC §6.8)

Goal. Single TaskID namespace unifying foreground + background; lifecycle state machine (PENDING → RUNNING → COMPLETE, with PAUSED → RUNNING, FAILED|CANCELLED terminal); idempotency via IdempotencyKey; cancellation propagates per PropagateOnCancel. Acceptance. Spawning with same IdempotencyKey returns same handle; cascade vs isolate behave per spec. Tests. Unit + concurrency + isolation. Deps. 01, 07.
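The lifecycle state machine can be expressed as a transition table. The state names come from the goal above; the edges `RUNNING → PAUSED` and the failure and cancel edges out of every non-terminal state are assumptions filled in for the sketch, and `CanTransition` is a hypothetical helper.

```go
package main

import "fmt"

type State string

const (
	Pending   State = "PENDING"
	Running   State = "RUNNING"
	Paused    State = "PAUSED"
	Complete  State = "COMPLETE"
	Failed    State = "FAILED"
	Cancelled State = "CANCELLED"
)

// transitions lists the allowed next states. States with no entry are terminal.
var transitions = map[State][]State{
	Pending: {Running, Failed, Cancelled},
	Running: {Complete, Paused, Failed, Cancelled},
	Paused:  {Running, Failed, Cancelled},
}

// Terminal reports whether s is one of the three end states.
func (s State) Terminal() bool {
	return s == Complete || s == Failed || s == Cancelled
}

// CanTransition reports whether from -> to is an allowed edge.
func CanTransition(from, to State) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition(Pending, Running))  // true
	fmt.Println(CanTransition(Paused, Running))   // true
	fmt.Println(CanTransition(Pending, Complete)) // false
	fmt.Println(CanTransition(Failed, Running))   // false
	fmt.Println(Cancelled.Terminal())             // true
}
```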

21 — TaskGroup + retain-turn + patches (RFC §6.8)

Goal. Group resolution/sealing/cancel/apply; retain-turn semantics block foreground until group completes; ApplyPatch for human-approved context patches; AcknowledgeBackground. Acceptance. Group sealing freezes membership; retain-turn correctly blocks; patches transition through pending → applied/rejected. Tests. Integration; group lifecycle property tests. Deps. 20.

22 — MessageBus + RemoteTransport contracts (RFC §6.12)

Goal. Contract definitions + in-process MessageBus (loopback) + RemoteTransport capable of A2A. Publish is at-least-once; handlers idempotent on (TaskID, Edge, EventID). No durable distributed driver at V1. Acceptance. In-process loopback delivers; RemoteTransport returns request/reply and stream with final done=true. Tests. Unit + integration; contract tests for distributed driver (skip when no driver wired). Deps. 09, 20.

23 — MemoryStore iface + InMem + conformance (RFC §6.6)

Goal. MemoryStore interface with mandatory identity (require_explicit_key=true, no opt-out). Strategy=none only. Conformance harness includes fail-closed-on-missing-SessionID test. Acceptance. Missing identity fails closed + emits audit event; InMem passes the suite. Tests. Conformance suite. Deps. 01, 07.

24 — Memory strategies (RFC §6.6)

Goal. Add truncation and rolling_summary. Health states healthy → retry → degraded → recovering → healthy. Summarizer is an injectable Summarizer interface (LLM call lives in phase 32+). Acceptance. Strategy matrix tested; degraded mode falls back to recent-window + queues recovery loop bounded by RecoveryBacklogMax; memory.health_changed events emitted. Tests. Strategy matrix + property + integration with a stub summarizer. Deps. 23. Status. Shipped (D-035 — OverflowDropOldest-only enum, bounded recovery loop with memory.recovery_dropped overflow emit, retry/backoff/cadence constants not exposed as config; phase plan phase-24-memory-strategies.md).

25 — SQLite + Postgres memory drivers (RFC §6.6, §9)

Goal. Persistent memory state across restarts; same conformance suite. Acceptance. All three drivers (InMem, SQLite, PG) pass; Snapshot/Restore round-trips byte-stable. Tests. Conformance + Snapshot round-trip. Deps. 23, 15, 16.

26 — Tool catalog core + InProcess registration (RFC §6.4)

Goal. Tool, ToolDescriptor, ToolCatalog, ToolProvider interfaces + the ToolPolicy reliability shell (D-024). In-process registration via Go generics + reflection (schemas derived from input/output types) — tools.RegisterFunc(name, fn, opts...) is the minimum-expression API. CatalogFilter keyed on (tenant, user, session) triple plus GrantedScopes. Argument validation at the catalog edge using santhosh-tekuri/jsonschema. Dispatcher wraps every invocation in the ToolPolicy shell (timeout / retry-with-exponential-backoff / validation) regardless of transport — so even a zero-config RegisterFunc is production-resilient. Acceptance. A registered Go function appears in cat.List(filter) for the matching identity; arg validation produces typed tool.invalid_args events on failure; default ToolPolicy (zero-value) yields a 3-retry / 100ms→30s exponential backoff / 30s timeout shell on transient errors; tools.WithPolicy(...) overrides each axis. Tests. Unit (filter combinations + ToolPolicy default firing); integration; concurrency (N concurrent calls under a misbehaving tool — backoff respected). Deps. 01, 05, 09.

26a — Flow-as-Tool registration + per-flow Budget (RFC §6.1, §6.4, D-023)

Goal. flow.Definition shape (entry/exit nodes, node specs, optional intrinsic Budget). flow.Compose(def) → Engine builds a runnable engine reusable across invocations. flow.RegisterAsTool(catalog, def, eng) wires the Engine into the Tool catalog with Transport: Flow and schemas derived from entry/exit types. Per-flow Budget (deadline / hop-budget / cost-cap) composes with parent run + identity-tier ceilings via min(); whichever fires first aborts the flow with ErrFlowBudgetExceeded. Reliability shell: per-node NodePolicy from §6.1 still applies inside the flow; no double-wrapping. Acceptance. A 3-node flow registers as a Tool whose schema reflects entry-input → exit-output; planner invokes it through the standard dispatcher; per-flow budget exceedance emits flow.budget_exceeded and produces ErrFlowBudgetExceeded; identity-tier governance can still abort the same flow via ErrBudgetExceeded. Tests assert both abort paths fire correctly under contention. Tests. Unit (Definition validation; min() composition math). Integration (flow-as-tool round-trip via planner mock; budget-exceedance events). Concurrency (parallel flow invocations don't bleed budget state across runs). Smoke additions. flow.budget_exceeded event observable; ErrFlowBudgetExceeded mappable to a tool.error payload. Coverage target. internal/runtime/flow: 85%. Deps. 14 (subflows + reliability shell), 26 (tool catalog + ToolPolicy). Briefs. brief 01 §6.1 / §6.5 (subflow lifecycle and reliability shell). Risks. Budget-composition math under concurrent flow invocations — must be lock-free / atomic, same pattern as 36a's accumulator. Document. RFC anchor. §6.1 (Flow-as-Tool subsection) + §6.4 (Flow transport variant).
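The `min()` composition of a per-flow Budget with the parent-run and identity-tier ceilings could be sketched as below. The `Budget` field names and the convention that a zero value means "no limit on this axis" are assumptions; this shows only the composition math, not the concurrent accounting.

```go
package main

import (
	"fmt"
	"time"
)

// Budget models the three axes named in the goal. Zero means unlimited
// on that axis (an assumed convention for this sketch).
type Budget struct {
	Deadline time.Time
	MaxHops  int
	CostCap  float64
}

// Compose returns the tightest limit on each axis across all budgets,
// so whichever ceiling is lowest is the one that fires first.
func Compose(budgets ...Budget) Budget {
	var out Budget
	for _, b := range budgets {
		if !b.Deadline.IsZero() && (out.Deadline.IsZero() || b.Deadline.Before(out.Deadline)) {
			out.Deadline = b.Deadline
		}
		if b.MaxHops > 0 && (out.MaxHops == 0 || b.MaxHops < out.MaxHops) {
			out.MaxHops = b.MaxHops
		}
		if b.CostCap > 0 && (out.CostCap == 0 || b.CostCap < out.CostCap) {
			out.CostCap = b.CostCap
		}
	}
	return out
}

func main() {
	now := time.Now()
	flow := Budget{Deadline: now.Add(30 * time.Second), MaxHops: 4}
	parentRun := Budget{Deadline: now.Add(10 * time.Second), MaxHops: 10, CostCap: 2.0}
	tier := Budget{CostCap: 0.5}

	eff := Compose(flow, parentRun, tier)
	fmt.Println(eff.Deadline.Equal(parentRun.Deadline), eff.MaxHops, eff.CostCap) // true 4 0.5
}
```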

27 — HTTP tool driver (RFC §6.4)

Goal. Inline (RegisterHTTPTool(name, method, urlTemplate, ...)) and out-of-process via UTCP-style manifest. Static auth (API key, bearer, cookie). Retry + rate-limit handling. Acceptance. Both inline + manifest paths drive the same ToolDescriptor; integration against httptest.Server. Shipped. internal/tools/drivers/http exports RegisterHTTPTool, LoadManifest, RegisterManifest, and three AuthKinds; URL/body/header templates use text/template with urlquery escaping and reject {{ .Auth.* }} references at load time (AGENTS.md §7: no credential passthrough). Retry-After (seconds-integer + HTTP-date) is honoured before returning the rate-limit error so the policy shell's exponential backoff stacks on top; the driver consumes ONE retry budget per Invoke (D-024 no double-wrap). 4xx maps to ErrToolInvalidArgs (planner-reformulation channel); 5xx + transport errors are transient. ToolsConfig.HTTPManifests []string added to internal/config. Coverage: 88% (target 85%). The D-025 concurrent-reuse test exercises N=128 invocations against a shared httptest.Server under -race; no context bleed, no goroutine leaks. Tests. Integration; retry test. Deps. 26.
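Handling both `Retry-After` forms (an integer number of seconds and an HTTP-date) can be sketched with the standard library alone. `ParseRetryAfter` and `demoNow` are hypothetical names; the shipped driver's internals are not shown in this plan.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// demoNow is a fixed clock so the example output is deterministic.
var demoNow = time.Date(2015, time.October, 21, 7, 27, 0, 0, time.UTC)

// ParseRetryAfter converts a Retry-After header value into a wait duration.
// It accepts delay-seconds ("120") or an HTTP-date; ok is false when the
// value is empty or unparseable. Dates in the past yield a zero wait.
func ParseRetryAfter(header string, now time.Time) (wait time.Duration, ok bool) {
	header = strings.TrimSpace(header)
	if header == "" {
		return 0, false
	}
	if secs, err := strconv.Atoi(header); err == nil {
		if secs < 0 {
			return 0, false
		}
		return time.Duration(secs) * time.Second, true
	}
	if t, err := http.ParseTime(header); err == nil {
		if d := t.Sub(now); d > 0 {
			return d, true
		}
		return 0, true
	}
	return 0, false
}

func main() {
	fmt.Println(ParseRetryAfter("120", demoNow))                           // 2m0s true
	fmt.Println(ParseRetryAfter("Wed, 21 Oct 2015 07:28:00 GMT", demoNow)) // 1m0s true
	fmt.Println(ParseRetryAfter("soon", demoNow))                          // 0s false
}
```

Sleeping for the parsed duration inside the driver and only then returning the rate-limit error is what lets the outer policy shell's exponential backoff stack on top, as described above.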

28 — MCP southbound driver (RFC §6.4)

Goal. Go MCP client over stdio + streamable-HTTP + SSE. Auto-detect via MCPTransportMode = Auto | SSE | StreamableHTTP. Tool/resource/prompt mapping into Tool. Transport-level reconnect lives in ToolPolicy (D-024 retry shell), not in a parallel state machine inside the driver (D-037). Acceptance. Mock MCP server (in-process) integration tests pass; resource subscriptions emit a separate event topic (mcp.resource_updated). Tests. Integration + transport-fallback test; D-025 concurrent-reuse (N=100) against the in-process mock server pair. Deps. 26. Implementation note. Wraps github.com/modelcontextprotocol/go-sdk@v1.6.0 — the official Go SDK. Auto-mode fallback (streamable-HTTP → SSE) lives at Provider.Connect, not at Transport.Connect, so failures during the MCP initialize handshake (a client.Connect error) trigger the fallback the same as transport-level connect errors. See docs/decisions.md D-037.

29 — A2A southbound driver (full spec) (RFC §6.4)

Goal. Agent Card discovery (GET /.well-known/agent-card.json); JSON-RPC message/send, message/stream (SSE), tasks/get, tasks/cancel, tasks/pushNotificationConfig/*. Registry with route scoring (trust tier, latency tier, capability match). Acceptance. Mock A2A server integration (full Agent Card); registry resolves remote skills; A2A peers appear as Tool entries via ToolProvider. Tests. Integration + spec-compliance suite. Deps. 26, 22.

30 — Tool-side OAuth + HITL via pause/resume (RFC §6.4, §3.3)

Goal. TokenStore interface (InMem + SQLite + Postgres drivers) with encryption-at-rest for token material. OAuthProvider covering both user-bound and agent-bound binding scopes — BindingScope is a declared config field, not inferred. On tool.auth_required, the tool driver emits a typed ErrAuthRequired carrying a structured payload (provider, scope, binding-scope, flow-initiation URL); the runtime pauses via the unified pause/resume primitive (phase 50). Resume reattaches the token; A2A AUTH_REQUIRED converges on the same primitive. Authorization flows use PKCE; RFC 7591 dynamic client registration and authorization-server metadata discovery are supported. Agent-bound tokens are keyed by the Agent Registry's registration agent_id (phase 53a, D-059) — never by an isolation-tuple element, since agent_id is not part of the isolation tuple. Acceptance. OAuth full pause/resume cycle round-trips for both binding scopes; A2A AUTH_REQUIRED triggers an identical event shape; ErrAuthRequired payload is typed and audit-redacted (no raw token material in events); PKCE challenge/verifier round-trips; dynamic registration + discovery exercised against a test authorization server; token material is encrypted at rest (driver conformance asserts ciphertext on disk); admin-scope authz gates protect provider configuration; cross-tenant / cross-user / cross-agent isolation conformance — one identity's tokens never resolve for another; user-bound and agent-bound tokens coexist for the same tool without collision; initiate-then-cancel emits no goroutine leak. Tests. Integration end-to-end (both binding scopes); conformance with phase 50; isolation conformance (cross-tenant/user/agent); encryption-at-rest driver conformance; goroutine-leak (initiate-then-cancel). Deps. 26, 50, 53a. Briefs. 
brief 09 (docs/research/09-mcp-oauth-from-bifrost.md) — documents bifrost's OAuth surface (OAuth2Provider, OAuth2Config, OAuth2Token, OAuth2FlowInitiation, MCPUserOAuthRequiredError, MCPClientConfig OAuth fields) as a Go-shaped reference for what to lift, what to leave, and what Harbor must add. Bring back into the conversation when authoring the per-phase plan file (§"Re-discussion checklist" at the bottom of the brief). §4.3 deviation (shipped). The master-plan line "TokenStore (InMem + SQLite + Postgres drivers)" was implemented as a typed wrapper over the existing state.StateStore §4.4 seam (D-027) — the same approach Phase 50 (D-067) and Phase 53a (D-068) took for their persistence layers. Driver pluralism (in-mem / SQLite / Postgres) is inherited from the StateStore triad; the Phase 30 conformance suite runs the same TokenStore assertions against every StateStore driver to prove parity. This avoids the §13 two-parallel-implementations smell. Documented in D-083.
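The PKCE challenge/verifier round-trip in the acceptance criteria follows RFC 7636's `S256` method: the challenge is the unpadded base64url encoding of the SHA-256 of the verifier. The sketch below uses hypothetical helper names (`NewVerifier`, `ChallengeS256`) and is not Harbor's `OAuthProvider` code; it is checked against the RFC's Appendix B example.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// NewVerifier returns a random code_verifier: 32 random bytes encoded as
// 43 unpadded base64url characters, within RFC 7636's 43..128 length range.
func NewVerifier() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// ChallengeS256 derives the code_challenge for the S256 method:
// BASE64URL(SHA256(ASCII(code_verifier))) without padding.
func ChallengeS256(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	// Example values from RFC 7636, Appendix B.
	fmt.Println(ChallengeS256("dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk"))
	// E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM

	v, err := NewVerifier()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(v)) // 43
}
```

The client sends the challenge with the authorization request and the verifier with the token request; the verifier itself is secret material and should never appear in events or logs.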

31 — Tool-side approval gates (RFC §6.4, §3.3)

Goal. Synchronous "approve this tool call" gates using the same pause/resume primitive — distinct from OAuth, simpler payload shape. Acceptance. APPROVE/REJECT round-trip via the protocol; reject path raises typed tool.rejected events. Tests. Integration. Deps. 30. §4.3 deviation (shipped). The master-plan row's owning-subsystem tools/auth was the right home for "approval as another consumer of the OAuth machinery." The implementation chose a SIBLING package internal/tools/approval under internal/tools/ so the approval gate has zero OAuth baggage (no TokenStore, no Sealer, no PKCE / RFC 7591 / discovery surface — none of which an HITL approval gate needs). The two siblings (auth/ + approval/) share the Coordinator + bus + redactor seams via the public pauseresume / events / audit packages; nothing else. The master-plan row's subsystem column was updated tools/auth → tools/approval in the same PR. Documented in D-086 §1 ("the approval-gate package is a SIBLING of internal/tools/auth, not a subpackage"). Settled decisions: D-086. See also. docs/plans/phase-31-tool-approval-gates.md.

32 — LLM client core (RFC §6.5)

Goal. LLMClient interface with one method, Complete(ctx, req) (resp, error). CompleteRequest carries Messages whose Content is a sum-type (Text *string for the common case, or multimodal Parts []ContentPart for image/audio/file inputs; see D-021), optional ResponseFormat, optional OnContent/OnReasoning streaming callbacks, cancellation via ctx, reasoning-effort hint. No Tools, no ToolChoice, no FunctionCall: tool dispatch lives in the runtime (RFC §6.4 "Code-level tool dispatch"). Inline DataURL content above the heavy-output threshold is auto-materialized to ArtifactRef before persistence/emit (D-022). Context-window safety net (D-026): a catch-all pass at the LLM-client edge walks the assembled CompleteRequest immediately before the driver call and (a) fails loudly with ErrContextLeak if any message field carries raw bytes/strings ≥ heavy-output threshold that aren't ArtifactStub-shaped, (b) estimates total tokens against the model's configured context limit and fails with ErrContextWindowExceeded when the estimate is within ContextWindowReserve (default 5%) of the cap. V1 fails loudly; auto-cascade is post-V1. Acceptance. Mock LLM client passes round-trip with text-only AND multimodal payloads (text + image part). Cancellation aborts streaming cleanly. Interface compiles without any tool-calling type ever appearing in internal/llm/.... Auto-materialization of oversized DataURL content is observable via the llm.image.materialized event. Safety-net catch-all pass exists; planted-leak test (a deliberately buggy producer that emits ≥-threshold raw bytes) triggers ErrContextLeak + an llm.context_leak audit event. Token-budget test (a synthetic huge prompt) triggers ErrContextWindowExceeded cleanly with a reserve margin matching config. Tests. Unit + integration with mock (text + multimodal); assert no Tool* symbol leaks into the LLM package; auto-materialize threshold test; planted-leak test (raw bytes survive a producer); token-budget test (synthetic big prompt); ArtifactStub round-trip test (a stub renders to the model-agnostic JSON shape and parses back). Deps. 09.
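The token-budget half of the safety net reduces to a threshold comparison. In the sketch below `EstimateTokens` uses a rough four-bytes-per-token heuristic and `CheckWindow` treats an estimate strictly above `limit - reserve` as a failure; both the heuristic and the exact boundary are assumptions, since the plan fixes only the 5% default reserve.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrContextWindowExceeded mirrors the error named in the phase goal.
var ErrContextWindowExceeded = errors.New("llm: context window exceeded")

// EstimateTokens approximates token count at ~4 bytes per token, rounded up
// per message. A real estimator would be model-specific.
func EstimateTokens(messages []string) int {
	total := 0
	for _, m := range messages {
		total += (len(m) + 3) / 4
	}
	return total
}

// CheckWindow fails when estimated lands inside the reserved margin at the
// top of the context window. reservePercent is e.g. 5 for the 5% default.
func CheckWindow(estimated, limit, reservePercent int) error {
	usable := limit - limit*reservePercent/100
	if estimated > usable {
		return fmt.Errorf("%w: estimated %d tokens, usable %d of %d",
			ErrContextWindowExceeded, estimated, usable, limit)
	}
	return nil
}

func main() {
	small := []string{"hello", "world"}
	huge := []string{strings.Repeat("x", 4000)}

	fmt.Println(CheckWindow(EstimateTokens(small), 1000, 5)) // <nil>
	fmt.Println(CheckWindow(EstimateTokens(huge), 1000, 5))
	// llm: context window exceeded: estimated 1000 tokens, usable 950 of 1000
}
```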

33 — bifrost integration (RFC §6.5, §11 Q-3)

Goal. Wire github.com/maximhq/bifrost/core (pure Go LLM gateway library) behind LLMClient. Implement a thin Driver adapter that translates Harbor's CompleteRequest ↔ bifrost's BifrostChatRequest / BifrostChatResponse, and a minimal schemas.Account providing API keys. Translation includes multimodal ContentParts (D-021): map Harbor's ImagePart/AudioPart/FilePart (with URL / DataURL / Artifact supply forms) to bifrost's per-provider content shapes; auto-materialize oversized DataURL content to ArtifactRef (D-022) before sending. Bifrost's Tools / ToolChoice parameters are intentionally NOT used — Harbor's runtime owns tool dispatch (RFC §6.4). Q-3 is resolved; this is a normal implementation phase, not a decision gate. Acceptance. Six-provider smoke green: basic chat + json_object response_format + streaming with content callback + ctx cancellation accepted by the runtime + token usage parsed + cost parsed + one multimodal text+image round-trip against a vision-capable model. Driver registers via init() blank-import per AGENTS.md §4.4. The driver package contains zero references to bifrost's Tools / ToolChoice types. Tests. Unit (request/response translation); integration with mock; six-provider live conformance test (gated behind HARBOR_LIVE_LLM=1 so CI does not burn API credits by default — the local dev loop and harbor dev do exercise it). Deps. 32. Risks. Bifrost requires Go 1.26+; Harbor's go.mod was bumped during validation. Stream-channel close timing on long streams may exceed naive cancel budgets — mitigation is ctx.Done()-driven channel-reader abandonment + goroutine-leak tests. See also. docs/research/08-llm-client-validation.md (full validation report and results).

33a — Custom OpenAI-compatible providers + per-provider timeouts (RFC §6.5)

Goal. Extend Phase 33's bifrost driver so operators can wire any OpenAI-compatible LLM endpoint (NIM, vLLM, ollama, lm-studio, in-house gateways) via harbor.yaml without per-provider Go code. Adds LLMConfig.CustomProviders []LLMCustomProviderConfig (Name / BaseURL / APIKeyEnvVar / Models / per-provider Timeout / retry/backoff/concurrency knobs / RequestPathOverrides) + LLMConfig.NetworkDefaults (global fallthrough for native + custom). When llm.provider names a custom entry, the entry's network knobs apply and legacy llm.api_key / llm.base_url / llm.timeout are ignored. Phase 33a supports only base_provider_type: openai; future phases widen this. Acceptance. Account widened to multi-entry (the single-PRIMARY contract per D-040 is preserved: GetConfiguredProviders returns the one configured primary). GetConfigForProvider returns *ProviderConfig with CustomProviderConfig.BaseProviderType = schemas.OpenAI when the primary is a custom entry. Missing env var fails closed at New with ErrMissingAPIKey naming the var. httptest integration (happy / timeout / 5xx) green. D-025 N≥100 concurrent stress green on mixed config. No tool-call API symbol leak (extends Phase 33 static guard). Tests. Unit (custom-provider construction + validation; NetworkDefaults fallthrough + per-provider override; native-and-custom coexist). Integration (httptest.Server mimicking OpenAI-compatible /v1/chat/completions: happy + 5xx + timeout). Concurrency (D-025 mixed config). Smoke. scripts/smoke/phase-33a.sh. Deps. 33. Risks. Operator-facing BaseURL gotcha: bifrost's OpenAI provider appends /v1/chat/completions, so operators set the host root, not the full /v1 path. This is documented in the yaml, and the wire-test asserts the correct path. Sub-second timeouts get rounded down to 0 by bifrost's int(seconds) cast, so the practical minimum is 1s today; widening waits for a NetworkConfig API rev. 
Corrections (Phase 34) match by model-name prefix; custom-provider model names are typically unprefixed — operators declare ModelProfiles[<model>].Corrections explicitly to get quirks applied. Settled decisions: D-042. See also. docs/plans/phase-33a-custom-providers.md.

34 — Provider correction layer + SchemaSanitizer (one mode, baked in) (RFC §6.5)

Goal. A thin correction layer — bifrost already normalizes provider-specific transport quirks across its 23 first-class providers (brief 08), so this phase is NOT a "native vs. LiteLLM" dual-architecture; it is a narrow SchemaSanitizer + message-shape normalizer that lives between the runtime and the LLMClient (NOT inside the client), handling only what bifrost does not. Scope: response_format shape adjustments, reasoning-effort routing for thinking-class models (o1, o3, deepseek-reasoner), schema normalization (additionalProperties: false, strict: true modes), message reordering (NIM), usage backfill (proxies that report 0/0). No use_native toggle — there is one mode, baked in. Scope is structured-output and message-shape correctness only — never tool-call APIs (those don't exist on this layer). Acceptance. Each documented quirk has a passing normalizer test; switching providers does not require a configuration toggle; no tool-call API references in this package; the layer is demonstrably thin — quirks bifrost already handles are NOT re-implemented here. Tests. One unit test per quirk; assert no Tool* symbol leaks. Deps. 33. Briefs. brief 07 (code-level tool calling — runtime owns dispatch, so this layer never touches tool-call APIs), brief 08 (bifrost validation — what the LLM substrate already normalizes, so this phase doesn't).

35 — Structured output strategies + downgrade chain (RFC §6.5)

Goal. OutputMode = Native | Tools | Prompted. Per-provider ModelProfile selects mode. Downgrade chain: json_schema → json_object → text on invalid_json_schema errors. llm.mode_downgraded events. Acceptance. Forced-failure on each step of the chain results in observable downgrade and continued completion. Tests. Integration per provider. Deps. 33, 34.
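The downgrade chain can be sketched as a loop that steps to the next weaker format only on an `invalid_json_schema` error. `CompleteWithDowngrade`, the `Format` type, and the `onDowngrade` hook (standing in for the `llm.mode_downgraded` event) are hypothetical names for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

type Format string

const (
	JSONSchema Format = "json_schema"
	JSONObject Format = "json_object"
	Text       Format = "text"
)

// ErrInvalidJSONSchema stands in for the provider's invalid_json_schema error.
var ErrInvalidJSONSchema = errors.New("invalid_json_schema")

// chain is ordered from strictest to weakest response format.
var chain = []Format{JSONSchema, JSONObject, Text}

// CompleteWithDowngrade tries each format in order. It downgrades only on
// ErrInvalidJSONSchema; any other error, or a failure on the last format,
// is returned to the caller unchanged.
func CompleteWithDowngrade(call func(Format) error, onDowngrade func(from, to Format)) (Format, error) {
	for i, f := range chain {
		err := call(f)
		if err == nil {
			return f, nil
		}
		if !errors.Is(err, ErrInvalidJSONSchema) || i == len(chain)-1 {
			return f, err
		}
		if onDowngrade != nil {
			onDowngrade(f, chain[i+1])
		}
	}
	return "", errors.New("llm: empty downgrade chain")
}

func main() {
	// A fake provider that rejects json_schema but accepts json_object.
	call := func(f Format) error {
		if f == JSONSchema {
			return fmt.Errorf("provider said: %w", ErrInvalidJSONSchema)
		}
		return nil
	}
	used, err := CompleteWithDowngrade(call, func(from, to Format) {
		fmt.Println("downgraded", from, "->", to)
	})
	fmt.Println(used, err) // json_object <nil>
}
```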

36 — Retry with feedback (RFC §6.5)

Goal. Validation/parse failures feed back into the planner via LLMClient retry; bounded by MaxRetries; observable. Acceptance. A planner-tagged invalid arg triggers a single LLM retry with corrective sub-prompt; retry count respects bound. Tests. Integration with mock + bounded-loop assertion. Deps. 35.

36a — Cost accumulator + per-identity ceilings (RFC §6.15)

Goal. Subscribe to llm.cost.recorded events; aggregate Usage.Cost.TotalCost by (tenant, user, session) and by model in StateStore-backed accumulators; gate the next call when ceiling exceeded; emit governance.budget_exceeded; fail loudly with ErrBudgetExceeded. Establish the governance.Subsystem interface with PreCall/PostCall hooks wrapping the LLMClient driver. Acceptance. Three-driver conformance (in-mem / SQLite / Postgres) green for accumulators. Ceilings settable via config (Protocol-driven setters land post-V1 phase 91). Ceiling exceedance emits governance.budget_exceeded with the identity triple; runtime can route to the unified pause/resume primitive when configured. Cross-session isolation test passes. Tests. Unit (accumulator math). Integration per driver. Concurrency (N concurrent calls do not overshoot ceiling — atomic / lock-free path documented). Cross-session isolation. Failure-mode (StateStore read failure → fail-loud, no silent permit). Smoke additions. Healthz still 200; governance.budget_exceeded observable when synthesized; config knob round-trip. Coverage target. internal/governance: 85%. Deps. 11 (event bus skeleton — llm.cost.recorded shape lives there). 15 (StateStore SQLite driver — accumulator persistence). 33 (bifrost integration — cost reporting passthrough is the source). Briefs. brief 03 §6 (LLM client surface, cost reporting), brief 06 §3 (event bus + identity-scoped subscriptions). Risks. Concurrent-call ceiling overshoot if accumulator math isn't atomic — the design must be lock-free (atomic add + compare-and-swap) and the test must exercise high-concurrency. RFC anchor. §6.15.
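The lock-free ceiling check named in the risks can be sketched with a compare-and-swap loop on an atomic counter. This is a single-process, in-memory illustration only: `Accumulator` and `TryAdd` are hypothetical names, cost is held in integer micro-units (an assumption, to avoid float atomics), and the StateStore-backed persistence is not modelled.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Accumulator tracks spend for one identity against a fixed ceiling.
type Accumulator struct {
	ceiling int64
	total   atomic.Int64
}

func NewAccumulator(ceilingMicros int64) *Accumulator {
	return &Accumulator{ceiling: ceilingMicros}
}

// TryAdd records costMicros only if the new total stays at or under the
// ceiling. The CAS loop retries when another goroutine won the race, so
// concurrent callers can never push the total past the ceiling.
func (a *Accumulator) TryAdd(costMicros int64) bool {
	for {
		cur := a.total.Load()
		if cur+costMicros > a.ceiling {
			return false
		}
		if a.total.CompareAndSwap(cur, cur+costMicros) {
			return true
		}
	}
}

// Total returns the spend recorded so far.
func (a *Accumulator) Total() int64 { return a.total.Load() }

func main() {
	acc := NewAccumulator(500)
	var admitted atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if acc.TryAdd(10) {
				admitted.Add(1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(admitted.Load(), acc.Total()) // 50 500
}
```

Exactly 50 of the 100 concurrent calls are admitted regardless of scheduling, which is the "no overshoot" property the concurrency test is meant to assert.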

36b — Per-identity rate limits + per-call MaxTokens (RFC §6.15)

Goal. Token-bucket rate limiter per (identity, model) with bucket-state persisted in StateStore so it survives runtime restart. Per-call MaxTokens enforced from the identity's tier in PreCall. Emits governance.rate_limited and governance.maxtokens_exceeded events; fails loudly with ErrRateLimited and ErrMaxTokensExceeded. Acceptance. Bucket fills/drains per config; bucket state survives runtime restart; MaxTokens tier resolved from identity in PreCall and applied to the request before it leaves Harbor; events emitted with identity triple; CLI smoke configures a tiny bucket and asserts the limit kicks in. Tests. Unit (token-bucket math under fast and slow refill rates). Integration per driver. High-concurrency (N concurrent calls — bucket never goes negative; never permits more than capacity). Restart-survival. Smoke additions. governance.rate_limited observable when bucket exhausted; bucket-fill timestamps consistent with config. Coverage target. internal/governance: 85%. Deps. 36a (Subsystem interface + identity scaffolding). Briefs. brief 03 §6 (LLM client surface), brief 06 (event bus). Risks. Token-bucket race conditions under concurrent call paths — must be lock-free. RFC anchor. §6.15.

37 — Skill store + LocalDB driver + FTS5 ladder (RFC §6.7)

Goal. SQLite-backed skill store; FTS5 → regex → exact ranking ladder; CI tests both FTS-on and FTS-off builds. Schema with Origin / OriginRef / Scope / ContentHash. Acceptance. The scoring constants documented in brief 04 §4.4 produce stable rankings; the existing_origin != "pack" short-circuit refuses overwrites. Tests. Unit (golden ranking) + FTS-off-fallback test. Deps. 01, 07, 15.

38 — Skill planner tools (search/get/list) (RFC §6.7)

Goal. skill_search, skill_get, skill_list registered through phase 26 catalog. Capability filter (RequiredTools/Namespaces/Tags ⊆ allowed). PII + tool-name redaction at injection. Tiered budgeter (full → drop optional → cap steps to 3). Acceptance. Filter excludes mismatched skills; redactor strips disallowed names; budgeter fits within max_tokens. Tests. Unit + integration. Deps. 26, 37.
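
The capability filter is a subset test: a skill is eligible only when everything it requires is allowed. A minimal sketch follows; the `skill` and `allowed` types and the `filterSkills` helper are hypothetical names, not the shipped API.

```go
package main

import "fmt"

// skill carries the three capability lists named in the phase goal.
type skill struct {
	Name          string
	RequiredTools []string
	Namespaces    []string
	Tags          []string
}

// allowed is a hypothetical view of what the current identity may use.
type allowed struct {
	Tools, Namespaces, Tags map[string]bool
}

// subset reports whether every required item is present in allow.
func subset(required []string, allow map[string]bool) bool {
	for _, r := range required {
		if !allow[r] {
			return false
		}
	}
	return true
}

// filterSkills keeps only skills whose requirements are all allowed.
func filterSkills(skills []skill, a allowed) []skill {
	var out []skill
	for _, s := range skills {
		if subset(s.RequiredTools, a.Tools) &&
			subset(s.Namespaces, a.Namespaces) &&
			subset(s.Tags, a.Tags) {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	a := allowed{
		Tools:      map[string]bool{"search": true},
		Namespaces: map[string]bool{"docs": true},
		Tags:       map[string]bool{},
	}
	skills := []skill{
		{Name: "lookup", RequiredTools: []string{"search"}, Namespaces: []string{"docs"}},
		{Name: "deploy", RequiredTools: []string{"shell"}},
	}
	for _, s := range filterSkills(skills, a) {
		fmt.Println(s.Name)
	}
}
```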

39 — Virtual directory subsystem (RFC §6.7)

Goal. Directory(cfg) API + pinned_then_recent / pinned_then_top selectors; identity-scoped; capability-filtered; redacted before injection. Acceptance. Default max_entries=30, range 1–200; pinned skills always included; selection respects identity. Tests. Unit + property. Deps. 37.
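
A `pinned_then_recent` selector can be sketched as follows. The `entry` type, the helper names, and two behaviours are assumptions: out-of-range `max_entries` values are rejected rather than clamped, and pinned entries are kept even when they alone exceed the limit, which is one reading of "pinned skills always included".

```go
package main

import (
	"fmt"
	"sort"
)

const (
	defaultMaxEntries = 30
	minMaxEntries     = 1
	maxMaxEntries     = 200
)

// entry is a hypothetical directory row.
type entry struct {
	Name     string
	Pinned   bool
	LastUsed int64 // larger means more recent
}

// resolveMaxEntries applies the default for 0 and rejects out-of-range values.
func resolveMaxEntries(n int) (int, error) {
	if n == 0 {
		return defaultMaxEntries, nil
	}
	if n < minMaxEntries || n > maxMaxEntries {
		return 0, fmt.Errorf("max_entries %d outside %d..%d", n, minMaxEntries, maxMaxEntries)
	}
	return n, nil
}

// pinnedThenRecent returns all pinned entries, then the most recent others
// until the limit is reached.
func pinnedThenRecent(entries []entry, maxEntries int) ([]entry, error) {
	limit, err := resolveMaxEntries(maxEntries)
	if err != nil {
		return nil, err
	}
	var pinned, rest []entry
	for _, e := range entries {
		if e.Pinned {
			pinned = append(pinned, e)
		} else {
			rest = append(rest, e)
		}
	}
	sort.SliceStable(rest, func(i, j int) bool { return rest[i].LastUsed > rest[j].LastUsed })
	room := limit - len(pinned)
	if room < 0 {
		room = 0
	}
	if len(rest) > room {
		rest = rest[:room]
	}
	return append(pinned, rest...), nil
}

func main() {
	entries := []entry{{"old", false, 10}, {"pinned", true, 1}, {"new", false, 20}}
	got, err := pinnedThenRecent(entries, 2)
	fmt.Println(got, err)
}
```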

40 — Skills.md importer (RFC §6.7)

Goal. Spec-compliant CommonMark parser; YAML frontmatter; section normalization (## Steps, ## Preconditions, ## Failure modes); attachments resolved as ArtifactRef (option (b) — RFC settled). Round-trip byte-stable. Acceptance. Golden corpus of N spec-compliant Skills.md files imports without source edits and re-exports byte-stable; missing trigger/empty steps fail loudly. Tests. Golden corpus + negative tests. Deps. 37. Risks. This is the predecessor's gap-closer. The byte-stable round-trip is a tested invariant.

41 — In-runtime skill generator with persistence (RFC §6.7)

Goal. skill_propose(persist=true) validates draft, stamps Origin=Generated, OriginRef = "gen:{session_id}:{run_id}", scopes by operator-provided Scope (default project), upserts via store. Conflict policy: refuse to overwrite Origin=PackImport; for Generated→Generated, content-hash gates last-write-wins. Audit is mandatory. Acceptance. Generator persists; subsequent search discovers; audit event emitted on every persist. Tests. Integration end-to-end + isolation (cross-session no-leak unless promoted). Deps. 37, 38, 03.

42 — Planner iface + Decision sum + RunContext (RFC §6.2, §3.2)

Goal. Define Planner.Next(ctx, RunContext) (Decision, error); Decision sum (CallTool, CallParallel, SpawnTask, AwaitTask, RequestPause, Finish); RunContext is the only surface planner sees. Acceptance. Stub planner returning Finish runs end-to-end; planner package imports no Runtime internals. Tests. Conformance harness skeleton; import-graph lint. Deps. 09, 13, 26, 32. Wake-on-resolution contract (D-032). When the planner emits a SpawnTask (or group SpawnTask via the patched surface from Phase 21) WITHOUT retain-turn, it MUST consume tasks.WatchGroup(sessionID, groupID) (<-chan GroupCompletion, func(), error) from internal/tasks to learn when the group resolves. The three wake modes (push, poll, hybrid) are documented at the internal/tasks package godoc; this phase ships the planner-side interface contract that each concrete (45, 48, future) maps onto exactly one mode. The TaskRegistry stays neutral — no WakeMode field, no Supports* capability protocol.

43 — Trajectory + serialise contract (RFC §6.2, §3.4)

Goal. Trajectory.Serialize() (bytes, error) returns (nil, ErrUnserializable{Field:...}) on any non-JSON-encodable entry. No silent-drop path. ToolContext split: serialisable half + handle registry (process-local at V1 — see RFC §6.3). Acceptance. Round-trip is byte-stable; non-serialisable handle returns ErrUnserializable; resume with missing handle returns ErrToolContextLost. Tests. Round-trip + negative cases (per RFC contract). Deps. 42, 07. Risks. This phase closes the predecessor's silent-context-loss bug. The fail-loudly tests are the gate.

44 — Schema repair pipeline (RFC §6.2)

Goal. Salvage → schema repair → graceful failure → multi-action salvage, in internal/planner/repair/. Configurable per concrete (arg_fill_enabled, repair_attempts, max_consecutive_arg_failures). Acceptance. Each step passes its targeted unit test; graceful failure forces Finish{Reason: NoPath, Followup: true} after N consecutive arg failures. Tests. Unit per step + integration with malformed mock LLM responses. Deps. 42, 32.

45 — Reference ReAct planner (minimum viable) (RFC §6.2)

Goal. LLM call loop, JSON-only action format, tool selection, completion detection, single tool call per step. Functional options for the small policy-shaped knobs. Acceptance. 3-step reasoning task succeeds against a mock LLM; planner package has no Runtime imports; planner is concurrent-safe across runs. Tests. Conformance pack (skeleton) + scenario. Deps. 42, 43, 44, 32. Wake mode. ReAct ships the push wake mode (D-032): a non-retain-turn SpawnTask returns control to the runtime; the runtime registers the planner against tasks.WatchGroup; on GroupCompletion the runtime re-invokes Planner.Next with the resolved MemberOutcome slice surfaced through RunContext. The LLM sees the next planner step only after the group resolves — no LLM call burns while children are in flight.

46 — Trajectory compression / summariser (RFC §6.2)

Goal. Configurable summariser invoked by runtime when token_budget exceeded. Produces TrajectorySummary{Goals, Facts, Pending, LastOutputDigest, Note}. Compression is a runtime concern; planner sees only the compacted view. Acceptance. Over-budget trajectory triggers summarisation; summary replaces raw step history in subsequent prompt builds. Tests. Integration with mock summariser. Deps. 43, 32.

47 — Parallel-call execution + ReAct CallParallel/SpawnTask/AwaitTask emission (RFC §6.2)

Goal. CallParallel{Branches, Join} executes branches concurrently; atomic setup validation (any branch's invalid args fails the whole call before execution); parallel-pause atomicity (no branch starts side-effecting tools, or all reach checkpointed observation before pause commits); system cap absolute_max_parallel=50. PLUS the §13 primitive-with-consumer bundle: ReAct upgrades to EMIT CallParallel (delete the Phase 45 D-051 single-tool-call-per-step stop-gap) AND emit SpawnTask / AwaitTask via the two new reserved tool names (_spawn_task, _await_task). Phase 47 closes three primitive-with-consumer gaps in one wave (CallParallel runtime + SpawnTask emitter + AwaitTask emitter). D-056. Acceptance. Atomicity contract holds under fault injection; ordering preserved per-branch; deterministic merge keys (branch index + tool name); 51-branch input fails with ErrParallelCapExceeded; JoinFirstSuccess cancels remainder; JoinN waits for N successes; ReAct emits _spawn_task → runtime spawns real task → group resolves → planner re-enters via RunContext.Trajectory.Background → planner emits Finish end-to-end. Tests. Concurrency + property (atomicity invariant) + spawn → wake → re-entry integration test against real TaskRegistry + EventBus + ArtifactStore drivers. Deps. 45, 14, 42, 20, 21. Wake-mode interaction. ReAct's WakePush declaration (Phase 45 / D-032) is wired end-to-end: a non-retain-turn SpawnTask returns control to the runtime; the runtime registers against tasks.WatchGroup; on GroupCompletion the runtime re-invokes Planner.Next with the resolved MemberOutcome slice surfaced through RunContext.Trajectory.Background. The integration test asserts the round-trip. Parallel-pause atomicity contract surface. Phase 47 ships the stub (ErrParallelPauseUnsupported) — the executor fails loud on a mid-execution pause request. Phase 50 (unified pause/resume primitive) upgrades the path to a checkpointed atomic pause.

48 — Deterministic planner (proves the iface) (RFC §6.2, §11 Q-6)

Goal. A second concrete that exercises a non-LLM Decision shape. Executes a programmatic decision tree without an LLM call. Acceptance. Deterministic planner passes the conformance pack; the same Runtime executes both Deterministic and ReAct without changes. Tests. Conformance pack. Deps. 42. Wake mode. Deterministic ships the poll wake mode (D-032): each Planner.Next invocation reads its outstanding group's GroupCompletion via a non-blocking receive on the channel returned from tasks.WatchGroup. If the channel hasn't fired, the planner emits AwaitTask and the runtime sleeps the step until the next deterministic boundary; if it has fired, the planner reads the resolved MemberOutcome slice and proceeds. No LLM, no eager wake: a clean deterministic shape that proves the registry's WatchGroup surface is mode-neutral.
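
The non-blocking receive behind the poll wake mode is a `select` with a `default` branch. The sketch below assumes a simplified `GroupCompletion` struct and a hypothetical `pollGroup` helper; the real channel comes from `tasks.WatchGroup`.

```go
package main

import "fmt"

// GroupCompletion is a simplified stand-in for the internal/tasks type.
type GroupCompletion struct{ GroupID string }

// pollGroup returns immediately: (completion, true) if the group has resolved,
// or (zero, false) if the channel has not fired or was closed.
func pollGroup(ch <-chan GroupCompletion) (GroupCompletion, bool) {
	select {
	case gc, ok := <-ch:
		return gc, ok
	default:
		return GroupCompletion{}, false
	}
}

func main() {
	ch := make(chan GroupCompletion, 1)

	if _, done := pollGroup(ch); !done {
		fmt.Println("not resolved yet: planner would emit AwaitTask")
	}

	ch <- GroupCompletion{GroupID: "g1"}
	if gc, done := pollGroup(ch); done {
		fmt.Println("resolved:", gc.GroupID)
	}
}
```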

49 — Planner conformance pack (RFC §6.2)

Goal. A shared test pack any Planner implementation must pass: top-20 prompts produce valid Decision against canned tool catalog + LLM mock; respects budget; never panics on malformed LLM output. Acceptance. Pack runs against ReAct and Deterministic; go test ./internal/planner/conformance/... exits 0. Tests. The pack itself. Deps. 42, 45, 48. Wake-mode round-trip (D-032). The conformance pack MUST include a SpawnTask → group completes → planner re-enters → reads MemberOutcome round-trip exercising whichever wake mode the concrete declares (push / poll / hybrid). ReAct validates push; Deterministic validates poll; future hybrid concretes validate hybrid. Failure to wire tasks.WatchGroup is the test's failure mode, not silent deadlock.

50 — Pause/Resume Coordinator + handle registry (RFC §6.3, §3.3)

Goal. pauseresume.Coordinator with Request/Resume/Status. Token is opaque (runtime-owned encoding). Handle registry is process-local at V1 (documented constraint; distributed handle directory deferred — RFC §12). Acceptance. Round-trip pause→serialise→load→resume succeeds; pauses survive Runtime restart only when StateStore-backed checkpoint is configured. Tests. Unit + integration; durability (in-mem / SQLite / Postgres). Deps. 07, 09, 13.

51 — Pause-state serialise contract (fail-loud) (RFC §6.3, §3.4)

Goal. Pause record serialises with format_version: 1 JSON. Non-serialisable handles → ErrUnserializable (no silent nil); missing-on-resume handles → ErrToolContextLost. Acceptance. Negative tests are the gate. CI fails on any silent-drop regression. Tests. Conformance with phase 43 Trajectory.Serialize. Deps. 50, 43. Shipped. internal/runtime/pauseresume/pauserecord.go ships SerializeRecord / DeserializeRecord + the FormatVersion constant. The Phase 43 reflective walker is exported as trajectory.ValidateEncodable and shared (not forked) by the pause-record contract — SerializeRecord walks it, surfacing trajectory.ErrUnserializable rooted at PauseRecord.payload.<key>; DeserializeRecord enforces format_version: 1 (ErrUnsupportedFormatVersion on any other value). Coordinator.Request's Payload-encodability check is unconditional (fails loud with or without a checkpoint store). Negative tests (pauserecord_test.go, pauserecord_contract_test.go, test/integration/phase51_pause_serialise_test.go) are the gate. Coverage 94.0% (target 90%). See D-069.

52 — Steering inbox + control taxonomy (RFC §6.3)

Goal. Per-run inbox owned by Runtime. Nine control event types: INJECT_CONTEXT, REDIRECT, CANCEL, PRIORITIZE, PAUSE, RESUME, APPROVE, REJECT, USER_MESSAGE. Validation/sanitisation at Protocol edge: depth ≤ 6, ≤ 64 keys, ≤ 50 list items, ≤ 4096 chars/string, ≤ 16 KiB total. Per-event scopes per RFC §6.3. Acceptance. Oversize/over-deep payloads rejected at edge; per-event scope mismatch returns 403 + audit. Tests. Unit (validation) + integration (auth scope per event). Deps. 50, 05.

53 — Steering wiring (9 control events) (RFC §6.3)

Goal. Drain-between-steps; planner sees only RunContext.Control. CANCEL hard/soft propagation; PAUSE blocks at next boundary; RESUME unblocks; INJECT_CONTEXT/REDIRECT/USER_MESSAGE visible on next planner step; APPROVE/REJECT advance pause; PRIORITIZE updates task; control-history capped per session. Acceptance. Each event type has a passing integration test; no event applied mid-tool-call. Tests. Integration matrix; concurrency mid-step. Deps. 52, 13. Shipped. internal/runtime/steering/runloop.go ships RunLoop — the per-run planner-step loop, the §13 first consumer of BOTH the Phase 50 pauseresume.Coordinator AND the Phase 52 steering inbox/taxonomy. RunLoop.Run drains the per-run Inbox once per step boundary (apply.go applies the nine control-event side effects; the planner sees only RunContext.Control), routes a planner's RequestPause through Coordinator.Request and blocks via the new Inbox.WaitForEvent (a coalesced 1-buffered notify channel — no busy-spin) until a RESUME/APPROVE arrives, and caps per-session applied-control history (history.go, MaxControlHistory newest-wins ring). Deviation (§4.3): Phase 53 builds the per-run planner loop rather than retrofitting an existing one — internal/runtime/engine is a graph executor, not a planner-step loop; the only Planner.Next driver before Phase 53 was the Phase 49 conformance harness. The loop lives in internal/runtime/steering (its master-plan subsystem); no new top-level directory, no RFC change (RFC §6.3 §4: "the runtime implements this loop"). CANCEL is soft-by-default with an optional WithHardCancelHook seam (no hard import of the engine). The nine-event integration matrix + the §13 pause-Coordinator round-trip + the drain-between-steps invariant test + the concurrency-mid-step test live in test/integration/phase53_steering_wiring_test.go. Coverage 92.4% (target 85%). See D-071.

53a — Agent Registry (registration identity + IDs) (RFC §6.16, §7)

Goal. An in-process, per-runtime-instance registry.AgentRegistry subsystem, StateStore-backed (in-mem / SQLite / Postgres, §4.4 seam). Owns the registration identity of agents and the three-ID model (D-059): a stable agent_id (minted once at first registration, persisted, rehydrated on restart), an ephemeral incarnation (bumps every process start), and a content-derived version_hash (deterministic hash over prompt set, tool set + schemas, planner config, model policy — bumps only when configuration changes). agent_id is a registration identity, not an isolation principal — the isolation tuple stays (tenant, user, session, run) (D-059, CLAUDE.md §6). Handles both creation cases (D-060): locally-hosted agents (the runtime mints a local agent_id) and connect-to-remote agents (the local agent_id is a handle; the canonical identity is the remote A2A AgentCard, owned by the remote operator). Emits agent.* events (agent.registered, agent.restarted, agent.health, agent.drained, agent.deregistered) so the Console Agents page renders runtime state, never Console-local state (D-061). Fleet control (pause / drain / restart / force-stop) is a distinct, more-elevated privilege tier than fleet observation (D-066) — every control command is audit-redacted and emitted. Acceptance. agent_id is stable across restart when a durable StateStore driver is configured (rehydration test); the in-mem driver is dev-only and documented as non-persistent. incarnation bumps on every restart; version_hash bumps iff configuration content changed and is stable otherwise (restart ≠ recreate — restart keeps the record, recreate mints a fresh agent_id). Remote-agent registration stores a handle + AgentCard reference; the handle is runtime-instance-local and never assumed globally unique. agent.* events carry the registration agent_id. Cross-tenant / cross-session isolation conformance — one identity's registry view never bleeds into another. 
Fleet-control commands require the elevated scope claim and emit audit events; fleet-observation does not. Concurrent-reuse test: N≥100 concurrent registrations / lookups / control commands against one shared AgentRegistry under -race (no data races, no context bleed, no goroutine leaks). Tests. Unit (three-ID model, version_hash determinism, restart-vs-recreate); integration (StateStore-backed rehydration across all three drivers, real events.EventBus on the seam, identity propagation, ≥1 failure mode — missing identity fails closed); conformance (cross-tenant/session isolation); concurrency (D-025 N≥100 reuse stress). Deps. 01, 05, 07, 08. Briefs. brief 09 (agent-as-actor / agent-bound OAuth — the registration agent_id is what Phase 30 keys agent-bound tokens by), brief 11 (operator Console mockup — the Agents page is a runtime lens over this subsystem; console-agents-page.png). Why here. Slotted into the 50–53 band (steering / pause-resume wave) because the earlier runtime-subsystem bands are already shipped; its real dependencies (01, 05, 07, 08) all landed long ago, so it can be implemented any time after them, but it must land before the Protocol surface (54+) and the Console-attaching wave (72–75) that consume it. Settled decisions: D-059, D-060, D-061, D-062, D-066.
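
A content-derived `version_hash` has to be independent of map iteration order and unambiguous about field boundaries. The sketch below assumes SHA-256, a hypothetical `agentConfig` shape, and treats the prompt set as unordered; none of these choices are confirmed by the plan.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"sort"
)

// agentConfig is a hypothetical view of the hashed configuration.
type agentConfig struct {
	Prompts       []string
	Tools         map[string]string // tool name -> schema
	PlannerConfig string
	ModelPolicy   string
}

// versionHash hashes the configuration deterministically.
// Counts and lengths are written before data so field boundaries cannot blur.
func versionHash(c agentConfig) string {
	h := sha256.New()
	writeInt := func(n int) {
		var buf [8]byte
		binary.BigEndian.PutUint64(buf[:], uint64(n))
		h.Write(buf[:])
	}
	writeStr := func(s string) {
		writeInt(len(s))
		h.Write([]byte(s))
	}

	prompts := append([]string(nil), c.Prompts...)
	sort.Strings(prompts) // assumption: prompt order is not significant
	writeInt(len(prompts))
	for _, p := range prompts {
		writeStr(p)
	}

	names := make([]string, 0, len(c.Tools))
	for name := range c.Tools {
		names = append(names, name)
	}
	sort.Strings(names) // map order is random; sort for stability
	writeInt(len(names))
	for _, name := range names {
		writeStr(name)
		writeStr(c.Tools[name])
	}

	writeStr(c.PlannerConfig)
	writeStr(c.ModelPolicy)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	cfg := agentConfig{
		Prompts:       []string{"system", "tools"},
		Tools:         map[string]string{"search": `{"type":"object"}`},
		PlannerConfig: "react",
		ModelPolicy:   "default",
	}
	before := versionHash(cfg)
	cfg.ModelPolicy = "strict"
	fmt.Println("changed after policy edit:", before != versionHash(cfg))
}
```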

54 — Protocol task control surface (RFC §5.2, §6.3)

Goal. Protocol endpoints: start, cancel, pause, resume, redirect, inject_context, approve, reject, prioritize, user_message. Acceptance. All nine endpoints + start round-trip via SSE+REST (phase 60); identity scope enforced. Tests. Smoke phase-54.sh exercises each method. Deps. 50, 53, 20.

55 — OTel traces + propagation (RFC §6.14)

Goal. Tracer wrapper; spans derived from events. Propagation: traceparent HTTP southbound; _meta.traceparent per request for stdio MCP; HARBOR_TRACEPARENT env on stdio spawn. Acceptance. Trace continuity across HTTP and stdio; spans align with run/step boundaries. Tests. Integration with Jaeger/OTLP collector. Deps. 04, 05.

56 — Metrics + OTLP + Prometheus (RFC §6.14, §11 Q-5 settled)

Goal. MetricsRegistry derives from Event.Type / NodeName / Producer only. OTLP exporter default; built-in Prometheus /metrics endpoint at V1. Acceptance. Cardinality-lint test fails CI on RunID/TraceID labels; both exporters emit core counters. Tests. Integration; static cardinality lint. Deps. 55, 05. Deviations (§4.3, see D-076). (1) NodeName / Producer are realised as the reserved Event.Extra["node"] / Event.Extra["producer"] keys — not new events.Event struct fields — because the Phase 05 Event doc already reserves Extra for "Phase 56's bounded low-cardinality metric labels"; no events.Event shape change. (2) The static cardinality-lint flags attribute.* calls only when nested inside metric.WithAttributes(...) — a span's attribute.String("run_id", …) inside trace.WithAttributes is legitimate (D-073) and is left alone; the rule is metric-labels-only. (3) The /metrics endpoint ships as the standalone telemetry.PrometheusHandler http.Handler constructor; the live Runtime server that mounts it at /metrics is the Phase 60+ bootstrap (there is no internal/server/ yet). (4) The master-plan "§11 Q-5" citation: RFC §11's Q-5 is the skill-versioning question; the metrics-exporter question is brief 06 Q-2, resolved by RFC §6.14 — "§11 Q-5" is read as "the §11-tracked metrics-exporter question is settled".

57 — Durable event log driver (RFC §6.13)

Goal. Persists Event records keyed by (SessionID, Sequence) via StateStore. Replay-from-cursor exact across restarts. Acceptance. Late subscriber after Runtime restart sees no gaps; ring buffer mode auto-degrades to "best-effort" with warning. Tests. Integration across all three StateStore drivers. Deps. 05, 07, 15, 16. Downstream (load-bearing). This is not just the Console event-stream backing — it is the hard dependency for the post-V1 Evaluations / agent version-control program (D-064). Evaluations is built on fully replayable sessions ("create eval from session", "mark as test case"); a session is only replayable if its event log is durable and gap-free. Lossy events (ring-buffer-only) in V1 would foreclose Evaluations entirely, since you cannot retrofit completeness into already-shipped sessions. Treat this phase's durability guarantees as binding for that reason, not optional.
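
Replay-from-cursor over a log keyed by (SessionID, Sequence) can be sketched in memory as follows. The `eventLog` type is hypothetical and stands in for the StateStore-backed driver; the sketch assumes sequences start at 1 per session and that appends out of order are rejected so the log stays gap-free.

```go
package main

import (
	"fmt"
	"sort"
)

// event is a simplified record keyed by (SessionID, Sequence).
type event struct {
	SessionID string
	Sequence  uint64
	Type      string
}

// eventLog keeps per-session events in ascending sequence order.
type eventLog struct {
	bySession map[string][]event
}

func newEventLog() *eventLog {
	return &eventLog{bySession: map[string][]event{}}
}

// Append accepts only the next sequence number, so no gap can be stored.
func (l *eventLog) Append(e event) error {
	evs := l.bySession[e.SessionID]
	want := uint64(1)
	if n := len(evs); n > 0 {
		want = evs[n-1].Sequence + 1
	}
	if e.Sequence != want {
		return fmt.Errorf("session %s: got sequence %d, want %d", e.SessionID, e.Sequence, want)
	}
	l.bySession[e.SessionID] = append(evs, e)
	return nil
}

// ReplayFrom returns every event with Sequence greater than cursor.
func (l *eventLog) ReplayFrom(sessionID string, cursor uint64) []event {
	evs := l.bySession[sessionID]
	i := sort.Search(len(evs), func(i int) bool { return evs[i].Sequence > cursor })
	return append([]event(nil), evs[i:]...)
}

func main() {
	l := newEventLog()
	for seq := uint64(1); seq <= 5; seq++ {
		if err := l.Append(event{SessionID: "s1", Sequence: seq, Type: "step"}); err != nil {
			panic(err)
		}
	}
	// A late subscriber that last saw sequence 3 resumes at 4.
	for _, e := range l.ReplayFrom("s1", 3) {
		fmt.Println(e.Sequence, e.Type)
	}
}
```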

58 — Protocol types/methods/errors single source (RFC §5, §8)

Goal. internal/protocol/types/, internal/protocol/methods/, internal/protocol/errors/ are the only definitions. Lint check forbids hardcoded method strings outside methods/. Acceptance. Build succeeds with the lint check active; new methods land only in methods/. Tests. Lint test (CI). Deps. 01. Status. Shipped — D-075. Phase 54 (D-072 §1) already laid the methods/errors/types single-source layout, so Phase 58 is the enforcement: internal/protocol/singlesource ships ScanProtocolTree, a go/parser AST-walking checker, and TestSingleSource_ProtocolTreeIsClean is the build-gating go test (the same AST-lint pattern as internal/planner/conformance/importgraph_test.go — zero external-tool dependency, no golangci-lint plugin). The checker lints internal/protocol/ only (method-name strings are legitimate unrelated vocabulary in other subsystems — a repo-wide scan would be all false positives) and lints _test.go files too. It surfaced and consolidated three pre-existing hardcoded method literals (control.go's dispatchStart, two _test.go fixtures) — now re-derived from the methods constants. Citation note (§4.3): the row's "§8" is CLAUDE.md §8 ("Harbor Protocol rules") — RFC-001 has no §8; RFC §5 is the design anchor, CLAUDE.md §8 is the rule the checker enforces. Coverage on internal/protocol/singlesource 94.5% (target 90%).

59 — Protocol versioning + deprecation policy (RFC §5.3)

Goal. ProtocolVersion constant; deprecation window discipline; capability negotiation. Acceptance. Version constant returned on harbor version (after phase 63); deprecation note format settled. Tests. Unit. Deps. 58.

60 — Protocol wire transport (SSE + REST) (RFC §5.4, §11 Q-1)

Goal. SSE stream for events; REST/JSON for control surface. Identity-scope enforcement at edge. Q-1 RESOLVED 2026-05-14 — SSE + REST (owner sign-off given; RFC §5.4 + §11 Q-1 updated). Phase 60 is now a normal implementation phase, not a decision gate. WebSocket remains an additive alternate transport for a later phase via the internal/protocol/transports/ seam — not a fork of this phase. Acceptance. Console can stream events and submit control over SSE+REST; smoke covers both directions. Tests. Integration; full duplex stress. Deps. 58, 05. Risks. Q-1 resolved — the load-bearing decision is settled. Remaining risk is ordinary implementation risk (SSE keepalive/reconnect discipline, identity-scope enforcement at the edge).

61 — Protocol auth + identity-scope enforcement (RFC §5.5, §4)

Goal. JWT (asymmetric only); (tenant, user, session) in claims; admin/console:fleet scopes for elevated subscriptions. Acceptance. Missing claim rejected with audit; HS*/none algorithms rejected at parser level. Tests. Unit + integration; security suite. Deps. 58, 60, 01. Status. Shipped — D-079. internal/protocol/auth ships the transport-agnostic Validator (asymmetric-algorithm allowlist enforced via jwt.WithValidMethods at parse time — HS* and alg:none are structurally impossible, the keyfunc is belt-and-braces with a non-asymmetric-key shape rejection); Middleware is the net/http decorator (Authorization: Bearer <jwt> → identity in r.Context() via identity.With + scopes via WithScopes); the eight typed sentinels (ErrTokenMissing / ErrTokenMalformed / ErrAlgNotAllowed / ErrSignatureInvalid / ErrTokenExpired / ErrTokenNotYetValid / ErrUnknownKey / ErrIdentityClaimMissing, plus ErrAudienceMismatch / ErrIssuerMismatch) cover every rejection. The new CodeAuthRejected Protocol error code lands in internal/protocol/errors/ (single-source preserved); transports.NewMux gains a WithValidator option that wraps both Phase 60 handlers in the middleware (additive — the Phase 60 trust-based posture is preserved verbatim when no validator is supplied). The control handler's assertBodyMatchesAuthedIdentity is the defence-in-depth check (a body claiming a different (tenant, user, session) than the JWT is rejected 401 before Dispatch runs); the SSE handler's ?admin=1 query param is gated on the verified ScopeAdmin / ScopeConsoleFleet scope (rejected 403 without). The golang-jwt/jwt/v5 library was promoted from indirect to direct (no new module — already pulled by aws-sdk-go-v2/credentials). 
test/integration/phase61_auth_test.go exercises every rejection mode end-to-end against a real ES256-keypair-signed bearer + the real ControlSurface + the real events.EventBus behind httptest.Server; the security suite covers algorithm-confusion, alg:none, scope-escalation, kid-substitution, expired-token, and tampered-body attacks; D-025 concurrent-reuse pinned at N=128 with goroutine-baseline assertion. Coverage: auth 90.1%, errors 100%, transports 94.3%, control 89.5%, stream 86.6% (all ≥ targets).

62 — Protocol conformance suite (RFC §5)

Goal. A single conformance suite the protocol surface passes; covers every method, every error code, every event filter. Acceptance. go test ./internal/protocol/conformance/... exits 0; smoke runs the same suite against harbor dev. Tests. The suite itself. Deps. 58, 60, 61. Status note. Shipped at 81.2% statement coverage (master-plan target 85%) per the documented §4.3 deviation in docs/plans/phase-62-protocol-conformance.md — matches the precedent set by Phase 49's internal/planner/conformance (70.8% under the same target). Conformance-suite coverage is dominated by t.Fatalf rollback branches that fire only on assertion failure; the assertion density (10 methods × 2 transports; 8 error codes × ≥1 failure path; every event-filter shape; the version handshake; the auth pipeline; an N=100 D-025 stress) is the load-bearing surface. The suite ships paired with test/integration/wave10_test.go — the Wave 10 wave-end E2E that consumes the same suite from a different consumer profile against the assembled real-driver Wave 10 surface.

63 — Harbor CLI skeleton (RFC §8)

Goal. harbor cobra binary with subcommands dev, scaffold, validate, version, inspect-events, inspect-runs, inspect-topology. Every subcommand supports structured-error, --quiet, and --json output modes. Acceptance. harbor --help matches a golden file; harbor version returns version + build hash + Protocol version. Tests. CLI golden tests. Deps. 60.

64 — harbor dev v1 (RFC §8)

Goal. Boot embedded Runtime + open Protocol on 127.0.0.1:<port>. No hot-reload yet. Identity injection via dev-token. Acceptance. harbor dev returns /healthz 200; events stream cleanly to a test Console subscriber. Smoke. phase-64.sh boots dev; assert_status 200 /healthz. Tests. Integration (boot, smoke, teardown). Deps. 63, 60.

Phase 64 — harbor dev v1 (pre-plan scoping note — BINDING when the plan is authored)

Phase 64 is the moment cmd/harbor/main.go stops being a driver-registration stub and starts instantiating an LLM-backed runtime for the first time. Before this phase, no production code path resolves the LLM client — every "test stub as default" call (the mock LLM driver, EchoSummarizer, staticSummariser) is dormant. Phase 64 is the moment they go live.

The §13 entry "Test stubs as production defaults on operator-facing seams" is pre-settled for this phase. The plan author MUST satisfy the constraints below — they are not re-litigable inside the phase plan:

  1. Default LLM driver is bifrost, not mock. Phase 64 flips llm.DefaultDriver from "mock" to "bifrost" (internal/llm/registry.go:172) and updates examples/*.yaml so driver: bifrost is the demonstrated path. The mock driver subpackage (internal/llm/mock/) moves under a harbor_testfixtures build tag (or to a testfixtures/ subdirectory) so it is unreachable from cmd/harbor/main.go's blank-import block in a normal build. Production tests that need a deterministic LLM consume it via the build-tagged path or via *_test.go-local fixtures.

  2. Boot fails loudly when no LLM provider is configured. Missing API key, missing bifrost provider section, or an empty llm: block → harbor dev prints a one-line error that names the missing config key (e.g. config.llm.providers[0].api_key: required when driver=bifrost) and points to examples/dev.yaml, then exits non-zero. Silent fallback to the mock is forbidden — this is the §13 "fail loudly at boot" consequence.

  3. LLM-backed defaults for memory.Summarizer and planner.Summariser. When memory.strategy: rolling_summary is configured and no custom Summarizer is injected, Phase 64 (or a same-wave sibling phase) provides a default LLM-backed Summarizer that composes an llm.LLMClient with a versioned compaction prompt template. Same shape for planner.Summariser consumed by CompressionRunner. EchoSummarizer and staticSummariser move to testfixtures and are no longer reachable from the production wiring path. If the author chooses to split this into a sibling phase (e.g. Phase 64a), that phase MUST ship in the same wave as Phase 64 — the §13 primitive-with-consumer rule applies recursively: a harbor dev that defaults to rolling_summary but has no Summarizer wired is the same failure mode one layer down.

  4. Dev-only escape hatch is explicit and banner'd. A --mock flag on harbor dev (or HARBOR_DEV_ALLOW_MOCK=1 env var — Phase 64's plan picks ONE and pins the choice in a D-NNN decisions entry) is the ONLY path to the mock LLM at runtime. When the escape hatch fires, every boot prints a stderr banner: [DEV-ONLY MOCK LLM — DO NOT USE IN PRODUCTION]. The README's quickstart MAY use this path but must label it as a dev shortcut, not the production install — examples/dev.yaml shows the production-shaped config and the README's "5-minute quickstart" demonstrates the escape-hatch path with a one-line note.

  5. scripts/smoke/phase-64.sh exercises the LLM seam, not just /healthz. A smoke that only checks GET /healthz is insufficient — the phase exists to wire the LLM, so the smoke MUST exercise the LLM. The script boots harbor dev against a recorded bifrost fixture (no live network — use httptest.Server or a recorded-cassette pattern), submits one task over the Phase 60 REST handler, and asserts the SSE stream emits a planner Decision derived from a real LLMClient.Complete call. A second smoke assertion: boot with no provider configured and assert the non-zero exit with the expected error message.

  6. The §18 mirror invariant applies in spirit. Phase 64 introduces a binary that real users will run. The README's ## Status table, cmd/harbor's godoc, and any "Quick start" prose are updated in the same PR — no aspirational claims like "harbor dev boots the Console" that land before the Console-boot phases (72–75) ship. If §3's "Harbor CLI" bullet describes a command that doesn't yet exist, the bullet says so in future tense with a phase reference.

  7. Tool catalog wires Phase 30 (OAuth, D-083) + Phase 31 (approval gates, D-086) primitives from operator config (issue #104). Both phases shipped runtime-side primitives whose only consumers today are tests — internal/tools/auth.OAuthProvider and internal/tools/approval.ApprovalGate reach the runtime, but the tool catalog (internal/tools/catalog/) doesn't know about either. Phase 64 (or a same-wave sibling per the §13 primitive-with-consumer rule) extends the catalog so a tool registration can declare an ApprovalPolicy and/or an OAuth BindingScope via operator config (tools.<name>.approval: <policy>, tools.<name>.oauth: <provider> or equivalent shape). The catalog auto-wraps the registered Tool with an ApprovalGate and/or an OAuth-aware invocation wrapper. Operators get HITL approval AND tool-side OAuth out of the box without writing Go wiring code. The Wave 11 wave-end E2E exercises APPROVE/REJECT via the real transports/control HTTP handler (closing the Protocol-wire round-trip half of issue #104); the catalog-wiring half lands in Phase 64. ✅ shipped in Phase 64a / D-090.

Mandatory reading before authoring this plan (per §16): RFC §5 (Protocol surface), RFC §6.5 (LLM client), RFC §6.6 (Memory + Summarizer), docs/research/brief-02-trajectory-compression.md, docs/research/brief-04-memory-strategies.md (or whichever brief indexes summariser design — docs/research/INDEX.md resolves), docs/decisions.md (D-026 LLM-edge safety, D-035 rolling summary, D-044 latent governance, D-055 trajectory compression rendering rule), the shipped internal/llm/registry.go (the default-driver flip site) and internal/memory/strategy/ (the Summarizer wiring site).

Pre-assigned decisions slot: Phase 64's plan claims a D-NNN number when dispatched and records: (a) the mockbifrost default flip; (b) the chosen escape-hatch mechanism (--mock flag vs env var); (c) the LLM-backed default Summarizer location (in-package vs new internal/llm/summarizer/ subpackage); (d) any deliberate carve-out from the §13 entry above (requires an RFC PR — bake the carve-out into the RFC, then reference it here).

First production consumer of Phase 55's W3C carriers. Phase 64 is the first production consumer of telemetry.InjectHTTP / telemetry.ExtractHTTP (the HTTP carrier helpers Phase 55 shipped as standalone functions — see issue #94). The plan threads traceparent through tools/drivers/http on outbound calls and extracts on inbound — internal/protocol/transports/control + tools/drivers/mcp follow the same shape. This is the §13 primitive-with-consumer obligation closed for the Phase 55 carriers; before Phase 64 they are dormant helpers exercised only by unit tests.

Departures from this note require an RFC PR. This note is binding, not advisory — it encodes a Wave 10 audit finding (the §13 amendment above) that future plan-authors do not have visibility into. Treat it as the equivalent weight of an RFC section.

65 — harbor dev hot-reload (RFC §8)

Goal. fsnotify watcher; graceful-drain restart on Go-source change; configurable retain-in-flight policy. Acceptance. File change triggers drain; in-flight runs cancel cleanly; new code picked up. Tests. Integration with file mutation. Deps. 64.

§4.3 shape decision (D-099). In-process bootDevStack rebuild, NOT binary re-exec. Re-exec was considered and rejected for V1: it requires an out-of-process supervisor (the binary cannot re-exec itself without losing live http.Server connections), it costs a Go build per cycle (~5s on a warm machine — the developer feedback loop is the load-bearing UX here), and an operator iterating on YAML config does NOT need a binary rebuild. The in-process rebuild satisfies the "new code picked up" acceptance for every config / scaffold change; operators changing Go source rebuild + re-launch the binary manually (the same cycle they'd run today without hot-reload). A future opt-in policy: rebuild can layer binary-rebuild semantics on without changing the supervisor's shape.

66 — harbor dev draft-save scaffolding (RFC §8)

Goal. Project-local .harbor/drafts/ scratchpad endpoint; iterate on agent without committing scaffold; "save" promotes to harbor scaffold-emitted layout. Acceptance. Draft round-trip: edit → preview run → save → resulting scaffold passes harbor validate. Tests. Integration + golden. Deps. 64. Status. Shipped — D-100. internal/devdraft package ships the filesystem-backed Store + the http.Handler mounted at /v1/dev/drafts/ on the harbor dev mux behind the Phase 61 JWT validator. On-disk layout is <root>/<tenant>/<user>/<session>/<draft_id>/ so concurrent operators sharing the same .harbor/drafts/ root cannot collide (CLAUDE.md §6 applied to a filesystem-backed store). Six endpoints: POST / (create + seed via the Phase 67 scaffold engine), GET /{id} (list files + content for the Console editor), PATCH /{id}/files/{path} (path-traversal-safe per §7 rule 5), POST /{id}/preview (validation-only dry-run via internal/config.Load), POST /{id}/save (promote to operator-supplied output dir; refuses with ErrValidationFailed when the rendered harbor.yaml fails the validator), DELETE /{id} (idempotent discard). Five SafePayload bus events land per round-trip — dev.draft.{created,updated,previewed,saved,discarded} — registered with internal/events's exhaustive registry at init(). harbortest/devstack/devstack.go::Assemble mirrors the production wiring per D-094 (always constructs a DraftStore; mounts the handler when transports are enabled). test/integration/phase66_draft_save_test.go exercises the round-trip through the devstack helper with a real Bearer token, observes the five bus events, exercises path-traversal + missing-bearer failure modes, and runs an N=10 concurrency stress under -race. internal/devdraft/concurrent_test.go runs the D-025 N=128 concurrent-reuse test against one shared Store. scripts/smoke/phase-66.sh drives the round-trip against the live binary; the 404/405/501 → SKIP convention keeps the smoke harmless on builds that pre-date Phase 66.
Coverage on internal/devdraft: ≥80% (master-plan target 75%).

67 — harbor scaffold (RFC §8)

Goal. Generate a new agent skeleton from a template (default = "minimal-react"). Templates discoverable; output passes harbor validate. Acceptance. harbor scaffold my-agent creates a buildable project; harbor validate returns 0. Tests. Golden output. Deps. 63.

§4.3 deviation (D-087). Phase 67 was dispatched in parallel with Phase 68 (harbor validate) per CLAUDE.md §17.7 step 3. At scaffold-time, harbor validate is still a Phase 63 stub — calling it would exit non-zero with not_implemented regardless of the scaffolded config's validity. Phase 67's acceptance criterion is therefore verified against internal/config.Load + Validate directly (the shipped subsystem the future harbor validate will call), via cmd/harbor/scaffold/scaffold_test.go::TestScaffold_RenderedConfig_PassesConfigValidate. The cross-phase CLI integration smoke step (running harbor validate ./harbor.yaml after a scaffold, asserting exit 0) lands in Phase 68's PR per §17.6. The §13 primitive-with-consumer rule is satisfied — the consumer-of-the-config-validator is a real shipped subsystem (internal/config), not a future CLI surface.

68 — harbor validate (RFC §8)

Goal. Validate config / skills / agent definitions without booting. Errors include file:line. Acceptance. Each error category produces a stable message; CI uses validate as a pre-flight check. Tests. Golden errors. Deps. 63, 02.

69 — harbor inspect-events / inspect-runs (RFC §8)

Goal. Tail/filter event bus; list recent runs + show trajectory. Acceptance. harbor inspect-events --session SID --type tool.completed filters server-side; harbor inspect-runs SID shows run trajectory. Tests. Golden CLI outputs. Deps. 63, 60.

70 — harbor inspect-topology (RFC §8)

Goal. Render run's node graph as ASCII; consumes topology.snapshot events. Acceptance. Sample run produces stable ASCII matching golden. Tests. Golden. Deps. 63, 60.

71 — harbortest test kit package (RFC §6.13)

Goal. Public harbortest package: RunOnce(ctx, agent, input) (Output, EventLog, error), AssertSequence(log, []EventType{...}), AssertNoLeaks(log) (cross-tenant/session leakage detector), SimulateFailure(toolName, code, n), RecordedEvents(runID) []Event. Acceptance. Flow-level test ≤ 10 lines; AssertNoLeaks catches a deliberate cross-session bug in a regression test. Tests. Self-test of the kit. Deps. 05, 09, 07.
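
To make the two assertion helpers concrete, here is a sketch of the checks they might perform. The semantics are assumptions: the sequence check treats the expected types as an ordered subsequence of the log, and the leak check flags any event whose tenant or session differs from the run's. `CheckSequence`, `CheckNoLeaks`, and the `Event` fields are hypothetical and return errors instead of taking a `*testing.T`.

```go
package main

import "fmt"

// EventType and Event are simplified stand-ins for the kit's types.
type EventType string

type Event struct {
	Type    EventType
	Tenant  string
	Session string
}

// EventLog is the ordered list of events recorded for one run.
type EventLog []Event

// CheckSequence verifies that want appears in log in order; other
// events may be interleaved between the expected ones.
func CheckSequence(log EventLog, want []EventType) error {
	i := 0
	for _, ev := range log {
		if i < len(want) && ev.Type == want[i] {
			i++
		}
	}
	if i < len(want) {
		return fmt.Errorf("expected event %q (position %d) not found in order", want[i], i)
	}
	return nil
}

// CheckNoLeaks verifies every event belongs to the given tenant/session.
func CheckNoLeaks(log EventLog, tenant, session string) error {
	for n, ev := range log {
		if ev.Tenant != tenant || ev.Session != session {
			return fmt.Errorf("event %d (%s) leaked from %s/%s", n, ev.Type, ev.Tenant, ev.Session)
		}
	}
	return nil
}

func main() {
	log := EventLog{
		{Type: "run.started", Tenant: "t1", Session: "s1"},
		{Type: "tool.completed", Tenant: "t1", Session: "s1"},
		{Type: "run.completed", Tenant: "t1", Session: "s1"},
	}
	fmt.Println(CheckSequence(log, []EventType{"run.started", "run.completed"}))
	fmt.Println(CheckNoLeaks(log, "t1", "s1"))
}
```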

Console wave — re-decomposition pending (tracked, not yet expanded). Phases 72–75 currently cover the Runtime-side Protocol hooks for a subset of the Console. RFC §7 now defines the full Console information architecture: a 14-page observability + control plane (Overview, Live Runtime, Sessions, Tasks, Agents, Tools, Events, Background Jobs, Flows, Memory, MCP Connections, Artifacts, Evaluations, Settings) organized as runtime lenses — every page is a projection over state snapshots + realtime events + control commands. The binding structuring rule (RFC §7, CLAUDE.md §13): no Console page phase ships without its feeding Protocol-surface phase landing first or in the same wave. When this wave is re-decomposed, the heavy pages (Live Runtime, Events, Agents) each become their own phase twinned with a Protocol-surface phase; the lighter pages cluster. The Agents page is a lens over the Agent Registry (phase 53a). The notification.* topic (Overview intervention queue) and search.* Protocol methods (global ⌘K) land as named acceptance criteria of their consuming page phases, not as free-floating primitives. Evaluations is explicitly post-V1 (D-064) — it is a subsystem, not a page. Re-decomposition itself follows the §16 phase-authoring ritual per new phase and is not done in this edit.

Console-wave deployment + shared-library posture (BINDING — D-091 / D-092 / D-093). Companion to the page-decomposition note above; this note locks in the "how it's deployed" and "how it's built" answers so that a future Console plan-author cannot relitigate them. Departures from any item below require an RFC PR, not a phase-plan footnote.

  1. harbor console is the Console's deployment surface, not harbor dev. The full Console SvelteKit build is baked into cmd/harbor via embed.FS and served by a new cmd/harbor/cmd_console.go subcommand (a phase to be slotted at re-decomposition time). harbor dev (Phase 64, shipped) is and stays headless — embedding the Console into harbor dev is rejected (couples developer iteration to operator observability; wrong scope). A future packed dev UI for single-agent development reuses the Console's chat/playground components via a shared library; post-V1. Decision: D-091.
  2. Svelte 5 + runes mode only. web/console/svelte.config.js ships with compilerOptions: { runes: true }; package.json pins "svelte": "^5.0.0". Legacy Svelte 4 reactivity ($:, top-level let as state, export let props, store auto-subscription in scripts) is rejected by svelte-check --fail-on-warnings. Decision: D-092.
  3. Protocol TypeScript client is generated, not hand-written. cmd/harbor-gen-protocol-ts/ reads internal/protocol/singlesource.CanonicalWireTypes and emits web/console/src/lib/protocol.ts with a // CODE GENERATED ... DO NOT EDIT. header. A make protocol-ts-gen-check target asserts git diff --exit-code is clean in CI. Hand-rolled fetch in .svelte files is still rejected (§13). Decision: D-093.
  4. Stylelint enforces the no-raw-literals rule mechanically. The first Console phase that creates web/console/ lands web/console/.stylelintrc.cjs that disallows hex / rgb() / named colors and arbitrary px / rem / em outside the token surface (tokens.css). npm run lint fails CI on raw literals; reviewers no longer hunt for them by eye.
  5. Shared chat module — encapsulate first, extract on second consumer. The chat + playground + MCP-Apps renderer + file-upload + trace-toggle components ship as a self-contained module at web/console/src/lib/chat/. The introducing phase enforces: (a) no imports of other Console internals from the chat module; (b) a typed ProtocolClient interface the caller injects, not a Console singleton; (c) the MCP-Apps renderer registry lives at web/console/src/lib/chat/renderers/. The future packed dev UI extracts to web/shared/chat/ via git mv when its phase plan lands.
  6. Mockup inventory is complete for V1 (as of 2026-05-18). All 13 V1 sidebar pages plus the session-level Playground surface have canonical mockups at docs/rfc/assets/console-<slug>-page.png (14 PNGs; Evaluations excluded per D-064). Each docs/design/console/page-<slug>.md spec carries a §12. Mockup-aligned refinements (2026-05-18) section that reconciles its mockup against §3-§7. Each Console page phase plan MUST reference the canonical mockup for the view(s) it ships AND consume the §12 reconciliation directly — the §12 component table is the binding source for any [wave-13-extends] Protocol-surface additions. The superseded legacy docs/research/console-mockup-runtime-view.png is retained as a research artifact only; the canonical Live Runtime mockup is docs/rfc/assets/console-live-runtime-page.png.
  7. §17.7 dispatch-prompt forcing function. Every Console-wave dispatch prompt MUST name in its mandatory reading list: Brief 11, Brief 12, every docs/rfc/assets/console-*-page.png asset (the legacy docs/research/console-mockup-runtime-view.png is superseded — agents should not consume it), CLAUDE.md §4.5 + §13 frontend bullets, and the three decisions above (D-091, D-092, D-093). This note is binding, not advisory.
  8. Per-page Console specs live at docs/design/console/page-<slug>.md. The 14-page IA is decomposed into one self-contained spec per page (Overview, Live Runtime, Sessions, Tasks, Agents, Tools, Events, Background Jobs, Flows, Memory, MCP Connections, Artifacts, Settings, Playground) — each carries an eleven-section template with a [shipped] / [wave-13-extends] / [deferred] functionality matrix. These specs are the authoritative per-page mockup-authoring source for Wave 13 and MUST appear in every per-page agent's mandatory reading list alongside Brief 11, Brief 12, and the relevant mockup asset. The directory's README.md is the index.

72 — Console subscription protocol surface (RFC §5.2, §7)

Goal. Read-only event subscription scoped by identity triple; admin/console:fleet scope for cross-session/tenant. Acceptance. Console can subscribe to a session's events; cross-tenant call rejected unless scoped admin. Tests. Integration. Deps. 60, 05, 06. Plan file. docs/plans/phase-72-console-subscription-scope.md (shipped — D-105).
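
The acceptance rule above reduces to a small decision: no identity means 401, a cross-tenant target without one of the two closed-set scopes means 403, anything else proceeds. The sketch below illustrates that decision only; the `Identity` struct, the scope string values, and `AuthorizeSubscribe` are assumptions, not the `auth` package's actual types.

```go
package main

import (
	"fmt"
	"net/http"
)

// Scope string values are assumed for illustration.
const (
	ScopeAdmin        = "admin"
	ScopeConsoleFleet = "console:fleet"
)

// Identity is a simplified identity triple plus granted scopes.
type Identity struct {
	Tenant, User, Session string
	Scopes                []string
}

func (id Identity) complete() bool {
	return id.Tenant != "" && id.User != "" && id.Session != ""
}

func (id Identity) hasAny(scopes ...string) bool {
	for _, have := range id.Scopes {
		for _, want := range scopes {
			if have == want {
				return true
			}
		}
	}
	return false
}

// AuthorizeSubscribe returns the HTTP status a subscription request for
// targetTenant should receive: 401, 403, or 200.
func AuthorizeSubscribe(id *Identity, targetTenant string) int {
	if id == nil || !id.complete() {
		return http.StatusUnauthorized
	}
	if targetTenant != id.Tenant && !id.hasAny(ScopeAdmin, ScopeConsoleFleet) {
		return http.StatusForbidden
	}
	return http.StatusOK
}

func main() {
	alice := &Identity{Tenant: "t1", User: "alice", Session: "s1"}
	fleet := &Identity{Tenant: "t1", User: "ops", Session: "s2", Scopes: []string{ScopeConsoleFleet}}
	fmt.Println(AuthorizeSubscribe(nil, "t1"))
	fmt.Println(AuthorizeSubscribe(alice, "t1"))
	fmt.Println(AuthorizeSubscribe(alice, "t2"))
	fmt.Println(AuthorizeSubscribe(fleet, "t2"))
}
```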

72a — events.subscribe filter extensions + events.aggregate (RFC §5.2, §6.13)

Goal. Extend the events.subscribe Protocol surface with a wire EventFilter struct (event-type / tenant / user / session / run / time-window) and add a new events.aggregate Protocol method returning time-bucketed event-type counts. Both methods use the closed two-scope set (auth.ScopeAdmin + auth.ScopeConsoleFleet) for cross-tenant fan-in per D-079 — NO new events.crosstenant scope. Acceptance. EventFilter + EventBucket + EventAggregateRequest + EventAggregateResponse ship in internal/protocol/types/events.go; events.aggregate route mounted on the wire; cross-tenant requests without the closed-set scope claim return 403 + CodeIdentityScopeRequired; bucket arithmetic deterministic (Window % Bucket == 0 or 400); concurrent-reuse pin under -race (N≥100). Tests. Unit (filter matrix, aggregate bucket arithmetic, concurrent-reuse) + integration (test/integration/events_filter_aggregate_test.go — real bus + real auth + real transports, scope-claim happy + reject paths, concurrent-reuse over the wire) + smoke (scripts/smoke/phase-72a.sh). Deps. 60, 61, 72. Plan. See docs/plans/phase-72a-events-filter-aggregate.md.

72e — pause.list snapshot Protocol method (RFC §5.2, §6.3)

Goal. Add the pause.list Protocol method (route POST /v1/pause/list) — a paginated, identity-scope-filtered snapshot of currently-paused tasks / sessions, projected from the shipped Phase 50 Pause/Resume Coordinator's in-memory registry. Read-only: it consumes the Coordinator state, it does not mutate the registry or call Resume. It is the snapshot half of the Console intervention-queue contract; live deltas continue to flow through events.subscribe on the pause.requested / pause.resumed topics. The Overview-page intervention queue (Phase 73a) is the UI consumer. Acceptance. MethodPauseList + the PauseSnapshot / PauseFilter / PauseListRequest / PauseListResponse / PauseArtifactRef wire types ship in internal/protocol/{methods,types}; the Coordinator.List interface extension + internal/runtime/pauseresume/list.go implementation; identity-mandatory (401 CodeIdentityRequired); cross-tenant filter without auth.ScopeAdmin → 403 CodeIdentityScopeRequired (D-079 closed-scope reuse, no new scope); the D-026 heavy-content bypass routes oversized pause payloads through the ArtifactStore and emits pause.payload_artifact_routed; pagination (PageSize default 50, max 200, out-of-range → 400, never silently clamped); concurrent-reuse pin under -race (N=128). Tests. Unit (list_test.go — filter combinations + pagination math + status semantics; pause_list_handler_test.go — identity / scope-claim / malformed / heavy-bypass; list_concurrent_test.go — D-025 N=128) + integration (test/integration/pause_list_test.go — real Coordinator + real transport + real auth, two-tenant scope, cross-tenant reject, admin-claim accept, heavy-payload bypass, concurrency stress, all -race) + smoke (scripts/smoke/phase-72e.sh). Deps. 50, 60, 61, 17 (all shipped). 73c / 73d for pagination-shape consistency only — same wave. Plan. See docs/plans/phase-72e-pause-list-snapshot.md (shipped — D-110).
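
The pagination rule (default 50, max 200, out-of-range is an error rather than a silent clamp) is easy to get subtly wrong, so a sketch may help. Treating a zero `PageSize` as "omitted" and using an integer offset as the cursor are both assumptions of this sketch; `ResolvePageSize` and `Paginate` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	DefaultPageSize = 50
	MaxPageSize     = 200
)

// Both errors would map to a 400 response.
var (
	ErrPageSize = errors.New("page size out of range")
	ErrCursor   = errors.New("cursor out of range")
)

// ResolvePageSize applies the default for an omitted (zero) size and
// rejects out-of-range values instead of clamping them.
func ResolvePageSize(requested int) (int, error) {
	if requested == 0 {
		return DefaultPageSize, nil
	}
	if requested < 0 || requested > MaxPageSize {
		return 0, ErrPageSize
	}
	return requested, nil
}

// Paginate returns one page and the next offset, or -1 when exhausted.
func Paginate(items []string, offset, requested int) ([]string, int, error) {
	size, err := ResolvePageSize(requested)
	if err != nil {
		return nil, 0, err
	}
	if offset < 0 || offset > len(items) {
		return nil, 0, ErrCursor
	}
	end := offset + size
	if end >= len(items) {
		return items[offset:], -1, nil
	}
	return items[offset:end], end, nil
}

func main() {
	items := make([]string, 120)
	for i := range items {
		items[i] = fmt.Sprintf("pause-%03d", i)
	}
	page, next, err := Paginate(items, 0, 0)
	fmt.Println(len(page), next, err)
	_, _, err = Paginate(items, 0, 201)
	fmt.Println(err)
}
```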

72g — governance.posture + llm.posture (RFC §5.5, §6.15)

Goal. Two read-only posture Protocol methods feeding the Console Settings page (Phase 73m). governance.posture returns the D-081 IdentityTiers view (per-tier BudgetCeilingUSD + token-bucket RateLimit + MaxTokens) plus DefaultTier + the caller-resolved tier. llm.posture returns the bound LLM provider/model/region + a MockMode boolean — true iff the runtime booted with HARBOR_DEV_ALLOW_MOCK=1 (D-089). The two methods EXTEND the Phase 72f PostureSurface (one surface, not two — §13). Both are identity-mandatory; cross-tenant reads require auth.ScopeAdmin (D-079). Read-only — no mutation method. Acceptance. MethodGovernancePosture / MethodLLMPosture registered in internal/protocol/methods + folded into IsPostureMethod; wire types in internal/protocol/types/{governance,llm}.go; the Phase 72f PostureSurface dispatcher routes both new methods through the control transport via the same IsPostureMethod branch; cross-tenant non-admin → 403 CodeScopeMismatch; missing identity → 401; cross-tenant governance/llm admin reads emit a *.posture_read_admin audit event; MockMode reflects the D-089 boot-time capture; concurrent-reuse pin under -race (N≥100). Tests. Unit (posture providers, posture surface, control posture handler, concurrent-reuse) + integration (test/integration/phase72g_posture_test.go — real governance + llm + transports + ES256 auth, MockMode round-trip across two boot modes, cross-tenant reject, N≥10 stress) + smoke (scripts/smoke/phase-72g.sh). Deps. 36a, 36b, 64, 72f. Plan. See docs/plans/phase-72g-governance-llm-posture.md (shipped — D-112).

72h — Console DB local schema + SvelteKit scaffold (RFC §7)

Goal. Land the Console-local IndexedDB schema (per D-061 — Console-local state ONLY, never a shadow source of truth for runtime entities) AND introduce the web/console/ SvelteKit scaffold (audit-resolved A5) every Stage-2 Console page rides on. Eight V1 tables: saved_filters, saved_views, profiles, runtime_registry, auth_profiles, pat_store, notifications_routing, keybindings. Acceptance. web/console/src/lib/db/ ships as a self-contained TypeScript module behind a ConsoleDB driver interface (V1 default driver: IndexedDB); per-operator row scoping is structural ([operator_id, id] compound key); auth_profiles / pat_store blobs are AES-GCM ciphertext with a PBKDF2-derived KEK (crypto.ts); forward-only migrations; the §13 / D-061 carve-out is mechanically scanned (schema-carveout.spec.ts + smoke); the SvelteKit scaffold pins Svelte 5 runes (D-092) + ships the generated protocol.ts stub (D-093). Tests. Vitest unit (crypto.spec.ts, schema.spec.ts, schema-carveout.spec.ts, migrations.spec.ts) + in-package integration (tests/integration.spec.ts — real IndexedDB driver via fake-indexeddb, real WebCrypto, eight-table round-trip, cross-operator isolation, encrypted-blob round-trip, wrong-key fail-loud) + smoke (scripts/smoke/phase-72h.sh, static-only). Deps. 60 (Protocol auth for PAT identity scoping). Plan. See docs/plans/phase-72h-console-db-schema.md. Decision: D-113.

73 — Console state inspection surface (RFC §5.2, §7)

Status. Shipped*, dissolved during Wave 13 (D-133). Phase 73 never landed as a standalone phase; its surface was decomposed across the Console page phases that consumed each slice. Shipped: sessions.inspect (Phase 73c), tasks.get (Phase 73d), artifacts.list / artifacts.put / artifacts.get_ref (Phase 73l, D-120). Deferred post-V1 (no V1 consumer — §13 no-primitive-without-consumer): state.history, state.list_trajectories, state.load_planner_checkpoint, artifacts.get, artifacts.delete — each lands additively with the first Console surface that consumes it. Goal. sessions.inspect, tasks.get, state.history, state.list_trajectories, state.load_planner_checkpoint, artifacts.list, artifacts.get, artifacts.get_ref, artifacts.delete — all scope-checked, redacted on emit. Acceptance. Each method enforces identity; redaction applied; pagination defined. Tests. Integration + scope mismatch. Deps. 60, 07, 17. Cross-reference. Phase 73l (Console Artifacts page) is the page-side consumer — it extends artifacts.list's filter shape and adds artifacts.put + the artifacts.get_ref presigned-URL resolver in the same wave (D-120).

73l — Console Artifacts page (RFC §5.2, §6.10, §7)

Goal. The Console Artifacts page — catalog + preview surface over the runtime's content-addressed artifact store — plus its feeding Protocol additions: the artifacts.list filter extensions (mime / source / size / created / tags), the artifacts.put upload pipeline (Brief 11 §PG-2), and the artifacts.get_ref presigned-URL resolver (D-022 / D-026). Ships the canonical renderer-registry SKELETON at web/console/src/lib/chat/renderers/ (dispatch table + six MIME renderers) — Phase 73l is the registry's first in-staging consumer; Phase 73n extends it. Acceptance. The three artifacts.* methods route through a sibling ArtifactsSurface; identity-mandatory + D-079 cross-tenant gating; artifacts.get_ref fails loud with CodePresignUnsupported on a non-S3 driver; the page dispatches previews through the canonical registry with no bespoke per-mime renderer; mutation surfaces render disabled-with-tooltip. See docs/plans/phase-73l-console-artifacts-page.md. Tests. Unit (internal/protocol/artifacts_test.go), concurrent-reuse N=100 (D-025), integration (test/integration/artifacts_page_test.go — in-mem + SQLite + fs drivers + real wire transport), renderer-registry Vitest, Playwright per-page spec. Deps. 73 (artifacts base methods), 75 (Playwright harness). Deviations (D-120). The surface lands at internal/protocol/artifacts.go (the codebase has no handlers/ sub-package — it follows the SearchSurface / PostureSurface convention); web/console/src/lib/protocol.ts is hand-extended (the cmd/harbor-gen-protocol-ts generator binary has not yet landed — Phase 72h committed protocol.ts as a hand-shaped stub). Both are recorded in the phase plan.

73j — Console Memory page (Protocol + UI) (RFC §5.2, §6.6, §7)

Goal. Bundle the Memory-page Protocol surface and UI into one Stage-2.1 phase (Wave 13 decomposition §5). Three read-only Protocol methods land — memory.list (paginated, identity-scope-filtered memory records + aggregate counters), memory.get (one record's full detail; heavy values routed through artifacts.get by reference per D-026), memory.health (aggregate counters + per-scope driver mapping). The methods compose over the shipped MemoryStore.Snapshot surface (Phases 23–25) + the events.aggregate 24h counters (Phase 72a). The UI is the SvelteKit Memory page (/memory) — catalog table + right-rail status cards (Memory health / Recent identity rejections / Recovery dropouts / Selected-item detail) + the disabled-with-tooltip bulk-action toolbar (V1 is view-only; the memory mutation surface is deferred to Phase 73 / post-V1). The page IS the consumer (§13 satisfied trivially); it also consumes memory.identity_rejected (D-033) + memory.recovery_dropped (D-035) events. Acceptance. MethodMemoryList / MethodMemoryGet / MethodMemoryHealth registered in internal/protocol/methods + folded into the new IsMemoryMethod predicate; wire types in internal/protocol/types/memory.go; the three routes (POST /v1/memory/{list,get,health}) mounted via transports.WithMemory; identity-mandatory (401 CodeIdentityRequired); cross-tenant filter without auth.ScopeAdmin → 403 CodeIdentityScopeRequired — NO new memory scope (audit B1; D-079 closed-set reuse); the D-026 heavy-value bypass routes oversized values through the ArtifactStore and memory.get ships ValueArtifact (never inline bytes); a constructed-driver negative test fails loud with ErrContextLeak; concurrent-reuse pin under -race (N≥100); the Memory page renders against the mockup with design-token-only styling; per-page Playwright spec web/console/tests/memory-page.spec.ts. Tests. 
Unit (internal/memory/protocollist_test.go / get_test.go / health_test.go / leak_internal_test.go / concurrent_reuse_test.go; internal/protocol/transports/stream/memory_handler_test.go) + integration (test/integration/memory_page_test.go — real MemoryStore + real transport + real ES256 auth + real artifact store + real events bus; happy path, cross-tenant reject, identity-required fail-loud with the D-033 bus assertion, D-026 heavy-value round-trip, N≥10 two-tenant concurrency stress, all -race) + Console-side Vitest (saved_filters_memory.spec.ts, protocol-memory.spec.ts) + Playwright (memory-page.spec.ts) + smoke (scripts/smoke/phase-73j.sh). Deps. 23, 24, 25, 60, 61, 72a, 72h, 73 (artifacts.get), 75 (all shipped or same-wave). Plan. See docs/plans/phase-73j-console-memory-page.md (shipped — D-118).

73i — Console Flows page (Protocol + UI) (RFC §5.2, §6.1, §7)

Goal. Ship the Console Flows page as a single Wave 13 Stage-2.1 phase: six NEW flows.* Protocol methods (flows.list with aggregate metrics, flows.describe engine-graph payload, flows.runs.list, flows.runs.describe, flows.run, flows.metrics) + the read-only Flows-page UI (catalog table + Flow Metrics card + the shared read-only engine graph canvas + per-flow Budget meter + run-history table + selected-run summary panel) + the per-page Playwright spec. Authoring is OUT of V1 per D-063 — the page is view-only with flows.run as the only mutating action, gated on auth.ScopeAdmin (D-079).

Acceptance. Six method names declared in internal/protocol/methods/methods.go; wire types in internal/protocol/types/flows.go; all six identity-mandatory + cross-tenant gated on auth.ScopeAdmin; flows.run gated on the same admin claim and degrades to 403 without it; flows.runs.describe ships heavy outputs via FlowArtifactRef (D-026); the shared EngineGraphCanvas + typed GraphInput interface published for Phase 73b; no authoring affordances render (D-063).

Deviations (D-117). The flows.run mutating gate reuses auth.ScopeAdmin (D-079 closed two-scope set — no new scope minted). The runtime side introduces a new flow.Registry subsystem as the source-of-truth (registered flows + bounded run-history ring). The typed Console client lives at web/console/src/lib/flows/client.ts as the hand-authored mirror of the flows.* surface until cmd/harbor-gen-protocol-ts (D-093) is extended to emit it — protocol.ts itself is not hand-edited.

Tests. Unit (flow/protocol/*_test.go — surface + catalog + invoker; flows_handler_test.go — identity / scope / decode; concurrent_reuse_test.go — D-025 N≥100) + integration (test/integration/flows_page_test.go — real registry + real transport + real auth, two-tenant scope, cross-tenant reject, flows.run reject without claim, D-026 heavy-output bypass, concurrency stress, all -race) + Console Vitest (format.spec.ts, layout.spec.ts, client.spec.ts) + Playwright (web/console/tests/flows-page.spec.ts) + smoke (scripts/smoke/phase-73i.sh).

Plan. See docs/plans/phase-73i-console-flows-page.md (shipped — D-117).

73g — Console Events page (RFC §5.2, §6.13, §7)

Goal. Ship the Console Events page — the runtime event-bus stream as a full-screen, query-driven investigative surface. This is a composition-only page phase: it ships NO new Protocol method. It consumes the shipped events.subscribe (GET /v1/events SSE table feed — Phase 72), events.aggregate (POST /v1/events/aggregate sparkline feed — Phase 72a), and artifacts.get_ref (heavy-payload Open artifact resolver — Phase 73l). The page IS the consumer Phase 72a's primitives waited for (§13 satisfied trivially). The UI is the SvelteKit Events page (/events) — faceted filter chips + Console-DB-backed saved-view chips + event-rate sparkline + virtualised event table + right-rail Event Details card + Pause-stream toggle + Export ▾ — built on the D-121 design-system foundation.

Acceptance. Route under (console)/events/ (no /console/ URL prefix — CONVENTIONS.md §1); the EventsNamespace joins the unified HarborClient; saved views persist in the shipped saved_filters Console DB table scoped to page='events' (no new table — D-061); the Pause-stream toggle is a Console-local render gate distinct from the runtime pause method; heavy payloads route through artifacts.get_ref, never inlined (D-026); cross-tenant Tenant ▾ gated on the D-079 closed scope set (no events.crosstenant minted); four-state PageState. See docs/plans/phase-73g-console-events-page.md.

Deviations (D-125). No new Protocol method (composition-only). The route ships at web/console/src/routes/(console)/events/ and the page components at web/console/src/lib/components/events/ — the phase plan (authored before D-121) named console/events/ and lib/events/components/; CONVENTIONS.md §1/§3 (D-121) is the binding cross-cutting authority and yields the corrected paths (CLAUDE.md §15).

Tests. Console Vitest (filters.test.ts, sparkline.test.ts, export.test.ts, taxonomy.test.ts, saved_filters_events.spec.ts, EventsNamespace cases in harbor-client.spec.ts) + integration (test/integration/events_page_test.go — real inmem bus + real SSE/aggregate handlers + real artifacts surface, subscribe filter narrowing, aggregate sparkline correctness, cross-tenant isolation, the truncated-payload artifacts.get_ref identity-rejection failure mode, N≥16 concurrent-subscriber stress, all -race) + Playwright (web/console/tests/events-page.spec.ts) + smoke (scripts/smoke/phase-73g.sh).

Plan. See docs/plans/phase-73g-console-events-page.md (shipped — D-125).

73a — Console Overview page (composition-only UI) (RFC §5.2, §6.13, §6.15, §7)

Goal. Ship the Console Overview page — the operator's at-a-glance hub and the default route on a fresh attach. This is a composition-only page phase: it ships NO new Protocol method. It composes the SHIPPED runtime.counters / runtime.health (Phase 72f), pause.list (Phase 72e), events.subscribe SSE (Phase 60 / 72), and the Phase 54 approve / reject control verbs into the 4-card counter row + sub-header health-chip strip + cost-rollup card + intervention queue + recent-activity feed + 2×3 Quick Links grid + the + New quick-create menu. The UI is the SvelteKit Overview page (/overview) built on the D-121 design-system foundation.

Acceptance. Route under (console)/overview/ (no /console/ URL prefix — CONVENTIONS.md §1); the RuntimeNamespace + PauseNamespace join the unified HarborClient; the counter sparklines / recent-activity feed / cost rollup fold client-side off the events.subscribe cursor (no new Protocol method — page-overview.md §12); the intervention queue's Approve / Reject invoke the SHIPPED Phase 54 control verbs and degrade to disabled-with-tooltip without the admin control-scope claim (D-066 / §13 — no parallel implementation); the Quick Links grid is exactly six tiles with no Evaluations tile (D-064); saved views persist in the shipped saved_filters Console DB table scoped to page='overview' (no new table — D-061); four-state PageState with nested PageState per panel. See docs/plans/phase-73a-console-overview-page.md.

Deviations (D-127). No new Protocol method, no new Go-side surface (composition-only — internal/ is unchanged). The route ships at web/console/src/routes/(console)/overview/ — the phase plan (authored before D-121) named web/console/src/routes/overview/ and the smoke probed /console/overview; CONVENTIONS.md §1 (D-121) is the binding cross-cutting authority and yields the corrected unprefixed (console)-group paths (CLAUDE.md §15).

Tests. Console Vitest (aggregations.test.ts, activity.test.ts, cost.test.ts, saved_filters_overview.spec.ts, RuntimeNamespace / PauseNamespace cases in harbor-client.spec.ts) + Playwright (web/console/tests/overview-page.spec.ts — depth-bar shell, counter row, scope-gated intervention actions, Quick Links navigation, + New deep-links, the Disconnected PageState) + smoke (scripts/smoke/phase-73a.sh). No Go-side integration test — Phase 73a adds no internal/ seam; the cross-stack integration assurance is the Playwright spec against a live harbor console plus the upstream 72e/72f integration tests.

Plan. See docs/plans/phase-73a-console-overview-page.md (shipped — D-127).

73c — Console Sessions page (Protocol + UI) (RFC §5.2, §6.9, §7)

Goal. Ship the Console Sessions page as a single Wave 13 Stage-2.1 phase: two NEW sessions.* Protocol methods (sessions.list — paginated + filtered SessionRegistry projection with the full filter set; sessions.inspect — full per-session snapshot) + the SvelteKit Sessions list/detail route + the per-page Playwright spec. Read-only — the bulk Cancel / Pause toolbar actions iterate the shipped per-row control methods (D-072) and render disabled-with-tooltip (D-066). The page IS the first consumer of sessions.list (§13 primitive-with-consumer).

Acceptance. Two method names declared in internal/protocol/methods/methods.go; nine wire types in internal/protocol/types/sessions.go; both identity-mandatory + cross-tenant gated on auth.ScopeAdmin (D-079); sessions.list emits Truncated bool not a silent total (D-026); the Sessions-page Identity column renders Phase 72b's IdentityScope impersonation triplet; no Priority surface (D-065); saved filters Console-DB-local (D-061); the page clears the CONVENTIONS.md §5 depth bar.

Deviations (D-122). The wire handler lands at internal/protocol/transports/stream/sessions_handler.go (the codebase has no internal/server/ package — the plan's path is stale; the handler follows the Phase 73f / 73i precedent). sessions.inspect ships whole, not as an additive extension of a Phase 73 parent method that has not landed. web/console/src/lib/protocol.ts is NOT hand-edited — the Sessions wire types live at web/console/src/lib/sessions/types.ts with a typed SessionsProtocol wrapper over the unified HarborClient, following the Phase 73i Flows-page precedent until cmd/harbor-gen-protocol-ts (D-093) lands.

Tests. Unit (sessions/protocol/protocol_test.go — Service filter/cursor/scope; concurrent_test.go — D-025 N≥100; sessions_handler_test.go — identity / scope / decode) + integration (test/integration/sessions_page_test.go — real registry + real transport + real auth, two-tenant scope, cross-tenant reject + audit emit, malformed cursor, N≥10 SSE-subscriber concurrency stress, all -race) + Console Vitest (sessions/tests/format.spec.ts, db/tests/saved_filters_sessions.spec.ts) + Playwright (web/console/tests/sessions-page.spec.ts) + smoke (scripts/smoke/phase-73c.sh).

Plan. See docs/plans/phase-73c-console-sessions-page.md (shipped — D-122).

74 — Console topology projection events (RFC §5.2, §6.13, §7.1)

Goal. topology.snapshot Protocol method + topology.changed event over the canonical engine-scoped TopologyProjection (static graph + live per-edge queue depth); the event emits on engine construction, the method serves on-demand cold-start. Acceptance. A Protocol client renders a topology view from the canonical projection alone (no internal access); identity-mandatory; cross-tenant requires auth.ScopeAdmin (D-079). See docs/plans/phase-74-console-topology.md. Tests. Unit (internal/protocol/types, internal/runtime/engine), concurrent-reuse N≥128 (D-025), integration (test/integration/phase74_topology_test.go — real engine + real bus + real wire transport). Deps. 05, 09. Deviations (D-114). The ControlSurface topology accessor wires via the WithTopologyAccessor functional option (not a positional NewControlSurface argument — keeps the Phase 54 signature stable); the nil-accessor / engine-less path returns CodeUnknownMethod (no CodeMethodNotSupported code exists); harbor dev hosts no engine-graph so its surface leaves the accessor nil; the decision number is D-114 (the plan's pre-assigned D-106 collided with a parallel Wave 13 phase).
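The functional-option wiring in the D-114 deviation can be sketched as follows. This is a minimal illustration, not Harbor's code: the `TopologyAccessor` interface shape, the zero-argument `NewControlSurface`, the `staticGraph` type, and the `ErrUnknownMethod` sentinel (standing in for `CodeUnknownMethod`) are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnknownMethod stands in for the CodeUnknownMethod wire code (assumed shape).
var ErrUnknownMethod = errors.New("unknown_method")

// TopologyAccessor is a hypothetical read-only view over the projection.
type TopologyAccessor interface {
	Snapshot() ([]string, error)
}

// ControlSurface keeps the accessor optional; nil means "no engine graph hosted".
type ControlSurface struct {
	topology TopologyAccessor
}

// Option configures a ControlSurface at construction time.
type Option func(*ControlSurface)

// WithTopologyAccessor wires the accessor without touching the
// constructor's positional parameters.
func WithTopologyAccessor(a TopologyAccessor) Option {
	return func(cs *ControlSurface) { cs.topology = a }
}

// NewControlSurface is simplified here: the real constructor's positional
// arguments are omitted because the source does not list them.
func NewControlSurface(opts ...Option) *ControlSurface {
	cs := &ControlSurface{}
	for _, opt := range opts {
		opt(cs)
	}
	return cs
}

// TopologySnapshot fails loudly when no accessor was wired.
func (cs *ControlSurface) TopologySnapshot() ([]string, error) {
	if cs.topology == nil {
		return nil, ErrUnknownMethod
	}
	return cs.topology.Snapshot()
}

// staticGraph is a toy accessor used only for this sketch.
type staticGraph []string

func (g staticGraph) Snapshot() ([]string, error) { return g, nil }

func main() {
	_, err := NewControlSurface().TopologySnapshot()
	fmt.Println("engine-less surface:", err)

	wired := NewControlSurface(WithTopologyAccessor(staticGraph{"ingest", "plan"}))
	nodes, _ := wired.TopologySnapshot()
	fmt.Println("wired surface nodes:", len(nodes))
}
```

Because the accessor arrives through a variadic option, existing callers that pass no option compile unchanged, which is the stability property the deviation cites.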

75 — Console e2e Playwright harness baseline (RFC §7)

Goal. Playwright harness baseline under web/console/tests/ — config, fixtures, page-object base class, helpers, the meta-test, and the frontend-e2e CI hook. The harness runs against harbor console (D-091) — NOT harbor dev; the original master-plan wording is corrected per D-091 + Brief 12 (the Console static build is served exclusively by harbor console). Per the binding rule: every operator-facing flow shipped in a phase has a matching .spec.ts. Wave 13 (docs/plans/wave-13-decomposition.md §12 item 7) narrows this phase to baseline-only: per-page specs land alongside each Stage-2 page phase (73a–73n); the wave-end aggregator suite is Phase 75a (Stage 3). See D-115. Acceptance. A baseline harness exists at web/console/tests/ (config + fixtures + page-object base + helpers + meta-test); the frontend-e2e CI job runs it and skips gracefully when web/console/ is absent (directory-missing → SKIP); future Console page phases hook their per-page specs into it. Tests. Playwright meta-test (harness.spec.ts) — boots harbor console, asserts the index serves + the SvelteKit app hydrates; SKIPs cleanly before the harbor console subcommand (Phase 73m) and the SvelteKit scaffold (Phase 72h) land. Deps. 60, 72. (Narrowed from 64, 72, 73 per the Wave 13 decomposition §4 — per-page Protocol additions move into each Stage-2 page phase; 64 is transitively assumed via 60.)

75a — Console e2e Playwright wave-end suite (RFC §7)

Goal. The Wave 13 wave-end aggregator Playwright suite (web/console/tests/wave13.spec.ts) — full IA navigation across all 14 V1 Console pages, scope-claim degradation regression, cross-page identity isolation, saved-view persistence, notification routing end-to-end. Bundled with the final Stage-2 PR per CLAUDE.md §17.5. Includes test/integration/wave13_test.go (Go-side wire-type round-trip + cross-page identity isolation + N≥10 concurrent SSE subscriber stress). Enumerates the 14-page IA and asserts a matching <slug>-page.spec.ts exists for each — a missing page-spec pair is a build break (operator §12 item 7 binding amendment). Acceptance. Every one of the 14 V1 Console pages has a matching per-page spec; the aggregator walks them all; the page-coverage check (make wave13-coverage-check) is green. Tests. wave13.spec.ts + test/integration/wave13_test.go. Deps. 75, 73a-73n. Shipped notes (D-131). Three things landed beyond the original plan: (1) a §17.6 cross-phase fix of a Phase 73m build-pipeline gap — the frontend-e2e CI job now runs make console-build before make build so harbor console embeds the real SvelteKit bundle (it was embedding an empty consoledist/); (2) a dev-only runtime-entity fixture seeder (cmd/harbor/devseed.go, gated by HARBOR_DEV_SEED_FIXTURES=1) so the per-page Playwright specs render real rows — the 25 SEED_DEPENDENT per-page skips were un-skipped and pass; (3) six per-page tests (Live Runtime tab content ×2, Playground chat ×3, Events pause-toggle ×1) carry a documented §17.6 deferral skip — they need run-trajectory fixtures (a live topology.snapshot / chat history / SSE subscription), a larger seam than registry seeding, tracked as a follow-up.

76 — Cross-tenant isolation conformance harness (RFC §4.3)

Goal. A master conformance harness asserting cross-tenant + cross-session isolation across StateStore / ArtifactStore / MemoryStore / SkillStore / TaskRegistry / EventBus. 100 sessions × random ops under -race. Acceptance. Final invariant: every read's identity matches the caller's identity exactly; CI runs the harness on every PR. Tests. The harness is the test. Deps. 07, 17, 23, 37, 20. Risks. This is the integrity gate. A regression here is a security bug. Shipped notes (D-134). The harness lives at test/integration/isolation_conformance_test.go (package integration_test; no new top-level directory — AGENTS.md §3 / §17.2). Three shipped tests: TestE2E_Isolation_ConformanceHarness (the 100-session randomized soak), TestE2E_Isolation_CrossScopeReadIsBlind (targeted positive proof across the cross-session + cross-tenant boundaries), TestE2E_Isolation_FailClosedOnMissingIdentity (the §17.3 failure mode — every subsystem rejects an incomplete triple). Soak-window split (D-134): the every-PR default is a fast ~3 s window (100 workers × thousands of op-cycles still catch a leak with overwhelming probability); the master-plan 30 s soak is opt-in via HARBOR_ISOLATION_SOAK=<go-duration>, and -short forces the fast window. All six subsystems are opened through their production registry factories — no mocks at the seam; SkillStore runs against its only V1 driver, localdb SQLite (:memory: DSN). The dedicated isolation CI job runs the fast window on every PR.
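The soak-window split can be sketched as a small selection function. This is an illustration under assumptions: the helper name `soakWindow`, the exact 3-second constant, and the choice to reject an unparseable or non-positive value loudly are not confirmed by the source, which only states the fast default, the `HARBOR_ISOLATION_SOAK=<go-duration>` opt-in, and that `-short` forces the fast window.

```go
package main

import (
	"fmt"
	"time"
)

// fastWindow approximates the "~3 s" every-PR default (assumed exact value).
const fastWindow = 3 * time.Second

// soakWindow picks the harness duration.
// short mirrors testing.Short(); env is the raw HARBOR_ISOLATION_SOAK value.
func soakWindow(short bool, env string) (time.Duration, error) {
	if short || env == "" {
		return fastWindow, nil // -short always wins over the opt-in
	}
	d, err := time.ParseDuration(env)
	if err != nil {
		return 0, fmt.Errorf("HARBOR_ISOLATION_SOAK=%q: %w", env, err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("HARBOR_ISOLATION_SOAK=%q: must be positive", env)
	}
	return d, nil
}

func main() {
	for _, env := range []string{"", "30s", "bogus"} {
		d, err := soakWindow(false, env)
		fmt.Printf("env=%q window=%v err=%v\n", env, d, err)
	}
}
```

In a real test the two inputs would come from `testing.Short()` and `os.Getenv("HARBOR_ISOLATION_SOAK")`; they are parameters here so the selection logic stays testable on its own.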

77 — Goroutine leak conformance harness (RFC §5 Go conventions)

Goal. A harness that wraps every long-lived component and asserts runtime.NumGoroutine returns to baseline after Stop(). Acceptance. All Runtime components pass; CI runs on every PR. Tests. The harness is the test. Deps. 10, 13, 50. Shipped. test/integration/phase77_goroutine_leak_test.go ships the table-driven TestE2E_Phase77_GoroutineLeakConformance; leakCases is a slice of {name, exercise} rows, one per long-lived Runtime component (Engine, inmem + durable EventBus, sessions.Registry, inprocess TaskRegistry). Each row constructs the real component with real drivers, runs 12 construct → exercise → teardown cycles, and asserts runtime.NumGoroutine() returns to baseline via a bounded eventually-poll (deadline + interval, never an instant snapshot — CLAUDE.md §17.4). A warm-up cycle precedes baseline capture; the suite is not t.Parallel (NumGoroutine is process-global). Passive registries with no background goroutines (pauseresume.Coordinator, steering Registry/Inbox/RunLoop) are deliberately not rows — they have no teardown seam to leak from; the Phase 50 dependency is satisfied by the pause primitive being exercised inside the Engine run lifecycle. A dedicated leak-harness CI job runs the suite under -race on every PR. All five V1 component rows pass on first run — no leaks found. See D-135.
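A minimal sketch of the warm-up, baseline, and bounded eventually-poll pattern described above. The `leakCase` row shape follows the `{name, exercise}` description; the helper names `settlesTo` and `checkLeak`, the toy workers, and the 500 ms / 5 ms poll bounds are assumptions for illustration, and the real suite drives Harbor components rather than toy goroutines.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// leakCase is one table row: exercise must construct, use, and tear down a component.
type leakCase struct {
	name     string
	exercise func()
}

// settlesTo polls until the goroutine count is at or below baseline,
// or the deadline passes. It never trusts a single instant snapshot.
func settlesTo(baseline int, deadline, interval time.Duration) bool {
	end := time.Now().Add(deadline)
	for {
		if runtime.NumGoroutine() <= baseline {
			return true
		}
		if time.Now().After(end) {
			return false
		}
		time.Sleep(interval)
	}
}

// checkLeak runs one warm-up cycle, captures the baseline, runs the
// measured cycles, then waits (bounded) for the count to settle.
func checkLeak(c leakCase, cycles int) bool {
	c.exercise() // warm-up precedes baseline capture
	baseline := runtime.NumGoroutine()
	for i := 0; i < cycles; i++ {
		c.exercise()
	}
	return settlesTo(baseline, 500*time.Millisecond, 5*time.Millisecond)
}

// cleanWorker starts a goroutine and waits for it to finish before returning.
func cleanWorker() {
	stop, done := make(chan struct{}), make(chan struct{})
	go func() {
		defer close(done)
		<-stop
	}()
	close(stop)
	<-done
}

// leakyWorker starts a goroutine that is never released.
func leakyWorker() {
	block := make(chan struct{})
	go func() { <-block }()
}

func main() {
	cases := []leakCase{{"clean", cleanWorker}, {"leaky", leakyWorker}}
	for _, c := range cases {
		fmt.Printf("%s: returns to baseline = %v\n", c.name, checkLeak(c, 12))
	}
}
```

The poll matters even for the clean worker: a goroutine that has signalled completion may still be counted for a moment, so an instant snapshot could report a false leak.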

78 — Chaos / fault injection harness

Goal. Kill mid-run, drop messages, simulate provider quirks, simulate StateStore disconnect, force pause-deserialize failures. Used in integration tests; not on hot path. Acceptance. Each failure mode produces the documented event + recovery path. Tests. Chaos suite. Deps. 76, 77. Shipped. test/integration/phase78_chaos_fault_injection_test.go ships the table-driven TestE2E_Phase78_ChaosFaultInjection; chaosCases is a slice of {name, inject} rows, one per master-plan failure mode. Each row wires the real Runtime component through its production factory / constructor (engine.New, events.Open, state.Open, pauseresume.New, retry.Wrap), injects one fault, and asserts BOTH the documented loud error / event AND the documented recovery path. The five rows: kill-mid-run (a run held in-flight by a blocking node is cancelled — asserts the engine's RunCancelledHandler seam fires, FetchByRun observes ErrRunCancelled, Engine.Stop tears down cleanly within a bounded deadline, no goroutine leak); drop-messages (a tiny-buffered subscription is saturated past the bus's drop-oldest backpressure — asserts the typed bus.dropped event carries a non-empty dropped sequence range); provider-quirks (a quirk LLM driver returns malformed output, wrapped in the real retry.Wrap retry-with-feedback layer with a rejecting Validator — asserts the llm.retry_with_feedback event fires + the call exhausts with llm.ErrRetryExhausted, plus a recovery sub-case that succeeds after one retry); statestore-disconnect (a fault-injecting decorator over the real in-mem StateStore returns a transport error — asserts the error surfaces loudly out of Save/Load, then the reconnect recovery path works); pause-deserialize-failure (a PauseRequest whose trajectory carries a live channel fails Coordinator.Request loud with trajectory.ErrUnserializable naming a non-empty field path — the D-069 / RFC §3.4 fail-loud contract, never a half-persisted checkpoint, plus a clean-trajectory recovery sub-case).
Faults are injected by THIN DECORATORS over the real components (test/integration/phase78_faults_test.go) — they decorate, never replace, and live in *_test.go files, never registered as a driver default (the §17.3 "real drivers at the seam" pattern with a fault overlay, not the §13 "test stub as production default" anti-pattern — see D-137). Every row asserts the fault is SURFACED loudly; no silent degradation (CLAUDE.md §13). A dedicated chaos CI job runs the suite under -race on every PR. All five failure-mode rows pass under -race. scripts/smoke/phase-78.sh (static-only) asserts the harness + decorators files exist, declare the conformance test, are table-driven, and the chaos CI job is wired. See D-137.
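The thin-decorator idea behind the statestore-disconnect row can be sketched like this. It is a simplified illustration, not the contents of phase78_faults_test.go: the two-method `StateStore` interface, the `memStore` type, the error sentinels, and the `Disconnect` / `Reconnect` names are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// StateStore is a deliberately tiny stand-in for the real interface (assumed shape).
type StateStore interface {
	Save(key, value string) error
	Load(key string) (string, error)
}

var (
	ErrNotFound  = errors.New("statestore: key not found")
	ErrTransport = errors.New("statestore: transport disconnected")
)

// memStore plays the role of the real in-memory driver.
type memStore struct {
	mu   sync.Mutex
	data map[string]string
}

func newMemStore() *memStore { return &memStore{data: map[string]string{}} }

func (m *memStore) Save(key, value string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = value
	return nil
}

func (m *memStore) Load(key string) (string, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	v, ok := m.data[key]
	if !ok {
		return "", ErrNotFound
	}
	return v, nil
}

// faultStore decorates a real store; it never replaces it.
// While "down" it surfaces a transport error instead of delegating.
type faultStore struct {
	inner StateStore
	down  atomic.Bool
}

func (f *faultStore) Disconnect() { f.down.Store(true) }
func (f *faultStore) Reconnect()  { f.down.Store(false) }

func (f *faultStore) Save(key, value string) error {
	if f.down.Load() {
		return ErrTransport
	}
	return f.inner.Save(key, value)
}

func (f *faultStore) Load(key string) (string, error) {
	if f.down.Load() {
		return "", ErrTransport
	}
	return f.inner.Load(key)
}

func main() {
	fs := &faultStore{inner: newMemStore()}
	_ = fs.Save("run-1", "checkpoint")

	fs.Disconnect()
	_, err := fs.Load("run-1")
	fmt.Println("while disconnected:", err)

	fs.Reconnect()
	v, err := fs.Load("run-1")
	fmt.Println("after reconnect:", v, err)
}
```

Since the decorator only overlays the fault, data written before the disconnect is still served by the real store after the reconnect, which is what lets a single row assert both the loud error and the recovery path.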

79 — Performance benchmarks

Goal. Engine throughput (envelopes/sec under N runs); bus fan-out (subscribers vs latency); memory-strategy latency (truncation vs rolling_summary). Acceptance. Baseline numbers committed; perf regression threshold gates PRs (e.g. > 10% slowdown blocks). Tests. go test -bench. Deps. 10, 12, 05. Status. Shipped (D-136 — test/benchmarks/ suite over engine / bus / memory against real components; docs/perf/baseline.txt committed; scripts/perf/check-regression.sh benchstat gate wired into CI as the perf-regression job — fails on a statistically-significant slowdown past a noise-tolerant 30% threshold, an empirical calibration of the master plan's illustrative "10%"; make bench / make bench-check; phase plan phase-79-performance-benchmarks.md).
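The threshold arithmetic behind the gate can be sketched in a few lines. This shows only the percentage comparison: the shipped gate is a benchstat-based shell script that also requires statistical significance, which this sketch does not model. The function names and the sample timings are hypothetical.

```go
package main

import "fmt"

// slowdown returns the fractional change from baseline to current
// (0.25 means current is 25% slower). Inputs are ns/op.
func slowdown(baselineNs, currentNs float64) float64 {
	return (currentNs - baselineNs) / baselineNs
}

// regressed reports whether the slowdown exceeds the threshold fraction.
// A non-positive baseline is treated as "no usable baseline", not a regression.
func regressed(baselineNs, currentNs, threshold float64) bool {
	if baselineNs <= 0 {
		return false
	}
	return slowdown(baselineNs, currentNs) > threshold
}

func main() {
	// Hypothetical numbers: 1000 ns/op baseline, 1250 ns/op on the PR.
	fmt.Println("30% gate blocks:", regressed(1000, 1250, 0.30))
	fmt.Println("10% gate blocks:", regressed(1000, 1250, 0.10))
}
```

The same 25% slowdown passes a 30% gate and fails a 10% gate, which illustrates why the threshold was calibrated against observed benchmark noise rather than taken from the illustrative figure.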

80 — Documentation hygiene polish

Goal. Every package has a doc comment; every exported symbol has godoc; example agents in examples/; recipe docs (docs/recipes/). Acceptance. golangci-lint's revive exported and package-comments clean; examples/ builds end-to-end. Tests. Lint + example builds in CI. Deps. All V1 phases. Status. Shipped (D-138 — the revive exported / package-comments documentation lint gate is now ENFORCED in CI: the lint job installs golangci-lint v1.64.8 and runs make lint-revive, which uses the dedicated .golangci-revive.yml config — previously make lint silently skipped because the binary was never installed. The exported rule keeps godoc-presence enforcement but gains disableStutteringCheck so the ~20 cross-package type renames the stutter sub-check would force stay out of a docs phase; the genuine doc gaps the rule surfaced — eight detached package comments, two malformed package comments, a handful of un-commented const/var blocks — are all fixed. examples/ gains worked, buildable code — examples/agents/echo/ (a harbortest.Agent + test) and examples/tools/weather/ (an inproc.RegisterFunc tool + register→resolve→invoke test) — exercised by a new CI examples job. docs/recipes/ ships five real-API-grounded how-to guides. The broader make lint backlog (~1000 issues across ~20 linters, accumulated while the gate silently skipped) is deliberately left to a separate release-hardening effort. Phase plan phase-80-documentation-hygiene-polish.md).

81 — Release engineering (versioning, changelog) (RFC §12)

Goal. Semver tagging, CHANGELOG.md, build provenance (SLSA-style attestations as a stretch). Acceptance. git tag v1.0.0-rc.1 produces a release artifact; CHANGELOG covers all V1 phases. Tests. Release dry-run. Deps. All V1 phases. Status. Shipped (D-139 — the product release version is stamped into the harbor binary at link time: cmd/harbor.HarborVersion becomes a var (a const cannot be -ldflags -X overridden), and scripts/release-build.sh — the single home of the build incantation — stamps it via go build -ldflags="-s -w -X 'main.HarborVersion=…'" from a git describe --tags-derived version, falling back to the v0.0.0-dev sentinel for an un-tagged build. The product release version is kept STRICTLY distinct from the Harbor Protocol version (internal/protocol/types.ProtocolVersion, RFC §5.3) — harbor version already prints both as separate fields; the two are versioned independently. CHANGELOG.md lands at the repo root in Keep-a-Changelog format, grouped by delivery wave / subsystem, covering every V1 phase (01–81 + the lettered phases). .github/workflows/release.yml triggers on a v* tag push — builds the CGo-free static binary, emits a SHA-256 checksum, attaches SLSA-style build provenance via GitHub's native actions/attest-build-provenance (the master-plan stretch — landed, not deferred, because the first-party action adds no framework dependency), and publishes a GitHub Release; a workflow_dispatch path runs the dry-run. scripts/release-dryrun.sh (the make release-dryrun target) is the master-plan "release dry-run" test — it exercises the exact release-build path with a synthetic version and asserts the artifact + checksum + version stamp, all without pushing a tag. Phase 81 creates NO v* tag — tagging is the operator's job in Phase 82. Phase plan phase-81-release-engineering.md.)
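The link-time stamping described above relies on a standard Go linker feature: `-X importpath.name=value` can overwrite a package-level string variable, but not a constant. A minimal sketch, assuming a `main` package and a hypothetical `versionLine` helper (the real `harbor version` output format is not shown in this plan):

```go
package main

import "fmt"

// HarborVersion must be a package-level string var so the linker can
// overwrite it; a const would be inlined at compile time and ignored by -X.
// The default is the un-tagged sentinel named in the plan.
var HarborVersion = "v0.0.0-dev"

// versionLine is a hypothetical formatter used only for this sketch.
func versionLine() string {
	return "harbor " + HarborVersion
}

func main() {
	// Plain `go build` prints the sentinel. Stamping a version looks like:
	//   go build -ldflags="-s -w -X 'main.HarborVersion=v1.0.0-rc.1'" .
	fmt.Println(versionLine())
}
```

Keeping the sentinel as the in-source default means an un-stamped developer build is always distinguishable from a tagged release artifact.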

82 — V1 cut (RFC §1, §12)

Goal. v1.0.0 tag; release notes; migration notes (if any); blog/announcement scaffold. Acceptance. harbor version returns v1.0.0; preflight green; protocol conformance suite green; cross-tenant + leak harnesses green. Tests. Full preflight. Deps. 81.

Post-V1 follow-ups (83–100)

Listed for tracking. Not on the V1 critical path.

  • 83 — Auto-sequence detection. Skip the LLM call on deterministic single-tool transitions. Off by default. RFC §12. Deps: 45.
  • 83a — ReAct prompt structured sections. Refactor defaultBuilder to assemble the twelve XML-tagged sections from brief 13 §2.1 (<identity>, <output_format>, <action_schema>, <finishing>, <tool_usage>, <parallel_execution>, <reasoning>, <tone>, <error_handling>, <available_tools>, <additional_guidance>, <planning_constraints>); add WithSystemPromptExtra Option + PlannerConfig.ExtraGuidance config key; golden-fixture the default prompt. Foundation phase — 83b/c/d build on its section anchors. RFC §6.2. Deps: 45. See docs/plans/phase-83a-react-prompt-structured-sections.md.
  • 83b — ReAct tool schema injection (catalog rendering). Extend tools.Tool with Examples []ToolExample (tag-ranked minimal > common > edge-case); upgrade <available_tools> rendering to emit args_schema, side_effects, and curated examples per tool. Closes the args-validation-failure cascade caused by name+description-only catalog rendering. RFC §6.2, §6.4. Deps: 83a, 26. See docs/plans/phase-83b-react-tool-schema-injection.md.
  • 83c — ReAct dynamic repair guidance + planning hints. Add per-run RepairCounters{FinishRepair, ArgsRepair, MultiAction} on RunContext; render escalating reminder → warning → critical hints per turn when counters trip; wire RunContext.PlanningHints into <planning_constraints>. Closes the across-step feedback loop Phase 44 (per-step repair) leaves open. New decisions entry D-145 scopes counters to RunContext (not the planner struct) per D-025 concurrent-reuse contract. RFC §6.2. Deps: 83a, 44, 05. See docs/plans/phase-83c-react-dynamic-repair-guidance.md.
  • 83d — ReAct skills + memory injection (UNTRUSTED framing). Render RunContext.MemoryBlocks and RunContext.SkillsContext into the system prompt as separate llm.ChatMessage entries with the five-line anti-prompt-injection rule list from brief 13 §2.3. Distinct <read_only_external_memory> / <read_only_conversation_memory> wrappers preserved per tier; <skills_context> for pre-retrieved skill bodies. Serialisation failures fail loudly via ErrMemoryBlockUnserializable. RFC §6.2, §6.6, §6.7. Deps: 83a, 23, 37. See docs/plans/phase-83d-react-skills-and-memory-injection.md.
  • 83e — ReAct reasoning channel decoupling (capture-vs-replay). Drop Reasoning from Decision_CallTool; extend llm.CompleteResponse with Reasoning string; bifrost driver reads BifrostChatResponse.Choices[0].Message.ReasoningDetails — closing both the unary-path gap (today OnReasoning is streaming-only) and the Gemini-direct black hole (today bifrost populates reasoning_details[] on the message but Harbor drops it). Reasoning persists on TrajectoryStep.ReasoningTrace; replay is operator-controlled per agent via PlannerConfig.ReasoningReplay enum (never default for ALL models, text opt-in). No provider_native mode in V1 (Bifrost docs don't cover thinking-block round-trips). New decisions D-147 (schema narrowing) + D-148 (replay knob shape — two enum values, defer provider_native). RFC §6.2, §6.5. Deps: 45, 32, 33, 44. See docs/plans/phase-83e-react-reasoning-channel-decoupling.md.
  • 83f — Dev RunLoop populates the 83-band RunContext (D-149). Closes the Wave 15 §17.5 audit's W3/W4 (issue #208). cmd/harbor/cmd_dev_runloop.go::runOne now fetches the task's Query, session-scoped memory via MemoryStore.GetLLMContext, session-scoped skills via SkillStore.Search, allocates a per-run *RepairCounters, and projects operator-supplied planner.PlanningHints from the new planner.skills_context_max + planner.planning_hints YAML keys. Memory/skills fetch errors fail loud with MarkFailed(code=runtime_fetch_error); the LLM is never called on a degraded run. RFC §6.2, §6.6, §6.7. Deps: 83c, 83d, 23, 37, 20. See docs/plans/phase-83f-react-prompt-band-runtime-consumers.md.
  • 83g — MCP southbound consumer in harbor dev (D-150). Closes the parallel consumer gap for the Phase 28 MCP driver, surfaced during the 83f operator validation. cmd/harbor/cmd_dev.go::bootDevStack now iterates cfg.Tools.MCPServers[], spawns each via mcpdrv.New + Connect, discovers tools via Discover, and registers each ToolDescriptor on the tool catalog. Boot fails loud (mcp[<name>]: <stage>: <err>) on any connect / discover / register failure; the operator sees the error before the binary starts serving. The MCP Registry is constructed and populated so a small follow-up phase mounts the Console MCP-page surface without re-spawning. Devstack mirror per D-094; integration test spawns a real stdio subprocess via the cmd/harbor-mcptest-stdio test fixture. RFC §6.4. Deps: 28, 26. See docs/plans/phase-83g-mcp-dev-consumer.md.
  • 83h — Dev-binary fixes (D-151). Two hard-block bugs from the v1.1 operator validation: V1 — cmd/harbor/cmd_dev_hot_reload.go::shouldTrigger reboot-looped the binary every ~700ms on SQLite WAL/SHM/journal sidecars (fixed: extend with a dbSidecarSuffixes ignore list). V2 — internal/llm/safety.go rejected every real-bifrost request with CompleteRequest.Model is empty because the react planner never sets Model (fixed: default req.Model = c.cfg.Model before structural validation). The mock LLM driver used in every existing integration test does not enforce Model, which is why the gap escaped Wave 13/14/15 audits. Together unblock real-bifrost dev-binary runs. RFC §6.5, §8. Deps: 83g, 64, 32. See docs/plans/phase-83h-dev-binary-fixes.md.
  • 83i — RunContext wiring closure (D-152). Closes the four root causes of the Wave 17 operator-validation "64 steps, 0 tool calls" failure mode: (1) the steering RunLoop's default: case dropped every CallTool decision — fixed with a new steering.ToolExecutor seam + RunSpec.ToolExecutor field + trajectory-append on every dispatched step; (2) RunContext.Catalog was never populated — fixed with runtimeCatalogView projecting the production tools.ToolCatalog through a per-run identity filter; (3) heavy tool results (1.5 MB MCP responses) leaked verbatim into the LLM prompt — fixed with D-026-shaped artifact-store promotion in the dev binary's devToolExecutor (results above cfg.Artifacts.HeavyOutputThresholdBytes get stored + a small summary {tool, size_bytes, truncated, preview, artifact_ref} is rendered into the LLM observation); (4) MemoryStore.AddTurn had no production caller — fixed in runOne on FinishGoal. The runOne also populates RunContext.Emit so the planner's planner.decision / planner.finish / planner.repair_guidance_injected events reach the bus. Live validation: 2 LLM calls end-to-end against mcp-youtube. Devstack mirror per D-094. RFC §6.2, §6.6, §6.8. Deps: 83f, 83g, 83h, 26, 23. See docs/plans/phase-83i-runcontext-wiring.md.
  • 83n — harbor init + tiered yaml + docs/CONFIG.md + built-in tools (D-153). Introduces harbor init — the operator-facing entry point that drops a tiered, commented harbor.yaml plus AGENTS.md / CLAUDE.md / README.md companion files into a fresh directory. The yaml has three tiers: REQUIRED (identity + four commented LLM-provider examples — OpenRouter, Anthropic, OpenAI, NVIDIA NIM — all reachable through bifrost), COMMON KNOBS (memory, planner, tools, skills, governance) all commented with sensible defaults, and ADVANCED with a pointer to docs/CONFIG.md. Companion files explain the workflow (init → validate → scaffold → dev). Ships docs/CONFIG.md — the full operator-facing knob reference with one ### <yaml.path> heading per leaf field on Config{} — plus a Go test (internal/config/doc_drift_test.go) that fails CI when a new config field lands without documentation. Also ships the first two opt-in built-in tools at internal/tools/builtin/: clock.now (current UTC time) and text.echo (echo input verbatim). The new tools.built_in []string yaml field registers built-ins by name; the validator mirrors builtin.KnownNames() per the §4.4 seam pattern. bootDevStack calls builtin.Register(toolCat, cfg.Tools.BuiltIn) between catalog construction and the catalog-wiring step; devstack mirrors per D-094. RFC §8, §6.4. Deps: 67, 63, 26. See docs/plans/phase-83n-harbor-init.md.
  • 83u — Console DB chicken-and-egg fix (D-163). Closes round-2 walkthrough F3: the Connected Runtimes add-form on Settings called console_db::addRuntime → runtimes.upsert(...) on a DB that required an active RuntimeConnection to derive its per-operator AES key. Operator without a Runtime → no connection → DB stays closed → form threw "Console DB not open — attach to a Runtime first". The form was reachable (Phase 83p) but structurally non-functional. Fix: new attachConnection(baseURL, opts) helper in web/console/src/lib/connection.ts writes the harbor.runtime.* localStorage keys first (the operator's primary intent — "make the Console talk to this Runtime"); console_db.svelte.ts::addRuntime calls attachConnection() first, then attempts the DB upsert and degrades to a non-fatal warning if the DB is still locked. On the post-attach page reload, a new private #catchUpAddressBook() invoked from load() inserts the active connection into the address book if it's not already there (is_default: 1). Playwright test follows the disconnected-boot → form → reload → connected flow end-to-end. RFC §5, §7. Deps: 73m, 73p, 83p. See docs/plans/phase-83u-console-db-chicken-and-egg.md.
  • 83v — Runtime CORS allowlist (D-162). Closes round-2 walkthrough F4 — the showstopper that broke the D-091 multi-process Console+Runtime posture at the wire. The pre-83v grep -rn 'Access-Control\|cors' --include='*.go' returned zero matches anywhere in the Go codebase; cross-origin requests from a Console (:18790) to a remote Runtime (:18080) were blocked at preflight. Fix: new internal/protocol/transports/cors/ package with Wrap(next http.Handler, cfg Config) http.Handler middleware; new ServerConfig.AllowedOrigins []string + ServerConfig.CORSDevAllowAny bool yaml fields. Default deny (empty list = no CORS headers = same-origin only). Per-origin echo of the request's Origin header after exact-match allowlist check (never * in production). Access-Control-Allow-Credentials: true (incompatible with * per CORS spec, which forces the per-origin shape). Validator rejects * in the allowlist unless server.cors_dev_allow_any: true is also set; when the dev flag fires, every boot prints a stderr banner [DEV-ONLY CORS WILDCARD — DO NOT USE IN PRODUCTION]. Middleware wraps both REST + SSE handlers in cmd/harbor/cmd_dev.go::bootDevStack; devstack mirror per D-094. Integration test exercises cross-origin preflight end-to-end against an httptest origin. docs/CONFIG.md documents both fields with the production-security note. RFC §5, §7. Deps: 60. See docs/plans/phase-83v-runtime-cors.md.
  • 83w — Wire-surface gaps (D-164). Closes round-2 walkthrough F5 + F6 — two wire-surface gaps surfacing as scary red ERROR PageStates on the operator's most-used debugging surfaces. F5 (Console side): topology.snapshot returns unknown_method on a planner/RunLoop runtime (no engine graph); the Live Runtime + Playground pages routed the error through PageState's red ERROR branch with a Retry button that always failed. Fix: new 'info' branch added to PageState's PageStatus union (additive to disconnected/loading/error/empty/ready); new isUnknownMethod(err) helper in web/console/src/lib/protocol/errors.ts; both pages special-case unknown_method → route to PageState's info branch with "Topology view not available on this Runtime — planner/RunLoop runtime, not engine-graph" copy + no Retry button (retry is meaningless for not-applicable surfaces). F6 (Go side): mcp.servers.list was missing from the Runtime's wire surface even though the *mcp.Registry was already constructed at boot (Phase 83g) and Tools page rendered six youtube tools fine — just no method handler. Fix is wiring-only: cmd/harbor/cmd_dev.go::bootDevStack constructs the Phase 73k MCPSurface from the boot-time registry + threads it into transports.NewMux via transports.WithMCPSurface(mcpSurface); new mcpconsole.NoOAuthAccessor provides the read-only access pattern for V1 harbor dev (no OAuth providers); OAuth-flow methods fail loudly with ErrNoOAuthConfigured per §13. Devstack mirror per D-094. Integration test asserts the wire surface returns 200 (not unknown_method). RFC §5, §6.4, §7. Deps: 83g, 83m, 73k. See docs/plans/phase-83w-wire-surface-gaps.md.
  • 83x — Real-data layout polish (D-165). Closes round-2 walkthrough W4-W11 + N11-N14 — the "every page has a paper cut" backdrop. Twelve items in total: ten Console-side (memory key ellipsis W4; artifacts grid layout 1fr × right-rail W5; tasks kanban Complete column W7; events empty-state names events.driver: durable W9; live-runtime session status derived from strip aggregate W10; agents synthetic-default copy W11; overview "(now)" suffix N11; tools "In-flight (now)" relabel N12; tools reliability column width token N13; live-runtime pillar labels "(now)" N14) plus two cross-stack Go-side fixes (§17.6): W6 — Artifact.created_at was the Go zero value 0001-01-01T00:00:00Z because two call sites populating the storage Source map silently omitted the timestamp — fixed at cmd/harbor/cmd_dev_executor.go::projectForLLM (heavy-tool promotion, time.Now().UTC()) + internal/protocol/artifacts.go::handlePut (artifacts.put upload, s.clock() so unit tests with injected clock stay deterministic). W8 — SessionRegistry held zero rows under the dev token so the Console Sessions page rendered "No sessions match these filters" even mid-task; fixed by bootDevStack opening the dev session right after constructing the registry, swallowing ErrSessionAlreadyOpen for idempotency. RFC §5, §6.4, §6.6, §6.10, §6.13. Deps: 73m, 73p, 83i, 83m. See docs/plans/phase-83x-real-data-layout-bugs.md.
  • 83s — Saved-views label + per-page footer dedup (D-161). Closes Nits N2 + N7 from the post-83k visual walkthrough. Settles on the canonical pair "Save view" (button) + "Save current as…" (input placeholder) across every page that surfaces a saved-view save gesture (the eight phrasings that had drifted apart before 83s are consolidated to these two). Removes every per-page inline Disconnected · no Runtime attached indicator — the viewport-fixed ConnectionFooter is the single source of truth; pages now show ONE disconnected indicator per viewport instead of two stacked ones. RFC §5. Deps: 73m, 73p. See docs/plans/phase-83s-savedviews-and-footer-dedup.md.
  • 83r — Disconnected-state hygiene (D-160). Closes Walkthrough Bugs W1 + W2 + W3 + Nits N4 + N5 + N8 + N9 + N10. New isDisconnected() predicate + DISCONNECTED_TOOLTIP constant in web/console/src/lib/connection.ts — every page composes it via $derived(connection === null) locally. Action buttons + filter controls reach the same predicate (disabled + tooltip when disconnected; W2/W3). The Overview Cost Rollup card stops rendering synthetic $0.00 data when disconnected (W1). The Tools page renders ONE empty-state message instead of two (N5). Agents KPIs use matching Tools (N4). MCP Connections status chips desaturate via a new desaturated prop that flips data-kind to neutral (N8). Artifacts subtitle reads "— no Runtime attached" when disconnected (N9). <PageState> adds vertical-centring CSS (min-height: 40vh) so empty-state placeholders centre in the viewport instead of hugging the top (N10). New web/console/tests/disconnected-state.spec.ts Playwright spec covers the cluster. Bundled §17.6 fix: pre-83r ESLint break in settings/+page.svelte:94 (Phase 83p placeholder) — fixed inline. RFC §5. Deps: 73m, 73p, 83p. See docs/plans/phase-83r-disconnected-state-hygiene.md.
  • 83q — Playground sidebar nav + breadcrumb (D-159). Closes Bug F2 + Nit N1 from the post-83k visual walkthrough. The Console's (console)/+layout.svelte defines the NAV constant (cluster → items) AND derives the breadcrumb's crumbLabel from the same NAV by URL-segment match — so adding { label: 'Playground', href: '/playground' } to the EXECUTION cluster fixes F2 (Playground unreachable from sidebar) AND N1 (lowercase playground breadcrumb) in one stroke. Also rewrites docs/design/console/CONVENTIONS.md §2 which explicitly declared "Playground is NOT a sidebar entry" (a stale Phase 73n design call). Playwright tests bump cardinality from ≥13 to ≥14 sidebar links + explicit Playground assertion. RFC §5, §7. Deps: 73n. See docs/plans/phase-83q-playground-sidebar-nav.md.
  • 83p — Settings two-group layout (D-158). Closes Bug F1 from the post-83k visual walkthrough: the Settings page wrapped its WHOLE cards loop in <PageState>, so the disconnected state short-circuited every section to the "Not connected — Attach one in Settings" placeholder — hiding the Connected Runtimes add-form an operator needs to fix the disconnection. SettingsState.load()'s docstring already documented the intended split ("Console-local sections do NOT depend on the runtime posture"); the template ignored it. Fix: each SETTINGS_SECTIONS entry now carries a group: 'console-local' | 'runtime-posture' discriminator; the page template renders console-local sections (Connected Runtimes, Per-Runtime Auth, API Tokens, Appearance, Time & Locale, Keybindings, Notifications Routing) unconditionally and routes only runtime-posture sections (Runtime Info, Governance Posture, Storage Drivers, LLM-Provider Posture, About) through <PageState>. Playwright test extension asserts the add-form is reachable + the input fields render in the disconnected state. RFC §5, §8. Deps: 73m, 73p. See docs/plans/phase-83p-settings-add-runtime-form.md.
  • 83k — Console release embed (D-157). Closes the operator-validation gap where cmd/harbor/consoledist/* is gitignored (only .gitkeep committed) — a fresh git clone + go build ./cmd/harbor (or go install github.com/.../cmd/harbor@latest) produces a binary that embeds an EMPTY Console bundle and serves the synthesized placeholder page. Fix: (1) make build now runs make console-build as a prerequisite (operators who git clone && make build get a working binary the first time); (2) a new make build-fast preserves today's no-Console-rebuild path for iterative dev work; (3) scripts/release-build.sh rebuilds the Console before go build, so tagged-release artifacts always carry a fresh Console; (4) a new scripts/check-console-bundle.sh staleness gate (wired into the CI frontend-e2e job) asserts two consecutive make console-build runs produce byte-identical outputs (catches non-determinism in the build); (5) the placeholder page copy is reworded with the exact rebuild commands + a go install workaround + a pointer to harbor init for first-time operators + a link to docs/CONFIG.md. The "first-run reach" polish half (favicon, brand tokens, empty-state copy) is intentionally deferred to a post-walkthrough follow-up — shipping release-embed first means the visual walkthrough runs against the fixed binary. RFC §5, §8. Deps: 73m, 81, 83n. See docs/plans/phase-83k-console-release-embed.md.
  • 83m — WARN cleanup band (D-156). Closes the eight WARN-tier items the §17.5 audit + Wave 17 operator validation surfaced. Item 1: MCP Config.DefaultIdentity reuse across pushes is now a fallback — the driver prefers identity.From(ctx) per call (multi-isolation footgun closed). Item 2: hot-reload watcher's dbSidecarSuffixes extended with .sqlite + .db main files (the reboot-loop was slower without it, not absent). Item 3: bootDevStack appends agentRegistry.Close + draftStore.Close to the closer chain (goroutine + file-handle leak on shutdown closed). Item 4: extractSkillKeywords helper drops English stopwords + 1-char tokens + dedupes before skills.Search (FTS5 ranker now sees keyword-shaped queries, not full sentences). Item 5: internal/llm/safety.go::Complete prefers c.cfg.Timeout over the 5-minute default (operator's harbor.yaml timeout was being silently ignored). Item 6: new tools.granted_scopes []string yaml field replaces the runloop's hard-coded nil pass to newRuntimeCatalogView (catalog filter now actually applies operator-declared scopes). Item 7: tasks.Task.ToolCount field + TaskRegistry.IncrementToolCount(ctx, id) error method (with conformance + N=128 D-025 concurrent-reuse test) + projectRow projection + runloop wiring closes the dead prototypes.Task.ToolCount wire field (Console rendered 0 forever before). Item 8: RunContext.OnReasoning func(string) callback (option b design — keeps the Decision sum sealed, treats reasoning as per-step observation rather than per-step instruction) + runloop's per-step closure capture + Step.ReasoningTrace copy on trajectory append makes Phase 83e's ReasoningReplay=text mode structurally effective in production for the first time. Shipped via two parallel worktree-agent buckets per §17.7 cadence + coordinator integration. Devstack mirror per D-094 for items 1, 3, 4, 6, 7, 8. RFC §6.2, §6.4, §6.5, §6.8, §8. Deps: 83g, 83h, 83i, 83l. See docs/plans/phase-83m-warn-cleanup.md.
  • 83l — Real-bifrost integration tests + production-bug fix (D-155). Closes the audit lesson D-151 named verbatim: every existing dev-binary integration test used the mock LLM, which masked two real-bifrost+real-stack bugs through Wave 13/14/15. Ships test/integration/phase83l_real_bifrost_test.go — two tests + a scripted OpenAI-compatible httptest.NewServer helper (scriptedLLMServer) — exercising the full cmd/harbor-shape stack (bifrost driver, safety/correction/retry wrapper chain, react planner, steering RunLoop, ToolExecutor seam, trajectory append, memory writeback). The first run of the tests immediately surfaced a latent production bug: cmd/harbor/cmd_dev.go::bootDevStack constructed the llm.ConfigSnapshot WITHOUT cfg.LLM.CustomProviders, cfg.LLM.NetworkDefaults, or cfg.LLM.Corrections — an operator who declared a custom provider (NIM / vLLM / ollama / in-house gateway) would pass config validation but fail at boot with bifrost: invalid provider … declared custom: (none). Fix lands in this PR per §17.6 (three new projection helpers + the snapshot wiring; D-094 mirror in harbortest/devstack/devstack.go). Exactly the failure mode D-151 predicted; the mock LLM hid it; the integration test caught it on first contact. RFC §6.5. Deps: 33, 33a, 45, 83h, 83i. See docs/plans/phase-83l-real-bifrost-tests.md.
  • 83o — scaffold reads operator yaml + per-custom-tool Go stubs + --patch (D-154). Closes the operator workflow Phase 83n opened. harbor scaffold now reads the operator-edited harbor.yaml (explicit --from-config <path> or auto-detected ./harbor.yaml), copies it verbatim into the output project (operator's comments + uncommented LLM block survive), and fans out one tools/<name>.go + matching _test.go per entry under a new tools.custom yaml field. Each custom tool stub carries typed Input/Output structs derived from the operator's flat field: type declarations (string / integer / number / boolean / []string) plus a TODO: implement Handle. The generated agent.go includes a RegisterTools(cat tools.ToolCatalog) error function that registers each built-in (from tools.built_in) + each custom tool's Handle so the operator's binary bootstrap is one call. A new --patch flag relaxes the refuse-overwrite default: existing files are skipped (Skipped slice on Result), only new tools materialise. The validator rejects name collisions between tools.custom and tools.built_in and rejects unknown shorthand types. docs/CONFIG.md documents tools.custom; the doc-drift gate caught the missing heading mid-implementation. RFC §8, §6.4. Deps: 67, 83n, 26. See docs/plans/phase-83o-scaffold-from-yaml.md.
  • 84 — Reflection / critique loop. Optional per planner. Self-critique before Finish. RFC §12. Deps: 45.
  • 84b — Multimodal attachment disposition policy (D-189). Turns the hardcoded MIME→disposition switch in materializeOne into declared policy: an AttachmentDisposition enum (ref / inline / provider_native / tool:<name>) resolved per-attachment caller hint > per-agent policy map > runtime default (ref) — the layers are semantic, the carriers (Protocol hint, harbor.yaml) are thin adapters over a planner-homed policy core (DispositionPolicy + the exported pure ResolveDisposition), so a headless library consumer authors the same policy directly. The default is byte-for-byte unchanged from today — so the existing developer-controllable ArtifactStub+Fetch.Tool path stays first-class for Playground, Protocol, and third-party clients, and 84c's provider-native upload becomes opt-in rather than forced. Ships no provider mechanism and no embeddings. §13 consumer: the materializer + the same-wave 84c. RFC §6.4, §6.5, §6.10. Deps: F11/D-166, 107c. Same wave as 84c. See docs/plans/phase-84b-multimodal-disposition-policy.md.
  • 84c — Provider-native multimodal mechanism (D-190). Implements the provider_native disposition: the bifrost driver hands an over-threshold attachment to the provider's own understanding via FileUploadRequest → file_id (already on core@v1.5.15), performed inside Complete so LLMClient stays one method. Priority order is deliberate — image/audio/video first (regain vision/audio/video capability the stub path loses), PDF/documents last (the ref/tool+84d route is preferred for docs). Adds the part-level ProviderNative flag (settable by any CompleteRequest builder — the driver is the ONLY seam; the run loop never pre-uploads), optional ProviderFileID/DocumentType content fields, a driver-owned identity-scoped file_id cache + lifecycle (TTL/evict + Close-time cleanup; observability via the llm.provider_file.uploaded event), and the streaming-with-multimodal residual that Phase 107's row forward-referenced (in 107's req.Stream+llm.completion.chunk vocabulary). ArtifactStub stays the universal degradation. Opt-in via 84b; never the default. Shipped deviation (§4.3, D-190): the optional run-loop cancel hook is not wired — the wrapped LLMClient chain would need a forwarding method through five wrapper layers; the driver-owned lifecycle (TTL/LRU evict + Close sweep, with per-key fill coalescing) is the authority, per the SDK-lens C3 guidance. RFC §6.5, §6.10, §11Q3. Deps: 84b, 107, 32. Same wave as 84b. See docs/plans/phase-84c-provider-native-multimodal.md.
  • 84d — Embedding client + semantic retrieval (D-191). Adds Harbor's first embeddings capability — an Embedder §4.4 seam wired to bifrost's EmbeddingRequest — with its §13 in-wave consumers being semantic memory retrieval and semantic skill retrieval (the direction set by the owner; not a standalone RAG tool). Both are opt-in modes composing with (not replacing) rolling_summary memory + token-savvy skill retrieval; vectors persist in the existing stores (brute-force similarity at V1 scale; ANN deferred). This is the primitive that makes 84b's ref/tool document path powerful. Requires a §6.5 RFC addendum (the Embedder seam) landed in the same PR. RFC §6.5, §6.6, §6.7. Deps: 32, 23, F11/84b. Follows the 84b+84c wave. See docs/plans/phase-84d-embedder-semantic-retrieval.md. Shipped as planned with four recorded §4.3 deviations (see D-191): the seam lives in internal/embeddings with the driver blank-imported via the D-196 internal/drivers/prod aggregator (the plan's pre-110c "blank-import at cmd/harbor" wording); the interface carries a lifecycle Close alongside Embed; the skills-side injection resolved to the store seam (skills.Deps.Embedder + localdb's semantic Search path) rather than a tool-constructor seam; the §6.5 addendum pre-landed with the D-189 plans PR, so this PR's RFC delta is the D-191 contract sentence + the §6.6/§6.7 consumer text.
  • 84e — Semantic memory consumption in the run loop (D-211). Closes the gap 84d (D-191) left by design: MemoryStore.SearchTurns shipped with store/SDK consumers only — nothing in the run loop called it, so the agent never semantically recalled earlier conversation turns and the planner prompt's memory injection (the 83d path) stayed rolling-summary-only. 84e makes the run loop, when memory.retrieval: semantic is on (the ONLY switch — no second knob), search the session's embedded turns with the task query and inject the top-k recalled turns into the prompt's <read_only_external_memory> tier — the planner.MemoryBlocks.External slot that was nil on every production path since 83d, so the planner gains zero new surface and recall inherits the UNTRUSTED framing for free. Composes with rolling_summary (the Conversation tier is byte-untouched); mode off → byte-for-byte prompt parity, zero embedder traffic. The fetch+recall step is promoted to ONE home, runctx.FetchMemoryBlocks, consumed by both cmd/harbor's runOne and the devstack mirror — the per-step GetLLMContext mirror copies collapsed. Knobs ride the existing memory block (retrieval_top_k reused; new retrieval_min_score similarity floor, default 0.0, range [-1,1], validated) via a 110c-shaped memory.RecallFromConfig exporter with field-parity test. Degradation posture is fail-loud: a recall error fails the run (runtime_fetch_error, LLM never called) — never a silent fall-back to rolling-summary-only. Recalled turns are text-only and per-turn capped (2 KiB per side; D-026 guard stays the backstop). Records the deferred sibling: a memory.search Protocol method must precede any Console memory-search page (D-062 ordering) — parked for post-109 planning. RFC §6.2, §6.5, §6.6. Deps: 84d, 83d, 83f, 110b, 110c, 107c. Shipped as planned. See docs/plans/phase-84e-semantic-memory-runloop.md.

110-band — Wave B SDK re-homing (production semantics out of cmd/harbor)

The 2026-06-09 SDK friction audit (docs/notes/sdk-friction-audit.md; program entry D-193) found a package-main stratum of production semantics that lives only in cmd/harbor with an already-diverged D-094 devstack mirror (two shipped/live silent-field-drop bugs — D-155, B3 — plus a degraded executor, a silently-dropped MCP ToolPolicy projection, and missing Emit/OnChunk/envelope wiring on the official test surface). The 110-band promotes that stratum into reusable internal/ packages and collapses the mirror to thin callers — every promotion deletes a devstack copy in the same phase (§13 primitive-with-consumer + §17.6 fix-both-sides). Staging: Stage 1 = 110a ∥ 110c (independent), Stage 2 = 110b ∥ 110d (after Stage 1 merges). The band is module-internal; the external-module facade (the audit's Wave D) is a future RFC-level program for which 110d is the named prerequisite.

  • 110a — Tool-executor promotion (SHIPPED — D-194). Promotes the only production steering.ToolExecutor (cmd/harbor/cmd_dev_executor.go, ~660 lines: catalog dispatch, D-026 heavy-result artifact promotion, CallParallel via internal/runtime/parallel, SpawnTask/AwaitTask with depth caps) to internal/runtime/dispatch.NewToolExecutor(catalog, artifacts, tasks, opts...). Also exports the Phase-106 answer envelope {answer, finish_reason, tool_calls_seen} + terminal task error-code constants as planner.AnswerEnvelope et al. (home picked by import direction — tasks stays planner-free), re-homes the catalog→planner view as tools.NewPlannerView (structural satisfaction of planner.ToolCatalogView; tools cannot import planner), re-points internal/planner/react/prompt.go's shape-contract citation away from cmd/harbor, switches the D-192 HITL E2E off its test-local executor shim onto the real promoted executor, and DELETES the devstack degraded executor (capability drift closed). D-025 concurrent-reuse test (N≥100, -race) mandatory. RFC §6.4, §6.5, §6.2. Deps: D-192 fix (merged), 107d, 107e, 83i. Stage 1, parallel with 110c. See docs/plans/phase-110a-tool-executor-promotion.md.
  • 110b — RunContext population + event-closure promotion (SHIPPED — D-195). Promotes the five RunContext-population helpers duplicated cmd↔devstack into internal/runtime/runctx (direction-safe: runtime imports planner/memory/skills; planner gains NO memory import): ProjectMemoryBlocks, ProjectSkillsContext, ExtractSkillKeywords + stopwords (the D-156 FTS5 query shaping — a third copy existed; godoc carries the "scheduled for deletion by Phase 111d (D-201); add no new consumers" notice per the owner's 2026-06-09 scope amendment), ExtractAssistantAnswer, and the D-166 ResolveInputArtifacts policy — following the planner.BuildArtifactManifest precedent. Adds events.IdentityStampingEmitter(bus, q, logger) for RunContext.Emit and llm.NewChunkPublisher(bus, q, taskID, logger) for OnChunk (the closures whose identity-envelope trap once produced 280+ bus-rejected chunks per task). cmd + devstack become callers; devstack ADDITIONALLY gains missing parity: Emit/OnChunk wired in its RunSpec and MarkComplete carrying the 110a answer envelope instead of an empty TaskResult{} (pinned by test/integration/phase110b_runctx_parity_test.go). Also wires the D-196 call-4 handoff one-liner: dispatch's spawn-depth default now references config.DefaultSpawnDepthCap. RFC §6.2, §6.5, §6.13. Deps: 110a, 83f, 83i, 83m, 107. Stage 2, parallel with 110d. See docs/plans/phase-110b-runcontext-population-promotion.md.
  • 110c — Config-projection exporters (SHIPPED — D-196). The five config→snapshot projections become exported helpers on the OWNING packages (settled direction: subsystem imports internal/config additively; config stays a leaf; the snapshot decoupling is preserved because FromConfig is optional sugar, never a required path): llm.SnapshotFromConfig (absorbing the four private copy* helpers — closing the D-155 recurrence class), memory.SnapshotFromConfig, skills.SnapshotFromConfig, planner.ConfigFromOperator (fixing the LIVE devstack drift B3 — four planner knobs silently dropped today, pinned by a reflection field-parity test), governance.ConfigFromOperator. Plus: config.Defaults() exported (hand-built configs start defaulted), planner-adjacent knob projections re-homed (skills_context_max default, planner.HintsFromConfig, spawn-depth default deduped), a headless validation profile (ValidateCore — config-without-binary stops demanding JWT identity fields), and ONE blank-import aggregator (internal/drivers/prod) imported by main.go and devstack — also closing devstack's missing-LLM-wrapper trap (no corrections/downgrade/retry on its chain today). cmd + devstack consume every projection; all duplicates deleted. Shipped as planned, plus one §17.6 cross-fix the parity gate surfaced: both copyModelProfiles copies silently dropped the per-model cost_overrides: / corrections: yaml (a third D-155-class drop); llm.SnapshotFromConfig maps both, pinned by the sub-struct parity tests. The spawn-depth constant is exported as config.DefaultSpawnDepthCap; the executor-side clamp (110a's internal/runtime/dispatch) references it at Stage 1 merge (parallel worktrees). RFC §6.5, §6.6, §6.7, §9, §10. Deps: 83l, 83f, 107d, 107e. Stage 1, parallel with 110a. See docs/plans/phase-110c-config-projection-exporters.md.
  • 110d — Assembly promotion (D-197). SHIPPED. Promoted devstack's tryAssemble shape into the exported, error-returning assemble.Assemble(ctx, *config.Config, Options) (*Stack, error) in internal/runtime/assemble; bootDevStack and devstack.Assemble(t, ...) are thin wrappers — the last of the D-094 subsystem-wiring mirror is collapsed. Promoted the remaining cmd-local assembly legs: mcpdrv.Attach next to the driver INCLUDING the config→ToolPolicy projection (the devstack silent drop is closed — regression pinned in phase83g's real-stdio-fixture E2E + the attach unit E2E), auth.BuildProviders (OAuth KEK→sealer→tokenstore→provider chain; per §4.3 it returns the provider map only — approval gates remain the catalog Builder's output, which the assembly invokes), and events.OpenWith(ctx, cfg, redactor, Deps{State}) + events.RegisterWithDeps so the durable event driver shares the runtime's StateStore through the factory path (recorded reconciliation: the assembly opens State BEFORE the bus so the shared store outlives it — production's pre-110d bus-first order swapped, behaviour preserved). The per-task run-loop driver (the task.spawned subscriber) stays per-caller (110b's seam); headless embedders drive Stack.RunLoop.Run directly. Ships docs/recipes/embed-harbor-headless.md, acceptance-gated by test/integration/phase110d_assemble_test.go (recipe path on real drivers + durable-store sharing + identity propagation + 2 failure modes + N=10 Assemble/Close cycles + N=100 concurrent runs on one stack). RFC §6.4, §6.13, §9, §10. Deps: 110a, 110b, 110c, 64, 83g, 30, 57. Stage 2, parallel with 110b. See docs/plans/phase-110d-assembly-promotion.md.
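The 110c entry mentions a reflection field-parity test as the guard against silently dropped config knobs. A minimal sketch of the idea follows, under the assumption that parity is checked by exported field name. `OperatorPlanner`, `PlannerConfig`, and `MissingFields` are hypothetical and do not reflect Harbor's actual structs or test helpers.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// OperatorPlanner is a hypothetical operator-facing (yaml-side) struct.
type OperatorPlanner struct {
	MaxSteps    int
	TokenBudget int
	Temperature float64
}

// PlannerConfig is a hypothetical runtime-side struct the projection fills.
type PlannerConfig struct {
	MaxSteps    int
	TokenBudget int
}

// MissingFields returns the exported field names present on src but absent
// on dst, sorted. Both arguments must be struct values (not pointers);
// reflect panics otherwise.
func MissingFields(src, dst any) []string {
	st, dt := reflect.TypeOf(src), reflect.TypeOf(dst)
	var missing []string
	for i := 0; i < st.NumField(); i++ {
		f := st.Field(i)
		if !f.IsExported() {
			continue
		}
		if _, ok := dt.FieldByName(f.Name); !ok {
			missing = append(missing, f.Name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// A parity test would fail here: Temperature has no runtime-side home,
	// so a projection between these two structs would drop it silently.
	fmt.Println(MissingFields(OperatorPlanner{}, PlannerConfig{}))
}
```

This only checks that every field has a destination. A fuller gate would also populate each source field with a non-zero value and assert the projected value, which is what catches a field that exists on both sides but is never copied.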

111-band — Wave C: finish (or formally defer) the half-shipped primitives (SDK friction audit §3)

The 2026-06-09 SDK friction audit (docs/notes/sdk-friction-audit.md) found a band of shipped primitives with zero production consumers anywhere — not even the dev binary — several behind config knobs that validate cleanly and then silently do nothing. These are standing §13 violations (primitive-without-consumer; test-stubs-as-defaults one layer up: seams never exercised under real call sites). The 111 band gives each its first production consumer or a recorded disposition. Staging: the band parallelizes freely after Wave B Stage 1 (110a + 110c) merges; 111a soft-depends on 110c; all six phases are mutually independent. D-numbers D-198–D-203 are reserved per phase (logged when each ships).

  • 111a — Governance enforcement assembly (D-198). SHIPPED. Closed the audit's headline gap: governance.SetFactory's only caller was a test; populated governance.identity_tiers drove the posture provider only — clean validation, zero enforcement. Shipped exported governance.NewSubsystemFromConfig(cfg, store, bus) (Compound in the documented MaxTokens→RateLimiter→CostAccumulator order; nil Subsystem on empty tiers preserving the D-044 latent default; ErrInvalidConfig on nil store/bus with tiers), called via SetFactory from the production assembly (assemble.Assemble, eager build → fail-loud boot → factory installed BEFORE llm.Open; empty tiers clear the factory and Stack.Close clears it again); consumes 110c's governance.ConfigFromOperator; documents governance.Wrap as the multi-runtime headless escape (SetFactory-vs-per-Open evaluated + decided: keep the global seam for the binary); removed the Wave A posture-only boot warning (§4.3 correction: the warning lived in assemble.go post-110d, not internal/config/validate.go; validateGovernance never carried one); E2E proves a configured tier actually rejects/limits (cost + rate + MaxTokens each exercised against the real assembled stack, with identity-propagated governance.* events, cross-session isolation, and D-025 concurrency). RFC §6.15, §6.5, §6.11. Deps: 32, 36a, 36b, 110c (soft). See docs/plans/phase-111a-governance-enforcement-assembly.md.
  • 111b — Tool-OAuth completion leg (D-199). SHIPPED. auth.CallbackHandler (state→PendingFlow→CompleteFlow→typed 404/410/400/502 error mapping; no secret material in responses or logs) is CompleteFlow's first production caller, mounted by harbor dev at GET /v1/tools/oauth/callback (devstack mirrors; both over assemble.Stack.OAuthProviders — D-197) and mountable headless on any mux. The full choreography E2E (test/integration/phase111b_oauth_completion_test.go): gated tool → tool.auth_required + pause.requested → authorize → 302 redirect onto the handler → CompleteFlow → pause.resumed{Decision: resume} → run re-enters and the tool succeeds USING the minted token; failure modes: expired flow (410 + pause parked), replayed callback (404 by consumption). §4.3 refinements recorded in D-199: PendingFlow returns (PendingFlowInfo, bool); DenyFlow added (upstream denial → DecisionReject resume); both on the OAuthProvider interface; the run-level re-entry rides the existing steering RESUME surface (the recipe documents the honesty note). Recipe: docs/recipes/steer-and-resume-a-run.md. Closed the InitiateFlow/CompleteFlow primitive-without-consumer pair (§13). RFC §6.4, §3.3, §6.3. Deps: 30, 50, 31, the D-192 steering fix. See docs/plans/phase-111b-tool-oauth-completion.md.
  • 111c — Durable pauses + pause lifecycle (D-200). WithCheckpointStore has zero production consumers (both assemblies construct the Coordinator storeless with the StateStore in scope); requestPause persists Trajectory: nil; no pause GC exists — DecisionTimeout has no producer and cancel-while-paused orphans records forever. Threads the run's Trajectory into requestPause; wires WithCheckpointStore(stateStore) in both assemblies; pause→new-Coordinator-over-same-store→Resume→trajectory-restored E2E (+ the §11 ErrUnserializable fail-loud test); ships WithMaxParkDuration + the exported pauseresume.RunSweeper emitting pause.resumed with DecisionTimeout — its first producer — started config-gated by the one merged assemble.Assemble site (cmd + devstack inherit as thin callers). Shipped deviation (§4.3, recorded in the plan + D-200 §5): the sweeper's SCAN is package-internal over the Coordinator's registry rather than Coordinator.List — List is §6-identity-scoped with no all-tenants wildcard (the plan's Risks anticipated exactly this); every MUTATION still goes through the public Resume under the pause's own scope. Timeout is terminal: the parked run finishes Finish{ConstraintsConflict} via the bus-event wake + Status.Decision fallback. RFC §3.3, §6.3, §6.11. Deps: 50, 51, the D-192 steering fix. See docs/plans/phase-111c-durable-pause-lifecycle.md.
  • 111d — Skills canonical surface + ingestion (SHIPPED — D-201). Three intertwined gaps closed: the rich Phase-38 planner tools (capability filter, redaction, budgeter) + Phase-41 generator were registered nowhere while production registered a thinner parallel builtin implementation (the §13 two-implementations smell); Skills.md ingestion had no shipped path; the Phase-39 Directory had only test consumers. Shipped: builtin skill_search/skill_get delegate to the exported Phase-38 handlers (duplicate bodies deleted; filter/redaction/budgeting on production; the capability envelope is server-computed from the run's visible-tool set via tools.VisibleNames, never LLM-supplied); skill_list + skill_propose register through the same carrier (skill_propose opt-in rides the existing tools.built_in names list — the plan's sketched …skill_propose.enabled key was dropped as a second parallel enablement mechanism, §4.3 deviation in the plan); harbor skill import/rm ship over exported importer.ImportAndStore (+ §18 SKILL.md restoration in the same PR); the Directory was WIRED per the recorded owner decision (2026-06-09) as the <skills_context> producer — runctx.ExtractSkillKeywords deleted per its D-195 deprecation notice; skills.directory.{pinned,max_entries,selection} config block added. Note: the plan's "stale cmd/harbor/main.go:76-90 promises" had already been rewritten by 110c — the surviving stale text lived in internal/skills/tools' package doc + internal/drivers/prod's honesty notes and was replaced there. RFC §6.7, §8. Deps: 37–41, 107c. See docs/plans/phase-111d-skills-canonical-surface.md.
  • 111e — Trajectory compression consumer (SHIPPED — D-202). Pre-111e, planner.Summariser had only test implementations; MaybeCompress had zero call sites; Budget.TokenBudget was dead on every path — while the consumer half (the React prompt's Summary != nil branch) was already wired. Ships a real LLM-backed TrajectorySummariser (in internal/llm/summarizer, distinct from the unrelated memory.Summarizer — do not conflate), the MaybeCompress integration in steering.RunLoop's step loop gated on TokenBudget > 0, the planner.token_budget config → RunSpec.Budget production wiring, and the godoc un-dormanting. Long-trajectory E2E: compression fires, the prompt shrinks, the run completes correctly on summary-carried context. Scoped tight: one compression per run, no auto-cascade. RFC §6.2, §6.5. Deps: 46, 35, 107, the D-192 steering fix. See docs/plans/phase-111e-trajectory-compression-consumer.md. One recorded §4.3 deviation (D-202): the compaction payload renders the trajectory's planner-facing projection (per-step action + LLMObservation, per-fragment capped) rather than the raw Serialize bytes — raw observations may carry heavy content that must never reach the LLM edge (D-026 / ErrContextLeak); the budget estimator still measures the full serialized trajectory.
  • 111f — Telemetry assembly + approval-gate authorizer seam (SHIPPED — D-203). Two halves, both closed. Telemetry: pre-111f, telemetry.New (redactor-mandatory, identity-attribute, bus-paired Logger) had zero production callers — cmd booted bare slog; engine.WithRunErrorHandler had no production caller; metrics got BridgeBusToMetrics but traces got no bridge and NewTracer was never constructed despite the blank-imported exporters. Shipped: telemetry.New + the Stack.RunErrorHandler wired into the production assembly (assemble.Assemble; cmd + devstack inherit as thin callers), telemetry.BridgeBusToTracer started symmetric with the metrics bridge (lifecycle-pair span model, DefaultTraceBridgeFilter() volume guard, both on the closer chain), and docs/recipes/observe-an-embedded-runtime.md. Approval: pre-111f, ResolveApproval hard-required internal/protocol/auth scopes and the runtime's own steering bridge self-elevated to pass its own gate — wire-layer auth vocabulary in an in-process control path. Shipped: the injected GateDeps.Authorizer seam (runtime identity/control-scope default approval.NewIdentityAuthorizer(); protocolauth via the server-side ProtocolScopeAuthorizer adapter), the protocol/auth import removed from internal/tools/approval (the steering bridge's self-elevation deleted outright), and the direction rule recorded: runtime may import protocol TYPES, never protocol auth/methods/transports (see D-203's 2026-06-10 addendum for the <area>/protocol adapter carve-out + the named internal/search standing violation). 
Three recorded §4.3 deviations (D-203): (a) assemble.Options gains TelemetryOptions / TracerOptions / ApprovalAuthorizer (the MetricsOptions precedent — the union of real-caller needs); (b) the Protocol-side adapter injects via assemble.Options rather than the plan's "server-side gate assembly" because gate assembly lives in the ONE D-197 fan-out (the adapter stays owned by internal/server); (c) engine/options.go's godoc rewritten, not merely "now true" (the Wave A honesty text had said "no production assembly installs one today"). RFC §6.14, §6.4, §5.1. Deps: 03, 04, 05, 31, 55, 56, the D-192 steering fix. See docs/plans/phase-111f-telemetry-assembly-approval-seam.md.
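The 111b entry describes a typed error mapping onto 404/410/400/502 with no secret material in responses. A minimal sketch of that pattern follows. The plan only pins 410 to an expired flow and 404 to a replayed (already consumed) callback; the 400 and 502 pairings, the sentinel error names, and the 200/500 fallbacks below are assumptions for illustration, not Harbor's `auth` package.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Hypothetical sentinel errors; the real typed errors are not shown in the plan.
var (
	ErrFlowNotFound = errors.New("oauth: no pending flow for state")     // e.g. replayed callback
	ErrFlowExpired  = errors.New("oauth: pending flow expired")          // flow outlived its TTL
	ErrBadCallback  = errors.New("oauth: malformed callback request")    // assumed 400 case
	ErrUpstream     = errors.New("oauth: token exchange failed upstream") // assumed 502 case
)

// StatusFor maps a completion error to an HTTP status. errors.Is keeps the
// mapping stable when callers wrap the sentinel with extra context.
func StatusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK // assumption: the success status is not stated in the plan
	case errors.Is(err, ErrFlowNotFound):
		return http.StatusNotFound
	case errors.Is(err, ErrFlowExpired):
		return http.StatusGone
	case errors.Is(err, ErrBadCallback):
		return http.StatusBadRequest
	case errors.Is(err, ErrUpstream):
		return http.StatusBadGateway
	default:
		return http.StatusInternalServerError
	}
}

// PublicMessage returns only the generic status text, so wrapped error
// detail (which may mention tokens or state values) never reaches the client.
func PublicMessage(err error) string {
	return http.StatusText(StatusFor(err))
}

func main() {
	wrapped := fmt.Errorf("complete flow for state %q: %w", "abc123", ErrFlowExpired)
	fmt.Println(StatusFor(wrapped), PublicMessage(wrapped))
}
```

Deriving the response body from the status code alone is one simple way to keep error detail out of responses; detailed context can still go to a redacting logger.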

112-band — Wave D: the public SDK facade (D-204)

RFC §3.6 settles the design (alias-based sdk/ tree; curation over moves; D-204 records the rationale). The wave's §13 pairing: 112a ships the facade, 112b is its consumer in the same wave.

  • 112a — The public SDK facade (SHIPPED — D-205). The sdk/ tree of alias-based re-exports per RFC §3.6's inventory; the facade-integrity test runs the headless recipe via sdk/ imports only (grep-gated zero internal/ imports; deterministic-planner override over an offline custom-provider bifrost client so the path runs in CI without network or the mock driver); sdk/drivers/prod parity with the internal aggregator by construction (its only content is the internal aggregator's blank import); AGENTS/CLAUDE §3 amendment. No moves, no mechanism — forwards only, with ONE documented carve-out (sdk/tools/inproc.RegisterFunc, a generic wrapper Go cannot express as a var forward; smoke-gated as the sole func in the tree). See docs/plans/phase-112a-sdk-facade.md.
  • 112b — External consumers + the compile gate (SHIPPED — D-206). Scaffold templates emit sdk/ imports (the tool-declaring output compiles AND tests green as an external module — the audit's headline external break, now gated by scripts/smoke/phase-112b.sh: scaffold → replace directive → go build, bounded + self-tested, plus an external harbortest go test probe); harbortest vocabulary externally satisfiable via the aliases with signatures unchanged and zero kit constructors; the five consumer-facing recipes + README flipped to sdk/ paths (grep-gated). Two recorded §4.3 calls (D-206): (a) phase-67's smoke keeps the toolless build-check and this gate owns the tool-declaring shape (no duplication); (b) the conversions flushed out additive facade extensions — sdk/{audit,telemetry,telemetry/eventbus,governance,tools/auth,skills/{importer,tools,generator}} + sdk/tools.ErrorClass (RFC §3.6 item 3 amended) — while sdk/pauseresume was deliberately NOT added (D-205's curation; the steer recipe reworked to the config-driven assemble shape). Wave D and the SDK re-homing program close here. See docs/plans/phase-112b-external-consumers.md.
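The 112a entry distinguishes alias-based forwards from the one carve-out, a generic wrapper that "Go cannot express as a var forward". A single-file sketch of why follows, with the internal package and the facade simulated by name prefixes. All identifiers here are hypothetical stand-ins, not the real `sdk/` tree.

```go
package main

import "fmt"

// --- stand-in for an internal package (hypothetical) ---

type internalTool struct{ Name string }

func internalNewTool(name string) internalTool { return internalTool{Name: name} }

var registered []string

// internalRegisterFunc is generic over the handler's input and output types.
func internalRegisterFunc[I, O any](name string, fn func(I) O) func(I) O {
	registered = append(registered, name)
	return fn
}

// --- stand-in for the facade (hypothetical) ---

// Tool is a type alias: it IS internalTool, not a new named type, so values
// flow across the boundary without conversion.
type Tool = internalTool

// NewTool is a var forward: the facade adds no mechanism of its own.
var NewTool = internalNewTool

// A var forward of the generic function does not compile, because a generic
// function cannot be used as a value without instantiating it:
//
//	var RegisterFunc = internalRegisterFunc
//
// so the facade needs a real generic wrapper func instead.
func RegisterFunc[I, O any](name string, fn func(I) O) func(I) O {
	return internalRegisterFunc(name, fn)
}

func main() {
	var t Tool = internalNewTool("echo") // no conversion needed: same type
	double := RegisterFunc("double", func(n int) int { return n * 2 })
	fmt.Println(t.Name, NewTool("search").Name, double(21), registered)
}
```

The wrapper is the only place where the facade contains a function body, which is consistent with the entry's note that it is smoke-gated as the sole func in the tree.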

113-band — the Protocol adoption track on the docs site (D-209 / D-210 reserved)

docs/notes/protocol-docs-proposal.md (owner-approved; merged as PR #305) is the binding design: the Protocol is Harbor's ecosystem surface (RFC §5.1 — "the same surface powers a remote attach, a third-party dashboard, or an IDE/TUI client"), but a client author today must read Go source to answer what methods exist, what events arrive with what payloads, what an error looks like, how auth works, what a version bump means. The band serves the proposal's four audiences (evaluator / client builder / event integrator / control integrator) with a docs-site track whose center of gravity is a generated, gen-check-gated contract reference — the house single-source discipline applied to adopter docs. The owner resolved the proposal's open questions per its recommendations: Q1 event catalog is registry-read at gen time (the generator imports internal/drivers/prod and reads the populated events.EventTypes() registry, payload shapes via the CanonicalWireTypes-style reflection + lockstep-test treatment); Q2 OpenAPI emission deferred (recorded as a stretch in 113a's non-goals); Q3 the conformance suite is documented as the certification path in 113b but its sdk-export waits for a real third-party ask; Q4 versioned docs deferred to the first breaking Protocol change (recorded in both plans' risks). §13 pairing: 113a ships the generator + gate, and its own choreography guides + executed quickstart are the consumers in the same phase; 113b consumes 113a's reference pages (lockstep greps) and closes the track.

  • 113a — the floor (D-209, logged at ship; Shipped). cmd/harbor-gen-protocol-docs emits methods.md / events.md / errors.md / types.md into docs/site/protocol/ from the canonical sources (methods.go + the transports' *RoutePattern tables + IsControlMethod/cluster predicates + auth scopes; the Q1 registry-read event catalog; errors.go; CanonicalWireTypes) under generated-file headers; make protocol-docs-gen-check (git diff --exit-code) wired into the docs workflow — the gate shape D-093 specified, built here for a generator that actually exists (the TS generator stays deferred per D-132 / issue #179; no dependency); the "Speak Protocol in 15 minutes" quickstart whose curl steps the smoke EXECUTES against the preflight dev server (the recipe-cannot-lie pattern); choreography guides 1–3 (auth & identity incl. the D-171 session-blank model; streaming semantics; task control); the Protocol nav section + README Docs-table row; the §18 amendment putting the generated reference under the same-PR regeneration rule (AGENTS+CLAUDE, mirror-gated). See docs/plans/phase-113a-protocol-reference-and-quickstart.md. Shipped 2026-06-10 with two recorded §4.3 deviations (D-209 calls 3–4): control.HTTPStatus exported so the generated error page reads the wire transport's own code→status binding, and the executed quickstart's steering step accepts both documented wire outcomes (200 accepted / 404 not_found on a terminal run — the deterministic mock-path result, doubling as the §17.3 failure-mode leg).
  • 113b — the closer (D-210; Shipped). Choreographies 4–5: the pause model (pause.requested → approve / reject / OAuth-callback / plain resume; durable pauses across restarts; DecisionTimeout reaps — the wire view of RFC §3.3's unified primitive) and versioning & compatibility (RFC §5.3 made adopter-facing, incl. unknown-field tolerance and unknown-method 404/405 handling — the smoke SKIP convention promoted to adopter contract); the build-a-client guide around a ~150-line worked event-viewer at examples/protocol-clients/ (compile-gated in the smoke) with the hand-maintained TS wire-type module + the Console as reference implementations; the conformance-certification page (how to run internal/protocol/conformance, what passing claims — NO sdk-export per Q3). Deps: 113a. See docs/plans/phase-113b-protocol-choreographies-and-certification.md. Shipped 2026-06-11 with one recorded §4.3 deviation (D-210 call 2): the OAuth callback route lockstep-greps against the exported auth.CallbackPath source constant instead of the generated reference — the callback is a provider-redirect mount, deliberately not a canonical Protocol method, so it has no methods.md row. The pause guide's approve/reject/timeout wire examples are captured from a production-driver devstack assembly; the OAuth leg is transcribed from the handler + its tests and says so (D-210 call 1). A §17.6-posture docs fix rode along: task-control.md no longer claims task.paused/task.resumed fire on the live pause path (nothing calls MarkPaused/MarkResumed in production — a parked run's task stays running; pause.list is the authoritative park read).
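A minimal client-side sketch of the versioning posture the 113b choreography describes: tolerate unknown fields in payloads, and treat a 404/405 on an unknown method as "not supported by this server" rather than a hard failure. The `RunEvent` struct, its field names, and `classifyMethodStatus` are hypothetical illustrations, not Harbor's wire types; the generated reference pages remain the authority.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// RunEvent is a hypothetical event shape used only for illustration.
type RunEvent struct {
	Type  string `json:"type"`
	RunID string `json:"run_id"`
}

// decodeEvent relies on encoding/json's default behaviour: fields the struct
// does not declare are ignored, so a newer server can add fields safely.
func decodeEvent(raw []byte) (RunEvent, error) {
	var ev RunEvent
	err := json.Unmarshal(raw, &ev)
	return ev, err
}

// MethodSupport classifies the HTTP status returned for a method call.
type MethodSupport int

const (
	Supported   MethodSupport = iota
	Unsupported               // server does not know the method: skip the feature
	Failed                    // any other non-2xx status: surface as an error
)

// classifyMethodStatus maps a status code to a support verdict (assumed policy).
func classifyMethodStatus(status int) MethodSupport {
	switch {
	case status >= 200 && status < 300:
		return Supported
	case status == http.StatusNotFound || status == http.StatusMethodNotAllowed:
		return Unsupported
	default:
		return Failed
	}
}

func main() {
	raw := []byte(`{"type":"pause.requested","run_id":"r1","added_later":true}`)
	ev, err := decodeEvent(raw)
	fmt.Println(ev.Type, ev.RunID, err)
	fmt.Println(classifyMethodStatus(http.StatusMethodNotAllowed) == Unsupported)
}
```

A 404 can also be a legitimate not_found for a known method (the 113a quickstart's steering step accepts one on a terminal run), so a real client would inspect the error body before concluding a method is unsupported.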

85-band — MCP client/host compliance (prioritised first post-V1 work)

The integer Phase 85 (Skills Portico provider driver) is removed: Portico is an MCP gateway and speaks MCP like any server, so the generic MCP client driver consumes it — a Portico-specific driver would duplicate the driver and couple Harbor to one ecosystem tool. The 85-band closes Harbor's MCP-client-compliance gap (audit + decomposition in brief 14). This band is the first post-V1 work — ahead of 83/84 in execution priority.

MCP 2026-07-28 RC re-plan (effective 2026-05-28). The MCP Foundation published a release candidate locked 2026-05-21; the final spec is due 2026-07-28; Tier-1 SDKs ship RC support within a 10-week window (≈ late July–early August 2026). The RC reshapes the 85-band:

  • Roots, Sampling, and Logging are deprecated in the RC (annotation-only: they stay functional for 12+ months but are slated for removal). Phases that build operator-facing surface against them are cut.
  • Tasks moves from experimental core to an extension and is redesigned: tasks/list is removed, and in the new flow tools/call returns a task handle, followed by tasks/get / tasks/update / tasks/cancel. 85h's hand-transcription against 2025-11-25 would lock in the wrong shape.
  • Session handshake (initialize/initialized + Mcp-Session-Id) is removed; Streamable HTTP requires new Mcp-Method / Mcp-Name headers; error code -32002 flips to -32602; and server-to-client requests restructure into InputRequiredResult. These cross-cutting changes land as a new sub-phase 85m.
  • Authorization hardens with six new SEPs (iss validation, DCR application_type, issuer-bound credentials, refresh-token docs, scope accumulation, .well-known clarification). 85b absorbs them; scope grows.
  • Cut phases (85c, 85e, 85h, 85i) keep their plan files as historical context — do not delete — but their Status reads Cut. The docs/decisions.md entry recording this re-plan is the canonical reference.
  • Lettering note: 85k (skills) already exists. The new RC-adoption sub-phase is 85m (skipping l to avoid l/I/1 ambiguity next to existing 85i).

Per-phase RC verdict + readiness:

  • 85a — MCP client core-compliance fixes. Pagination-truncation fix, *ListChanged handlers, resource Unsubscribe-on-close. The honest-empty roots capability advertisement is now permanent (not a stopgap; 85e is cut). RFC §6.4. Deps: 28. Ready now — uses go-sdk v1.6.0 surface that exists today; nothing the RC removes is in scope. See docs/plans/phase-85a-mcp-client-core-compliance.md.
  • 85b — MCP HTTP OAuth (scope ↑). Wire auth.Provider into the MCP driver; RFC 9728 protected-resource-metadata discovery; WWW-Authenticate 401 step-up; RFC 8707 resource indicators. Adds the six RC auth SEPs: SEP-2468 (iss validation per RFC 9207), SEP-837 (DCR application_type), SEP-2352 (credential binding to issuer + re-register on migration), SEP-2207 (OIDC refresh-token docs), SEP-2350 (scope accumulation during step-up), SEP-2351 (.well-known suffix). Also: token-store keying moves from session-scoped to per-request _meta since the RC removes Mcp-Session-Id. RFC §6.4, §3.3. Deps: 28, 30, 50, 85m (for the per-request keying). Ready now — OAuth flow is Harbor-side; SDK exposes the wire transport and WWW-Authenticate already; the per-request keying mechanic ships with 85m but the plan can be authored against the new shape now. See docs/plans/phase-85b-mcp-http-oauth.md.
  • 85c — MCP sampling provider (CUT). RC deprecates sampling/createMessage; replacement is "direct LLM provider API integration" — which is what llm.LLMClient already is. Building a CreateMessageHandler, pause-gated review surface, modelPreferences resolution and tool-enabled sampling would ship operator-facing surface for a 12-month-EOL feature. Servers needing an LLM bring their own provider per the RC's guidance. Plan file kept as historical context. No revisit.
  • 85d — MCP elicitation provider. Form vs URL mode and the secret-rejection rule survive; the wire mechanic does not. RC replaces the SSE-based wait with InputRequiredResult (inputRequests, requestState) + client retries the original call with inputResponses. The plan as written targets SSE — must be rewritten before implementation. The pause/resume primitive integration is still conceptually right. RFC §6.4, §3.3. Deps: 28, 50, 85m. Revisit after SDK-RC (≈ late Jul–Aug 2026). See docs/plans/phase-85d-mcp-elicitation-provider.md.
  • 85e — MCP roots provider (CUT). RC deprecates roots; replacement is "tool parameters, resource URIs, or server configuration." 85a's honest-empty advertisement is now the permanent posture. Plan file kept as historical context. No revisit.
  • 85f — MCP remaining server features (slim). Ship completions (completion/complete), resource templates (resources/templates/list), and progress (_meta.progressToken + notifications/progress). Drop the logging slice — RC deprecates logging/setLevel + notifications/message; replacement is stderr / OpenTelemetry, both of which Harbor already has. RFC §6.4. Deps: 28, 85a. Ready now — all three retained features are in go-sdk v1.6.0. See docs/plans/phase-85f-mcp-remaining-server-features.md.
  • 85g — MCP Apps host (DEPRECATED → superseded by 109a–c, D-172). The original premise — "revisit after RC-final because Apps is experimental and the RC may reshape _meta.ui.resourceUri" — was overturned: MCP Apps is a stable, independently-versioned extension (io.modelcontextprotocol/ui, the ext-apps repo), NOT gated on the July RC, and it ships an official framework-agnostic host bridge (@modelcontextprotocol/ext-apps AppBridge) that removes the hand-rolled-bridge risk this plan carried. A code audit also found 85g's "purely Console-side" non-goal factually wrong (the MCP driver doesn't parse _meta.ui.resourceUri, tool.completed carries no content, and ReadResource isn't exposed on the Protocol — so there is real runtime + Protocol work). Pulled forward into V1.1.x as the three-phase 109a–c "MCP Apps host" wave, scheduled immediately after Phase 108. Plan file kept as historical context, marked deprecated. RFC §6.4, §7. See D-172, D-173, and docs/plans/phase-109a-mcp-apps-runtime-protocol.md / phase-109b-console-mcp-apps-host.md / phase-109c-mcp-apps-displaymode-layout.md.
  • 109a — MCP Apps runtime + Protocol surface. The runtime/Protocol enablement layer the deprecated 85g plan wrongly assumed already existed. Parse _meta.ui.resourceUri on MCP tool results; recognise ui://-scheme resources; project the app reference (resourceUri + negotiated DisplayMode + RawHTMLTrusted) onto the tool-result Protocol surface; add mcp.servers.read_resource (identity-scoped, D-026 heavy-content aware) to fetch the ui:// HTML; negotiate DisplayModes from the server's io.modelcontextprotocol/ui capability (replacing the static registry.go placeholder); add an app-initiated-tool-call proxy that routes through the existing approval/OAuth/identity tool-safety path. §13 same-wave consumer: 109b. RFC §6.4, §6.5, §7. Deps: 28, 85a, 84a. See docs/plans/phase-109a-mcp-apps-runtime-protocol.md.
  • 109b — Console MCP Apps host. Sandboxed-iframe renderer in the shared chat module (web/console/src/lib/chat/renderers/mcp-app.svelte, D-091); strict CSP; postMessage origin validation; the official AppBridge wired in manual-handler mode (D-173) — every app→host call Protocol-proxied through 109a, never a direct MCP connection; honours RawHTMLTrusted → sandbox strictness; the inline DisplayMode via the renderer registry. Adds @modelcontextprotocol/ext-apps + @modelcontextprotocol/sdk to web/console (RFC §10 dependency-addition prerequisite). RFC §6.4, §7. Deps: 109a, 73n, 108. See docs/plans/phase-109b-console-mcp-apps-host.md.
  • 109c — MCP Apps DisplayMode layout. The Playground page-level layout state machine for fullscreen (app replaces chat + composer; multi-tab) and pip (50/50 resizable split, right rail hidden by default + toggle); inline already shipped in 109b; onrequestdisplaymode drives runtime transitions. Distinct from PG-6 two-agent comparison (post-V1, D-064). RFC §7. Deps: 109b. See docs/plans/phase-109c-mcp-apps-displaymode-layout.md.
  • 109d — Inline MCP-app discovery (D-215). Closes the dead seam the 109 wave's §17.5 audit pinned: the chain "a planner-initiated MCP tool result carrying _meta.ui.resourceUri → a chat message that mounts the 109b renderer → 109c's layout activates" was never wired, so the renderer + entire layout were unreachable in production. Three breaks closed: (1) the runtime emits a new canonical SafePayload event mcp.app_available at the MCP provider's invoke site whenever a tool result declares a ui:// app (carrying the server source id + resource URI + display-mode hint + run/identity correlation), registered alongside mcp.resource_offloaded; (2) the single-sourced wire MCPAppRef gains a server_id field (also populated on the app-tool-call proxy response) so the renderer resolves which server to read the ui:// document from; (3) the Console ChatMessage gains an app/serverID field, MessageBubble dispatches it under MCP_APP_INLINE_MIME to mount the real renderer, and the Playground page attaches the decoded mcp.app_available SSE event to the run's agent bubble. The §13 same-wave consumer is the discovery path itself; an inline app's onrequestdisplaymode (granted by the page's full available-mode set) opens the app through 109c's already-shipped layout reducer. The wave-end W3 weak synthetic-DOM Playwright test (which re-implemented the clamp) is replaced by a real-component Vitest guard that mounts the shipped MessageBubble / McpAppRenderer / AppPanel and fails if the discovery→render wiring is reverted. RFC §6.4, §6.5, §7. Deps: 109a, 109b, 109c. See docs/plans/phase-109d-inline-mcp-app-discovery.md.
  • 109e — MCP App discovery reads the tool-DEFINITION _meta.ui (D-216). A spec-conformance fix a live test against a real io.modelcontextprotocol/ui ext-apps server (go-study-mcp) surfaced: the 109 wave parsed the _meta.ui.resourceUri app reference from the tool RESULT (CallToolResult._meta), but the canonical SEP-1865 dialect (vendored McpUiToolMetaSchema: "UI-related metadata for tools") places it on the tool DEFINITION. A real ext-apps server binds the ui:// UI resource per tool in tools/list and returns an empty result _meta, so the result-parse found nothing and mcp.app_available never fired — the renderer (109b) + layout (109c) were unreachable against real servers, and every 109a–d test passed only because its hand fixture put _meta.ui on the RESULT (matching the buggy code, not the spec). The fix: buildToolDescriptor captures the tool-definition _meta.ui at discovery (immutable closure capture, D-025); callTool reconciles that binding with any optional per-result display-mode hint and fires mcp.app_available from the result, feeding BOTH the discovery event AND the app-tool-call proxy projection (mcpconsole/apps.go, which had the identical result-only bug — fixed in the same change per §17.6). DisplayMode defaults to inline when none is negotiated/declared (go-study-mcp advertises no UI capability; the Console renderer already mounts on a bare {resourceUri, serverID}). The fixtures are corrected to the canonical placement (tool-def _meta.ui, empty result _meta) and a HARBOR_LIVE_MCP-gated probe drives the real go-study-mcp binary over stdio (CI-skipped, verified green in dev). This PR also adds CLAUDE.md/AGENTS.md §17.8: external-protocol conformance fixtures must derive from the real spec, never a hand-built one. RFC §6.4, §6.5, §7. Deps: 109a, 109d. See docs/plans/phase-109e-mcp-app-tool-def-discovery.md.
  • 109f — Render heavy MCP App documents + operator "pop to side-by-side" affordance (D-217). Closes two gaps a live test against the real go-study-mcp ext-apps server surfaced. Gap A: go-study-mcp's ui://go-study-mcp/studio/index.html is 86.4 KB; the default heavy-content threshold is 32 KiB, so 109a's mcp.servers.read_resource correctly offloads the document to the ArtifactStore by reference (D-026) and returns an artifactRef instead of inline content. The 109b renderer treated that as a FATAL "server bug" and refused to render — which hits nearly every real App, since Svelte/React bundles routinely exceed 32 KiB. The renderer now resolves the by-reference stub to a presigned URL via a new injected MCPAppHostClient.resolveArtifact seam, fetches the bytes at the iframe edge, and loads them into the SAME sandboxed srcdoc (same CSP, sandbox tokens, wrapAppDocument, origin guard) the inline path uses — only the content source changes; the offload stays correct (heavy bytes never inline through the context plane). The real resolveArtifact impl lives in the Console adapter makeMCPAppHostClient (over artifacts.get_ref), OUTSIDE the chat module, so the renderer keeps zero $lib/ imports (D-091). A §17.6 bug-twin is fixed in the same PR: the playground ChatProtocolClient.resolveArtifact read the absent resp.url (the wire field is presigned_url), silently breaking every chat-bubble artifact preview. Gap B: a host-side operator "expand ⤢" affordance on the inline app frame pops the app to the 109c side-by-side (pip) / fullscreen layout WITHOUT the app asking, dispatched through the EXISTING injected onDisplayModeRequest seam → the 109c layout reducer (no parallel display-mode path; no chat-module reach into the page). Always-on Vitest guards: a heavy-document fetch test (realistic >32 KiB App fixture, §17.8) that fails if the artifactRef branch reverts to the error path, plus an inline-path regression and a Gap-B affordance→reducer test. 
Console-only — no Runtime endpoint or Protocol method. RFC §6.4, §6.5, §7. Deps: 109a, 109b, 109c, 109d. See docs/plans/phase-109f-heavy-app-doc-render.md.
  • 109g — MCP App documents render inline on every artifact driver (D-218). A spec-correctness fix a live test against the real go-study-mcp ext-apps server surfaced: the 109 MCP Apps host gated a ui:// App document on the D-026 LLM-context heavy-output threshold (32 KiB) in internal/mcpconsole/apps.go::ReadResource. go-study-mcp's studio App HTML is ~86 KB, so mcp.servers.read_resource offloaded it to the ArtifactStore by reference and returned an artifactRef — which the Console can only fetch via a presigned URL, and the read-side resolver fails loud (CodePresignUnsupported) on every non-S3 driver. So the App never rendered on the inmem / fs / sqlite / postgres stores. Root cause: the heavy-output threshold exists to keep bulky bytes OUT of the LLM context window, but a ui:// App document NEVER enters the LLM context — the tool result carries only the tiny _meta.ui.resourceUri reference; the HTML is fetched ONLY by the Console and rendered in a sandboxed iframe. The fix re-scopes the threshold OUT of App documents: ReadResource checks mcp.IsUIResourceURI(resourceURI) and rides a ui:// document inline up to a dedicated appDocumentInlineCap (2 MiB) instead of the 32 KiB heavy threshold, so the common case (all real apps) renders on EVERY driver with no presigning. Above the cap, the existing D-026 offload→artifactRef path (the loud mcp.resource_offloaded bypass) is preserved for pathologically large apps. An ordinary (non-ui://) resource keeps the heavy threshold. 
The tests use a REAL inmem ArtifactStore on the seam (109f's fetch test stubbed the resolver and so never hit the presign-unsupported driver — the §17.8 failure mode): the below-cap revert-guard reads an 86 KiB ui:// doc and asserts it rides inline with no offload event (it fails if reverted to the 32 KiB gate — verified); the above-cap test asserts a >2 MiB doc still offloads + fires the event; a HARBOR_LIVE_MCP-gated probe drives the real go-study-mcp studio doc through ReadResource and asserts it returns inline. No Protocol wire-shape change — ReadMCPResourceResponse.Content already carries inline bytes. RFC §6.5, §7. Deps: 109a. See docs/plans/phase-109g-app-doc-inline-read.md.
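The 109g re-scoping reduces to one size decision per resource read. A minimal sketch, assuming a simple `ui://` prefix check stands in for mcp.IsUIResourceURI and that sizes exactly at a limit offload (the boundary handling is an assumption; the text only fixes the 32 KiB and 2 MiB values). `shouldInline` is a hypothetical name, not the function in internal/mcpconsole/apps.go.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	heavyThreshold       = 32 * 1024       // D-026 heavy-output threshold (32 KiB)
	appDocumentInlineCap = 2 * 1024 * 1024 // dedicated ui:// document cap (2 MiB)
)

// isUIResourceURI is a stand-in for mcp.IsUIResourceURI; the prefix check
// is an assumption made for this sketch.
func isUIResourceURI(uri string) bool {
	return strings.HasPrefix(uri, "ui://")
}

// shouldInline reports whether a resource of the given size rides inline.
// ui:// App documents use the larger cap because they never enter the LLM
// context; every other resource keeps the heavy threshold.
func shouldInline(uri string, size int) bool {
	limit := heavyThreshold
	if isUIResourceURI(uri) {
		limit = appDocumentInlineCap
	}
	return size < limit
}

func main() {
	const studio = "ui://go-study-mcp/studio/index.html"
	fmt.Println(shouldInline(studio, 86*1024))             // App doc below the cap
	fmt.Println(shouldInline("file:///report.csv", 86*1024)) // ordinary heavy resource
	fmt.Println(shouldInline(studio, 3*1024*1024))         // pathologically large App doc
}
```

In this sketch the first call returns inline and the other two take the offload path, matching the below-cap and above-cap cases the tests assert.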
  • 109h — MCP Apps UI-host capability advertisement (D-224). Closes brief 14's "Extension negotiation — Absent: ClientCapabilities.Extensions never populated" gap on the MCP southbound driver. The 109 wave shipped the READ side — negotiateDisplayModes reads a server's io.modelcontextprotocol/ui capability — but the driver never advertised its OWN: ClientCapabilities.Extensions shipped empty, so a spec-conformant ext-apps server could not learn the Harbor host renders apps and could not tailor the app references it returns. 109h adds the symmetric WRITE side: the driver advertises the io.modelcontextprotocol/ui extension carrying the host's renderable displayModes during the initialize handshake (hostCapabilities / filterHostDisplayModes in mcp.go, reusing the existing uiExtensionKey + closed validDisplayModes set), sourced from a new deployment-level tools.mcp_app_host.display_modes config field (MCPAppHostConfig + ToolsConfig.MCPAppHostDisplayModes(), defaulting to the inline baseline [inline]) threaded once through AttachDeps.HostDisplayModes — which is also the programmatic SDK seam an embedder sets without YAML. The roots-regression trap (brief 14 §2 row 4 / §3): the go-sdk advertises {"roots":{"listChanged":true}} by default when ClientOptions.Capabilities is nil; setting Capabilities to add the UI extension OVERRIDES that default AND the SDK ignores the deprecated Roots field in favour of RootsV2, so the code MUST replicate the current roots advertisement (RootsV2.ListChanged=true) or opting into the extension silently drops roots. This phase PRESERVES current roots behaviour exactly — it does NOT fix the roots honesty defect (brief 14 §3 / the separate 85a stopgap). Sampling/elicitation stay inferred from their handlers, unaffected. 
The integration test (§17.8) builds two providers from one resolved config value, pairs them to real SDK in-memory transports, and asserts each server's captured InitializeParams.Capabilities echoes the configured modes AND still advertises roots — the fixture derives from the SDK's real InitializeParams shape, not a hand blob; an opt-out provider advertises roots with no UI extension (the failure mode). No new inbound Protocol method or REST endpoint — the capability is an OUTBOUND client→server advertisement on the MCP handshake; the smoke is static-only. RFC §6.4, §7. Deps: 109a. See docs/plans/phase-109h-mcp-apps-host-capability.md.
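The display-mode filtering that 109h describes can be sketched as a pure function over the configured list. This is an illustration under assumptions, not the code in mcp.go: the closed set is taken to be the three modes 109c names (inline, fullscreen, pip), and `filterModes` is a hypothetical name chosen to avoid implying it matches filterHostDisplayModes.

```go
package main

import "fmt"

// validModes is the assumed closed set of display modes.
var validModes = map[string]bool{"inline": true, "fullscreen": true, "pip": true}

// filterModes keeps configured modes that are in the closed set, drops
// duplicates while preserving order, and falls back to the inline baseline
// when nothing valid remains.
func filterModes(configured []string) []string {
	seen := make(map[string]bool, len(configured))
	out := make([]string, 0, len(configured))
	for _, m := range configured {
		if validModes[m] && !seen[m] {
			seen[m] = true
			out = append(out, m)
		}
	}
	if len(out) == 0 {
		return []string{"inline"}
	}
	return out
}

func main() {
	fmt.Println(filterModes(nil))
	fmt.Println(filterModes([]string{"pip", "bogus", "pip", "inline"}))
}
```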
  • 109i — MCP Apps tool-context capture + mcp.apps.tool_context (D-225). The BACKEND half of the MCP Apps "Data Delivery" lifecycle. The 109 wave lets the Console discover (mcp.app_available), fetch (mcp.servers.read_resource), and render a ui:// MCP App in a sandboxed iframe — but a rendered app had no way to read the tool context (the input + the lowered result) that produced it. This phase captures that context at the tool-invocation site (internal/tools/drivers/mcp/mcp.go::callTool, the same site that emits mcp.app_available) whenever a result declares a ui:// app, and exposes a new identity-scoped Protocol read method, mcp.apps.tool_context. Capture rides the EXISTING StateStore — all three persistence drivers + identity isolation come free, NO new driver and NO new migration — keyed by the caller's identity triple (with empty RunID; session-scoped) under kind = "mcp.apps.tool_context/<serverID>/<toolCallID>"; the input and result are heavy-content-aware at WRITE (a payload ≥ the heavy threshold offloads to the ArtifactStore by reference through the SAME loud-bypass path the resource read uses, refactored into a shared offloadHeavy helper). The tool_call_id is a deterministic content hash of run | server | tool | args (NO mutable Provider field — D-025); it is stamped on the mcp.app_available event (alongside tool_name; the payload stays SafeSealed — ids/names are not content), on the wire MCPAppRef, and on the app-tool-call proxy projection, so a client correlates a discovered app to its captured context. The new method routes through the AppsSurface dispatcher (IsMCPAppsMethod); an unknown or cross-identity (server_id, tool_call_id) fails with CodeNotFound (existence never revealed across identities — proven by a ≥2-identity isolation test). A capture failure is logged loudly but never fails the tool call (the planner's result is the source of truth); a missing identity fails closed. 
The capturer is wired into every MCP Provider in internal/runtime/assemble (mirrored in harbortest/devstack + cmd/harbor), and the read seam onto the AppsAccessor. New wire types (ToolContextRequest / ToolContextPayload / ToolContextResponse) + the new method are single-sourced, hand-mirrored into web/console/src/lib/protocol/mcp.ts, and the generated Protocol docs + wire manifest regenerated. The §13 same-wave consumer is the read path itself (capture → read, exercised end-to-end in Go); the Console UI consumption lands in 109j. Concurrent-reuse tests (N=128) on the shared Provider + AppsAccessor + ToolContextStore pass under -race; a HARBOR_LIVE_MCP-gated probe drives a real ext-apps server through capture → read. RFC §6.4, §6.5, §7. Deps: 109a, 109d, 109g. See docs/plans/phase-109i-mcp-apps-tool-context.md.
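The deterministic tool_call_id from 109i (a content hash of run | server | tool | args) could look like the sketch below. The hash function (SHA-256), the hex encoding, the length-prefixed framing, and the JSON canonicalisation of the arguments are all assumptions for illustration; the phase plan is the authority on the real derivation.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// toolCallID derives a stable id from the run, server, tool, and arguments.
// It keeps no mutable state, so concurrent callers can share it freely.
func toolCallID(runID, serverID, tool string, args map[string]any) (string, error) {
	// json.Marshal emits map keys in sorted order, so equal argument maps
	// serialise to identical bytes regardless of insertion order.
	canon, err := json.Marshal(args)
	if err != nil {
		return "", err
	}
	h := sha256.New()
	for _, part := range [][]byte{[]byte(runID), []byte(serverID), []byte(tool), canon} {
		// Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
		var n [8]byte
		binary.BigEndian.PutUint64(n[:], uint64(len(part)))
		h.Write(n[:])
		h.Write(part)
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	a, _ := toolCallID("run-1", "study", "quiz", map[string]any{"topic": "go", "n": 3})
	b, _ := toolCallID("run-1", "study", "quiz", map[string]any{"n": 3, "topic": "go"})
	fmt.Println(a == b) // true: key order does not affect the id
}
```

Because the id is a pure function of its inputs, the event, the wire MCPAppRef, and the proxy projection can each recompute or carry the same value without a shared counter.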
  • 109j — Console pushes tool-input/tool-result into the app (D-226, Reverted in #346 — re-land tracked in #347). The Data Delivery Console half (Stage 2 of the spec-compliance wave), consuming the 109i mcp.apps.tool_context surface now on main. After the sandboxed app sends ui/notifications/initialized, the host fetches the originating tool's context through the injected MCPAppHostClient and pushes it via the official AppBridge sendToolInput() then sendToolResult() (the SDK requires initialized before sendToolResult). Heavy-aware (resolves an artifact_ref to bytes at the iframe edge like 109f, else a faithful by-reference stub — never silently empty); a missing/evicted context (CodeNotFound) mounts with no push and no thrown error. The tool_call_id flows event → ChatMessage app ref → renderer. No new sandbox/CSP/origin change; the no-direct-transport invariant (D-173) holds — the push uses only the injected client. Status: the Console data-delivery push was reverted to v1.4 in #346 because it broke the ui/initialize handshake (handshake regression). The BACKEND tool-context surface (109i) is unaffected and remains Shipped. Re-landing the Console push is tracked in #347. RFC §6.4, §7. Deps: 109i, 109b. See docs/plans/phase-109j-mcp-apps-data-delivery-push.md.
  • 109k — MCP Apps spec-conformance hardening (D-227, Shipped V1.1.x). Closes the wave-end adversarial spec-review's findings — two conformance-breaking FAILs (green vs Harbor's own fixtures, broken vs a real ext-apps server — the D-216 class) plus host-obligation gaps. FAIL-1: the UI capability is advertised as the spec mimeTypes: ["text/html;profile=mcp-app"] (the field a conformant server gates on via getUiCapability(caps).mimeTypes), NOT the hand-rolled displayModes 109h shipped (not a McpUiClientCapabilities field). FAIL-2: an app→host tools/call (bare server tool name) resolves against the calling app's <serverID>_ namespace, so it hits the right catalog tool AND an app is confined to its own server's tools. Also: the non-spec displayModes read off ServerCapabilities is removed; display modes move to the spec slot (ui/initialize host-context availableDisplayModes, sourced from the 109h display_modes config via runtime.info); and the host honours ui/notifications/size-changed (iframe height), graceful ui/resource-teardown on unmount, live Console theme + host-context-changed, host-context toolInfo/containerDimensions, and resources/templates/list. Sanctioned deviations (D-173 bridge-proxy, D-224 deployment-declaration intent, D-225, D-218) are preserved. The FAIL revert-guards are HARBOR_LIVE_MCP probes against a real ext-apps server that gates on mimeTypes + exposes a callback tool. Before merge, the orchestrator live-tests the full MCP Apps surface against the test agent + Console (regression guard — it worked pre-109). RFC §6.4, §7. Deps: 109a, 109b, 109h, 109i, 109j. See docs/plans/phase-109k-mcp-apps-conformance-hardening.md.
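The FAIL-2 fix in 109k (a bare tool name resolves inside the calling app's `<serverID>_` namespace) can be sketched as a lookup that never consults another server's prefix. The catalog shape (a set of qualified names) and `resolveAppToolCall` are hypothetical; only the `<serverID>_<tool>` naming comes from the text.

```go
package main

import (
	"errors"
	"fmt"
)

var errToolNotFound = errors.New("tool not found in the calling app's server namespace")

// resolveAppToolCall qualifies a bare tool name with the calling app's
// server id and looks it up. Because the prefix is always the caller's own,
// an app cannot reach a tool that belongs to a different server.
func resolveAppToolCall(catalog map[string]bool, serverID, bareName string) (string, error) {
	qualified := serverID + "_" + bareName
	if !catalog[qualified] {
		return "", errToolNotFound
	}
	return qualified, nil
}

func main() {
	catalog := map[string]bool{"study_quiz": true, "other_delete": true}

	name, err := resolveAppToolCall(catalog, "study", "quiz")
	fmt.Println(name, err)

	// The "study" app asks for a tool only the "other" server exposes.
	_, err = resolveAppToolCall(catalog, "study", "delete")
	fmt.Println(err)
}
```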
  • 85h — MCP Tasks wire types (CUT). RC redesigns Tasks (moved to extension; tasks/list removed; new method set; new lifecycle around tools/call returning a task handle). Hand-transcribing the 2025-11-25 shape now locks in code that the extension SEP and Dockyard's port will both diverge from. Plan file kept as historical context. Revisit when Tasks extension SEP stabilizes + Dockyard ports + SDK adds support — refile as a new band, not 85h. No revisit on this slot.
  • 85i — MCP Tasks client (CUT). Same reasoning as 85h. Polling loop, tasks/list consumption, input_required → elicitation composition all targeted the old shape. No revisit on this slot.
  • 85j — MCP client conformance (target: RC). Conformance harness + scoped, substantiated compliance statement at docs/design/mcp-compliance.md. Statement target bumps from MCP 2025-11-25 to MCP 2026-07-28 (RC) and ultimately the final spec. Wording obligation: never "fully compliant" unqualified; the scoped sentence enumerates exactly what's wired. Drops the cut areas (sampling, roots, logging, original Tasks) from the claim; adds the 85m transport / auth / schema / cache / trace items. RFC §6.4. Deps: 85a, 85b, 85d, 85f, 85g, 85m. Revisit after RC-final (2026-07-28) and after the dependent phases land. See docs/plans/phase-85j-mcp-client-conformance.md.
  • 85m — MCP 2026-07-28 RC adoption (NEW). Absorbs the RC's cross-cutting breaking changes the other phases can't carry on their own:
    • Remove initialize / initialized handshake plumbing and Mcp-Session-Id header dependence from internal/tools/drivers/mcp and all transports; client info moves to per-request _meta.
    • Streamable HTTP: set Mcp-Method and Mcp-Name on every outbound request; assert the server's reject-on-mismatch behaviour in tests.
    • Error code flip: every -32002 (resource-not-found) callsite → -32602 (Invalid Params).
    • Server-to-client request restructuring: server-initiated requests only issuable while server is actively processing a client request; SSE elicitation polling removed (composes with 85d's rewrite).
    • JSON Schema 2020-12 (SEP-2106): full draft support in tool / resource-template schema validation (composition, conditionals, $ref).
    • Cache directives (SEP-2549): respect ttlMs and cacheScope on list / resource reads.
    • W3C Trace Context propagation (SEP-414): wire Harbor's existing OTel traceparent / tracestate / baggage into MCP _meta.
    • Capability discovery via server/discover (replaces handshake-time advertisement).
RFC §6.4. Deps: 28, 85a. Revisit after SDK-RC (≈ late Jul–Aug 2026): every item above needs go-sdk RC support; Harbor's plan can be authored now (transcribe the RC SEPs into a phase plan) so implementation can start the day the SDK lands. New plan file: docs/plans/phase-85m-mcp-rc-2026-07-28.md (to author).
  • 86 — Durable distributed bus driver. NATS / Redis Streams / Postgres-as-queue behind MessageBus. RFC §12. Deps: 22.
  • 87 — Durable TaskService backend. Background tasks survive restart. RFC §12. Deps: 20, 22.
  • 88 — Episodic memory tier. Durable summaries promoted from session → user/tenant scope. RFC §11 Q-4. Deps: 24, 25.
  • 89 — A2A northbound. Expose Harbor as an A2A server. RFC §11 Q-2. Deps: 29.
  • 90 — Additional planner concretes. PlanExecute, Workflow, Graph, Supervisor, MultiAgent, HumanApproval. RFC §12. Deps: 49.
  • 91 — Console-driven key rotation (Protocol). governance.rotate_key Protocol method; Account impl atomically swaps the live key set; bifrost picks up the new key on the next Account.GetKeysForProvider lookup (no ReloadConfig race). RFC §6.15, D-019. Deps: 36a, 60 (Protocol transport), 73 (Console-attaching).
  • 92 — Console-driven mid-session model swap. governance.swap_model Protocol method; future runs in a session use the swapped model; the planner sees the change via RunContext. Audited. RFC §6.15. Deps: 36a, 60, 73.
  • 92a — Agent-config control plane (extends 91/92). Generalises the 91/92 "mutate desired-state via Protocol, reconcile into the runtime" pattern from governance config to AGENT-DEFINITION config, giving live, audited control of:
    • (a) MCP server-connection enablement (pause / resume / remove), plus per-individual-tool policy: the active / deferred / disabled loading_mode vocabulary from 107c + 26b, made mutable per <source>_<tool>.
    • (b) The skill set, over the existing SkillStore (Phase 37).
    • (c) A layered system prompt: an operator-owned base layer plus an optional session-scoped instruction layer composed above it, respecting the 83a–f structured prompt sections.
    Unifying primitive: a durable, identity-scoped, VERSIONED desired-state registry on the StateStore. Each edit is an immutable revision (content-addressed, parent pointer), the active config is a revision pointer, rollback = repoint, and a server-side diff between revisions is exposed as a read method plus an agent.config.revised event.
    Semantics: next-turn-only, snapshot-immutable (the D-025 alignment). A change affects ONLY the next run; in-flight / concurrent runs keep the immutable view they snapshotted at run-start. The runtime projects the per-run tool/config view from the registry by extending tools.NewPlannerView (Phase 110a) to read desired-state instead of boot config, so there is no mid-flight mutation, no draining, and no forcible teardown (a paused connection's transport may stay warm; pause is a projection-time decision).
    Two decisions to settle when the band is expanded under §16:
    • (1) App→host tools/call callbacks from a rendered MCP App (109i, D-173) are gated against CURRENT desired-state: a paused server rejects them and the host surfaces a "paused by an administrator" advisory, while in-flight PLANNER calls keep their snapshot (an intentional asymmetry).
    • (2) The authorization-scope matrix: base-prompt edits, connection add/remove, and the per-tool allowlist are tenant/deployment-level capability changes requiring an elevated (fleet / tenant-admin) scope plus audit (adding a stdio server is approval-gated / allowlist-only), while session-scoped callers get only the safe subset (the user instruction layer, enable/disable among already-allowed sources, ephemeral skills).
    Adding a brand-new connection (async dial + initialize + OAuth via the unified pause/resume primitive) is the separable hard sub-phase. Decomposes under §16 into: registry + Protocol surface + diff/rollback → skills control → layered prompt → MCP connection pause/resume + per-tool policy → (separable) add-connection. RFC §6.15, §6.16. Deps: 86, 87, 91, 92, 53a, 37, 28, 26b, 110a, 109i.
  • 93 — Failover chains as Harbor policy. Operator-defined chain [primary, secondary, ...] per identity / model; orchestrated at the Governance layer with audit per hop; NOT pushed into bifrost's per-call Fallbacks. RFC §6.15, D-018. Deps: 36a, 33.
  • 94 — Provider circuit breakers per (provider, key). Aggregate error rate; trip on threshold; auto-recover on cool-down; events emitted. Builds on 93. RFC §6.15. Deps: 33, 93.
  • 95 — LLM cache (exact-match + semantic). Plugin pre-hook checks the cache; semantic uses an embedding similarity threshold. Big complexity; deferred. RFC §6.15. Deps: 33.
  • 96 — PII redaction at the LLM boundary. Audit subsystem owns the redactor; Governance hooks it into the LLM call path. Outgoing prompts are scrubbed; raw forms are never persisted. RFC §6.15, D-020. Deps: 03 (audit redactor), 33.
  • 97 — Media-input tool wrappers. Bifrost-backed tools that accept ArtifactRefs and pass image/audio/file content to LLM-side analysis (e.g. a generic image.analyze wrapper that accepts an image artifact + a text prompt and routes through the planner's normal LLM call). Mostly a convention layer — the plumbing already exists once D-021 + Phase 33 ship. RFC §6.5, D-021. Deps: 17 (artifacts), 33 (bifrost), 26 (tool catalog).
  • 98 — Media-output tool wrappers. Image generation, speech synthesis, transcription, and video tools that wrap bifrost's media APIs (SpeechRequest, TranscriptionRequest, ImageGenerationRequest, etc.) and return ArtifactRefs. Each tool is a separate registration; they share a common MediaTool helper. The planner invokes them as ordinary tool calls; no LLMClient change. RFC §6.5, D-021. Deps: 17, 33, 26.
  • 99 — Vision-aware memory summarization. Extends the rolling_summary memory strategy to call a vision model when summarizing turns that include ImageParts, replacing the V1 placeholder ([image: <ref>]) with a generated description. Optional per identity tier; off by default for cost. RFC §6.6, D-021. Deps: 24 (memory strategies), 33 (bifrost), 97 (media-input tools).
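
The versioned desired-state registry described for 92a (immutable content-addressed revisions, a parent pointer, an active-revision pointer, rollback = repoint, snapshot at run-start) can be sketched as below. This is a minimal in-memory illustration, not Harbor's implementation: the Revision and Registry types, their method names, and the choice of SHA-256 over parent ID plus body are all assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// Revision is one immutable edit of an agent's desired-state config.
// Hypothetical type for illustration; not a Harbor API.
type Revision struct {
	ID     string // content address: hash of parent ID + body
	Parent string // revision this edit was based on ("" for the first)
	Body   string // serialized config payload
}

// Registry keeps every revision plus a pointer to the active one.
type Registry struct {
	revisions map[string]Revision
	active    string
}

func NewRegistry() *Registry {
	return &Registry{revisions: make(map[string]Revision)}
}

// Commit stores body as a new revision whose parent is the current active
// revision, then repoints active to it. Existing revisions are never edited.
func (r *Registry) Commit(body string) Revision {
	sum := sha256.Sum256([]byte(r.active + "\x00" + body))
	rev := Revision{ID: hex.EncodeToString(sum[:]), Parent: r.active, Body: body}
	r.revisions[rev.ID] = rev
	r.active = rev.ID
	return rev
}

// Rollback repoints active to an existing revision; nothing is rewritten.
func (r *Registry) Rollback(id string) error {
	if _, ok := r.revisions[id]; !ok {
		return errors.New("unknown revision")
	}
	r.active = id
	return nil
}

// Snapshot returns the active revision by value, so a run that captured it
// at run-start is unaffected by later commits.
func (r *Registry) Snapshot() Revision {
	return r.revisions[r.active]
}

func main() {
	reg := NewRegistry()
	v1 := reg.Commit(`{"mcp":{"search":"active"}}`)
	inFlight := reg.Snapshot() // a run starts here and keeps this view
	reg.Commit(`{"mcp":{"search":"paused"}}`)

	fmt.Println(inFlight.ID == v1.ID)           // in-flight run still sees v1
	fmt.Println(reg.Snapshot().Parent == v1.ID) // next run sees the new revision
	_ = reg.Rollback(v1.ID)
	fmt.Println(reg.Snapshot().ID == v1.ID) // rollback only moved the pointer
}
```

Because a snapshot is a value copy and revisions are never mutated, "next-turn-only" falls out of the data model rather than needing locks around in-flight runs.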

Wave / parallelism map

The phase queue is a DAG, not a line. The parallelizable waves are listed below. Phases inside a wave can be implemented in parallel by separate workers; phases in later waves wait for earlier waves to complete (or for the specific phases their Deps column names).

Wave 1 — Pure foundation (no upstream Harbor deps): 01 (identity), 02 (config), 03 (audit redactor) — three independent, parallelizable.

Wave 2 — Logger + bus skeleton: 04 (slog Logger; needs 03), 05 (Event taxonomy + InMem bus; needs 01, 03), 07 (StateStore iface + InMem; needs 01, 03). Parallelizable across three workers.

Wave 3 — Bus replay + sessions: 06 (replay; needs 05), 08 (SessionRegistry; needs 01, 07). Parallelizable.

Wave 4 — Core runtime serial chain (mostly): 09 (envelopes; needs 01, 08) → 10 (engine; needs 09) → 11 (reliability; needs 10) → 12 (streaming; needs 10, 11) → 13 (cancel; needs 10, 12) → 14 (routers; needs 10, 11). Per the stated deps, 12 and 14 can parallelize once 11 lands; 13 serializes after 12.

Wave 5 — Persistence drivers (parallelizable across drivers): 15 (SQLite state), 16 (PG state), 17 (Artifacts iface + InMem + FS — needs 01, 07). Three parallel.

Wave 6 — Tasks + remaining persistence: 18 (Artifact SQLite/PG; needs 17, 15, 16), 19 (Artifact S3; needs 17), 20 (TaskRegistry; needs 01, 07), 21 (TaskGroup + WatchGroup + retain-turn + patches; needs 20), 22 (Distributed contracts; needs 09, 20). Stage 1 (18, 19, 20) parallelizable; Stage 2 (21, 22) once 20 lands.

Wave 7 — Memory + tools core + LLM core (parallel tracks):

  • Memory track: 23 → 24 → 25
  • Tools track: 26 → 27 / 28 / 29 (HTTP, MCP, A2A in parallel after 26)
  • LLM track: 32 → 33 → 34 → 35 → 36 (largely serial)
  • Governance track (slots in after 33): 33 → 36a → 36b (serial; relies on cost-passthrough from bifrost integration)

Wave 8 — Skills + planner core (after wave 7's foundations):

  • Skills track: 37 → 38 / 39 / 40 / 41 (after 37, the four can run in parallel-ish)
  • Planner track: 42 → 43 / 44 (parallel) → 45 → 46 / 47 (parallel) → 48 → 49

Wave 9 — Pause/Resume + Steering + Telemetry + Protocol (cross-track):

  • 50 (needs 07, 09, 13) → 51 → 52 → 53 → 54
  • 53a (Agent Registry; needs 01, 05, 07, 08) — parallelizable with the 50→54 chain; its deps are all long-shipped. Must land before 54 and the Console-attaching wave (72–75).
  • 55 (OTel; after 04, 05) parallel with 56 (metrics; after 55, 05); 57 (durable event log; after 05, 07, 15, 16)
  • 58 (protocol types) → 59 (versioning) → 60 (transport) → 61 (auth) → 62 (conformance)
  • 30 (Tool OAuth/HITL; needs 26, 50, 53a), 31 (approval gates; needs 30) slot in once 50 + 53a are up

Wave 10 — CLI + test kit: 63 → 64 → 65 / 66 / 67 / 68 / 69 / 70 (mostly parallel after 64). 71 (test kit; needs 05, 09, 07) parallel.

Wave 11 — Console-attaching + hardening: 72 / 73 / 74 (parallel; need 60, 05, 06, 07, 17, 09). 75 (e2e gate; needs 64, 72, 73). 76, 77, 78, 79 (parallel; need their respective subsystems). 80 (docs polish; needs all V1).

Wave 12 — Release: 81 → 82 (serial).

Practical reading: with three or four engineers (or three concurrent worker subagents), waves 5–8 hide enormous parallelism behind their tracks. The serial sections that resist parallelism are: the core runtime chain (09→10→11→12→13), the LLM-client chain (32→33→34→35→36), and the Protocol chain (58→59→60→61→62).
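
The wave grouping above is a layering of the dependency DAG: a phase joins the first wave in which all of its deps have already landed. A minimal sketch of that layering follows, using the deps stated for Waves 1–3. The Waves function is hypothetical and only illustrates the idea; it is not a tool that ships with Harbor, and the real wave map also reflects staffing and track judgment that a pure layering does not capture.

```go
package main

import (
	"fmt"
	"sort"
)

// Waves groups phases into layers: a phase lands in the first wave after
// all of its dependencies. It returns an error if deps contains a cycle or
// names a phase that is missing from the map.
func Waves(deps map[string][]string) ([][]string, error) {
	landed := make(map[string]bool, len(deps))
	remaining := len(deps)
	var waves [][]string
	for remaining > 0 {
		var wave []string
		for phase, ds := range deps {
			if landed[phase] {
				continue
			}
			ready := true
			for _, d := range ds {
				if !landed[d] {
					ready = false
					break
				}
			}
			if ready {
				wave = append(wave, phase)
			}
		}
		if len(wave) == 0 {
			return nil, fmt.Errorf("cycle or unknown dependency among %d phases", remaining)
		}
		sort.Strings(wave) // map iteration order is random; sort for stable output
		// Mark the wave as landed only after the scan, so a phase never
		// shares a wave with one of its own dependencies.
		for _, p := range wave {
			landed[p] = true
		}
		waves = append(waves, wave)
		remaining -= len(wave)
	}
	return waves, nil
}

func main() {
	deps := map[string][]string{
		"01": nil, "02": nil, "03": nil,
		"04": {"03"}, "05": {"01", "03"}, "07": {"01", "03"},
		"06": {"05"}, "08": {"01", "07"},
	}
	waves, err := Waves(deps)
	if err != nil {
		panic(err)
	}
	fmt.Println(waves) // [[01 02 03] [04 05 07] [06 08]]
}
```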


Open architectural follow-ups feeding next-wave scoping

The Wave 11 §17.5 audit (PR #117) surfaced four architectural gaps tracked as GitHub issues. All four closed in Wave 11.5 (issues #112, #113, #114, #115 via PRs #119, #120, #121, #122; the wave-end E2E now exercises production end-to-end). Issue #116 (tools.oauth_providers[] operator config) shipped in PR #119 alongside Wave 11.5 Stage A. One open follow-up remains:

This section accumulates audit-surfaced follow-ups that warrant tracking issues but haven't been promoted to phase plans yet. When the next wave scopes, this is the first list to reconcile against docs/plans/README.md's pending-phase block.


V1 cut line

V1 ships phases 01–82 + 36a + 36b + 53a. The follow-ups (83–100) are intentionally deferred to post-V1: the original band (83, 84, 86–90 — integer 85 was removed, see below), six Governance (91–96), three Multimodality follow-ups (97–99) for media-input/output tool wrappers and vision-aware memory summarization, and the Recipe loader (100). Two lettered bands sit inside this range: 83a–e (ReAct prompt depth + reasoning-channel decoupling) and 85a–j + 85m (MCP client/host compliance — the prioritised first post-V1 work; 85k is the separate Harbor agent-builder skills phase). The 85-band was re-shaped on 2026-05-28 against the MCP 2026-07-28 RC (sampling / roots / logging deprecated, Tasks redesigned to an extension, session handshake removed); see the 85-band detail block for the per-phase verdict. Multimodal inputs ship in V1 (RFC §6.5 + D-021); only multimodal outputs and richer memory handling are post-V1. The Evaluations subsystem and code-mode (Starlark) are also post-V1 — see RFC §12.

The cut line is justified by RFC §12 (Out of Scope for V1):

  • Auto-sequence + reflection (83, 84) — explicit RFC §12 entries: "optional optimization, off by default" and "optional per concrete; not on V1's critical path." Shipping the planner without them does not weaken the swappable-planner property; both can land as planner-internal upgrades without runtime change.
  • MCP client/host compliance (85-band, 85a–j + 85m) — post-V1 by deferral, not by architecture: the V1 MCP southbound driver (Phase 28) is core-functional; the 85-band raises it to feature-complete. Prioritised as the first post-V1 work. The integer Phase 85 (Skills Portico provider driver) was removed — Portico speaks MCP like any server, so the generic MCP client driver is its consumer; no Portico-specific driver is built. Per the MCP 2026-07-28 RC re-plan (2026-05-28), the band scopes as: HTTP OAuth (now covering six RC auth SEPs), elicitation (RC InputRequiredResult shape), the surviving server features (completions / templates / progress), MCP Apps host, conformance (target: RC), and a new 85m absorbing the RC's cross-cutting transport / session / error / schema / cache / trace changes. Sampling, roots, and the original Tasks pair (85h/85i) are cut.
  • Durable distributed bus + durable TaskService backend (86, 87) — RFC §6.12 settles "V1 ships contracts only; in-process default." A durable backend is a driver phase, not a runtime-architecture phase.
    • Phase 87 SHIPPED (D-228): a durable TaskRegistry driver (internal/tasks/drivers/durable) persists task/group/patch records through the shared StateStore (per-record slots, replayed on open) so they survive a restart, with an open-time recovery sweep that fails a crash-left Running task to Failed{runtime_restarted}. Two recorded deviations from the merged plan: the task lifecycle was extracted into a shared internal/tasks/engine package (inprocess + durable are thin wrappers over one state machine) rather than duplicated; and the driver reuses the runtime's shared StateStore (no StateDriver/StateDSN config fields, since tasks.Open already passes the store). Opt-in via tasks.driver: durable; fail-loud when no StateStore is wired. Single-instance restart-survival of records only (no auto-re-drive); the queue-backed / distributed driver remains future work behind the unchanged seam.
    • Phase 86 SHIPPED (D-229): a durable MessageBus driver (internal/distributed/drivers/durable) persists every BusEnvelope through the shared StateStore and projects it onto the local events.EventBus, with a background poller that delivers cross-instance envelopes + replays persisted history after a restart (at-least-once; consumers dedupe on (TaskID, Edge, EventID)). StateStore-backed (Postgres-as-queue on a shared Postgres store); NATS / Redis Streams deferred (new deps → RFC §10 PR first). The bus-projection contract (EventTypeDistributedBusEnvelope / BusEnvelopePayload) was promoted from the loopback driver into the distributed package so both drivers share it. Opt-in via distributed.bus_driver: durable; loopback stays the default; fail-loud when no StateStore is wired. The MessageBus seam itself is still contracts-only in production (no OpenBus consumer yet, like loopback), so the driver is registered + conformance/integration-tested, ready for a future bus consumer.
  • Distributed task dispatcher (86a) — the consumer that makes the Phase 86 durable bus load-bearing: it wires OpenBus into the runtime, publishes task-lifecycle envelopes (task.spawned + terminal) to the bus, and runs a fleet RunLoop driver that claims a spawned task (a StateStore compare-and-swap lease, so exactly one worker drives it despite at-least-once fan-out) and drives it — turning the durable bus into the fleet work queue (a task spawned on any worker is driven on any worker and survives a restart). It is the distributed evolution of the single-instance per-task RunLoop driver (cmd/harbor/cmd_dev_runloop.go). Carries the high-level multi-worker deployment topology (N stateless Harbor workers behind a shared Postgres StateStore + durable bus + durable tasks; EKS / multi-container). Deps: 86, 87. Without it, the durable bus is registered-but-unconsumed in production — 86a closes the "primitive without a consumer" gap (§13). RFC §6.8 + §6.12, D-230.
  • Episodic memory tier (88) — RFC §11 Q-4 leans post-V1 unless V1 user feedback demands it.
  • A2A northbound (89) — RFC §11 Q-2 leans V1.1 unless an early adopter demands it.
  • Additional planner concretes (90) — RFC §12 explicitly: "wait on V1 evidence that the interface holds." V1 ships React + Deterministic; the rest land as evidence accrues.
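
The 86a claim step (a compare-and-swap lease so exactly one worker drives a task despite at-least-once fan-out) can be illustrated with a small in-memory sketch. LeaseStore, CompareAndSwap, and Claim are hypothetical names standing in for whatever the StateStore exposes; lease expiry and renewal are deliberately left out.

```go
package main

import (
	"fmt"
	"sync"
)

// LeaseStore is a hypothetical stand-in for a StateStore that offers
// compare-and-swap on a per-task owner slot.
type LeaseStore struct {
	mu     sync.Mutex
	owners map[string]string // taskID -> workerID ("" means unclaimed)
}

func NewLeaseStore() *LeaseStore {
	return &LeaseStore{owners: make(map[string]string)}
}

// CompareAndSwap sets the owner of taskID to next only if the current owner
// equals prev. It reports whether the swap happened.
func (s *LeaseStore) CompareAndSwap(taskID, prev, next string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.owners[taskID] != prev {
		return false
	}
	s.owners[taskID] = next
	return true
}

// Claim tries to take an unowned task for workerID.
func Claim(s *LeaseStore, taskID, workerID string) bool {
	return s.CompareAndSwap(taskID, "", workerID)
}

func main() {
	store := NewLeaseStore()
	var wg sync.WaitGroup
	var mu sync.Mutex
	winners := 0

	// The same task.spawned envelope reaches every worker.
	for _, w := range []string{"worker-a", "worker-b", "worker-c"} {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			if Claim(store, "task-1", id) {
				mu.Lock()
				winners++
				mu.Unlock()
			}
		}(w)
	}
	wg.Wait()
	fmt.Println("winners:", winners) // winners: 1
}
```

Which worker wins is not deterministic, but the count is: every claim after the first sees a non-empty owner and backs off.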

If under calendar pressure, phase 19 (ArtifactStore S3-style) and phase 75 (Playwright CI gate) are the most reasonable V1 → V1.1 slip candidates inside the V1 list, in that order.


Critical path

The longest dependency chain to V1, in order:

00 → 01 → 03 → 04 → 05 → 07 → 08 → 09 → 10 → 11 → 12 → 13 → 50 → 51 → 52 → 53 → 54 → 26 → 32 → 33 → 34 → 35 → 36 → 42 → 43 → 44 → 45 → 49 → 60 → 61 → 62 → 64 → 76 → 80 → 81 → 82.

That is 36 phases on the critical path out of 84 V1 phases. (Governance phases 36a/36b sit on the LLM track but are not themselves on the critical path; they branch off after phase 33 and rejoin via the StateStore conformance suite.) Practical implications:

  • The runtime kernel chain (09→14) is six phases of deeply serial work: roughly half a month of critical-path time for a single engineer.
  • The pause/resume coordinator chain (50→54) is the second cluster of serial work — and depends on the runtime chain landing through 13.
  • The LLM client chain (32→36) must complete before the planner reference (45) lands.
  • The protocol chain (58→62) is independent until 60 needs a wire decision (Q-1) — which can block the Console-attaching wave.
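
A critical path is the longest chain through the dependency DAG. A minimal sketch of how such a chain can be computed is shown here on the Wave 4 deps (with 09 treated as a root for brevity). LongestChain is a hypothetical helper for illustration, and ties between equally long chains are broken arbitrarily.

```go
package main

import "fmt"

// LongestChain returns the longest dependency chain in deps, listed from the
// earliest phase to the latest. It assumes deps is acyclic; a phase that is
// named as a dependency but has no entry of its own is treated as a root.
func LongestChain(deps map[string][]string) []string {
	best := make(map[string][]string, len(deps)) // memo: longest chain ending at each phase
	var visit func(p string) []string
	visit = func(p string) []string {
		if c, ok := best[p]; ok {
			return c
		}
		var longest []string
		for _, d := range deps[p] {
			if c := visit(d); len(c) > len(longest) {
				longest = c
			}
		}
		// Copy before appending so memoized chains are never aliased.
		chain := append(append([]string{}, longest...), p)
		best[p] = chain
		return chain
	}
	var result []string
	for p := range deps {
		if c := visit(p); len(c) > len(result) {
			result = c
		}
	}
	return result
}

func main() {
	deps := map[string][]string{
		"09": nil,
		"10": {"09"},
		"11": {"10"},
		"12": {"10", "11"},
		"13": {"10", "12"},
		"14": {"10", "11"},
	}
	fmt.Println(LongestChain(deps)) // [09 10 11 12 13]
}
```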

Highest-risk phases on the critical path (in priority order):

  1. Phase 12 (Streaming + per-run backpressure) — the predecessor's deadlock-under-streaming sharp edge; if shipped wrong, parallel runs deadlock.
  2. Phase 33 (bifrost integration) — Q-3 is resolved. The phase is now a routine implementation rather than a decision gate. Risk dropped to "ordinary integration risk": driver translation correctness plus cancellation-timing diligence on long streams. See docs/research/08-llm-client-validation.md.
  3. Phase 50 (Pause/Resume Coordinator) — the unified primitive; if it leaks abstractions to planner code, the swappable-planner property regresses.
  4. Phase 60 (Protocol wire transport) — Q-1; locking the wrong transport now means a v1→v2 migration later.
  5. Phase 76 (Cross-tenant isolation harness) — the integrity gate. If it lands late, regressions are not detected.

Risk-mitigation strategy: front-load the Q-1 decision so phase 60 doesn't enter implementation with an open architecture question. Q-3 is already resolved, so phase 33 no longer carries one.


Open RFC questions affecting the plan

The RFC's open questions (RFC §11) directly gate or shape these phases:

  • Q-1 (Protocol wire transport). Gates phase 60. Lean is SSE+REST. If the answer becomes WebSocket+JSON-RPC or gRPC, phase 60 forks accordingly; phases 64–75 (CLI + Console-attaching) inherit the new transport but their shapes do not change materially.
  • Q-2 (A2A northbound at V1). Determines whether phase 89 is V1 or post-V1. Default plan keeps it post-V1.
  • Q-3 (LLM client choice). RESOLVED 2026-05-08. Replaced the original CGo-required candidate with github.com/maximhq/bifrost/core (pure Go). Empirically validated against six OpenRouter-routed models — 23/24 gating items pass. Phase 33 is now a routine integration; phases 34–36 carry only ordinary implementation risk. See docs/research/08-llm-client-validation.md.
  • Q-4 (Episodic memory tier). Determines whether phase 88 is V1 or post-V1. Default plan keeps it post-V1.
  • Q-5 (Skill versioning model). Shapes phase 41 (generator persistence) — content-hash-as-version is the V1 default; explicit semver is V1.5.
  • Q-6 (Second V1 planner concrete). Settled in RFC as deterministic. Phase 48 is locked.
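
For Q-5, "content-hash-as-version" means a skill's version is derived from its bytes rather than assigned by an author. A minimal sketch follows, assuming SHA-256 and a hex encoding; the ContentVersion name, the hash choice, and the output format are illustrative assumptions, not decisions recorded in the RFC.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ContentVersion derives a version string from a skill's content.
// Identical content always yields the same version; any edit yields a new one.
func ContentVersion(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

func main() {
	a := ContentVersion([]byte("skill: summarize\nsteps: 3\n"))
	b := ContentVersion([]byte("skill: summarize\nsteps: 3\n"))
	c := ContentVersion([]byte("skill: summarize\nsteps: 4\n"))

	fmt.Println(a == b) // true: same bytes, same version
	fmt.Println(a == c) // false: edited content is a new version
}
```

The trade-off against explicit semver is that a hash says two versions differ but nothing about compatibility between them.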

Action: Q-1 should be resolved before phase 60 enters the implementation queue; Q-3 is already resolved. Q-2 and Q-4 can be resolved at V1 cut.


Notes

  • Phase numbers are stable once shipped. A phase number is reused only via a phase-NN-supersedes-MM.md PR per AGENTS.md §15.
  • Phase plans are immutable post-ship, except for typo/clarification fixes. Material change = new RFC PR + new phase plan that supersedes.
  • If the RFC switches to subsystem-prefixed numbering (e.g. R-01, P-01), all phase plans rename in a single PR and this README reorganizes; phase numbering is therefore deliberately stable but not load-bearing for code or filenames in internal/.
  • Cross-references: RFC Appendix A (subsystem ↔ brief table) is the canonical map for "which brief informs which RFC section." Use it when reaching for context on any phase.
  • Coverage targets in the index column are starting points; per-phase plans may raise them. They never lower.
  • Smoke scripts: every phase has scripts/smoke/phase-NN.sh. The skeleton lands when the phase begins; assertions land as the surface implements.
  • Phase 0 already passes. Per phase-00-skeleton.md: 24 OK / 0 SKIP / 0 FAIL on the doc & mirror invariants. Subsequent phases inherit that gate.

Appendix: runtime tool-dispatch trio mapping (post brief 07)

Brief 07 codified Harbor's "code-level tool calling" principle (RFC §6.4) and surfaced four discrete runtime components: ActionParser, Dispatcher (single + parallel folded), RepairLoop, ObservationRenderer. The current phase set covers them across existing phases — no renumbering required, but reviewers should anchor on this mapping when authoring per-phase plans:

| Trio component | Owner phase(s) | Notes |
| --- | --- | --- |
| ActionParser (internal/runtime/planner/parser/) | 44 (Schema repair pipeline) + 45 (Reference ReAct planner) | The parser belongs with the repair loop; the ReAct phase wires it into the planner step. |
| Dispatcher — single tool path | 26 (Tool catalog core + InProcess) | Validation, identity stamping, cancellation hooks. |
| Dispatcher — parallel branches | 47 (Parallel-call execution + JoinSpec) | Same validation/identity/cancel plumbing as 26; the two phases ship the same dispatcher, not two dispatchers. |
| RepairLoop | 44 (Schema repair pipeline) | Drives parser → validator → planner-prompt-on-failure cycles up to RepairAttempts. |
| ObservationRenderer (internal/runtime/planner/observation/) | 45 (Reference ReAct planner) + 46 (Trajectory compression / summariser) | Renderer interleaves assistant/user messages from (action, observation \| error \| failure) pairs; compression in 46 plugs into the same renderer. |
| SchemaSanitizer (internal/llm/correction/) | 34 (Provider correction layer) | Lives between runtime and LLM client; per-provider response_format adjustments. |

If a future PR renames the package layout from internal/runtime/planner/... to a flatter internal/dispatch/ etc., the mapping table above moves with it and the phases retain their numbers. The trio is a design unit; splitting a single phase into "parser" + "dispatcher" + "renderer" sub-phases is allowed but not required.
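
The RepairLoop row describes a bounded cycle: parse the model's reply, validate it, and on failure re-prompt with the error, up to RepairAttempts times. A minimal sketch of that control flow follows. The Action shape, the Ask callback, and the re-prompt wording are all hypothetical; the real parser and validator live in the phases named in the table.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Action is a hypothetical parsed tool call.
type Action struct {
	Tool string          `json:"tool"`
	Args json.RawMessage `json:"args"`
}

// Ask sends a prompt to the model and returns its raw reply.
// The example below uses a stub instead of a real LLM call.
type Ask func(prompt string) string

func parse(raw string) (Action, error) {
	var a Action
	if err := json.Unmarshal([]byte(raw), &a); err != nil {
		return Action{}, fmt.Errorf("parse: %w", err)
	}
	return a, nil
}

func validate(a Action) error {
	if a.Tool == "" {
		return errors.New("validate: missing tool name")
	}
	return nil
}

// Repair runs parse -> validate and, on failure, re-prompts with the error.
// It allows at most repairAttempts re-prompts after the initial reply.
func Repair(ask Ask, prompt string, repairAttempts int) (Action, error) {
	raw := ask(prompt)
	var lastErr error
	for attempt := 0; ; attempt++ {
		a, err := parse(raw)
		if err == nil {
			err = validate(a)
		}
		if err == nil {
			return a, nil
		}
		lastErr = err
		if attempt == repairAttempts {
			break
		}
		raw = ask(fmt.Sprintf("%s\n\nYour last reply was rejected: %v. Reply with valid JSON.", prompt, err))
	}
	return Action{}, fmt.Errorf("gave up after %d repair attempts: %w", repairAttempts, lastErr)
}

func main() {
	// Stub model: unparseable reply, then invalid action, then a good one.
	replies := []string{`not json`, `{"args":{}}`, `{"tool":"search","args":{"q":"harbor"}}`}
	i := 0
	stub := func(string) string { r := replies[i]; i++; return r }

	a, err := Repair(stub, "pick a tool", 2)
	fmt.Println(a.Tool, err) // search <nil>
}
```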

Apache-2.0 licensed — see LICENSE.