Harbor — Master Phase Plan
How to read this file
This is the canonical execution index for Harbor's V1 build. Every individual phase plan (docs/plans/phase-NN-<slug>.md) lives under it and inherits its done-definition, dependency declarations, and coverage discipline.
- Source of truth: /RFC-001-Harbor.md (referenced as RFC §X.X). Every phase below traces to one or more RFC sections; if a phase plan and the RFC drift, the RFC wins (AGENTS.md §2).
- Research substrate: the eleven briefs in docs/research/01..11.md (canonical index: docs/research/INDEX.md). Decisions on shape, sharp edges, and Go-flavored types come from there.
- Numbering: phase-NN-<slug>.md, two-digit zero-padded; lettered suffixes (26a, 33a, 36a, 36b, 53a, 64a, 83a–83e, 85a–85j, 85k, 85m) insert work into an existing band without renumbering. Phases 01–82 + 26a + 33a + 36a + 36b + 53a + 64a are V1; 83–100 + 83a–e + 85a–j + 85k + 85m are post-V1 follow-ups listed for completeness so we don't lose track. The integer phase 85 (Skills Portico provider driver) was removed: Portico is an MCP gateway and speaks MCP like any server, so the generic MCP client driver is its consumer; the 85-band is now MCP client/host compliance (85a–j + 85m; 85k is Harbor agent-builder skills). Per the MCP 2026-07-28 RC re-plan (2026-05-28), phases 85c / 85e / 85h / 85i are Cut and 85m is new; see the 85-band detail block. See brief 14.
- Done-definition (binding, from AGENTS.md §4.2): (a) all acceptance criteria pass; (b) coverage targets met; (c) scripts/smoke/phase-NN.sh shows OK ≥ count(criteria) and FAIL = 0; (d) prior phases' smoke scripts still pass.
- Coverage defaults (override per phase): 80% for new packages; 85% for persistence drivers and conformance-tested subsystems; 70% for CLI/tooling.
- Predecessor name: does not appear in this repository, ever. (AGENTS.md §13.)
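
The done-definition and the coverage defaults above can be read as a single gate. The following is a minimal sketch of that gate in Go, not Harbor's actual tooling: the `Kind`, `PhaseResult`, `DefaultCoverage`, and `Done` names are hypothetical and exist only for illustration. It assumes the smoke script's OK and FAIL counts have already been parsed into integers.

```go
package main

import "fmt"

// Kind is a hypothetical classification used to pick a default coverage target.
type Kind int

const (
	NewPackage Kind = iota
	PersistenceOrConformance
	CLITooling
)

// DefaultCoverage returns the default target in percent, as listed above.
// A phase may override it.
func DefaultCoverage(k Kind) float64 {
	switch k {
	case PersistenceOrConformance:
		return 85
	case CLITooling:
		return 70
	default:
		return 80
	}
}

// PhaseResult is a hypothetical summary of one phase's verification run.
type PhaseResult struct {
	CriteriaTotal  int     // number of acceptance criteria in the phase plan
	CriteriaPassed int     // how many of them pass
	Coverage       float64 // measured coverage, percent
	CoverageTarget float64 // target for this phase, percent
	SmokeOK        int     // OK count reported by the phase smoke script
	SmokeFail      int     // FAIL count reported by the phase smoke script
	PriorSmokePass bool    // whether all earlier phases' smoke scripts still pass
}

// Done applies conditions (a) through (d) and returns one reason per
// condition that is not met.
func Done(r PhaseResult) (bool, []string) {
	var reasons []string
	if r.CriteriaPassed < r.CriteriaTotal {
		reasons = append(reasons, "(a) not all acceptance criteria pass")
	}
	if r.Coverage < r.CoverageTarget {
		reasons = append(reasons, "(b) coverage target not met")
	}
	if r.SmokeOK < r.CriteriaTotal || r.SmokeFail != 0 {
		reasons = append(reasons, "(c) smoke script needs OK >= count(criteria) and FAIL = 0")
	}
	if !r.PriorSmokePass {
		reasons = append(reasons, "(d) a prior phase's smoke script fails")
	}
	return len(reasons) == 0, reasons
}

func main() {
	r := PhaseResult{
		CriteriaTotal: 4, CriteriaPassed: 4,
		Coverage: 86.2, CoverageTarget: DefaultCoverage(PersistenceOrConformance),
		SmokeOK: 4, SmokeFail: 0, PriorSmokePass: true,
	}
	ok, reasons := Done(r)
	fmt.Println(ok, reasons)
}
```

Returning the unmet conditions rather than a bare boolean keeps the gate's output aligned with the lettered clauses, so a failing phase can be reported against the clause it violates.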
Phase index
| # | Name | Subsystem | RFC § | Deps | Cov. | Status |
|---|---|---|---|---|---|---|
| 00 | Skeleton | repo / hygiene | n/a | — | n/a | Shipped |
| 01 | Identity & isolation triple | identity | §4 | 00 | 90% | Shipped |
| 02 | Configuration loader | config | §10 | 00 | 85% | Shipped |
| 03 | Audit redactor | audit | §6.4, §6.15 | 00 | 90% | Shipped |
| 04 | slog Logger + standard attribute set | telemetry | §6.14 | 03 | 85% | Shipped |
| 05 | Event taxonomy + InMem EventBus + isolation | events | §6.13 | 01, 03 | 85% | Shipped |
| 06 | Bus replay + ring buffer + cursor | events | §6.13 | 05 | 85% | Shipped |
| 07 | StateStore iface + InMem + conformance suite | state | §6.11, §9 | 01, 03 | 85% | Shipped |
| 08 | SessionRegistry + lifecycle + GC | sessions | §6.9 | 01, 07 | 85% | Shipped |
| 09 | Envelopes, Headers, Identity quadruple | runtime/messages | §6.1 | 01, 08 | 85% | Shipped |
| 10 | Engine + workers + cycle detection | runtime/engine | §6.1 | 09 | 85% | Shipped |
| 11 | Reliability shell (timeout/retry/validate) | runtime/engine | §6.1 | 10 | 85% | Shipped |
| 12 | Streaming + per-run capacity backpressure | runtime/streaming | §6.1 | 10, 11 | 85% | Shipped |
| 13 | Cancellation + per-run fetch dispatcher | runtime/engine | §6.1 | 10, 12 | 85% | Shipped |
| 14 | Routers + concurrency utils + subflows | runtime/routers | §6.1 | 10, 11 | 85% | Shipped |
| 15 | SQLite StateStore driver | state/sqlite | §6.11, §9 | 07 | 90% | Shipped |
| 16 | Postgres StateStore driver | state/postgres | §6.11, §9 | 07 | 90% | Shipped |
| 17 | ArtifactStore iface + InMem + FS drivers | artifacts | §6.10, §9 | 01, 07 | 85% | Shipped |
| 18 | ArtifactStore SQLite-blob + Postgres-blob | artifacts | §6.10, §9 | 17, 15, 16 | 85% | Shipped |
| 19 | ArtifactStore S3-style driver | artifacts | §6.10 | 17 | 80% | Shipped |
| 20 | TaskRegistry iface + InProcess + lifecycle | tasks | §6.8 | 01, 07 | 85% | Shipped |
| 21 | TaskGroup + retain-turn + patches | tasks | §6.8 | 20 | 85% | Shipped |
| 22 | MessageBus + RemoteTransport contracts | distributed | §6.12 | 09, 20 | 85% | Shipped |
| 23 | MemoryStore iface + InMem + conformance | memory | §6.6 | 01, 07 | 85% | Shipped |
| 24 | Memory strategies (truncation, summary) | memory | §6.6 | 23 | 85% | Shipped |
| 25 | SQLite + Postgres memory drivers | memory | §6.6, §9 | 23, 15, 16 | 90% | Shipped |
| 25a | Durable memory strategies (truncation + rolling_summary on SQL drivers; Summarizer through memory.Open) | memory | §6.6, §9 | 23, 24, 25, 15, 16 | n/a | Shipped (V1.1.x) |
| 26 | Tool catalog core + InProcess registration | tools | §6.4 | 01, 05, 09 | 85% | Shipped |
| 26a | Flow-as-Tool registration + per-flow Budget | runtime/flow + tools | §6.1, §6.4 | 14, 26 | 85% | Shipped |
| 26b | Per-MCP-server + per-tool tool-policy config (policy: / tool_policies:) | tools + config | §6.4 | 26, 28 | n/a | Shipped (V1.1.x) |
| 27 | HTTP tool driver | tools/http | §6.4 | 26 | 85% | Shipped |
| 28 | MCP southbound driver | tools/mcp | §6.4 | 26 | 80% | Shipped |
| 29 | A2A southbound driver (full spec) | tools/a2a | §6.4 | 26, 22 | 80% | Shipped |
| 30 | Tool-side OAuth + HITL via pause/resume | tools/auth | §6.4, §3.3 | 26, 50, 53a | 85% | Shipped |
| 31 | Tool-side approval gates | tools/approval | §6.4, §3.3 | 30 | 80% | Shipped |
| 32 | LLM client core + StreamSink contract | llm | §6.5 | 09 | 85% | Shipped |
| 33 | bifrost integration | llm | §6.5, §11Q3 | 32 | 80% | Shipped |
| 33a | Custom OpenAI-compatible providers + timeouts | llm | §6.5 | 33 | 80% | Shipped |
| 34 | Provider correction layer (one mode, baked) | llm | §6.5 | 33 | 85% | Shipped |
| 35 | Structured output strategies + downgrade | llm | §6.5 | 33, 34 | 85% | Shipped |
| 36 | Retry with feedback | llm | §6.5 | 35 | 85% | Shipped |
| 36a | Cost accumulator + per-identity ceilings | governance | §6.15 | 11, 15, 33 | 85% | Shipped |
| 36b | Per-identity rate limits + per-call MaxTokens | governance | §6.15 | 36a | 85% | Shipped |
| 37 | Skill store + LocalDB driver + FTS5 ladder | skills | §6.7 | 01, 07, 15 | 85% | Shipped |
| 38 | Skill planner tools (search/get/list) | skills/tools | §6.7 | 26, 37 | 85% | Shipped |
| 39 | Virtual directory subsystem | skills | §6.7 | 37 | 80% | Shipped |
| 40 | Skills.md importer (gap-closer) | skills/importer | §6.7 | 37 | 90% | Shipped |
| 41 | In-runtime skill generator with persistence | skills/generator | §6.7 | 37, 38, 03 | 90% | Shipped |
| 42 | Planner iface + Decision sum + RunContext | planner | §6.2, §3.2 | 09, 13, 26, 32 | 90% | Shipped |
| 43 | Trajectory + serialise (fail-loudly contract) | planner/trajectory | §6.2, §3.4 | 42, 07 | 90% | Shipped |
| 44 | Schema repair pipeline | planner/repair | §6.2 | 42, 32 | 85% | Shipped |
| 45 | Reference ReAct planner (minimum viable) | planner/react | §6.2 | 42, 43, 44, 32 | 85% | Shipped |
| 46 | Trajectory compression / summariser | planner | §6.2 | 43, 32 | 80% | Shipped |
| 47 | Parallel-call exec + ReAct emission upgrade | planner+runtime | §6.2 | 45, 14, 42, 20, 21 | 85% | Shipped |
| 48 | Deterministic planner (proves the iface) | planner/deterministic | §6.2, §11Q6 | 42 | 85% | Shipped |
| 49 | Planner conformance pack | planner | §6.2 | 42, 45, 48 | 90% | Shipped |
| 50 | Pause/Resume Coordinator + handle registry | runtime/pauseresume | §6.3, §3.3 | 07, 09, 13 | 90% | Shipped |
| 51 | Pause-state serialise contract (fail-loud) | runtime/pauseresume | §6.3, §3.4 | 50, 43 | 90% | Shipped |
| 52 | Steering inbox + control taxonomy | runtime/steering | §6.3 | 50, 05 | 85% | Shipped |
| 53 | Steering wiring (9 control events) | runtime/steering | §6.3 | 52, 13 | 85% | Shipped |
| 53a | Agent Registry (registration identity + IDs) | runtime/registry | §6.16, §7 | 01, 05, 07, 08 | 85% | Shipped |
| 54 | Protocol task control surface | protocol | §5.2, §6.3 | 50, 53, 20 | 85% | Shipped |
| 55 | OTel traces + propagation conventions | telemetry | §6.14 | 04, 05 | 85% | Shipped |
| 56 | Metrics + OTLP + Prometheus drivers | telemetry | §6.14, §11Q5 | 55, 05 | 85% | Shipped |
| 57 | Durable event log driver (StateStore-backed) | events | §6.13 | 05, 07, 15, 16 | 85% | Shipped |
| 58 | Protocol types/methods/errors single source | protocol | §5, §8 | 01 | 90% | Shipped |
| 59 | Protocol versioning + deprecation policy | protocol | §5.3 | 58 | 85% | Shipped |
| 60 | Protocol wire transport (SSE + REST) | protocol | §5.4, §11Q1 | 58, 05 | 85% | Shipped |
| 61 | Protocol auth + identity-scope enforcement | protocol | §5.5, §4 | 58, 60, 01 | 90% | Shipped |
| 62 | Protocol conformance suite | protocol | §5 | 58, 60, 61 | 85% | Shipped |
| 63 | Harbor CLI skeleton (harbor + cobra) | cmd/harbor | §8 | 60 | 70% | Shipped |
| 64 | harbor dev v1 (boot runtime + protocol) | cmd/harbor | §8 | 63, 60 | 75% | Shipped |
| 64a | Tool catalog OAuth + approval wiring | tools/catalog | §6.4 | 26, 30, 31, 50, 64 | 80% | Shipped |
| 65 | harbor dev hot-reload | cmd/harbor | §8 | 64 | 75% | Shipped |
| 66 | harbor dev draft-save scaffolding | cmd/harbor | §8 | 64 | 75% | Shipped |
| 67 | harbor scaffold | cmd/harbor | §8 | 63 | 70% | Shipped |
| 68 | harbor validate | cmd/harbor | §8 | 63, 02 | 75% | Shipped |
| 69 | harbor inspect-events / inspect-runs | cmd/harbor | §8 | 63, 60 | 70% | Shipped |
| 70 | harbor inspect-topology (ASCII renderer) | cmd/harbor | §8 | 63, 60 | 70% | Shipped |
| 71 | harbortest test kit package | testing | §6.13 | 05, 09, 07 | 85% | Shipped |
| 72 | Console subscription protocol surface | protocol | §5.2, §7 | 60, 05, 06 | 85% | Shipped |
| 72a | events.subscribe filter ext + events.aggregate | protocol+events | §5.2, §6.13 | 60, 61, 72 | 85% | Shipped |
| 72b | IdentityScope admin-impersonation extension | protocol | §5.5, §7 | 60, 61 | 89% | Shipped |
| 72c | search.* cluster (5 methods) | protocol+search | §5.2, §7 | 60, 61, 08, 20, 05 | 85% | Shipped |
| 72d | notification.* event topic + mapper | protocol+events | §5.2, §6.13 | 05, 06, 20 | 85% | Shipped |
| 72e | pause.list snapshot Protocol method | protocol | §5.2, §6.3 | 50, 60, 61, 17 | 90% | Shipped |
| 72f | Runtime posture surface (runtime.*/metrics.snapshot) | protocol | §5.3, §6.15, §7 | 60, 61, 56 | 85% | Shipped |
| 72g | governance.posture + llm.posture | protocol | §5.5, §6.15 | 36a, 36b, 64, 72f | 85% | Shipped |
| 72h | Console DB local schema + SvelteKit scaffold | web/console | §7 | 60 | 85% | Shipped |
| 73 | Console state inspection surface | protocol | §5.2, §7 | 60, 07, 17 | 85% | Shipped* |
| 73l | Console Artifacts page | web/console | §5.2, §6.10, §7 | 73, 75 | 80% | Shipped |
| 73i | Console Flows page (Protocol + UI) | protocol+web/console | §5.2, §6.1, §7 | 73, 75, 26a | 85% | Shipped |
| 73g | Console Events page | web/console | §5.2, §6.13, §7 | 72a, 73, 75 | 80% | Shipped |
| 73c | Console Sessions page (Protocol + UI) | protocol+web/console | §5.2, §6.9, §7 | 08, 60, 61, 72a, 72b, 72c, 75 | 80% | Shipped |
| 74 | Console topology projection events | protocol | §5.2, §6.13 | 05, 09 | 85% | Shipped |
| 75 | Console e2e Playwright harness baseline | testing | §7 | 60, 72 | n/a | Shipped |
| 73k | Console MCP Connections page | web/console | §6.4, §7 | 28, 30, 50, 60, 61, 64a, 72a, 75 | 80% | Shipped |
| 73d | Console Tasks page (kanban + bulk control) | protocol+web/console | §5.2, §6.8, §7 | 20, 21, 54, 60, 61, 72c, 75 | 85% | Shipped |
| 73b | Console Live Runtime page (Protocol + UI) | protocol+web/console | §5.2, §6.3, §6.13, §7 | 60, 61, 72a, 73, 73i, 74, 75 | 85% | Shipped |
| 73n | Console Playground page (Protocol + UI) | protocol+web/console | §5.1, §6.4, §6.13, §7 | 54, 60, 61, 72b, 73l, 74, 75 | 85% | Shipped |
| 73a | Console Overview page (composition-only UI) | web/console | §5.2, §6.13, §6.15, §7 | 54, 60, 61, 72a, 72e, 72f, 73d, 75 | 70% | Shipped |
| 73m | Console Settings page + harbor console subcommand | protocol+web/console+cmd | §5.3, §5.5, §6.15, §7 | 72d, 72f, 72g, 72h, 75 | 75% | Shipped |
| 75a | Console e2e Playwright wave-end suite | testing | §7 | 75, 73a-73n | n/a | Shipped |
| 76 | Cross-tenant isolation conformance harness | testing | §4.3 | 07, 17, 23, 37, 20 | 95% | Shipped |
| 77 | Goroutine leak conformance harness | testing | §5(Go) | 10, 13, 50 | n/a | Shipped |
| 78 | Chaos / fault injection harness | testing | n/a | 76, 77 | n/a | Shipped |
| 79 | Performance benchmarks | testing | n/a | 10, 12, 05 | n/a | Shipped |
| 80 | Documentation hygiene polish (godoc, recipes) | docs | §2 | all V1 | n/a | Shipped |
| 81 | Release engineering (versioning, changelog) | release | §12 | all V1 | n/a | Shipped |
| 82 | V1 cut | release | §1, §12 | 81 | n/a | Shipped |
| 83 | Auto-sequence detection (planner opt.) | planner | §12 | 45 | n/a | Post-V1 |
| 83a | ReAct prompt structured sections | planner/react | §6.2 | 45 | 85% | Shipped |
| 83b | ReAct tool schema injection (catalog rendering) | planner/react | §6.2, §6.4 | 83a, 26 | 85% | Shipped |
| 83c | ReAct dynamic repair guidance + planning hints | planner/react | §6.2 | 83a, 44, 05 | 85% | Shipped |
| 83d | ReAct skills + memory injection (UNTRUSTED) | planner/react | §6.2, §6.6 | 83a, 23, 37 | 85% | Shipped |
| 83e | ReAct reasoning channel decoupling | planner/react+llm | §6.2, §6.5 | 45, 32, 33, 44 | 90% | Shipped |
| 83f | Dev RunLoop populates 83-band RunContext | runtime/dev | §6.2, §6.6 | 83c, 83d, 23, 37, 20 | 80% | Shipped |
| 83g | MCP southbound consumer in harbor dev | runtime/dev | §6.4 | 28, 26 | 80% | Shipped |
| 83h | Dev-binary fixes (hot-reload sqlite + LLM Model) | runtime/dev + llm | §6.5, §8 | 83g, 64, 32 | 80% | Shipped |
| 83i | RunContext wiring closure (Catalog/Trajectory/Memory/Emit) | runtime/dev + steering | §6.2, §6.6, §6.8 | 83f, 83g, 83h, 26, 23 | 80% | Shipped |
| 83n | harbor init + tiered yaml + docs/CONFIG.md + built-in tools | cli + tools/builtin | §8, §6.4 | 67, 63, 26 | 85% | Shipped |
| 83o | scaffold reads operator yaml + per-custom-tool Go stubs + --patch | cli/scaffold + config | §8, §6.4 | 67, 83n, 26 | 85% | Shipped |
| 83l | real-bifrost integration tests + snapshot CustomProviders bug fix | test/integration + cli | §6.5 | 33, 33a, 45, 83h, 83i | 80% | Shipped |
| 83m | WARN cleanup band (MCP push id, sqlite watcher, closers, skills kw, llm timeout, scopes, tool_count, reasoning) | cmd/harbor + mcp + llm + tasks + steering + planner | §6.2, §6.4, §6.5, §6.8, §8 | 83g, 83h, 83i, 83l | 85% | Shipped |
| 83k | Console release embed (make build + release pipeline rebuild Console; staleness gate; placeholder copy) | cmd/harbor + Makefile + release pipeline | §5, §8 | 73m, 81, 83n | n/a | Shipped |
| 83p | Settings two-group layout (console-local always; runtime-posture wrapped) — closes walkthrough F1 | web/console + Settings page | §5, §8 | 73m, 73p | n/a | Shipped |
| 83q | Playground sidebar nav + breadcrumb case — closes walkthrough F2 + N1 | web/console + (console) layout | §5, §7 | 73n | n/a | Shipped |
| 83r | Disconnected-state hygiene + isDisconnected() predicate — closes W1/W2/W3 + N4/N5/N8/N9/N10 | web/console (cross-page) | §5 | 73m, 73p, 83p | n/a | Shipped |
| 83s | Saved-views label "Save view" + per-page footer dedup — closes N2 + N7 | web/console (cross-page) | §5 | 73m, 73p | n/a | Shipped |
| 83u | Console DB chicken-and-egg fix — attachConnection() + best-effort DB upsert (closes round-2 F3) | web/console + Settings page | §5, §7 | 73m, 73p, 83p | n/a | Shipped |
| 83v | Runtime CORS allowlist — default-deny + per-origin echo + dev-only escape (closes round-2 F4) | internal/protocol/transports + config + cmd/harbor | §5, §7 | 60 | 90% | Shipped |
| 83w | Wire-surface gaps — friendly unknown_method info banner (F5) + mcp.servers.list (F6) | web/console + cmd/harbor + mcpconsole | §5, §6.4, §7 | 83g, 83m, 73k | n/a | Shipped |
| 83x | Real-data layout polish — W4-W11 + N11-N14 (incl. W6 created_at + W8 session-row Go fixes) | web/console + cmd/harbor + internal/protocol/artifacts | §5, §6.4, §6.6, §6.10, §6.13 | 73m, 73p, 83i, 83m | n/a | Shipped |
| 84 | Reflection / critique loop | planner | §12 | 45 | n/a | Post-V1 |
| 84a | Runtime-capability gate + session aggregates (round-8 F1+F8 closeout) | internal/protocol + web/console | §5.3, §6.4, §7 | 72f, 73c, 73d, 72b, 83w | 90% | Shipped |
| 84b | Multimodal attachment disposition policy (mechanism→policy; default ref) | internal/planner + internal/config + internal/protocol + web/console | §6.4, §6.5, §6.10 | F11/D-166, 107c | n/a | Shipped (V1.1.x) |
| 84c | Provider-native multimodal mechanism (image/audio/video first, files/PDF last; opt-in via 84b) | llm/drivers/bifrost + planner | §6.5, §6.10, §11Q3 | 84b, 107, 32 | n/a | Shipped (V1.1.x) |
| 84d | Embedding client (Embedder→bifrost) + semantic memory & skill retrieval (opt-in) | internal/embeddings + internal/memory + internal/skills | §6.5, §6.6, §6.7 | 32, 23, F11/84b | n/a | Shipped (V1.1.x) |
| 84e | Semantic memory consumption in the run loop (SearchTurns recall → <read_only_external_memory>; opt-in via 84d) | internal/runtime/runctx + internal/memory + internal/config + cmd/harbor + harbortest | §6.2, §6.5, §6.6 | 84d, 83d, 83f, 110b, 110c, 107c | n/a | Shipped (V1.1.x) |
| 85a | MCP client core-compliance fixes (roots-empty now permanent) | tools/mcp | §6.4 | 28 | 85% | Ready now |
| 85b | MCP HTTP OAuth (RFC 9728 + 8707 + RC auth SEPs) | tools/mcp+auth | §6.4, §3.3 | 28, 30, 50 | 85% | Ready now (scope ↑) |
| 85c | | tools/mcp+llm | §6.4, §6.5 | 28, 32, 50 | n/a | Cut: RC deprecates sampling |
| 85d | MCP elicitation provider (RC InputRequiredResult shape) | tools/mcp | §6.4, §3.3 | 28, 50, 85m | 85% | Revisit after SDK-RC |
| 85e | | tools/mcp | §6.4 | 28, 85a | n/a | Cut: RC deprecates roots |
| 85f | MCP remaining server features (sans logging) | tools/mcp | §6.4 | 28, 85a | 85% | Ready now (slim) |
| 85g | ui:// renderer) | web/console | §6.4, §7 | 28, 85a | — | Deprecated → superseded by 109a–c (D-172) |
| 85h | | tools/mcp | §6.4 | 28 | n/a | Cut: RC redesigns Tasks |
| 85i | | tools/mcp | §6.4 | 85h, 28 | n/a | Cut: RC redesigns Tasks |
| 85j | MCP client conformance + compliance statement (target: RC) | tools/mcp + docs | §6.4 | 85a, 85b, 85d, 85f, 85g, 85m | 85% | Revisit after RC-final |
| 85m | MCP 2026-07-28 RC adoption (sessions, headers, errors, schema, cache, trace) | tools/mcp | §6.4 | 28, 85a | 85% | Revisit after SDK-RC |
| 85k | Harbor agent-builder skills (adoption surface, ~10 SKILL.md playbooks; MCP wiring is one of them) | docs/skills + scripts | §1, §7, §6.4 | V1.1 closure, 85a (for the MCP skill), sibling Dockyard skills/ | n/a | Pending (V1.1.x) |
| 86 | Durable distributed bus driver | distributed | §6.12, §12 | 22 | 85% | Shipped |
| 87 | Durable TaskService backend | tasks | §6.8, §12 | 20, 22 | 85% | Shipped |
| 86a | Distributed task dispatcher (the MessageBus consumer) + multi-worker deployment | distributed + tasks + runtime | §6.8, §6.12, §12 | 86, 87 | 85% | Post-V1 |
| 88 | Episodic memory tier | memory | §6.6, §11Q4 | 24, 25 | n/a | Post-V1 |
| 89 | A2A northbound (Harbor as A2A server) | tools/a2a | §6.4, §11Q2 | 29 | n/a | Post-V1 |
| 90 | Additional planner concretes | planner | §12 | 49 | n/a | Post-V1 |
| 91 | Console-driven key rotation (Protocol) | governance | §6.15 | 36a, 60, 73 | n/a | Post-V1 |
| 92 | Console-driven mid-session model swap | governance | §6.15 | 36a, 60, 73 | n/a | Post-V1 |
| 92a | Agent-config control plane (extends 91/92) | governance/agentcfg | §6.15, §6.16 | 86, 87, 92, 53a, 37, 110a, 109i | n/a | Post-V1 |
| 93 | Failover chains as Harbor policy | governance | §6.15 | 36a, 33 | n/a | Post-V1 |
| 94 | Provider circuit breakers (provider, key) | governance | §6.15 | 33, 93 | n/a | Post-V1 |
| 95 | LLM cache (exact-match + semantic) | governance/cache | §6.15 | 33 | n/a | Post-V1 |
| 96 | PII redaction at the LLM boundary | audit | §6.15 | 03, 33 | n/a | Post-V1 |
| 97 | Media-input tool wrappers | tools/media | §6.5, D-021 | 17, 26, 33 | n/a | Post-V1 |
| 98 | Media-output tool wrappers | tools/media | §6.5, D-021 | 17, 26, 33 | n/a | Post-V1 |
| 99 | Vision-aware memory summarization | memory | §6.6, D-021 | 24, 33, 97 | n/a | Post-V1 |
| 100 | Recipe loader (declarative YAML flows) | runtime/flow/recipe | §6.1, D-023 | 26a | n/a | Post-V1 |
| 101 | GitHub Actions Node 24 modernisation | .github/workflows | §12 | 81 | n/a | Shipped (V1.1.x) |
| 102 | Godoc hygiene — strip internal phase jargon | internal/ + cmd/ | §1, §12 | (none hard) | n/a | Shipped (V1.1.x) |
| 103 | GitHub Pages docs site (Dockyard parity) | docs/site + workflows | §1, §7, §12 | 85k (102 soft — see D-208) | n/a | Shipped (V1.3) |
| 104 | Composable resilient flows — value proposition | RFC §1 + README + docs/skills | §1, §6.1 | 85k | n/a | Pending (V1.1.x) |
| 105 | Console first-attach UX (zero-clicks-to-attached) | web/console + cmd/harbor + internal/server | §1, §7 | 85k, 73m | n/a | Shipped |
| 106 | Playground displays the real assistant response | cmd/harbor + internal/tasks + web/console | §1, §6.5, §7 | 73 | n/a | Shipped |
| 107 | Streaming completion pipeline (bifrost → events bus → Playground) | internal/llm + internal/planner + cmd/harbor + web/console | §1, §6.5, §7 | 106, 105, 84b, 83e | n/a | Shipped |
| 107a | Reasoning trace projection (tasks.get enricher + Playground accordion) | internal/protocol + internal/tasks + cmd/harbor + web/console | §1, §6.5, §6.8, §7 | 73d, 83e, 106 | n/a | Shipped |
| 107b | Streaming answer extractor (React planner streamAnswerFilter) | internal/planner/react | §1, §6.2, §6.5, §7 | 107, 83a, 83b, 83c, 83d, 83e | n/a | Superseded by 107c (not shipped) |
| 107c | Native tool-calling + deferred tools/skills + search meta-tools (alt to 107b — collapses Path B into one wave) | internal/llm + internal/tools + internal/planner/react + internal/config + cmd/harbor | §1, §6.2, §6.4, §6.5, §6.7, §7 | 107, 83a, 83b, 83c, 83d, 83e, 83n, 37, 26, 32, 33, 33a | n/a | Shipped |
| 107d | Native parallel tool-calls (dev executor CallParallel branch + React CallParallel emission + default flip; closes 107c's serialization carve-out) | cmd/harbor + internal/runtime/parallel + internal/planner/react + internal/config | §6.2, §6.5 | 107c, 47, 83i | n/a | Shipped (V1.1.x) |
| 107e | SpawnTask + AwaitTask dev-executor dispatch (background-task execution; closes the last ErrDecisionShapeUnsupported carve-out) | cmd/harbor + internal/config | §6.2, §6.5, §6.8 | 107c, 47, 83i, 83f | n/a | Pending (V1.1.x) |
| 107f | Session artifact manifest (read-only <session_artifacts> prompt block + provenance canonicalisation) | internal/planner + cmd/harbor + internal/protocol + internal/runtime/flow | §6.2, §6.4, §6.5 | 107c, 17, 33 | n/a | Shipped (V1.1.x) |
| 108 | Playground page polish + Console shell layout (first of 14 page-polish phases) | web/console | §1, §7 | 73n, 105, 106 | n/a | Pending (V1.1.x) |
| 109a | MCP Apps runtime + Protocol surface (_meta.ui.resourceUri parse, ui:// projection, mcp.servers.read_resource, real DisplayMode negotiation, app-tool-call proxy) | internal/tools/drivers/mcp + internal/protocol + cmd/harbor | §6.4, §6.5, §7 | 28, 85a, 84a | n/a | Shipped (V1.1.x) |
| 109b | Console MCP Apps host (sandboxed iframe + CSP + official AppBridge in manual-handler mode + inline DisplayMode) | web/console | §6.4, §7 | 109a, 73n, 108 | n/a | Shipped (V1.1.x) |
| 109c | MCP Apps DisplayMode layout (fullscreen tab + pip 50/50 split + rail toggle) | web/console | §7 | 109b | n/a | Shipped (V1.1.x) |
| 109d | Inline MCP-app discovery (mcp.app_available event + MCPAppRef.server_id + ChatMessage app ref + MessageBubble renderer mount) | internal/tools/drivers/mcp + internal/protocol + web/console | §6.4, §6.5, §7 | 109a, 109b, 109c | 85% | Shipped (V1.1.x) |
| 109e | MCP App discovery reads the tool-DEFINITION _meta.ui (spec-conformance fix — discovery fires against real ext-apps servers; live-test-found) | internal/tools/drivers/mcp | §6.4, §6.5, §7 | 109a, 109d | 85% | Shipped (V1.1.x) |
| 109f | Render heavy MCP App documents (fetch the offloaded artifact) + operator "pop to side-by-side" affordance (live-test-found) | web/console | §6.4, §6.5, §7 | 109a, 109b, 109c, 109d | n/a (Console) | Shipped (V1.1.x) |
| 109g | MCP App documents render inline on every artifact driver (read_resource scopes the heavy threshold out of ui:// app docs; live-test-found) | internal/mcpconsole | §6.5, §7 | 109a | 55% | Shipped (V1.1.x) |
| 109h | MCP Apps UI-host capability advertisement (the driver advertises io.modelcontextprotocol/ui displayModes on the initialize handshake — the write side of negotiateDisplayModes — preserving roots) | internal/tools/drivers/mcp + internal/config | §6.4, §7 | 109a | n/a | Shipped (V1.1.x) |
| 109i | MCP Apps tool-context capture + mcp.apps.tool_context (the Data-Delivery backend — capture input+result behind a declared ui:// app, identity-scoped read) | internal/tools/drivers/mcp + internal/mcpconsole + internal/protocol + cmd/harbor | §6.4, §6.5, §7 | 109a, 109d, 109g | 85% | Shipped (V1.1.x) |
| 109j | Console pushes tool-input/tool-result into the app after ui/initialize (official AppBridge sendToolInput/sendToolResult) — the Data Delivery Console half | web/console | §6.4, §7 | 109i, 109b | n/a (Console) | Reverted (#346 — handshake regression; re-land #347) |
| 109k | MCP Apps spec-conformance hardening (wave-end audit fixes: mimeTypes UI capability not displayModes; server-namespaced app→host tool calls; size-changed + teardown + live-theme host obligations) | internal/tools/drivers/mcp + internal/mcpconsole + internal/protocol + web/console | §6.4, §7 | 109a, 109b, 109h, 109i, 109j | 100% | Shipped (V1.1.x) |
| 110a | Tool-executor promotion (internal/runtime/dispatch + exported answer envelope + tools.NewPlannerView; devstack degraded executor deleted) | internal/runtime/dispatch + internal/planner + internal/tools + cmd/harbor + harbortest | §6.4, §6.5, §6.2 | D-192 fix, 107d, 107e, 83i | 85% | Shipped (V1.1.x) |
| 110b | RunContext population + event-closure promotion (internal/runtime/runctx + events.IdentityStampingEmitter + llm.NewChunkPublisher; devstack Emit/OnChunk/envelope parity) | internal/runtime/runctx + internal/events + internal/llm + cmd/harbor + harbortest | §6.2, §6.5, §6.13 | 110a, 83f, 83i, 83m, 107 | 90% | Shipped (V1.1.x) |
| 110c | Config-projection exporters (five FromConfig + config.Defaults() + ValidateCore + internal/drivers/prod aggregator; fixes live devstack planner drift B3) | internal/llm + internal/memory + internal/skills + internal/planner + internal/governance + internal/config + internal/drivers/prod | §6.5, §6.6, §6.7, §9, §10 | 83l, 83f, 107d, 107e | 95% | Shipped (V1.1.x) |
| 110d | Assembly promotion (exported error-returning assemble.Assemble + MCP attach + auth.BuildProviders + events.OpenWith; D-094 mirror collapses to thin callers; headless recipe) | internal/runtime/assemble + tools/mcp + tools/auth + internal/events + cmd/harbor + harbortest | §6.4, §6.13, §9, §10 | 110a, 110b, 110c, 64, 83g, 30, 57 | 80% | Shipped (V1.1.x) |
| 111a | Governance enforcement assembly (identity_tiers actually enforce; SetFactory's first production caller) | internal/governance + cmd/harbor + harbortest | §6.15, §6.5, §6.11 | 32, 36a, 36b, 110c (soft) | 90% | Shipped (V1.1.x) |
| 111b | Tool-OAuth completion leg (auth.CallbackHandler + full pause→callback→resume choreography E2E) | internal/tools/auth + cmd/harbor | §6.4, §3.3, §6.3 | 30, 50, 31, D-192 fix | 85% | Shipped (V1.1.x) |
| 111c | Durable pauses + pause lifecycle (checkpoint-store wiring, trajectory threading, max-park sweeper → DecisionTimeout's first producer) | internal/runtime/pauseresume + internal/runtime/steering + cmd/harbor | §3.3, §6.3, §6.11 | 50, 51, D-192 fix | 90% | Shipped (V1.1.x) |
| 111d | Skills canonical surface + ingestion (builtin→Phase-38 delegation; harbor skill import/rm; Directory disposition decision) | internal/skills + internal/tools/builtin + cmd/harbor | §6.7, §8 | 37, 38, 39, 40, 41, 107c | 85% | Shipped (V1.1.x) |
| 111e | Trajectory compression consumer (LLM-backed planner.Summariser + RunLoop MaybeCompress + token_budget wiring) | internal/llm/summarizer + internal/planner + internal/runtime/steering + cmd/harbor | §6.2, §6.5 | 46, 35, 107, D-192 fix | 85% | Shipped (V1.1.x) |
| 111f | Telemetry assembly + approval-gate authorizer seam (telemetry.New in production; BridgeBusToTracer; protocolauth out of approval) | internal/telemetry + internal/tools/approval + internal/runtime/steering + cmd/harbor | §6.14, §6.4, §5.1 | 03, 04, 05, 31, 55, 56, D-192 fix | 85% | Shipped (V1.1.x) |
| 112a | The public SDK facade (sdk/ alias-based re-export tree per RFC §3.6) | sdk/ (new top-level) | §3.6, §1 | 110a-d, 111a-f, D-204 | n/a | Shipped (V1.2) |
| 112b | External consumers on the facade + the external-module compile gate (scaffold templates, harbortest vocabulary, recipes/README, the standing gate) | cmd/harbor/scaffold + harbortest + docs | §3.6, §8 | 112a | n/a | Shipped (V1.2) |
| 113a | Protocol adoption track — generated contract reference (cmd/harbor-gen-protocol-docs + protocol-docs-gen-check gate) + the executed quickstart + choreographies 1–3 + nav/README/§18 | cmd/harbor-gen-protocol-docs + docs/site + workflows | §5, §3.6 | 103, 58, 59, 60, 61, 62, 110c | 70% | Shipped |
| 113b | Protocol adoption track — pause + versioning choreographies, build-a-client (worked event-viewer + compile gate), conformance-certification page | docs/site + examples/protocol-clients | §5, §3.3, §6.3 | 113a, 50, 72e, 111b, 111c, 84a | n/a | Shipped |
| 114 | Steering verified-identity authority (control surface derives caller scope + tenant from the verified ctx, not the request body — closes a steering privilege escalation) | internal/protocol | §6.3, §5.5 | 52, 55, 56 | 85% | Shipped (V1.1.x) |
| 115 | Production JWT verification (JWKS) + harbor serve (the JWKSURL/JWKSFile config fields gain a consumer; a production auth path beyond the dev signer) | internal/protocol/auth + cmd/harbor + internal/config | §5.5 | 114, 55, 56 | n/a | Shipped (V1.1.x) |
| 116 | Non-admin session-scoped token contract (lesser-privileged tokens — the steering-authority consumer that makes 114 load-bearing; safe session_user derivation) | internal/protocol/auth + internal/protocol | §5.5, §6.3 | 114, 115 | n/a | Shipped (V1.1.x) |
| 117 | Chat module encapsulation hardening (self-contained theming contract + font-family inheritance + host/theme parameterization per D-091; no Console look-and-feel leakage) | web/console | §7 | 109b, 108 | n/a (Console) | Shipped (V1.1.x) |
| 118 | Protocol TS lockstep gate (cmd/harbor-protocol-ts-lockstep emits a committed wire manifest; protocol-ts-gen-check VERIFIES the hand-maintained per-page TS client field-by-field — D-093's "generate" half deferred, D-223; generator name reserved) | cmd/harbor-protocol-ts-lockstep + web/console + workflows | §5 | 113a | n/a | Shipped (V1.1.x) |
V1 critical path: phases 01–82 + 26a + 36a + 36b (85 phases beyond skeleton). Post-V1 follow-ups: phases 83–84, 86–100, plus the lettered bands 83a–e (ReAct prompt depth + reasoning-channel decoupling) and 85a–j + 85m (MCP client/host compliance, the prioritised first post-V1 work; 85k is the separate Harbor agent-builder skills phase). The integer phase 85 (Skills Portico provider driver) was removed; the 85-band is now MCP compliance. Per the MCP 2026-07-28 RC re-plan (2026-05-28) the 85-band re-shapes: 85a / 85b / 85f are ready now; 85d / 85m revisit after SDK-RC (≈ Aug 2026); 85g / 85j revisit after RC-final (2026-07-28); 85c / 85e / 85h / 85i are cut. Governance is 91–96, Multimodal-output 97–99, Recipe loader 100.

The next release tag is V1.1.x: both the hygiene + positioning + UX band (101–104 + 108) and the Playground-depth band (105 + 106 + 107 + 107a + 107c + 107d) roll up under it; the previously-sketched V1.2 / V1.3 splits collapse. Phases 105–107c ship with this release: Console first-attach UX (105), Playground real assistant response (106), the streaming completion pipeline (107), reasoning trace projection (107a), and native tool-calling + deferred tools/skills + search meta-tools (107c), i.e. the four built-in *_search/*_get meta-tools plus the optional declarative_action escape-hatch tool preserving brief 07's prompt-engineered path for weaker models. The 107b streaming answer extractor was deliberately superseded by 107c (one cutover instead of stop-gap-then-replace); the file at docs/plans/phase-107b-streaming-answer-extractor.md is kept as historical context.

Phase 107d (shipped) is the native-tool-calling follow-up that closes 107c's documented serialization carve-out: it wires the already-shipped internal/runtime/parallel.Executor (Phase 47 / D-056) into the dev ToolExecutor, flips the React planner to native CallParallel emission for N>1 tool-calls, and pins the JoinKind-collapses-to-JoinAll-on-native semantic (D-169).
Phase 107e (pending) closes the last ErrDecisionShapeUnsupported carve-out the dev ToolExecutor carries: it wires planner.SpawnTask + planner.AwaitTask dispatch through the already-shipped tasks.TaskRegistry (Phase 47 / D-056) and teaches the per-task RunLoop driver to drive KindBackground tasks (closing the D-097 dead-task gap for the background kind), bounded by a new planner.absolute_max_spawn_depth recursion cap; on the synchronous V1.1.x runloop a retain-turn spawn blocks in-decision and a non-retain-turn spawn is joined by an explicit AwaitTask (eager push wake-on-resolution is a documented steering-runloop follow-up). SpawnTask + AwaitTask dispatch land together per §13 (D-170). Phase 108 starts a 14-round page-by-page visual-polish series (one phase per Console page, anchored to docs/design/console/page-*.md + docs/design/console/CONVENTIONS.md) and is the largest piece still pending under V1.1.x. Background context for the native-tool-calling cutover: research brief 15. Immediately after Phase 108, the three-phase "MCP Apps host" wave 109a–c lands (D-172): 109a (MCP Apps runtime + Protocol surface — _meta.ui.resourceUri parse, ui:// projection, mcp.servers.read_resource, real DisplayMode negotiation, app-tool-call proxy), 109b (Console sandboxed-iframe host + the official ext-apps AppBridge in manual-handler mode + inline DisplayMode), 109c (fullscreen-tab + pip-split DisplayMode layout). This wave deprecates and supersedes Phase 85g, pulling MCP Apps forward from the post-V1 85-band: Apps is a stable independent extension (io.modelcontextprotocol/ui), not gated on the July RC, and ships an official host bridge that removes 85g's hand-rolled-bridge risk. The architectural invariant is D-173 — the AppBridge runs in manual-handler mode and every app→host call is Protocol-proxied through the Runtime, never a direct MCP connection, so an in-iframe app stays inside the (tenant, user, session) isolation boundary and the unified approval/OAuth gates. 
The 14-round page-polish series continues from the next free integer after the 109 band; the band precedes it in execution order, it does not displace it. Live Runtime reframe (2026-06-01, D-177): after 108d shipped the topology-first Live Runtime page, an operator review found it low-value and Playground-overlapping on the dominant planner/RunLoop runtime (no engine graph). Phase 108e supersedes the topology-first composition (D-126) with a single-runtime capability-adaptive cockpit — the runtime's advertised runtime.info capabilities compose the page (an always-present spine + capability-gated topology / health / cost panels), so it is full on a planner runtime and richer on engine/multi-agent shapes with no rebuild. Plan: docs/plans/phase-108e-live-runtime-capability-cockpit.md. Protocol auth-hardening sequence (114–116, D-219): a planning + adversarial review of the Protocol surface found a steering-control privilege escalation — dispatchControl derived caller scope + tenant from the request body instead of the verified context identity, so a caller could assert scope:"admin" in the body and the cross-tenant gate could never fire. Phase 114 (shipped) closes it: the control surface now reads authority from identity.From(ctx) + the JWT scope claims, fails closed when no verified identity is present, and a non-admin caller can steer only runs it owns (admin for cross-tenant). 114 is the prerequisite hardening for the lesser-privileged-token work: Phase 115 adds production JWKS verification + a harbor serve auth path (giving the inert JWKSURL/JWKSFile config fields a consumer), and Phase 116 introduces the non-admin session-scoped token contract — the consumer that makes 114's derivation load-bearing and the seam where the session_user tier becomes safe to grant. 
Independently, Phase 117 hardens the chat module's encapsulation boundary (D-091) so it renders self-contained — its own theming contract, font-family inheritance, and host/theme parameterization — with no Console look-and-feel leakage, and Phase 118 builds the long-tracked protocol-ts-gen-check gate as a field-level lockstep VERIFICATION of the hand-maintained per-page TS client against a committed, Go-generated wire manifest (cmd/harbor-protocol-ts-lockstep) — a D-093 deviation (D-223): the "generate" half (per-domain generated TS type modules) is a deferred future phase and the cmd/harbor-gen-protocol-ts name stays reserved for it.
Shipped* (Phase 73): the phase was dissolved — its surface was decomposed across the Console page phases that consumed each slice; the methods with no V1 consumer are deferred post-V1. See the Phase 73 detail block and D-133.
Per-phase detail
Format: Phase NN — Name (RFC §X.X). Each entry is the stub the per-PR plan file expands. Acceptance criteria are binding once the phase ships.
01 — Identity & isolation triple (RFC §4)
Goal. Provide the identity package: Identity{TenantID, UserID, SessionID}, From / MustFrom / With(ctx). The triple flows through every layer. Acceptance. MustFrom panics in handler-only paths; From returns ok-bool elsewhere; round-trips through JWT claims and JSON; identity scopes can be derived (admin / console:fleet). Smoke. phase-01.sh asserts the package exists and tests pass; no protocol surface yet. Tests. Unit + property (round-trip). Risks. None significant.
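A minimal sketch of the context plumbing the goal describes, assuming `With` takes the identity as a second argument and that an incomplete triple counts as missing. The `ctxKey` type and the `roundTrip` helper are hypothetical names used only for illustration, not the shipped package surface.

```go
package main

import (
	"context"
	"fmt"
)

// Identity is the isolation triple named in the phase goal.
type Identity struct {
	TenantID  string
	UserID    string
	SessionID string
}

// ctxKey is unexported so no other package can collide with the key (hypothetical name).
type ctxKey struct{}

// With returns a child context carrying id (signature is an assumption).
func With(ctx context.Context, id Identity) context.Context {
	return context.WithValue(ctx, ctxKey{}, id)
}

// From reports the identity and whether a complete triple was present.
func From(ctx context.Context) (Identity, bool) {
	id, ok := ctx.Value(ctxKey{}).(Identity)
	if !ok || id.TenantID == "" || id.UserID == "" || id.SessionID == "" {
		return Identity{}, false
	}
	return id, true
}

// MustFrom panics when the identity is missing; intended for handler-only paths.
func MustFrom(ctx context.Context) Identity {
	id, ok := From(ctx)
	if !ok {
		panic("identity: missing from context")
	}
	return id
}

// roundTrip stores an identity in a fresh context and reads it back.
func roundTrip(id Identity) (Identity, bool) {
	return From(With(context.Background(), id))
}

func main() {
	id, ok := roundTrip(Identity{TenantID: "t1", UserID: "u1", SessionID: "s1"})
	fmt.Println(id.TenantID, ok)
	_, ok = roundTrip(Identity{TenantID: "t1"}) // incomplete triple
	fmt.Println(ok)
}
```

Treating a partially filled triple as absent keeps the ok-bool path fail-closed, which matches the isolation intent stated above.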
02 — Configuration loader (RFC §10)
Goal. YAML + env + flag layering; per-key annotation restart_required vs live; structured validation errors that point to the offending source. Acceptance. Loader returns typed Config; missing required keys fail with file:line; examples/harbor.yaml round-trips. Smoke. harbor validate --config examples/harbor.yaml returns 0 (subcommand auto-skip until phase 68). Tests. Unit on layering precedence; golden tests on validation errors.
03 — Audit redactor (RFC §6.4, §6.15)
Goal. A single audit.Redactor that summarizes/truncates/redacts payloads before persistence or emission. Used by Logger, EventBus persistence, tool audit. Acceptance. Redactor handles nested maps, byte arrays, secret-shaped strings (bearer/api-key/jwt), and oversize payloads; configurable allowlist/denylist; audit emits audit.redacted events for inspection. Smoke. N/A (library only). Tests. Unit + golden (fixed-input fixed-output).
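The recursive walk over nested payloads can be sketched as below. This is an illustrative stand-in, not the `audit.Redactor` API: the `Redact` function, the `secretShaped` pattern, and the 64-byte truncation limit are all assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// secretShaped is an assumed pattern covering bearer tokens, sk-style API keys, and JWT-looking strings.
var secretShaped = regexp.MustCompile(`(?i)^(bearer\s+\S+|sk-[a-z0-9]{8,}|eyJ[\w-]+\.[\w-]+\.[\w-]+)$`)

// maxLen is an assumed truncation limit in bytes.
const maxLen = 64

// Redact returns a redacted copy of v; the input is never mutated.
func Redact(v any) any {
	switch x := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(x))
		for k, val := range x {
			out[k] = Redact(val)
		}
		return out
	case []any:
		out := make([]any, len(x))
		for i, val := range x {
			out[i] = Redact(val)
		}
		return out
	case []byte:
		// Byte arrays are summarized by size rather than emitted.
		return fmt.Sprintf("<%d bytes>", len(x))
	case string:
		if secretShaped.MatchString(x) {
			return "[REDACTED]"
		}
		if len(x) > maxLen {
			// Byte-wise cut; a production version would cut on a rune boundary.
			return x[:maxLen] + "...(truncated)"
		}
		return x
	default:
		return v
	}
}

func main() {
	payload := map[string]any{
		"auth":   "Bearer abc123",
		"nested": map[string]any{"blob": []byte{1, 2, 3}, "note": "hello"},
	}
	fmt.Println(Redact(payload))
}
```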
04 — slog Logger + standard attribute set (RFC §6.14)
Goal. Logger wrapper around log/slog; pinned attribute set (tenant_id, user_id, session_id, run_id, task_id, trace_id, span_id, tool); JSON in production, text in dev; emits a paired runtime.error bus event on Error. Acceptance. Loggers accept WithIdentity(Identity); no log carries unredacted secret payloads (uses phase 03); CLI flag --log-format=text|json selects handler at process start. Smoke. N/A. Tests. Unit; integration with phase 03 redactor. Deps. 03.
05 — Event taxonomy + InMem EventBus + isolation (RFC §6.13)
Goal. Event, EventType (exhaustive sealed enum), EventPayload sealed interface, EventBus.Publish/Subscribe, Filter with server-enforced identity gates. In-memory MPSC ingress + per-subscriber bounded fan-out + drop-oldest with bus.dropped events. Acceptance. Subscribe rejects filters that elide the identity triple unless the caller has admin scope; identity-scope mismatches are audited; cardinality lint check fails CI on RunID/TraceID metric labels. Smoke. phase-05.sh asserts EventType exhaustiveness via go test; protocol smoke skips. Tests. Unit + fan-out + drop-policy + cross-tenant isolation; goroutine leak test. Deps. 01, 03.
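The drop-oldest policy on a bounded per-subscriber queue can be illustrated with a buffered channel. This is a sketch under assumptions: it uses a single publisher goroutine, a capacity of at least 1, and hypothetical names (`subscriber`, `offer`). The real bus would also publish a `bus.dropped` event where the counter is bumped.

```go
package main

import "fmt"

// subscriber holds a bounded queue plus a count of evicted events.
type subscriber struct {
	ch      chan string
	dropped int
}

func newSubscriber(capacity int) *subscriber {
	return &subscriber{ch: make(chan string, capacity)}
}

// offer enqueues ev without blocking the publisher; when the buffer is full
// it evicts the oldest queued event and tries again.
func (s *subscriber) offer(ev string) {
	for {
		select {
		case s.ch <- ev:
			return
		default:
		}
		select {
		case <-s.ch: // evict the oldest event
			s.dropped++
		default: // a consumer drained the queue in the meantime
		}
	}
}

func main() {
	s := newSubscriber(2)
	for _, ev := range []string{"a", "b", "c", "d"} {
		s.offer(ev)
	}
	fmt.Println(<-s.ch, <-s.ch, s.dropped)
}
```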
06 — Bus replay + ring buffer + cursor (RFC §6.13)
Goal. Replay(from Cursor, filter) against an in-memory ring (default 10k events, configurable). Cursor = (SessionID, Sequence); gap-free guarantee within a RunID. Acceptance. Late subscriber resumes cleanly; no duplicates; documented loss when ring overrun (durable log handled in phase 57). Tests. Unit + concurrency (subscribe-during-publish); idle-subscription reaper test. Deps. 05.
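A small sketch of replay against a fixed-size ring, simplified to a bare sequence number (the plan's cursor also carries a SessionID, omitted here). The `Ring` type and the `lost` flag are assumptions for illustration; the flag shows one way to surface the documented loss when the ring has been overrun.

```go
package main

import "fmt"

type Event struct {
	Seq  uint64
	Data string
}

// Ring keeps the most recent len(buf) events, indexed by Seq modulo capacity.
type Ring struct {
	buf []Event
	n   uint64 // total events appended; also the latest sequence number
}

func NewRing(capacity int) *Ring { return &Ring{buf: make([]Event, capacity)} }

// Append stores data under the next sequence number (sequences start at 1).
func (r *Ring) Append(data string) uint64 {
	r.n++
	r.buf[r.n%uint64(len(r.buf))] = Event{Seq: r.n, Data: data}
	return r.n
}

// Replay returns every retained event with Seq > after, in order.
// lost is true when some requested events were already overwritten.
func (r *Ring) Replay(after uint64) (events []Event, lost bool) {
	oldest := uint64(1)
	if c := uint64(len(r.buf)); r.n > c {
		oldest = r.n - c + 1
	}
	start := after + 1
	if start < oldest {
		lost, start = true, oldest
	}
	for s := start; s <= r.n; s++ {
		events = append(events, r.buf[s%uint64(len(r.buf))])
	}
	return events, lost
}

func main() {
	r := NewRing(3)
	for _, d := range []string{"a", "b", "c", "d", "e"} {
		r.Append(d)
	}
	evs, lost := r.Replay(3)
	fmt.Println(len(evs), lost)
	evs, lost = r.Replay(0)
	fmt.Println(evs[0].Seq, lost)
}
```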
07 — StateStore iface + InMem + conformance suite (RFC §6.11, §9)
Goal. Single mandatory StateStore interface (no Supports* ceremony). InMem driver. conformance.RunSuite(t, factory) covering save/load/idempotency/identity-mandatory/cross-tenant-isolation/cross-session-isolation/concurrency/leak. Acceptance. InMem passes the suite; the suite is the gate every later driver must pass; documented EventID (ULID) idempotency. Smoke. N/A. Tests. Unit + the conformance suite itself. Deps. 01, 03.
08 — SessionRegistry + lifecycle + GC (RFC §6.9)
Goal. SessionRegistry over phase 07 store. Open/get/touch/close/inspect/GC. Identity triple captured on Open and immutable; reopen-after-close rejected; GC sweeps idle sessions but never reaps RUNNING. Acceptance. Defaults: idle 24 h, hard cap 30 days, sweep 15 min; configurable via GCPolicy. Tests. Unit + integration; cross-tenant isolation test on Open. Deps. 01, 07.
09 — Envelopes, Headers, Identity quadruple (RFC §6.1)
Goal. Envelope{Payload, Headers, RunID, SessionID, Timestamp, DeadlineAt, Meta}. Headers{TenantID, UserID, Topic, Priority}. RunID is the runtime concurrency boundary; TraceID reserved for OTel. Acceptance. WithRunID returns a copy; (Tenant, User, Session, Run) round-trips through JSON; Meta last-write-wins on collision (until merge function lands as RFC follow-up). Tests. Unit + JSON round-trip. Deps. 01, 08.
10 — Engine + workers + cycle detection (RFC §6.1)
Goal. Engine with one goroutine per node, bounded channels per adjacency (default 64), cycle detector at construction (AllowCycle opt-in), Run / Stop / Emit / Fetch. Egress dispatcher always-on. Acceptance. Linear graph end-to-end works; Stop joins all workers; goroutine-leak test passes; cycle detector rejects without AllowCycle. Smoke. harbor dev boots an empty engine; /healthz returns 200 (gated by phase 64). Tests. Unit + integration + leak. Deps. 09.
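Construction-time cycle detection can be sketched as a three-color depth-first search over the adjacency list. The `Validate` function, the `ErrCycle` value, and the string node IDs are hypothetical; only the `AllowCycle` opt-in behavior comes from the goal above.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrCycle is a hypothetical sentinel for a rejected graph.
var ErrCycle = errors.New("engine: graph contains a cycle")

// hasCycle runs a three-color DFS: a grey node reached again means a back edge.
func hasCycle(adj map[string][]string) bool {
	const (
		white = iota // unvisited
		grey         // on the current DFS path
		black        // fully explored
	)
	color := map[string]int{}
	var visit func(n string) bool
	visit = func(n string) bool {
		color[n] = grey
		for _, m := range adj[n] {
			if color[m] == grey {
				return true
			}
			if color[m] == white && visit(m) {
				return true
			}
		}
		color[n] = black
		return false
	}
	for n := range adj {
		if color[n] == white && visit(n) {
			return true
		}
	}
	return false
}

// Validate rejects cyclic graphs unless the caller opted in.
func Validate(adj map[string][]string, allowCycle bool) error {
	if !allowCycle && hasCycle(adj) {
		return ErrCycle
	}
	return nil
}

func main() {
	linear := map[string][]string{"a": {"b"}, "b": {"c"}}
	loop := map[string][]string{"a": {"b"}, "b": {"a"}}
	fmt.Println(Validate(linear, false), Validate(loop, false), Validate(loop, true))
}
```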
11 — Reliability shell (RFC §6.1)
Goal. Per-node NodePolicy{Validate, TimeoutMS, MaxRetries, BackoffBase, BackoffMult, MaxBackoff}. RunError{Code, Message, Cause, Metadata}. Errors route to Protocol unconditionally; egress emission is opt-in via engine option. Acceptance. Timeout produces RunError(NodeTimeout); retries respect MaxRetries; validate=both rejects malformed envelopes. Tests. Unit on backoff math; integration per error code. Deps. 10.
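The backoff math that the unit tests target can be sketched as a capped geometric series. The field types (`time.Duration` for the backoff fields) and the sample values in `defaultPolicy` are assumptions; the plan names the fields but not their types or defaults for this phase.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// NodePolicy carries only the backoff-related fields from the goal; types are assumed.
type NodePolicy struct {
	MaxRetries  int
	BackoffBase time.Duration
	BackoffMult float64
	MaxBackoff  time.Duration
}

// defaultPolicy uses illustrative values, not documented defaults.
var defaultPolicy = NodePolicy{
	MaxRetries:  3,
	BackoffBase: 100 * time.Millisecond,
	BackoffMult: 2,
	MaxBackoff:  30 * time.Second,
}

// Delay returns BackoffBase * BackoffMult^attempt, capped at MaxBackoff.
// attempt is zero-based: the wait before the first retry is Delay(0).
func (p NodePolicy) Delay(attempt int) time.Duration {
	d := float64(p.BackoffBase) * math.Pow(p.BackoffMult, float64(attempt))
	if d > float64(p.MaxBackoff) {
		return p.MaxBackoff
	}
	return time.Duration(d)
}

func main() {
	for attempt := 0; attempt < defaultPolicy.MaxRetries; attempt++ {
		fmt.Println(attempt, defaultPolicy.Delay(attempt))
	}
}
```

Doing the comparison in floating point before converting avoids integer overflow for large attempt counts.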
12 — Streaming + per-run capacity backpressure (RFC §6.1)
Goal. StreamFrame{StreamID, Seq, Text, Done, Meta}. EmitChunk honors per-run capacity waiters keyed by RunID. Backpressure baked in, not bolted on — the seam closes the predecessor's deadlock-under-streaming gap. Acceptance. N parallel runs × K frames each: ordering preserved per StreamID; no cross-run deadlock; goroutine-leak under streaming returns to baseline after Stop. Tests. Integration + concurrency + leak. Deps. 10, 11. Risks. This is Brief 01's "must bake in." Don't accept a "we'll add it later" PR.
13 — Cancellation + per-run fetch dispatcher (RFC §6.1)
Goal. Cancel(runID) is idempotent, drops queued envelopes for that run only, cancels in-flight invocations, drains per-run egress. FetchByRun(runID) demuxes via per-run dispatcher (always-on, no dual mode). Acceptance. Two concurrent runs; cancelling one leaves the other completing; FetchByRun never returns frames from another run. Tests. Concurrency + property (cancel idempotency). Deps. 10, 12.
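Idempotent, run-scoped cancellation can be sketched with one `context.CancelFunc` per RunID. The `Runs` registry below is a hypothetical stand-in for the engine's bookkeeping; it shows only the cancel half of the phase, not queue draining or `FetchByRun` demuxing.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Runs tracks one cancel function per active run (hypothetical type).
type Runs struct {
	mu      sync.Mutex
	cancels map[string]context.CancelFunc
}

func NewRuns() *Runs { return &Runs{cancels: map[string]context.CancelFunc{}} }

// Start registers a run and returns the context its invocations should use.
func (r *Runs) Start(runID string) context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	r.mu.Lock()
	r.cancels[runID] = cancel
	r.mu.Unlock()
	return ctx
}

// Cancel cancels the named run only. It is safe to call repeatedly;
// it reports whether this call was the one that performed the cancel.
func (r *Runs) Cancel(runID string) bool {
	r.mu.Lock()
	cancel, ok := r.cancels[runID]
	delete(r.cancels, runID)
	r.mu.Unlock()
	if ok {
		cancel()
	}
	return ok
}

func main() {
	runs := NewRuns()
	a, b := runs.Start("run-a"), runs.Start("run-b")
	fmt.Println(runs.Cancel("run-a"), runs.Cancel("run-a"))
	fmt.Println(a.Err() != nil, b.Err() == nil)
}
```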
14 — Routers + concurrency utils + subflows (RFC §6.1)
Goal. PredicateRouter, UnionRouter, RoutePolicy, MapConcurrent, JoinK, Subflow(factory, parent, opts...) (mirrors parent cancellation; runs to first egress payload). Acceptance. Each pattern matches its specified behavior; subflow cancellation mirrors parent. Tests. Integration per pattern. Deps. 10, 11.
15 — SQLite StateStore driver (RFC §6.11, §9)
Goal. modernc.org/sqlite (CGo-free), WAL journal, forward-only migrations under internal/state/sqlite/migrations/. Acceptance. Passes the phase 07 conformance suite end-to-end; clean DB starts cleanly; existing DB at version N migrates to N+1 idempotently. Tests. Conformance suite + migration tests. Deps. 07.
16 — Postgres StateStore driver (RFC §6.11, §9)
Goal. pgx/v5/stdlib-backed state.StateStore, embedded forward-only migrations gated by pg_advisory_lock for safe multi-replica boot, opaque BYTEA payloads (per RFC §6.11 + D-027 — superseding the older brief 05 §1 "JSONB payloads" narrative). Acceptance. Passes the phase 07 conformance suite end-to-end; CI matrix exercises against a containerized Postgres. Tests. Conformance suite + migration tests (clean-start, idempotency, advisory-lock concurrent boot) + Postgres-specific concurrent-reuse stress. Deps. 07.
17 — ArtifactStore iface + InMem + Filesystem drivers (RFC §6.10, §9)
Goal. Mandatory routing above heavy-output threshold (default 32 KB, runtime-configurable, per-tool overridable). ScopedArtifacts facade auto-stamps identity. Content-addressed IDs. Acceptance. Re-uploading identical bytes returns the existing ref; cross-scope reads rejected; NoOp fallback explicitly absent. Tests. Unit + isolation; dedup test. Deps. 01, 07.
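Content-addressed IDs with scope-checked reads can be sketched as follows. SHA-256 as the hash, the `scope/ref` key layout, and the `Store` type are assumptions for illustration; the plan only states that IDs are content-addressed and that cross-scope reads are rejected.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Store is an in-memory stand-in keyed by scope plus content hash.
type Store struct{ blobs map[string][]byte }

func NewStore() *Store { return &Store{blobs: map[string][]byte{}} }

// Put stores data under its SHA-256 digest within scope.
// existed is true when identical bytes were already present in that scope.
func (s *Store) Put(scope string, data []byte) (ref string, existed bool) {
	sum := sha256.Sum256(data)
	ref = hex.EncodeToString(sum[:])
	key := scope + "/" + ref
	if _, existed = s.blobs[key]; !existed {
		s.blobs[key] = append([]byte(nil), data...) // defensive copy
	}
	return ref, existed
}

// Get resolves ref only inside the caller's scope.
func (s *Store) Get(scope, ref string) ([]byte, bool) {
	b, ok := s.blobs[scope+"/"+ref]
	return b, ok
}

func main() {
	s := NewStore()
	ref1, _ := s.Put("tenant-a", []byte("hello"))
	ref2, existed := s.Put("tenant-a", []byte("hello"))
	_, visible := s.Get("tenant-b", ref1)
	fmt.Println(ref1 == ref2, existed, visible)
}
```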
18 — ArtifactStore SQLite-blob + Postgres-blob (RFC §6.10, §9)
Goal. Persistent artifact lifetimes that survive restart; same conformance suite as InMem + FS. Acceptance. Bytes round-trip; deletion is scope-checked; size enforcement matches thresholds. Tests. Conformance suite. Deps. 17, 15, 16.
19 — ArtifactStore S3-style driver (RFC §6.10)
Goal. S3-compatible driver behind the same interface (suitable for MinIO/AWS/R2/GCS-via-compat). Acceptance. Conformance suite; lifecycle integration; presigned-URL GetRef path. Tests. Conformance + integration against MinIO container. Deps. 17. Risks. V1 stretch — can slip to V1.1 if calendar pressure builds.
20 — TaskRegistry iface + InProcess + lifecycle (RFC §6.8)
Goal. Single TaskID namespace unifying foreground + background; lifecycle state machine (PENDING → RUNNING → COMPLETE, with PAUSED → RUNNING, FAILED|CANCELLED terminal); idempotency via IdempotencyKey; cancellation propagates per PropagateOnCancel. Acceptance. Spawning with same IdempotencyKey returns same handle; cascade vs isolate behave per spec. Tests. Unit + concurrency + isolation. Deps. 01, 07.
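The lifecycle table and the idempotent spawn can be sketched together. The transition table is partly assumed: the plan names PENDING → RUNNING → COMPLETE, PAUSED → RUNNING, and the terminal states, while the edges into PAUSED, FAILED, and CANCELLED below are guesses made for the example. `Registry`, `Task`, and `Spawn` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

type State string

const (
	Pending   State = "PENDING"
	Running   State = "RUNNING"
	Paused    State = "PAUSED"
	Complete  State = "COMPLETE"
	Failed    State = "FAILED"
	Cancelled State = "CANCELLED"
)

// next lists allowed transitions; terminal states have no entry.
// Edges other than PENDING→RUNNING→COMPLETE and PAUSED→RUNNING are assumptions.
var next = map[State][]State{
	Pending: {Running, Cancelled},
	Running: {Paused, Complete, Failed, Cancelled},
	Paused:  {Running, Cancelled},
}

func CanTransition(from, to State) bool {
	for _, s := range next[from] {
		if s == to {
			return true
		}
	}
	return false
}

type Task struct {
	ID    int
	State State
}

// Registry hands out one task per idempotency key.
type Registry struct {
	mu    sync.Mutex
	byKey map[string]*Task
}

func NewRegistry() *Registry { return &Registry{byKey: map[string]*Task{}} }

// Spawn returns the existing handle when the key was seen before.
func (r *Registry) Spawn(idempotencyKey string) *Task {
	r.mu.Lock()
	defer r.mu.Unlock()
	if t, ok := r.byKey[idempotencyKey]; ok {
		return t
	}
	t := &Task{ID: len(r.byKey) + 1, State: Pending}
	r.byKey[idempotencyKey] = t
	return t
}

func main() {
	r := NewRegistry()
	fmt.Println(r.Spawn("k1") == r.Spawn("k1"), CanTransition(Complete, Running))
}
```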
21 — TaskGroup + retain-turn + patches (RFC §6.8)
Goal. Group resolution/sealing/cancel/apply; retain-turn semantics block foreground until group completes; ApplyPatch for human-approved context patches; AcknowledgeBackground. Acceptance. Group sealing freezes membership; retain-turn correctly blocks; patches transition through pending → applied/rejected. Tests. Integration; group lifecycle property tests. Deps. 20.
22 — MessageBus + RemoteTransport contracts (RFC §6.12)
Goal. Contract definitions + in-process MessageBus (loopback) + RemoteTransport capable of A2A. Publish is at-least-once; handlers idempotent on (TaskID, Edge, EventID). No durable distributed driver at V1. Acceptance. In-process loopback delivers; RemoteTransport returns request/reply and stream with final done=true. Tests. Unit + integration; contract tests for distributed driver (skip when no driver wired). Deps. 09, 20.
23 — MemoryStore iface + InMem + conformance (RFC §6.6)
Goal. MemoryStore interface with mandatory identity (require_explicit_key=true, no opt-out). Strategy=none only. Conformance harness includes fail-closed-on-missing-SessionID test. Acceptance. Missing identity fails closed + emits audit event; InMem passes the suite. Tests. Conformance suite. Deps. 01, 07.
24 — Memory strategies (RFC §6.6)
Goal. Add truncation and rolling_summary. Health states healthy → retry → degraded → recovering → healthy. Summarizer is an injectable Summarizer interface (LLM call lives in phase 32+). Acceptance. Strategy matrix tested; degraded mode falls back to recent-window + queues recovery loop bounded by RecoveryBacklogMax; memory.health_changed events emitted. Tests. Strategy matrix + property + integration with a stub summarizer. Deps. 23. Status. Shipped (D-035 — OverflowDropOldest-only enum, bounded recovery loop with memory.recovery_dropped overflow emit, retry/backoff/cadence constants not exposed as config; phase plan phase-24-memory-strategies.md).
25 — SQLite + Postgres memory drivers (RFC §6.6, §9)
Goal. Persistent memory state across restarts; same conformance suite. Acceptance. All three drivers (InMem, SQLite, PG) pass; Snapshot/Restore round-trips byte-stable. Tests. Conformance + Snapshot round-trip. Deps. 23, 15, 16.
26 — Tool catalog core + InProcess registration (RFC §6.4)
Goal. Tool, ToolDescriptor, ToolCatalog, ToolProvider interfaces + the ToolPolicy reliability shell (D-024). In-process registration via Go generics + reflection (schemas derived from input/output types) — tools.RegisterFunc(name, fn, opts...) is the minimum-expression API. CatalogFilter keyed on (tenant, user, session) triple plus GrantedScopes. Argument validation at the catalog edge using santhosh-tekuri/jsonschema. Dispatcher wraps every invocation in the ToolPolicy shell (timeout / retry-with-exponential-backoff / validation) regardless of transport — so even a zero-config RegisterFunc is production-resilient. Acceptance. A registered Go function appears in cat.List(filter) for the matching identity; arg validation produces typed tool.invalid_args events on failure; default ToolPolicy (zero-value) yields a 3-retry / 100ms→30s exponential backoff / 30s timeout shell on transient errors; tools.WithPolicy(...) overrides each axis. Tests. Unit (filter combinations + ToolPolicy default firing); integration; concurrency (N concurrent calls under a misbehaving tool — backoff respected). Deps. 01, 05, 09.
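The generics-based registration idea can be sketched as a wrapper that decodes JSON arguments into the function's input type and encodes its result. Everything here is an illustrative assumption: this `RegisterFunc` takes the catalog as an explicit first argument, strict `encoding/json` decoding stands in for the JSON-schema validation at the catalog edge, and the ToolPolicy shell and identity filter are left out.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
)

type handler func(ctx context.Context, rawArgs []byte) ([]byte, error)

// Catalog is a toy registry keyed by tool name.
type Catalog struct{ tools map[string]handler }

func NewCatalog() *Catalog { return &Catalog{tools: map[string]handler{}} }

// RegisterFunc adapts a typed Go function into a JSON-in, JSON-out handler.
func RegisterFunc[In, Out any](c *Catalog, name string, fn func(context.Context, In) (Out, error)) {
	c.tools[name] = func(ctx context.Context, raw []byte) ([]byte, error) {
		var in In
		dec := json.NewDecoder(bytes.NewReader(raw))
		dec.DisallowUnknownFields() // stand-in for schema validation at the edge
		if err := dec.Decode(&in); err != nil {
			return nil, fmt.Errorf("tool.invalid_args: %w", err)
		}
		out, err := fn(ctx, in)
		if err != nil {
			return nil, err
		}
		return json.Marshal(out)
	}
}

func (c *Catalog) Invoke(ctx context.Context, name string, raw []byte) ([]byte, error) {
	h, ok := c.tools[name]
	if !ok {
		return nil, fmt.Errorf("unknown tool %q", name)
	}
	return h(ctx, raw)
}

type addIn struct {
	A int `json:"a"`
	B int `json:"b"`
}
type addOut struct {
	Sum int `json:"sum"`
}

func add(_ context.Context, in addIn) (addOut, error) { return addOut{Sum: in.A + in.B}, nil }

// invokeJSON is a convenience wrapper for string arguments.
func invokeJSON(c *Catalog, name, args string) (string, error) {
	out, err := c.Invoke(context.Background(), name, []byte(args))
	return string(out), err
}

func main() {
	c := NewCatalog()
	RegisterFunc(c, "add", add)
	fmt.Println(invokeJSON(c, "add", `{"a":1,"b":2}`))
}
```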
26a — Flow-as-Tool registration + per-flow Budget (RFC §6.1, §6.4, D-023)
Goal. flow.Definition shape (entry/exit nodes, node specs, optional intrinsic Budget). flow.Compose(def) → Engine builds a runnable engine reusable across invocations. flow.RegisterAsTool(catalog, def, eng) wires the Engine into the Tool catalog with Transport: Flow and schemas derived from entry/exit types. Per-flow Budget (deadline / hop-budget / cost-cap) composes with parent run + identity-tier ceilings via min(); whichever fires first aborts the flow with ErrFlowBudgetExceeded. Reliability shell: per-node NodePolicy from §6.1 still applies inside the flow; no double-wrapping. Acceptance. A 3-node flow registers as a Tool whose schema reflects entry-input → exit-output; planner invokes it through the standard dispatcher; per-flow budget exceedance emits flow.budget_exceeded and produces ErrFlowBudgetExceeded; identity-tier governance can still abort the same flow via ErrBudgetExceeded. Tests assert both abort paths fire correctly under contention. Tests. Unit (Definition validation; min() composition math). Integration (flow-as-tool round-trip via planner mock; budget-exceedance events). Concurrency (parallel flow invocations don't bleed budget state across runs). Smoke additions. flow.budget_exceeded event observable; ErrFlowBudgetExceeded mappable to a tool.error payload. Coverage target. internal/runtime/flow: 85%. Deps. 14 (subflows + reliability shell), 26 (tool catalog + ToolPolicy). Briefs. brief 01 §6.1 / §6.5 (subflow lifecycle and reliability shell). Risks. Budget-composition math under concurrent flow invocations — must be lock-free / atomic, same pattern as 36a's accumulator. Document. RFC anchor. §6.1 (Flow-as-Tool subsection) + §6.4 (Flow transport variant).
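The min() composition of budgets can be sketched per axis, treating a zero value as "no limit on this axis". That zero-means-unset convention, the field names, and the `Compose` helper are assumptions for illustration; the concurrent spend accounting mentioned under Risks is out of scope here.

```go
package main

import "fmt"

// Budget holds the three axes named in the goal; zero means unset (assumed convention).
type Budget struct {
	DeadlineMS int64
	Hops       int
	CostCap    float64
}

type number interface {
	~int | ~int64 | ~float64
}

// tighter returns the smaller of two limits, ignoring unset (zero) values.
func tighter[T number](a, b T) T {
	switch {
	case a == 0:
		return b
	case b == 0:
		return a
	case a < b:
		return a
	default:
		return b
	}
}

// Compose folds the flow, parent-run, and identity-tier budgets into the
// effective ceiling: the tightest set value on each axis.
func Compose(budgets ...Budget) Budget {
	var out Budget
	for _, b := range budgets {
		out.DeadlineMS = tighter(out.DeadlineMS, b.DeadlineMS)
		out.Hops = tighter(out.Hops, b.Hops)
		out.CostCap = tighter(out.CostCap, b.CostCap)
	}
	return out
}

func main() {
	flow := Budget{DeadlineMS: 10000, CostCap: 5}
	parent := Budget{DeadlineMS: 3000, Hops: 8}
	tier := Budget{Hops: 12, CostCap: 9}
	fmt.Printf("%+v\n", Compose(flow, parent, tier))
}
```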
27 — HTTP tool driver (RFC §6.4)
Goal. Inline (RegisterHTTPTool(name, method, urlTemplate, ...)) and out-of-process via UTCP-style manifest. Static auth (API key, bearer, cookie). Retry + rate-limit handling. Acceptance. Both inline + manifest paths drive the same ToolDescriptor; integration against httptest.Server. Shipped — internal/tools/drivers/http exports RegisterHTTPTool, LoadManifest, RegisterManifest, three AuthKinds; URL/body/header templates use text/template with urlquery escaping and reject {{ .Auth.* }} references at load time (AGENTS.md §7 — no credential passthrough). Retry-After (seconds-integer + HTTP-date) honoured before returning the rate-limit error so the policy shell's exponential backoff stacks on top — driver consumes ONE retry budget per Invoke (D-024 no double-wrap). 4xx maps to ErrToolInvalidArgs (planner-reformulation channel); 5xx + transport errors are transient. ToolsConfig.HTTPManifests []string added to internal/config. Coverage: 88% (target 85%). D-025 concurrent-reuse test exercises N=128 invocations against a shared httptest.Server under -race; no context bleed, no goroutine leaks. Tests. Integration; retry test. Deps. 26.
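Handling both Retry-After forms (seconds-integer and HTTP-date) can be sketched with the standard library. The `retryAfter` helper and the injected `now` parameter are hypothetical; `http.ParseTime` is the standard-library parser for HTTP date formats.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// fixedNow is a reference instant used so the example is deterministic.
var fixedNow = time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)

// retryAfter parses a Retry-After header value relative to now.
// ok is false when the value is neither a non-negative integer nor an HTTP date.
func retryAfter(value string, now time.Time) (wait time.Duration, ok bool) {
	value = strings.TrimSpace(value)
	if secs, err := strconv.Atoi(value); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second, true
	}
	if t, err := http.ParseTime(value); err == nil {
		if d := t.Sub(now); d > 0 {
			return d, true
		}
		return 0, true // date already passed: retry immediately
	}
	return 0, false
}

func main() {
	fmt.Println(retryAfter("120", fixedNow))
	fmt.Println(retryAfter("Thu, 01 Jan 2026 00:00:30 GMT", fixedNow))
	fmt.Println(retryAfter("soon", fixedNow))
}
```

A driver following the description above would sleep for the returned duration before surfacing the rate-limit error, leaving the exponential backoff to the policy shell.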
28 — MCP southbound driver (RFC §6.4)
Goal. Go MCP client over stdio + streamable-HTTP + SSE. Auto-detect via MCPTransportMode = Auto | SSE | StreamableHTTP. Tool/resource/prompt mapping into Tool. Transport-level reconnect lives in ToolPolicy (D-024 retry shell), not in a parallel state machine inside the driver (D-037). Acceptance. Mock MCP server (in-process) integration tests pass; resource subscriptions emit a separate event topic (mcp.resource_updated). Tests. Integration + transport-fallback test; D-025 concurrent-reuse (N=100) against the in-process mock server pair. Deps. 26. Implementation note. Wraps github.com/modelcontextprotocol/go-sdk@v1.6.0 — the official Go SDK. Auto-mode fallback (streamable-HTTP → SSE) lives at Provider.Connect, not at Transport.Connect, so failures during the MCP initialize handshake (a client.Connect error) trigger the fallback the same as transport-level connect errors. See docs/decisions.md D-037.
29 — A2A southbound driver (full spec) (RFC §6.4)
Goal. Agent Card discovery (GET /.well-known/agent-card.json); JSON-RPC message/send, message/stream (SSE), tasks/get, tasks/cancel, tasks/pushNotificationConfig/*. Registry with route scoring (trust tier, latency tier, capability match). Acceptance. Mock A2A server integration (full Agent Card); registry resolves remote skills; A2A peers appear as Tool entries via ToolProvider. Tests. Integration + spec-compliance suite. Deps. 26, 22.
30 — Tool-side OAuth + HITL via pause/resume (RFC §6.4, §3.3)
Goal. TokenStore interface (InMem + SQLite + Postgres drivers) with encryption-at-rest for token material. OAuthProvider covering both user-bound and agent-bound binding scopes — BindingScope is a declared config field, not inferred. On tool.auth_required, the tool driver emits a typed ErrAuthRequired carrying a structured payload (provider, scope, binding-scope, flow-initiation URL); the runtime pauses via the unified pause/resume primitive (phase 50). Resume reattaches the token; A2A AUTH_REQUIRED converges on the same primitive. Authorization flows use PKCE; RFC 7591 dynamic client registration and authorization-server metadata discovery are supported. Agent-bound tokens are keyed by the Agent Registry's registration agent_id (phase 53a, D-059) — never by an isolation-tuple element, since agent_id is not part of the isolation tuple. Acceptance. OAuth full pause/resume cycle round-trips for both binding scopes; A2A AUTH_REQUIRED triggers an identical event shape; ErrAuthRequired payload is typed and audit-redacted (no raw token material in events); PKCE challenge/verifier round-trips; dynamic registration + discovery exercised against a test authorization server; token material is encrypted at rest (driver conformance asserts ciphertext on disk); admin-scope authz gates protect provider configuration; cross-tenant / cross-user / cross-agent isolation conformance — one identity's tokens never resolve for another; user-bound and agent-bound tokens coexist for the same tool without collision; initiate-then-cancel emits no goroutine leak. Tests. Integration end-to-end (both binding scopes); conformance with phase 50; isolation conformance (cross-tenant/user/agent); encryption-at-rest driver conformance; goroutine-leak (initiate-then-cancel). Deps. 26, 50, 53a. Briefs. 
brief 09 (docs/research/09-mcp-oauth-from-bifrost.md) — documents bifrost's OAuth surface (OAuth2Provider, OAuth2Config, OAuth2Token, OAuth2FlowInitiation, MCPUserOAuthRequiredError, MCPClientConfig OAuth fields) as a Go-shaped reference for what to lift, what to leave, and what Harbor must add. Bring back into the conversation when authoring the per-phase plan file (§"Re-discussion checklist" at the bottom of the brief). §4.3 deviation (shipped). The master-plan line "TokenStore (InMem + SQLite + Postgres drivers)" was implemented as a typed wrapper over the existing state.StateStore §4.4 seam (D-027) — the same approach Phase 50 (D-067) and Phase 53a (D-068) took for their persistence layers. Driver pluralism (in-mem / SQLite / Postgres) is inherited from the StateStore triad; the Phase 30 conformance suite runs the same TokenStore assertions against every StateStore driver to prove parity. This avoids the §13 two-parallel-implementations smell. Documented in D-083.
31 — Tool-side approval gates (RFC §6.4, §3.3)
Goal. Synchronous "approve this tool call" gates using the same pause/resume primitive — distinct from OAuth, simpler payload shape. Acceptance. APPROVE/REJECT round-trip via the protocol; reject path raises typed tool.rejected events. Tests. Integration. Deps. 30. §4.3 deviation (shipped). The master-plan row's owning-subsystem tools/auth was the right home for "approval as another consumer of the OAuth machinery." The implementation chose a SIBLING package internal/tools/approval under internal/tools/ so the approval gate has zero OAuth baggage (no TokenStore, no Sealer, no PKCE / RFC 7591 / discovery surface — none of which an HITL approval gate needs). The two siblings (auth/ + approval/) share the Coordinator + bus + redactor seams via the public pauseresume / events / audit packages; nothing else. The master-plan row's subsystem column was updated tools/auth → tools/approval in the same PR. Documented in D-086 §1 ("the approval-gate package is a SIBLING of internal/tools/auth, not a subpackage"). Settled decisions: D-086. See also. docs/plans/phase-31-tool-approval-gates.md.
32 — LLM client core (RFC §6.5)
Goal. LLMClient interface — one method, Complete(ctx, req) (resp, error). CompleteRequest carries Messages whose Content is a sum-type (Text *string for the common case, or multimodal Parts []ContentPart for image/audio/file inputs — D-021), optional ResponseFormat, optional OnContent/OnReasoning streaming callbacks, cancellation via ctx, reasoning-effort hint. No Tools, no ToolChoice, no FunctionCall — tool dispatch lives in the runtime (RFC §6.4 "Code-level tool dispatch"). Inline DataURL content above the heavy-output threshold is auto-materialized to ArtifactRef before persistence/emit (D-022). Context-window safety net (D-026): a catch-all pass at the LLM-client edge walks the assembled CompleteRequest immediately before the driver call and (a) fails loudly with ErrContextLeak if any message field carries raw bytes/strings ≥ heavy-output threshold that aren't ArtifactStub-shaped, (b) estimates total tokens against the model's configured context limit and fails with ErrContextWindowExceeded when the estimate is within ContextWindowReserve (default 5%) of the cap. V1 fails loudly; auto-cascade is post-V1. Acceptance. Mock LLM client passes round-trip with text-only AND multimodal payloads (text + image part). Cancellation aborts streaming cleanly. Interface compiles without any tool-calling type ever appearing in internal/llm/.... Auto-materialization of oversized DataURL content is observable via llm.image.materialized event. Safety-net catch-all pass exists; planted-leak test (a deliberately-buggy producer that emits ≥-threshold raw bytes) triggers ErrContextLeak + llm.context_leak audit event. Token-budget test (a synthetic huge prompt) triggers ErrContextWindowExceeded cleanly with a reserve margin matching config. Tests. 
Unit + integration with mock (text + multimodal); assert no Tool* symbol leaks into the LLM package; auto-materialize threshold test; planted-leak test (raw bytes survive a producer); token-budget test (synthetic big prompt); ArtifactStub round-trip test (a stub renders to the model-agnostic JSON shape and parses back). Deps. 09.
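The two safety-net checks (D-026) can be sketched as a single pass over the assembled messages. This is an assumption-heavy illustration: the `Message` shape, the four-characters-per-token estimate, the basis-point reserve parameter, and the strict "greater than" comparison are all choices made for the example rather than the shipped behavior.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var (
	ErrContextLeak           = errors.New("llm: context leak")
	ErrContextWindowExceeded = errors.New("llm: context window exceeded")
)

// Message is a simplified stand-in for one CompleteRequest message.
type Message struct {
	Text           string
	IsArtifactStub bool
}

// estimateTokens uses a rough 4-characters-per-token heuristic (assumption).
func estimateTokens(s string) int { return (len(s) + 3) / 4 }

// Guard fails on raw content at or above heavyThreshold bytes, then on a token
// estimate that eats into the reserve. reserveBP is in basis points (500 = 5%).
func Guard(msgs []Message, heavyThreshold, limitTokens, reserveBP int) error {
	total := 0
	for _, m := range msgs {
		if !m.IsArtifactStub && len(m.Text) >= heavyThreshold {
			return ErrContextLeak
		}
		total += estimateTokens(m.Text)
	}
	budget := limitTokens - limitTokens*reserveBP/10000
	if total > budget {
		return ErrContextWindowExceeded
	}
	return nil
}

// pad builds a synthetic prompt of n bytes.
func pad(n int) string { return strings.Repeat("x", n) }

func main() {
	const threshold = 32 * 1024
	fmt.Println(Guard([]Message{{Text: pad(threshold)}}, threshold, 1000000, 500))
	fmt.Println(Guard([]Message{{Text: pad(384)}}, threshold, 100, 500))
	fmt.Println(Guard([]Message{{Text: pad(380)}}, threshold, 100, 500))
}
```

Integer basis points are used for the reserve so the budget arithmetic does not depend on floating-point rounding.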
33 — bifrost integration (RFC §6.5, §11 Q-3)
Goal. Wire github.com/maximhq/bifrost/core (pure Go LLM gateway library) behind LLMClient. Implement a thin Driver adapter that translates Harbor's CompleteRequest ↔ bifrost's BifrostChatRequest / BifrostChatResponse, and a minimal schemas.Account providing API keys. Translation includes multimodal ContentParts (D-021): map Harbor's ImagePart/AudioPart/FilePart (with URL / DataURL / Artifact supply forms) to bifrost's per-provider content shapes; auto-materialize oversized DataURL content to ArtifactRef (D-022) before sending. Bifrost's Tools / ToolChoice parameters are intentionally NOT used — Harbor's runtime owns tool dispatch (RFC §6.4). Q-3 is resolved; this is a normal implementation phase, not a decision gate. Acceptance. Six-provider smoke green: basic chat + json_object response_format + streaming with content callback + ctx cancellation accepted by the runtime + token usage parsed + cost parsed + one multimodal text+image round-trip against a vision-capable model. Driver registers via init() blank-import per AGENTS.md §4.4. The driver package contains zero references to bifrost's Tools / ToolChoice types. Tests. Unit (request/response translation); integration with mock; six-provider live conformance test (gated behind HARBOR_LIVE_LLM=1 so CI does not burn API credits by default — the local dev loop and harbor dev do exercise it). Deps. 32. Risks. Bifrost requires Go 1.26+; Harbor's go.mod was bumped during validation. Stream-channel close timing on long streams may exceed naive cancel budgets — mitigation is ctx.Done()-driven channel-reader abandonment + goroutine-leak tests. See also. docs/research/08-llm-client-validation.md (full validation report and results).
33a — Custom OpenAI-compatible providers + per-provider timeouts (RFC §6.5)
Goal. Extend Phase 33's bifrost driver so operators can wire any OpenAI-compatible LLM endpoint (NIM, vLLM, ollama, lm-studio, in-house gateways) via harbor.yaml without per-provider Go code. Adds LLMConfig.CustomProviders []LLMCustomProviderConfig (Name / BaseURL / APIKeyEnvVar / Models / per-provider Timeout / retry/backoff/concurrency knobs / RequestPathOverrides) + LLMConfig.NetworkDefaults (global fallthrough for native + custom). When llm.provider names a custom entry, the entry's network knobs apply and legacy llm.api_key / llm.base_url / llm.timeout are ignored. Phase 33a supports only base_provider_type: openai; future phases widen. Acceptance. Account widened to multi-entry (single-PRIMARY contract per D-040 preserved — GetConfiguredProviders returns the one configured primary). GetConfigForProvider returns *ProviderConfig with CustomProviderConfig.BaseProviderType = schemas.OpenAI when the primary is a custom entry. Missing env var fails closed at New with ErrMissingAPIKey naming the var. httptest integration (happy / timeout / 5xx) green. D-025 N≥100 concurrent stress green on mixed config. No tool-call API symbol leak (extends Phase 33 static guard). Tests. Unit (custom-provider construction + validation; NetworkDefaults fallthrough + per-provider override; native-and-custom coexist). Integration (httptest.Server mimicking OpenAI-compatible /v1/chat/completions: happy + 5xx + timeout). Concurrency (D-025 mixed config). Smoke scripts/smoke/phase-33a.sh. Deps. 33. Risks. Operator-facing BaseURL gotcha — bifrost's OpenAI provider appends /v1/chat/completions; operators set the host root, not the full /v1 path. Documented in yaml + the wire-test asserts the correct path. Sub-second timeouts get rounded down to 0 by bifrost's int(seconds) cast — practical minimum is 1s today; widening waits for a NetworkConfig API rev. 
Corrections (Phase 34) match by model-name prefix; custom-provider model names are typically unprefixed — operators declare ModelProfiles[<model>].Corrections explicitly to get quirks applied. Settled decisions: D-042. See also. docs/plans/phase-33a-custom-providers.md.
34 — Provider correction layer + SchemaSanitizer (one mode, baked in) (RFC §6.5)
Goal. A thin correction layer — bifrost already normalizes provider-specific transport quirks across its 23 first-class providers (brief 08), so this phase is NOT a "native vs. LiteLLM" dual-architecture; it is a narrow SchemaSanitizer + message-shape normalizer that lives between the runtime and the LLMClient (NOT inside the client), handling only what bifrost does not. Scope: response_format shape adjustments, reasoning-effort routing for thinking-class models (o1, o3, deepseek-reasoner), schema normalization (additionalProperties: false, strict: true modes), message reordering (NIM), usage backfill (proxies that report 0/0). No use_native toggle — there is one mode, baked in. Scope is structured-output and message-shape correctness only — never tool-call APIs (those don't exist on this layer). Acceptance. Each documented quirk has a passing normalizer test; switching providers does not require a configuration toggle; no tool-call API references in this package; the layer is demonstrably thin — quirks bifrost already handles are NOT re-implemented here. Tests. One unit test per quirk; assert no Tool* symbol leaks. Deps. 33. Briefs. brief 07 (code-level tool calling — runtime owns dispatch, so this layer never touches tool-call APIs), brief 08 (bifrost validation — what the LLM substrate already normalizes, so this phase doesn't).
35 — Structured output strategies + downgrade chain (RFC §6.5)
Goal. OutputMode = Native | Tools | Prompted. Per-provider ModelProfile selects mode. Downgrade chain: json_schema → json_object → text on invalid_json_schema errors. llm.mode_downgraded events. Acceptance. Forced-failure on each step of the chain results in observable downgrade and continued completion. Tests. Integration per provider. Deps. 33, 34.
36 — Retry with feedback (RFC §6.5)
Goal. Validation/parse failures feed back into the planner via LLMClient retry; bounded by MaxRetries; observable. Acceptance. A planner-tagged invalid arg triggers a single LLM retry with corrective sub-prompt; retry count respects bound. Tests. Integration with mock + bounded-loop assertion. Deps. 35.
36a — Cost accumulator + per-identity ceilings (RFC §6.15)
Goal. Subscribe to llm.cost.recorded events; aggregate Usage.Cost.TotalCost by (tenant, user, session) and by model in StateStore-backed accumulators; gate the next call when ceiling exceeded; emit governance.budget_exceeded; fail loudly with ErrBudgetExceeded. Establish the governance.Subsystem interface with PreCall/PostCall hooks wrapping the LLMClient driver. Acceptance. Three-driver conformance (in-mem / SQLite / Postgres) green for accumulators. Ceilings settable via config (Protocol-driven setters land post-V1 phase 91). Ceiling exceedance emits governance.budget_exceeded with the identity triple; runtime can route to the unified pause/resume primitive when configured. Cross-session isolation test passes. Tests. Unit (accumulator math). Integration per driver. Concurrency (N concurrent calls do not overshoot ceiling — atomic / lock-free path documented). Cross-session isolation. Failure-mode (StateStore read failure → fail-loud, no silent permit). Smoke additions. Healthz still 200; governance.budget_exceeded observable when synthesized; config knob round-trip. Coverage target. internal/governance: 85%. Deps. 11 (event bus skeleton — llm.cost.recorded shape lives there). 15 (StateStore SQLite driver — accumulator persistence). 33 (bifrost integration — cost reporting passthrough is the source). Briefs. brief 03 §6 (LLM client surface, cost reporting), brief 06 §3 (event bus + identity-scoped subscriptions). Risks. Concurrent-call ceiling overshoot if accumulator math isn't atomic — the design must be lock-free (atomic add + compare-and-swap) and the test must exercise high-concurrency. RFC anchor. §6.15.
36b — Per-identity rate limits + per-call MaxTokens (RFC §6.15)
Goal. Token-bucket rate limiter per (identity, model) with bucket-state persisted in StateStore so it survives runtime restart. Per-call MaxTokens enforced from the identity's tier in PreCall. Emits governance.rate_limited and governance.maxtokens_exceeded events; fails loudly with ErrRateLimited and ErrMaxTokensExceeded. Acceptance. Bucket fills/drains per config; bucket state survives runtime restart; MaxTokens tier resolved from identity in PreCall and applied to the request before it leaves Harbor; events emitted with identity triple; CLI smoke configures a tiny bucket and asserts the limit kicks in. Tests. Unit (token-bucket math under fast and slow refill rates). Integration per driver. High-concurrency (N concurrent calls — bucket never goes negative; never permits more than capacity). Restart-survival. Smoke additions. governance.rate_limited observable when bucket exhausted; bucket-fill timestamps consistent with config. Coverage target. internal/governance: 85%. Deps. 36a (Subsystem interface + identity scaffolding). Briefs. brief 03 §6 (LLM client surface), brief 06 (event bus). Risks. Token-bucket race conditions under concurrent call paths — must be lock-free. RFC anchor. §6.15.
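One way to meet the "never negative, never more than capacity" invariant without a lock is a CAS loop over a whole-token counter. The sketch below is an assumption-laden illustration: Bucket, Take, and Refill are hypothetical names, refill timing is left to the caller, and StateStore persistence is omitted.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Bucket holds whole tokens in an atomic counter.
type Bucket struct {
	tokens   atomic.Int64
	capacity int64
}

// NewBucket returns a bucket that starts full.
func NewBucket(capacity int64) *Bucket {
	b := &Bucket{capacity: capacity}
	b.tokens.Store(capacity)
	return b
}

// Take removes one token, or reports false when the bucket is empty.
func (b *Bucket) Take() bool {
	for {
		cur := b.tokens.Load()
		if cur <= 0 {
			return false
		}
		if b.tokens.CompareAndSwap(cur, cur-1) {
			return true
		}
	}
}

// Refill adds n tokens without exceeding capacity.
func (b *Bucket) Refill(n int64) {
	for {
		cur := b.tokens.Load()
		next := cur + n
		if next > b.capacity {
			next = b.capacity
		}
		if b.tokens.CompareAndSwap(cur, next) {
			return
		}
	}
}

func main() {
	b := NewBucket(2)
	fmt.Println(b.Take(), b.Take(), b.Take()) // true true false
	b.Refill(5)
	fmt.Println(b.tokens.Load()) // 2
}
```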
37 — Skill store + LocalDB driver + FTS5 ladder (RFC §6.7)
Goal. SQLite-backed skill store; FTS5 → regex → exact ranking ladder; CI tests both FTS-on and FTS-off builds. Schema with Origin / OriginRef / Scope / ContentHash. Acceptance. Same scoring constants documented in brief 04 §4.4 produce stable rankings; existing_origin != "pack" short-circuit refuses overwrites. Tests. Unit (golden ranking) + FTS-off-fallback test. Deps. 01, 07, 15.
38 — Skill planner tools (search/get/list) (RFC §6.7)
Goal. skill_search, skill_get, skill_list registered through phase 26 catalog. Capability filter (RequiredTools/Namespaces/Tags ⊆ allowed). PII + tool-name redaction at injection. Tiered budgeter (full → drop optional → cap steps to 3). Acceptance. Filter excludes mismatched skills; redactor strips disallowed names; budgeter fits within max_tokens. Tests. Unit + integration. Deps. 26, 37.
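The capability filter reduces to a subset check. A minimal sketch for the RequiredTools dimension only; the Skill struct here carries just the fields needed for the example, and subsetOf and filterSkills are hypothetical helper names.

```go
package main

import "fmt"

// Skill carries only the fields this example needs.
type Skill struct {
	Name          string
	RequiredTools []string
}

// subsetOf reports whether every required item is in the allowed set.
func subsetOf(required []string, allowed map[string]bool) bool {
	for _, r := range required {
		if !allowed[r] {
			return false
		}
	}
	return true
}

// filterSkills keeps the skills whose RequiredTools are all allowed.
func filterSkills(skills []Skill, allowedTools map[string]bool) []Skill {
	var kept []Skill
	for _, s := range skills {
		if subsetOf(s.RequiredTools, allowedTools) {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	allowed := map[string]bool{"http_get": true, "sql_query": true}
	skills := []Skill{
		{Name: "fetch-report", RequiredTools: []string{"http_get"}},
		{Name: "deploy", RequiredTools: []string{"shell_exec"}},
	}
	for _, s := range filterSkills(skills, allowed) {
		fmt.Println(s.Name) // fetch-report
	}
}
```

Namespaces and Tags would presumably pass through the same check with their own allowed sets.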
39 — Virtual directory subsystem (RFC §6.7)
Goal. Directory(cfg) API + pinned_then_recent / pinned_then_top selectors; identity-scoped; capability-filtered; redacted before injection. Acceptance. Default max_entries=30, range 1–200; pinned skills always included; selection respects identity. Tests. Unit + property. Deps. 37.
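A sketch of the pinned_then_recent selector under stated assumptions: 0 means "use the default of 30", out-of-range values are clamped into 1–200 rather than rejected, and pinned entries are kept even when they alone exceed the cap. Entry, effectiveMaxEntries, and pinnedThenRecent are illustrative names; identity scoping, capability filtering, and redaction are omitted.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is an illustrative directory row.
type Entry struct {
	Name      string
	Pinned    bool
	UpdatedAt int64 // unix seconds
}

// effectiveMaxEntries applies the default (30) and the 1-200 range.
func effectiveMaxEntries(n int) int {
	switch {
	case n == 0:
		return 30
	case n < 1:
		return 1
	case n > 200:
		return 200
	}
	return n
}

// pinnedThenRecent returns all pinned entries first, then the most recently
// updated unpinned entries until the cap is reached.
func pinnedThenRecent(entries []Entry, maxEntries int) []Entry {
	limit := effectiveMaxEntries(maxEntries)
	var pinned, rest []Entry
	for _, e := range entries {
		if e.Pinned {
			pinned = append(pinned, e)
		} else {
			rest = append(rest, e)
		}
	}
	sort.SliceStable(rest, func(i, j int) bool { return rest[i].UpdatedAt > rest[j].UpdatedAt })
	room := limit - len(pinned)
	if room < 0 {
		room = 0
	}
	if len(rest) > room {
		rest = rest[:room]
	}
	return append(pinned, rest...)
}

func main() {
	entries := []Entry{
		{Name: "a", UpdatedAt: 10},
		{Name: "b", Pinned: true, UpdatedAt: 1},
		{Name: "c", UpdatedAt: 20},
		{Name: "d", UpdatedAt: 5},
	}
	for _, e := range pinnedThenRecent(entries, 3) {
		fmt.Print(e.Name, " ") // b c a
	}
	fmt.Println()
}
```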
40 — Skills.md importer (RFC §6.7)
Goal. Spec-compliant CommonMark parser; YAML frontmatter; section normalization (## Steps, ## Preconditions, ## Failure modes); attachments resolved as ArtifactRef (option (b) — RFC settled). Round-trip byte-stable. Acceptance. Golden corpus of N spec-compliant Skills.md files imports without source edits and re-exports byte-stable; missing trigger/empty steps fail loudly. Tests. Golden corpus + negative tests. Deps. 37. Risks. This is the predecessor's gap-closer. The byte-stable round-trip is a tested invariant.
41 — In-runtime skill generator with persistence (RFC §6.7)
Goal. skill_propose(persist=true) validates draft, stamps Origin=Generated, OriginRef = "gen:{session_id}:{run_id}", scopes by operator-provided Scope (default project), upserts via store. Conflict policy: refuse to overwrite Origin=PackImport; for Generated→Generated, content-hash gates last-write-wins. Audit is mandatory. Acceptance. Generator persists; subsequent search discovers; audit event emitted on every persist. Tests. Integration end-to-end + isolation (cross-session no-leak unless promoted). Deps. 37, 38, 03.
42 — Planner iface + Decision sum + RunContext (RFC §6.2, §3.2)
Goal. Define Planner.Next(ctx, RunContext) (Decision, error); Decision sum (CallTool, CallParallel, SpawnTask, AwaitTask, RequestPause, Finish); RunContext is the only surface planner sees. Acceptance. Stub planner returning Finish runs end-to-end; planner package imports no Runtime internals. Tests. Conformance harness skeleton; import-graph lint. Deps. 09, 13, 26, 32. Wake-on-resolution contract (D-032). When the planner emits a SpawnTask (or group SpawnTask via the patched surface from Phase 21) WITHOUT retain-turn, it MUST consume tasks.WatchGroup(sessionID, groupID) (<-chan GroupCompletion, func(), error) from internal/tasks to learn when the group resolves. The three wake modes (push, poll, hybrid) are documented at the internal/tasks package godoc; this phase ships the planner-side interface contract that each concrete (45, 48, future) maps onto exactly one mode. The TaskRegistry stays neutral — no WakeMode field, no Supports* capability protocol.
43 — Trajectory + serialise contract (RFC §6.2, §3.4)
Goal. Trajectory.Serialize() (bytes, error) returns (nil, ErrUnserializable{Field:...}) on any non-JSON-encodable entry. No silent-drop path. ToolContext split: serialisable half + handle registry (process-local at V1 — see RFC §6.3). Acceptance. Round-trip is byte-stable; non-serialisable handle returns ErrUnserializable; resume with missing handle returns ErrToolContextLost. Tests. Round-trip + negative cases (per RFC contract). Deps. 42, 07. Risks. This phase closes the predecessor's silent-context-loss bug. The fail-loudly tests are the gate.
44 — Schema repair pipeline (RFC §6.2)
Goal. Salvage → schema repair → graceful failure → multi-action salvage, in internal/planner/repair/. Configurable per concrete (arg_fill_enabled, repair_attempts, max_consecutive_arg_failures). Acceptance. Each step passes its targeted unit test; graceful failure forces Finish{Reason: NoPath, Followup: true} after N consecutive arg failures. Tests. Unit per step + integration with malformed mock LLM responses. Deps. 42, 32.
45 — Reference ReAct planner (minimum viable) (RFC §6.2)
Goal. LLM call loop, JSON-only action format, tool selection, completion detection, single tool call per step. Functional options for the small policy-shaped knobs. Acceptance. 3-step reasoning task succeeds against a mock LLM; planner package has no Runtime imports; planner is concurrent-safe across runs. Tests. Conformance pack (skeleton) + scenario. Deps. 42, 43, 44, 32. Wake mode. ReAct ships the push wake mode (D-032): a non-retain-turn SpawnTask returns control to the runtime; the runtime registers the planner against tasks.WatchGroup; on GroupCompletion the runtime re-invokes Planner.Next with the resolved MemberOutcome slice surfaced through RunContext. The LLM sees the next planner step only after the group resolves — no LLM call burns while children are in flight.
46 — Trajectory compression / summariser (RFC §6.2)
Goal. Configurable summariser invoked by runtime when token_budget exceeded. Produces TrajectorySummary{Goals, Facts, Pending, LastOutputDigest, Note}. Compression is a runtime concern; planner sees only the compacted view. Acceptance. Over-budget trajectory triggers summarisation; summary replaces raw step history in subsequent prompt builds. Tests. Integration with mock summariser. Deps. 43, 32.
47 — Parallel-call execution + ReAct CallParallel/SpawnTask/AwaitTask emission (RFC §6.2)
Goal. CallParallel{Branches, Join} executes branches concurrently; atomic setup validation (any branch's invalid args fails the whole call before execution); parallel-pause atomicity (no branch starts side-effecting tools, or all reach checkpointed observation before pause commits); system cap absolute_max_parallel=50. PLUS the §13 primitive-with-consumer bundle: ReAct upgrades to EMIT CallParallel (delete the Phase 45 D-051 single-tool-call-per-step stop-gap) AND emit SpawnTask / AwaitTask via the two new reserved tool names (_spawn_task, _await_task). Phase 47 closes three primitive-with-consumer gaps in one wave (CallParallel runtime + SpawnTask emitter + AwaitTask emitter). D-056. Acceptance. Atomicity contract holds under fault injection; ordering preserved per-branch; deterministic merge keys (branch index + tool name); 51-branch input fails with ErrParallelCapExceeded; JoinFirstSuccess cancels remainder; JoinN waits for N successes; ReAct emits _spawn_task → runtime spawns real task → group resolves → planner re-enters via RunContext.Trajectory.Background → planner emits Finish end-to-end. Tests. Concurrency + property (atomicity invariant) + spawn → wake → re-entry integration test against real TaskRegistry + EventBus + ArtifactStore drivers. Deps. 45, 14, 42, 20, 21. Wake-mode interaction. ReAct's WakePush declaration (Phase 45 / D-032) is wired end-to-end: a non-retain-turn SpawnTask returns control to the runtime; the runtime registers against tasks.WatchGroup; on GroupCompletion the runtime re-invokes Planner.Next with the resolved MemberOutcome slice surfaced through RunContext.Trajectory.Background. The integration test asserts the round-trip. Parallel-pause atomicity contract surface. Phase 47 ships the stub (ErrParallelPauseUnsupported) — the executor fails loud on a mid-execution pause request. Phase 50 (unified pause/resume primitive) upgrades the path to a checkpointed atomic pause.
48 — Deterministic planner (proves the iface) (RFC §6.2, §11 Q-6)
Goal. A second concrete that exercises a non-LLM Decision shape. Executes a programmatic decision tree without an LLM call. Acceptance. Deterministic planner passes the conformance pack; the same Runtime executes both deterministic and React without changes. Tests. Conformance pack. Deps. 42. Wake mode. Deterministic ships the poll wake mode (D-032): each Planner.Next invocation reads its outstanding group's GroupCompletion via a non-blocking receive on the channel returned from tasks.WatchGroup. If the channel hasn't fired, the planner emits AwaitTask and the runtime sleeps the step until the next deterministic boundary; if it has fired, the planner reads the resolved MemberOutcome slice and proceeds. No LLM, no eager wake — a clean deterministic shape that proves the registry's WatchGroup surface is mode-neutral.
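The poll wake mode hinges on a non-blocking channel receive, which in Go is a select with a default case. In this sketch the GroupCompletion fields and the poll helper are assumptions; only the channel type comes from the WatchGroup signature quoted in Phase 42.

```go
package main

import "fmt"

// GroupCompletion is a placeholder for the value WatchGroup delivers.
type GroupCompletion struct{ GroupID string }

// poll performs a non-blocking receive. It returns false when the group has
// not resolved yet (or the channel was closed without a value).
func poll(ch <-chan GroupCompletion) (GroupCompletion, bool) {
	select {
	case gc, ok := <-ch:
		return gc, ok
	default:
		return GroupCompletion{}, false
	}
}

func main() {
	ch := make(chan GroupCompletion, 1)

	_, done := poll(ch)
	fmt.Println(done) // false: the planner would emit AwaitTask here

	ch <- GroupCompletion{GroupID: "g1"}
	gc, done := poll(ch)
	fmt.Println(gc.GroupID, done) // g1 true
}
```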
49 — Planner conformance pack (RFC §6.2)
Goal. A shared test pack any Planner implementation must pass: top-20 prompts produce valid Decision against canned tool catalog + LLM mock; respects budget; never panics on malformed LLM output. Acceptance. Pack runs against React and Deterministic; go test ./internal/planner/conformance/... exits 0. Tests. The pack itself. Deps. 42, 45, 48. Wake-mode round-trip (D-032). The conformance pack MUST include a SpawnTask → group completes → planner re-enters → reads MemberOutcome round-trip exercising whichever wake mode the concrete declares (push / poll / hybrid). React validates push; Deterministic validates poll; future hybrid concretes validate hybrid. Failure to wire tasks.WatchGroup is the test's failure mode, not silent deadlock.
50 — Pause/Resume Coordinator + handle registry (RFC §6.3, §3.3)
Goal. pauseresume.Coordinator with Request/Resume/Status. Token is opaque (runtime-owned encoding). Handle registry is process-local at V1 (documented constraint; distributed handle directory deferred — RFC §12). Acceptance. Round-trip pause→serialise→load→resume succeeds; pauses survive Runtime restart only when StateStore-backed checkpoint is configured. Tests. Unit + integration; durability (in-mem / SQLite / Postgres). Deps. 07, 09, 13.
51 — Pause-state serialise contract (fail-loud) (RFC §6.3, §3.4)
Goal. Pause record serialises with format_version: 1 JSON. Non-serialisable handles → ErrUnserializable (no silent nil); missing-on-resume handles → ErrToolContextLost. Acceptance. Negative tests are the gate. CI fails on any silent-drop regression. Tests. Conformance with phase 43 Trajectory.Serialize. Deps. 50, 43. Shipped. internal/runtime/pauseresume/pauserecord.go ships SerializeRecord / DeserializeRecord + the FormatVersion constant. The Phase 43 reflective walker is exported as trajectory.ValidateEncodable and shared (not forked) by the pause-record contract — SerializeRecord walks it, surfacing trajectory.ErrUnserializable rooted at PauseRecord.payload.<key>; DeserializeRecord enforces format_version: 1 (ErrUnsupportedFormatVersion on any other value). Coordinator.Request's Payload-encodability check is unconditional (fails loud with or without a checkpoint store). Negative tests (pauserecord_test.go, pauserecord_contract_test.go, test/integration/phase51_pause_serialise_test.go) are the gate. Coverage 94.0% (target 90%). See D-069.
52 — Steering inbox + control taxonomy (RFC §6.3)
Goal. Per-run inbox owned by Runtime. Nine control event types: INJECT_CONTEXT, REDIRECT, CANCEL, PRIORITIZE, PAUSE, RESUME, APPROVE, REJECT, USER_MESSAGE. Validation/sanitisation at Protocol edge: depth ≤ 6, ≤ 64 keys, ≤ 50 list items, ≤ 4096 chars/string, ≤ 16 KiB total. Per-event scopes per RFC §6.3. Acceptance. Oversize/over-deep payloads rejected at edge; per-event scope mismatch returns 403 + audit. Tests. Unit (validation) + integration (auth scope per event). Deps. 50, 05.
53 — Steering wiring (9 control events) (RFC §6.3)
Goal. Drain-between-steps; planner sees only RunContext.Control. CANCEL hard/soft propagation; PAUSE blocks at next boundary; RESUME unblocks; INJECT_CONTEXT/REDIRECT/USER_MESSAGE visible on next planner step; APPROVE/REJECT advance pause; PRIORITIZE updates task; control-history capped per session. Acceptance. Each event type has a passing integration test; no event applied mid-tool-call. Tests. Integration matrix; concurrency mid-step. Deps. 52, 13. Shipped. internal/runtime/steering/runloop.go ships RunLoop — the per-run planner-step loop, the §13 first consumer of BOTH the Phase 50 pauseresume.Coordinator AND the Phase 52 steering inbox/taxonomy. RunLoop.Run drains the per-run Inbox once per step boundary (apply.go applies the nine control-event side effects; the planner sees only RunContext.Control), routes a planner's RequestPause through Coordinator.Request and blocks via the new Inbox.WaitForEvent (a coalesced 1-buffered notify channel — no busy-spin) until a RESUME/APPROVE arrives, and caps per-session applied-control history (history.go, MaxControlHistory newest-wins ring). Deviation (§4.3): Phase 53 builds the per-run planner loop rather than retrofitting an existing one — internal/runtime/engine is a graph executor, not a planner-step loop; the only Planner.Next driver before Phase 53 was the Phase 49 conformance harness. The loop lives in internal/runtime/steering (its master-plan subsystem); no new top-level directory, no RFC change (RFC §6.3 §4: "the runtime implements this loop"). CANCEL is soft-by-default with an optional WithHardCancelHook seam (no hard import of the engine). The nine-event integration matrix + the §13 pause-Coordinator round-trip + the drain-between-steps invariant test + the concurrency-mid-step test live in test/integration/phase53_steering_wiring_test.go. Coverage 92.4% (target 85%). See D-071.
53a — Agent Registry (registration identity + IDs) (RFC §6.16, §7)
Goal. An in-process, per-runtime-instance registry.AgentRegistry subsystem, StateStore-backed (in-mem / SQLite / Postgres, §4.4 seam). Owns the registration identity of agents and the three-ID model (D-059): a stable agent_id (minted once at first registration, persisted, rehydrated on restart), an ephemeral incarnation (bumps every process start), and a content-derived version_hash (deterministic hash over prompt set, tool set + schemas, planner config, model policy — bumps only when configuration changes). agent_id is a registration identity, not an isolation principal — the isolation tuple stays (tenant, user, session, run) (D-059, CLAUDE.md §6). Handles both creation cases (D-060): locally-hosted agents (the runtime mints a local agent_id) and connect-to-remote agents (the local agent_id is a handle; the canonical identity is the remote A2A AgentCard, owned by the remote operator). Emits agent.* events (agent.registered, agent.restarted, agent.health, agent.drained, agent.deregistered) so the Console Agents page renders runtime state, never Console-local state (D-061). Fleet control (pause / drain / restart / force-stop) is a distinct, more-elevated privilege tier than fleet observation (D-066) — every control command is audit-redacted and emitted. Acceptance. agent_id is stable across restart when a durable StateStore driver is configured (rehydration test); the in-mem driver is dev-only and documented as non-persistent. incarnation bumps on every restart; version_hash bumps iff configuration content changed and is stable otherwise (restart ≠ recreate — restart keeps the record, recreate mints a fresh agent_id). Remote-agent registration stores a handle + AgentCard reference; the handle is runtime-instance-local and never assumed globally unique. agent.* events carry the registration agent_id. Cross-tenant / cross-session isolation conformance — one identity's registry view never bleeds into another. 
Fleet-control commands require the elevated scope claim and emit audit events; fleet-observation does not. Concurrent-reuse test: N≥100 concurrent registrations / lookups / control commands against one shared AgentRegistry under -race (no data races, no context bleed, no goroutine leaks). Tests. Unit (three-ID model, version_hash determinism, restart-vs-recreate); integration (StateStore-backed rehydration across all three drivers, real events.EventBus on the seam, identity propagation, ≥1 failure mode — missing identity fails closed); conformance (cross-tenant/session isolation); concurrency (D-025 N≥100 reuse stress). Deps. 01, 05, 07, 08. Briefs. brief 09 (agent-as-actor / agent-bound OAuth — the registration agent_id is what Phase 30 keys agent-bound tokens by), brief 11 (operator Console mockup — the Agents page is a runtime lens over this subsystem; console-agents-page.png). Why here. Slotted into the 50–53 band (steering / pause-resume wave) because the earlier runtime-subsystem bands are already shipped; its real dependencies (01, 05, 07, 08) all landed long ago, so it can be implemented any time after them, but it must land before the Protocol surface (54+) and the Console-attaching wave (72–75) that consume it. Settled decisions: D-059, D-060, D-061, D-062, D-066.
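A deterministic version_hash needs a canonical encoding before hashing. The sketch below assumes SHA-256 over a JSON encoding, treats the tool set as order-insensitive, and keeps prompt order significant; AgentConfig and versionHash are hypothetical, and tool schemas are left out for brevity.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// AgentConfig is an illustrative subset of the hashed inputs.
type AgentConfig struct {
	Prompts       []string
	Tools         []string
	PlannerConfig map[string]string
	ModelPolicy   string
}

// versionHash hashes a canonical form of the config. Tools are sorted on a
// copy so declaration order does not matter; encoding/json writes map keys
// in sorted order, so PlannerConfig is stable as well.
func versionHash(c AgentConfig) (string, error) {
	tools := append([]string(nil), c.Tools...)
	sort.Strings(tools)
	canon := struct {
		Prompts []string          `json:"prompts"`
		Tools   []string          `json:"tools"`
		Planner map[string]string `json:"planner"`
		Model   string            `json:"model"`
	}{c.Prompts, tools, c.PlannerConfig, c.ModelPolicy}
	b, err := json.Marshal(canon)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a := AgentConfig{Prompts: []string{"be brief"}, Tools: []string{"search", "fetch"}, ModelPolicy: "default"}
	b := a
	b.Tools = []string{"fetch", "search"} // same set, different order
	ha, _ := versionHash(a)
	hb, _ := versionHash(b)
	fmt.Println(ha == hb) // true
}
```

Because nothing process-specific feeds the hash, a restart with unchanged configuration yields the same value, which is the "stable otherwise" half of the acceptance criterion.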
54 — Protocol task control surface (RFC §5.2, §6.3)
Goal. Protocol endpoints: start, cancel, pause, resume, redirect, inject_context, approve, reject, prioritize, user_message. Acceptance. All nine endpoints + start round-trip via SSE+REST (phase 60); identity scope enforced. Tests. Smoke phase-54.sh exercises each method. Deps. 50, 53, 20.
55 — OTel traces + propagation (RFC §6.14)
Goal. Tracer wrapper; spans derived from events. Propagation: traceparent HTTP southbound; _meta.traceparent per request for stdio MCP; HARBOR_TRACEPARENT env on stdio spawn. Acceptance. Trace continuity across HTTP and stdio; spans align with run/step boundaries. Tests. Integration with Jaeger/OTLP collector. Deps. 04, 05.
56 — Metrics + OTLP + Prometheus (RFC §6.14, §11 Q-5 settled)
Goal. MetricsRegistry derives from Event.Type / NodeName / Producer only. OTLP exporter default; built-in Prometheus /metrics endpoint at V1. Acceptance. Cardinality-lint test fails CI on RunID/TraceID labels; both exporters emit core counters. Tests. Integration; static cardinality lint. Deps. 55, 05. Deviations (§4.3, see D-076). (1) NodeName / Producer are realised as the reserved Event.Extra["node"] / Event.Extra["producer"] keys — not new events.Event struct fields — because the Phase 05 Event doc already reserves Extra for "Phase 56's bounded low-cardinality metric labels"; no events.Event shape change. (2) The static cardinality-lint flags attribute.* calls only when nested inside metric.WithAttributes(...) — a span's attribute.String("run_id", …) inside trace.WithAttributes is legitimate (D-073) and is left alone; the rule is metric-labels-only. (3) The /metrics endpoint ships as the standalone telemetry.PrometheusHandler http.Handler constructor; the live Runtime server that mounts it at /metrics is the Phase 60+ bootstrap (there is no internal/server/ yet). (4) The master-plan "§11 Q-5" citation: RFC §11's Q-5 is the skill-versioning question; the metrics-exporter question is brief 06 Q-2, resolved by RFC §6.14 — "§11 Q-5" is read as "the §11-tracked metrics-exporter question is settled".
57 — Durable event log driver (RFC §6.13)
Goal. Persists Event records keyed by (SessionID, Sequence) via StateStore. Replay-from-cursor exact across restarts. Acceptance. Late subscriber after Runtime restart sees no gaps; ring buffer mode auto-degrades to "best-effort" with warning. Tests. Integration across all three StateStore drivers. Deps. 05, 07, 15, 16. Downstream (load-bearing). This is not just the Console event-stream backing — it is the hard dependency for the post-V1 Evaluations / agent version-control program (D-064). Evaluations is built on fully replayable sessions ("create eval from session", "mark as test case"); a session is only replayable if its event log is durable and gap-free. Lossy events (ring-buffer-only) in V1 would foreclose Evaluations entirely, since you cannot retrofit completeness into already-shipped sessions. Treat this phase's durability guarantees as binding for that reason, not optional.
58 — Protocol types/methods/errors single source (RFC §5, §8)
Goal. internal/protocol/types/, internal/protocol/methods/, internal/protocol/errors/ are the only definitions. Lint check forbids hardcoded method strings outside methods/. Acceptance. Build succeeds with the lint check active; new methods land only in methods/. Tests. Lint test (CI). Deps. 01. Status. Shipped — D-075. Phase 54 (D-072 §1) already laid the methods/errors/types single-source layout, so Phase 58 is the enforcement: internal/protocol/singlesource ships ScanProtocolTree, a go/parser AST-walking checker, and TestSingleSource_ProtocolTreeIsClean is the build-gating go test (the same AST-lint pattern as internal/planner/conformance/importgraph_test.go — zero external-tool dependency, no golangci-lint plugin). The checker lints internal/protocol/ only (method-name strings are legitimate unrelated vocabulary in other subsystems — a repo-wide scan would be all false positives) and lints _test.go files too. It surfaced and consolidated three pre-existing hardcoded method literals (control.go's dispatchStart, two _test.go fixtures) — now re-derived from the methods constants. Citation note (§4.3): the row's "§8" is CLAUDE.md §8 ("Harbor Protocol rules") — RFC-001 has no §8; RFC §5 is the design anchor, CLAUDE.md §8 is the rule the checker enforces. Coverage on internal/protocol/singlesource 94.5% (target 90%).
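The AST-lint pattern described here can be sketched with go/parser and ast.Inspect. This is a simplified illustration, not the shipped ScanProtocolTree: findHardcodedMethods checks one in-memory source file against a set of known method names, and the "task.start" method string is a made-up example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// findHardcodedMethods parses one Go source file and reports every string
// literal whose value is a known method name.
func findHardcodedMethods(filename, src string, methods map[string]bool) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var hits []string
	ast.Inspect(file, func(n ast.Node) bool {
		lit, ok := n.(*ast.BasicLit)
		if !ok || lit.Kind != token.STRING {
			return true
		}
		val, err := strconv.Unquote(lit.Value)
		if err == nil && methods[val] {
			pos := fset.Position(lit.Pos())
			hits = append(hits, fmt.Sprintf("%s:%d: hardcoded method %q", pos.Filename, pos.Line, val))
		}
		return true
	})
	return hits, nil
}

func main() {
	src := "package control\n\nfunc dispatch() string { return \"task.start\" }\n"
	hits, err := findHardcodedMethods("control.go", src, map[string]bool{"task.start": true})
	if err != nil {
		panic(err)
	}
	for _, h := range hits {
		fmt.Println(h)
	}
}
```

A build-gating test would run such a check over every file under internal/protocol/ except the methods/ package and fail when any hit is reported.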
59 — Protocol versioning + deprecation policy (RFC §5.3)
Goal. ProtocolVersion constant; deprecation window discipline; capability negotiation. Acceptance. Version constant returned on harbor version (after phase 63); deprecation note format settled. Tests. Unit. Deps. 58.
60 — Protocol wire transport (SSE + REST) (RFC §5.4, §11 Q-1)
Goal. SSE stream for events; REST/JSON for control surface. Identity-scope enforcement at edge. Q-1 RESOLVED 2026-05-14 — SSE + REST (owner sign-off given; RFC §5.4 + §11 Q-1 updated). Phase 60 is now a normal implementation phase, not a decision gate. WebSocket remains an additive alternate transport for a later phase via the internal/protocol/transports/ seam — not a fork of this phase. Acceptance. Console can stream events and submit control over SSE+REST; smoke covers both directions. Tests. Integration; full duplex stress. Deps. 58, 05. Risks. Q-1 resolved — the load-bearing decision is settled. Remaining risk is ordinary implementation risk (SSE keepalive/reconnect discipline, identity-scope enforcement at the edge).
61 — Protocol auth + identity-scope enforcement (RFC §5.5, §4)
Goal. JWT (asymmetric only); (tenant, user, session) in claims; admin/console:fleet scopes for elevated subscriptions. Acceptance. Missing claim rejected with audit; HS*/none algorithms rejected at parser level. Tests. Unit + integration; security suite. Deps. 58, 60, 01. Status. Shipped — D-079. internal/protocol/auth ships the transport-agnostic Validator (asymmetric-algorithm allowlist enforced via jwt.WithValidMethods at parse time — HS* and alg:none are structurally impossible, the keyfunc is belt-and-braces with a non-asymmetric-key shape rejection); Middleware is the net/http decorator (Authorization: Bearer <jwt> → identity in r.Context() via identity.With + scopes via WithScopes); the eight typed sentinels (ErrTokenMissing / ErrTokenMalformed / ErrAlgNotAllowed / ErrSignatureInvalid / ErrTokenExpired / ErrTokenNotYetValid / ErrUnknownKey / ErrIdentityClaimMissing, plus ErrAudienceMismatch / ErrIssuerMismatch) cover every rejection. The new CodeAuthRejected Protocol error code lands in internal/protocol/errors/ (single-source preserved); transports.NewMux gains a WithValidator option that wraps both Phase 60 handlers in the middleware (additive — the Phase 60 trust-based posture is preserved verbatim when no validator is supplied). The control handler's assertBodyMatchesAuthedIdentity is the defence-in-depth check (a body claiming a different (tenant, user, session) than the JWT is rejected 401 before Dispatch runs); the SSE handler's ?admin=1 query param is gated on the verified ScopeAdmin / ScopeConsoleFleet scope (rejected 403 without). The golang-jwt/jwt/v5 library was promoted from indirect to direct (no new module — already pulled by aws-sdk-go-v2/credentials). 
test/integration/phase61_auth_test.go exercises every rejection mode end-to-end against a real ES256-keypair-signed bearer + the real ControlSurface + the real events.EventBus behind httptest.Server; the security suite covers algorithm-confusion, alg:none, scope-escalation, kid-substitution, expired-token, and tampered-body attacks; D-025 concurrent-reuse pinned at N=128 with goroutine-baseline assertion. Coverage: auth 90.1%, errors 100%, transports 94.3%, control 89.5%, stream 86.6% (all ≥ targets).
62 — Protocol conformance suite (RFC §5)
Goal. A single conformance suite the protocol surface passes; covers every method, every error code, every event filter. Acceptance. go test ./internal/protocol/conformance/... exits 0; smoke runs the same suite against harbor dev. Tests. The suite itself. Deps. 58, 60, 61. Status note. Shipped at 81.2% statement coverage (master-plan target 85%) per the documented §4.3 deviation in docs/plans/phase-62-protocol-conformance.md — matches the precedent set by Phase 49's internal/planner/conformance (70.8% under the same target). Conformance-suite coverage is dominated by t.Fatalf rollback branches that fire only on assertion failure; the assertion density (10 methods × 2 transports; 8 error codes × ≥1 failure path; every event-filter shape; the version handshake; the auth pipeline; an N=100 D-025 stress) is the load-bearing surface. The suite ships paired with test/integration/wave10_test.go — the Wave 10 wave-end E2E that consumes the same suite from a different consumer profile against the assembled real-driver Wave 10 surface.
63 — Harbor CLI skeleton (RFC §8)
Goal. harbor cobra binary with subcommands dev, scaffold, validate, version, inspect-events, inspect-runs, inspect-topology. All subcommands support structured-error, --quiet, and --json output modes. Acceptance. harbor --help matches a golden file; harbor version returns version + build hash + Protocol version. Tests. CLI golden tests. Deps. 60.
64 — harbor dev v1 (RFC §8)
Goal. Boot embedded Runtime + open Protocol on 127.0.0.1:<port>. No hot-reload yet. Identity injection via dev-token. Acceptance. harbor dev returns /healthz 200; events stream cleanly to a test Console subscriber. Smoke. phase-64.sh boots dev; assert_status 200 /healthz. Tests. Integration (boot, smoke, teardown). Deps. 63, 60.
Phase 64 — harbor dev v1 (pre-plan scoping note — BINDING when the plan is authored)
Phase 64 is the moment cmd/harbor/main.go stops being a driver-registration stub and starts instantiating an LLM-backed runtime for the first time. Before this phase, no production code path resolves the LLM client — every "test stub as default" call (the mock LLM driver, EchoSummarizer, staticSummariser) is dormant. Phase 64 is the moment they go live.
The §13 entry "Test stubs as production defaults on operator-facing seams" is pre-settled for this phase. The plan author MUST satisfy the constraints below — they are not re-litigable inside the phase plan:
- Default LLM driver is bifrost, not mock. Phase 64 flips llm.DefaultDriver from "mock" to "bifrost" (internal/llm/registry.go:172) and updates examples/*.yaml so driver: bifrost is the demonstrated path. The mock driver subpackage (internal/llm/mock/) moves under a harbor_testfixtures build tag (or to a testfixtures/ subdirectory) so it is unreachable from cmd/harbor/main.go's blank-import block in a normal build. Production tests that need a deterministic LLM consume it via the build-tagged path or via *_test.go-local fixtures.
- Boot fails loudly when no LLM provider is configured. Missing API key, missing bifrost provider section, or an empty llm: block → harbor dev prints a one-line error that names the missing config key (e.g. config.llm.providers[0].api_key: required when driver=bifrost) and points to examples/dev.yaml, then exits non-zero. Silent fallback to the mock is forbidden; this is the §13 "fail loudly at boot" consequence.
- LLM-backed defaults for memory.Summarizer and planner.Summariser. When memory.strategy: rolling_summary is configured and no custom Summarizer is injected, Phase 64 (or a same-wave sibling phase) provides a default LLM-backed Summarizer that composes an llm.LLMClient with a versioned compaction prompt template. Same shape for planner.Summariser consumed by CompressionRunner. EchoSummarizer and staticSummariser move to testfixtures and are no longer reachable from the production wiring path. If the author chooses to split this into a sibling phase (e.g. Phase 64a), that phase MUST ship in the same wave as Phase 64, because the §13 primitive-with-consumer rule applies recursively: a harbor dev that defaults to rolling_summary but has no Summarizer wired is the same failure mode one layer down.
- Dev-only escape hatch is explicit and banner'd. A --mock flag on harbor dev (or HARBOR_DEV_ALLOW_MOCK=1 env var; Phase 64's plan picks ONE and pins the choice in a D-NNN decisions entry) is the ONLY path to the mock LLM at runtime. When the escape hatch fires, every boot prints a stderr banner: [DEV-ONLY MOCK LLM — DO NOT USE IN PRODUCTION]. The README's quickstart MAY use this path but must label it as a dev shortcut, not the production install: examples/dev.yaml shows the production-shaped config and the README's "5-minute quickstart" demonstrates the escape-hatch path with a one-line note.
- scripts/smoke/phase-64.sh exercises the LLM seam, not just /healthz. A smoke that only checks GET /healthz is insufficient: the phase exists to wire the LLM, so the smoke MUST exercise the LLM. The script boots harbor dev against a recorded bifrost fixture (no live network; use httptest.Server or a recorded-cassette pattern), submits one task over the Phase 60 REST handler, and asserts the SSE stream emits a planner Decision derived from a real LLMClient.Complete call. A second smoke assertion: boot with no provider configured and assert the non-zero exit with the expected error message.
- The §18 mirror invariant applies in spirit. Phase 64 introduces a binary that real users will run. The README's ## Status table, cmd/harbor's godoc, and any "Quick start" prose are updated in the same PR, with no aspirational claims like "harbor dev boots the Console" that land before the Console-boot phases (72–75) ship. If §3's "Harbor CLI" bullet describes a command that doesn't yet exist, the bullet says so in future tense with a phase reference.
- Tool catalog wires Phase 30 (OAuth, D-083) + Phase 31 (approval gates, D-086) primitives from operator config (issue #104). Both phases shipped runtime-side primitives whose only consumers today are tests: internal/tools/auth.OAuthProvider and internal/tools/approval.ApprovalGate reach the runtime, but the tool catalog (internal/tools/catalog/) doesn't know about either. Phase 64 (or a same-wave sibling per the §13 primitive-with-consumer rule) extends the catalog so a tool registration can declare an ApprovalPolicy and/or an OAuth BindingScope via operator config (tools.<name>.approval: <policy>, tools.<name>.oauth: <provider> or equivalent shape). The catalog auto-wraps the registered Tool with an ApprovalGate and/or an OAuth-aware invocation wrapper. Operators get HITL approval AND tool-side OAuth out of the box without writing Go wiring code. The Wave 11 wave-end E2E exercises APPROVE/REJECT via the real transports/control HTTP handler (closing the Protocol-wire round-trip half of issue #104); the catalog-wiring half lands in Phase 64. ✅ shipped in Phase 64a / D-090.
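The "fail loudly at boot" and "explicit escape hatch" constraints can be pictured with a minimal Go sketch. Everything here is illustrative: the Provider and LLMConfig types, the resolveDriver function, and the allowMock parameter are hypothetical names, not Harbor's real config or registry API. Only the error wording, the examples/dev.yaml pointer, and the banner text are taken from the note above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Provider and LLMConfig are hypothetical stand-ins for Harbor's config types.
type Provider struct {
	Name   string
	APIKey string
}

type LLMConfig struct {
	Driver    string
	Providers []Provider
}

// mockBanner is the stderr banner text quoted in the scoping note.
const mockBanner = "[DEV-ONLY MOCK LLM — DO NOT USE IN PRODUCTION]"

// resolveDriver picks the LLM driver for boot. It never falls back to the
// mock silently: the mock is reachable only when allowMock is set, which
// models whichever single escape hatch the phase plan pins.
func resolveDriver(cfg LLMConfig, allowMock bool) (string, error) {
	if allowMock {
		fmt.Fprintln(os.Stderr, mockBanner)
		return "mock", nil
	}
	driver := cfg.Driver
	if driver == "" {
		driver = "bifrost" // assumed default after the mock-to-bifrost flip
	}
	if driver == "mock" {
		return "", errors.New("config.llm.driver: mock requires the dev-only escape hatch")
	}
	if len(cfg.Providers) == 0 {
		return "", fmt.Errorf("config.llm.providers: required when driver=%s (see examples/dev.yaml)", driver)
	}
	for i, p := range cfg.Providers {
		if p.APIKey == "" {
			return "", fmt.Errorf("config.llm.providers[%d].api_key: required when driver=%s (see examples/dev.yaml)", i, driver)
		}
	}
	return driver, nil
}

func main() {
	// An empty llm: block fails with a message naming the missing key.
	if _, err := resolveDriver(LLMConfig{}, false); err != nil {
		fmt.Println("boot error:", err) // a real binary would exit non-zero here
	}
	cfg := LLMConfig{Providers: []Provider{{Name: "example", APIKey: "k"}}}
	driver, _ := resolveDriver(cfg, false)
	fmt.Println("driver:", driver)
}
```

The one design point the sketch tries to preserve is that the mock branch sits behind a single boolean decided outside the config file, so a misconfigured YAML can only produce an error, never a quiet downgrade.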
Mandatory reading before authoring this plan (per §16): RFC §5 (Protocol surface), RFC §6.5 (LLM client), RFC §6.6 (Memory + Summarizer), docs/research/brief-02-trajectory-compression.md, docs/research/brief-04-memory-strategies.md (or whichever brief indexes summariser design — docs/research/INDEX.md resolves), docs/decisions.md (D-026 LLM-edge safety, D-035 rolling summary, D-044 latent governance, D-055 trajectory compression rendering rule), the shipped internal/llm/registry.go (the default-driver flip site) and internal/memory/strategy/ (the Summarizer wiring site).
Pre-assigned decisions slot: Phase 64's plan claims a D-NNN number when dispatched and records: (a) the mock → bifrost default flip; (b) the chosen escape-hatch mechanism (--mock flag vs env var); (c) the LLM-backed default Summarizer location (in-package vs new internal/llm/summarizer/ subpackage); (d) any deliberate carve-out from the §13 entry above (requires an RFC PR — bake the carve-out into the RFC, then reference it here).
First production consumer of Phase 55's W3C carriers. Phase 64 is the first production consumer of telemetry.InjectHTTP / telemetry.ExtractHTTP (the HTTP carrier helpers Phase 55 shipped as standalone functions — see issue #94). The plan threads traceparent through tools/drivers/http on outbound calls and extracts on inbound — internal/protocol/transports/control + tools/drivers/mcp follow the same shape. This is the §13 primitive-with-consumer obligation closed for the Phase 55 carriers; before Phase 64 they are dormant helpers exercised only by unit tests.
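As a rough illustration of what threading traceparent through outbound and inbound HTTP involves, the sketch below formats and parses a version-00 W3C traceparent header using only the standard library. The names SpanContext, injectHTTP, extractHTTP, and parseTraceparent are hypothetical and are not the signatures of telemetry.InjectHTTP / telemetry.ExtractHTTP, which this document does not spell out.

```go
package main

import (
	"fmt"
	"net/http"
	"regexp"
	"strings"
)

// traceparentRe matches a version-00 header: 00-<trace-id>-<parent-id>-<flags>.
var traceparentRe = regexp.MustCompile(`^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$`)

// SpanContext is a hypothetical carrier for the three traceparent fields.
type SpanContext struct {
	TraceID string // 32 lowercase hex characters
	SpanID  string // 16 lowercase hex characters
	Flags   string // 2 lowercase hex characters
}

// parseTraceparent validates a header value; all-zero IDs are invalid.
func parseTraceparent(v string) (SpanContext, bool) {
	m := traceparentRe.FindStringSubmatch(v)
	if m == nil {
		return SpanContext{}, false
	}
	if m[1] == strings.Repeat("0", 32) || m[2] == strings.Repeat("0", 16) {
		return SpanContext{}, false
	}
	return SpanContext{TraceID: m[1], SpanID: m[2], Flags: m[3]}, true
}

// injectHTTP writes the context onto an outbound request's headers.
func injectHTTP(h http.Header, sc SpanContext) {
	h.Set("traceparent", "00-"+sc.TraceID+"-"+sc.SpanID+"-"+sc.Flags)
}

// extractHTTP reads the context from an inbound request's headers.
func extractHTTP(h http.Header) (SpanContext, bool) {
	return parseTraceparent(h.Get("traceparent"))
}

// roundTrip injects into a fresh header set and extracts it again.
func roundTrip(sc SpanContext) (SpanContext, bool) {
	h := http.Header{}
	injectHTTP(h, sc)
	return extractHTTP(h)
}

func main() {
	sc := SpanContext{
		TraceID: "4bf92f3577b34da6a3ce929d0e0e4736",
		SpanID:  "00f067aa0ba902b7",
		Flags:   "01",
	}
	got, ok := roundTrip(sc)
	fmt.Println(ok, got == sc)
}
```

The sketch only handles version 00 and ignores tracestate; a production carrier would also need the forward-compatibility rules from the Trace Context specification.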
Departures from this note require an RFC PR. This note is binding, not advisory — it encodes a Wave 10 audit finding (the §13 amendment above) that future plan-authors do not have visibility into. Treat it as the equivalent weight of an RFC section.
65 — harbor dev hot-reload (RFC §8)
Goal. fsnotify watcher; graceful-drain restart on Go-source change; configurable retain-in-flight policy. Acceptance. File change triggers drain; in-flight runs cancel cleanly; new code picked up. Tests. Integration with file mutation. Deps. 64.
§4.3 shape decision (D-099). In-process bootDevStack rebuild, NOT binary re-exec. Re-exec was considered and rejected for V1: it requires an out-of-process supervisor (the binary cannot re-exec itself without losing live http.Server connections), it costs a Go build per cycle (~5s on a warm machine — the developer feedback loop is the load-bearing UX here), and an operator iterating on YAML config does NOT need a binary rebuild. The in-process rebuild satisfies the "new code picked up" acceptance for every config / scaffold change; operators changing Go source rebuild + re-launch the binary manually (the same cycle they'd run today without hot-reload). A future opt-in policy: rebuild can layer binary-rebuild semantics on without changing the supervisor's shape.
66 — harbor dev draft-save scaffolding (RFC §8)
Goal. Project-local .harbor/drafts/ scratchpad endpoint; iterate on agent without committing scaffold; "save" promotes to harbor scaffold-emitted layout. Acceptance. Draft round-trip: edit → preview run → save → resulting scaffold passes harbor validate. Tests. Integration + golden. Deps. 64. Status. Shipped (D-100). internal/devdraft package ships the filesystem-backed Store + the http.Handler mounted at /v1/dev/drafts/ on the harbor dev mux behind the Phase 61 JWT validator. On-disk layout is <root>/<tenant>/<user>/<session>/<draft_id>/ so concurrent operators sharing the same .harbor/drafts/ root cannot collide (CLAUDE.md §6 applied to a filesystem-backed store). Six endpoints: POST / (create + seed via the Phase 67 scaffold engine), GET /{id} (list files + content for the Console editor), PATCH /{id}/files/{path} (path-traversal-safe per §7 rule 5), POST /{id}/preview (validation-only dry-run via internal/config.Load), POST /{id}/save (promote to operator-supplied output dir; refuses with ErrValidationFailed when the rendered harbor.yaml fails the validator), DELETE /{id} (idempotent discard). Five SafePayload bus events land per round-trip, dev.draft.{created,updated,previewed,saved,discarded}, registered with internal/events's exhaustive registry at init(). harbortest/devstack/devstack.go::Assemble mirrors the production wiring per D-094 (always constructs a DraftStore; mounts the handler when transports are enabled). test/integration/phase66_draft_save_test.go exercises the round-trip through the devstack helper with a real Bearer token, observes the five bus events, exercises path-traversal + missing-bearer failure modes, and runs an N=10 concurrency stress under -race. internal/devdraft/concurrent_test.go runs the D-025 N=128 concurrent-reuse test against one shared Store. scripts/smoke/phase-66.sh drives the round-trip against the live binary; the 404/405/501 → SKIP convention keeps the smoke harmless on builds that pre-date Phase 66.
Coverage on internal/devdraft: ≥80% (master-plan target 75%).
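The identity-scoped on-disk layout and the path-traversal guard on PATCH /{id}/files/{path} could look roughly like the sketch below. The function names draftDir and safeJoin and the errPathEscape sentinel are hypothetical, not the shipped internal/devdraft API; the sketch also assumes the identity components have already been validated as single path segments elsewhere.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// errPathEscape is a hypothetical sentinel for rejected file paths.
var errPathEscape = errors.New("path escapes draft directory")

// draftDir builds <root>/<tenant>/<user>/<session>/<draft_id>.
// Assumption: each component was validated upstream as one safe segment.
func draftDir(root, tenant, user, session, draftID string) string {
	return filepath.Join(root, tenant, user, session, draftID)
}

// safeJoin resolves rel under base and rejects absolute paths, the base
// directory itself, and anything that lands outside base once cleaned.
func safeJoin(base, rel string) (string, error) {
	if rel == "" || filepath.IsAbs(rel) {
		return "", errPathEscape
	}
	full := filepath.Join(base, rel) // Join cleans "." and ".." segments
	r, err := filepath.Rel(base, full)
	if err != nil || r == "." || r == ".." ||
		strings.HasPrefix(r, ".."+string(filepath.Separator)) {
		return "", errPathEscape
	}
	return full, nil
}

func main() {
	dir := draftDir(".harbor/drafts", "t1", "u1", "s1", "d1")
	fmt.Println(dir)

	ok, err := safeJoin(dir, "agent/harbor.yaml")
	fmt.Println(ok, err)

	_, err = safeJoin(dir, "../../other-user/secret.yaml")
	fmt.Println(err)
}
```

Checking the cleaned relative path, instead of searching the raw input for "..", keeps legitimate names such as "..notes.md" usable while still rejecting inputs like "a/../../b".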
67 — harbor scaffold (RFC §8)
Goal. Generate a new agent skeleton from a template (default = "minimal-react"). Templates discoverable; output passes harbor validate. Acceptance. harbor scaffold my-agent creates a buildable project; harbor validate returns 0. Tests. Golden output. Deps. 63.
§4.3 deviation (D-087). Phase 67 was dispatched in parallel with Phase 68 (harbor validate) per CLAUDE.md §17.7 step 3. At scaffold-time, harbor validate is still a Phase 63 stub — calling it would exit non-zero with not_implemented regardless of the scaffolded config's validity. Phase 67's acceptance criterion is therefore verified against internal/config.Load + Validate directly (the shipped subsystem the future harbor validate will call), via cmd/harbor/scaffold/scaffold_test.go::TestScaffold_RenderedConfig_PassesConfigValidate. The cross-phase CLI integration smoke step (running harbor validate ./harbor.yaml after a scaffold, asserting exit 0) lands in Phase 68's PR per §17.6. The §13 primitive-with-consumer rule is satisfied — the consumer-of-the-config-validator is a real shipped subsystem (internal/config), not a future CLI surface.
68 — harbor validate (RFC §8)
Goal. Validate config / skills / agent definitions without booting. Errors include file:line. Acceptance. Each error category produces a stable message; CI uses validate as a pre-flight check. Tests. Golden errors. Deps. 63, 02.
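One way to get file:line errors with stable wording is to model each finding as a small value type and sort before printing. The sketch below is only an assumption about shape: the Finding type, its fields, the render function, and the category names are hypothetical and are not taken from the harbor validate implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// Finding is a hypothetical validation result with a source position.
type Finding struct {
	File     string
	Line     int
	Category string
	Message  string
}

// String renders the finding in file:line form.
func (f Finding) String() string {
	return fmt.Sprintf("%s:%d: %s: %s", f.File, f.Line, f.Category, f.Message)
}

// render sorts findings by file, then line, so that output is deterministic
// and can be compared against a golden file.
func render(findings []Finding) []string {
	sorted := append([]Finding(nil), findings...)
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].File != sorted[j].File {
			return sorted[i].File < sorted[j].File
		}
		return sorted[i].Line < sorted[j].Line
	})
	out := make([]string, 0, len(sorted))
	for _, f := range sorted {
		out = append(out, f.String())
	}
	return out
}

func main() {
	lines := render([]Finding{
		{File: "harbor.yaml", Line: 12, Category: "config", Message: "unknown key"},
		{File: "harbor.yaml", Line: 3, Category: "config", Message: "missing driver"},
	})
	for _, l := range lines {
		fmt.Println(l)
	}
}
```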
69 — harbor inspect-events / inspect-runs (RFC §8)
Goal. Tail/filter event bus; list recent runs + show trajectory. Acceptance. harbor inspect-events --session SID --type tool.completed filters server-side; harbor inspect-runs SID shows run trajectory. Tests. Golden CLI outputs. Deps. 63, 60.
70 — harbor inspect-topology (RFC §8)
Goal. Render run's node graph as ASCII; consumes topology.snapshot events. Acceptance. Sample run produces stable ASCII matching golden. Tests. Golden. Deps. 63, 60.
71 — harbortest test kit package (RFC §6.13)
Goal. Public harbortest package: RunOnce(ctx, agent, input) (Output, EventLog, error), AssertSequence(log, []EventType{...}), AssertNoLeaks(log) (cross-tenant/session leakage detector), SimulateFailure(toolName, code, n), RecordedEvents(runID) []Event. Acceptance. Flow-level test ≤ 10 lines; AssertNoLeaks catches a deliberate cross-session bug in a regression test. Tests. Self-test of the kit. Deps. 05, 09, 07.
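To make the intent of the two assertion helpers concrete, here is a minimal sketch under stated assumptions. It interprets AssertSequence as an ordered-subsequence check (other events may interleave) and gives the leak detector explicit tenant and session arguments; the real harbortest signatures listed above differ, and the Event fields and lowercase function names here are hypothetical.

```go
package main

import "fmt"

// EventType and Event are simplified, hypothetical stand-ins for the kit's types.
type EventType string

type Event struct {
	Type    EventType
	Tenant  string
	Session string
}

// assertSequence checks that want appears in log in order, allowing other
// events in between. Assumption: subsequence semantics, not exact equality.
func assertSequence(log []Event, want []EventType) error {
	i := 0
	for _, ev := range log {
		if i < len(want) && ev.Type == want[i] {
			i++
		}
	}
	if i != len(want) {
		return fmt.Errorf("sequence stalled at index %d: missing %q", i, want[i])
	}
	return nil
}

// assertNoLeaks fails on the first event that belongs to a different
// tenant or session than the run under test.
func assertNoLeaks(log []Event, tenant, session string) error {
	for i, ev := range log {
		if ev.Tenant != tenant || ev.Session != session {
			return fmt.Errorf("event %d (%s) leaked from tenant=%q session=%q",
				i, ev.Type, ev.Tenant, ev.Session)
		}
	}
	return nil
}

func main() {
	log := []Event{
		{Type: "run.started", Tenant: "t1", Session: "s1"},
		{Type: "tool.completed", Tenant: "t1", Session: "s1"},
		{Type: "run.completed", Tenant: "t1", Session: "s1"},
	}
	fmt.Println(assertSequence(log, []EventType{"run.started", "run.completed"}))
	fmt.Println(assertNoLeaks(log, "t1", "s1"))
}
```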
Console wave: re-decomposition pending (tracked, not yet expanded). Phases 72–75 currently cover the Runtime-side Protocol hooks for a subset of the Console. RFC §7 now defines the full Console information architecture: a 14-page observability + control plane (Overview, Live Runtime, Sessions, Tasks, Agents, Tools, Events, Background Jobs, Flows, Memory, MCP Connections, Artifacts, Evaluations, Settings) organized as runtime lenses, where every page is a projection over state snapshots + realtime events + control commands. The binding structuring rule (RFC §7, CLAUDE.md §13): no Console page phase ships without its feeding Protocol-surface phase landing first or in the same wave. When this wave is re-decomposed, the heavy pages (Live Runtime, Events, Agents) each become their own phase twinned with a Protocol-surface phase; the lighter pages cluster. The Agents page is a lens over the Agent Registry (phase 53a). The notification.* topic (Overview intervention queue) and search.* Protocol methods (global ⌘K) land as named acceptance criteria of their consuming page phases, not as free-floating primitives. Evaluations is explicitly post-V1 (D-064); it is a subsystem, not a page. Re-decomposition itself follows the §16 phase-authoring ritual per new phase and is not done in this edit.
Console-wave deployment + shared-library posture (BINDING: D-091 / D-092 / D-093). Companion to the page-decomposition note above; this note locks in the "how it's deployed" and "how it's built" answers a future Console plan-author cannot relitigate. Departures from any item below require an RFC PR, not a phase-plan footnote.
- harbor console is the Console's deployment surface, not harbor dev. The full Console SvelteKit build is baked into cmd/harbor via embed.FS and served by a new cmd/harbor/cmd_console.go subcommand (a phase to be slotted at re-decomposition time). harbor dev (Phase 64, shipped) is and stays headless; embedding the Console into harbor dev is rejected (couples developer iteration to operator observability; wrong scope). A future packed dev UI for single-agent development reuses the Console's chat/playground components via a shared library; post-V1. Decision: D-091.
- Svelte 5 + runes mode only. web/console/svelte.config.js ships with compilerOptions: { runes: true }; package.json pins "svelte": "^5.0.0". Legacy Svelte 4 reactivity ($:, top-level let as state, export let props, store auto-subscription in scripts) is rejected by svelte-check --fail-on-warnings. Decision: D-092.
- Protocol TypeScript client is generated, not hand-written. cmd/harbor-gen-protocol-ts/ reads internal/protocol/singlesource.CanonicalWireTypes and emits web/console/src/lib/protocol.ts with a // CODE GENERATED ... DO NOT EDIT. header. A make protocol-ts-gen-check target asserts git diff --exit-code is clean in CI. Hand-rolled fetch in .svelte files is still rejected (§13). Decision: D-093.
- Stylelint enforces the no-raw-literals rule mechanically. The first Console phase that creates web/console/ lands web/console/.stylelintrc.cjs that disallows hex / rgb() / named colors and arbitrary px/rem/em outside the token surface (tokens.css). npm run lint fails CI on raw literals; reviewers no longer hunt for them by eye.
- Shared chat module: encapsulate first, extract on second consumer. The chat + playground + MCP-Apps renderer + file-upload + trace-toggle components ship as a self-contained module at web/console/src/lib/chat/. The introducing phase enforces: (a) no imports of other Console internals from the chat module; (b) a typed ProtocolClient interface the caller injects, not a Console singleton; (c) the MCP-Apps renderer registry lives at web/console/src/lib/chat/renderers/. The future packed dev UI extracts to web/shared/chat/ via git mv when its phase plan lands.
- Mockup inventory is complete for V1 (as of 2026-05-18). All 13 V1 sidebar pages plus the session-level Playground surface have canonical mockups at docs/rfc/assets/console-<slug>-page.png (14 PNGs; Evaluations excluded per D-064). Each docs/design/console/page-<slug>.md spec carries a §12. Mockup-aligned refinements (2026-05-18) section that reconciles its mockup against §3-§7. Each Console page phase plan MUST reference the canonical mockup for the view(s) it ships AND consume the §12 reconciliation directly; the §12 component table is the binding source for any [wave-13-extends] Protocol-surface additions. The superseded legacy docs/research/console-mockup-runtime-view.png is retained as a research artifact only; the canonical Live Runtime mockup is docs/rfc/assets/console-live-runtime-page.png.
- §17.7 dispatch-prompt forcing function. Every Console-wave dispatch prompt MUST name in its mandatory reading list: Brief 11, Brief 12, every docs/rfc/assets/console-*-page.png asset (the legacy docs/research/console-mockup-runtime-view.png is superseded; agents should not consume it), CLAUDE.md §4.5 + §13 frontend bullets, and the three decisions above (D-091, D-092, D-093). This note is binding, not advisory.
- Per-page Console specs live at docs/design/console/page-<slug>.md. The 14-page IA is decomposed into one self-contained spec per page (Overview, Live Runtime, Sessions, Tasks, Agents, Tools, Events, Background Jobs, Flows, Memory, MCP Connections, Artifacts, Settings, Playground); each carries an eleven-section template with a [shipped]/[wave-13-extends]/[deferred] functionality matrix. These specs are the authoritative per-page mockup-authoring source for Wave 13 and MUST appear in every per-page agent's mandatory reading list alongside Brief 11, Brief 12, and the relevant mockup asset. The directory's README.md is the index.
72 — Console subscription protocol surface (RFC §5.2, §7)
Goal. Read-only event subscription scoped by identity triple; admin/console:fleet scope for cross-session/tenant. Acceptance. Console can subscribe to a session's events; cross-tenant call rejected unless scoped admin. Tests. Integration. Deps. 60, 05, 06. Plan file. docs/plans/phase-72-console-subscription-scope.md (shipped — D-105).
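The scope rule in this phase (same-tenant reads allowed, cross-tenant reads only with an admin or fleet scope) reduces to a small decision function. The sketch below is an assumption about shape only: the Identity struct, the scope string values, and authorizeSubscribe are hypothetical, and the real check lives behind the auth package's own types.

```go
package main

import (
	"fmt"
	"net/http"
)

// Identity is a hypothetical view of the caller's identity triple plus scopes.
type Identity struct {
	Tenant  string
	User    string
	Session string
	Scopes  []string
}

// Scope string values are assumptions; the document names the constants
// auth.ScopeAdmin and auth.ScopeConsoleFleet but not their wire values.
const (
	scopeAdmin        = "admin"
	scopeConsoleFleet = "console:fleet"
)

func hasScope(id *Identity, want string) bool {
	for _, s := range id.Scopes {
		if s == want {
			return true
		}
	}
	return false
}

// authorizeSubscribe returns the HTTP status a subscription request would get:
// 401 without identity, 403 for an unscoped cross-tenant read, 200 otherwise.
func authorizeSubscribe(caller *Identity, targetTenant string) int {
	if caller == nil || caller.Tenant == "" {
		return http.StatusUnauthorized
	}
	if targetTenant == caller.Tenant {
		return http.StatusOK
	}
	if hasScope(caller, scopeAdmin) || hasScope(caller, scopeConsoleFleet) {
		return http.StatusOK
	}
	return http.StatusForbidden
}

func main() {
	op := &Identity{Tenant: "t1", User: "u1", Session: "s1"}
	fmt.Println(authorizeSubscribe(op, "t1"))  // same tenant
	fmt.Println(authorizeSubscribe(op, "t2"))  // cross-tenant, no scope
	fmt.Println(authorizeSubscribe(nil, "t1")) // no identity
}
```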
72a — events.subscribe filter extensions + events.aggregate (RFC §5.2, §6.13)
Goal. Extend the events.subscribe Protocol surface with a wire EventFilter struct (event-type / tenant / user / session / run / time-window) and add a new events.aggregate Protocol method returning time-bucketed event-type counts. Both methods use the closed two-scope set (auth.ScopeAdmin + auth.ScopeConsoleFleet) for cross-tenant fan-in per D-079 — NO new events.crosstenant scope. Acceptance. EventFilter + EventBucket + EventAggregateRequest + EventAggregateResponse ship in internal/protocol/types/events.go; events.aggregate route mounted on the wire; cross-tenant requests without the closed-set scope claim return 403 + CodeIdentityScopeRequired; bucket arithmetic deterministic (Window % Bucket == 0 or 400); concurrent-reuse pin under -race (N≥100). Tests. Unit (filter matrix, aggregate bucket arithmetic, concurrent-reuse) + integration (test/integration/events_filter_aggregate_test.go — real bus + real auth + real transports, scope-claim happy + reject paths, concurrent-reuse over the wire) + smoke (scripts/smoke/phase-72a.sh). Deps. 60, 61, 72. Plan. See docs/plans/phase-72a-events-filter-aggregate.md.
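The "Window % Bucket == 0 or 400" rule and the time-bucketed counting can be sketched as follows. This is a simplification under assumptions: it counts all events per bucket, not per event type, and the names bucketCounts and errBadBucket are hypothetical. Offsets are each event's time minus the window start.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errBadBucket is a hypothetical sentinel a handler would map to HTTP 400.
var errBadBucket = errors.New("window must be a positive multiple of bucket")

// bucketCounts splits [0, window) into window/bucket equal buckets and counts
// how many offsets fall into each. Offsets outside the window are ignored.
func bucketCounts(window, bucket time.Duration, offsets ...time.Duration) ([]int, error) {
	if window <= 0 || bucket <= 0 || window%bucket != 0 {
		return nil, errBadBucket
	}
	counts := make([]int, int(window/bucket))
	for _, off := range offsets {
		if off < 0 || off >= window {
			continue // half-open window: the end instant is excluded
		}
		counts[int(off/bucket)]++
	}
	return counts, nil
}

func main() {
	counts, err := bucketCounts(time.Minute, 20*time.Second,
		5*time.Second, 25*time.Second, 30*time.Second, 59*time.Second)
	fmt.Println(counts, err)

	_, err = bucketCounts(time.Minute, 25*time.Second)
	fmt.Println(err)
}
```

Rejecting a window that is not a whole multiple of the bucket, instead of rounding, is what keeps the bucket boundaries deterministic for any two callers issuing the same request.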
72e — pause.list snapshot Protocol method (RFC §5.2, §6.3)
Goal. Add the pause.list Protocol method (route POST /v1/pause/list) — a paginated, identity-scope-filtered snapshot of currently-paused tasks / sessions, projected from the shipped Phase 50 Pause/Resume Coordinator's in-memory registry. Read-only: it consumes the Coordinator state, it does not mutate the registry or call Resume. It is the snapshot half of the Console intervention-queue contract; live deltas continue to flow through events.subscribe on the pause.requested / pause.resumed topics. The Overview-page intervention queue (Phase 73a) is the UI consumer. Acceptance. MethodPauseList + the PauseSnapshot / PauseFilter / PauseListRequest / PauseListResponse / PauseArtifactRef wire types ship in internal/protocol/{methods,types}; the Coordinator.List interface extension + internal/runtime/pauseresume/list.go implementation; identity-mandatory (401 CodeIdentityRequired); cross-tenant filter without auth.ScopeAdmin → 403 CodeIdentityScopeRequired (D-079 closed-scope reuse, no new scope); the D-026 heavy-content bypass routes oversized pause payloads through the ArtifactStore and emits pause.payload_artifact_routed; pagination (PageSize default 50, max 200, out-of-range → 400, never silently clamped); concurrent-reuse pin under -race (N=128). Tests. Unit (list_test.go — filter combinations + pagination math + status semantics; pause_list_handler_test.go — identity / scope-claim / malformed / heavy-bypass; list_concurrent_test.go — D-025 N=128) + integration (test/integration/pause_list_test.go — real Coordinator + real transport + real auth, two-tenant scope, cross-tenant reject, admin-claim accept, heavy-payload bypass, concurrency stress, all -race) + smoke (scripts/smoke/phase-72e.sh). Deps. 50, 60, 61, 17 (all shipped). 73c / 73d for pagination-shape consistency only — same wave. Plan. See docs/plans/phase-72e-pause-list-snapshot.md (shipped — D-110).
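The pagination rule (default 50, maximum 200, out-of-range rejected instead of clamped) is easy to get subtly wrong, so a small sketch may help. It assumes a zero PageSize means "not set" and uses offset-based paging; both are assumptions, as are the names resolvePageSize, paginate, and errPageSize. The actual cursor shape of pause.list is not described here.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	defaultPageSize = 50
	maxPageSize     = 200
)

// errPageSize is a hypothetical sentinel a handler would map to HTTP 400.
var errPageSize = errors.New("page_size out of range")

// resolvePageSize applies the default for an unset value and rejects anything
// outside 1..maxPageSize. It never clamps silently.
func resolvePageSize(requested int) (int, error) {
	switch {
	case requested == 0: // assumption: zero means the field was omitted
		return defaultPageSize, nil
	case requested < 0 || requested > maxPageSize:
		return 0, errPageSize
	default:
		return requested, nil
	}
}

// paginate returns one page of items starting at offset (assumed >= 0).
func paginate(items []string, offset, size int) []string {
	if offset >= len(items) {
		return nil
	}
	end := offset + size
	if end > len(items) {
		end = len(items)
	}
	return items[offset:end]
}

func main() {
	size, err := resolvePageSize(0)
	fmt.Println(size, err)

	_, err = resolvePageSize(201)
	fmt.Println(err)

	fmt.Println(paginate([]string{"p1", "p2", "p3"}, 2, 2))
}
```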
72g — governance.posture + llm.posture (RFC §5.5, §6.15)
Goal. Two read-only posture Protocol methods feeding the Console Settings page (Phase 73m). governance.posture returns the D-081 IdentityTiers view (per-tier BudgetCeilingUSD + token-bucket RateLimit + MaxTokens) plus DefaultTier + the caller-resolved tier. llm.posture returns the bound LLM provider/model/region + a MockMode boolean — true iff the runtime booted with HARBOR_DEV_ALLOW_MOCK=1 (D-089). The two methods EXTEND the Phase 72f PostureSurface (one surface, not two — §13). Both are identity-mandatory; cross-tenant reads require auth.ScopeAdmin (D-079). Read-only — no mutation method. Acceptance. MethodGovernancePosture / MethodLLMPosture registered in internal/protocol/methods + folded into IsPostureMethod; wire types in internal/protocol/types/{governance,llm}.go; the Phase 72f PostureSurface dispatcher routes both new methods through the control transport via the same IsPostureMethod branch; cross-tenant non-admin → 403 CodeScopeMismatch; missing identity → 401; cross-tenant governance/llm admin reads emit a *.posture_read_admin audit event; MockMode reflects the D-089 boot-time capture; concurrent-reuse pin under -race (N≥100). Tests. Unit (posture providers, posture surface, control posture handler, concurrent-reuse) + integration (test/integration/phase72g_posture_test.go — real governance + llm + transports + ES256 auth, MockMode round-trip across two boot modes, cross-tenant reject, N≥10 stress) + smoke (scripts/smoke/phase-72g.sh). Deps. 36a, 36b, 64, 72f. Plan. See docs/plans/phase-72g-governance-llm-posture.md (shipped — D-112).
72h — Console DB local schema + SvelteKit scaffold (RFC §7)
Goal. Land the Console-local IndexedDB schema (per D-061 — Console-local state ONLY, never a shadow source of truth for runtime entities) AND introduce the web/console/ SvelteKit scaffold (audit-resolved A5) every Stage-2 Console page rides on. Eight V1 tables: saved_filters, saved_views, profiles, runtime_registry, auth_profiles, pat_store, notifications_routing, keybindings. Acceptance. web/console/src/lib/db/ ships as a self-contained TypeScript module behind a ConsoleDB driver interface (V1 default driver: IndexedDB); per-operator row scoping is structural ([operator_id, id] compound key); auth_profiles / pat_store blobs are AES-GCM ciphertext with a PBKDF2-derived KEK (crypto.ts); forward-only migrations; the §13 / D-061 carve-out is mechanically scanned (schema-carveout.spec.ts + smoke); the SvelteKit scaffold pins Svelte 5 runes (D-092) + ships the generated protocol.ts stub (D-093). Tests. Vitest unit (crypto.spec.ts, schema.spec.ts, schema-carveout.spec.ts, migrations.spec.ts) + in-package integration (tests/integration.spec.ts — real IndexedDB driver via fake-indexeddb, real WebCrypto, eight-table round-trip, cross-operator isolation, encrypted-blob round-trip, wrong-key fail-loud) + smoke (scripts/smoke/phase-72h.sh, static-only). Deps. 60 (Protocol auth for PAT identity scoping). Plan. See docs/plans/phase-72h-console-db-schema.md. Decision: D-113.
73 — Console state inspection surface (RFC §5.2, §7)
Status. Shipped* — dissolved during Wave 13 (D-133). Phase 73 never landed as a standalone phase; its surface was decomposed across the Console page phases that consumed each slice. Shipped: sessions.inspect (Phase 73c), tasks.get (Phase 73d), artifacts.list / artifacts.put / artifacts.get_ref (Phase 73l, D-120). Deferred post-V1 (no V1 consumer — §13 no-primitive-without-consumer): state.history, state.list_trajectories, state.load_planner_checkpoint, artifacts.get, artifacts.delete — each lands additively with the first Console surface that consumes it. Goal. sessions.inspect, tasks.get, state.history, state.list_trajectories, state.load_planner_checkpoint, artifacts.list, artifacts.get, artifacts.get_ref, artifacts.delete — all scope-checked, redacted on emit. Acceptance. Each method enforces identity; redaction applied; pagination defined. Tests. Integration + scope mismatch. Deps. 60, 07, 17. Cross-reference. Phase 73l (Console Artifacts page) is the page-side consumer — it extends artifacts.list's filter shape and adds artifacts.put + the artifacts.get_ref presigned-URL resolver in the same wave (D-120).
73l — Console Artifacts page (RFC §5.2, §6.10, §7)
Goal. The Console Artifacts page — catalog + preview surface over the runtime's content-addressed artifact store — plus its feeding Protocol additions: the artifacts.list filter extensions (mime / source / size / created / tags), the artifacts.put upload pipeline (Brief 11 §PG-2), and the artifacts.get_ref presigned-URL resolver (D-022 / D-026). Ships the canonical renderer-registry SKELETON at web/console/src/lib/chat/renderers/ (dispatch table + six MIME renderers) — Phase 73l is the registry's first in-staging consumer; Phase 73n extends it. Acceptance. The three artifacts.* methods route through a sibling ArtifactsSurface; identity-mandatory + D-079 cross-tenant gating; artifacts.get_ref fails loud with CodePresignUnsupported on a non-S3 driver; the page dispatches previews through the canonical registry with no bespoke per-mime renderer; mutation surfaces render disabled-with-tooltip. See docs/plans/phase-73l-console-artifacts-page.md. Tests. Unit (internal/protocol/artifacts_test.go), concurrent-reuse N=100 (D-025), integration (test/integration/artifacts_page_test.go — in-mem + SQLite + fs drivers + real wire transport), renderer-registry Vitest, Playwright per-page spec. Deps. 73 (artifacts base methods), 75 (Playwright harness). Deviations (D-120). The surface lands at internal/protocol/artifacts.go (the codebase has no handlers/ sub-package — it follows the SearchSurface / PostureSurface convention); web/console/src/lib/protocol.ts is hand-extended (the cmd/harbor-gen-protocol-ts generator binary has not yet landed — Phase 72h committed protocol.ts as a hand-shaped stub). Both are recorded in the phase plan.
73j — Console Memory page (Protocol + UI) (RFC §5.2, §6.6, §7)
Goal. Bundle the Memory-page Protocol surface and UI into one Stage-2.1 phase (Wave 13 decomposition §5). Three read-only Protocol methods land — memory.list (paginated, identity-scope-filtered memory records + aggregate counters), memory.get (one record's full detail; heavy values routed through artifacts.get by reference per D-026), memory.health (aggregate counters + per-scope driver mapping). The methods compose over the shipped MemoryStore.Snapshot surface (Phases 23–25) + the events.aggregate 24h counters (Phase 72a). The UI is the SvelteKit Memory page (/memory) — catalog table + right-rail status cards (Memory health / Recent identity rejections / Recovery dropouts / Selected-item detail) + the disabled-with-tooltip bulk-action toolbar (V1 is view-only; the memory mutation surface is deferred to Phase 73 / post-V1). The page IS the consumer (§13 satisfied trivially); it also consumes memory.identity_rejected (D-033) + memory.recovery_dropped (D-035) events. Acceptance. MethodMemoryList / MethodMemoryGet / MethodMemoryHealth registered in internal/protocol/methods + folded into the new IsMemoryMethod predicate; wire types in internal/protocol/types/memory.go; the three routes (POST /v1/memory/{list,get,health}) mounted via transports.WithMemory; identity-mandatory (401 CodeIdentityRequired); cross-tenant filter without auth.ScopeAdmin → 403 CodeIdentityScopeRequired — NO new memory scope (audit B1; D-079 closed-set reuse); the D-026 heavy-value bypass routes oversized values through the ArtifactStore and memory.get ships ValueArtifact (never inline bytes); a constructed-driver negative test fails loud with ErrContextLeak; concurrent-reuse pin under -race (N≥100); the Memory page renders against the mockup with design-token-only styling; per-page Playwright spec web/console/tests/memory-page.spec.ts. Tests. 
Unit (internal/memory/protocol — list_test.go / get_test.go / health_test.go / leak_internal_test.go / concurrent_reuse_test.go; internal/protocol/transports/stream/memory_handler_test.go) + integration (test/integration/memory_page_test.go — real MemoryStore + real transport + real ES256 auth + real artifact store + real events bus; happy path, cross-tenant reject, identity-required fail-loud with the D-033 bus assertion, D-026 heavy-value round-trip, N≥10 two-tenant concurrency stress, all -race) + Console-side Vitest (saved_filters_memory.spec.ts, protocol-memory.spec.ts) + Playwright (memory-page.spec.ts) + smoke (scripts/smoke/phase-73j.sh). Deps. 23, 24, 25, 60, 61, 72a, 72h, 73 (artifacts.get), 75 (all shipped or same-wave). Plan. See docs/plans/phase-73j-console-memory-page.md (shipped — D-118).
73i — Console Flows page (Protocol + UI) (RFC §5.2, §6.1, §7)
Goal. Ship the Console Flows page as a single Wave 13 Stage-2.1 phase: six NEW flows.* Protocol methods (flows.list with aggregate metrics, flows.describe engine-graph payload, flows.runs.list, flows.runs.describe, flows.run, flows.metrics) + the read-only Flows-page UI (catalog table + Flow Metrics card + the shared read-only engine graph canvas + per-flow Budget meter + run-history table + selected-run summary panel) + the per-page Playwright spec. Authoring is OUT of V1 per D-063 — the page is view-only with flows.run as the only mutating action, gated on auth.ScopeAdmin (D-079).
Acceptance. Six method names declared in internal/protocol/methods/methods.go; wire types in internal/protocol/types/flows.go; all six identity-mandatory + cross-tenant gated on auth.ScopeAdmin; flows.run gated on the same admin claim and degrades to 403 without it; flows.runs.describe ships heavy outputs via FlowArtifactRef (D-026); the shared EngineGraphCanvas + typed GraphInput interface published for Phase 73b; no authoring affordances render (D-063).
Deviations (D-117). The flows.run mutating gate reuses auth.ScopeAdmin (D-079 closed two-scope set — no new scope minted). The runtime side introduces a new flow.Registry subsystem as the source-of-truth (registered flows + bounded run-history ring). The typed Console client lives at web/console/src/lib/flows/client.ts as the hand-authored mirror of the flows.* surface until cmd/harbor-gen-protocol-ts (D-093) is extended to emit it — protocol.ts itself is not hand-edited.
Tests. Unit (flow/protocol/*_test.go — surface + catalog + invoker; flows_handler_test.go — identity / scope / decode; concurrent_reuse_test.go — D-025 N≥100) + integration (test/integration/flows_page_test.go — real registry + real transport + real auth, two-tenant scope, cross-tenant reject, flows.run reject without claim, D-026 heavy-output bypass, concurrency stress, all -race) + Console Vitest (format.spec.ts, layout.spec.ts, client.spec.ts) + Playwright (web/console/tests/flows-page.spec.ts) + smoke (scripts/smoke/phase-73i.sh).
Plan. See docs/plans/phase-73i-console-flows-page.md (shipped — D-117).
73g — Console Events page (RFC §5.2, §6.13, §7)
Goal. Ship the Console Events page — the runtime event-bus stream as a full-screen, query-driven investigative surface. This is a composition-only page phase: it ships NO new Protocol method. It consumes the shipped events.subscribe (GET /v1/events SSE table feed — Phase 72), events.aggregate (POST /v1/events/aggregate sparkline feed — Phase 72a), and artifacts.get_ref (heavy-payload Open artifact resolver — Phase 73l). The page IS the consumer Phase 72a's primitives waited for (§13 satisfied trivially). The UI is the SvelteKit Events page (/events) — faceted filter chips + Console-DB-backed saved-view chips + event-rate sparkline + virtualised event table + right-rail Event Details card + Pause-stream toggle + Export ▾ — built on the D-121 design-system foundation.
Acceptance. Route under (console)/events/ (no /console/ URL prefix — CONVENTIONS.md §1); the EventsNamespace joins the unified HarborClient; saved views persist in the shipped saved_filters Console DB table scoped to page='events' (no new table — D-061); the Pause-stream toggle is a Console-local render gate distinct from the runtime pause method; heavy payloads route through artifacts.get_ref, never inlined (D-026); cross-tenant Tenant ▾ gated on the D-079 closed scope set (no events.crosstenant minted); four-state PageState. See docs/plans/phase-73g-console-events-page.md.
Deviations (D-125). No new Protocol method (composition-only). The route ships at web/console/src/routes/(console)/events/ and the page components at web/console/src/lib/components/events/ — the phase plan (authored before D-121) named console/events/ and lib/events/components/; CONVENTIONS.md §1/§3 (D-121) is the binding cross-cutting authority and yields the corrected paths (CLAUDE.md §15).
Tests. Console Vitest (filters.test.ts, sparkline.test.ts, export.test.ts, taxonomy.test.ts, saved_filters_events.spec.ts, EventsNamespace cases in harbor-client.spec.ts) + integration (test/integration/events_page_test.go — real inmem bus + real SSE/aggregate handlers + real artifacts surface, subscribe filter narrowing, aggregate sparkline correctness, cross-tenant isolation, the truncated-payload artifacts.get_ref identity-rejection failure mode, N≥16 concurrent-subscriber stress, all -race) + Playwright (web/console/tests/events-page.spec.ts) + smoke (scripts/smoke/phase-73g.sh).
Plan. See docs/plans/phase-73g-console-events-page.md (shipped — D-125).
73a — Console Overview page (composition-only UI) (RFC §5.2, §6.13, §6.15, §7)
Goal. Ship the Console Overview page — the operator's at-a-glance hub and the default route on a fresh attach. This is a composition-only page phase: it ships NO new Protocol method. It composes the SHIPPED runtime.counters / runtime.health (Phase 72f), pause.list (Phase 72e), events.subscribe SSE (Phase 60 / 72), and the Phase 54 approve / reject control verbs into the 4-card counter row + sub-header health-chip strip + cost-rollup card + intervention queue + recent-activity feed + 2×3 Quick Links grid + the + New quick-create menu. The UI is the SvelteKit Overview page (/overview) built on the D-121 design-system foundation.
Acceptance. Route under (console)/overview/ (no /console/ URL prefix — CONVENTIONS.md §1); the RuntimeNamespace + PauseNamespace join the unified HarborClient; the counter sparklines / recent-activity feed / cost rollup fold client-side off the events.subscribe cursor (no new Protocol method — page-overview.md §12); the intervention queue's Approve / Reject invoke the SHIPPED Phase 54 control verbs and degrade to disabled-with-tooltip without the admin control-scope claim (D-066 / §13 — no parallel implementation); the Quick Links grid is exactly six tiles with no Evaluations tile (D-064); saved views persist in the shipped saved_filters Console DB table scoped to page='overview' (no new table — D-061); four-state PageState with nested PageState per panel. See docs/plans/phase-73a-console-overview-page.md.
Deviations (D-127). No new Protocol method, no new Go-side surface (composition-only — internal/ is unchanged). The route ships at web/console/src/routes/(console)/overview/ — the phase plan (authored before D-121) named web/console/src/routes/overview/ and the smoke probed /console/overview; CONVENTIONS.md §1 (D-121) is the binding cross-cutting authority and yields the corrected unprefixed (console)-group paths (CLAUDE.md §15).
Tests. Console Vitest (aggregations.test.ts, activity.test.ts, cost.test.ts, saved_filters_overview.spec.ts, RuntimeNamespace / PauseNamespace cases in harbor-client.spec.ts) + Playwright (web/console/tests/overview-page.spec.ts — depth-bar shell, counter row, scope-gated intervention actions, Quick Links navigation, + New deep-links, the Disconnected PageState) + smoke (scripts/smoke/phase-73a.sh). No Go-side integration test — Phase 73a adds no internal/ seam; the cross-stack integration assurance is the Playwright spec against a live harbor console plus the upstream 72e/72f integration tests.
Plan. See docs/plans/phase-73a-console-overview-page.md (shipped — D-127).
73c — Console Sessions page (Protocol + UI) (RFC §5.2, §6.9, §7)
Goal. Ship the Console Sessions page as a single Wave 13 Stage-2.1 phase: two NEW sessions.* Protocol methods (sessions.list — paginated + filtered SessionRegistry projection with the full filter set; sessions.inspect — full per-session snapshot) + the SvelteKit Sessions list/detail route + the per-page Playwright spec. Read-only — the bulk Cancel / Pause toolbar actions iterate the shipped per-row control methods (D-072) and render disabled-with-tooltip (D-066). The page IS the first consumer of sessions.list (§13 primitive-with-consumer).
Acceptance. Two method names declared in internal/protocol/methods/methods.go; nine wire types in internal/protocol/types/sessions.go; both identity-mandatory + cross-tenant gated on auth.ScopeAdmin (D-079); sessions.list emits Truncated bool not a silent total (D-026); the Sessions-page Identity column renders Phase 72b's IdentityScope impersonation triplet; no Priority surface (D-065); saved filters Console-DB-local (D-061); the page clears the CONVENTIONS.md §5 depth bar.
Deviations (D-122). The wire handler lands at internal/protocol/transports/stream/sessions_handler.go (the codebase has no internal/server/ package — the plan's path is stale; the handler follows the Phase 73f / 73i precedent). sessions.inspect ships whole, not as an additive extension of a Phase 73 parent method that has not landed. web/console/src/lib/protocol.ts is NOT hand-edited — the Sessions wire types live at web/console/src/lib/sessions/types.ts with a typed SessionsProtocol wrapper over the unified HarborClient, following the Phase 73i Flows-page precedent until cmd/harbor-gen-protocol-ts (D-093) lands.
Tests. Unit (sessions/protocol/protocol_test.go — Service filter/cursor/scope; concurrent_test.go — D-025 N≥100; sessions_handler_test.go — identity / scope / decode) + integration (test/integration/sessions_page_test.go — real registry + real transport + real auth, two-tenant scope, cross-tenant reject + audit emit, malformed cursor, N≥10 SSE-subscriber concurrency stress, all -race) + Console Vitest (sessions/tests/format.spec.ts, db/tests/saved_filters_sessions.spec.ts) + Playwright (web/console/tests/sessions-page.spec.ts) + smoke (scripts/smoke/phase-73c.sh).
Plan. See docs/plans/phase-73c-console-sessions-page.md (shipped — D-122).
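The "Truncated bool, not a silent total" rule for sessions.list can be sketched as follows. The `Session` and `ListResult` shapes and the `listSessions` helper are hypothetical and much smaller than the nine wire types the phase ships; the sketch only shows the flag being set explicitly when rows are cut off.

```go
package main

import "fmt"

// Session is a hypothetical, trimmed-down row.
type Session struct{ ID string }

// ListResult reports truncation explicitly instead of implying a total.
type ListResult struct {
	Sessions  []Session
	Truncated bool
}

// listSessions returns at most limit rows and sets Truncated when
// more rows matched than were returned.
func listSessions(all []Session, limit int) ListResult {
	if limit < 0 {
		limit = 0
	}
	if len(all) <= limit {
		return ListResult{Sessions: all}
	}
	return ListResult{Sessions: all[:limit], Truncated: true}
}

func main() {
	all := []Session{{ID: "s1"}, {ID: "s2"}, {ID: "s3"}}
	page := listSessions(all, 2)
	fmt.Println(len(page.Sessions), page.Truncated)
}
```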
74 — Console topology projection events (RFC §5.2, §6.13, §7.1)
Goal. topology.snapshot Protocol method + topology.changed event over the canonical engine-scoped TopologyProjection (static graph + live per-edge queue depth); the event emits on engine construction, the method serves on-demand cold-start. Acceptance. A Protocol client renders a topology view from the canonical projection alone (no internal access); identity-mandatory; cross-tenant requires auth.ScopeAdmin (D-079). See docs/plans/phase-74-console-topology.md. Tests. Unit (internal/protocol/types, internal/runtime/engine), concurrent-reuse N≥128 (D-025), integration (test/integration/phase74_topology_test.go — real engine + real bus + real wire transport). Deps. 05, 09. Deviations (D-114). The ControlSurface topology accessor wires via the WithTopologyAccessor functional option (not a positional NewControlSurface argument — keeps the Phase 54 signature stable); the nil-accessor / engine-less path returns CodeUnknownMethod (no CodeMethodNotSupported code exists); harbor dev hosts no engine-graph so its surface leaves the accessor nil; the decision number is D-114 (the plan's pre-assigned D-106 collided with a parallel Wave 13 phase).
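The D-114 wiring choice (a functional option rather than a new positional argument, with the nil-accessor path returning an unknown-method error) can be sketched like this. The types are simplified assumptions: the real NewControlSurface takes other arguments, `TopologyAccessor` is modelled here as a plain function, and `ErrUnknownMethod` stands in for the CodeUnknownMethod wire code.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnknownMethod stands in for the CodeUnknownMethod wire code (assumed shape).
var ErrUnknownMethod = errors.New("unknown_method")

// TopologyAccessor is a hypothetical accessor returning a rendered projection.
type TopologyAccessor func() (string, error)

// ControlSurface is a simplified stand-in for the real surface.
type ControlSurface struct {
	topology TopologyAccessor
}

// Option mutates a ControlSurface during construction.
type Option func(*ControlSurface)

// WithTopologyAccessor wires the accessor without changing the constructor signature.
func WithTopologyAccessor(a TopologyAccessor) Option {
	return func(cs *ControlSurface) { cs.topology = a }
}

// NewControlSurface applies the options in order; existing callers pass none.
func NewControlSurface(opts ...Option) *ControlSurface {
	cs := &ControlSurface{}
	for _, opt := range opts {
		opt(cs)
	}
	return cs
}

// TopologySnapshot serves the projection, or fails when no engine graph is wired.
func (cs *ControlSurface) TopologySnapshot() (string, error) {
	if cs.topology == nil {
		return "", ErrUnknownMethod
	}
	return cs.topology()
}

func main() {
	engineless := NewControlSurface()
	_, err := engineless.TopologySnapshot()
	fmt.Println("engine-less:", err)

	wired := NewControlSurface(WithTopologyAccessor(func() (string, error) {
		return "ingest -> plan", nil
	}))
	snap, _ := wired.TopologySnapshot()
	fmt.Println("wired:", snap)
}
```

The variadic option keeps every existing call site compiling, which is the stability argument the deviation note makes for the Phase 54 signature.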
75 — Console e2e Playwright harness baseline (RFC §7)
Goal. Playwright harness baseline under web/console/tests/ — config, fixtures, page-object base class, helpers, the meta-test, and the frontend-e2e CI hook. The harness runs against harbor console (D-091) — NOT harbor dev; the original master-plan wording is corrected per D-091 + Brief 12 (the Console static build is served exclusively by harbor console). Per the binding rule: every operator-facing flow shipped in a phase has a matching .spec.ts. Wave 13 (docs/plans/wave-13-decomposition.md §12 item 7) narrows this phase to baseline-only: per-page specs land alongside each Stage-2 page phase (73a–73n); the wave-end aggregator suite is Phase 75a (Stage 3). See D-115. Acceptance. A baseline harness exists at web/console/tests/ (config + fixtures + page-object base + helpers + meta-test); the frontend-e2e CI job runs it and skips gracefully when web/console/ is absent (directory-missing → SKIP); future Console page phases hook their per-page specs into it. Tests. Playwright meta-test (harness.spec.ts) — boots harbor console, asserts the index serves + the SvelteKit app hydrates; SKIPs cleanly before the harbor console subcommand (Phase 73m) and the SvelteKit scaffold (Phase 72h) land. Deps. 60, 72. (Narrowed from 64, 72, 73 per the Wave 13 decomposition §4 — per-page Protocol additions move into each Stage-2 page phase; 64 is transitively assumed via 60.)
75a — Console e2e Playwright wave-end suite (RFC §7)
Goal. The Wave 13 wave-end aggregator Playwright suite (web/console/tests/wave13.spec.ts) — full IA navigation across all 14 V1 Console pages, scope-claim degradation regression, cross-page identity isolation, saved-view persistence, notification routing end-to-end. Bundled with the final Stage-2 PR per CLAUDE.md §17.5. Includes test/integration/wave13_test.go (Go-side wire-type round-trip + cross-page identity isolation + N≥10 concurrent SSE subscriber stress). Enumerates the 14-page IA and asserts a matching <slug>-page.spec.ts exists for each — a missing page-spec pair is a build break (operator §12 item 7 binding amendment). Acceptance. Every one of the 14 V1 Console pages has a matching per-page spec; the aggregator walks them all; the page-coverage check (make wave13-coverage-check) is green. Tests. wave13.spec.ts + test/integration/wave13_test.go. Deps. 75, 73a-73n. Shipped notes (D-131). Three things landed beyond the original plan: (1) a §17.6 cross-phase fix of a Phase 73m build-pipeline gap — the frontend-e2e CI job now runs make console-build before make build so harbor console embeds the real SvelteKit bundle (it was embedding an empty consoledist/); (2) a dev-only runtime-entity fixture seeder (cmd/harbor/devseed.go, gated by HARBOR_DEV_SEED_FIXTURES=1) so the per-page Playwright specs render real rows — the 25 SEED_DEPENDENT per-page skips were un-skipped and pass; (3) six per-page tests (Live Runtime tab content ×2, Playground chat ×3, Events pause-toggle ×1) carry a documented §17.6 deferral skip — they need run-trajectory fixtures (a live topology.snapshot / chat history / SSE subscription), a larger seam than registry seeding, tracked as a follow-up.
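The page-coverage rule (every page slug needs a matching `<slug>-page.spec.ts`) can be sketched as a small check. The `missingSpecs` helper is hypothetical, and the slug list below uses only the four page specs named in this section rather than the full 14-page IA; how `make wave13-coverage-check` is actually implemented is not shown in this plan.

```go
package main

import (
	"fmt"
	"sort"
)

// missingSpecs returns the page slugs that have no matching
// "<slug>-page.spec.ts" entry in files, sorted for stable output.
func missingSpecs(slugs, files []string) []string {
	have := make(map[string]bool, len(files))
	for _, f := range files {
		have[f] = true
	}
	var missing []string
	for _, slug := range slugs {
		if !have[slug+"-page.spec.ts"] {
			missing = append(missing, slug)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Illustrative subset; the real check enumerates all 14 pages.
	slugs := []string{"events", "flows", "overview", "sessions"}
	files := []string{"events-page.spec.ts", "flows-page.spec.ts", "sessions-page.spec.ts"}
	if missing := missingSpecs(slugs, files); len(missing) > 0 {
		fmt.Println("missing page specs:", missing)
		return
	}
	fmt.Println("all page specs present")
}
```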
76 — Cross-tenant isolation conformance harness (RFC §4.3)
Goal. A master conformance harness asserting cross-tenant + cross-session isolation across StateStore / ArtifactStore / MemoryStore / SkillStore / TaskRegistry / EventBus. 100 sessions × random ops under -race. Acceptance. Final invariant: every read's identity matches the caller's identity exactly; CI runs the harness on every PR. Tests. The harness is the test. Deps. 07, 17, 23, 37, 20. Risks. This is the integrity gate. A regression here is a security bug. Shipped notes (D-134). The harness lives at test/integration/isolation_conformance_test.go (package integration_test; no new top-level directory — AGENTS.md §3 / §17.2). Three shipped tests: TestE2E_Isolation_ConformanceHarness (the 100-session randomized soak), TestE2E_Isolation_CrossScopeReadIsBlind (targeted positive proof across the cross-session + cross-tenant boundaries), TestE2E_Isolation_FailClosedOnMissingIdentity (the §17.3 failure mode — every subsystem rejects an incomplete triple). Soak-window split (D-134): the every-PR default is a fast ~3 s window (100 workers × thousands of op-cycles still catch a leak with overwhelming probability); the master-plan 30 s soak is opt-in via HARBOR_ISOLATION_SOAK=<go-duration>, and -short forces the fast window. All six subsystems are opened through their production registry factories — no mocks at the seam; SkillStore runs against its only V1 driver, localdb SQLite (:memory: DSN). The dedicated isolation CI job runs the fast window on every PR.
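The soak-window split can be sketched as one resolution function: `-short` forces the fast window, an unset variable uses the fast window, and HARBOR_ISOLATION_SOAK supplies a Go duration otherwise. The `soakWindow` helper and the choice to reject an unparsable or non-positive value with an error are assumptions; the shipped harness may handle those cases differently.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// fastWindow approximates the every-PR default described above.
const fastWindow = 3 * time.Second

// soakWindow resolves the soak duration. short mirrors testing.Short();
// env is the raw HARBOR_ISOLATION_SOAK value.
func soakWindow(short bool, env string) (time.Duration, error) {
	if short || env == "" {
		return fastWindow, nil
	}
	d, err := time.ParseDuration(env)
	if err != nil {
		return 0, fmt.Errorf("HARBOR_ISOLATION_SOAK=%q: %w", env, err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("HARBOR_ISOLATION_SOAK=%q: must be positive", env)
	}
	return d, nil
}

func main() {
	// In a real test the first argument would be testing.Short().
	d, err := soakWindow(false, os.Getenv("HARBOR_ISOLATION_SOAK"))
	if err != nil {
		fmt.Println("invalid soak window:", err)
		return
	}
	fmt.Println("soak window:", d)
}
```

Running with `HARBOR_ISOLATION_SOAK=30s` would select the master-plan soak under this sketch.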
77 — Goroutine leak conformance harness (RFC §5 Go conventions)
Goal. Harness wrapping every long-lived component asserting runtime.NumGoroutine returns to baseline after Stop(). Acceptance. All Runtime components pass; CI runs on every PR. Tests. The harness is the test. Deps. 10, 13, 50. Shipped. test/integration/phase77_goroutine_leak_test.go ships the table-driven TestE2E_Phase77_GoroutineLeakConformance — leakCases is a slice of {name, exercise} rows, one per long-lived Runtime component (Engine, inmem + durable EventBus, sessions.Registry, inprocess TaskRegistry). Each row constructs the real component with real drivers, runs 12 construct → exercise → teardown cycles, and asserts runtime.NumGoroutine() returns to baseline via a bounded eventually-poll (deadline + interval, never an instant snapshot — CLAUDE.md §17.4). A warm-up cycle precedes baseline capture; the suite is not t.Parallel (NumGoroutine is process-global). Passive registries with no background goroutines (pauseresume.Coordinator, steering Registry/Inbox/RunLoop) are deliberately not rows — they have no teardown seam to leak from; the Phase 50 dependency is satisfied by the pause primitive being exercised inside the Engine run lifecycle. A dedicated leak-harness CI job runs the suite under -race on every PR. All five V1 component rows pass on first run — no leaks found. See D-135.
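The bounded eventually-poll described above (deadline plus interval, never an instant snapshot) can be sketched as follows. The `waitForBaseline`, `startWorker`, and `runCycles` names are hypothetical, and the worker is a toy goroutine rather than a real Runtime component.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// waitForBaseline polls until the goroutine count is back at or below
// baseline, or returns an error once the deadline has passed.
func waitForBaseline(baseline int, deadline, interval time.Duration) error {
	stop := time.Now().Add(deadline)
	for {
		n := runtime.NumGoroutine()
		if n <= baseline {
			return nil
		}
		if time.Now().After(stop) {
			return fmt.Errorf("goroutine leak: %d running, baseline %d", n, baseline)
		}
		time.Sleep(interval)
	}
}

// startWorker is a toy long-lived component; the returned func is its Stop().
func startWorker() (stop func()) {
	done := make(chan struct{})
	go func() { <-done }()
	return func() { close(done) }
}

// runCycles captures a baseline, runs n construct/teardown cycles, then polls.
// With teardown=false it deliberately leaks until the check has run.
func runCycles(n int, teardown bool) error {
	baseline := runtime.NumGoroutine()
	var leaked []func()
	for i := 0; i < n; i++ {
		stop := startWorker()
		if teardown {
			stop()
		} else {
			leaked = append(leaked, stop)
		}
	}
	err := waitForBaseline(baseline, 500*time.Millisecond, 5*time.Millisecond)
	for _, stop := range leaked {
		stop() // release the deliberately leaked workers
	}
	return err
}

func main() {
	fmt.Println("clean teardown:", runCycles(12, true))
	fmt.Println("missing teardown:", runCycles(3, false))
}
```

Polling matters because a stopped goroutine may not have exited at the instant Stop() returns; an immediate snapshot would flag a false leak.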
78 — Chaos / fault injection harness
Goal. Kill mid-run, drop messages, simulate provider quirks, simulate StateStore disconnect, force pause-deserialize failures. Used in integration tests; not on hot path. Acceptance. Each failure mode produces the documented event + recovery path. Tests. Chaos suite. Deps. 76, 77. Shipped. test/integration/phase78_chaos_fault_injection_test.go ships the table-driven TestE2E_Phase78_ChaosFaultInjection — chaosCases is a slice of {name, inject} rows, one per master-plan failure mode. Each row wires the real Runtime component through its production factory / constructor (engine.New, events.Open, state.Open, pauseresume.New, retry.Wrap), injects one fault, and asserts BOTH the documented loud error / event AND the documented recovery path. The five rows: kill-mid-run (a run held in-flight by a blocking node is cancelled — asserts the engine's RunCancelledHandler seam fires, FetchByRun observes ErrRunCancelled, Engine.Stop tears down cleanly within a bounded deadline, no goroutine leak); drop-messages (a tiny-buffered subscription is saturated past the bus's drop-oldest backpressure — asserts the typed bus.dropped event carries a non-empty dropped sequence range); provider-quirks (a quirk LLM driver returns malformed output, wrapped in the real retry.Wrap retry-with-feedback layer with a rejecting Validator — asserts the llm.retry_with_feedback event fires + the call exhausts with llm.ErrRetryExhausted, plus a recovery sub-case that succeeds after one retry); statestore-disconnect (a fault-injecting decorator over the real in-mem StateStore returns a transport error — asserts the error surfaces loudly out of Save/Load, then the reconnect recovery path works); pause-deserialize-failure (a PauseRequest whose trajectory carries a live channel fails Coordinator.Request loud with trajectory.ErrUnserializable naming a non-empty field path — the D-069 / RFC §3.4 fail-loud contract, never a half-persisted checkpoint, plus a clean-trajectory recovery sub-case). 
Faults are injected by THIN DECORATORS over the real components (test/integration/phase78_faults_test.go) — they decorate, never replace, and live in *_test.go files, never registered as a driver default (the §17.3 "real drivers at the seam" pattern with a fault overlay, not the §13 "test stub as production default" anti-pattern — see D-137). Every row asserts the fault is SURFACED loudly; no silent degradation (CLAUDE.md §13). A dedicated chaos CI job runs the suite under -race on every PR. All five failure-mode rows pass under -race. scripts/smoke/phase-78.sh (static-only) asserts the harness + decorators files exist, declare the conformance test, are table-driven, and the chaos CI job is wired. See D-137.
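The "decorate, never replace" pattern for the statestore-disconnect row can be sketched as a thin wrapper over a real store. The two-method `StateStore` interface, `memStore`, `faultStore`, and `ErrTransport` are all hypothetical simplifications; Harbor's actual StateStore interface and error values are not shown in this plan.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// StateStore is a hypothetical minimal interface for the sketch.
type StateStore interface {
	Save(key, value string) error
	Load(key string) (string, error)
}

// memStore is a small real implementation to decorate.
type memStore struct {
	mu sync.Mutex
	m  map[string]string
}

func newMemStore() *memStore { return &memStore{m: map[string]string{}} }

func (s *memStore) Save(key, value string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = value
	return nil
}

func (s *memStore) Load(key string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.m[key]
	if !ok {
		return "", fmt.Errorf("statestore: key %q not found", key)
	}
	return v, nil
}

// ErrTransport is the injected fault (assumed name).
var ErrTransport = errors.New("statestore: transport disconnected")

// faultStore decorates a real store; while down, every call fails loudly
// and the inner store is never touched.
type faultStore struct {
	inner StateStore
	mu    sync.Mutex
	down  bool
}

func (f *faultStore) setDown(v bool) {
	f.mu.Lock()
	f.down = v
	f.mu.Unlock()
}

func (f *faultStore) isDown() bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.down
}

func (f *faultStore) Save(key, value string) error {
	if f.isDown() {
		return ErrTransport
	}
	return f.inner.Save(key, value)
}

func (f *faultStore) Load(key string) (string, error) {
	if f.isDown() {
		return "", ErrTransport
	}
	return f.inner.Load(key)
}

func main() {
	store := &faultStore{inner: newMemStore()}
	store.setDown(true)
	fmt.Println("during fault:", store.Save("run-1", "checkpoint"))
	store.setDown(false)
	fmt.Println("after reconnect:", store.Save("run-1", "checkpoint"))
	v, err := store.Load("run-1")
	fmt.Println(v, err)
}
```

Because the decorator delegates to the real store once the fault clears, the same test row can assert both the loud error and the recovery path.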
79 — Performance benchmarks
Goal. Engine throughput (envelopes/sec under N runs); bus fan-out (subscribers vs latency); memory-strategy latency (truncation vs rolling_summary). Acceptance. Baseline numbers committed; perf regression threshold gates PRs (e.g. > 10% slowdown blocks). Tests. go test -bench. Deps. 10, 12, 05. Status. Shipped (D-136 — test/benchmarks/ suite over engine / bus / memory against real components; docs/perf/baseline.txt committed; scripts/perf/check-regression.sh benchstat gate wired into CI as the perf-regression job — fails on a statistically-significant slowdown past a noise-tolerant 30% threshold, an empirical calibration of the master plan's illustrative "10%"; make bench / make bench-check; phase plan phase-79-performance-benchmarks.md).
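The difference between the illustrative 10% threshold and the shipped 30% threshold is simple arithmetic, sketched below. This is only the threshold comparison: the `slowdown` and `blocksPR` helpers are hypothetical, and the sketch deliberately ignores the statistical-significance test that the benchstat-based gate applies.

```go
package main

import "fmt"

// slowdown returns the fractional change from baseline to current ns/op;
// 0.20 means the current run is 20% slower.
func slowdown(baselineNs, currentNs float64) float64 {
	return (currentNs - baselineNs) / baselineNs
}

// blocksPR reports whether the slowdown exceeds threshold (0.30 means 30%).
func blocksPR(baselineNs, currentNs, threshold float64) bool {
	if baselineNs <= 0 {
		return false // no usable baseline to compare against
	}
	return slowdown(baselineNs, currentNs) > threshold
}

func main() {
	// A 20% slowdown trips a 10% gate but passes a 30% gate.
	fmt.Println(blocksPR(1000, 1200, 0.10))
	fmt.Println(blocksPR(1000, 1200, 0.30))
}
```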
80 — Documentation hygiene polish
Goal. Every package has a doc comment; every exported symbol has godoc; example agents in examples/; recipe docs (docs/recipes/). Acceptance. golangci-lint's revive exported and package-comments clean; examples/ builds end-to-end. Tests. Lint + example builds in CI. Deps. All V1 phases. Status. Shipped (D-138 — the revive exported / package-comments documentation lint gate is now ENFORCED in CI: the lint job installs golangci-lint v1.64.8 and runs make lint-revive, which uses the dedicated .golangci-revive.yml config — previously make lint silently skipped because the binary was never installed. The exported rule keeps godoc-presence enforcement but gains disableStutteringCheck so the ~20 cross-package type renames the stutter sub-check would force stay out of a docs phase; the genuine doc gaps the rule surfaced — eight detached package comments, two malformed package comments, a handful of un-commented const/var blocks — are all fixed. examples/ gains worked, buildable code — examples/agents/echo/ (a harbortest.Agent + test) and examples/tools/weather/ (an inproc.RegisterFunc tool + register→resolve→invoke test) — exercised by a new CI examples job. docs/recipes/ ships five real-API-grounded how-to guides. The broader make lint backlog (~1000 issues across ~20 linters, accumulated while the gate silently skipped) is deliberately left to a separate release-hardening effort. Phase plan phase-80-documentation-hygiene-polish.md).
81 — Release engineering (versioning, changelog) (RFC §12)
Goal. Semver tagging, CHANGELOG.md, build provenance (SLSA-style attestations as a stretch). Acceptance. git tag v1.0.0-rc.1 produces a release artifact; CHANGELOG covers all V1 phases. Tests. Release dry-run. Deps. All V1 phases. Status. Shipped (D-139 — the product release version is stamped into the harbor binary at link time: cmd/harbor.HarborVersion becomes a var (a const cannot be -ldflags -X overridden), and scripts/release-build.sh — the single home of the build incantation — stamps it via go build -ldflags="-s -w -X 'main.HarborVersion=…'" from a git describe --tags-derived version, falling back to the v0.0.0-dev sentinel for an un-tagged build. The product release version is kept STRICTLY distinct from the Harbor Protocol version (internal/protocol/types.ProtocolVersion, RFC §5.3) — harbor version already prints both as separate fields; the two are versioned independently. CHANGELOG.md lands at the repo root in Keep-a-Changelog format, grouped by delivery wave / subsystem, covering every V1 phase (01–81 + the lettered phases). .github/workflows/release.yml triggers on a v* tag push — builds the CGo-free static binary, emits a SHA-256 checksum, attaches SLSA-style build provenance via GitHub's native actions/attest-build-provenance (the master-plan stretch — landed, not deferred, because the first-party action adds no framework dependency), and publishes a GitHub Release; a workflow_dispatch path runs the dry-run. scripts/release-dryrun.sh (the make release-dryrun target) is the master-plan "release dry-run" test — it exercises the exact release-build path with a synthetic version and asserts the artifact + checksum + version stamp, all without pushing a tag. Phase 81 creates NO v* tag — tagging is the operator's job in Phase 82. Phase plan phase-81-release-engineering.md.)
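The reason HarborVersion has to be a var can be shown in a few lines: the linker's `-X` flag only sets package-level string variables, so a const would keep its compiled-in value. In this sketch the `protocolVersion` placeholder and the `versionReport` helper are hypothetical; the real Protocol version lives in internal/protocol/types.ProtocolVersion and the real `harbor version` output format is not reproduced here.

```go
package main

import "fmt"

// HarborVersion is the product release version. It is a var, not a const,
// so that -ldflags "-X main.HarborVersion=..." can overwrite it at link time.
// The default is the un-tagged sentinel named in the plan.
var HarborVersion = "v0.0.0-dev"

// protocolVersion is a placeholder for the independently versioned
// Harbor Protocol version; the value here is invented for illustration.
const protocolVersion = "protocol-version-placeholder"

// versionReport prints the two versions as separate fields.
func versionReport() string {
	return fmt.Sprintf("harbor: %s\nprotocol: %s", HarborVersion, protocolVersion)
}

func main() {
	fmt.Println(versionReport())
}
```

Building the sketch with `go build -ldflags="-X 'main.HarborVersion=v1.0.0-rc.1'"` would stamp the release version; a plain `go build` leaves the sentinel in place.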
82 — V1 cut (RFC §1, §12)
Goal. v1.0.0 tag; release notes; migration notes (if any); blog/announcement scaffold. Acceptance. harbor version returns v1.0.0; preflight green; protocol conformance suite green; cross-tenant + leak harnesses green. Tests. Full preflight. Deps. 81.
Post-V1 follow-ups (83–90)
Listed for tracking. Not on the V1 critical path.
- 83 — Auto-sequence detection. Skip the LLM call on deterministic single-tool transitions. Off by default. RFC §12. Deps: 45.
- 83a — ReAct prompt structured sections. Refactor defaultBuilder to assemble the twelve XML-tagged sections from brief 13 §2.1 (<identity>, <output_format>, <action_schema>, <finishing>, <tool_usage>, <parallel_execution>, <reasoning>, <tone>, <error_handling>, <available_tools>, <additional_guidance>, <planning_constraints>); add WithSystemPromptExtra Option + PlannerConfig.ExtraGuidance config key; golden-fixture the default prompt. Foundation phase — 83b/c/d build on its section anchors. RFC §6.2. Deps: 45. See docs/plans/phase-83a-react-prompt-structured-sections.md.
- 83b — ReAct tool schema injection (catalog rendering). Extend
tools.Tool with Examples []ToolExample (tag-ranked minimal > common > edge-case); upgrade <available_tools> rendering to emit args_schema, side_effects, and curated examples per tool. Closes the args-validation-failure cascade caused by name+description-only catalog rendering. RFC §6.2, §6.4. Deps: 83a, 26. See docs/plans/phase-83b-react-tool-schema-injection.md.
- 83c — ReAct dynamic repair guidance + planning hints. Add per-run
RepairCounters{FinishRepair, ArgsRepair, MultiAction} on RunContext; render escalating reminder → warning → critical hints per turn when counters trip; wire RunContext.PlanningHints into <planning_constraints>. Closes the across-step feedback loop Phase 44 (per-step repair) leaves open. New decisions entry D-145 scopes counters to RunContext (not the planner struct) per D-025 concurrent-reuse contract. RFC §6.2. Deps: 83a, 44, 05. See docs/plans/phase-83c-react-dynamic-repair-guidance.md.
- 83d — ReAct skills + memory injection (UNTRUSTED framing). Render
RunContext.MemoryBlocks and RunContext.SkillsContext into the system prompt as separate llm.ChatMessage entries with the five-line anti-prompt-injection rule list from brief 13 §2.3. Distinct <read_only_external_memory> / <read_only_conversation_memory> wrappers preserved per tier; <skills_context> for pre-retrieved skill bodies. Serialisation failures fail loudly via ErrMemoryBlockUnserializable. RFC §6.2, §6.6, §6.7. Deps: 83a, 23, 37. See docs/plans/phase-83d-react-skills-and-memory-injection.md.
- 83e — ReAct reasoning channel decoupling (capture-vs-replay). Drop
Reasoning from Decision_CallTool; extend llm.CompleteResponse with Reasoning string; bifrost driver reads BifrostChatResponse.Choices[0].Message.ReasoningDetails — closing both the unary-path gap (today OnReasoning is streaming-only) and the Gemini-direct black hole (today bifrost populates reasoning_details[] on the message but Harbor drops it). Reasoning persists on TrajectoryStep.ReasoningTrace; replay is operator-controlled per agent via PlannerConfig.ReasoningReplay enum (never default for ALL models, text opt-in). No provider_native mode in V1 (Bifrost docs don't cover thinking-block round-trips). New decisions D-147 (schema narrowing) + D-148 (replay knob shape — two enum values, defer provider_native). RFC §6.2, §6.5. Deps: 45, 32, 33, 44. See docs/plans/phase-83e-react-reasoning-channel-decoupling.md.
- 83f — Dev RunLoop populates the 83-band RunContext (D-149). Closes the Wave 15 §17.5 audit's W3/W4 (issue #208).
cmd/harbor/cmd_dev_runloop.go::runOne now fetches the task's Query, session-scoped memory via MemoryStore.GetLLMContext, session-scoped skills via SkillStore.Search, allocates a per-run *RepairCounters, and projects operator-supplied planner.PlanningHints from the new planner.skills_context_max + planner.planning_hints YAML keys. Memory/skills fetch errors fail loud with MarkFailed(code=runtime_fetch_error); the LLM is never called on a degraded run. RFC §6.2, §6.6, §6.7. Deps: 83c, 83d, 23, 37, 20. See docs/plans/phase-83f-react-prompt-band-runtime-consumers.md.
- 83g — MCP southbound consumer in
harbor dev (D-150). Closes the parallel consumer gap for the Phase 28 MCP driver, surfaced during the 83f operator validation. cmd/harbor/cmd_dev.go::bootDevStack now iterates cfg.Tools.MCPServers[], spawns each via mcpdrv.New + Connect, discovers tools via Discover, and registers each ToolDescriptor on the tool catalog. Boot fails loud (mcp[<name>]: <stage>: <err>) on any connect / discover / register failure; the operator sees the error before the binary starts serving. The MCP Registry is constructed and populated so a small follow-up phase mounts the Console MCP-page surface without re-spawning. Devstack mirror per D-094; integration test spawns a real stdio subprocess via the cmd/harbor-mcptest-stdio test fixture. RFC §6.4. Deps: 28, 26. See docs/plans/phase-83g-mcp-dev-consumer.md.
- 83h — Dev-binary fixes (D-151). Two hard-block bugs from the v1.1 operator validation: V1 —
cmd/harbor/cmd_dev_hot_reload.go::shouldTrigger reboot-looped the binary every ~700ms on SQLite WAL/SHM/journal sidecars (fixed: extend with a dbSidecarSuffixes ignore list). V2 — internal/llm/safety.go rejected every real-bifrost request with CompleteRequest.Model is empty because the react planner never sets Model (fixed: default req.Model = c.cfg.Model before structural validation). The mock LLM driver used in every existing integration test does not enforce Model, which is why the gap escaped Wave 13/14/15 audits. Together the two fixes unblock real-bifrost dev-binary runs. RFC §6.5, §8. Deps: 83g, 64, 32. See docs/plans/phase-83h-dev-binary-fixes.md.
- 83i — RunContext wiring closure (D-152). Closes the four root causes of the Wave 17 operator-validation "64 steps, 0 tool calls" failure mode: (1) the steering RunLoop's
default: case dropped every CallTool decision — fixed with a new steering.ToolExecutor seam + RunSpec.ToolExecutor field + trajectory-append on every dispatched step; (2) RunContext.Catalog was never populated — fixed with runtimeCatalogView projecting the production tools.ToolCatalog through a per-run identity filter; (3) heavy tool results (1.5 MB MCP responses) leaked verbatim into the LLM prompt — fixed with D-026-shaped artifact-store promotion in the dev binary's devToolExecutor (results above cfg.Artifacts.HeavyOutputThresholdBytes get stored + a small summary {tool, size_bytes, truncated, preview, artifact_ref} is rendered into the LLM observation); (4) MemoryStore.AddTurn had no production caller — fixed in runOne on FinishGoal. The runOne also populates RunContext.Emit so the planner's planner.decision / planner.finish / planner.repair_guidance_injected events reach the bus. Live validation: 2 LLM calls end-to-end against mcp-youtube. Devstack mirror per D-094. RFC §6.2, §6.6, §6.8. Deps: 83f, 83g, 83h, 26, 23. See docs/plans/phase-83i-runcontext-wiring.md.
- 83n —
harbor init + tiered yaml + docs/CONFIG.md + built-in tools (D-153). Introduces harbor init — the operator-facing entry point that drops a tiered, commented harbor.yaml plus AGENTS.md / CLAUDE.md / README.md companion files into a fresh directory. The yaml has three tiers: REQUIRED (identity + four commented LLM-provider examples — OpenRouter, Anthropic, OpenAI, NVIDIA NIM — all reachable through bifrost), COMMON KNOBS (memory, planner, tools, skills, governance) all commented with sensible defaults, and ADVANCED with a pointer to docs/CONFIG.md. Companion files explain the workflow (init → validate → scaffold → dev). Ships docs/CONFIG.md — the full operator-facing knob reference with one ### <yaml.path> heading per leaf field on Config{} — plus a Go test (internal/config/doc_drift_test.go) that fails CI when a new config field lands without documentation. Also ships the first two opt-in built-in tools at internal/tools/builtin/: clock.now (current UTC time) and text.echo (echo input verbatim). The new tools.built_in []string yaml field registers built-ins by name; the validator mirrors builtin.KnownNames() per the §4.4 seam pattern. bootDevStack calls builtin.Register(toolCat, cfg.Tools.BuiltIn) between catalog construction and the catalog-wiring step; devstack mirrors per D-094. RFC §8, §6.4. Deps: 67, 63, 26. See docs/plans/phase-83n-harbor-init.md.
- 83u — Console DB chicken-and-egg fix (D-163). Closes round-2 walkthrough F3: the Connected Runtimes add-form on Settings called
console_db::addRuntime → runtimes.upsert(...) on a DB that required an active RuntimeConnection to derive its per-operator AES key. Operator without a Runtime → no connection → DB stays closed → form threw "Console DB not open — attach to a Runtime first". The form was reachable (Phase 83p) but structurally non-functional. Fix: new attachConnection(baseURL, opts) helper in web/console/src/lib/connection.ts writes the harbor.runtime.* localStorage keys first (the operator's primary intent — "make the Console talk to this Runtime"); console_db.svelte.ts::addRuntime calls attachConnection() first, then attempts the DB upsert and degrades to a non-fatal warning if the DB is still locked. On the post-attach page reload, a new private #catchUpAddressBook() invoked from load() inserts the active connection into the address book if it's not already there (is_default: 1). Playwright test follows the disconnected-boot → form → reload → connected flow end-to-end. RFC §5, §7. Deps: 73m, 73p, 83p. See docs/plans/phase-83u-console-db-chicken-and-egg.md.
- 83v — Runtime CORS allowlist (D-162). Closes round-2 walkthrough F4 — the showstopper that broke the D-091 multi-process Console+Runtime posture at the wire. The pre-83v
grep -rn 'Access-Control\|cors' --include='*.go' returned zero matches anywhere in the Go codebase; cross-origin requests from a Console (:18790) to a remote Runtime (:18080) were blocked at preflight. Fix: new internal/protocol/transports/cors/ package with Wrap(next http.Handler, cfg Config) http.Handler middleware; new ServerConfig.AllowedOrigins []string + ServerConfig.CORSDevAllowAny bool yaml fields. Default deny (empty list = no CORS headers = same-origin only). Per-origin echo of the request's Origin header after exact-match allowlist check (never * in production). Access-Control-Allow-Credentials: true (incompatible with * per CORS spec, which forces the per-origin shape). Validator rejects * in the allowlist unless server.cors_dev_allow_any: true is also set; when the dev flag fires, every boot prints a stderr banner [DEV-ONLY CORS WILDCARD — DO NOT USE IN PRODUCTION]. Middleware wraps both REST + SSE handlers in cmd/harbor/cmd_dev.go::bootDevStack; devstack mirror per D-094. Integration test exercises cross-origin preflight end-to-end against an httptest origin. docs/CONFIG.md documents both fields with the production-security note. RFC §5, §7. Deps: 60. See docs/plans/phase-83v-runtime-cors.md.
- 83w — Wire-surface gaps (D-164). Closes round-2 walkthrough F5 + F6 — two wire-surface gaps surfacing as scary red ERROR PageStates on the operator's most-used debugging surfaces. F5 (Console side):
topology.snapshot returns unknown_method on a planner/RunLoop runtime (no engine graph); the Live Runtime + Playground pages routed the error through PageState's red ERROR branch with a Retry button that always failed. Fix: new 'info' branch added to PageState's PageStatus union (additive to disconnected/loading/error/empty/ready); new isUnknownMethod(err) helper in web/console/src/lib/protocol/errors.ts; both pages special-case unknown_method → route to PageState's info branch with "Topology view not available on this Runtime — planner/RunLoop runtime, not engine-graph" copy + no Retry button (retry is meaningless for not-applicable surfaces). F6 (Go side): mcp.servers.list was missing from the Runtime's wire surface even though the *mcp.Registry was already constructed at boot (Phase 83g) and Tools page rendered six youtube tools fine — just no method handler. Fix is wiring-only: cmd/harbor/cmd_dev.go::bootDevStack constructs the Phase 73k MCPSurface from the boot-time registry + threads it into transports.NewMux via transports.WithMCPSurface(mcpSurface); new mcpconsole.NoOAuthAccessor provides the read-only access pattern for V1 harbor dev (no OAuth providers); OAuth-flow methods fail loudly with ErrNoOAuthConfigured per §13. Devstack mirror per D-094. Integration test asserts the wire surface returns 200 (not unknown_method). RFC §5, §6.4, §7. Deps: 83g, 83m, 73k. See docs/plans/phase-83w-wire-surface-gaps.md.
- 83x — Real-data layout polish (D-165). Closes round-2 walkthrough W4-W11 + N11-N14 — the "every page has a paper cut" backdrop. Twelve items spanning the Console (memory key ellipsis W4; artifacts grid layout 1fr × right-rail W5; tasks kanban Complete column W7; events empty-state names
events.driver: durable W9; live-runtime session status derived from strip aggregate W10; agents synthetic-default copy W11; overview "(now)" suffix N11; tools "In-flight (now)" relabel N12; tools reliability column width token N13; live-runtime pillar labels "(now)" N14) plus two cross-stack Go-side fixes (§17.6): W6 — Artifact.created_at was the Go zero value 0001-01-01T00:00:00Z because two call sites populating the storage Source map silently omitted the timestamp — fixed at cmd/harbor/cmd_dev_executor.go::projectForLLM (heavy-tool promotion, time.Now().UTC()) + internal/protocol/artifacts.go::handlePut (artifacts.put upload, s.clock() so unit tests with injected clock stay deterministic). W8 — SessionRegistry held zero rows under the dev token so the Console Sessions page rendered "No sessions match these filters" even mid-task; fixed by bootDevStack opening the dev session right after constructing the registry, swallowing ErrSessionAlreadyOpen for idempotency. RFC §5, §6.4, §6.6, §6.10, §6.13. Deps: 73m, 73p, 83i, 83m. See docs/plans/phase-83x-real-data-layout-bugs.md.
- 83s — Saved-views label + per-page footer dedup (D-161). Closes Nits N2 + N7 from the post-83k visual walkthrough. Settles on the canonical pair "Save view" (button) + "Save current as…" (input placeholder) across every page that surfaces a saved-view save gesture (eight drifted pre-83s phrasings consolidated to two). Removes every per-page inline Disconnected · no Runtime attached indicator — the viewport-fixed ConnectionFooter is the single source of truth; pages now show ONE disconnected indicator per viewport instead of two stacked ones. RFC §5. Deps: 73m, 73p. See docs/plans/phase-83s-savedviews-and-footer-dedup.md.
- 83r — Disconnected-state hygiene (D-160). Closes Walkthrough Bugs W1 + W2 + W3 + Nits N4 + N5 + N8 + N9 + N10. New
isDisconnected() predicate + DISCONNECTED_TOOLTIP constant in web/console/src/lib/connection.ts — every page composes it via $derived(connection === null) locally. Action buttons + filter controls reach the same predicate (disabled + tooltip when disconnected; W2/W3). The Overview Cost Rollup card stops rendering synthetic $0.00 data when disconnected (W1). The Tools page renders ONE empty-state message instead of two (N5). Agents KPIs use "—" matching Tools (N4). MCP Connections status chips desaturate via a new desaturated prop that flips data-kind to neutral (N8). Artifacts subtitle reads "— no Runtime attached" when disconnected (N9). <PageState> adds vertical-centring CSS (min-height: 40vh) so empty-state placeholders centre in the viewport instead of hugging the top (N10). New web/console/tests/disconnected-state.spec.ts Playwright spec covers the cluster. Bundled §17.6 fix: pre-83r ESLint break in settings/+page.svelte:94 (Phase 83p placeholder) — fixed inline. RFC §5. Deps: 73m, 73p, 83p. See docs/plans/phase-83r-disconnected-state-hygiene.md.
- 83q — Playground sidebar nav + breadcrumb (D-159). Closes Bug F2 + Nit N1 from the post-83k visual walkthrough. The Console's
(console)/+layout.svelte defines the NAV constant (cluster → items) AND derives the breadcrumb's crumbLabel from the same NAV by URL-segment match — so adding { label: 'Playground', href: '/playground' } to the EXECUTION cluster fixes F2 (Playground unreachable from sidebar) AND N1 (lowercase playground breadcrumb) in one stroke. Also rewrites docs/design/console/CONVENTIONS.md §2 which explicitly declared "Playground is NOT a sidebar entry" (a stale Phase 73n design call). Playwright tests bump cardinality from ≥13 to ≥14 sidebar links + explicit Playground assertion. RFC §5, §7. Deps: 73n. See docs/plans/phase-83q-playground-sidebar-nav.md.
- 83p — Settings two-group layout (D-158). Closes Bug F1 from the post-83k visual walkthrough: the Settings page wrapped its WHOLE cards loop in <PageState>, so the disconnected state short-circuited every section to the "Not connected — Attach one in Settings" placeholder — hiding the Connected Runtimes add-form an operator needs to fix the disconnection. SettingsState.load()'s docstring already documented the intended split ("Console-local sections do NOT depend on the runtime posture"); the template ignored it. Fix: each SETTINGS_SECTIONS entry now carries a group: 'console-local' | 'runtime-posture' discriminator; the page template renders console-local sections (Connected Runtimes, Per-Runtime Auth, API Tokens, Appearance, Time & Locale, Keybindings, Notifications Routing) unconditionally and routes only runtime-posture sections (Runtime Info, Governance Posture, Storage Drivers, LLM-Provider Posture, About) through <PageState>. Playwright test extension asserts the add-form is reachable + the input fields render in the disconnected state. RFC §5, §8. Deps: 73m, 73p. See docs/plans/phase-83p-settings-add-runtime-form.md.
- 83k — Console release embed (D-157). Closes the operator-validation gap where
cmd/harbor/consoledist/* is gitignored (only .gitkeep committed) — a fresh git clone + go build ./cmd/harbor (or go install github.com/.../cmd/harbor@latest) produces a binary that embeds an EMPTY Console bundle and serves the synthesized placeholder page. Fix: (1) make build now runs make console-build as a prerequisite (operators who git clone && make build get a working binary the first time); (2) a new make build-fast preserves today's no-Console-rebuild path for iterative dev work; (3) scripts/release-build.sh rebuilds the Console before go build, so tagged-release artifacts always carry a fresh Console; (4) a new scripts/check-console-bundle.sh staleness gate (wired into the CI frontend-e2e job) asserts two consecutive make console-build runs produce byte-identical outputs (catches non-determinism in the build); (5) the placeholder page copy is reworded with the exact rebuild commands + a go install workaround + a pointer to harbor init for first-time operators + a link to docs/CONFIG.md. The "first-run reach" polish half (favicon, brand tokens, empty-state copy) is intentionally deferred to a post-walkthrough follow-up — shipping release-embed first means the visual walkthrough runs against the fixed binary. RFC §5, §8. Deps: 73m, 81, 83n. See docs/plans/phase-83k-console-release-embed.md.
- 83m — WARN cleanup band (D-156). Closes the eight WARN-tier items the §17.5 audit + Wave 17 operator validation surfaced. Item 1: MCP
Config.DefaultIdentity reuse across pushes is now a fallback — the driver prefers identity.From(ctx) per call (multi-isolation footgun closed). Item 2: hot-reload watcher's dbSidecarSuffixes extended with .sqlite + .db main files (the reboot-loop was slower without it, not absent). Item 3: bootDevStack appends agentRegistry.Close + draftStore.Close to the closer chain (goroutine + file-handle leak on shutdown closed). Item 4: extractSkillKeywords helper drops English stopwords + 1-char tokens + dedupes before skills.Search (FTS5 ranker now sees keyword-shaped queries, not full sentences). Item 5: internal/llm/safety.go::Complete prefers c.cfg.Timeout over the 5-minute default (operator's harbor.yaml timeout was being silently ignored). Item 6: new tools.granted_scopes []string yaml field replaces the runloop's hard-coded nil pass to newRuntimeCatalogView (catalog filter now actually applies operator-declared scopes). Item 7: tasks.Task.ToolCount field + TaskRegistry.IncrementToolCount(ctx, id) error method (with conformance + N=128 D-025 concurrent-reuse test) + projectRow projection + runloop wiring closes the dead prototypes.Task.ToolCount wire field (Console rendered 0 forever before). Item 8: RunContext.OnReasoning func(string) callback (option b design — keeps the Decision sum sealed, treats reasoning as per-step observation rather than per-step instruction) + runloop's per-step closure capture + Step.ReasoningTrace copy on trajectory append makes Phase 83e's ReasoningReplay=text mode structurally effective in production for the first time. Shipped via two parallel worktree-agent buckets per §17.7 cadence + coordinator integration. Devstack mirror per D-094 for items 1, 3, 4, 6, 7, 8. RFC §6.2, §6.4, §6.5, §6.8, §8. Deps: 83g, 83h, 83i, 83l. See docs/plans/phase-83m-warn-cleanup.md.
- 83l — Real-bifrost integration tests + production-bug fix (D-155).
Closes the audit lesson D-151 named verbatim: every existing dev-binary integration test used the mock LLM, which masked two real-bifrost+real-stack bugs through Wave 13/14/15. Ships
test/integration/phase83l_real_bifrost_test.go — two tests + a scripted OpenAI-compatible httptest.NewServer helper (scriptedLLMServer) — exercising the full cmd/harbor-shape stack (bifrost driver, safety/correction/retry wrapper chain, react planner, steering RunLoop, ToolExecutor seam, trajectory append, memory writeback). The first run of the tests immediately surfaced a latent production bug: cmd/harbor/cmd_dev.go::bootDevStack constructed the llm.ConfigSnapshot WITHOUT cfg.LLM.CustomProviders, cfg.LLM.NetworkDefaults, or cfg.LLM.Corrections — an operator who declared a custom provider (NIM / vLLM / ollama / in-house gateway) would pass config validation but fail at boot with bifrost: invalid provider … declared custom: (none). Fix lands in this PR per §17.6 (three new projection helpers + the snapshot wiring; D-094 mirror in harbortest/devstack/devstack.go). Exactly the failure mode D-151 predicted; the mock LLM hid it; the integration test caught it on first contact. RFC §6.5. Deps: 33, 33a, 45, 83h, 83i. See docs/plans/phase-83l-real-bifrost-tests.md.
- 83o — scaffold reads operator yaml + per-custom-tool Go stubs +
--patch (D-154). Closes the operator workflow Phase 83n opened. harbor scaffold now reads the operator-edited harbor.yaml (explicit --from-config <path> or auto-detected ./harbor.yaml), copies it verbatim into the output project (operator's comments + uncommented LLM block survive), and fans out one tools/<name>.go + matching _test.go per entry under a new tools.custom yaml field. Each custom tool stub carries typed Input/Output structs derived from the operator's flat field: type declarations (string/integer/number/boolean/[]string) plus a TODO: implement Handle. The generated agent.go includes a RegisterTools(cat tools.ToolCatalog) error function that registers each built-in (from tools.built_in) + each custom tool's Handle so the operator's binary bootstrap is one call. A new --patch flag relaxes the refuse-overwrite default: existing files are skipped (Skipped slice on Result), only new tools materialise. The validator rejects name collisions between tools.custom and tools.built_in and rejects unknown shorthand types. docs/CONFIG.md documents tools.custom; the doc-drift gate caught the missing heading mid-implementation. RFC §8, §6.4. Deps: 67, 83n, 26. See docs/plans/phase-83o-scaffold-from-yaml.md.
- 84 — Reflection / critique loop. Optional per planner. Self-critique before Finish. RFC §12. Deps: 45.
- 84b — Multimodal attachment disposition policy (D-189). Turns the hardcoded MIME→disposition
switch in materializeOne into declared policy: an AttachmentDisposition enum (ref/inline/provider_native/tool:<name>) resolved per-attachment caller hint > per-agent policy map > runtime default (ref) — the layers are semantic, the carriers (Protocol hint, harbor.yaml) are thin adapters over a planner-homed policy core (DispositionPolicy + the exported pure ResolveDisposition), so a headless library consumer authors the same policy directly. The default is byte-for-byte unchanged from today — so the existing developer-controllable ArtifactStub + Fetch.Tool path stays first-class for Playground, Protocol, and third-party clients, and 84c's provider-native upload becomes opt-in rather than forced. Ships no provider mechanism and no embeddings. §13 consumer: the materializer + the same-wave 84c. RFC §6.4, §6.5, §6.10. Deps: F11/D-166, 107c. Same wave as 84c. See docs/plans/phase-84b-multimodal-disposition-policy.md.
- 84c — Provider-native multimodal mechanism (D-190). Implements the
provider_native disposition: the bifrost driver hands an over-threshold attachment to the provider's own understanding via FileUploadRequest → file_id (already on core@v1.5.15), performed inside Complete so LLMClient stays one method. Priority order is deliberate — image/audio/video first (regain vision/audio/video capability the stub path loses), PDF/documents last (the ref/tool+84d route is preferred for docs). Adds the part-level ProviderNative flag (settable by any CompleteRequest builder — the driver is the ONLY seam; the run loop never pre-uploads), optional ProviderFileID/DocumentType content fields, a driver-owned identity-scoped file_id cache + lifecycle (TTL/evict + Close-time cleanup; observability via the llm.provider_file.uploaded event), and the streaming-with-multimodal residual that Phase 107's row forward-referenced (in 107's req.Stream + llm.completion.chunk vocabulary). ArtifactStub stays the universal degradation. Opt-in via 84b; never the default. Shipped deviation (§4.3, D-190): the optional run-loop cancel hook is not wired — the wrapped LLMClient chain would need a forwarding method through five wrapper layers; the driver-owned lifecycle (TTL/LRU evict + Close sweep, with per-key fill coalescing) is the authority, per the SDK-lens C3 guidance. RFC §6.5, §6.10, §11Q3. Deps: 84b, 107, 32. Same wave as 84b. See docs/plans/phase-84c-provider-native-multimodal.md.
- 84d — Embedding client + semantic retrieval (D-191). Adds Harbor's first embeddings capability — an
Embedder §4.4 seam wired to bifrost's EmbeddingRequest — with its §13 in-wave consumers being semantic memory retrieval and semantic skill retrieval (the direction set by the owner; not a standalone RAG tool). Both are opt-in modes composing with (not replacing) rolling_summary memory + token-savvy skill retrieval; vectors persist in the existing stores (brute-force similarity at V1 scale; ANN deferred). This is the primitive that makes 84b's ref/tool document path powerful. Requires a §6.5 RFC addendum (the Embedder seam) landed in the same PR. RFC §6.5, §6.6, §6.7. Deps: 32, 23, F11/84b. Follows the 84b+84c wave. See docs/plans/phase-84d-embedder-semantic-retrieval.md. Shipped as planned with four recorded §4.3 deviations (see D-191): the seam lives in internal/embeddings with the driver blank-imported via the D-196 internal/drivers/prod aggregator (the plan's pre-110c "blank-import at cmd/harbor" wording); the interface carries a lifecycle Close alongside Embed; the skills-side injection resolved to the store seam (skills.Deps.Embedder + localdb's semantic Search path) rather than a tool-constructor seam; the §6.5 addendum pre-landed with the D-189 plans PR, so this PR's RFC delta is the D-191 contract sentence + the §6.6/§6.7 consumer text.
- 84e — Semantic memory consumption in the run loop (D-211). Closes the gap 84d (D-191) left by design:
MemoryStore.SearchTurns shipped with store/SDK consumers only — nothing in the run loop called it, so the agent never semantically recalled earlier conversation turns and the planner prompt's memory injection (the 83d path) stayed rolling-summary-only. 84e makes the run loop, when memory.retrieval: semantic is on (the ONLY switch — no second knob), search the session's embedded turns with the task query and inject the top-k recalled turns into the prompt's <read_only_external_memory> tier — the planner.MemoryBlocks.External slot that was nil on every production path since 83d, so the planner gains zero new surface and recall inherits the UNTRUSTED framing for free. Composes with rolling_summary (the Conversation tier is byte-untouched); mode off → byte-for-byte prompt parity, zero embedder traffic. The fetch+recall step is promoted to ONE home, runctx.FetchMemoryBlocks, consumed by both cmd/harbor's runOne and the devstack mirror — the per-step GetLLMContext mirror copies collapsed. Knobs ride the existing memory block (retrieval_top_k reused; new retrieval_min_score similarity floor, default 0.0, range [-1,1], validated) via a 110c-shaped memory.RecallFromConfig exporter with field-parity test. Degradation posture is fail-loud: a recall error fails the run (runtime_fetch_error, LLM never called) — never a silent fall-back to rolling-summary-only. Recalled turns are text-only and per-turn capped (2 KiB per side; D-026 guard stays the backstop). Records the deferred sibling: a memory.search Protocol method must precede any Console memory-search page (D-062 ordering) — parked for post-109 planning. RFC §6.2, §6.5, §6.6. Deps: 84d, 83d, 83f, 110b, 110c, 107c. Shipped as planned. See docs/plans/phase-84e-semantic-memory-runloop.md.
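The recall step that 84e describes (brute-force similarity over the session's embedded turns, a retrieval_min_score floor, then a top-k cut) can be pictured with a minimal sketch. Turn, Scored, and Recall are hypothetical names chosen for illustration and are not Harbor's MemoryStore.SearchTurns API; treating the floor as inclusive is also an assumption.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Turn is a hypothetical embedded conversation turn (illustration only).
type Turn struct {
	Text string
	Vec  []float64
}

// Scored pairs a turn with its similarity to the query.
type Scored struct {
	Turn  Turn
	Score float64
}

// cosine returns the cosine similarity of a and b, or 0 when the vectors
// are empty, differ in length, or either has zero magnitude.
func cosine(a, b []float64) float64 {
	if len(a) == 0 || len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Recall scores every turn against the query (brute force), drops turns
// below minScore, and returns at most topK results, best first.
func Recall(query []float64, turns []Turn, topK int, minScore float64) ([]Scored, error) {
	if minScore < -1 || minScore > 1 {
		return nil, fmt.Errorf("min score %v outside [-1, 1]", minScore)
	}
	if topK <= 0 {
		return nil, nil
	}
	var out []Scored
	for _, t := range turns {
		// Inclusive floor: an assumption made for this sketch.
		if s := cosine(query, t.Vec); s >= minScore {
			out = append(out, Scored{Turn: t, Score: s})
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	if len(out) > topK {
		out = out[:topK]
	}
	return out, nil
}

func main() {
	turns := []Turn{
		{Text: "deploy notes", Vec: []float64{1, 0}},
		{Text: "lunch plans", Vec: []float64{0, 1}},
		{Text: "rollback notes", Vec: []float64{-1, 0}},
	}
	hits, err := Recall([]float64{1, 0}, turns, 2, 0.0)
	if err != nil {
		panic(err)
	}
	for _, h := range hits {
		fmt.Printf("%.2f %s\n", h.Score, h.Turn.Text)
	}
}
```

Returning an error for an out-of-range floor, rather than clamping it, mirrors the fail-loud posture the entry describes; a real implementation would also apply the per-turn size cap before injection.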
110-band — Wave B SDK re-homing (production semantics out of cmd/harbor)
The 2026-06-09 SDK friction audit (docs/notes/sdk-friction-audit.md; program entry D-193) found a package-main stratum of production semantics that lives only in cmd/harbor with an already-diverged D-094 devstack mirror (two shipped/live silent-field-drop bugs — D-155, B3 — plus a degraded executor, a silently-dropped MCP ToolPolicy projection, and missing Emit/OnChunk/envelope wiring on the official test surface). The 110-band promotes that stratum into reusable internal/ packages and collapses the mirror to thin callers — every promotion deletes a devstack copy in the same phase (§13 primitive-with-consumer + §17.6 fix-both-sides). Staging: Stage 1 = 110a ∥ 110c (independent), Stage 2 = 110b ∥ 110d (after Stage 1 merges). The band is module-internal; the external-module facade (the audit's Wave D) is a future RFC-level program for which 110d is the named prerequisite.
- 110a — Tool-executor promotion (SHIPPED — D-194). Promotes the only production
steering.ToolExecutor (cmd/harbor/cmd_dev_executor.go, ~660 lines: catalog dispatch, D-026 heavy-result artifact promotion, CallParallel via internal/runtime/parallel, SpawnTask/AwaitTask with depth caps) to internal/runtime/dispatch.NewToolExecutor(catalog, artifacts, tasks, opts...). Also exports the Phase-106 answer envelope {answer, finish_reason, tool_calls_seen} + terminal task error-code constants as planner.AnswerEnvelope et al. (home picked by import direction — tasks stays planner-free), re-homes the catalog→planner view as tools.NewPlannerView (structural satisfaction of planner.ToolCatalogView; tools cannot import planner), re-points internal/planner/react/prompt.go's shape-contract citation away from cmd/harbor, switches the D-192 HITL E2E off its test-local executor shim onto the real promoted executor, and DELETES the devstack degraded executor (capability drift closed). D-025 concurrent-reuse test (N≥100, -race) mandatory. RFC §6.4, §6.5, §6.2. Deps: D-192 fix (merged), 107d, 107e, 83i. Stage 1, parallel with 110c. See docs/plans/phase-110a-tool-executor-promotion.md.
- 110b — RunContext population + event-closure promotion (SHIPPED — D-195). Promotes the five RunContext-population helpers duplicated cmd↔devstack into
internal/runtime/runctx (direction-safe: runtime imports planner/memory/skills; planner gains NO memory import): ProjectMemoryBlocks, ProjectSkillsContext, ExtractSkillKeywords + stopwords (the D-156 FTS5 query shaping — a third copy existed; godoc carries the "scheduled for deletion by Phase 111d (D-201); add no new consumers" notice per the owner's 2026-06-09 scope amendment), ExtractAssistantAnswer, and the D-166 ResolveInputArtifacts policy — following the planner.BuildArtifactManifest precedent. Adds events.IdentityStampingEmitter(bus, q, logger) for RunContext.Emit and llm.NewChunkPublisher(bus, q, taskID, logger) for OnChunk (the closures whose identity-envelope trap once produced 280+ bus-rejected chunks per task). cmd + devstack become callers; devstack ADDITIONALLY gains missing parity: Emit/OnChunk wired in its RunSpec and MarkComplete carrying the 110a answer envelope instead of an empty TaskResult{} (pinned by test/integration/phase110b_runctx_parity_test.go). Also wires the D-196 call-4 handoff one-liner: dispatch's spawn-depth default now references config.DefaultSpawnDepthCap. RFC §6.2, §6.5, §6.13. Deps: 110a, 83f, 83i, 83m, 107. Stage 2, parallel with 110d. See docs/plans/phase-110b-runcontext-population-promotion.md.
- 110c — Config-projection exporters (SHIPPED — D-196). The five config→snapshot projections become exported helpers on the OWNING packages (settled direction: subsystem imports
internal/config additively; config stays a leaf; the snapshot decoupling is preserved because FromConfig is optional sugar, never a required path): llm.SnapshotFromConfig (absorbing the four private copy* helpers — closing the D-155 recurrence class), memory.SnapshotFromConfig, skills.SnapshotFromConfig, planner.ConfigFromOperator (fixing the LIVE devstack drift B3 — four planner knobs silently dropped today, pinned by a reflection field-parity test), governance.ConfigFromOperator. Plus: config.Defaults() exported (hand-built configs start defaulted), planner-adjacent knob projections re-homed (skills_context_max default, planner.HintsFromConfig, spawn-depth default deduped), a headless validation profile (ValidateCore — config-without-binary stops demanding JWT identity fields), and ONE blank-import aggregator (internal/drivers/prod) imported by main.go and devstack — also closing devstack's missing-LLM-wrapper trap (no corrections/downgrade/retry on its chain today). cmd + devstack consume every projection; all duplicates deleted. Shipped as planned, plus one §17.6 cross-fix the parity gate surfaced: both copyModelProfiles copies silently dropped the per-model cost_overrides: / corrections: yaml (a third D-155-class drop); llm.SnapshotFromConfig maps both, pinned by the sub-struct parity tests. The spawn-depth constant is exported as config.DefaultSpawnDepthCap; the executor-side clamp (110a's internal/runtime/dispatch) references it at Stage 1 merge (parallel worktrees). RFC §6.5, §6.6, §6.7, §9, §10. Deps: 83l, 83f, 107d, 107e. Stage 1, parallel with 110a. See docs/plans/phase-110c-config-projection-exporters.md.
- 110d — Assembly promotion (D-197). SHIPPED. Promoted devstack's
tryAssemble shape into the exported, error-returning assemble.Assemble(ctx, *config.Config, Options) (*Stack, error) in internal/runtime/assemble; bootDevStack and devstack.Assemble(t, ...) are thin wrappers — the last of the D-094 subsystem-wiring mirror is collapsed. Promoted the remaining cmd-local assembly legs: mcpdrv.Attach next to the driver INCLUDING the config→ToolPolicy projection (the devstack silent drop is closed — regression pinned in phase83g's real-stdio-fixture E2E + the attach unit E2E), auth.BuildProviders (OAuth KEK→sealer→tokenstore→provider chain; per §4.3 it returns the provider map only — approval gates remain the catalog Builder's output, which the assembly invokes), and events.OpenWith(ctx, cfg, redactor, Deps{State}) + events.RegisterWithDeps so the durable event driver shares the runtime's StateStore through the factory path (recorded reconciliation: the assembly opens State BEFORE the bus so the shared store outlives it — production's pre-110d bus-first order swapped, behaviour preserved). The per-task run-loop driver (the task.spawned subscriber) stays per-caller (110b's seam); headless embedders drive Stack.RunLoop.Run directly. Ships docs/recipes/embed-harbor-headless.md, acceptance-gated by test/integration/phase110d_assemble_test.go (recipe path on real drivers + durable-store sharing + identity propagation + 2 failure modes + N=10 Assemble/Close cycles + N=100 concurrent runs on one stack). RFC §6.4, §6.13, §9, §10. Deps: 110a, 110b, 110c, 64, 83g, 30, 57. Stage 2, parallel with 110b. See docs/plans/phase-110d-assembly-promotion.md.
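The "reflection field-parity test" that 110c uses to pin config→snapshot projections can be sketched roughly as follows. The struct names, their fields, and the MissingFields helper are hypothetical stand-ins, not Harbor's real types; the point is only the technique of failing when a source field has no same-named, same-typed counterpart on the projected struct.

```go
package main

import (
	"fmt"
	"reflect"
)

// operatorPlanner stands in for an operator-facing config struct
// (hypothetical fields, for illustration only).
type operatorPlanner struct {
	MaxSteps      int
	TokenBudget   int
	Reflection    bool
	SpawnDepthCap int
}

// plannerSnapshot stands in for the projected snapshot. It is missing
// SpawnDepthCap on purpose, which is the silent-drop class a parity
// check is meant to catch.
type plannerSnapshot struct {
	MaxSteps    int
	TokenBudget int
	Reflection  bool
}

// MissingFields lists exported fields of src that have no field of the
// same name and type on dst. Both arguments must be struct values.
func MissingFields(src, dst any) []string {
	st, dt := reflect.TypeOf(src), reflect.TypeOf(dst)
	var missing []string
	for i := 0; i < st.NumField(); i++ {
		f := st.Field(i)
		if !f.IsExported() {
			continue
		}
		df, ok := dt.FieldByName(f.Name)
		if !ok || df.Type != f.Type {
			missing = append(missing, f.Name)
		}
	}
	return missing
}

func main() {
	// In a real test this would be a t.Fatalf when the slice is non-empty.
	fmt.Println(MissingFields(operatorPlanner{}, plannerSnapshot{}))
}
```

A name-and-type check like this catches a field that was never projected; it does not prove the projection copies the value, so a value-level round-trip test would still be needed alongside it.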
111-band — Wave C: finish (or formally defer) the half-shipped primitives (SDK friction audit §3)
The 2026-06-09 SDK friction audit (docs/notes/sdk-friction-audit.md) found a band of shipped primitives with zero production consumers anywhere — not even the dev binary — several behind config knobs that validate cleanly and then silently do nothing. These are standing §13 violations (primitive-without-consumer; test-stubs-as-defaults one layer up: seams never exercised under real call sites). The 111 band gives each its first production consumer or a recorded disposition. Staging: the band parallelizes freely after Wave B Stage 1 (110a + 110c) merges; 111a soft-depends on 110c; all six phases are mutually independent. D-numbers D-198–D-203 are reserved per phase (logged when each ships).
- 111a — Governance enforcement assembly (D-198). SHIPPED. Closed the audit's headline gap:
governance.SetFactory's only caller was a test; populated governance.identity_tiers drove the posture provider only — clean validation, zero enforcement. Shipped exported governance.NewSubsystemFromConfig(cfg, store, bus) (Compound in the documented MaxTokens→RateLimiter→CostAccumulator order; nil Subsystem on empty tiers preserving the D-044 latent default; ErrInvalidConfig on nil store/bus with tiers), called via SetFactory from the production assembly (assemble.Assemble, eager build → fail-loud boot → factory installed BEFORE llm.Open; empty tiers clear the factory and Stack.Close clears it again); consumes 110c's governance.ConfigFromOperator; documents governance.Wrap as the multi-runtime headless escape (SetFactory-vs-per-Open evaluated + decided: keep the global seam for the binary); removed the Wave A posture-only boot warning (§4.3 correction: the warning lived in assemble.go post-110d, not internal/config/validate.go — validateGovernance never carried one); E2E proves a configured tier actually rejects/limits (cost + rate + MaxTokens each exercised against the real assembled stack, with identity-propagated governance.* events, cross-session isolation, and D-025 concurrency). RFC §6.15, §6.5, §6.11. Deps: 32, 36a, 36b, 110c (soft). See docs/plans/phase-111a-governance-enforcement-assembly.md.
- 111b — Tool-OAuth completion leg (D-199). SHIPPED.
auth.CallbackHandler (state → PendingFlow → CompleteFlow → typed 404/410/400/502 error mapping; no secret material in responses or logs) is CompleteFlow's first production caller, mounted by harbor dev at GET /v1/tools/oauth/callback (devstack mirrors; both over assemble.Stack.OAuthProviders — D-197) and mountable headless on any mux. The full choreography E2E (test/integration/phase111b_oauth_completion_test.go): gated tool → tool.auth_required + pause.requested → authorize → 302 redirect onto the handler → CompleteFlow → pause.resumed{Decision: resume} → run re-enters and the tool succeeds USING the minted token; failure modes: expired flow (410 + pause parked), replayed callback (404 by consumption). §4.3 refinements recorded in D-199: PendingFlow returns (PendingFlowInfo, bool); DenyFlow added (upstream denial → DecisionReject resume); both on the OAuthProvider interface; the run-level re-entry rides the existing steering RESUME surface (the recipe documents the honesty note). Recipe: docs/recipes/steer-and-resume-a-run.md. Closed the InitiateFlow/CompleteFlow primitive-without-consumer pair (§13). RFC §6.4, §3.3, §6.3. Deps: 30, 50, 31, the D-192 steering fix. See docs/plans/phase-111b-tool-oauth-completion.md.
- 111c — Durable pauses + pause lifecycle (D-200).
WithCheckpointStore has zero production consumers (both assemblies construct the Coordinator storeless with the StateStore in scope); requestPause persists Trajectory: nil; no pause GC exists — DecisionTimeout has no producer and cancel-while-paused orphans records forever. Threads the run's Trajectory into requestPause; wires WithCheckpointStore(stateStore) in both assemblies; pause→new-Coordinator-over-same-store→Resume→trajectory-restored E2E (+ the §11 ErrUnserializable fail-loud test); ships WithMaxParkDuration + the exported pauseresume.RunSweeper emitting pause.resumed with DecisionTimeout — its first producer — started config-gated by the one merged assemble.Assemble site (cmd + devstack inherit as thin callers). Shipped deviation (§4.3, recorded in the plan + D-200 §5): the sweeper's SCAN is package-internal over the Coordinator's registry rather than Coordinator.List — List is §6-identity-scoped with no all-tenants wildcard (the plan's Risks anticipated exactly this); every MUTATION still goes through the public Resume under the pause's own scope. Timeout is terminal: the parked run finishes Finish{ConstraintsConflict} via the bus-event wake + Status.Decision fallback. RFC §3.3, §6.3, §6.11. Deps: 50, 51, the D-192 steering fix. See docs/plans/phase-111c-durable-pause-lifecycle.md.
- 111d — Skills canonical surface + ingestion (SHIPPED — D-201). Three intertwined gaps closed: the rich Phase-38 planner tools (capability filter, redaction, budgeter) + Phase-41 generator were registered nowhere while production registered a thinner parallel builtin implementation (the §13 two-implementations smell); Skills.md ingestion had no shipped path; the Phase-39 Directory had only test consumers. Shipped: builtin
skill_search/skill_get delegate to the exported Phase-38 handlers (duplicate bodies deleted; filter/redaction/budgeting on production; the capability envelope is server-computed from the run's visible-tool set via tools.VisibleNames, never LLM-supplied); skill_list + skill_propose register through the same carrier (skill_propose opt-in rides the existing tools.built_in names list — the plan's sketched …skill_propose.enabled key was dropped as a second parallel enablement mechanism, §4.3 deviation in the plan); harbor skill import/rm ship over exported importer.ImportAndStore (+ §18 SKILL.md restoration in the same PR); the Directory was WIRED per the recorded owner decision (2026-06-09) as the <skills_context> producer — runctx.ExtractSkillKeywords deleted per its D-195 deprecation notice; skills.directory.{pinned,max_entries,selection} config block added. Note: the plan's "stale cmd/harbor/main.go:76-90 promises" had already been rewritten by 110c — the surviving stale text lived in internal/skills/tools' package doc + internal/drivers/prod's honesty notes and was replaced there. RFC §6.7, §8. Deps: 37–41, 107c. See docs/plans/phase-111d-skills-canonical-surface.md.
- 111e — Trajectory compression consumer (SHIPPED — D-202).
planner.Summariser has only test implementations; MaybeCompress has zero call sites; Budget.TokenBudget is dead on every path — while the consumer half (the React prompt's Summary != nil branch) is already wired. Ships a real LLM-backed TrajectorySummariser (in internal/llm/summarizer, distinct from the unrelated memory.Summarizer — do not conflate), the MaybeCompress integration in steering.RunLoop's step loop gated on TokenBudget > 0, the planner.token_budget config → RunSpec.Budget production wiring, and the godoc un-dormanting. Long-trajectory E2E: compression fires, the prompt shrinks, the run completes correctly on summary-carried context. Scoped tight: one compression per run, no auto-cascade. RFC §6.2, §6.5. Deps: 46, 35, 107, the D-192 steering fix. See docs/plans/phase-111e-trajectory-compression-consumer.md. Shipped (D-202). One recorded §4.3 deviation: the compaction payload renders the trajectory's planner-facing projection (per-step action + LLMObservation, per-fragment capped) rather than the raw Serialize bytes — raw observations may carry heavy content that must never reach the LLM edge (D-026 / ErrContextLeak); the budget estimator still measures the full serialized trajectory.
- 111f — Telemetry assembly + approval-gate authorizer seam (SHIPPED — D-203). Two halves, both closed. Telemetry: pre-111f,
telemetry.New (redactor-mandatory, identity-attribute, bus-paired Logger) had zero production callers — cmd booted bare slog; engine.WithRunErrorHandler had no production caller; metrics got BridgeBusToMetrics but traces got no bridge and NewTracer was never constructed despite the blank-imported exporters. Shipped: telemetry.New + the Stack.RunErrorHandler wired into the production assembly (assemble.Assemble; cmd + devstack inherit as thin callers), telemetry.BridgeBusToTracer started symmetric with the metrics bridge (lifecycle-pair span model, DefaultTraceBridgeFilter() volume guard, both on the closer chain), and docs/recipes/observe-an-embedded-runtime.md. Approval: pre-111f, ResolveApproval hard-required internal/protocol/auth scopes and the runtime's own steering bridge self-elevated to pass its own gate — wire-layer auth vocabulary in an in-process control path. Shipped: the injected GateDeps.Authorizer seam (runtime identity/control-scope default approval.NewIdentityAuthorizer(); protocol auth via the server-side ProtocolScopeAuthorizer adapter), the protocol/auth import removed from internal/tools/approval (the steering bridge's self-elevation deleted outright), and the direction rule recorded: runtime may import protocol TYPES, never protocol auth/methods/transports (see D-203's 2026-06-10 addendum for the <area>/protocol adapter carve-out + the named internal/search standing violation). Three recorded §4.3 deviations (D-203): (a) assemble.Options gains TelemetryOptions/TracerOptions/ApprovalAuthorizer (the MetricsOptions precedent — the union of real-caller needs); (b) the Protocol-side adapter injects via assemble.Options rather than the plan's "server-side gate assembly" because gate assembly lives in the ONE D-197 fan-out (the adapter stays owned by internal/server); (c) engine/options.go's godoc rewritten, not merely "now true" (the Wave A honesty text had said "no production assembly installs one today"). RFC §6.14, §6.4, §5.1. Deps: 03, 04, 05, 31, 55, 56, the D-192 steering fix.
See `docs/plans/phase-111f-telemetry-assembly-approval-seam.md`.
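The approval half of 111f is a dependency-injection seam, and a small sketch may make the direction rule concrete. Everything below is hypothetical: the `Authorizer` method signature, the `Identity` fields, the equality rule in the identity default, and the scope name are illustrative assumptions, not Harbor's actual `GateDeps.Authorizer` contract. The point is only that the gate depends on an interface, so the in-process path and the wire path can supply different implementations without the gate importing wire-layer auth.

```go
package main

import (
	"errors"
	"fmt"
)

// Identity is a hypothetical stand-in for the runtime's identity triple.
type Identity struct{ Tenant, User, Session string }

// Authorizer is the injected seam (hypothetical signature). The gate only
// sees this interface, so it needs no import of any wire-layer auth package.
type Authorizer interface {
	AuthorizeResolve(caller, owner Identity) error
}

// ErrDenied is returned when the caller may not resolve the approval.
var ErrDenied = errors.New("approval: caller not authorized to resolve")

// identityAuthorizer sketches a runtime-side default. Assumed rule: the
// caller must carry the same identity as the run that parked the approval.
type identityAuthorizer struct{}

func (identityAuthorizer) AuthorizeResolve(caller, owner Identity) error {
	if caller == (Identity{}) || caller != owner {
		return ErrDenied
	}
	return nil
}

// scopeAuthorizer sketches a server-side adapter that decides from scopes
// the wire layer already verified. The scope name is made up.
type scopeAuthorizer struct{ scopes map[string]bool }

func (a scopeAuthorizer) AuthorizeResolve(_, _ Identity) error {
	if !a.scopes["approval:resolve"] {
		return ErrDenied
	}
	return nil
}

// Gate depends on the interface only; the assembly picks the implementation.
type Gate struct{ Authz Authorizer }

func (g Gate) Resolve(caller, owner Identity) error {
	if g.Authz == nil {
		return ErrDenied // fail closed when nothing was injected
	}
	return g.Authz.AuthorizeResolve(caller, owner)
}

func main() {
	owner := Identity{"t1", "u1", "s1"}
	inProc := Gate{Authz: identityAuthorizer{}}
	remote := Gate{Authz: scopeAuthorizer{scopes: map[string]bool{"approval:resolve": true}}}

	fmt.Println(inProc.Resolve(owner, owner) == nil)
	fmt.Println(inProc.Resolve(Identity{"t2", "u9", "s9"}, owner) == nil)
	fmt.Println(remote.Resolve(Identity{}, owner) == nil)
}
```

With this shape, no caller needs to self-elevate to pass its own gate: the in-process bridge is assembled with the identity default, and only the server-side assembly injects the scope-based adapter.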
112-band — Wave D: the public SDK facade (D-204)
RFC §3.6 settles the design (alias-based sdk/ tree; curation over moves; D-204 records the rationale). The wave's §13 pairing: 112a ships the facade, 112b is its consumer in the same wave.
- 112a — The public SDK facade (SHIPPED — D-205). The `sdk/` tree of alias-based re-exports per RFC §3.6's inventory; the facade-integrity test runs the headless recipe via `sdk/` imports only (grep-gated zero `internal/` imports; deterministic-planner override over an offline custom-provider bifrost client so the path runs in CI without network or the mock driver); `sdk/drivers/prod` parity with the internal aggregator by construction (its only content is the internal aggregator's blank import); AGENTS/CLAUDE §3 amendment. No moves, no mechanism: forwards only, with ONE documented carve-out (`sdk/tools/inproc.RegisterFunc`, a generic wrapper Go cannot express as a `var` forward; smoke-gated as the sole `func` in the tree). See `docs/plans/phase-112a-sdk-facade.md`.
- 112b — External consumers + the compile gate (SHIPPED — D-206). Scaffold templates emit `sdk/` imports (the tool-declaring output compiles AND tests green as an external module; this was the audit's headline external break, now gated by `scripts/smoke/phase-112b.sh`: scaffold → replace directive → `go build`, bounded + self-tested, plus an external harbortest `go test` probe); harbortest vocabulary externally satisfiable via the aliases with signatures unchanged and zero kit constructors; the five consumer-facing recipes + README flipped to `sdk/` paths (grep-gated). Two recorded §4.3 calls (D-206): (a) phase-67's smoke keeps the toolless build-check and this gate owns the tool-declaring shape (no duplication); (b) the conversions flushed out additive facade extensions, `sdk/{audit,telemetry,telemetry/eventbus,governance,tools/auth,skills/{importer,tools,generator}}` + `sdk/tools.ErrorClass` (RFC §3.6 item 3 amended), while `sdk/pauseresume` was deliberately NOT added (D-205's curation; the steer recipe reworked to the config-driven assemble shape). Wave D and the SDK re-homing program close here. See `docs/plans/phase-112b-external-consumers.md`.
113-band — the Protocol adoption track on the docs site (D-209 / D-210 reserved)
docs/notes/protocol-docs-proposal.md (owner-approved; merged as PR #305) is the binding design: the Protocol is Harbor's ecosystem surface (RFC §5.1 — "the same surface powers a remote attach, a third-party dashboard, or an IDE/TUI client"), but a client author today must read Go source to answer what methods exist, what events arrive with what payloads, what an error looks like, how auth works, what a version bump means. The band serves the proposal's four audiences (evaluator / client builder / event integrator / control integrator) with a docs-site track whose center of gravity is a generated, gen-check-gated contract reference — the house single-source discipline applied to adopter docs. The owner resolved the proposal's open questions per its recommendations: Q1 event catalog is registry-read at gen time (the generator imports internal/drivers/prod and reads the populated events.EventTypes() registry, payload shapes via the CanonicalWireTypes-style reflection + lockstep-test treatment); Q2 OpenAPI emission deferred (recorded as a stretch in 113a's non-goals); Q3 the conformance suite is documented as the certification path in 113b but its sdk-export waits for a real third-party ask; Q4 versioned docs deferred to the first breaking Protocol change (recorded in both plans' risks). §13 pairing: 113a ships the generator + gate, and its own choreography guides + executed quickstart are the consumers in the same phase; 113b consumes 113a's reference pages (lockstep greps) and closes the track.
- 113a — the floor (D-209, logged at ship; Shipped).
cmd/harbor-gen-protocol-docsemitsmethods.md/events.md/errors.md/types.mdintodocs/site/protocol/from the canonical sources (methods.go+ the transports'*RoutePatterntables +IsControlMethod/cluster predicates + auth scopes; the Q1 registry-read event catalog;errors.go;CanonicalWireTypes) under generated-file headers;make protocol-docs-gen-check(git diff --exit-code) wired into the docs workflow — the gate shape D-093 specified, built here for a generator that actually exists (the TS generator stays deferred per D-132 / issue #179; no dependency); the "Speak Protocol in 15 minutes" quickstart whose curl steps the smoke EXECUTES against the preflight dev server (the recipe-cannot-lie pattern); choreography guides 1–3 (auth & identity incl. the D-171 session-blank model; streaming semantics; task control); the Protocol nav section + README Docs-table row; the §18 amendment putting the generated reference under the same-PR regeneration rule (AGENTS+CLAUDE, mirror-gated). Seedocs/plans/phase-113a-protocol-reference-and-quickstart.md. Shipped 2026-06-10 with two recorded §4.3 deviations (D-209 calls 3–4):control.HTTPStatusexported so the generated error page reads the wire transport's own code→status binding, and the executed quickstart's steering step accepts both documented wire outcomes (200 accepted / 404 not_found on a terminal run — the deterministic mock-path result, doubling as the §17.3 failure-mode leg). - 113b — the closer (D-210; Shipped). Choreographies 4–5: the pause model (
pause.requested→ approve / reject / OAuth-callback / plain resume; durable pauses across restarts;DecisionTimeoutreaps — the wire view of RFC §3.3's unified primitive) and versioning & compatibility (RFC §5.3 made adopter-facing, incl. unknown-field tolerance and unknown-method 404/405 handling — the smoke SKIP convention promoted to adopter contract); the build-a-client guide around a ~150-line worked event-viewer atexamples/protocol-clients/(compile-gated in the smoke) with the hand-maintained TS wire-type module + the Console as reference implementations; the conformance-certification page (how to runinternal/protocol/conformance, what passing claims — NO sdk-export per Q3). Deps: 113a. Seedocs/plans/phase-113b-protocol-choreographies-and-certification.md. Shipped 2026-06-11 with one recorded §4.3 deviation (D-210 call 2): the OAuth callback route lockstep-greps against the exportedauth.CallbackPathsource constant instead of the generated reference — the callback is a provider-redirect mount, deliberately not a canonical Protocol method, so it has nomethods.mdrow. The pause guide's approve/reject/timeout wire examples are captured from a production-driver devstack assembly; the OAuth leg is transcribed from the handler + its tests and says so (D-210 call 1). A §17.6-posture docs fix rode along:task-control.mdno longer claimstask.paused/task.resumedfire on the live pause path (nothing calls MarkPaused/MarkResumed in production — a parked run's task staysrunning;pause.listis the authoritative park read).
85-band — MCP client/host compliance (prioritised first post-V1 work)
The integer Phase 85 (Skills Portico provider driver) is removed: Portico is an MCP gateway and speaks MCP like any server, so the generic MCP client driver consumes it — a Portico-specific driver would duplicate the driver and couple Harbor to one ecosystem tool. The 85-band closes Harbor's MCP-client-compliance gap (audit + decomposition in brief 14). This band is the first post-V1 work — ahead of 83/84 in execution priority.
MCP 2026-07-28 RC re-plan (effective 2026-05-28). The MCP Foundation published a release candidate locked 2026-05-21; final spec drops 2026-07-28; Tier-1 SDKs ship RC support within a 10-week window (≈ late July–early August 2026). The RC reshapes the 85-band:
- Roots, Sampling, Logging are deprecated in the RC (annotation-only: functional for 12+ months, but on death row). Phases that build operator-facing surface against them are cut.
- Tasks moves from experimental core to an extension, redesigned: `tasks/list` removed; new method set is `tools/call` returns a task handle, then `tasks/get` / `tasks/update` / `tasks/cancel`. 85h's hand-transcription against 2025-11-25 would lock in the wrong shape.
- Session handshake (`initialize` / `initialized` + `Mcp-Session-Id`) is removed, Streamable HTTP requires new `Mcp-Method` / `Mcp-Name` headers, error code `-32002` flips to `-32602`, server-to-client requests restructure into `InputRequiredResult`. These cross-cutting changes land as a new sub-phase 85m.
- Authorization hardens with six new SEPs (`iss` validation, DCR `application_type`, issuer-bound credentials, refresh-token docs, scope accumulation, `.well-known` clarification). 85b absorbs them; scope grows.
- Cut phases (`85c`, `85e`, `85h`, `85i`) keep their plan files as historical context (do not delete), but their Status reads `Cut`. The `docs/decisions.md` entry recording this re-plan is the canonical reference.
- Lettering note: 85k (skills) already exists. The new RC-adoption sub-phase is 85m (skipping `l` to avoid `l` / `I` / `1` ambiguity next to existing `85i`).
Per-phase RC verdict + readiness:
- 85a — MCP client core-compliance fixes. Pagination-truncation fix, `*ListChanged` handlers, resource `Unsubscribe`-on-close. The honest-empty `roots` capability advertisement is now permanent (not a stopgap; 85e is cut). RFC §6.4. Deps: 28. Ready now: uses go-sdk v1.6.0 surface that exists today; nothing the RC removes is in scope. See `docs/plans/phase-85a-mcp-client-core-compliance.md`.
- 85b — MCP HTTP OAuth (scope ↑). Wire `auth.Provider` into the MCP driver; RFC 9728 protected-resource-metadata discovery; `WWW-Authenticate` 401 step-up; RFC 8707 resource indicators. Adds the six RC auth SEPs: SEP-2468 (`iss` validation per RFC 9207), SEP-837 (DCR `application_type`), SEP-2352 (credential binding to issuer + re-register on migration), SEP-2207 (OIDC refresh-token docs), SEP-2350 (scope accumulation during step-up), SEP-2351 (`.well-known` suffix). Also: token-store keying moves from session-scoped to per-request `_meta` since the RC removes `Mcp-Session-Id`. RFC §6.4, §3.3. Deps: 28, 30, 50, 85m (for the per-request keying). Ready now: OAuth flow is Harbor-side; SDK exposes the wire transport and `WWW-Authenticate` already; the per-request keying mechanic ships with 85m but the plan can be authored against the new shape now. See `docs/plans/phase-85b-mcp-http-oauth.md`.
- 85c — MCP sampling provider (CUT). RC deprecates `sampling/createMessage`; replacement is "direct LLM provider API integration", which is what `llm.LLMClient` already is. Building a `CreateMessageHandler`, pause-gated review surface, `modelPreferences` resolution and tool-enabled sampling would ship operator-facing surface for a 12-month-EOL feature. Servers needing an LLM bring their own provider per the RC's guidance. Plan file kept as historical context. No revisit.
- 85d — MCP elicitation provider. Form vs URL mode and the secret-rejection rule survive; the wire mechanic does not. RC replaces the SSE-based wait with
InputRequiredResult(inputRequests,requestState) + client retries the original call withinputResponses. The plan as written targets SSE — must be rewritten before implementation. The pause/resume primitive integration is still conceptually right. RFC §6.4, §3.3. Deps: 28, 50, 85m. Revisit after SDK-RC (≈ late Jul–Aug 2026). Seedocs/plans/phase-85d-mcp-elicitation-provider.md. 85e — MCP roots provider(CUT). RC deprecates roots; replacement is "tool parameters, resource URIs, or server configuration." 85a's honest-empty advertisement is now the permanent posture. Plan file kept as historical context. No revisit.- 85f — MCP remaining server features (slim). Ship completions (
completion/complete), resource templates (resources/templates/list), and progress (_meta.progressToken+notifications/progress). Drop the logging slice — RC deprecateslogging/setLevel+notifications/message; replacement is stderr / OpenTelemetry, both of which Harbor already has. RFC §6.4. Deps: 28, 85a. Ready now — all three retained features are in go-sdk v1.6.0. Seedocs/plans/phase-85f-mcp-remaining-server-features.md. 85g — MCP Apps host(DEPRECATED → superseded by 109a–c, D-172). The original premise — "revisit after RC-final because Apps is experimental and the RC may reshape_meta.ui.resourceUri" — was overturned: MCP Apps is a stable, independently-versioned extension (io.modelcontextprotocol/ui, theext-appsrepo), NOT gated on the July RC, and it ships an official framework-agnostic host bridge (@modelcontextprotocol/ext-appsAppBridge) that removes the hand-rolled-bridge risk this plan carried. A code audit also found 85g's "purely Console-side" non-goal factually wrong (the MCP driver doesn't parse_meta.ui.resourceUri,tool.completedcarries no content, andReadResourceisn't exposed on the Protocol — so there is real runtime + Protocol work). Pulled forward into V1.1.x as the three-phase 109a–c "MCP Apps host" wave, scheduled immediately after Phase 108. Plan file kept as historical context, marked deprecated. RFC §6.4, §7. See D-172, D-173, anddocs/plans/phase-109a-mcp-apps-runtime-protocol.md/phase-109b-console-mcp-apps-host.md/phase-109c-mcp-apps-displaymode-layout.md.- 109a — MCP Apps runtime + Protocol surface. The runtime/Protocol enablement layer the deprecated 85g plan wrongly assumed already existed. Parse
_meta.ui.resourceUrion MCP tool results; recogniseui://-scheme resources; project the app reference (resourceUri + negotiated DisplayMode +RawHTMLTrusted) onto the tool-result Protocol surface; addmcp.servers.read_resource(identity-scoped, D-026 heavy-content aware) to fetch theui://HTML; negotiateDisplayModesfrom the server'sio.modelcontextprotocol/uicapability (replacing the staticregistry.goplaceholder); add an app-initiated-tool-call proxy that routes through the existing approval/OAuth/identity tool-safety path. §13 same-wave consumer: 109b. RFC §6.4, §6.5, §7. Deps: 28, 85a, 84a. Seedocs/plans/phase-109a-mcp-apps-runtime-protocol.md. - 109b — Console MCP Apps host. Sandboxed-iframe renderer in the shared chat module (
web/console/src/lib/chat/renderers/mcp-app.svelte, D-091); strict CSP;postMessageorigin validation; the official AppBridge wired in manual-handler mode (D-173) — every app→host call Protocol-proxied through 109a, never a direct MCP connection; honoursRawHTMLTrusted→ sandbox strictness; the inline DisplayMode via the renderer registry. Adds@modelcontextprotocol/ext-apps+@modelcontextprotocol/sdktoweb/console(RFC §10 dependency-addition prerequisite). RFC §6.4, §7. Deps: 109a, 73n, 108. Seedocs/plans/phase-109b-console-mcp-apps-host.md. - 109c — MCP Apps DisplayMode layout. The Playground page-level layout state machine for fullscreen (app replaces chat + composer; multi-tab) and pip (50/50 resizable split, right rail hidden by default + toggle);
inlinealready shipped in 109b;onrequestdisplaymodedrives runtime transitions. Distinct from PG-6 two-agent comparison (post-V1, D-064). RFC §7. Deps: 109b. Seedocs/plans/phase-109c-mcp-apps-displaymode-layout.md. - 109d — Inline MCP-app discovery (D-215). Closes the dead seam the 109 wave's §17.5 audit pinned: the chain "a planner-initiated MCP tool result carrying
_meta.ui.resourceUri→ a chat message that mounts the 109b renderer → 109c's layout activates" was never wired, so the renderer + entire layout were unreachable in production. Three breaks closed: (1) the runtime emits a new canonical SafePayload eventmcp.app_availableat the MCP provider's invoke site whenever a tool result declares aui://app (carrying the server source id + resource URI + display-mode hint + run/identity correlation), registered alongsidemcp.resource_offloaded; (2) the single-sourced wireMCPAppRefgains aserver_idfield (also populated on the app-tool-call proxy response) so the renderer resolves which server to read theui://document from; (3) the ConsoleChatMessagegains anapp/serverIDfield,MessageBubbledispatches it underMCP_APP_INLINE_MIMEto mount the real renderer, and the Playground page attaches the decodedmcp.app_availableSSE event to the run's agent bubble. The §13 same-wave consumer is the discovery path itself; an inline app'sonrequestdisplaymode(granted by the page's full available-mode set) opens the app through 109c's already-shipped layout reducer. The wave-end W3 weak synthetic-DOM Playwright test (which re-implemented the clamp) is replaced by a real-component Vitest guard that mounts the shippedMessageBubble/McpAppRenderer/AppPaneland fails if the discovery→render wiring is reverted. RFC §6.4, §6.5, §7. Deps: 109a, 109b, 109c. Seedocs/plans/phase-109d-inline-mcp-app-discovery.md. - 109e — MCP App discovery reads the tool-DEFINITION
_meta.ui(D-216). A spec-conformance fix a live test against a realio.modelcontextprotocol/uiext-apps server (go-study-mcp) surfaced: the 109 wave parsed the_meta.ui.resourceUriapp reference from the tool RESULT (CallToolResult._meta), but the canonical SEP-1865 dialect (vendoredMcpUiToolMetaSchema: "UI-related metadata for tools") places it on the tool DEFINITION. A real ext-apps server binds theui://UI resource per tool intools/listand returns an empty result_meta, so the result-parse found nothing andmcp.app_availablenever fired — the renderer (109b) + layout (109c) were unreachable against real servers, and every 109a–d test passed only because its hand fixture put_meta.uion the RESULT (matching the buggy code, not the spec). The fix:buildToolDescriptorcaptures the tool-definition_meta.uiat discovery (immutable closure capture, D-025);callToolreconciles that binding with any optional per-result display-mode hint and firesmcp.app_availablefrom the result, feeding BOTH the discovery event AND the app-tool-call proxy projection (mcpconsole/apps.go, which had the identical result-only bug — fixed in the same change per §17.6). DisplayMode defaults to inline when none is negotiated/declared (go-study-mcp advertises no UI capability; the Console renderer already mounts on a bare{resourceUri, serverID}). The fixtures are corrected to the canonical placement (tool-def_meta.ui, empty result_meta) and aHARBOR_LIVE_MCP-gated probe drives the real go-study-mcp binary over stdio (CI-skipped, verified green in dev). This PR also adds CLAUDE.md/AGENTS.md §17.8: external-protocol conformance fixtures must derive from the real spec, never a hand-built one. RFC §6.4, §6.5, §7. Deps: 109a, 109d. Seedocs/plans/phase-109e-mcp-app-tool-def-discovery.md. - 109f — Render heavy MCP App documents + operator "pop to side-by-side" affordance (D-217). Closes two gaps a live test against the real go-study-mcp ext-apps server surfaced. Gap A: go-study-mcp's
ui://go-study-mcp/studio/index.htmlis 86.4 KB; the default heavy-content threshold is 32 KiB, so 109a'smcp.servers.read_resourcecorrectly offloads the document to the ArtifactStore by reference (D-026) and returns anartifactRefinstead of inlinecontent. The 109b renderer treated that as a FATAL "server bug" and refused to render — which hits nearly every real App, since Svelte/React bundles routinely exceed 32 KiB. The renderer now resolves the by-reference stub to a presigned URL via a new injectedMCPAppHostClient.resolveArtifactseam, fetches the bytes at the iframe edge, and loads them into the SAME sandboxedsrcdoc(same CSP, sandbox tokens,wrapAppDocument, origin guard) the inline path uses — only the content source changes; the offload stays correct (heavy bytes never inline through the context plane). The realresolveArtifactimpl lives in the Console adaptermakeMCPAppHostClient(overartifacts.get_ref), OUTSIDE the chat module, so the renderer keeps zero$lib/imports (D-091). A §17.6 bug-twin is fixed in the same PR: the playgroundChatProtocolClient.resolveArtifactread the absentresp.url(the wire field ispresigned_url), silently breaking every chat-bubble artifact preview. Gap B: a host-side operator "expand ⤢" affordance on the inline app frame pops the app to the 109c side-by-side (pip) / fullscreen layout WITHOUT the app asking, dispatched through the EXISTING injectedonDisplayModeRequestseam → the 109c layout reducer (no parallel display-mode path; no chat-module reach into the page). Always-on Vitest guards: a heavy-document fetch test (realistic >32 KiB App fixture, §17.8) that fails if the artifactRef branch reverts to the error path, plus an inline-path regression and a Gap-B affordance→reducer test. Console-only — no Runtime endpoint or Protocol method. RFC §6.4, §6.5, §7. Deps: 109a, 109b, 109c, 109d. Seedocs/plans/phase-109f-heavy-app-doc-render.md. - 109g — MCP App documents render inline on every artifact driver (D-218). 
A spec-correctness fix a live test against the real go-study-mcp ext-apps server surfaced: the 109 MCP Apps host gated a `ui://` App document on the D-026 LLM-context heavy-output threshold (32 KiB) in `internal/mcpconsole/apps.go::ReadResource`. go-study-mcp's studio App HTML is ~86 KB, so `mcp.servers.read_resource` offloaded it to the ArtifactStore by reference and returned an `artifactRef`, which the Console can only fetch via a presigned URL, and the read-side resolver fails loud (`CodePresignUnsupported`) on every non-S3 driver. So the App never rendered on the inmem / fs / sqlite / postgres stores. Root cause: the heavy-output threshold exists to keep bulky bytes OUT of the LLM context window, but a `ui://` App document NEVER enters the LLM context: the tool result carries only the tiny `_meta.ui.resourceUri` reference; the HTML is fetched ONLY by the Console and rendered in a sandboxed iframe. The fix re-scopes the threshold OUT of App documents: `ReadResource` checks `mcp.IsUIResourceURI(resourceURI)` and rides a `ui://` document inline up to a dedicated `appDocumentInlineCap` (2 MiB) instead of the 32 KiB heavy threshold, so the common case (all real apps) renders on EVERY driver with no presigning. Above the cap, the existing D-026 offload→artifactRef path (the loud `mcp.resource_offloaded` bypass) is preserved for pathologically large apps. An ordinary (non-`ui://`) resource keeps the heavy threshold. The tests use a REAL inmem ArtifactStore on the seam (109f's fetch test stubbed the resolver and so never hit the presign-unsupported driver, the §17.8 failure mode): the below-cap revert-guard reads an 86 KiB `ui://` doc and asserts it rides inline with no offload event (it fails if reverted to the 32 KiB gate; verified); the above-cap test asserts a >2 MiB doc still offloads + fires the event; a `HARBOR_LIVE_MCP`-gated probe drives the real go-study-mcp studio doc through `ReadResource` and asserts it returns inline. No Protocol wire-shape change: `ReadMCPResourceResponse.Content` already carries inline bytes. RFC §6.5, §7. Deps: 109a. See `docs/plans/phase-109g-app-doc-inline-read.md`.
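The 109g re-scoping reduces to one size decision per read, sketched below. The two limits and the `ui://` check come from the item above; the function names and the exact boundary handling (inclusive cap for App documents, "at or above the threshold offloads" for everything else) are assumptions for illustration, not the shipped `ReadResource` logic.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	heavyThreshold       = 32 * 1024       // 32 KiB: LLM-context heavy-output gate
	appDocumentInlineCap = 2 * 1024 * 1024 // 2 MiB: dedicated cap for ui:// documents
)

// isUIResourceURI is a stand-in for the ui:// scheme check.
func isUIResourceURI(uri string) bool {
	return strings.HasPrefix(uri, "ui://")
}

// ridesInline reports whether a resource of the given size is returned
// inline (true) or offloaded by reference (false).
func ridesInline(uri string, size int) bool {
	if isUIResourceURI(uri) {
		// App documents never enter the LLM context, so only the larger
		// dedicated cap applies (boundary assumed inclusive).
		return size <= appDocumentInlineCap
	}
	// Ordinary resources keep the heavy threshold.
	return size < heavyThreshold
}

func main() {
	fmt.Println(ridesInline("ui://go-study-mcp/studio/index.html", 86*1024))
	fmt.Println(ridesInline("file:///reports/large.csv", 86*1024))
	fmt.Println(ridesInline("ui://example/huge.html", 3*1024*1024))
}
```

The first call models the 86 KiB studio document that now rides inline; the same size under a non-`ui://` URI still offloads, and an App document above 2 MiB keeps the offload path.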
- 109h — MCP Apps UI-host capability advertisement (D-224). Closes brief 14's "Extension negotiation — Absent:
ClientCapabilities.Extensionsnever populated" gap on the MCP southbound driver. The 109 wave shipped the READ side —negotiateDisplayModesreads a server'sio.modelcontextprotocol/uicapability — but the driver never advertised its OWN:ClientCapabilities.Extensionsshipped empty, so a spec-conformant ext-apps server could not learn the Harbor host renders apps and could not tailor the app references it returns. 109h adds the symmetric WRITE side: the driver advertises theio.modelcontextprotocol/uiextension carrying the host's renderabledisplayModesduring the initialize handshake (hostCapabilities/filterHostDisplayModesinmcp.go, reusing the existinguiExtensionKey+ closedvalidDisplayModesset), sourced from a new deployment-leveltools.mcp_app_host.display_modesconfig field (MCPAppHostConfig+ToolsConfig.MCPAppHostDisplayModes(), defaulting to the inline baseline[inline]) threaded once throughAttachDeps.HostDisplayModes— which is also the programmatic SDK seam an embedder sets without YAML. The roots-regression trap (brief 14 §2 row 4 / §3): the go-sdk advertises{"roots":{"listChanged":true}}by default whenClientOptions.Capabilitiesis nil; settingCapabilitiesto add the UI extension OVERRIDES that default AND the SDK ignores the deprecatedRootsfield in favour ofRootsV2, so the code MUST replicate the current roots advertisement (RootsV2.ListChanged=true) or opting into the extension silently drops roots. This phase PRESERVES current roots behaviour exactly — it does NOT fix the roots honesty defect (brief 14 §3 / the separate 85a stopgap). Sampling/elicitation stay inferred from their handlers, unaffected. 
The integration test (§17.8) builds two providers from one resolved config value, pairs them to real SDK in-memory transports, and asserts each server's capturedInitializeParams.Capabilitiesechoes the configured modes AND still advertises roots — the fixture derives from the SDK's realInitializeParamsshape, not a hand blob; an opt-out provider advertises roots with no UI extension (the failure mode). No new inbound Protocol method or REST endpoint — the capability is an OUTBOUND client→server advertisement on the MCP handshake; the smoke is static-only. RFC §6.4, §7. Deps: 109a. Seedocs/plans/phase-109h-mcp-apps-host-capability.md. - 109i — MCP Apps tool-context capture +
mcp.apps.tool_context(D-225). The BACKEND half of the MCP Apps "Data Delivery" lifecycle. The 109 wave lets the Console discover (mcp.app_available), fetch (mcp.servers.read_resource), and render aui://MCP App in a sandboxed iframe — but a rendered app had no way to read the tool context (the input + the lowered result) that produced it. This phase captures that context at the tool-invocation site (internal/tools/drivers/mcp/mcp.go::callTool, the same site that emitsmcp.app_available) whenever a result declares aui://app, and exposes a new identity-scoped Protocol read method,mcp.apps.tool_context. Capture rides the EXISTINGStateStore— all three persistence drivers + identity isolation come free, NO new driver and NO new migration — keyed by the caller's identity triple (with empty RunID; session-scoped) underkind = "mcp.apps.tool_context/<serverID>/<toolCallID>"; the input and result are heavy-content-aware at WRITE (a payload ≥ the heavy threshold offloads to the ArtifactStore by reference through the SAME loud-bypass path the resource read uses, refactored into a sharedoffloadHeavyhelper). Thetool_call_idis a deterministic content hash ofrun | server | tool | args(NO mutableProviderfield — D-025); it is stamped on themcp.app_availableevent (alongsidetool_name; the payload staysSafeSealed— ids/names are not content), on the wireMCPAppRef, and on the app-tool-call proxy projection, so a client correlates a discovered app to its captured context. The new method routes through the AppsSurface dispatcher (IsMCPAppsMethod); an unknown or cross-identity(server_id, tool_call_id)fails withCodeNotFound(existence never revealed across identities — proven by a ≥2-identity isolation test). A capture failure is logged loudly but never fails the tool call (the planner's result is the source of truth); a missing identity fails closed. 
The capturer is wired into every MCP Provider ininternal/runtime/assemble(mirrored inharbortest/devstack+cmd/harbor), and the read seam onto theAppsAccessor. New wire types (ToolContextRequest/ToolContextPayload/ToolContextResponse) + the new method are single-sourced, hand-mirrored intoweb/console/src/lib/protocol/mcp.ts, and the generated Protocol docs + wire manifest regenerated. The §13 same-wave consumer is the read path itself (capture → read, exercised end-to-end in Go); the Console UI consumption lands in 109j. Concurrent-reuse tests (N=128) on the shared Provider + AppsAccessor + ToolContextStore pass under-race; aHARBOR_LIVE_MCP-gated probe drives a real ext-apps server through capture → read. RFC §6.4, §6.5, §7. Deps: 109a, 109d, 109g. Seedocs/plans/phase-109i-mcp-apps-tool-context.md. - 109j — Console pushes tool-input/tool-result into the app (D-226, Reverted in #346 — re-land tracked in #347). The Data Delivery Console half (Stage 2 of the spec-compliance wave), consuming the 109i
mcp.apps.tool_contextsurface now on main. After the sandboxed app sendsui/notifications/initialized, the host fetches the originating tool's context through the injectedMCPAppHostClientand pushes it via the official AppBridgesendToolInput()thensendToolResult()(the SDK requiresinitializedbeforesendToolResult). Heavy-aware (resolves anartifact_refto bytes at the iframe edge like 109f, else a faithful by-reference stub — never silently empty); a missing/evicted context (CodeNotFound) mounts with no push and no thrown error. Thetool_call_idflows event → ChatMessage app ref → renderer. No new sandbox/CSP/origin change; the no-direct-transport invariant (D-173) holds — the push uses only the injected client. Status: the Console data-delivery push was reverted to v1.4 in #346 because it broke theui/initializehandshake (handshake regression). The BACKEND tool-context surface (109i) is unaffected and remains Shipped. Re-landing the Console push is tracked in #347. RFC §6.4, §7. Deps: 109i, 109b. Seedocs/plans/phase-109j-mcp-apps-data-delivery-push.md. - 109k — MCP Apps spec-conformance hardening (D-227, Shipped V1.1.x). Closes the wave-end adversarial spec-review's findings — two conformance-breaking FAILs (green vs Harbor's own fixtures, broken vs a real ext-apps server — the D-216 class) plus host-obligation gaps. FAIL-1: the UI capability is advertised as the spec
mimeTypes: ["text/html;profile=mcp-app"](the field a conformant server gates on viagetUiCapability(caps).mimeTypes), NOT the hand-rolleddisplayModes109h shipped (not aMcpUiClientCapabilitiesfield). FAIL-2: an app→hosttools/call(bare server tool name) resolves against the calling app's<serverID>_namespace, so it hits the right catalog tool AND an app is confined to its own server's tools. Also: the non-specdisplayModesread offServerCapabilitiesis removed; display modes move to the spec slot (ui/initializehost-contextavailableDisplayModes, sourced from the 109hdisplay_modesconfig viaruntime.info); and the host honoursui/notifications/size-changed(iframe height), gracefului/resource-teardownon unmount, live Console theme +host-context-changed, host-contexttoolInfo/containerDimensions, andresources/templates/list. Sanctioned deviations (D-173 bridge-proxy, D-224 deployment-declaration intent, D-225, D-218) are preserved. The FAIL revert-guards areHARBOR_LIVE_MCPprobes against a real ext-apps server that gates onmimeTypes+ exposes a callback tool. Before merge, the orchestrator live-tests the full MCP Apps surface against the test agent + Console (regression guard — it worked pre-109). RFC §6.4, §7. Deps: 109a, 109b, 109h, 109i, 109j. Seedocs/plans/phase-109k-mcp-apps-conformance-hardening.md. 85h — MCP Tasks wire types(CUT). RC redesigns Tasks (moved to extension;tasks/listremoved; new method set; new lifecycle aroundtools/callreturning a task handle). Hand-transcribing the 2025-11-25 shape now locks in code that the extension SEP and Dockyard's port will both diverge from. Plan file kept as historical context. Revisit when Tasks extension SEP stabilizes + Dockyard ports + SDK adds support — refile as a new band, not 85h. No revisit on this slot.85i — MCP Tasks client(CUT). Same reasoning as 85h. Polling loop,tasks/listconsumption,input_required→ elicitation composition all targeted the old shape. No revisit on this slot.- 85j — MCP client conformance (target: RC). 
Conformance harness + scoped, substantiated compliance statement at `docs/design/mcp-compliance.md`. Statement target bumps from MCP 2025-11-25 to MCP 2026-07-28 (RC) and ultimately the final spec. Wording obligation: never "fully compliant" unqualified; the scoped sentence enumerates exactly what's wired. Drops the cut areas (sampling, roots, logging, original Tasks) from the claim; adds the 85m transport / auth / schema / cache / trace items. RFC §6.4. Deps: 85a, 85b, 85d, 85f, 85g, 85m. Revisit after RC-final (2026-07-28) and after the dependent phases land. See `docs/plans/phase-85j-mcp-client-conformance.md`.
- 85m — MCP 2026-07-28 RC adoption (NEW). Absorbs the RC's cross-cutting breaking changes the other phases can't carry on their own:
  - Remove `initialize` / `initialized` handshake plumbing and `Mcp-Session-Id` header dependence from `internal/tools/drivers/mcp` and all transports; client info moves to per-request `_meta`.
  - Streamable HTTP: set `Mcp-Method` and `Mcp-Name` on every outbound request; assert the server's reject-on-mismatch behaviour in tests.
  - Error code flip: every `-32002` (resource-not-found) callsite → `-32602` (Invalid Params).
  - Server-to-client request restructuring: server-initiated requests only issuable while server is actively processing a client request; SSE elicitation polling removed (composes with 85d's rewrite).
  - JSON Schema 2020-12 (SEP-2106): full draft support in tool / resource-template schema validation (composition, conditionals, `$ref`).
  - Cache directives (SEP-2549): respect `ttlMs` and `cacheScope` on list / resource reads.
  - W3C Trace Context propagation (SEP-414): wire Harbor's existing OTel `traceparent` / `tracestate` / `baggage` into MCP `_meta`.
  - Capability discovery via `server/discover` (replaces handshake-time advertisement).

  RFC §6.4. Deps: 28, 85a. Revisit after SDK-RC (≈ late Jul–Aug 2026): every item above needs go-sdk RC support; Harbor's plan can be authored now (transcribe the RC SEPs into a phase plan) so implementation can start the day the SDK lands. New plan file: `docs/plans/phase-85m-mcp-rc-2026-07-28.md` (to author).
- Remove
- 86 — Durable distributed bus driver. NATS / Redis Streams / Postgres-as-queue behind `MessageBus`. RFC §12. Deps: 22.
- 87 — Durable TaskService backend. Background tasks survive restart. RFC §12. Deps: 20, 22.
- 88 — Episodic memory tier. Durable summaries promoted from session → user/tenant scope. RFC §11 Q-4. Deps: 24, 25.
- 89 — A2A northbound. Expose Harbor as an A2A server. RFC §11 Q-2. Deps: 29.
- 90 — Additional planner concretes. PlanExecute, Workflow, Graph, Supervisor, MultiAgent, HumanApproval. RFC §12. Deps: 49.
- 91 — Console-driven key rotation (Protocol). `governance.rotate_key` Protocol method; `Account` impl atomically swaps the live key set; bifrost picks up the new key on the next `Account.GetKeysForProvider` lookup (no `ReloadConfig` race). RFC §6.15, D-019. Deps: 36a, 60 (Protocol transport), 73 (Console-attaching).
- 92 — Console-driven mid-session model swap. `governance.swap_model` Protocol method; future runs in a session use the swapped model; the planner sees the change via `RunContext`. Audited. RFC §6.15. Deps: 36a, 60, 73.
- 92a — Agent-config control plane (extends 91/92). Generalises the 91/92 "mutate desired-state via Protocol, reconcile into the runtime" pattern from governance config to AGENT-DEFINITION config: live, audited control of (a) MCP server-connection enablement — pause / resume / remove — plus per-individual-tool policy (the
`active` / `deferred` / `disabled` `loading_mode` vocabulary from 107c + 26b, made mutable per `<source>_<tool>`); (b) the skill set, over the existing `SkillStore` (Phase 37); and (c) a layered system prompt — an operator-owned base layer plus an optional session-scoped instruction layer composed above it, respecting the 83a–f structured prompt sections. The unifying primitive is a durable, identity-scoped, VERSIONED desired-state registry on the StateStore: each edit is an immutable revision (content-addressed, parent pointer), the active config is a revision pointer, rollback = repoint, and a server-side diff between revisions is exposed as a read method plus an `agent.config.revised` event. Next-turn-only, snapshot-immutable semantics (the D-025 alignment): a change affects ONLY the next run — in-flight / concurrent runs keep the immutable view they snapshotted at run-start; the runtime projects the per-run tool/config view from the registry by extending `tools.NewPlannerView` (Phase 110a) to read desired-state instead of boot config, so there is no mid-flight mutation, no draining, and no forcible teardown (a paused connection's transport may stay warm — pause is a projection-time decision). Two decisions to settle when the band is expanded under §16: (1) app→host `tools/call` callbacks from a rendered MCP App (109i, D-173) are gated against CURRENT desired-state — a paused server rejects them and the host surfaces a "paused by an administrator" advisory — while in-flight PLANNER calls keep their snapshot (an intentional asymmetry); (2) the authorization-scope matrix — base-prompt edits, connection add/remove, and the per-tool allowlist are tenant/deployment-level capability changes requiring an elevated (fleet / tenant-admin) scope plus audit (adding a stdio server is approval-gated / allowlist-only), while session-scoped callers get only the safe subset (the user instruction layer, enable/disable among already-allowed sources, ephemeral skills).
Adding a brand-new connection (async dial + `initialize` + OAuth via the unified pause/resume primitive) is the separable hard sub-phase. Decomposes under §16 into: registry + Protocol surface + diff/rollback → skills control → layered prompt → MCP connection pause/resume + per-tool policy → (separable) add-connection. RFC §6.15, §6.16. Deps: 86, 87, 91, 92, 53a, 37, 28, 26b, 110a, 109i.
- 93 — Failover chains as Harbor policy. Operator-defined chain
`[primary, secondary, ...]` per identity / model; orchestrated at the Governance layer with audit per hop; NOT pushed into bifrost's per-call `Fallbacks`. RFC §6.15, D-018. Deps: 36a, 33.
- 94 — Provider circuit breakers per `(provider, key)`. Aggregate error rate; trip on threshold; auto-recover on cool-down; events emitted. Builds on 93. RFC §6.15. Deps: 33, 93.
- 95 — LLM cache (exact-match + semantic). Plugin pre-hook checks the cache; semantic uses an embedding similarity threshold. Big complexity; deferred. RFC §6.15. Deps: 33.
- 96 — PII redaction at the LLM boundary. Audit subsystem owns the redactor; Governance hooks it into the LLM call path. Outgoing prompts are scrubbed; raw forms are never persisted. RFC §6.15, D-020. Deps: 03 (audit redactor), 33.
- 97 — Media-input tool wrappers. Bifrost-backed tools that accept `ArtifactRef`s and pass image/audio/file content to LLM-side analysis (e.g. a generic `image.analyze` wrapper that accepts an image artifact + a text prompt and routes through the planner's normal LLM call). Mostly a convention layer — the plumbing already exists once D-021 + Phase 33 ship. RFC §6.5, D-021. Deps: 17 (artifacts), 33 (bifrost), 26 (tool catalog).
- 98 — Media-output tool wrappers. Image generation, speech synthesis, transcription, and video tools that wrap bifrost's media APIs (`SpeechRequest`, `TranscriptionRequest`, `ImageGenerationRequest`, etc.) and return `ArtifactRef`s. Each tool is a separate registration; they share a common `MediaTool` helper. The planner invokes them as ordinary tool calls; no `LLMClient` change. RFC §6.5, D-021. Deps: 17, 33, 26.
- 99 — Vision-aware memory summarization. Extends the `rolling_summary` memory strategy to call a vision model when summarizing turns that include `ImagePart`s, replacing the V1 placeholder (`[image: <ref>]`) with a generated description. Optional per identity tier; off by default for cost. RFC §6.6, D-021. Deps: 24 (memory strategies), 33 (bifrost), 97 (media-input tools).
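The 92a registry semantics described above (immutable content-addressed revisions with a parent pointer, an active revision pointer, rollback as a repoint, and run-start snapshots) can be sketched in Go. This is a minimal in-memory illustration under stated assumptions, not Harbor's implementation: the `registry` and `revision` types, the truncated SHA-256 ID scheme, and all method names are hypothetical, and a real registry would live on the StateStore and be identity-scoped.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// revision is an immutable config snapshot (hypothetical shape).
type revision struct {
	ID      string // content address derived from parent + content
	Parent  string // ID of the revision this one was edited from
	Content string
}

// registry is a hypothetical in-memory stand-in for the desired-state registry.
type registry struct {
	revisions map[string]revision
	active    string // the active config is just a pointer to a revision
}

func newRegistry() *registry { return &registry{revisions: map[string]revision{}} }

// commit records a new immutable revision and repoints active at it.
func (r *registry) commit(content string) string {
	sum := sha256.Sum256([]byte(r.active + "\n" + content))
	id := hex.EncodeToString(sum[:8])
	r.revisions[id] = revision{ID: id, Parent: r.active, Content: content}
	r.active = id
	return id
}

// rollback repoints active at an existing revision; no revision is rewritten.
func (r *registry) rollback(id string) error {
	if _, ok := r.revisions[id]; !ok {
		return errors.New("unknown revision " + id)
	}
	r.active = id
	return nil
}

// snapshot returns the view a run would capture at run-start.
func (r *registry) snapshot() string { return r.revisions[r.active].Content }

func main() {
	r := newRegistry()
	v1 := r.commit("tools: [search]")
	inFlight := r.snapshot() // a run captures its view once, at run-start
	r.commit("tools: [search, fetch]")
	fmt.Println("in-flight run still sees:", inFlight)
	fmt.Println("next run sees:", r.snapshot())
	if err := r.rollback(v1); err != nil {
		fmt.Println("rollback failed:", err)
	}
	fmt.Println("after rollback:", r.snapshot())
}
```

Because a run holds only the value it snapshotted, later commits and rollbacks cannot change what an in-flight run sees, which is the next-turn-only property the 92a entry calls out.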
Wave / parallelism map
The phase queue is a DAG, not a line. Here are the parallelizable waves; phases inside a wave can be implemented in parallel by separate workers, phases in later waves wait for earlier waves' completion (or for the specific phases their Deps column names).
Wave 1 — Pure foundation (no upstream Harbor deps): 01 (identity), 02 (config), 03 (audit redactor) — three independent, parallelizable.
Wave 2 — Logger + bus skeleton: 04 (slog Logger; needs 03), 05 (Event taxonomy + InMem bus; needs 01, 03), 07 (StateStore iface + InMem; needs 01, 03). Parallelizable across three workers.
Wave 3 — Bus replay + sessions: 06 (replay; needs 05), 08 (SessionRegistry; needs 01, 07). Parallelizable.
Wave 4 — Core runtime serial chain (mostly): 09 (envelopes; needs 01, 08) → 10 (engine; needs 09) → 11 (reliability; needs 10) → 12 (streaming; needs 10, 11) → 13 (cancel; needs 10, 12) → 14 (routers; needs 10, 11). 11+14 can parallelize once 10 lands; 12, 13 serialize after 11.
Wave 5 — Persistence drivers (parallelizable across drivers): 15 (SQLite state), 16 (PG state), 17 (Artifacts iface + InMem + FS — needs 01, 07). Three parallel.
Wave 6 — Tasks + remaining persistence: 18 (Artifact SQLite/PG; needs 17, 15, 16), 19 (Artifact S3; needs 17), 20 (TaskRegistry; needs 01, 07), 21 (TaskGroup + WatchGroup + retain-turn + patches; needs 20), 22 (Distributed contracts; needs 09, 20). Stage 1 (18, 19, 20) parallelizable; Stage 2 (21, 22) once 20 lands.
Wave 7 — Memory + tools core + LLM core (parallel tracks):
- Memory track: 23 → 24 → 25
- Tools track: 26 → 27 / 28 / 29 (HTTP, MCP, A2A in parallel after 26)
- LLM track: 32 → 33 → 34 → 35 → 36 (largely serial)
- Governance track (slots in after 33): 33 → 36a → 36b (serial; relies on cost-passthrough from bifrost integration)
Wave 8 — Skills + planner core (after wave 7's foundations):
- Skills track: 37 → 38 / 39 / 40 / 41 (after 37, the four can run in parallel-ish)
- Planner track: 42 → 43 / 44 (parallel) → 45 → 46 / 47 (parallel) → 48 → 49
Wave 9 — Pause/Resume + Steering + Telemetry + Protocol (cross-track):
- 50 (needs 07, 09, 13) → 51 → 52 → 53 → 54
- 53a (Agent Registry; needs 01, 05, 07, 08) — parallelizable with the 50→54 chain; its deps are all long-shipped. Must land before 54 and the Console-attaching wave (72–75).
- 55 (OTel; after 04, 05) parallel with 56 (metrics; after 55, 05); 57 (durable event log; after 05, 07, 15, 16)
- 58 (protocol types) → 59 (versioning) → 60 (transport) → 61 (auth) → 62 (conformance)
- 30 (Tool OAuth/HITL; needs 26, 50, 53a), 31 (approval gates; needs 30) slot in once 50 + 53a are up
Wave 10 — CLI + test kit: 63 → 64 → 65 / 66 / 67 / 68 / 69 / 70 (mostly parallel after 64). 71 (test kit; needs 05, 09, 07) parallel.
Wave 11 — Console-attaching + hardening: 72 / 73 / 74 (parallel; need 60, 05, 06, 07, 17, 09). 75 (e2e gate; needs 64, 72, 73). 76, 77, 78, 79 (parallel; need their respective subsystems). 80 (docs polish; needs all V1).
Wave 12 — Release: 81 → 82 (serial).
Practical reading: with three or four engineers (or three concurrent worker subagents), waves 5–8 hide enormous parallelism behind their tracks. The serial sections that resist parallelism are: the core runtime chain (09→10→11→12→13), the LLM-client chain (32→33→34→35→36), and the Protocol chain (58→59→60→61→62).
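The wave grouping above is a level-by-level topological sort of the dependency DAG. The following Go sketch shows one way to derive waves from a Deps map; the `waves` function is hypothetical, and the sample input covers only the dependencies quoted for phases 01–08 (the shared dependency on phase 00 is omitted for brevity).

```go
package main

import (
	"fmt"
	"sort"
)

// waves groups phases into levels: a phase joins the first wave in which
// every one of its dependencies has already landed in an earlier wave.
func waves(deps map[string][]string) ([][]string, error) {
	level := make(map[string]int, len(deps))
	remaining := len(deps)
	var out [][]string
	for remaining > 0 {
		var ready []string
		for phase, ds := range deps {
			if _, done := level[phase]; done {
				continue
			}
			ok := true
			for _, d := range ds {
				if _, done := level[d]; !done {
					ok = false
					break
				}
			}
			if ok {
				ready = append(ready, phase)
			}
		}
		if len(ready) == 0 {
			return nil, fmt.Errorf("dependency cycle or unknown dependency among %d phases", remaining)
		}
		sort.Strings(ready) // deterministic order inside a wave
		for _, p := range ready {
			level[p] = len(out)
		}
		out = append(out, ready)
		remaining -= len(ready)
	}
	return out, nil
}

func main() {
	deps := map[string][]string{
		"01": nil, "02": nil, "03": nil,
		"04": {"03"},
		"05": {"01", "03"},
		"07": {"01", "03"},
		"06": {"05"},
		"08": {"01", "07"},
	}
	ws, err := waves(deps)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, w := range ws {
		fmt.Printf("wave %d: %v\n", i+1, w)
	}
}
```

On this sample the three computed levels match Waves 1 to 3 as listed: 01/02/03, then 04/05/07, then 06/08.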
Open architectural follow-ups feeding next-wave scoping
The Wave 11 §17.5 audit (PR #117) surfaced four architectural gaps tracked as GitHub issues. All four closed in Wave 11.5 (issues #112, #113, #114, #115 via PRs #119, #120, #121, #122; the wave-end E2E now exercises production end-to-end). Issue #116 (`tools.oauth_providers[]` operator config) shipped in PR #119 alongside Wave 11.5 Stage A. Follow-ups tracked since then:
- #123 — task FSM bridge: translate RunLoop `Finish` into `TaskRegistry.Mark{Complete,Failed}`. Surfaced by PR #122 (D-097). Closed in Wave 12 Stage 1 via PR #128 (D-098).
- #134 — wire memStore into ControlSurface. Surfaced by Wave 12 §17.5 audit N2. `cmd/harbor/cmd_dev.go::bootDevStack` constructs a MemoryStore and currently discards it via `_ = memStore`; when a Protocol method (or RunLoop hook) needs memory, the consumer phase closes the seam.
- #135 — preflight wall time: parallelize phase smokes + ephemeral ports. Surfaced by Wave 12 audit Recommendations + operator feedback ("preflight is more waiting than dev time"). Four-step plan: random port allocation (unblocks parallel-worktree preflight), classify smokes (`live-server | static-only | unit-tests`), parallel driver for the static batch, CI matrix sharding. Targets ≥50% wall-time reduction. Recommend scheduling early in Wave 13 — every wave that lands without this adds another 10–20s to the gate.
This section accumulates audit-surfaced follow-ups that warrant tracking issues but haven't been promoted to phase plans yet. When the next wave scopes, this is the first list to reconcile against docs/plans/README.md's pending-phase block.
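For the random port allocation step in #135, a common Go technique is to bind to port 0 and let the kernel choose a free port. The sketch below is an illustration only: `freePort` and the `HARBOR_SMOKE_PORT` variable name are hypothetical and do not describe how Harbor's smoke scripts are wired.

```go
package main

import (
	"fmt"
	"net"
)

// freePort binds to port 0 on loopback so the kernel picks an unused port,
// reads the chosen port back, and releases the listener.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := freePort()
	if err != nil {
		fmt.Println("no port available:", err)
		return
	}
	// A smoke script could export this value before starting its server.
	fmt.Printf("HARBOR_SMOKE_PORT=%d\n", port)
}
```

There is a small window between releasing the listener and the server rebinding the port in which another process could claim it; keeping the listener open and handing it to the server avoids that race where the server API allows it.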
V1 cut line
V1 ships phases 01–82 + 36a + 36b + 53a. The follow-ups (83–100) are intentionally deferred to post-V1: the original band (83, 84, 86–90 — integer 85 was removed, see below), six Governance (91–96), three Multimodality follow-ups (97–99) for media-input/output tool wrappers and vision-aware memory summarization, and the Recipe loader (100). Two lettered bands sit inside this range: 83a–e (ReAct prompt depth + reasoning-channel decoupling) and 85a–j + 85m (MCP client/host compliance — the prioritised first post-V1 work; 85k is the separate Harbor agent-builder skills phase). The 85-band was re-shaped on 2026-05-28 against the MCP 2026-07-28 RC (sampling / roots / logging deprecated, Tasks redesigned to an extension, session handshake removed); see the 85-band detail block for the per-phase verdict. Multimodal inputs ship in V1 (RFC §6.5 + D-021); only multimodal outputs and richer memory handling are post-V1. The Evaluations subsystem and code-mode (Starlark) are also post-V1 — see RFC §12.
The cut line is justified by RFC §12 (Out of Scope for V1):
- Auto-sequence + reflection (83, 84) — explicit RFC §12 entries: "optional optimization, off by default" and "optional per concrete; not on V1's critical path." Shipping the planner without them does not weaken the swappable-planner property; both can land as planner-internal upgrades without runtime change.
- MCP client/host compliance (85-band, 85a–j + 85m) — post-V1 by deferral, not by architecture: the V1 MCP southbound driver (Phase 28) is core-functional; the 85-band raises it to feature-complete. Prioritised as the first post-V1 work. The integer Phase 85 (Skills Portico provider driver) was removed — Portico speaks MCP like any server, so the generic MCP client driver is its consumer; no Portico-specific driver is built. Per the MCP 2026-07-28 RC re-plan (2026-05-28), the band scopes as: HTTP OAuth (now covering six RC auth SEPs), elicitation (RC
`InputRequiredResult` shape), the surviving server features (completions / templates / progress), MCP Apps host, conformance (target: RC), and a new 85m absorbing the RC's cross-cutting transport / session / error / schema / cache / trace changes. Sampling, roots, the original Tasks pair (85h/85i) are cut.
- Durable distributed bus + durable TaskService backend (86, 87) — RFC §6.12 settles "V1 ships contracts only; in-process default." A durable backend is a driver phase, not a runtime-architecture phase.
  - Phase 87 SHIPPED (D-228): a `durable` `TaskRegistry` driver (`internal/tasks/drivers/durable`) persists task/group/patch records through the shared `StateStore` (per-record slots, replayed on open) so they survive a restart, with an open-time recovery sweep that fails a crash-left `Running` task to `Failed{runtime_restarted}`. Two recorded deviations from the merged plan: the task lifecycle was extracted into a shared `internal/tasks/engine` package (`inprocess` + `durable` are thin wrappers over one state machine) rather than duplicated; and the driver reuses the runtime's shared `StateStore` (no `StateDriver` / `StateDSN` config fields — `tasks.Open` already passes the store). Opt-in via `tasks.driver: durable`; fail-loud when no `StateStore` is wired. Single-instance restart-survival of records only (no auto-re-drive); the queue-backed / distributed driver remains future work behind the unchanged seam.
  - Phase 86 SHIPPED (D-229): a `durable` `MessageBus` driver (`internal/distributed/drivers/durable`) persists every `BusEnvelope` through the shared `StateStore` and projects it onto the local `events.EventBus`, with a background poller that delivers cross-instance envelopes + replays persisted history after a restart (at-least-once; consumers dedupe on `(TaskID, Edge, EventID)`). StateStore-backed (Postgres-as-queue on a shared Postgres store); NATS / Redis Streams deferred (new deps → RFC §10 PR first). The bus-projection contract (`EventTypeDistributedBusEnvelope` / `BusEnvelopePayload`) was promoted from the loopback driver into the `distributed` package so both drivers share it. Opt-in via `distributed.bus_driver: durable`; `loopback` stays the default; fail-loud when no `StateStore` is wired. The `MessageBus` seam itself is still contracts-only in production (no `OpenBus` consumer yet, like `loopback`), so the driver is registered + conformance/integration-tested, ready for a future bus consumer.
- Distributed task dispatcher (86a) — the consumer that makes the Phase 86 durable bus load-bearing: it wires `OpenBus` into the runtime, publishes task-lifecycle envelopes (`task.spawned` + terminal) to the bus, and runs a fleet RunLoop driver that claims a spawned task (a StateStore compare-and-swap lease, so exactly one worker drives it despite at-least-once fan-out) and drives it — turning the durable bus into the fleet work queue (a task spawned on any worker is driven on any worker and survives a restart). It is the distributed evolution of the single-instance per-task RunLoop driver (`cmd/harbor/cmd_dev_runloop.go`). Carries the high-level multi-worker deployment topology (N stateless Harbor workers behind a shared Postgres `StateStore` + durable bus + durable tasks; EKS / multi-container). Deps: 86, 87. Without it, the durable bus is registered-but-unconsumed in production — 86a closes the "primitive without a consumer" gap (§13). RFC §6.8 + §6.12, D-230.
- Episodic memory tier (88) — RFC §11 Q-4 leans post-V1 unless V1 user feedback demands it.
- A2A northbound (89) — RFC §11 Q-2 leans V1.1 unless an early adopter demands it.
- Additional planner concretes (90) — RFC §12 explicitly: "wait on V1 evidence that the interface holds." V1 ships React + Deterministic; the rest land as evidence accrues.
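The 86a claim step relies on a compare-and-swap lease so that exactly one worker drives a task even when every worker receives the same `task.spawned` envelope. A minimal Go sketch of that idea follows; `leaseStore`, `claim`, and `race` are hypothetical names, and the mutex-guarded map merely stands in for whatever compare-and-swap primitive the real StateStore offers.

```go
package main

import (
	"fmt"
	"sync"
)

// leaseStore is a hypothetical in-memory stand-in for a StateStore with CAS.
type leaseStore struct {
	mu     sync.Mutex
	owners map[string]string // taskID -> workerID ("" means unowned)
}

func newLeaseStore() *leaseStore { return &leaseStore{owners: map[string]string{}} }

// compareAndSwap writes next only if the current owner equals expect.
func (s *leaseStore) compareAndSwap(taskID, expect, next string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.owners[taskID] != expect {
		return false
	}
	s.owners[taskID] = next
	return true
}

// claim tries to take the lease on an unowned task.
func claim(s *leaseStore, taskID, workerID string) bool {
	return s.compareAndSwap(taskID, "", workerID)
}

// race delivers the same spawned task to n workers and counts the winners.
func race(s *leaseStore, taskID string, n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	winners := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			if claim(s, taskID, fmt.Sprintf("worker-%d", id)) {
				mu.Lock()
				winners++
				mu.Unlock()
			}
		}(i)
	}
	wg.Wait()
	return winners
}

func main() {
	fmt.Println("winners:", race(newLeaseStore(), "task-1", 8)) // prints "winners: 1"
}
```

Workers that lose the claim simply drop the envelope, which is what makes at-least-once delivery safe here. A production lease would also need an expiry so that a crashed owner does not hold a task forever; that part is outside this sketch.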
If under calendar pressure, phase 19 (ArtifactStore S3-style) and phase 75 (Playwright CI gate) are the most reasonable V1 → V1.1 slip candidates inside the V1 list, in that order.
Critical path
The longest dependency chain to V1, in order:
00 → 01 → 03 → 04 → 05 → 07 → 08 → 09 → 10 → 11 → 12 → 13 → 50 → 51 → 52 → 53 → 54 → 26 → 32 → 33 → 34 → 35 → 36 → 42 → 43 → 44 → 45 → 49 → 60 → 61 → 62 → 64 → 76 → 80 → 81 → 82.
That is 36 phases on the critical path out of 84 V1 phases. (Governance phases 36a/36b sit on the LLM track but are not themselves on the critical path; they branch off after phase 33 and rejoin via the StateStore conformance suite.) Practical implications:
- The runtime kernel chain (09→14) is six phases of deeply serial work — half a critical-path month if one engineer.
- The pause/resume coordinator chain (50→54) is the second cluster of serial work — and depends on the runtime chain landing through 13.
- The LLM client chain (32→36) must complete before the planner reference (45) lands.
- The protocol chain (58→62) is independent until 60 needs a wire decision (Q-1) — which can block the Console-attaching wave.
Highest-risk phases on the critical path (in priority order):
- Phase 12 (Streaming + per-run backpressure) — the predecessor's deadlock-under-streaming sharp edge; if shipped wrong, parallel runs deadlock.
- Phase 33 (bifrost integration) — Q-3 is resolved. The phase is now a routine implementation rather than a decision gate. Risk dropped to "ordinary integration risk" — driver translation correctness + cancellation-timing diligence on long streams. See `docs/research/08-llm-client-validation.md`.
- Phase 50 (Pause/Resume Coordinator) — the unified primitive; if it leaks abstractions to planner code, the swappable-planner property regresses.
- Phase 60 (Protocol wire transport) — Q-1; locking the wrong transport now means a v1→v2 migration later.
- Phase 76 (Cross-tenant isolation harness) — the integrity gate. If it lands late, regressions are not detected.
Risk-mitigation strategy: front-load the Q-1 decision so phase 60 does not enter implementation with an open architecture question. Q-3 is already resolved, so phase 33 carries none.
Open RFC questions affecting the plan
The RFC's open questions (RFC §11) directly gate or shape these phases:
- Q-1 (Protocol wire transport). Gates phase 60. Lean is SSE+REST. If the answer becomes WebSocket+JSON-RPC or gRPC, phase 60 forks accordingly; phases 64–75 (CLI + Console-attaching) inherit the new transport but their shapes do not change materially.
- Q-2 (A2A northbound at V1). Determines whether phase 89 is V1 or post-V1. Default plan keeps it post-V1.
- Q-3 (LLM client choice). RESOLVED 2026-05-08. Replaced the original CGo-required candidate with `github.com/maximhq/bifrost/core` (pure Go). Empirically validated against six OpenRouter-routed models — 23/24 gating items pass. Phase 33 is now a routine integration; phases 34–36 carry only ordinary implementation risk. See `docs/research/08-llm-client-validation.md`.
- Q-4 (Episodic memory tier). Determines whether phase 88 is V1 or post-V1. Default plan keeps it post-V1.
- Q-5 (Skill versioning model). Shapes phase 41 (generator persistence) — content-hash-as-version is the V1 default; explicit semver is V1.5.
- Q-6 (Second V1 planner concrete). Settled in RFC as `deterministic`. Phase 48 is locked.
Action: Q-1 should be resolved before phase 60 enters the implementation queue; Q-3 is already resolved. Q-2 and Q-4 can be resolved at V1 cut.
Notes
- Phase numbers are stable once shipped. A phase number is reused only via a `phase-NN-supersedes-MM.md` PR per AGENTS.md §15.
- Phase plans are immutable post-ship, except for typo/clarification fixes. Material change = new RFC PR + new phase plan that supersedes.
- If the RFC switches to subsystem-prefixed numbering (e.g. `R-01`, `P-01`), all phase plans rename in a single PR and this README reorganizes; phase numbering is therefore deliberately stable but not load-bearing for code or filenames in `internal/`.
- Cross-references: RFC Appendix A (subsystem ↔ brief table) is the canonical map for "which brief informs which RFC section." Use it when reaching for context on any phase.
- Coverage targets in the index column are starting points; per-phase plans may raise them but never lower them.
- Smoke scripts: every phase has `scripts/smoke/phase-NN.sh`. The skeleton lands when the phase begins; assertions land as the surface implements.
- Phase 0 already passes. Per `phase-00-skeleton.md`: 24 OK / 0 SKIP / 0 FAIL on the doc & mirror invariants. Subsequent phases inherit that gate.
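The smoke gate in the done-definition (OK ≥ count(criteria) and FAIL = 0) is easy to check mechanically. The Go sketch below assumes a smoke script prints a summary line shaped like the "24 OK / 0 SKIP / 0 FAIL" figure quoted above; that exact line format, the `parseSummary` and `gatePasses` helpers, and the criteria count of 24 are assumptions for illustration.

```go
package main

import "fmt"

// smokeSummary holds the three counters from a smoke summary line.
type smokeSummary struct{ OK, Skip, Fail int }

// parseSummary reads a line shaped like "24 OK / 0 SKIP / 0 FAIL" (assumed format).
func parseSummary(line string) (smokeSummary, error) {
	var s smokeSummary
	_, err := fmt.Sscanf(line, "%d OK / %d SKIP / %d FAIL", &s.OK, &s.Skip, &s.Fail)
	return s, err
}

// gatePasses applies the done-definition: OK >= acceptance criteria and FAIL == 0.
func gatePasses(s smokeSummary, criteria int) bool {
	return s.OK >= criteria && s.Fail == 0
}

func main() {
	s, err := parseSummary("24 OK / 0 SKIP / 0 FAIL")
	if err != nil {
		fmt.Println("unparseable summary:", err)
		return
	}
	// 24 is an assumed criteria count for this example.
	fmt.Println("gate passes:", gatePasses(s, 24))
}
```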
Appendix: runtime tool-dispatch trio mapping (post brief 07)
Brief 07 codified Harbor's "code-level tool calling" principle (RFC §6.4) and surfaced four discrete runtime components: ActionParser, Dispatcher (single + parallel folded), RepairLoop, ObservationRenderer. The current phase set covers them across existing phases — no renumbering required, but reviewers should anchor on this mapping when authoring per-phase plans:
| Trio component | Owner phase(s) | Notes |
|---|---|---|
| `ActionParser` (`internal/runtime/planner/parser/`) | 44 (Schema repair pipeline) + 45 (Reference ReAct planner) | The parser belongs with the repair loop; the ReAct phase wires it into the planner step. |
| `Dispatcher` — single tool path | 26 (Tool catalog core + InProcess) | Validation, identity stamping, cancellation hooks. |
| `Dispatcher` — parallel branches | 47 (Parallel-call execution + JoinSpec) | Same validation/identity/cancel plumbing as 26; the two phases ship the same dispatcher, not two dispatchers. |
| `RepairLoop` | 44 (Schema repair pipeline) | Drives parser → validator → planner-prompt-on-failure cycles up to `RepairAttempts`. |
| `ObservationRenderer` (`internal/runtime/planner/observation/`) | 45 (Reference ReAct planner) + 46 (Trajectory compression / summariser) | Renderer interleaves assistant/user messages from `(action, observation \| error \| failure)` pairs; compression in 46 plugs into the same renderer. |
| `SchemaSanitizer` (`internal/llm/correction/`) | 34 (Provider correction layer) | Lives between runtime and LLM client; per-provider `response_format` adjustments. |
If a future PR renames the package layout from internal/runtime/planner/... to a flatter internal/dispatch/ etc., the mapping table above moves with it and the phases retain their numbers. The trio is a design unit; splitting a single phase into "parser" + "dispatcher" + "renderer" sub-phases is allowed but not required.
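To make the RepairLoop row concrete, here is a minimal Go sketch of a bounded parse-and-re-prompt cycle. Every name in it (`action`, `parseAction`, `repair`, the `ask` callback standing in for the planner prompt) is hypothetical; the real ActionParser and RepairLoop interfaces are defined by phases 44 and 45, not by this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// action is a hypothetical parsed tool call.
type action struct {
	Tool string          `json:"tool"`
	Args json.RawMessage `json:"args"`
}

// parseAction plays the parser + validator role: it rejects malformed JSON
// and output that names no tool.
func parseAction(raw string) (action, error) {
	var a action
	if err := json.Unmarshal([]byte(raw), &a); err != nil {
		return action{}, err
	}
	if a.Tool == "" {
		return action{}, errors.New("missing tool name")
	}
	return a, nil
}

// repair re-prompts through ask until the output parses, giving up after
// repairAttempts re-prompts.
func repair(raw string, repairAttempts int, ask func(prev string, problem error) string) (action, error) {
	a, err := parseAction(raw)
	for attempt := 0; err != nil && attempt < repairAttempts; attempt++ {
		raw = ask(raw, err)
		a, err = parseAction(raw)
	}
	if err != nil {
		return action{}, fmt.Errorf("repair exhausted after %d attempts: %w", repairAttempts, err)
	}
	return a, nil
}

func main() {
	// The stand-in "model" corrects its output on the first re-prompt.
	ask := func(prev string, problem error) string {
		return `{"tool":"search","args":{"q":"harbor"}}`
	}
	a, err := repair("search for harbor", 2, ask)
	if err != nil {
		fmt.Println("failed:", err)
		return
	}
	fmt.Println("dispatching tool:", a.Tool)
}
```

Capping the loop is the important part: an output that never parses surfaces as an error to the caller instead of re-prompting indefinitely.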