
Harbor — Architectural decisions log

Append-only record of decisions that have been settled. One entry per decision. Reading this file is the fastest way to answer "wait, why did we pick X?" without re-litigating.

If a decision is later reversed or superseded, do NOT delete the original entry — append a new entry with Supersedes: D-NN and update the Status of the superseded entry to Superseded by D-MM.

The decisions here are mirrored in the RFC (which is the design source of truth). When they conflict, the RFC wins; file an entry here noting the discrepancy and resolve it in the same PR.


D-001 — Identity is the triple (tenant, user, session)

Date: 2026-05-08 Status: Settled Where it lives: RFC §4, AGENTS.md §6 Why: The runtime must support concurrent sessions for the same user without context leakage. Tenant-only isolation is insufficient for multi-user agents. The triple is mandatory; there is no opt-out knob.


D-002 — Console is a Protocol client; Runtime is headless

Date: 2026-05-08 Status: Settled Where it lives: RFC §5, AGENTS.md §1, §4.5, §13 Why: The predecessor's Playground re-implemented runtime concepts (2,478 lines, 30+ HTTP routes, parallel state-store protocol). Decoupling unlocks remote attach, fleet view, third-party consoles, IDE/TUI clients, and prevents the "framework with a playground" trap. The Runtime never imports Console code.


D-003 — Planner is swappable behind one interface

Date: 2026-05-08 Status: Settled Where it lives: RFC §3.2, §6.2, AGENTS.md §1 Why: The biggest architectural lift over the predecessor. The runtime owns mechanism; planners own reasoning policy. ReAct is the V1 reference; Plan-Execute, Workflow, Graph, Deterministic, Supervisor, MultiAgent, HumanApproval can plug in over time without runtime changes.


D-004 — Persistence triad shipped at V1: in-mem + SQLite + Postgres

Date: 2026-05-08 Status: Settled Where it lives: RFC §9, AGENTS.md §9 Why: Shipping three drivers from t=0 forces a clean abstraction. Designing against one tends to leak that backend's assumptions into the contract. The predecessor shipped contracts with no production backends, so operators had to build their own queueing. Harbor closes that gap.


D-005 — Skills are a Harbor subsystem (not pushed entirely to Portico)

Date: 2026-05-08 Status: Settled Where it lives: RFC §6.7, harbor_skills_subsystem memory Why: The token-efficient DB-backed search/context/virtual-directory pattern is the predecessor's strongest subsystem; Harbor inherits it cleanly. Portico still owns distribution across tenants; Harbor consumes skills via a SkillProvider driver. Harbor also adds a Skills.md importer (closing the per-skill manual-adaptation gap) and an in-runtime skill generator with persistence (the predecessor's draft generator cannot save).


D-006 — Background-task persistence: in-process at V1, durable post-V1

Date: 2026-05-08 Status: Settled Where it lives: RFC §6.8 (and §6.12 contracts), master plan post-V1 list Why: V1 ships the contract. A durable backend (Postgres-as-queue or similar) lands post-V1 once the operational shape is clear.


D-007 — A2A: full spec compliance from V1

Date: 2026-05-08 Status: Settled Where it lives: RFC §6.4, master plan phase 29 Why: The predecessor ships full A2A spec compliance in code (the public docs lagged — that's the lesson Harbor's doc hygiene closes). Harbor inherits the surface verbatim from t=0; A2A peers appear as just-another-tool-source under the unified abstraction.


D-008 — Sessions = longer-lived multi-turn conversations containing many Runs

Date: 2026-05-08 Status: Settled Where it lives: RFC §6.9, glossary Why: Resolves the predecessor's ambiguity between StreamingSession and SessionManager. Identity is (tenant, user, session); RunID is per-execution; TraceID (OTel) may span Runs.


D-009 — CLI dev-loop subcommand: harbor dev

Date: 2026-05-08 Status: Settled Where it lives: RFC §8, master plan phase 64 Why: Boots local Runtime + Console + observability + hot reload + dynamic agent scaffolding with draft saving. Console is still a protocol client even on localhost; same code path as remote attach.


D-010 — Code-level tool calling (LLM = decision-maker, not runner)

Date: 2026-05-08 Status: Settled Where it lives: RFC §6.4 + §6.5, brief 07, harbor_design_principles memory Why: The LLM emits text/JSON describing intent; the runtime parses, dispatches, and merges. Provider-native tool calling APIs are NOT used. Provider differences disappear; the runtime owns the protocol. The LLM client surface collapses to one method. The runtime trio (ActionParser / Dispatcher / ObservationRenderer) plus siblings (RepairLoop, SchemaSanitizer) are the design pieces. Reversibility: if community standard hardens around native tool calling later, a second LLMClient driver can be added — the runtime doesn't change.


D-011 — Unified pause/resume primitive (HITL + OAuth + A2A AUTH_REQUIRED + steering PAUSE)

Date: 2026-05-08 Status: Settled Where it lives: RFC §3.3 + §6.3, harbor_protocol memory Why: Four seemingly-distinct features all converge on one runtime-level pause. The predecessor implements pause inside the planner loop, forcing every pause-shaped feature to reinvent coordination. Harbor's primitive lives at the runtime; planners and tools both signal "I need a pause" and the runtime drives the protocol-level event + resume token.


D-012 — LLM client: bifrost (resolves Q-3); rejects CGo-required candidate

Date: 2026-05-08 Status: Settled Where it lives: RFC §6.5, RFC §11 Q-3 RESOLVED, brief 08 Why: Original candidate (liter-llm) requires CGo bindings to a Rust core, conflicting with AGENTS.md §5/§13. Bifrost is pure Go (verified by direct source inspection: zero import "C", zero #cgo, zero binary blobs), 23 first-class providers, empirically validated against six OpenRouter-routed models — 23/24 gating items pass. Bifrost's Tools/ToolChoice parameters are NOT used (see D-010).


D-013 — Go 1.26+ minimum

Date: 2026-05-08 Status: Settled Where it lives: AGENTS.md §5, RFC §10, .golangci.yml, .github/workflows/ci.yml, go.mod Why: Bumped from 1.22 to match bifrost's go.mod floor. Go 1.26 is current; no downside to the bump.


D-014 — License: Apache-2.0 (MIT was the considered alternate)

Date: 2026-05-08 Status: Settled Where it lives: RFC §10, /LICENSE, README.md Why: Patent grant matters for an SDK companies will build on; NOTICE-file mechanism makes attribution explicit; consistency with the infrastructure neighborhood (Go, Kubernetes, OTel, gRPC, bifrost). MIT remains a real alternate; flip is mechanical (no code changes).


D-015 — Code-level tool calling justification recorded in RFC §6.4

Date: 2026-05-08 Status: Settled (acknowledged as a minority position) Where it lives: RFC §6.4, glossary, this entry Why: Maintainer explicitly questioned whether to switch to provider-native tool calling. Trade-off analysis confirmed code-level is the right call for Harbor's architecture: consistent with runtime/planner separation, swappable planner, cross-provider uniformity, single-method LLM client, custom opcodes (task.subagent, parallel with join spec), simpler streaming, and future-reversibility. Accuracy gap is closing as instruction-tuned models improve. Recorded so future re-reads understand it was a deliberate, examined choice.


D-016 — Governance is a Harbor subsystem; middleware between Runtime and LLMClient driver

Date: 2026-05-08 Status: Settled Where it lives: RFC §6.15, master plan phases 36a + 36b + 91–96, glossary Why: Bifrost (the LLM-call substrate) doesn't know Harbor's identity triple. Identity-scoped policies (cost ceilings, rate limits, per-call MaxTokens, key rotation, model swap, failover, circuit breakers) live in a Harbor middleware layer that wraps the LLMClient interface. The LLMClient interface stays one method.


D-017 — V1 Governance scope: cost ceilings + rate limits + MaxTokens; operator-driven runtime control is post-V1

Date: 2026-05-08 Status: Settled Where it lives: RFC §6.15, master plan phases 36a + 36b (V1) and 91–96 (post-V1) Why: A solo dev running production agents needs bankruptcy prevention from t=0 (cost accumulators + ceilings + rate limits). Live operator-driven controls (key rotation via Protocol, mid-session model swap, failover chains, circuit breakers, caching, PII redaction) require Console to land first; their phases sit explicitly in the post-V1 cluster (91–96) so they are tracked, not forgotten.


D-018 — Failover is a Harbor policy, not bifrost per-request Fallbacks

Date: 2026-05-08 Status: Settled (post-V1 implementation, phase 93) Where it lives: RFC §6.15, master plan phase 93 Why: Bifrost has a Fallbacks []Fallback field on each request — that's a per-call escape hatch with no audit awareness of Harbor's identity scopes. Harbor's failover is a policy with cost + rate-limit + audit implications; centralizing it in the Governance subsystem keeps every fallback hop a Harbor event with the identity triple attached.


D-019 — Key rotation via Account.GetKeysForProvider per-request lookup, not bifrost ReloadConfig

Date: 2026-05-08 Status: Settled (post-V1 implementation, phase 91) Where it lives: RFC §6.15, master plan phase 91 Why: ReloadConfig is whole-config replacement and races with in-flight requests. Account.GetKeysForProvider(ctx, provider) is invoked by bifrost on each request; Harbor's Account impl reads the live key set from a runtime-controlled atomic source. Console-pushed key rotations take effect on the next call with no config-swap race; old keys are invalidated immediately.


D-020 — PII redaction at the LLM boundary lives in Audit; Governance owns thresholds

Date: 2026-05-08 Status: Settled Where it lives: RFC §6.15, master plan phase 96 Why: Redaction is one canonical concern with multiple emit paths (logs, audit events, persisted state). Owning it in Audit gives one redactor; Governance owning it would split responsibility and risk inconsistent output. Governance owns thresholds (cost, rate, tokens) where the canonical concern is policy enforcement.


D-021 — Multimodality scope: inputs in V1, outputs as post-V1 tool wrappers

Date: 2026-05-08 Status: Settled Where it lives: RFC §6.5 (Multimodal inputs subsection), master plan phases 32 + 33 (V1 inputs) + 97 + 98 (post-V1 outputs), glossary (ContentPart, ImagePart, AudioPart, FilePart) Why: The predecessor accumulated an ambient "text-only" assumption that became expensive to retrofit. Harbor settles multimodal inputs at V1 (image/audio/file via ChatMessage.Content's Parts slice; bifrost handles per-provider translation) so the LLM call surface is correct from t=0 — sending images to LLMs as part of analysis is the common case, not a feature. Outputs (image generation, TTS, transcription, video) are delivered as tools that return ArtifactRefs; the planner dispatches them via the existing tool catalog (RFC §6.4 code-level dispatch). This keeps the LLMClient interface one method and aligns multimodal output with the runtime's existing tool-dispatch story.


D-022 — ArtifactRef is the canonical binary representation for multimodal content

Date: 2026-05-08 Status: Settled Where it lives: RFC §6.5 (canonical binary representation paragraph), §6.10 (Artifacts), glossary (Artifact, ArtifactRef, multimodal part types) Why: Three supply forms exist for image/audio/file content (URL, DataURL, ArtifactRef). Above the heavy-output threshold (32 KB default — RFC §6.10), the runtime automatically materializes inline DataURL content into ArtifactRefs and rewrites the message before event emission, audit, and persistence. This keeps event payloads, audit logs, and memory turns from carrying raw bytes; it also gives audit redaction a stable canonical form to handle (ArtifactRef passes through unredacted; DataURL is rewritten to placeholder + ref). URLs pass through unchanged when the provider can fetch them directly.


D-023 — Flow-as-Tool: Go-coded flow.Definition ships V1; declarative recipe (YAML) format ships V1.1

Date: 2026-05-09 Status: Settled Where it lives: RFC §6.1 (Flow-as-Tool subsection) + §6.4 (Flow transport variant), master plan phase 26a (V1) + phase 100 (post-V1 recipe loader), glossary (Flow, Definition, Budget for flows, Recipe) Why: A Flow is a typed DAG of Nodes assembled into a runnable unit and registered as a Tool the planner can call. This composes (a) the existing subflow + reliability shell (NodePolicy retry / exponential backoff / timeout) from §6.1, (b) the unified tool dispatch path from §6.4, and (c) the identity-tier Governance ceilings from §6.15, without adding a parallel orchestration concept. V1 ships the Go-coded Definition shape so the contract is settled and operators can ship flows in code; recipes (declarative YAML loaders into the same Definition struct) ship V1.1 to keep V1 scope tight without losing the surface. Per-flow Budget composes with run-level + identity-level budgets via min(): any layer can abort the flow, whichever cap fires first.
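The `min()` composition reads naturally as an element-wise minimum across layers. This sketch assumes a hypothetical `Budget` with two caps and treats zero as "no cap at this layer"; Harbor's real budget fields may differ.

```go
package main

import "fmt"

// Budget is a hypothetical cap set; a zero field means "no cap at this layer".
type Budget struct {
	MaxCostUSD float64
	MaxSteps   int
}

// minCap returns the tighter of two caps, ignoring unset (zero) ones.
func minCap[T int | float64](a, b T) T {
	switch {
	case a == 0:
		return b
	case b == 0:
		return a
	case a < b:
		return a
	default:
		return b
	}
}

// Effective composes flow, run and identity budgets: whichever layer is tightest wins.
func Effective(layers ...Budget) Budget {
	var out Budget
	for _, l := range layers {
		out.MaxCostUSD = minCap(out.MaxCostUSD, l.MaxCostUSD)
		out.MaxSteps = minCap(out.MaxSteps, l.MaxSteps)
	}
	return out
}

func main() {
	flow := Budget{MaxCostUSD: 0.50}
	run := Budget{MaxCostUSD: 2.00, MaxSteps: 20}
	identity := Budget{MaxCostUSD: 10.00, MaxSteps: 8}
	fmt.Printf("%+v\n", Effective(flow, run, identity)) // {MaxCostUSD:0.5 MaxSteps:8}
}
```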


D-024 — ToolPolicy reliability shell wraps every tool invocation, regardless of transport

Date: 2026-05-09 Status: Settled Where it lives: RFC §6.4 (Tool.Policy field + reliability-shell paragraph), master plan phase 26 (acceptance criteria), glossary (ToolPolicy) Why: A predecessor pattern worth preserving: even the minimum-expression tool — a plain Go function decorated as a tool — got per-call timeout / retry-with-backoff / validation for free. Harbor settles this at the catalog level: Tool.Policy is a ToolPolicy mirroring NodePolicy (§6.1). The Dispatcher trio (§6.4) wraps every tool invocation in the shell once; Transport (InProcess / HTTP / MCP / A2A / Flow) does not change the resilience guarantees. Defaults fire when ToolPolicy is zero-valued, so tools.RegisterFunc(name, fn) is production-resilient with no ceremony. Operators who want non-default policy pass tools.WithPolicy(...). Same backoff math + retry classes as NodePolicy so the surface is one mental model.


D-025 — Concurrent reuse contract: compiled artifacts immutable; per-run state lives in ctx + RunContext

Date: 2026-05-09 Status: Settled Where it lives: RFC §3.5 ("The concurrent reuse contract"), AGENTS.md §5 ("Concurrent reuse contract — non-negotiable"), §11 (mandatory concurrent-reuse tests), §13 (forbidden: mutable state on compiled artifacts), docs/plans/_template.md (pre-merge checklist), every Wave 1+ phase plan. Why: The predecessor's most expensive retrofit was thread-safety on its first-version flow runtime — the singleton "build a flow once, reuse across runs" pattern had mutable state that bled across concurrent invocations once parallelism was enabled. Harbor closes this from t=0 by settling four guarantees for every compiled artifact (flow.Engine, Tool, Planner, MemoryStore, Redactor, LLMClient, ToolCatalog): no data races, no context bleed, no cancellation cross-talk, no goroutine leaks. Every phase that builds a reusable artifact ships a concurrent-reuse test (N≥100 invocations under -race); the test is part of the pre-merge checklist and the drift-audit-adjacent phase plan template. Mutable state on artifacts that crosses run boundaries is a forbidden practice (AGENTS.md §13). Per-run state lives in ctx + RunContext; this constraint shapes every interface signature in the runtime.


D-026 — Context-window safety net: no raw heavy content reaches the LLM; standard ArtifactStub everywhere

Date: 2026-05-09 Status: Settled Where it lives: RFC §6.5 ("Context-window safety net" subsection + standard ArtifactStub schema), RFC §6.10 (heavy-output threshold), AGENTS.md §13 (forbidden: raw heavy content in LLM messages), master plan phase 32 (LLM client core enforces the catch-all pass), glossary (Context-window safety net, ArtifactStub, ErrContextLeak, ErrContextWindowExceeded). Why: The predecessor learned the hard way that LLM context windows balloon when artifacts (images, PDFs, large tool outputs, memory turns) are not consistently offloaded; its safety net was retrofitted later. Harbor settles the pattern as a runtime-wide invariant from t=0: no message reaching the LLMClient carries raw heavy content. Multi-stage enforcement: (1) producers (tool dispatcher, memory subsystem, multimodal input materialization, ObservationRenderer) replace heavy content with ArtifactRefs as part of their normal output; (2) a single catch-all pass at the LLM-client edge walks the assembled CompleteRequest, fails loudly with ErrContextLeak if any raw payload at or above the threshold survived, and fails with ErrContextWindowExceeded if the estimated token count is within the configured ContextWindowReserve (default 5%) of the model's context limit. V1 does not auto-truncate when the budget guard fires; the planner receives a typed error and is responsible for recovery (drop older turns, summarize, etc.). Auto-cascade is post-V1 work. The standard ArtifactStub schema ({artifact_ref, mime, size_bytes, hash, summary, fetch}) is the only thing the LLM sees in place of heavy content; the format is uniform across all producers and providers, with no per-model swapping.


D-028 — Event bus surface reconciliation: identity.Quadruple field, EventBus name, replay deferred to Phase 06, sealed-via-embedded-Sealed payload pattern, SafePayload bypass

Date: 2026-05-09 Status: Settled (supersedes the earlier RFC §6.13 sketch) Where it lives: RFC §6.13 (revised; earlier sketch retained as "kept for history"), docs/plans/phase-05-events.md ("Findings I'm departing from"), internal/events/events.go (the shipped surface), internal/events/payloads.go (bus-internal SafePayload types). Why: The earlier RFC §6.13 sketch carried flat identity strings (TenantID, UserID, SessionID, RunID), an EmittedAt time, optional metric-shaped fields (LatencyMs *float64, TokensIn/Out *uint32, CostUSD *float64, QueueDepth), called the bus interface Bus, and gave it a Replay(ctx, Cursor, Filter) method. The shipped Phase 05 surface diverged in five load-bearing ways: (1) identity reuse: the Identity identity.Quadruple field re-uses Phase 01's type so a single concept lives in one place rather than in four scattered string fields; (2) the interface is renamed to EventBus so the symbol doesn't collide with generic Go vocabulary at call sites; (3) Replay is deferred to Phase 06 (the in-memory ring-buffer driver) and exposed through a future capability interface, keeping the core EventBus surface to three methods; (4) there are no inline metric fields: Phase 56 will derive metric labels from Event.Extra (a bounded map[string]string) so the cardinality boundary is explicit; (5) the sealed-via-embedded-Sealed payload pattern plus the SafePayload marker (composing SafeSealed): bus-internal payloads bypass the audit redactor (preserving typed access on the subscriber side), while external payloads default to the redactor-walked RedactedMap. The OccurredAt rename (from EmittedAt) keeps the field's verb consistent with the new "emit"/"publish" terminology in the bus implementation. The Phase 05 plan acknowledged none of these in its "Findings I'm departing from" section; this entry closes that drift retrospectively. The earlier sketch is preserved verbatim in §6.13's "kept for history" paragraph.
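A sketch of the sealed-payload pattern, reusing the names from the entry (`Sealed`, `SafeSealed`, `SafePayload`, `RedactedMap`) but with hypothetical shapes and a toy redactor that masks every value. The unexported `sealed()` method cannot be declared outside the package, so satisfying `Payload` requires embedding `Sealed`, which makes every payload's opt-in explicit.

```go
package main

import "fmt"

// Sealed is embedded by every payload type.
type Sealed struct{}

func (Sealed) sealed() {}

// Payload is the sealed event-payload interface.
type Payload interface{ sealed() }

// SafeSealed composes Sealed and additionally marks bus-internal payloads.
type SafeSealed struct{ Sealed }

func (SafeSealed) safe() {}

// SafePayload payloads bypass the redactor.
type SafePayload interface {
	Payload
	safe()
}

// RedactedMap is the default shape for external payloads; the redactor walks it.
type RedactedMap struct {
	Sealed
	Fields map[string]string
}

// RunStarted is a hypothetical bus-internal payload.
type RunStarted struct {
	SafeSealed
	Planner string
}

// redact is a toy redactor: safe payloads pass through typed, RedactedMap values are masked.
func redact(p Payload) Payload {
	if _, ok := p.(SafePayload); ok {
		return p
	}
	if m, ok := p.(RedactedMap); ok {
		out := RedactedMap{Fields: make(map[string]string, len(m.Fields))}
		for k := range m.Fields {
			out.Fields[k] = "[REDACTED]"
		}
		return out
	}
	return p
}

func main() {
	fmt.Printf("%+v\n", redact(RunStarted{Planner: "react"}))
	fmt.Printf("%+v\n", redact(RedactedMap{Fields: map[string]string{"email": "a@b.c"}}))
}
```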


D-027 — StateStore is a generic (Quadruple, Kind, Bytes) surface; typed wrappers land at consumer phases

Date: 2026-05-09 Status: Settled (supersedes RFC §6.11's typed-multi-method sketch) Where it lives: RFC §6.11 (revised to the generic surface), docs/plans/phase-07-state.md ("Findings I'm departing from"), internal/state/ (will land in Phase 07 implementation), every consuming phase that wraps the surface (08 sessions, 20 tasks, 22 distributed, 23 memory, 42 planner, 50 steering — each ships its own typed adapter atop the generic interface). Why: The earlier RFC §6.11 sketch listed 21 typed methods (SaveTask, SaveTrajectory, SaveBinding, SaveSteering, SaveMemoryState, …) keyed on Go types (Task, Trajectory, RemoteAgentBinding, SteeringEvent, MemoryKey, …) that do not exist yet — they belong to phases not in Wave 2 (sessions Phase 08, tasks Phase 20, distributed Phase 22, memory Phase 23, planner Phase 42, steering Phase 50). A leaf persistence interface cannot import types from its consumers without inverting the dependency graph. Harbor settles the call: StateStore is a five-method surface keyed on (identity.Quadruple, Kind string, Bytes []byte) with idempotency on a caller-provided EventID (ULID). Consuming phases land their typed wrappers at their own layer (for example, SessionRegistry.Save(s Session) reduces to StateStore.Save(StateRecord{Identity: s.Identity, Kind: "session.lifecycle", Bytes: marshal(s)})). The generic surface is strictly more general than the typed one, is fully covered by the conformance suite, and avoids the leaf-imports-consumer cycle. Forward-only migrations, three-driver parity (in-mem / SQLite / Postgres), and the no-Supports*-ceremony rule from §9 still apply unchanged. The earlier sketch is not deleted from history (it captured intent); this entry supersedes it.


D-029 — Replay returns []Event, not a fresh Subscription

Date: 2026-05-09 Status: Settled (supersedes brief 06 §2 sketch for Phase 06) Where it lives: docs/plans/phase-06-events-replay.md ("Findings I'm departing from"), brief 06 §2 (the original sketch is preserved unchanged in the brief), internal/events/events.go (the Replayer interface lands in Phase 06's implementation PR), glossary (Replayer capability interface, Cursor). Why: Brief 06 §2 sketched Replay(ctx, Cursor, Filter) (Subscription, error) returning a fresh Subscription whose stream interleaves historical-then-live events. That coupling makes the historical/live boundary fuzzy and forces the bus to dedupe at the seam between snapshot and live tail — exactly the kind of "subtle invariant maintained by clever code" pattern the predecessor learned to regret. Harbor settles the surface as Replay(ctx, Cursor, Filter) ([]Event, error): a snapshot of historical events strictly between the cursor and the bus's current sequence, with the caller responsible for combining the snapshot with a fresh Subscribe if it wants to continue live. The split gives the no-duplicate / no-gap guarantee a clean home — Publish stamps every event with Sequence, and a subscriber's cursor is "the last sequence I have." If a future phase needs a one-shot ReplayAndSubscribe, it composes on top of these two primitives without changing driver implementations. The brief sketch is preserved unchanged in docs/research/06-events-observability-devx.md §2; this entry records the implementation departure.
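The caller-side composition can be sketched with a hypothetical in-memory `Bus`. Subscribing before replaying means no event falls into a gap; `merge` then uses `Sequence` to drop the live events the snapshot already contains, so the result has neither gaps nor duplicates. The unbounded log and blocking fan-out are simplifications.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is simplified to the field that matters here: the bus-stamped Sequence.
type Event struct {
	Sequence uint64
	Type     string
}

// Bus is a hypothetical in-memory bus with an unbounded log, for illustration only.
type Bus struct {
	mu   sync.Mutex
	seq  uint64
	log  []Event
	subs []chan Event
}

// Publish stamps the next Sequence, appends to the log, and fans out to live subscribers.
func (b *Bus) Publish(typ string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.seq++
	e := Event{Sequence: b.seq, Type: typ}
	b.log = append(b.log, e)
	for _, ch := range b.subs {
		ch <- e // assumes enough buffer; a real bus needs a slow-subscriber policy
	}
}

func (b *Bus) Subscribe(buf int) <-chan Event {
	b.mu.Lock()
	defer b.mu.Unlock()
	ch := make(chan Event, buf)
	b.subs = append(b.subs, ch)
	return ch
}

// Replay returns a snapshot of every event after cursor, up to the current sequence.
func (b *Bus) Replay(cursor uint64) []Event {
	b.mu.Lock()
	defer b.mu.Unlock()
	var out []Event
	for _, e := range b.log {
		if e.Sequence > cursor {
			out = append(out, e)
		}
	}
	return out
}

// merge appends buffered live events to a snapshot, dropping any the snapshot already holds.
func merge(history []Event, live <-chan Event) []Event {
	out := append([]Event(nil), history...)
	var last uint64
	if len(history) > 0 {
		last = history[len(history)-1].Sequence
	}
	for len(live) > 0 {
		e := <-live
		if e.Sequence <= last {
			continue // duplicate at the snapshot/live seam
		}
		out = append(out, e)
		last = e.Sequence
	}
	return out
}

func main() {
	b := &Bus{}
	b.Publish("a")
	b.Publish("b")
	live := b.Subscribe(16) // subscribe before replaying so nothing falls into a gap
	b.Publish("c")          // reaches both the live channel and the snapshot below
	history := b.Replay(0)
	b.Publish("d")
	for _, e := range merge(history, live) {
		fmt.Print(e.Type, " ")
	}
	fmt.Println()
}
```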


D-030 — TaskRegistry surface split: per-task in Phase 20, groups + retain-turn + WatchGroup in Phase 21

Date: 2026-05-10 Status: Settled Where it lives: docs/plans/phase-20-tasks.md ("Findings I'm departing from"), docs/plans/phase-21-tasks-groups.md (the follow-up surface), internal/tasks/tasks.go (the shipped Phase 20 surface), brief 05 §7 (the original sketch recommending one phase for the full surface). Why: Brief 05 §7 phase decomposition recommended one phase for the full TaskRegistry (per-task surface + groups + retain-turn + patches + ack-background). Harbor splits this across Phase 20 (per-task surface) and Phase 21 (groups + retain-turn + WatchGroup + patches). Per-task lifecycle is independently shippable and has zero dependencies on group governance; bundling the whole TaskService into one phase would slow the wave-end E2E and delay the per-task surface that downstream phases (steering Phase 53, planner Phase 42) want as a stable foundation. The split keeps Phase 20's TaskRegistry interface narrow (Spawn / SpawnTool / Get / List / Cancel / Prioritize / Mark*) while Phase 21's PR extends the same interface with group + retain-turn methods against a stable per-task subset. Brief 05's recommendation is preserved verbatim in docs/research/05-state-tasks-artifacts-sessions.md §7; this entry records the implementation departure.


D-031 — Distributed contracts: full A2A v1 surface mapping + loopback V1 driver; vendored proto pinned by commit SHA

Date: 2026-05-10 Status: Settled Where it lives: RFC §6.4 + §6.12, docs/plans/phase-22-distributed.md, docs/specifications/a2a.proto (vendored at commit ae6a562d5d972f2c4b184f748bb32e1fa9aa7bf2, 2026-04-23), docs/specifications/README.md, internal/distributed/ (the shipped Phase 22 surface), this entry. Why: D-007 settled "A2A full spec compliance from V1." Phase 22 realises that commitment by hand-transcribing the entire A2A v1 surface into Go: every A2AService RPC maps 1:1 to a RemoteTransport method, every proto message has a Go counterpart in internal/distributed/a2a/types.go, every oneof variant (Part, SecurityScheme, OAuthFlows, StreamResponse, SendMessageResponse) is represented as a Go interface + concrete-type-per-variant discriminated union with a Kind() string discriminator. The TaskState 8-value enum, the Role 3-value enum, and every nested message (AgentCard, AgentInterface, AgentSkill, AgentCardSignature, TaskPushNotificationConfig, AuthenticationInfo, the five SecurityScheme concretes, the five OAuth flow concretes including the two deprecated ones for spec parity, every request/response envelope) ship as named Go types. Phase 29's southbound A2A driver inherits the surface without churn. The proto is vendored at a pinned commit SHA so the source-of-truth is searchable from inside the repo; bumps land as deps(specs): PRs. The Go shapes are hand-written (not protoc-generated) because: (a) Phase 22 must not pull google.golang.org/grpc / google.golang.org/protobuf into a contracts-only package — Phase 29 owns that decision; (b) the hand-written shapes integrate cleanly with identity.Quadruple, slog logging, and Harbor's error idioms; (c) the types_test.go coverage gate (a hand-maintained list of 50 expected type names with a count assertion) makes the transcription auditable. 
The V1 driver is loopback — in-process dispatch routed through an in-memory Agent interface (in internal/distributed/drivers/loopback/agent.go) so the conformance suite can simulate every A2A RPC without leaving the process. The conformance suite IS the gate: future drivers (durable bus at phase 86, A2A wire at phase 29) inherit it verbatim.


D-032 — Wake-on-resolution is a planner-concrete responsibility; TaskRegistry stays neutral

Date: 2026-05-10 Status: Settled Where it lives: docs/plans/phase-21-task-groups.md (the WatchGroup surface + the three wake-mode names documented at internal/tasks/groups.go package godoc), docs/plans/README.md Phase 42 / 45 / 48 / 49 detail blocks (the consumption contract), internal/tasks/tasks.go (the neutral WatchGroup(sessionID identity.Identity, groupID TaskGroupID) (<-chan GroupCompletion, func(), error) surface — no Mode enum baked in), planner phase plans when authored. Why: Phase 21 closed the predecessor's silent gap where non-retain-turn SpawnTask groups left the planner with no signal that all members had resolved. The fix is WatchGroup + GroupCompletion — a non-blocking notification channel the planner subscribes to. But the policy of how a planner reacts to that channel (wake the LLM eagerly, poll on its next deterministic iteration, or hybrid push + sidecar status emitter) is a planner-shape concern, not a TaskRegistry concern. Burning a WakeMode enum into the registry would either force every planner concrete onto the same policy or introduce a Supports* capability protocol — both anti-patterns under AGENTS.md §4.4. So the registry stays neutral and the three wake modes (push / poll / hybrid) are documented at the internal/tasks package godoc with the same vocabulary the planner phases consume. Each concrete planner (Phase 42+) MUST implement at least one of the three modes for non-retain-turn group continuation; the planner conformance pack (Phase 49) MUST exercise the round-trip (SpawnTask → group completes → planner re-enters → reads MemberOutcome). The retain-turn flow (turn-bound parallel) keeps its existing RegisterRetainTurnWaiter path — WatchGroup is strictly the non-retain-turn dual. Naming the wake modes in one canonical place keeps third-party planner authors aligned and makes the conformance assertion testable.


D-033 — Memory subsystem: identity-rejection emits memory.identity_rejected on the bus with "<missing>" substitution for the partial-triple identity field

Date: 2026-05-11 Status: Settled Where it lives: RFC §6.6, docs/plans/phase-23-memory.md ("Brief findings incorporated" + "Risks / open questions"), internal/memory/events.go (the event-type constant + registration + MemoryIdentityRejectedPayload), internal/memory/reject.go (EmitIdentityRejected + identityRejectionReason), brief 04 §4.2 + §6. Why: Brief 04 §4.2 settles that a MemoryStore operation with a missing identity component MUST (a) fail closed with ErrIdentityRequired and (b) emit an audit event so the rejection is observable on the event bus. The brief does not name the event type. Harbor settles it as memory.identity_rejected, registered in the canonical events registry via this phase's init(). The payload (MemoryIdentityRejectedPayload) is SafePayload by construction — Operation is a bounded enumerable method name, Reason is a static string naming the missing component(s); no caller-controlled bytes survive on the payload. The naming and the SafePayload classification are mine, recorded so a later phase auditor doesn't flag either as drift. The event's Identity field is also load-bearing: Phase 05's ValidateEvent rejects empty-triple events with ErrIdentityRequired, so the rejection event itself cannot carry Identity = identity.Quadruple{} even though the rejected input did. The settled solution: replace any empty component with a "<missing>" sentinel on the published event so ValidateEvent passes; the payload's Reason field names the truly missing component(s), and subscribers MAY subscribe with an Admin: true filter to fan in cross-tenant rejections. The memory record persistence key is also settled at this phase: Kind = "memory.state" for the typed-wrapper-over-StateStore write (D-027 pattern), per-Quadruple slot, with the persisted bytes shaped as {strategy, turns} JSON. Phase 23 only writes empty records (Strategy=none has no mutations); Phase 24 will append turn data; Phase 25's persistent drivers will inherit the shape unchanged.
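The `<missing>` substitution can be sketched as a pure function. `Quadruple` is a simplified stand-in, and the Reason format (`missing tenant,session`) is a hypothetical example, not the string Harbor emits.

```go
package main

import (
	"fmt"
	"strings"
)

// Quadruple is a simplified stand-in for identity.Quadruple.
type Quadruple struct{ Tenant, User, Session, Run string }

const missingSentinel = "<missing>"

// rejectionIdentity returns the identity to stamp on a memory.identity_rejected event
// (every empty triple component replaced by the sentinel, so ValidateEvent accepts it)
// and a Reason naming the components that were truly absent.
func rejectionIdentity(in Quadruple) (Quadruple, string) {
	out := in
	var absent []string
	for _, f := range []struct {
		name  string
		field *string
	}{
		{"tenant", &out.Tenant},
		{"user", &out.User},
		{"session", &out.Session},
	} {
		if *f.field == "" {
			*f.field = missingSentinel
			absent = append(absent, f.name)
		}
	}
	if len(absent) == 0 {
		return in, "" // complete triple: nothing to reject
	}
	return out, "missing " + strings.Join(absent, ",")
}

func main() {
	id, reason := rejectionIdentity(Quadruple{User: "alice"})
	fmt.Printf("%+v %q\n", id, reason)
}
```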


D-034 — Persistent memory drivers own their memory_state tables; Deps.State accepted-but-unused; wire envelope memory.Record exported for cross-driver byte-stable Snapshot/Restore

Date: 2026-05-11 Status: Settled Where it lives: RFC §6.6, RFC §9, docs/plans/phase-25-memory-drivers.md ("Findings I'm departing from" + "Risks / open questions"), internal/memory/wire.go (Record + KindMemoryState), internal/memory/drivers/sqlite/{sqlite.go,migrations/0001_init.sql}, internal/memory/drivers/postgres/{postgres.go,migrations/0001_init.sql}. Why: Phase 23's InMem MemoryStore persists records through the injected state.StateStore per D-027 (typed-wrapper-over-generic, Kind="memory.state"). The Phase 25 persistent drivers (SQLite + Postgres) instead maintain their own memory_state table — this is a deliberate departure from D-027's "one StateStore, many typed wrappers" model and is mandated by the master plan ("Your SQLite/PG drivers persist memory state to their OWN tables ... but the byte serialisation contract is the same shape so cross-driver Snapshot/Restore round-trips byte-stable"). Two consequences are now settled: (1) the memory.Deps.State field is accepted by the persistent drivers but unused — the existing validateDeps contract still requires non-nil to preserve backward compatibility with the InMem driver (which DOES use State), so the persistent drivers hold the reference without writing to it; (2) the wire envelope previously named memoryStateRecord inside the InMem driver is promoted to an exported memory.Record type at internal/memory/wire.go (with the canonical KindMemoryState routing constant alongside it) so all three drivers marshal byte-identical JSON, enabling the Phase 25 acceptance criterion that a Snapshot taken by one driver Restore-round-trips byte-stably through another. Each persistent subsystem's Postgres migration runner uses a distinct pg_advisory_lock key (fnv64aSigned("harbor-memory-migrations")) so the state + memory migration runners cannot collide.
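The lock key can be derived with the standard library's `hash/fnv`. The body of `fnv64aSigned` is inferred from its name (64-bit FNV-1a, reinterpreted as a signed integer because `pg_advisory_lock` takes a `bigint`), and the sibling key name `harbor-state-migrations` is hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// fnv64aSigned hashes name with 64-bit FNV-1a and reinterprets the bits as int64,
// since Postgres advisory-lock keys are signed bigints. Assumed behaviour.
func fnv64aSigned(name string) int64 {
	h := fnv.New64a()
	h.Write([]byte(name)) // hash.Hash writes never return an error
	return int64(h.Sum64())
}

func main() {
	state := fnv64aSigned("harbor-state-migrations") // hypothetical sibling key
	mem := fnv64aSigned("harbor-memory-migrations")
	fmt.Println(state != mem)
	fmt.Printf("SELECT pg_advisory_lock(%d)\n", mem)
}
```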


D-035 — Memory strategies: OverflowDropOldest is the only OverflowPolicy; recovery loop is bounded by RecoveryBacklogMax with drop-oldest + memory.recovery_dropped emit; retry/backoff/cadence are constants, not config

Date: 2026-05-11 Status: Settled Where it lives: RFC §6.6, docs/plans/phase-24-memory-strategies.md ("Findings I'm departing from" + "Risks / open questions"), internal/memory/memory.go (OverflowPolicy enum + OverflowDropOldest constant + ValidateHealthTransition + ErrInvalidHealthTransition + transition table), internal/memory/events.go (EventTypeMemoryHealthChanged + EventTypeMemoryRecoveryDropped + HealthChangedPayload + RecoveryDroppedPayload), internal/memory/health.go (EmitHealthChanged + EmitRecoveryDropped), internal/memory/strategy/rolling_summary.go (the constants defaultRetryAttempts = 3, defaultRetryBackoffBase = 100*time.Millisecond, defaultDegradedRetryEvery = 10*time.Second; the bounded recovery loop), brief 04 §2 + §4.1. Why: Three narrow scope calls at this phase, all driven by AGENTS.md §13's "no silent degradation" rule + the "fail loudly" principle:

  1. OverflowPolicy narrows from brief 04 §2's three-option enum to a single OverflowDropOldest. Brief 04 §2 names truncate_oldest, truncate_summary, and error. Harbor ships only OverflowDropOldest. Rationale: (a) truncate_summary requires the summariser inside the truncation path which conflates two strategies; (b) error is a silent-degradation footgun — an over-budget AddTurn returning ErrBudgetExceeded would force every caller to handle the error or silently lose turns, which is exactly the pattern AGENTS.md §13 closes. The narrow enum lets the surface grow if a real LLM-client integration (Phase 32+) surfaces a use case for truncate_summary ("always keep a summary line, drop oldest verbatim turns first"); today the simpler shape avoids the footgun.
  2. Recovery loop is bounded by RecoveryBacklogMax with drop-oldest + memory.recovery_dropped emit on overflow. Brief 04 §4.1 names the bound; the drop-oldest action + the recovery-dropped event are mine, recorded here so a later auditor doesn't flag the naming or the SafePayload classification as drift. The payload is SafePayload by construction: only a bounded Reason string survives ("backlog_overflow" today). The default RecoveryBacklogMax = 16 is sized to absorb a short summariser outage (≈160 seconds, i.e. under 3 minutes, at defaultDegradedRetryEvery = 10s × 16 retries) without unbounded memory growth.
  3. Retry / backoff / cadence knobs from brief 04 §2 (RetryAttempts, RetryBackoffBase, DegradedRetryEvery) do NOT land in config.MemoryConfig. Only RecoveryBacklogMax is operator-tunable. The three values live in internal/memory/strategy/rolling_summary.go as package constants (defaultRetryAttempts = 3, defaultRetryBackoffBase = 100*time.Millisecond, defaultDegradedRetryEvery = 10*time.Second). Rationale: nobody has needed to tune them yet, and exposing knobs that no one has a calibrated answer for only gives operators yaml to fight; if the LLM-client integration (Phase 32+) surfaces real-world miscalibration, we re-litigate via an RFC PR + a new MemoryConfig field. Keeping the surface narrow today avoids version skew between an operator's harbor.yaml and a future Harbor that retunes the defaults internally.

The Health FSM transition table is also settled at this phase: healthy ↔ retry ↔ degraded ↔ recovering with the explicit edges listed in internal/memory/memory.go's healthTransitions map. Self-loops are valid; any other pair is rejected by ValidateHealthTransition with ErrInvalidHealthTransition (fail-loud — an invalid transition is a programming error in the calling executor, not a recoverable state). The full matrix is property-tested in internal/memory/strategy/strategy_test.go::TestValidateHealthTransition_Matrix. The Health FSM's observable degradation path (memory.health_changed emit on transition) is the explicit, documented exception to AGENTS.md §13's "no silent degradation" rule — degraded mode IS the observable failure surface, and emitting the event makes it observable (and therefore not silent).


D-036 — HTTP tool driver: URL/body/header templates use text/template + explicit urlquery; secrets live in Auth only

Date: 2026-05-11 Status: Settled Where it lives: docs/plans/phase-27-tools-http.md ("Findings I'm departing from"), internal/tools/drivers/http/http.go (the checkNoSecretLeak guard + compileTemplate with missingkey=error + urlquery funcmap), internal/tools/drivers/http/manifest.go (the loader's pre-compile leak check + the ${ENV_VAR}-only secret form), docs/glossary.md (AuthSpec + UTCP manifest + RegisterHTTPTool), AGENTS.md §7 (credential boundary rule this implements). Why: Brief 03 §3 sketched HTTP tool registration with "url-template substitution from args" but did not specify the credential boundary. Without a constraint, the simplest implementation lets operators interpolate ${API_KEY} or {{ .Auth.token }} directly into the URL — which means the secret crosses the audit redactor, lives in observability logs, and rides through any caching layer. Harbor's tools-HTTP driver tightens this from t=0: URL / body / header templates are text/template strings whose only namespace is .Args.*; the loader runs a regex check ({{[\s-]*\.Auth\b) against every template at register / load time and rejects matches with ErrTemplateSecretLeak. Secrets enter the driver only via the Auth map (operator-supplied), and the manifest loader requires the ${ENV_VAR} reference form — literal secret strings are also rejected at load time. Combined: a leaked secret in an HTTP tool config is a register-time error, never a runtime data leak. Templates use missingkey=error so {{ .Args.unknown }} fails loudly rather than silently rendering as empty (consistent with the runtime's "fail loudly" rule, AGENTS.md §5). 
The urlquery funcmap alias is documented in package godoc so operators write {{ .Args.city | urlquery }} explicitly when the substituted value must be URL-escaped; the default rendering does NOT auto-escape (Go's text/template is byte-faithful), so this is the operator's responsibility for now — a future enhancement could auto-escape every substitution if the asymmetry proves error-prone.


D-037 — MCP southbound driver wraps github.com/modelcontextprotocol/go-sdk@v1.6.0; transport-reconnect lives in ToolPolicy, not in a parallel state machine

Date: 2026-05-11 Status: Settled Where it lives: RFC §6.4, docs/plans/phase-28-tools-mcp.md ("Findings I'm departing from" + "Risks / open questions"), internal/tools/drivers/mcp/ (the shipped driver), internal/tools/drivers/mcp/auto.go (the MCPTransportMode selector + auto-fallback at Provider.Connect, not at Transport.Connect), brief 03 §4 (the "reconnect-on-failure" brief recommendation), this entry. Why: Brief 03 §4 named "reconnect-on-failure" as a Phase 28 requirement. The Go MCP SDK's StreamableClientTransport already ships an internal exponential-backoff reconnect loop for the standalone SSE stream; stdio + SSE transports leave session-level failures to the caller. Harbor handles those failures at the ToolPolicy retry shell (D-024) — the descriptor's Invoke closure re-runs callTool which re-reads sessionForRead, so a ToolPolicy retry transparently uses a reconnected session when the operator runs Provider.Connect again. Implementing a parallel reconnect state machine inside the driver would have shipped "two parallel implementations of the same conceptual feature" (AGENTS.md §13 forbidden practice) — one in ToolPolicy, one in the driver — and required new sentinels + new audit events to make the per-driver reconnect observable. The settled design keeps reliability at the catalog edge and the driver thin. SDK version v1.6.0 is pinned (its Go floor 1.25 ≤ Harbor floor 1.26); bumps are routine deps PRs with conformance suite re-run. Auto-mode fallback (streamable-HTTP → SSE) was lifted from Transport.Connect to Provider.Connect so the SDK's client.Connect initialize-handshake failures are also covered — a Transport.Connect-only fallback would miss "endpoint answers HTTP but isn't really streamable".


D-038 — A2A southbound driver: JSON-RPC binding, route-scoring weights settled, push-config storage forwarded to peer (no local mirror)

Date: 2026-05-11 Status: Settled Where it lives: RFC §6.4, master plan phase 29, docs/plans/phase-29-tools-a2a.md, internal/distributed/drivers/a2a/registry.go, internal/distributed/drivers/a2a/a2a.go package godoc, internal/tools/drivers/a2a/a2a.go, glossary (A2A peer, Agent Card cache, Route scoring). Why: Phase 29 lands the first wire-level A2A driver. Three design calls warrant a settled entry so a later auditor doesn't churn them. (1) Wire binding. The vendored proto carries both service A2AService { rpc … } (gRPC stubs) AND google.api.http annotations (HTTP+JSON binding). Phase 29 implements the JSON-RPC 2.0 over HTTPS binding per the master-plan Phase 29 detail block and brief 03 §5; gRPC + HTTP+JSON bindings on the same peer's AgentCard are accepted as read-only metadata until those drivers ship. The driver matches AgentInterface.ProtocolBinding == "JSONRPC" (the Phase 22 constant a2a.ProtocolBindingJSONRPC); peers declaring no JSONRPC interface fail loudly with ErrNoJSONRPCInterface. (2) Route-scoring weights. The Registry's CompositeScore = (5 × TrustTier) + (1000 / max(1, LatencyTierMS)) + (10 × CapabilityScore). Trust outranks latency 5:1 (safety first); latency is the tie-breaker among similarly-trusted peers (the 1000/lat_ms term saturates at the LatencyWeight when latency is 1ms, drops to 1.0 at 1000ms); capability match adds an additive boost so a peer that declares the exact AgentSkill.ID outranks a tag-match. Lower latency + lexicographic URL break composite ties so the result is deterministic. Weights are tunable post-V1 but not exposed at V1 (a single deployment uses one canonical scoring policy). (3) Push-notification config storage. The master-plan detail block specifies "store push-notification configs in-memory at V1." Phase 29's southbound driver IS the client (issuing Create/Get/List/Delete against the peer); the peer is responsible for durability. The wire driver forwards CRUD verbatim and stores nothing locally. 
A multi-replica Harbor consequently sees push-config state only as each peer holds it, which is acceptable for V1; durable mirroring is post-V1 work that would compose Phase 23 (memory), Phase 15 (SQLite state), and Phase 16 (Postgres state). HTTPS-only is enforced for non-loopback peers (AGENTS.md §7); HTTP is allowed only for 127.0.0.1, ::1, localhost, and operator-allowlisted loopback shapes. The conformance suite (internal/distributed/conformancetest.RunRemoteTransport) is the gate: it passes unmodified against the wire driver bound to an httptest.Server-shaped mock A2A peer.
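The scoring rule and its deterministic tie-breaks, as a small Go sketch. The Peer field types and the example CapabilityScore values are assumptions; the formula is transcribed as written in this entry.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Peer holds the scoring inputs; field types are assumptions of this sketch.
type Peer struct {
	URL             string
	TrustTier       int
	LatencyTierMS   int
	CapabilityScore float64 // hypothetical: e.g. 1.0 exact AgentSkill.ID match, 0.5 tag match
}

// CompositeScore = 5*TrustTier + 1000/max(1, LatencyTierMS) + 10*CapabilityScore.
func CompositeScore(p Peer) float64 {
	return 5*float64(p.TrustTier) +
		1000/math.Max(1, float64(p.LatencyTierMS)) +
		10*p.CapabilityScore
}

// rankPeers sorts best-first; composite ties break on lower latency, then on
// lexicographic URL, so the order is fully deterministic.
func rankPeers(peers []Peer) {
	sort.Slice(peers, func(i, j int) bool {
		si, sj := CompositeScore(peers[i]), CompositeScore(peers[j])
		if si != sj {
			return si > sj
		}
		if peers[i].LatencyTierMS != peers[j].LatencyTierMS {
			return peers[i].LatencyTierMS < peers[j].LatencyTierMS
		}
		return peers[i].URL < peers[j].URL
	})
}

func main() {
	peers := []Peer{
		{URL: "https://b.example", TrustTier: 2, LatencyTierMS: 100, CapabilityScore: 1},
		{URL: "https://a.example", TrustTier: 2, LatencyTierMS: 100, CapabilityScore: 1},
		{URL: "https://c.example", TrustTier: 4, LatencyTierMS: 200, CapabilityScore: 1},
	}
	rankPeers(peers)
	for _, p := range peers {
		fmt.Printf("%s %.1f\n", p.URL, CompositeScore(p))
	}
	// https://c.example 35.0
	// https://a.example 30.0
	// https://b.example 30.0
}
```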


D-039 — LLM-edge safety pass: mandatory-by-construction, ordering = materialize → leak-detect → token-budget; safety wrapper is the registry's only handout

Date: 2026-05-11 Status: Settled Where it lives: RFC §6.5 ("Context-window safety net" subsection), AGENTS.md §13 (forbidden: raw heavy content reaches LLMClient), master plan phase 32 (acceptance criteria), docs/plans/phase-32-llm-client.md, internal/llm/safety.go (safetyClient), internal/llm/registry.go (Open returns the wrapper, not the raw Driver), glossary (Context-window safety net). Why: D-026 settled the what ("no message reaching the LLMClient carries raw heavy content; fail loudly via ErrContextLeak / ErrContextWindowExceeded"); Phase 32 settles the how. Three design calls warrant a settled entry so a later auditor doesn't churn them. (1) Mandatory-by-construction. internal/llm.Open(...) returns an LLMClient interface whose only concrete implementation is the package-private *safetyClient. The factory builds a Driver (the unexported-by-naming surface) and wraps it. A caller cannot bypass the safety pass through the registry; a caller who genuinely needs a bare Driver (an evaluation harness that has already run the pass) constructs the wrapper directly in its own package, but the production code path is the registry. This is the AGENTS.md §13 "fail-loudly + capability mandatory" pattern applied to the safety net: the runtime fails closed, not "fails open with a feature-flag." (2) Pass ordering. Inside safetyClient.Complete, the steps are: identity → structural-validate → materialize → leak-detect → token-budget → driver. Materialize runs BEFORE leak-detect so a producer that ships an oversize DataURL gets one more chance to be rewritten; a producer that ships raw bytes in a Text field (not a DataURL) is caught by leak-detect. The token-budget guard runs last among the checks so it sees the post-materialize byte count (an ArtifactStub-rewritten message is small, so the estimate reflects what the driver will actually send). Cancellation is honoured between every step via ctx.Err(). (3) No auto-cascade at V1.
The token-budget guard fails loudly with ErrContextWindowExceeded; V1 does NOT truncate or summarize automatically. The planner is responsible for recovery (drop older turns, summarize, etc.). Auto-cascade is post-V1 work — an extension of memory's rolling_summary plus a PromptAssembler orchestrator; tracked but not on V1's floor. The acceptance bar of "fails loudly = observable" is settled at the bus emit (llm.context_window_exceeded); operators quantify how often the guard fires and tune ContextWindowReserve accordingly.


D-040 — bifrost driver design: single-provider per Harbor instance; env.NAME API-key resolution at New time (fail-closed on missing); stream cancellation abandons the chunk reader; cost emit lives in the driver

Date: 2026-05-11 Status: Settled Where it lives: RFC §6.5, RFC §11 Q-3 (RESOLVED), docs/plans/phase-33-bifrost.md, internal/llm/drivers/bifrost/bifrost.go (Driver + Complete + streamComplete), internal/llm/drivers/bifrost/account.go (Account + resolveAPIKey), internal/llm/drivers/bifrost/cost.go (emit helper), glossary (BifrostDriver, BifrostContext, ProviderRouting), brief 08. Why: Brief 08 settled the adoption ("bifrost is the V1 LLM driver"); Phase 33 settles the adapter shape. Four design calls warrant a settled entry so a later auditor doesn't churn them. (1) Single-provider per Harbor instance. LLMConfig (Phase 32) ships Provider / Model / APIKey / BaseURL / Timeout as singular fields; Phase 33's Account advertises exactly one configured provider. The operator's harbor.yaml carries the bifrost-side Provider (e.g. openrouter); the per-model ModelProfiles keys carry the upstream identifier (openai/gpt-5.3-chat). Multi-provider routing per Harbor instance is post-V1; deployments needing multiple endpoints run multiple Harbor instances. (2) API-key resolution at New, not at Complete. Account.resolveAPIKey reads cfg.APIKey once at construction. The literal "sk-..." form is the value; the "env.NAME" form looks up os.Getenv(NAME) and fails closed with ErrMissingAPIKey (naming the env var) if unset. Fail-at-boot is the runtime principle (AGENTS.md §5); a runtime that boots clean and fails the first user request because of a missing key is the silent-degradation footgun §13 closes. The key value is NEVER logged, surfaced in errors, or emitted on the bus. (3) Stream cancellation abandons the chunk reader. Brief 08 §"Cancellation caveat" observed that bifrost's chunk channel can take a few seconds to close on some providers after ctx cancel. The driver's streamComplete does a select on <-ctx.Done() and the chunk channel; on ctx-cancel the driver returns ctx.Err() IMMEDIATELY and never waits for the channel close. 
Bifrost's worker goroutine continues draining upstream on its own; the goroutine-leak test asserts that the goroutine count returns to its baseline. (4) Cost emit lives in the driver, not the safety client. The Phase 32 safety client is provider-blind; the driver knows the request's model and observes bifrost's BifrostCost shape. cost.go::emitCostRecorded publishes llm.cost.recorded after a successful Complete with the full identity quadruple + model + cost + usage; Phase 36a's governance accumulator subscribes against this emit site. If a future phase ships a second non-mock LLM driver and wants cost emission to fold into the safety client (so all drivers emit for free), the wave-end audit can re-litigate; V1 has one production LLM driver, so the redundancy doesn't matter.


D-041 — Provider corrections: outside the safety pass; single baked-in mode; CorrectionsProfile lives on ModelProfile; hook-registered wrapper

Date: 2026-05-11 Status: Settled Where it lives: RFC §6.5, docs/plans/phase-34-provider-corrections.md, internal/llm/llm.go (CorrectionsProfile + four enum types), internal/llm/registry.go (RegisterCorrectionsWrapper hook + Open compose order), internal/llm/corrections/corrections.go (Wrap + init() self-registration), internal/config/config.go (LLMCorrectionsConfig, LLMCorrectionsProfileConfig), internal/config/validate.go (enum allowlists), brief 03 §4–§5, brief 08 §"Phase 34 scope shrinks slightly".

Why: Phase 34 ships the per-provider correction layer between Harbor's runtime and the Phase 32 safetyClient(driver). Four design calls warrant a settled entry so a later auditor doesn't churn them.

  1. Compose order is corrections(safetyClient(driver)) — corrections OUTSIDE safety. The safety pass (D-026 / D-039) materializes oversize DataURLs, asserts no raw heavy content survived, and runs the token-budget guard. If corrections wrapped INSIDE safety, the safety pass would evaluate the PRE-correction request and any future correction that grows token count would slip past. With corrections outside, the safety pass sees the POST-correction request (the final outgoing payload reaching the driver) and its invariants apply to what actually leaves the runtime. Phase 34's quirks today are content-preserving (reordering, schema mutation, envelope translation, usage backfill); future quirks may not be. The outside-safety arrangement is the safe default.

  2. Single baked-in mode: no use_native toggle. Brief 03 §5 documented the predecessor's use_native_llm=True/False toggle, which shipped TWO implementations (LiteLLM and native) in parallel; that is exactly the "two parallel implementations of the same conceptual feature" pattern §13 rejects. Harbor picks one architecture (corrections.Wrap over a bifrost-backed driver) and compiles the per-provider quirks into a single layer. The operator's only choice is enable: true (production default) or enable: false (test-only escape hatch). The yaml field is a *bool so the loader distinguishes "operator omitted" (nil → default true) from "operator explicitly disabled."

  3. CorrectionsProfile lives on llm.ModelProfile, not in internal/llm/corrections. Two reasons: (a) Import-cycle avoidance — corrections imports llm; placing the profile TYPE on ModelProfile in the llm package lets the corrections sub-package consume it without a back-edge. (b) Single source of truth — ModelProfile already carries JSONSchemaMode (Phase 35), DefaultMaxTokens (Phase 36b), ReasoningEffort (Phase 33), CostOverrides (Phase 36a). The corrections fields belong in the same bundle so an operator's harbor.yaml model_profiles[<model>]: block is the one canonical place per-model quirks land. The corrections LOGIC stays in internal/llm/corrections/.

  4. Hook-registered wrapper, blank-imported in cmd/harbor/main.go. llm.RegisterCorrectionsWrapper(fn) is the seam: the corrections package's init() calls it with Wrap. Production binaries blank-import _ "github.com/hurtener/Harbor/internal/llm/corrections" so the registration fires at boot. Tests that exercise the safety pass in isolation set cfg.DisableCorrections = true; tests that exercise the corrections layer directly call corrections.Wrap without going through llm.Open. This pattern mirrors §4.4's driver-registry seam — write-once-at-init, blank-import for production wiring, opt-out for tests.

Inverse-naming the snapshot field DisableCorrections (instead of CorrectionsEnabled) means the zero-value matches the production default: programmatic snapshot construction in tests does not have to flip an extra knob to get correct behaviour. The config loader's *bool Enabled field resolves to DisableCorrections = !*Enabled at the boundary (Phase 64+ implements the mapping; today the snapshot is constructed directly by tests).
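The write-once hook and the inverse-named mapping can be sketched as follows. The simplified LLMClient, the panic on a second registration, and the LLMCorrectionsConfig shape are assumptions for illustration; the entry only states that the hook is written once at init.

```go
package main

import (
	"fmt"
	"sync"
)

// LLMClient and WrapFunc are simplified stand-ins for the llm package types.
type LLMClient interface{ Complete(prompt string) (string, error) }
type WrapFunc func(LLMClient) LLMClient

var (
	hookMu             sync.Mutex
	correctionsWrapper WrapFunc
)

// RegisterCorrectionsWrapper is write-once: in this sketch a second
// registration panics at init, the way a duplicate driver registration would.
func RegisterCorrectionsWrapper(fn WrapFunc) {
	hookMu.Lock()
	defer hookMu.Unlock()
	if correctionsWrapper != nil {
		panic("llm: corrections wrapper registered twice")
	}
	correctionsWrapper = fn
}

// LLMCorrectionsConfig mirrors the yaml shape: a nil Enabled means "omitted".
type LLMCorrectionsConfig struct{ Enabled *bool }

// disableCorrections maps the *bool onto the inverse-named snapshot field, so
// both "omitted" and a zero-value snapshot mean "corrections on".
func disableCorrections(cfg LLMCorrectionsConfig) bool {
	return cfg.Enabled != nil && !*cfg.Enabled
}

func main() {
	RegisterCorrectionsWrapper(func(c LLMClient) LLMClient { return c })
	off := false
	fmt.Println(disableCorrections(LLMCorrectionsConfig{}))              // false: omitted means enabled
	fmt.Println(disableCorrections(LLMCorrectionsConfig{Enabled: &off})) // true: explicitly disabled
}
```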


D-042 — Custom OpenAI-compatible providers: operator-declared via yaml, OpenAI base-type only (Phase 33a), per-provider network knobs override global NetworkDefaults, env var resolves at New time

Date: 2026-05-12 Status: Settled Where it lives: RFC §6.5, docs/plans/phase-33a-custom-providers.md, internal/config/config.go (LLMCustomProviderConfig + LLMNetworkDefaults), internal/config/validate.go (cross-check against native ∪ custom names; nativeBifrostProviders mirror; allowedCustomBaseProviderTypes), internal/llm/registry.go (CustomProviderSpec + NetworkDefaults on ConfigSnapshot), internal/llm/drivers/bifrost/account.go (Account widened to support custom primary; buildCustomProviderConfig; customByName table), brief 03 §"Provider catalog", brief 08 §"Architecture".

Why: Phase 33 shipped a thin bifrost adapter for the native provider list (OpenAI / OpenRouter / Anthropic / Cohere / Mistral / NIM / etc.). Operators want to wire OpenAI-compatible endpoints (NIM as the canonical first case, plus vLLM, ollama, lm-studio, in-house gateways) without per-provider Go code. Bifrost ships schemas.CustomProviderConfig for exactly this use case — Phase 33a exposes the operator-tunable subset. Four design calls warrant a settled entry.

  1. OpenAI-compatible base type only at Phase 33a. LLMCustomProviderConfig.BaseProviderType defaults to "openai" and only that value is accepted at this phase. Bifrost itself supports Anthropic / Mistral / etc. as base types for custom providers; widening Harbor's surface is a Phase 33b/c task once we have evidence operators need it. The narrow surface today avoids fighting yaml when no one's calibrated the per-base-type quirks yet. The validator's allowedCustomBaseProviderTypes map gates this; widening is a one-line table edit + a phase plan.

  2. Per-provider network knobs override global NetworkDefaults. Phase 33a unifies Timeout / MaxRetries / RetryBackoff* / Concurrency / BufferSize under one operator-facing surface (llm.network_defaults) with per-provider overrides on each custom entry. Zero-valued per-provider fields fall through to the global; zero-valued globals fall through to bifrost's package-level defaults. The fallthrough order (per-provider > global > bifrost-default) is identical for native primary and custom primary — operators tune them with one mental model. The motivating case is NIM cold-start latency (often > 60s); a 180-second per-provider Timeout on the NIM entry survives the cold-start without pulling every other provider's timeout up.

  3. API key resolution: env.NAME for native primary, raw env var NAME for custom providers. The native primary path (Phase 33) inherited LLMConfig.APIKey with the env.NAME form because the field overloads literal-or-env. Custom providers have a dedicated APIKeyEnvVar field; operators write the env var NAME directly (e.g. "NVIDIA_API_KEY", NOT "env.NVIDIA_API_KEY"). This is one indirection shorter and avoids the literal-vs-env ambiguity for the multi-provider case. The validator rejects env. prefixes on APIKeyEnvVar with a clear error so the operator notices the asymmetry. Both forms resolve os.Getenv(NAME) at New time; missing env vars fail closed with ErrMissingAPIKey naming the unset variable.

  4. GetConfiguredProviders returns the single PRIMARY provider only — D-040 preserved. Phase 33a's Account holds a customByName map of every declared custom provider but GetConfiguredProviders returns only the one named by LLMConfig.Provider. Multi-provider routing within a single Harbor instance is a future extension; the seam (the table, the per-provider config resolution) is ready but Phase 33a does not commit to multi-routing semantics. This keeps D-040's "single-provider per Harbor instance" intact while making the future widening additive (no API change to GetConfiguredProviders — just return the full table when the time comes).

The operator-facing BaseURL gotcha lands in this entry too: bifrost's OpenAI provider appends /v1/chat/completions to whatever BaseURL the operator sets. Operators write the HOST root (https://integrate.api.nvidia.com) — NOT the full /v1/ path — for the canonical case. Endpoints whose URL already includes /v1 use RequestPathOverrides to override the suffix. The example yaml documents this; the wire-level integration test asserts the path is /v1/chat/completions (not /v1/v1/...).

Sub-second Timeout values are truncated to zero by bifrost's int(seconds) conversion at the NetworkConfig.DefaultRequestTimeoutInSeconds boundary. Operators who need sub-second timeouts must wait for Phase 33b's NetworkConfig widening; today the practical minimum is 1 second. The custom-provider wire timeout test uses a 1s timeout against a 3s server sleep to stay clear of this boundary.


D-043 — LLM-edge compose order: retry(downgrade(corrections(safety(driver)))); OutputMode.Tools is Harbor-side prompted output, not provider tool-calling; Validator is a CompleteRequest field

Date: 2026-05-12 Status: Settled Where it lives: RFC §6.5, docs/plans/phase-35-structured-output.md, docs/plans/phase-36-retry-feedback.md, internal/llm/llm.go (OutputMode enum + CompleteRequest.Validator field + ModelProfile.OutputMode/MaxRetries), internal/llm/registry.go (RegisterDowngradeWrapper + RegisterRetryWrapper + the compose chain in Open), internal/llm/output/ (downgrade wrapper), internal/llm/retry/ (retry wrapper), internal/llm/errors.go (IsInvalidJSONSchemaError + new sentinels), internal/llm/events.go (ModeDowngradedPayload + RetryWithFeedbackPayload), brief 03 §6, brief 07.

Why: Phases 35 + 36 ship two new wrappers on top of Phase 32's safetyClient and Phase 34's corrections.Wrap. Three design calls warrant a settled entry.

  1. Compose order — retry(downgrade(corrections(safety(driver)))). Three principles drive the order, outermost first:

    • Retry is outermost. A validator-driven retry appends a corrective user turn to the conversation; the new turn must flow through corrections + downgrade + safety on each attempt. Corrections normalize message ordering (NIM rejects mid-thread system messages); if retry sat INSIDE corrections, the corrected message slice would be augmented with the corrective turn AFTER the reorder, breaking the invariant on the second attempt.
    • Downgrade sits between retry and corrections. A downgrade rewrites ResponseFormat (e.g. json_schema → json_object + a system-prompt instruction); corrections then re-shape the per-provider envelope for the rewritten format (Anthropic envelope translation; JSONOnly stash-the-schema hint). If downgrade sat INSIDE corrections, the corrections layer would only see the ORIGINAL format; the downgraded format would skip the per-provider shaping.
    • Corrections sit between downgrade and safety. Settled by D-041. The safety net (D-039 / mandatory-by-construction) sees the post-corrections request — leak-detection and the token-budget guard apply to the final outgoing payload regardless of whether downgrade or retry rewrote it.

    The chain is composed in llm.Open via three write-once hooks: RegisterCorrectionsWrapper (Phase 34), RegisterDowngradeWrapper (Phase 35), RegisterRetryWrapper (Phase 36). The wrappers self-register via init() in their respective sub-packages; cmd/harbor/main.go blank-imports them. The ConfigSnapshot.DisableDowngrade / DisableRetry inverse-named knobs (zero-value = enabled) let tests exercise lower layers in isolation.

  2. OutputMode.Tools is a Harbor-side prompted-output strategy, NOT provider tool-calling. RFC §6.4 + brief 07 keep tool dispatch runtime-side. OutputMode.Tools asks the model to emit {"name":"respond_with","arguments":{...}} as plain JSON output (parsed locally by the runtime); the bifrost driver never sees provider-native tools= / tool_choice= / function_call / tool_use parameters. The static guard in scripts/smoke/phase-35.sh greps internal/llm/output/ for the canonical provider-tool-call symbol names; a leak fails the smoke. The package godoc names the boundary explicitly so future readers don't reintroduce the violation by reaching for bifrost's native tool-call API.

  3. Validator is a field on CompleteRequest, not a separate method. Two alternatives were considered: (a) Validator func(CompleteResponse) error field on CompleteRequest — the retry wrapper runs the loop internally; (b) a client.Validate(resp) error method on LLMClient — callers run the loop themselves. Option (a) wins because Phase 36a / governance wraps the OUTER client with PreCall / PostCall hooks — the retry loop must stay INSIDE the governance wrapper so each retry's call counts against the identity budget. Surfacing the loop as a caller-driven Validate method would leak retry semantics to governance and require every caller to re-implement the bounded loop. The field-on-request shape also lets the validator be nil (the common case) — the wrapper becomes a pure pass-through with one branch.
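The compose order falls out of wrapping innermost-first, which a toy Go sketch makes visible. Each hypothetical layer brackets the inner result with its name; the hook variables stand in for RegisterCorrectionsWrapper, RegisterDowngradeWrapper, and RegisterRetryWrapper, and the Disable* knobs follow the inverse-naming convention.

```go
package main

import "fmt"

// Client is a simplified stand-in for llm.LLMClient.
type Client interface{ Complete(prompt string) string }

type clientFunc func(string) string

func (f clientFunc) Complete(p string) string { return f(p) }

type Wrapper func(Client) Client

// layer builds a hypothetical wrapper that brackets the inner result with its
// name, which makes the nesting visible in the output.
func layer(name string) Wrapper {
	return func(next Client) Client {
		return clientFunc(func(p string) string { return name + "(" + next.Complete(p) + ")" })
	}
}

// ConfigSnapshot uses inverse-named knobs: the zero value enables everything.
type ConfigSnapshot struct{ DisableCorrections, DisableDowngrade, DisableRetry bool }

// In the real design the sub-packages fill these hooks from init().
var correctionsHook, downgradeHook, retryHook = layer("corrections"), layer("downgrade"), layer("retry")

// open wraps innermost-first, so the last wrapper applied is the outermost.
func open(driver Client, snap ConfigSnapshot) Client {
	c := layer("safety")(driver) // safety is mandatory: there is no knob for it
	if correctionsHook != nil && !snap.DisableCorrections {
		c = correctionsHook(c)
	}
	if downgradeHook != nil && !snap.DisableDowngrade {
		c = downgradeHook(c)
	}
	if retryHook != nil && !snap.DisableRetry {
		c = retryHook(c)
	}
	return c
}

func main() {
	driver := clientFunc(func(string) string { return "driver" })
	fmt.Println(open(driver, ConfigSnapshot{}).Complete("hi"))
	// retry(downgrade(corrections(safety(driver))))
	fmt.Println(open(driver, ConfigSnapshot{DisableDowngrade: true}).Complete("hi"))
	// retry(corrections(safety(driver)))
}
```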

The wrapper's corrective sub-prompt template is fixed at Phase 36: an assistant turn echoes the rejected content, then a user turn says "Your previous response failed validation: <truncated reason>. Please respond again, addressing this issue exactly." Tuning is post-V1; operators who need a different template can shadow-wrap the retry layer in their own code.

IsInvalidJSONSchemaError is the boundary the downgrade wrapper uses to classify driver errors. The classifier checks (1) errors.Is(err, ErrInvalidJSONSchema) for drivers that wrap with the sentinel, and (2) a small case-insensitive substring allowlist (json_schema, json schema, invalid schema, schema validation, response_format, response format, structured output, json mode, json_object). The allowlist is deliberately narrow to avoid false-positive downgrades on transient / auth / 5xx failures. Drivers can tighten the classification by wrapping their provider-specific schema errors with ErrInvalidJSONSchema — Phase 33's bifrost driver is a §17.6 follow-up candidate.

ResponseFormatProfile.ResponseFormatJSONOnly (Phase 34) and OutputMode.Prompted (Phase 35) are deliberately distinct concepts. JSONOnly is a corrections-layer per-provider quirk: "this provider rejects json_schema at the wire level, surface schema as Extra["schema_hint"]." Prompted is a Harbor-side output-mode strategy: "skip native schema enforcement entirely, instruct the model to emit JSON matching the schema via system prompt." They compose: a Prompted request flowing through a JSONOnly profile would have the schema both in the system prompt (Prompted's job) and in Extra["schema_hint"] (JSONOnly's job, when a FormatJSONSchema survives). Operators who want one or the other (not both) set OutputMode and leave the corrections profile default, or vice versa.


<!-- Append new entries below this line in the form:

D-NNN — <one-line summary>

Date: YYYY-MM-DD Status: Settled | Tentative | Superseded by D-MMM | Reverted Where it lives: <files> Why: <2-3 sentences> -->

D-044 — Governance ships latent at V1: interface + math wired, every enforcement path is opt-in; PostCall is the in-band cost accumulator; compose order governance(retry(downgrade(corrections(safety(driver)))))

Date: 2026-05-12 Status: Settled Where it lives: RFC §6.15, docs/plans/phase-36a-cost-accumulator.md, docs/plans/phase-36b-rate-limit-maxtokens.md, internal/governance/ (Subsystem + Wrap + CostAccumulator + RateLimiter + MaxTokensEnforcer + Compound + registry + events + errors), internal/llm/registry.go (RegisterGovernanceWrapper + the new outermost compose step in Open), internal/config/config.go (GovernanceConfig.IdentityTiers + DefaultTier + GovernanceTierConfig + GovernanceRateLimitConfig), internal/config/validate.go (the tier validator block), examples/harbor.yaml (the latent-default + commented opt-in block), brief 03 §6, brief 06 §3.

Why: Phases 36a + 36b establish Harbor's governance subsystem (cost ceilings, rate limits, per-call MaxTokens) wrapping the LLM-edge chain. Four design calls warrant a settled entry.

  1. Latent V1 default. The interface, accumulator math, token-bucket math, persistence (three-driver state-store conformance), event taxonomy (governance.budget_exceeded / governance.rate_limited / governance.maxtokens_exceeded), and the compose seam all ship in Wave 7b. Every enforcement path defaults to permit: an operator must populate Governance.IdentityTiers with at least one tier (and set DefaultTier or supply a custom TierResolver) for any policy to fire. Each tier's fields (BudgetCeilingUSD, RateLimit, MaxTokens) are independently opt-in. This is the Wave 7b scoping decision: V1 ships plumbing that operators can see, but enforcement waits on operator policy. Future Protocol-driven setters (post-V1 phase 91) will let Console flip tiers without a restart.

  2. PostCall is the in-band cost accumulator path — NOT a subscription to llm.cost.recorded. The cost-recorded event fires from the bifrost driver (Phase 33's emitCostRecorded) and remains the operator-facing observability stream. The governance accumulator updates synchronously in PostCall per RFC §6.15 line 1128 ("PostCall... Accumulates cost / tokens / latency"). A subscriber-based accumulator opens a race window where the next PreCall checks the ceiling before the previous call's cost lands; ceiling enforcement correctness requires synchronous update. The atomic CAS (math.Float64bits + CompareAndSwap) lets concurrent PostCalls accumulate lock-free.

  3. Compose order governance(retry(downgrade(corrections(safety(driver))))). Governance is the OUTERMOST wrapper, sitting outside Phase 36's retry per D-043 + master plan line 420. A PreCall that fires ErrBudgetExceeded MUST short-circuit before retry / downgrade burn attempts; rejecting once is the correct semantics. PostCall runs after the entire downstream chain returns, so it sees the final outcome (post-downgrade, post-retry). governance.SetFactory is the per-process hook; cmd/harbor blank-imports internal/governance so the wrapper hook is installed at boot. With no factory set, the hook is a pass-through (the latent default: importing the package does not by itself enforce anything).

  4. Concurrent-call ceiling overshoot is bounded, not zero. The PreCall→inner→PostCall sequence creates a race where N concurrent in-flight calls can each see "below ceiling" before any PostCall lands. The accumulator overshoots by at most in_flight × per_call_max_cost. The conformance test asserts total ≤ ceiling + N × per_call_max_cost rather than strict equality. Operators who need first-cross-blocks-everyone semantics get them post-V1 via the unified pause/resume primitive (RFC §6.15 line 1181) — V1 ships eventually-consistent ceilings. governance.budget_exceeded events emit only from PreCall on the NEXT call after a breach; a PostCall that pushes the accumulator over the ceiling is accepted (the call already happened) and the breach surfaces via the cost-recorded observability stream.

The MaxTokens semantic is fail-loud, not clamp (master plan line 420 + RFC §6.15 line 1122 both say ErrMaxTokensExceeded). Refunds on call failure are out of scope (RFC §6.15 simplicity: drain-on-PreCall is final). State persistence is one record per identity (Kind=governance.cost for the accumulator, Kind=governance.bucket for buckets), JSON-encoded for cross-driver byte stability; the wire shape carries a schema version field for forward compatibility.

governance.NewCompound(subs...) bundles MaxTokensEnforcer (cheapest reject — no state I/O), RateLimiter (per-key mutex + per-identity state write), and CostAccumulator (state I/O on every PostCall) into one Subsystem. Fan-out order is operator-driven; the default ordering puts cheapest-first so a likely rejection short-circuits before reaching the state-heavy accumulator.

D-045 — Skills LocalDB driver owns its own tables (no piggyback on StateStore); FTS5 detected at open with deterministic regex/exact fallback

Date: 2026-05-12 Status: Settled Where it lives: RFC §6.7, docs/plans/phase-37-skills-store.md, internal/skills/skills.go (Deps has no State field), internal/skills/drivers/localdb/localdb.go (driver opens its own DB), internal/skills/drivers/localdb/migrations/0001_init.sql (own skills + skills_fts schema), internal/skills/drivers/localdb/search.go (FTS5 → regex → exact ladder), brief 04 §4.3 + §4.4.

Why: Phase 37 lands the SkillStore subsystem. Two design calls warrant a settled entry.

  1. D-034 analog: skills drivers own their tables; the Deps struct does NOT carry a StateStore. Memory's Phase 25 settled the precedent — persistent memory drivers own a dedicated memory_state table rather than piggybacking on the StateStore's state_records shape (D-034). The skills LocalDB driver follows the same pattern with a dedicated skills + skills_fts schema. Three reasons compound:

    • Schema fit. Skill has 20+ load-bearing columns (Origin, OriginRef, Scope, ScopeTenantID, ScopeProjectID, ContentHash, JSON-encoded slices, lifecycle timestamps). The StateStore's (Quadruple, Kind, Bytes) envelope means every column lookup is a JSON probe — fine for opaque memory blobs, not for an indexed FTS5 corpus.
    • FTS5 needs a real table. The FTS5 virtual table uses content='skills' content_rowid='rowid' external-content mode + INSERT/DELETE/UPDATE triggers to mirror the skills table. Building this against state_records would require a phantom-content table and a custom rowid mapping; the per-driver skills schema is cleaner.
    • Cross-driver portability. The Portico SkillStore driver (post-V1) talks to a remote MCP server and has no StateStore need either — keeping the seam free of StateStore obligations widens the door for future drivers (Git, OCI, HTTP).
  2. FTS5 availability detected at open via a probe query; the ranking ladder gracefully falls through to regex/exact when FTS5 is unavailable. brief 04 §4.4 mandates the fallback test. modernc.org/sqlite compiles FTS5 in by default, so the production path always uses FTS5; the fallback is a correctness gate for builds without FTS5 (and for tests that force ftsAvailable = false via the internal test surface). There is no operator-facing knob to force the regex/exact path; that would be a case of "two parallel implementations of the same feature" (an AGENTS.md §13 forbidden practice). Detection is mechanical: SELECT count(*) FROM skills_fts WHERE skills_fts MATCH '__fts_probe__' either succeeds (FTS5 alive) or errors (the migration's CREATE VIRTUAL TABLE rolled back on a build without FTS5).
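A sketch of the fallback ladder. It assumes the FTS5 query is injected as a hypothetical ftsFunc. It also assumes that "exact" means a literal, case-insensitive match, used when the query does not compile as an RE2 pattern. Both readings are assumptions, and the real search.go may rank differently.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// Skill keeps only the searchable columns used here.
type Skill struct{ Name, Description string }

// ftsFunc is a hypothetical hook for the skills_fts MATCH query.
type ftsFunc func(query string) ([]Skill, error)

// search picks one rung per call: FTS5 when the open-time probe succeeded,
// else a regex scan, else a literal scan when the query is not a valid
// pattern. Results are name-sorted so the fallback stays deterministic.
func search(query string, corpus []Skill, ftsAvailable bool, fts ftsFunc) ([]Skill, error) {
	if ftsAvailable && fts != nil {
		return fts(query)
	}
	var match func(string) bool
	if re, err := regexp.Compile("(?i)" + query); err == nil {
		match = re.MatchString
	} else {
		q := strings.ToLower(query)
		match = func(s string) bool { return strings.Contains(strings.ToLower(s), q) }
	}
	var hits []Skill
	for _, s := range corpus {
		if match(s.Name) || match(s.Description) {
			hits = append(hits, s)
		}
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Name < hits[j].Name })
	return hits, nil
}

func main() {
	corpus := []Skill{
		{"rollback", "Undo a bad deploy"},
		{"deploy-service", "Roll out a service"},
		{"c++-build", "Build C++ targets"},
	}
	hits, _ := search("deploy", corpus, false, nil) // regex rung
	fmt.Println(hits)                               // [{deploy-service Roll out a service} {rollback Undo a bad deploy}]
	hits, _ = search("c++", corpus, false, nil)     // "c++" is invalid RE2, so the literal rung runs
	fmt.Println(hits)                               // [{c++-build Build C++ targets}]
}
```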

D-046 — Skill ContentHash is sha256 over canonicalised content fields, excluding Origin / OriginRef / Scope / lifecycle timestamps

Date: 2026-05-12 Status: Settled Where it lives: RFC §6.7, docs/plans/phase-37-skills-store.md, internal/skills/wire.go (CanonicalContentHash), internal/skills/drivers/localdb/localdb.go (LWW + idempotency check uses the hash), brief 04 §4.8.

Why: Conflict policy needs a deterministic gate. brief 04 §4.8 says "Generated → Generated: last-write-wins gated by content_hash change" — the hash must be stable across re-imports (the same Skills.md pack imported twice produces the same hash) and resilient to caller-side normalisation noise.

The canonical hash envelope:

  • Included: Name, Title, Description, Trigger, TaskType, sorted Tags, ordered Steps, ordered Preconditions, ordered FailureModes, sorted RequiredTools, sorted RequiredNS, sorted RequiredTags, Extra (key-sorted text rendering).
  • Excluded: Origin, OriginRef, Scope, ScopeTenantID, ScopeProjectID — provenance metadata that legitimately differs across import paths without representing content drift.
  • Excluded: CreatedAt, UpdatedAt, LastUsed, UseCount — lifecycle state that evolves over a row's life.

Slice fields are sorted before hashing when ordering is non-semantic (Tags, RequiredTools, RequiredNS, RequiredTags); preserved when ordering is semantic (Steps, Preconditions, FailureModes — these are procedural prose rendered to the planner in declared order). Field separator is \x1f (ASCII unit-separator) so caller-supplied newlines / whitespace / pipes can't collide with the envelope framing.

Extra participates because the generator may stamp model-specific metadata there that legitimately differs between drafts even when the body text is identical (e.g. the model fingerprint that produced the skill). The renderer accepts string / int / int64 / float64 / bool / nil and substitutes <unhashable> for anything else so a caller-side type bug yields a stable hash rather than a panic or non-deterministic ordering.
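A minimal sketch of the envelope, assuming a trimmed Skill struct. It also adds a length prefix to each list so that elements cannot shift between adjacent lists. That framing detail is an assumption, not taken from wire.go.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strconv"
	"strings"
	"time"
)

// Skill is trimmed to the hashed fields plus two excluded ones for contrast.
type Skill struct {
	Name, Title, Description, Trigger, TaskType   string
	Tags, RequiredTools, RequiredNS, RequiredTags []string // order is not semantic
	Steps, Preconditions, FailureModes            []string // order is semantic
	Extra                                         map[string]any
	Origin                                        string    // excluded: provenance
	UpdatedAt                                     time.Time // excluded: lifecycle state
}

const unitSep = "\x1f" // ASCII unit separator frames every field

func sortedCopy(in []string) []string {
	out := append([]string(nil), in...)
	sort.Strings(out)
	return out
}

// renderScalar accepts the listed scalar types and marks anything else.
func renderScalar(v any) string {
	switch x := v.(type) {
	case nil:
		return "null"
	case string:
		return strconv.Quote(x)
	case int:
		return strconv.Itoa(x)
	case int64:
		return strconv.FormatInt(x, 10)
	case float64:
		return strconv.FormatFloat(x, 'g', -1, 64)
	case bool:
		return strconv.FormatBool(x)
	default:
		return "<unhashable>"
	}
}

func CanonicalContentHash(s Skill) string {
	var b strings.Builder
	field := func(v string) { b.WriteString(v); b.WriteString(unitSep) }
	// Length-prefix each list so elements cannot migrate between adjacent lists.
	list := func(vs []string) {
		field(strconv.Itoa(len(vs)))
		for _, v := range vs {
			field(v)
		}
	}
	for _, v := range []string{s.Name, s.Title, s.Description, s.Trigger, s.TaskType} {
		field(v)
	}
	list(sortedCopy(s.Tags))
	list(s.Steps)
	list(s.Preconditions)
	list(s.FailureModes)
	list(sortedCopy(s.RequiredTools))
	list(sortedCopy(s.RequiredNS))
	list(sortedCopy(s.RequiredTags))
	keys := make([]string, 0, len(s.Extra))
	for k := range s.Extra {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		field(strconv.Quote(k) + "=" + renderScalar(s.Extra[k]))
	}
	sum := sha256.Sum256([]byte(b.String()))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := Skill{Name: "deploy", Tags: []string{"ops", "k8s"}, Steps: []string{"build", "ship"}, Origin: "pack-a"}
	b := a
	b.Tags = []string{"k8s", "ops"} // same set, different order
	b.Origin, b.UpdatedAt = "pack-b", time.Now()
	fmt.Println(CanonicalContentHash(a) == CanonicalContentHash(b)) // true
	c := a
	c.Steps = []string{"ship", "build"} // procedural order changed
	fmt.Println(CanonicalContentHash(a) == CanonicalContentHash(c)) // false
}
```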

The hash version is implicit at V1 — changes to the envelope format (adding / removing / reordering fields) require a 0002_*.sql migration that rehashes existing rows AND a new decisions entry naming the old/new envelope. Operators with frozen content_hash values in external systems are explicitly out of scope at V1; we cross that bridge when a downstream system surfaces the hash externally.


D-047 — Planner package owns PauseReason, FinishReason, WakeMode, and the SpawnSpec shape; the TaskRegistry stays neutral on wake-mode

Date: 2026-05-12 Status: Settled Where it lives: RFC §6.2, RFC §6.3, RFC §3.2, docs/plans/phase-42-planner-iface.md, internal/planner/planner.go (PauseReason, FinishReason), internal/planner/wake.go (WakeMode, WakeAware, ResolveWakeMode), internal/planner/decision.go (SpawnSpec wrapping the planner-side subset of tasks.SpawnRequest), brief 02 §2.

Why: Phase 42 lands the planner's swappable seam (CLAUDE.md §1 / RFC §3.2). Four design calls warrant a settled entry.

  1. PauseReason lives in the planner package, not in a pauseresume package. The unified pause/resume primitive (later phase) is not yet shipped; brief 02 §2 sketches PauseReason as a planner-local type. Phase 42 follows the sketch — the four canonical values (approval_required, await_input, external_event, constraints_conflict) live in internal/planner/planner.go. When the unified pauseresume phase lands, it MAY canonicalise via a typedef bridge (pauseresume.Reason = planner.PauseReason) without changing call sites. The enum values match the RFC §6.3 canonical strings exactly, so the bridge is byte-stable.

  2. SpawnSpec is a planner-side projection of tasks.SpawnRequest, not a duplicate type. Brief 02 §2 sketches SpawnTask{ Kind TaskKind; Spec TaskSpec } as planner-local types. Phase 42 departs: SpawnTask.Kind is the production tasks.TaskKind; SpawnTask.Spec is planner.SpawnSpec (the planner-visible subset: Description, Query, Priority, RetainTurn, FailFast). The Runtime fills the rest of tasks.SpawnRequest (Identity from the run quadruple; IdempotencyKey from the planner step counter; PropagateOnCancel from the default; NotifyOnComplete from the spawn intent) at dispatch time. Duplicating tasks.TaskKind in the planner would be a §13 "two parallel implementations of the same conceptual feature" smell. internal/tasks is NOT an internal/runtime/... package, so the import is fine.

  3. WakeMode enum + optional WakeAware interface live in the planner package — the TaskRegistry stays neutral (D-032). D-032 settled that the wake-on-resolution strategy (push / poll / hybrid) is a planner-concrete concern, not a registry concern. Phase 42's internal/planner/wake.go ships the canonical enum + the optional WakeAware interface a concrete may implement to declare its mode. The conformance pack (Phase 49) uses planner.ResolveWakeMode(planner.Planner) WakeMode (which falls back to WakePush for concretes that skip WakeAware) to assert the round-trip. The WakeAware interface is NOT a Supports* capability protocol (§4.4 forbids those when all V1 drivers implement everything) — it's identity / metadata for a single mode each concrete picks at construction time. The conformance assertion exercises BOTH branches (concretes with WakeAware AND concretes without).

  4. FinishReason is canonical at the planner edge, NOT at the Protocol edge. The Protocol's task.completed / task.failed event payloads (later phase) project FinishReason into a Protocol-stable representation; the planner-internal enum is the truth source. Phase 42's enum (goal, no_path, cancelled, deadline_exceeded, constraints_conflict) covers the V1 terminals; future phases (phase-44-schema-repair, phase-50-pauseresume) add no new reasons — every terminal collapses to one of these five. IsValidFinishReason is the validator the Runtime executor will use to reject malformed Decisions before dispatch.
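ResolveWakeMode's fallback (point 3) and the FinishReason validator (point 4) can be sketched as follows. The WakeMode string values, the constant names and the one-method Planner interface are assumptions. The five FinishReason strings are the ones listed in point 4.

```go
package main

import "fmt"

type WakeMode string

// String values are assumptions; the text names push / poll / hybrid.
const (
	WakePush   WakeMode = "push"
	WakePoll   WakeMode = "poll"
	WakeHybrid WakeMode = "hybrid"
)

// Planner is reduced to one method for the sketch.
type Planner interface{ Name() string }

// WakeAware is optional: a concrete implements it only to declare a mode.
type WakeAware interface{ WakeMode() WakeMode }

func ResolveWakeMode(p Planner) WakeMode {
	if wa, ok := p.(WakeAware); ok {
		return wa.WakeMode()
	}
	return WakePush // concretes that skip WakeAware default to push
}

type FinishReason string

const (
	FinishGoal                FinishReason = "goal"
	FinishNoPath              FinishReason = "no_path"
	FinishCancelled           FinishReason = "cancelled"
	FinishDeadlineExceeded    FinishReason = "deadline_exceeded"
	FinishConstraintsConflict FinishReason = "constraints_conflict"
)

func IsValidFinishReason(r FinishReason) bool {
	switch r {
	case FinishGoal, FinishNoPath, FinishCancelled, FinishDeadlineExceeded, FinishConstraintsConflict:
		return true
	}
	return false
}

type plain struct{}

func (plain) Name() string { return "plain" }

type poller struct{}

func (poller) Name() string       { return "poller" }
func (poller) WakeMode() WakeMode { return WakePoll }

func main() {
	fmt.Println(ResolveWakeMode(plain{}), ResolveWakeMode(poller{}))             // push poll
	fmt.Println(IsValidFinishReason("no_path"), IsValidFinishReason("timeout")) // true false
}
```

plain and poller exercise both branches that the conformance assertion is described as covering.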

Additionally, Phase 42 declares the planner-emitted event taxonomy (planner.decision, planner.finish, planner.error) in internal/planner/events.go and registers the types via events.RegisterEventType from the package init(). The payload structs land at Phase 45 (the first concrete that emits); registering the type names at Phase 42 lets future concretes emit without re-registering. The stub finish.Planner does not emit (Emit may be nil); concrete planners (Phase 45+) MUST nil-check before calling.

The §13 import-graph lint test (internal/planner/conformance/importgraph_test.go) is the gate that keeps internal/planner/... decoupled from internal/runtime/.... The test walks the planner subtree with go/parser and fails the build on any internal/runtime/... import. Concretes added at Phase 45 / 48 inherit the gate without re-authoring.
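A sketch of the gate using go/parser's ImportsOnly mode. The module path in runtimePrefix is hypothetical. scanTree is an illustrative helper for walking a real directory, while main exercises the check on in-memory sources.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"io/fs"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// runtimePrefix uses a hypothetical module path; substitute the real one.
const runtimePrefix = "example.com/harbor/internal/runtime"

// forbiddenImports parses only the import block of src and reports every
// import at or below prefix.
func forbiddenImports(filename, src, prefix string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var bad []string
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		if path == prefix || strings.HasPrefix(path, prefix+"/") {
			bad = append(bad, fmt.Sprintf("%s imports %s", fset.Position(imp.Pos()), path))
		}
	}
	return bad, nil
}

// scanTree applies forbiddenImports to every .go file under root, which is
// how a test over internal/planner/... would use it.
func scanTree(root, prefix string) ([]string, error) {
	var bad []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() || !strings.HasSuffix(path, ".go") {
			return err
		}
		src, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		hits, err := forbiddenImports(path, string(src), prefix)
		bad = append(bad, hits...)
		return err
	})
	return bad, err
}

func main() {
	clean := "package react\n\nimport \"example.com/harbor/internal/planner\"\n"
	dirty := "package react\n\nimport (\n\t\"fmt\"\n\t\"example.com/harbor/internal/runtime/exec\"\n)\n"
	for _, c := range []struct{ name, src string }{{"clean.go", clean}, {"dirty.go", dirty}} {
		hits, err := forbiddenImports(c.name, c.src, runtimePrefix)
		fmt.Println(c.name, len(hits), err) // clean.go 0 <nil>, then dirty.go 1 <nil>
	}
}
```

The prefix+"/" check keeps a sibling such as internal/runtimex from being flagged by accident.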

D-049 — Trajectory fail-loudly Serialize contract lives in internal/planner/trajectory/; process-local handle registry at V1; canonical JSON ordering; Phase 42 stub retired

Date: 2026-05-12 Status: Settled Where it lives: RFC §6.2, RFC §3.4, RFC §6.3, docs/plans/phase-43-trajectory.md, internal/planner/trajectory/trajectory.go (Trajectory + Step + nested types), internal/planner/trajectory/toolcontext.go (ToolContext split + HandleID), internal/planner/trajectory/registry.go (HandleRegistry interface + process-local driver), internal/planner/trajectory/errors.go (ErrUnserializable + ErrToolContextLost struct sentinels), internal/planner/trajectory/serialize.go (Serialize + Deserialize + reflective walker), internal/planner/trajectory.go (alias re-exports from the subpackage), brief 02 §4.

Why: Phase 43 closes the load-bearing predecessor-bug: the silent-context-loss path where a non-serialisable handle in pause state was dropped silently. Four design calls warrant a settled entry.

  1. The trajectory subsystem lives at internal/planner/trajectory/, not directly in internal/planner/. The master plan's Subsystem column reads planner/trajectory; Phase 42 shipped the type skeleton inline in internal/planner/trajectory.go (file, not subpackage) as a placeholder. Phase 43 moves the load-bearing types (Trajectory, Step, ToolContext, HandleID, HandleRegistry, ErrUnserializable, ErrToolContextLost) into the canonical subpackage so the §4.4 extensibility-seam pattern applies — future drivers (a distributed handle registry, alternate serialisers) land alongside the existing process-local driver. The legacy planner-package types become type aliases (type Trajectory = trajectory.Trajectory) so existing call sites compile unchanged. Phase 42's stub ErrTrajectoryNotImplemented is retired: the only consumer was Phase 42's own test, which Phase 43 updates to exercise the real fail-loudly contract.

  2. Trajectory.Serialize uses a reflective pre-flight walker; the walker drives the canonical fail-loudly contract. The stdlib json.Marshal reports non-encodable types via *UnsupportedTypeError / *UnsupportedValueError, which is adequate for a pass/fail outcome but not for the actionable field path the contract requires. Phase 43's walker recursively traverses the trajectory by reflect.Value, tracking the dotted field path ("Trajectory.Steps[3].Observation.callback"); on the first non-encodable leaf it returns (nil, ErrUnserializable{Field: <path>}). The walker mirrors encoding/json's encoding rules verbatim (chan / func / unsafe.Pointer / complex are rejected; nil interfaces / nil pointers / nil slices encode as JSON null; []byte encodes as base64; json.Marshaler implementers are probed; struct fields with json:"-" are skipped; cyclic graphs surface as ErrUnserializable{Field: ... <cycle>} via a visited-pointer-address map). On the happy path the walker passes; json.Marshal then produces the canonical bytes.

  3. HandleRegistry is process-local at V1; distributed-handle directory is a post-V1 RFC concern. RFC §6.3 already documents this constraint: "V1: process-local. Resume must run in the same Runtime process. The seam for a distributed handle directory exists (the registry is an interface) but no production driver ships at V1." Phase 43 ships the HandleRegistry interface (Set / Get / Delete) with one V1 driver — processLocalRegistry backed by sync.Map. The choice of sync.Map over map + RWMutex matches the read-heavy access pattern (one Set on tool dispatch, many Gets across pause/resume / planner steps); D-025 concurrent-reuse stress under -race is green with N=128. The fail-loud contract is Get returns (nil, ErrToolContextLost{Handle: id}) on miss — never (nil, nil). This is the load-bearing closure: the predecessor's try { ... } catch { return None } shape is rejected here, in Trajectory.Serialize, and in the planner-package alias re-exports — three places enforcing the same invariant.

  4. Canonical JSON ordering: declaration-order struct fields + alphabetised map keys. The stdlib encoding/json emits struct fields in declaration order (per JSON tag) and alphabetises map[string]X keys. Combined with explicit JSON tags on every Trajectory field, the canonical form is stable across re-encoding when any-valued fields hold JSON-tree shapes (map[string]any / []any / primitives). The runtime planner-step builder (later phase) follows this discipline; Phase 43's golden-bytes test uses the same shape and pins the canonical encoding. When any values hold Go structs, the first encoding (declaration-order) and the second encoding (alphabetised map after Deserialize) MAY diverge — the godoc on LLMContext / HintState / Step.Observation documents the discipline.

Round-trip byte stability is the load-bearing acceptance criterion from RFC §3.4 + brief 02 §4: Serialize → Deserialize → Serialize produces byte-identical output. The invariant is asserted in trajectory_test.go::TestRoundTrip_ByteStable against a populated trajectory using JSON-tree shapes throughout.

The §11 mandatory pause/resume serialisation test (toolcontext_test.go::TestPauseStateSerialisation_FailsLoudlyOnUnserializableContext) constructs a pause-state-shaped trajectory whose ToolContext.Serializable carries a live channel masquerading as a "config" value; asserts Serialize returns ErrUnserializable with the channel's key in the field path. The companion test TestResumeWithStaleHandle_ReturnsErrToolContextLost verifies the second half of the contract: a serialised trajectory carrying a HandleID whose registry mapping has died (simulated by a fresh HandleRegistry on the resume side) surfaces ErrToolContextLost on Get — never (nil, nil).

The D-025 concurrent-reuse contract is pinned in concurrent_test.go across four tests: N=128 goroutines serialising distinct trajectories, N=128 goroutines exercising HandleRegistry.Set/Get/Delete on disjoint IDs, N=128 goroutines reading a shared handle, and N=128 goroutines serialising a shared read-only trajectory. All four exit under -race with no leaks (baseline runtime.NumGoroutine restored), no context bleed, no byte-stability violations across concurrent invocations.

The §13 forbidden practice of "silent degradation" is closed by construction at three layers: Trajectory.Serialize (no try/catch → nil path), HandleRegistry.Get (no (nil, nil) return), and the planner-package alias re-exports (ErrUnserializable / ErrToolContextLost are public sentinels callers reach for via errors.As). Phase 51's pause-record contract (later phase) consumes this phase's Serialize bytes; Phase 51 inherits the fail-loud invariants without re-authoring.


D-050 — Repair ladder ordering (salvage → schema repair → graceful failure → multi-action salvage); graceful failure is Finish{NoPath} not error; Followup carried via Metadata; parser+loop both live under internal/planner/repair/

Date: 2026-05-12 Status: Settled Where it lives: RFC §6.2 (Settled — "salvage → schema repair → graceful failure → multi-action salvage" + arg_fill_enabled / repair_attempts / max_consecutive_arg_failures knobs), docs/plans/phase-44-schema-repair.md, internal/planner/repair/repair.go (Config, RepairLoop, Run, gracefulFailure), internal/planner/repair/parser.go (ActionParser), internal/planner/events.go (EventTypePlannerRepairExhausted, RepairExhaustedPayload), brief 02 §6, brief 07 §3 + §8 + §10.

Why: Phase 44 lands the salvage / schema-repair / graceful-failure / multi-action-salvage ladder for planner steps. Four design calls warrant a settled entry.

  1. Ladder ordering is load-bearing: salvage → schema repair → graceful failure → multi-action salvage. RFC §6.2 states the ladder explicitly. The order is binding because each step's invariant depends on the prior step:

    • Salvage is FIRST because a malformed parse leaves the loop without typed []planner.CallTool to validate. The parser is the only tolerant pass in the ladder — it accepts fenced JSON (```json), prose-wrapped JSON, multi-object scans, and bare arrays. Brief 07 §3 catalogued the predecessor's parser modes; Phase 44's ActionParser.Parse ships them as the salvage step.
    • Schema repair is SECOND because validating args is meaningful only on a parsed action. The corrective sub-prompt names the tool + the validator's complaint verbatim ("argument X failed: <validator error>; please re-emit with the corrected field"), which is a focused signal the LLM can act on. Bounded by Config.RepairAttempts.
    • Graceful failure is THIRD (a terminal short-circuit, not a step) because brief 07 §10 catalogued the failure-mode-blind footgun in the predecessor: "if the model's response is consistently malformed, _repair_attempts (default 3) of identical-shape feedback may never converge." Phase 44's Config.MaxConsecutiveArgFailures is a separate counter from Config.RepairAttempts — identical-shape failures terminate via the consecutive-failure path even when the attempts budget is high. Default MaxConsecutiveArgFailures = 2 < RepairAttempts = 3 so the storm guard typically fires first.
    • Multi-action salvage is FOURTH (a packaging step, not a repair step) because it operates on the OUTPUT of salvage + repair. When the parser returned >1 well-formed CallTool and every one validates, the loop packages them as CallParallel{Branches: [...], Join: JoinAll} rather than re-asking the LLM. Concretes that want sequential salvage opt out by setting Config.ArgFillEnabled = false.
  2. Graceful failure is Finish{Reason: NoPath, Metadata["followup"]=true}, NOT an error. The repair loop's Run returns (planner.Decision, error). On graceful failure the loop returns (Finish{}, nil) — Finish IS the success path the planner contract describes. An error return would conflate two distinct conditions: (a) the LLM client surfaced a transient error (caller must retry / abort the run), vs. (b) the repair ladder exhausted (the planner step itself produced a terminal Finish that the runtime executor maps to task.completed with reason=no_path). Conflating them would break the planner contract (§13 forbids two-parallel-implementations of the same feature; here the feature is "what shape the planner returns at step end"). The planner.repair_exhausted event emit is the load-bearing observability surface — graceful failure is NOT silent (§13 silent-degradation ban). The event payload carries the attempt count, consecutive-failure counter, and truncated chain of validator reasons; operators see the failure loudly via the bus + audit pipeline.

  3. Followup carried via Metadata["followup"] = true, NOT a new field on planner.Finish. Brief 02 §6 spec'd Finish{Reason: NoPath, Followup: true} but Phase 42 froze the Finish struct (Reason, Payload, Metadata — D-047). Adding a Followup bool field would require touching every Phase 45 / 48 / 49 concrete and the conformance pack, and would re-litigate D-047. Metadata is already the documented surface for terminal-decision annotations (the stub finish.Planner uses it for run_id round-trip; Phase 45 will use it for the planner's free-form Reasoning hash). Phase 49's conformance pack reads Metadata["followup"] to detect the followup signal; glossary entries spell out the convention. Same applies to the auxiliary fields the loop stamps for observability: Metadata["repair_attempts"], Metadata["repair_consecutive_arg_failures"], Metadata["repair_chain"], Metadata["repair_error"].

  4. Parser + loop both live under internal/planner/repair/, NOT internal/runtime/planner/parser/. Brief 07 §8 sketched ActionParser at internal/runtime/planner/parser/. Phase 44 co-locates the parser with the loop under internal/planner/repair/. Three reasons:

    • Import-graph contract (Phase 42 settled). The planner subtree MUST NOT import internal/runtime/...; internal/planner/conformance/importgraph_test.go is the §13 gate. The parser is a planner-side utility (it produces planner.CallTool shapes the loop returns to the runtime executor); it cannot live at internal/runtime/planner/parser/ without breaking the gate.
    • Master-plan glossary lines 927 + 930. "ActionParser (internal/runtime/planner/parser/) | 44 (Schema repair pipeline) + 45 (Reference ReAct planner)" + "RepairLoop | 44 (Schema repair pipeline)". The path "internal/runtime/planner/parser/" in the glossary is pre-RFC nomenclature; the RFC's settled home is internal/planner/.... Co-locating with the loop matches the "owned in one phase, consumed in another" pattern that the master plan glossary describes.
    • Single-package cohesion. Parser + loop + feedback builder + events live in one Go package. The package's godoc describes the ladder; the implementation is one file per concern (repair.go, parser.go, feedback.go); the test files mirror that split (repair_test.go, parser_test.go, integration_test.go, d025_test.go). Splitting parser into a sibling package would force a public API on what is structurally an implementation detail of the repair loop.
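A sketch of the tolerant scan behind the salvage rung (point 1), assuming a simplified CallTool shape with tool and args keys. The scan jumps to each '{' or '[' and lets json.Decoder consume one complete value. It keeps objects, or array elements, that name a tool. That covers fenced JSON, surrounding prose, multiple objects and bare arrays. The helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// CallTool is a stand-in for planner.CallTool; the field names are assumptions.
type CallTool struct {
	Tool string         `json:"tool"`
	Args map[string]any `json:"args"`
}

var fence = strings.Repeat("`", 3) // built at runtime to keep the literal readable

// salvage scans raw for JSON values, skipping fences and prose.
func salvage(raw string) []CallTool {
	var out []CallTool
	rest := raw
	for {
		i := strings.IndexAny(rest, "{[")
		if i < 0 {
			return out
		}
		dec := json.NewDecoder(strings.NewReader(rest[i:]))
		var v json.RawMessage
		if err := dec.Decode(&v); err != nil {
			rest = rest[i+1:] // not a JSON value here; keep scanning
			continue
		}
		rest = rest[i+int(dec.InputOffset()):] // resume just past the decoded value
		out = append(out, toCalls(v)...)
	}
}

// toCalls accepts a single object or a bare array and drops entries with no tool.
func toCalls(v json.RawMessage) []CallTool {
	var one CallTool
	if err := json.Unmarshal(v, &one); err == nil {
		if one.Tool != "" {
			return []CallTool{one}
		}
		return nil
	}
	var many []CallTool
	if err := json.Unmarshal(v, &many); err != nil {
		return nil
	}
	var out []CallTool
	for _, c := range many {
		if c.Tool != "" {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	raw := "Sure! Here is the plan:\n" + fence + "json\n{\"tool\":\"search\",\"args\":{\"q\":\"harbor\"}}\n" +
		fence + "\nand also [{\"tool\":\"fetch\",\"args\":{}}]"
	for _, c := range salvage(raw) {
		fmt.Println(c.Tool, c.Args) // search map[q:harbor], then fetch map[]
	}
}
```

Two calls come back here. Per point 1, the loop would package them as a CallParallel only after both validate.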

Additionally, Phase 44 ships:

  • Config.ArgFillEnabled — opt-in. When false the loop returns the parser's first valid action(s) verbatim and lets the dispatcher's tool.invalid_args reject path handle schema misfits. Phase 45 (ReAct) defaults this to true; Phase 48 (Deterministic) defaults it to false (the deterministic planner does not consume LLM output, so the knob is structurally irrelevant).
  • Config.RepairAttempts default = 3 matching brief 07 §3 step 5's predecessor default. The storm guard is Config.MaxConsecutiveArgFailures = 2 < 3 so the typical malformed-shape session terminates after 2 LLM calls rather than burning the full 3.
  • planner.repair_exhausted event taxonomy. The event type registers in internal/planner/events.go::init() alongside planner.decision / planner.finish / planner.error (Phase 42 entries). The typed RepairExhaustedPayload (SafePayload) carries Identity, Attempts, ConsecutiveArgFailures, Reasons []string (each entry truncated to 256 bytes), OccurredAt. The payload struct ships in the same PR as the emit site — distinct from the Phase 42 pattern where payload structs deferred to Phase 45 — because Phase 44 IS the first emitter, so deferral would be a fail-loudly violation.
  • No two-parallel-retry-implementations (§13). The repair loop calls llm.LLMClient.Complete; the LLM client (composed at internal/llm/registry.go::Open) already has the Phase 36 retry-with-feedback wrapper inside. Repair is OUTSIDE the LLM call (it consumes the response); retry-with-feedback is INSIDE the LLM call (it wraps a single attempt). The smoke script guards against internal/planner/repair/ importing internal/llm/retry — composition stays at the registry edge.
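The interaction between the attempts budget and the storm guard can be sketched as below. runRepair, the attempt callback and the stand-in Config and Finish types are hypothetical. The sketch assumes "identical-shape" means the same validator reason twice in a row. It also returns a bool in place of the real (Decision, error) pair.

```go
package main

import "fmt"

// Config mirrors the two knobs discussed above.
type Config struct {
	RepairAttempts            int
	MaxConsecutiveArgFailures int
}

// Finish is a stand-in for planner.Finish.
type Finish struct {
	Reason   string
	Metadata map[string]any
}

// runRepair calls attempt once per LLM round; attempt returns "" when the
// args validate, else the validator's complaint. ok reports success.
func runRepair(cfg Config, attempt func(n int) string) (f Finish, ok bool) {
	var chain []string
	n, consecutive, prev := 0, 0, ""
	for n < cfg.RepairAttempts {
		n++
		reason := attempt(n)
		if reason == "" {
			return Finish{}, true
		}
		chain = append(chain, reason)
		if reason == prev {
			consecutive++
		} else {
			consecutive = 1
		}
		prev = reason
		if consecutive >= cfg.MaxConsecutiveArgFailures {
			break // storm guard: the same complaint keeps coming back
		}
	}
	// Graceful failure: a terminal Finish, not an error.
	return Finish{Reason: "no_path", Metadata: map[string]any{
		"followup":                        true,
		"repair_attempts":                 n,
		"repair_consecutive_arg_failures": consecutive,
		"repair_chain":                    chain,
	}}, false
}

func main() {
	cfg := Config{RepairAttempts: 3, MaxConsecutiveArgFailures: 2}
	f, ok := runRepair(cfg, func(int) string { return "argument q failed: required" })
	fmt.Println(ok, f.Reason, f.Metadata["repair_attempts"]) // false no_path 2
}
```

With the defaults, a repeated complaint stops after two LLM calls. Alternating complaints still run the full budget of three.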

The internal/planner/repair/d025_test.go ships the N=128 concurrent-reuse stress: one shared RepairLoop instance, per-goroutine identity quadruples, four response patterns (clean salvage / parser-correction / multi-action / graceful-failure), per-call identity round-trip assertion at three boundaries (the stub client's seen-ctx, the success-path Decision's Reasoning field, the graceful-failure-path RepairExhaustedPayload.Identity). Pre-cancelled ctxes on i%5==0 verify cancellation cross-talk is absent.


D-048 — Phase 38 planner-skill tools: split into three Tools (not a SkillProvider struct); default-deny capability filter; chars/4 budgeter aligned with §6.5 LLM safety net

Date: 2026-05-12 Status: Settled Where it lives: RFC §6.7, docs/plans/phase-38-skill-planner-tools.md, internal/skills/tools/tools.go (Register, searchHandler, getHandler, listHandler), internal/skills/tools/filter.go (capability subset gate), internal/skills/tools/redactor.go (tool-name + PII redaction), internal/skills/tools/budgeter.go (Fit ladder + ErrSkillTooLarge), brief 04 §4.5.

Why: Phase 38 lands the planner-facing surface for the skills subsystem. Three design calls warrant a settled entry.

  1. The planner-facing surface is three discrete Tools, not a SkillProvider struct. RFC §6.7's sketch shows a SkillProvider interface with Search / GetByName / List / Directory / FormatForInjection — modelled after the predecessor's monolithic provider. Phase 38 splits the surface across three Tools registered through the Phase 26 catalog (skill_search, skill_get, skill_list) plus Phase 39's Directory(cfg) API rather than a single struct. Two reasons:

    • Catalog dispatch uniformity. Every other Harbor tool (HTTP, MCP, A2A, in-process, flow) goes through the catalog; carving a separate dispatch path for skills would split the reliability shell (ToolPolicy — D-024) and the audit emit taxonomy (tool.invoked / tool.completed / tool.failed). Three Tools-on-the-catalog gives the planner the same observability surface as any other tool.
    • Per-tool ergonomics. skill_get carries the tiered budgeter; skill_search carries ranking; skill_list carries paging. A single SkillProvider.FormatForInjection would have to multiplex these concerns — three Tools keep each handler narrow. The capability filter + redactor are shared utilities (Filter, Redact), not a methods-on-a-struct API, so future tools (Phase 39 Directory, Phase 41 skill_propose) reuse them by calling.

    The departure from the RFC sketch is recorded here, not silent — future readers chasing the RFC's SkillProvider shape land here and see the rationale.

  2. Capability filter is default-deny. When CapabilityContext.AllowedTools / AllowedNamespaces / AllowedTags is empty, a skill with a non-empty Required* list on that axis is rejected. The predecessor's _skill_is_applicable documents the same stance ("required must be a subset of allowed"; an empty allowed-set admits only an empty required-set), and Phase 38 ports it verbatim. The alternative ("empty allowed = everything passes") would silently leak high-capability skills into low-capability runs the first time an operator forgot to populate the allowed-set; default-deny fails closed, matches CLAUDE.md §6 rule 9 ("identity is mandatory"), and is the only stance that survives an operator-config-bug audit. Skills with empty Required* lists on every axis are unconstrained: they neither carry nor demand a capability.

  3. The tiered budgeter uses the chars/4 token estimator, aligned with the §6.5 LLM safety net (D-026). Two alternatives existed: (a) a tokenizer-backed estimator (tiktoken / Anthropic counter) for precision; (b) chars/4 for simplicity. Phase 38 picks (b) because:

    • Consistency with the safety net. RFC §6.5's context-window safety net uses chars/4 as its budget envelope at V1 (D-026); the planner-side budgeter MUST agree on the cost model or the safety net would surface ErrContextWindowExceeded on payloads the budgeter accepted. Same estimator → coherent gate.
    • CGo-free constraint. Most production tokenizers (tiktoken-go, anthropic-tokenizer) pull in either a C library or a sizeable pre-trained vocabulary table. Both inflate the binary; the first also breaks the CGo-free constraint, and the second burdens the cold-start footprint. The chars/4 envelope is byte-counting: zero binary cost, deterministic, and a well-understood (if low-precision) industry heuristic.
    • Swappable point. The estimator lives in one function (tokensForcharsEstimate); a post-V1 swap-in via a tokenizer interface is a one-package change. We cross that bridge when an operator surfaces a real over-budget bug; until then, the safety net's chars/4 is the cost authority.

    The budgeter's ladder (full → drop optional → cap steps to 3 → ErrSkillTooLarge) ports brief 04 §4.5 verbatim. Step 4 fails loud per CLAUDE.md §5 — no silent degradation; the planner sees ErrSkillTooLarge wrapped and can either reformulate via LLM retry feedback or shrink its MaxTokens and retry.

CapabilityContext is a value type carried on the args of all three Tools; it is never mutated in flight and is safe to share across N goroutines (D-025). The Phase 38 helpers (Filter, Redact, Fit) are pure functions over value inputs, with no shared state and no closures over per-run data. Phase 39's Directory(cfg) will reuse Filter + Redact directly; Phase 41's skill_propose(persist=true) will reuse the validator path on the input draft.

D-051 — Phase 45 ReAct planner: JSON-only action format with _finish reserved tool name; single-tool-call-per-step (multi-action salvage reduced to first); MaxSteps circuit breaker + planner.max_steps_exceeded fail-loudly emit; WakePush declaration ships ahead of the SpawnTask emission path; SpawnTask / AwaitTask / RequestPause emission deferred to later phases

Date: 2026-05-12 Status: Settled Where it lives: RFC §6.2, RFC §3.2, docs/plans/phase-45-react-planner.md, internal/planner/react/react.go (ReActPlanner, FinishToolName, DefaultMaxSteps, DefaultSystemPrompt, the six functional options, Next, WakeMode, mapDecision, translateFinishCall, reduceToSingleAction, maxStepsExceeded, emitMaxStepsExceeded), internal/planner/react/prompt.go (PromptBuilder interface + defaultBuilder), internal/planner/events.go (EventTypePlannerMaxStepsExceeded, MaxStepsExceededPayload), internal/planner/conformance/conformance.go (Harness.RunContextFactory extension), brief 02 §2 + §4 + §5 + §6 + §7, brief 07 §2 + §3 + §5 + §10.

Why: Phase 45 lands Harbor's first concrete Planner implementation — the LLM-driven ReAct step loop that bridges Phase 32's LLMClient, Phase 43's Trajectory, Phase 44's RepairLoop, and Phase 42's Planner seam. Five design calls warrant a settled entry.

  1. JSON-only action format with _finish as a reserved prompt-time tool name — NOT a magic-string opcode in the Decision sum. Brief 02 §2 sketches the LLM emitting one of six Decision shapes (CallTool / CallParallel / SpawnTask / AwaitTask / RequestPause / Finish); brief 07 §3 catalogues the predecessor's parser stack and §5 documents the assistant/user-rendered observation shape. Phase 45 narrows the V1 prompt-emission surface to exactly two envelopes:

    • {"tool": "<name>", "args": {...}, "reasoning": "..."} — a tool call.
    • {"tool": "_finish", "args": {"answer": "..."}, "reasoning": "..."} — completion.

    The reserved _finish name is intercepted by the planner BEFORE it returns the Decision. react.translateFinishCall translates the parsed CallTool{Tool: "_finish"} into planner.Finish{Reason: planner.FinishGoal, Payload: <args.answer>} — the Decision sum stays sealed; the planner contract surfaces only the typed Finish shape. The predecessor's "magic strings as next_node" anti-pattern (D-047) is explicitly rejected: _finish lives in the LLM-prompt convention, NOT in the planner-internal Decision opcodes. The leading underscore is a documented hygiene convention; future runtime catalog registration MAY reject _-prefixed tool names to make the collision impossible. The integration with Phase 44's repair.RepairLoop flows naturally — the loop returns a CallTool with the reserved name; the planner's mapDecision switch detects and translates BEFORE the runtime executor would dispatch the reserved name as a real tool.

  2. Single-tool-call-per-step semantics with multi-action salvage reduced to the first action. RFC §6.2 + the Phase 45 master-plan detail block: "LLM call loop, JSON-only action format, tool selection, completion detection, single tool call per step. No parallel, no schema repair beyond a single retry." Phase 44's RepairLoop ships multi-action salvage as a CallParallel (D-050 — when the parser returns >1 well-formed CallTool and every one validates, the loop promotes to CallParallel{JoinAll}). Phase 45 overrides this at the planner concrete level: react.reduceToSingleAction collapses a CallParallel from the loop to its first CallTool. The rest are dropped — V1 minimum viable per the master-plan detail block. Three rationales:

    • No parallel executor exists yet. Phase 47 ships CallParallel execution (Deps: 45, 14); returning CallParallel from Phase 45 would prematurely commit to a runtime dispatch path with no executor. The error would surface as planner.ErrInvalidDecision at runtime dispatch time, but the planner's contract is to return a Decision the runtime CAN execute.
    • Unwind point is one method. reduceToSingleAction is the entire override surface: Phase 47 deletes the override and the rest of the planner is unchanged. The brief 02 §6 promise to "queue the additional read-only tool calls for sequential execution without another LLM hop" is revisited at Phase 47. Until then, the dropped actions are NOT surfaced as fallback context to the next prompt (a path forwarding the rejected actions would have no test coverage until Phase 47 lands).
    • Special case for the _finish first branch. When the first branch of a multi-action salvage is the reserved _finish tool, the reduction must still translate to a Finish Decision (the completion semantics MUST NOT change with the reduction). The unit test TestNext_ParallelWithFinishFirstStillFinishes pins this.

    Brief 02 §6 lists multi-action salvage as the Phase 44 default; Phase 45 departs at the planner concrete level (not at the loop level — the loop is reused as-is per §13 two-parallel-implementations ban).
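
The reduce-then-translate ordering can be shown in a hypothetical miniature (field sets trimmed; the Finish reason string is a placeholder). Reducing before translating is what lets a _finish first branch still complete, which is the behaviour TestNext_ParallelWithFinishFirstStillFinishes pins.

```go
package main

import "fmt"

// Hypothetical miniature of the Decision sum; field sets are trimmed.
type Decision interface{ isDecision() }

type CallTool struct{ Tool string }
type CallParallel struct{ Calls []CallTool }
type Finish struct{ Reason string }

func (CallTool) isDecision()     {}
func (CallParallel) isDecision() {}
func (Finish) isDecision()       {}

const finishToolName = "_finish"

// reduceToSingleAction keeps only the first branch of a CallParallel; the
// remaining branches are dropped, not forwarded to the next prompt. The
// repair loop only promotes to CallParallel with two or more calls, so the
// empty case is passed through defensively.
func reduceToSingleAction(d Decision) Decision {
	p, ok := d.(CallParallel)
	if !ok || len(p.Calls) == 0 {
		return d
	}
	return p.Calls[0]
}

// mapDecision reduces first and translates second, so a _finish first
// branch still completes the run.
func mapDecision(d Decision) Decision {
	d = reduceToSingleAction(d)
	if ct, ok := d.(CallTool); ok && ct.Tool == finishToolName {
		return Finish{Reason: "goal"}
	}
	return d
}

func main() {
	fmt.Println(mapDecision(CallParallel{Calls: []CallTool{{Tool: "_finish"}, {Tool: "search"}}}))
	fmt.Println(mapDecision(CallParallel{Calls: []CallTool{{Tool: "search"}, {Tool: "fetch"}}}))
}
```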

  3. MaxSteps circuit breaker as planner-side defence in depth — accompanied by the planner.max_steps_exceeded fail-loudly emit. Brief 02 §2 puts MaxSteps / HopBudget at runtime level only; RFC §6.2's RunContext.Budget.HopBudget is the authoritative runtime gate. Phase 45 ALSO ships a planner-side WithMaxSteps functional option (default 12) as a circuit breaker against a buggy LLM mock that never returns _finish AND a runtime that hasn't yet wired the hop-budget enforcement (Phase 47+). When len(rc.Trajectory.Steps) >= MaxSteps at the start of Next, the planner:

    • Emits planner.max_steps_exceeded (registered in internal/planner/events.go alongside planner.repair_exhausted; typed MaxStepsExceededPayload SafePayload carries Identity, MaxSteps, StepsObserved, LastTool, OccurredAt).
    • Returns Finish{Reason: planner.FinishNoPath, Metadata: {"max_steps_exceeded": true, "max_steps": <cap>, "steps_observed": <count>, "last_tool": <name>, "run_id": <runID>, "via": "react.maxStepsExceeded"}}.
    • Does NOT call the LLM (the breaker fires BEFORE any LLM call — a runaway must not burn additional completions).

    The emit is the load-bearing observability surface that makes the breaker NOT silent (§13 silent-degradation ban). The same fail-loudly shape as Phase 44's planner.repair_exhausted — different graceful-failure source (repair-loop exhaustion vs. planner-side step cap), same observability shape. When Phase 47's runtime hop-budget enforcement lands, MaxSteps becomes a redundant defence in depth (preferred over a load-bearing single gate). The runtime's hop budget remains the authoritative gate; the planner's MaxSteps is the secondary one.

  4. WakePush declaration (D-032) ships at Phase 45 ahead of the SpawnTask emission path. Phase 45's master-plan detail block: "ReAct ships the push wake mode (D-032): a non-retain-turn SpawnTask returns control to the runtime; the runtime registers the planner against tasks.WatchGroup; on GroupCompletion the runtime re-invokes Planner.Next with the resolved MemberOutcome slice surfaced through RunContext." Phase 45's ReActPlanner implements planner.WakeAware returning planner.WakePush; the conformance pack's WakeMode_Declared subtest asserts planner.ResolveWakeMode(reactPlanner) == planner.WakePush. The SpawnTask emission PATH itself is deferred to a later concrete-planner upgrade — the V1 prompt schema is intentionally narrow (only CallTool / _finish); SpawnTask emission would need additional prompt-engineering surface to describe background tasks to the LLM, which is out of scope for "minimum viable." The WakePush declaration is still load-bearing: it binds ReAct to the conformance pack's wake-mode-round-trip subtest (Phase 49) so that when SpawnTask emission lands, the binding is already in place.
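
The declaration follows Go's optional-interface pattern: the resolver probes with a type assertion, and planners opt in by implementing one method. In this sketch WakeUnspecified is a hypothetical fallback for planners that do not implement WakeAware; D-032's actual default may differ.

```go
package main

import "fmt"

type WakeMode int

const (
	WakeUnspecified WakeMode = iota // hypothetical fallback for planners that declare nothing
	WakePush
)

// WakeAware is an optional interface: planners opt in by implementing it.
type WakeAware interface {
	WakeMode() WakeMode
}

type reactPlanner struct{}

func (reactPlanner) WakeMode() WakeMode { return WakePush }

type stubPlanner struct{}

// ResolveWakeMode probes for the optional interface with a type assertion.
func ResolveWakeMode(p any) WakeMode {
	if w, ok := p.(WakeAware); ok {
		return w.WakeMode()
	}
	return WakeUnspecified
}

func main() {
	fmt.Println(ResolveWakeMode(reactPlanner{}) == WakePush)       // true
	fmt.Println(ResolveWakeMode(stubPlanner{}) == WakeUnspecified) // true
}
```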

  5. Phase 45 V1 deferrals: SpawnTask / AwaitTask / RequestPause emission, multi-action fallback-context forwarding, runtime loop, trajectory compression. The master-plan detail block reads "minimum viable"; Phase 45 ships exactly the surface the spec names. Deferrals:

    • SpawnTask / AwaitTask emission: the prompt schema doesn't describe background tasks. A later planner upgrade (or a separate concrete) extends the schema; Phase 45 surfaces only CallTool / _finish to the LLM.
    • RequestPause emission: Phase 50 ships the unified pause/resume primitive; until then, there's no pauseresume.Coordinator for RequestPause to dispatch into. Phase 45 observes rc.Control.PauseRequested from incoming steering but does NOT emit RequestPause itself.
    • Multi-action fallback-context forwarding: the rejected actions in reduceToSingleAction are dropped at V1 (no forwarding to the next prompt). Phase 47 will revisit when the parallel executor exists.
    • Runtime loop / multi-step orchestration: Phase 45 ships Next(ctx, rc) (Decision, error) — ONE step. The runtime executor that calls Next in a loop, executes Decisions, and threads observations back into the next prompt lands in the planner-runtime wiring phases (Phase 47+).
    • Trajectory compression / summariser: the prompt builder consumes Trajectory.Summary when set (the read path is shipped); the summariser that populates Trajectory.Summary lands in Phase 46.

    The deferrals are recorded here, not silent — future readers chasing the planner concrete's full surface land in this entry first.

Additionally, Phase 45 extends internal/planner/conformance/conformance.Harness with an optional RunContextFactory field so the Sanity scenario receives a populated identity quadruple. The Phase 42 harness skeleton's Sanity subtest passed a zero RunContext; the stub finish.Planner accepted it because that stub does NOT enforce identity. Phase 45's planner enforces identity (§6 rule 9 + D-001) and would otherwise fail the Sanity scenario. The harness extension is backward-compatible (nil RunContextFactory falls back to the zero RunContext for the stub).

The internal/planner/react/d025_test.go ships the N=128 concurrent-reuse stress: one shared *ReActPlanner instance, per-goroutine identity quadruples + ctxes, per-goroutine LLM stubs returning _finish envelopes whose args.answer carries the run's RunID. The terminal Finish.Payload is asserted to match the goroutine's RunID (no identity bleed); pre-cancelled ctxes on i%5==0 return ctx.Err() (no cancellation cross-talk); the goroutine baseline is restored within 500ms of WaitGroup join (no leak). The shared planner's StepsTaken() atomic counter is asserted to match the expected non-cancelled count, proving the per-call mutation is correctly atomic.

The §13 import-graph contract is preserved by construction — internal/planner/react/ imports only internal/llm, internal/planner, internal/planner/repair, internal/events, internal/tools, and stdlib packages. No internal/runtime/... imports; the Phase 42 lint test (internal/planner/conformance/importgraph_test.go) covers the new package by construction (it walks the entire planner subtree). The Phase 45 smoke script asserts the same via grep at every preflight gate.


D-053 — Phase 40 Skills.md importer: byte-stable round-trip via raw-frontmatter passthrough and line-based body parsing; attachments as ArtifactRef option (b); fail-closed at every parse failure mode

Date: 2026-05-12 Status: Settled Where it lives: RFC §6.7, docs/plans/phase-40-skills-importer.md, internal/skills/importer/importer.go (Importer interface + Import / Export / Close, Deps{Store}, sentinels ErrMissingFrontmatter / ErrMalformedYAML / ErrMissingTrigger / ErrEmptySteps / ErrUnknownSection / ErrAttachmentOutsideRoot / ErrInvalidAttachmentRef / ErrRoundTripDrift / ErrImporterClosed), internal/skills/importer/parser.go (scanFrontmatter, parseFrontmatter, bodyParse, resolveAttachments, uploadAttachment, classifySection, slugify, nameFallbackFromHint, doImport), internal/skills/importer/exporter.go (doExport, synthesiseFrontmatter, desubstituteArtifacts), internal/skills/importer/path_safety.go (resolveSafePath, pathHasPrefix), internal/skills/importer/testdata/golden/*.md + *.want.json (5 fixtures), brief 04 §4.7 + §5 + §6.

Why: Phase 40 closes the predecessor's per-skill-manual-adaptation gap — the load-bearing Harbor-defining feature (RFC §6.7, brief 04 §1). The byte-stable round-trip Export(Import(b)) == b is the tested invariant that distinguishes a working importer from a working-by-coincidence importer. Four design calls warrant a settled entry.

  1. Byte-stable round-trip via raw-frontmatter passthrough — NOT YAML re-emission. Brief 04 §4.7 step 1 says "CommonMark-only parser"; step 5 says "round-trip byte-stable." A naive implementation parses YAML into a struct, parses Markdown into an AST, and re-emits both — which never survives the round-trip because (a) every YAML emitter has its own key-ordering / quoting / spacing convention, (b) every CommonMark AST loses some source-side fidelity (heading underline style, list bullet character, blank-line-between-paragraphs count). Phase 40 picks a different shape:

    • Frontmatter: the importer captures the raw bytes between the --- fences VERBATIM via scanFrontmatter. The parsed frontmatterFields struct is used for value extraction (validation, slugified-name fallback, Skill struct population); the raw bytes are stashed in Skill.Extra["_importer.frontmatter_raw"] for Export. doExport reads the raw bytes back and emits them between fresh --- fences. Authors hand-ordering keys (name first, description second — a common Skills.md convention) round-trip byte-stable.
    • Body: the importer ships a line-based deterministic parser (bodyParse), which is stricter than CommonMark but deterministic by construction. Section headings (## Steps, ## Preconditions, ## Failure modes) are accepted with case, plural, and trailing-colon variations on the parse side; Export emits the canonical heading (canonicalHeading). A source with ## steps parses correctly but does NOT round-trip byte-stable: the invariant gates canonical sources only. The golden corpus uses canonical headings throughout.
    • List items: one line per item, prefix -␣ (dash-space) required. The parser rejects lazy-continuation list items (CommonMark allows them; Skills.md is stricter). Blank lines inside a section are tolerated as separators; non-list-item prose inside a section is rejected via ErrUnknownSection.

    The departure from "CommonMark-only parser" is recorded in the plan's "Findings I'm departing from" section. Two reasons: (a) full CommonMark parsers (e.g. goldmark) ship AST-rendering only, not AST-to-source emission, so a round-trip through one would still need to carry the original source text and re-emit from it — which is what the line-based parser does directly; (b) adding a new dependency for a single use-case violates the CLAUDE.md §13 forbidden-practices section on heavy frameworks. The line-based parser uses only stdlib + the existing goccy/go-yaml (already used by internal/config/loader.go).

  2. Attachments resolve to artifacts.ArtifactRef (option (b) per RFC §6.7). Brief 04 §5 surfaced three options for inline ![alt](path) references: (a) inline at import (simple, blows up the Skill row); (b) store as artifact references (clean but couples to artifact subsystem); (c) keep filesystem-backed and re-resolve at injection (fast but breaks once skills move between machines). RFC §6.7 settled on (b) — Phase 40 implements it:

    • On Import, resolveAttachments walks the description + every section list item via imageRefRegexp. For each ![alt](path) reference: read the file under ImportSource.AllowedRoot (path-safety guarded — see point 3), upload via Deps.Store.PutBytes(ctx, src.Scope, data, {Namespace: "skills-importer"}), replace the path in the body with artifact://<ArtifactRef.ID>. The mapping (Path → Ref) is captured in ImportArtifacts.PathToRef.
    • On Export, desubstituteArtifacts walks the body for artifact://<ID> markers and substitutes each ID back to its source-side path verbatim via the reverse lookup. A dangling ID (not in PathToRef) returns wrapped ErrInvalidAttachmentRef — Export never silently emits a broken reference.
    • URL / data:URI refs (http://, https://, data:, artifact://) are kept verbatim and NOT uploaded — they don't resolve to filesystem paths; the importer stays offline at V1 (no network calls).
    • Duplicate paths in one source return ErrInvalidAttachmentRef at Import time. Duplicates would break Export's injectivity (one path → many refs → the reverse mapping is ambiguous) and are an authoring smell — fail-closed at parse time.
    • ArtifactScope is caller-supplied via ImportSource.Scope. The importer does NOT synthesise the scope itself — callers (Phase 60+ upload handlers) thread the identity quadruple plus the import-task ID through. The convention (documented in the plan, not enforced by the importer) is TaskID = "import:" + sha256(src)[:12] so all attachments of one Skills.md file cluster under a stable task-shaped key.

  3. Path-traversal protection at path_safety.go (CLAUDE.md §7 #5). Every relative attachment path is resolved via:

    • filepath.IsAbs rejection — Skills.md is path-relative-to-source; absolute paths are rejected with wrapped ErrAttachmentOutsideRoot.
    • Empty path / empty AllowedRoot rejection — the operator must declare a safe root; empty fields are rejected to fail closed.
    • filepath.Clean + pathHasPrefix(joined, canonicalRoot) lexical check — the standard traversal guard. The pathHasPrefix helper avoids the /a matching /abc false-positive by appending the OS separator to the root before the prefix check.
    • filepath.EvalSymlinks symlink check — when the path exists, both the joined path and the canonical root are evaluated for symlinks; the prefix check is repeated on the evaluated paths. This blocks the attachments/link -> ../../outside.txt escape. When the path does NOT exist (the caller is probing — currently not a path the importer takes, but defended for future read-before-write callers), the symlink-eval step is skipped and the lexical check carries.

    The helper is the canonical path-safety guard for the skills subsystem; future skills-side callers (Phase 41 generator if it persists attachments, etc.) reuse it.

  4. Fail-closed at every parse failure mode — no lenient flag at V1 (CLAUDE.md §13 silent-degradation ban). The exhaustive failure-mode set:

    • ErrMissingFrontmatter: source does not begin with ---\n. Empty file lands here too.
    • ErrMalformedYAML: opening fence found but closing fence missing, the YAML parser failed, or YAML keys failed to decode into the typed frontmatterFields struct.
    • ErrMissingTrigger (wraps skills.ErrInvalidSkill): frontmatter parsed but trigger: empty after trim. The Phase 37 validator pinned trigger as the planner-visible match cue (brief 04 §4.7 step 4); empty trigger is a hard reject.
    • ErrEmptySteps (wraps skills.ErrInvalidSkill): body parsed but ## Steps absent or had zero list items. Same Phase 37 validator rule.
    • ErrUnknownSection: body contained a ## Heading outside the canonical set, or a duplicate section, or non-list-item prose inside a section. A lenient flag that accepted unknown sections would silently drop content (the planner would see only the canonical fields); fail-closed avoids surprise.
    • ErrAttachmentOutsideRoot: path-safety rejection (see point 3).
    • ErrInvalidAttachmentRef: duplicate attachment path at Import, OR dangling artifact:// reference at Export.
    • ErrRoundTripDrift: reserved for tests that explicitly assert byte-stable round-trip; the importer does not emit it from production code.
    • ErrImporterClosed: any method called after Close.

    The set is exhaustive — every failure mode has a typed sentinel that callers compare via errors.Is. There is no silent-degradation path; every parse failure surfaces with a wrapped error and a %v context string naming the offending input.

Additionally, Phase 40 ships:

  • N=128 D-025 concurrent-reuse test. One shared *importerImpl instance; per-goroutine distinct in-memory Skills.md payloads (the Name field encodes idx, so cross-goroutine bleed surfaces as a name mismatch); pre-cancelled ctxes on i%5==0 return ctx.Err() without affecting siblings; the goroutine baseline is restored within 500ms of WaitGroup.Wait. The test runs under -race. The Importer holds no per-call mutable state on itself (closed is an atomic.Bool), and the injected ArtifactStore is D-025 safe per Phase 17's conformance suite.

  • 5-fixture golden corpus under internal/skills/importer/testdata/golden/: minimal.md (trigger + steps only), full.md (every section + every frontmatter field), preconditions-only.md, failure-modes-only.md, with-attachments.md. Each fixture ships a .want.json mirror that the importer's Skill output must match deep-equal (lifecycle fields excluded; ContentHash is recomputed at Import via skills.CanonicalContentHash). The with-attachments.want.json carries a <REF:attachments/example.txt> placeholder that the test substitutes with the actual ArtifactRef.ID before comparing. Every fixture is asserted byte-stable via bytes.Equal(src, Export(Import(src))).

  • 93.8% statement coverage on internal/skills/importer (target 90%). The uncovered branches are defensive (filepath.Abs error path on the canonical root, EvalSymlinks root-eval error path, the Export method's closed-state branch when ctx.Err() also fires — race-window edge case). Not material to the load-bearing surface.

  • Phase 37 hand-off via Skill.Extra: the raw frontmatter bytes and the source-hash are stashed in Skill.Extra["_importer.frontmatter_raw"] and Skill.Extra["_importer.source_sha256"]. The Phase 37 CanonicalContentHash includes Extra via its key-sorted text rendering, so changes to the raw frontmatter (even when the parsed fields are identical) produce a different ContentHash — exactly the LWW gate the Phase 37 conflict policy needs. The hash exclusion of Origin / OriginRef / Scope (D-046) is preserved — a Skills.md re-imported via a different OriginRef (different pack version) still hashes identically when the content is the same.

The internal/skills/importer/concurrent_test.go ships the N=128 stress; internal/skills/importer/path_safety_test.go ships the 6-entry path-safety rejection table + the symlink-escape test; internal/skills/importer/negative_test.go ships the 10 negative cases; internal/skills/importer/importer_test.go ships the golden corpus assertions; internal/skills/importer/attachments_test.go wires the real inmem.ArtifactStore through the seam and asserts round-trip + duplicate-rejection + URL-passthrough + close-survival. The Phase 40 smoke script (scripts/smoke/phase-40.sh) asserts the test surface passes under -race AND the golden corpus directory is non-empty (the round-trip invariant has nothing to assert against without fixtures).


D-054 — Phase 41 skill generator: skill_propose(persist=true) with conflict-policy precedence (PackImport-protected; Generated→Generated content-hash-gated LWW); audit-mandatory with persist rollback on emit failure; default Scope=project; Promote is a Go-level API not a planner tool

Date: 2026-05-12 Status: Settled Where it lives: RFC §6.7, docs/plans/phase-41-skill-generator.md, internal/skills/generator/generator.go (Register, Propose, Promote, SkillDraft, SkillReceipt, ProposeResult, ErrSkillConflict, ErrSkillConflictSentinel, ToolNameSkillPropose, buildSkillFromDraft), internal/skills/generator/events.go (SkillProposedPayload), internal/skills/generator/audit.go (redactExcerpt, emitProposed, auditExcerptCap), internal/skills/events.go (EventTypeSkillProposed), internal/skills/skills.go (ScopeSession), brief 04 §4.8 + §5 + §6.

Why: Phase 41 closes the predecessor's "draft generator can't save" gap — Harbor's runtime persists generated skills, and every persist emits a mandatory audit event. Four design calls warrant a settled entry.

  1. Conflict policy precedence: PackImport-protected first; Generated→Generated content-hash-gated LWW second; insert otherwise. The policy is the load-bearing rule from RFC §6.7 + brief 04 §4.8 ("refuse to overwrite a Origin=PackImport skill with the same name. For Origin=Generated → Origin=Generated, last-write-wins gated by content_hash change"). Phase 41 centralizes the precedence in generator.Propose:

    • Probe via SkillStore.Get BEFORE the upsert. If the existing row is Origin=PackImport, refuse with *ErrSkillConflict{Reason:"pack_import_protected"} AND emit skill.proposed with Result="rejected". The audit emit on rejection is load-bearing — the rejection IS observable on the audit pipeline (matches RFC §6.7's "audit is mandatory" framing extended to refusals).
    • If the existing row is Origin=Generated AND ContentHash matches the incoming draft's canonical hash, return Result="idempotent". No DB write needed; the audit event still lands so subscribers can correlate the call.
    • Otherwise fall through to SkillStore.Upsert — which is either an insert (no existing row) or a LWW overwrite (existing Generated with different hash). The Phase 37 storage layer's ErrPackOverwriteRefused is still wrapped defensively at the generator boundary in case a probe-then-upsert race lets a fresh pack row slip in between probe and upsert.

    The order of probes is binding: pack-protection wins over hash-idempotency because a same-content-hash drafted skill against an existing pack row should still be refused (the operator's invariant is that pack rows are inviolable to the generator, regardless of whether the generated draft happens to match the pack's content).
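
The binding probe order can be written as a pure function, assuming the Get probe has already returned the existing row (nil when absent). The action labels other than "rejected" and "idempotent", and the Origin string values, are placeholders.

```go
package main

import "fmt"

type Origin string

const (
	OriginPackImport Origin = "pack_import" // placeholder string values
	OriginGenerated  Origin = "generated"
)

// existingRow is what the SkillStore.Get probe returned; nil means no row.
type existingRow struct {
	Origin      Origin
	ContentHash string
}

type action string

// Hypothetical action labels; "rejected" and "idempotent" mirror the
// Result values named in the text.
const (
	actionReject     action = "rejected"
	actionIdempotent action = "idempotent"
	actionUpsert     action = "upsert" // insert or LWW overwrite
)

// decide encodes the binding probe order: pack protection is checked
// before hash idempotency, so an identical hash never unlocks a pack row.
func decide(cur *existingRow, draftHash string) action {
	switch {
	case cur != nil && cur.Origin == OriginPackImport:
		return actionReject
	case cur != nil && cur.Origin == OriginGenerated && cur.ContentHash == draftHash:
		return actionIdempotent
	default:
		return actionUpsert
	}
}

func main() {
	pack := &existingRow{Origin: OriginPackImport, ContentHash: "h1"}
	gen := &existingRow{Origin: OriginGenerated, ContentHash: "h1"}
	fmt.Println(decide(pack, "h1"), decide(gen, "h1"), decide(gen, "h2"), decide(nil, "h1")) // rejected idempotent upsert upsert
}
```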

  2. Audit-mandatory with persist rollback on emit failure. Every persist=true call emits a skill.proposed event BEFORE returning success. Caller-controlled excerpts (SkillDraft.Title / Trigger) flow through audit.Redactor.Redact BEFORE the payload is built; the bounded post-redactor excerpts land on the typed SkillProposedPayload (SafePayload, so the bus does not re-run them through the redactor). The payload also carries Name, Origin, OriginRef, ContentHash, Scope, Result, Reason, and Promotion — all bounded enumerable strings or hex hashes; no untyped tool arguments in audit payloads (CLAUDE.md §7 rule 7).

    Audit-emit failure aborts the persist. Three branches:

    • Insert / LWW emit failure: the DB row was committed by store.Upsert; on skill.proposed emit failure the generator calls store.Delete(ctx, q, name) to roll back. The caller's subsequent Get returns ErrSkillNotFound. The wrapped error names the audit-emit failure as the cause; if the rollback Delete ALSO fails, the wrapped error names both failures. This is the spec's "audit-emit failure elevated to a first-class concern" requirement.
    • Idempotent emit failure: no DB write happened; the wrapped error simply surfaces the emit failure. The row stays intact (matches existing Generated content).
    • Rejection emit failure: no DB write happened; the wrapped error surfaces the audit-emit failure rather than the *ErrSkillConflict so the audit pipeline's drift is the dominant fault.

    The same fail-loud shape applies to Promote: per-target audit emit failure rolls back the target's row via store.Delete(ctx, target, name). The strict-fail model — the first failing target aborts the whole call, subsequent targets are NOT attempted — is the simplest semantics matching the storage layer's transactional shape.

  3. Default Scope=project. RFC §6.7's "Generator scope default — Settled" decision + brief 04 Q-4 are honored verbatim: when SkillDraft.Scope is empty, Propose stamps ScopeProject before validation. Three rationales:

    • Scope=session (the narrower default) would mean every generated skill stays trapped in the originating session — the predecessor's "draft generator" pattern in even more degraded form. The user-facing promise of skill_propose is "the LLM authored a reusable skill"; the default must be reusable.
    • Scope=tenant (the broader default) overshares — a skill authored by user A's planner should not auto-leak to user B's projects.
    • project is the operator-declared aggregation point at which "this team's planners should see this generated skill" — the right default-with-room-for-explicit-broaden.

    The Promote API explicitly handles broadening (session → project, project → tenant): operators or composition code can elevate a skill's visibility after the fact without touching Propose.

  4. Promote is a Go-level API, not a planner-callable Tool. Cross-session promotion is an operator concern (who decides which sessions get a generated skill is a policy question, not a planner reasoning question). Surfacing skill_promote as a planner tool would expose every running session to the cross-session-write capability — a privilege escalation that would let one user's planner write into another user's session. Phase 41 ships Promote(ctx, store, deps, src, name, []targets, scope) as a Go-level function only; the planner-callable catalog tool is skill_propose alone. Phase 39's Directory subsystem will layer a more ergonomic promotion surface (e.g. an operator-facing endpoint that takes a project ID and fans out to discovered session siblings) on top of Promote's primitive; Phase 41's API is the minimum-viable seam.

    The cross-session no-leak invariant is testable end-to-end: identity A persists Scope=session → identity B sees nothing via skill_search AND direct store.Get. Identity A calls Promote(idA, name, []{idB}, ScopeProject) → identity B sees the skill via both surfaces. The integration test TestIntegration_CrossSessionPromotion_AgainstLocalDB exercises this exactly. CLAUDE.md §6 rule 10: cross-session isolation tests are mandatory.

Additionally, Phase 41 adds ScopeSession to the skills.Scope enumeration (Phase 37 declared only Project | Tenant | Global; the session-scope marker was missing). The validator at skills.Skill.Validate accepts the new value; the localdb driver's existing identity filter already enforces session-only visibility for Scope=session rows (the storage layer's WHERE tenant = ? AND user = ? AND session = ? is unconditional). The Promote API rejects scope=session as contradictory (a promotion target other than the source session at session scope is meaningless).

The internal/skills/generator/concurrent_test.go ships the D-025 N=128 concurrent-reuse stress: per-goroutine identity quadruples + distinct skill names against ONE shared catalog. The identity-bleed detector asserts each receipt's Name + OriginRef reflect the calling goroutine's identity (no cross-goroutine state leaks via shared map or closure capture). Pre-cancelled ctxes on i%5==0 surface either context.Canceled or graceful exit (no cancellation cross-talk); the goroutine baseline is restored within 500ms of WaitGroup join (no leak). The companion TestConcurrent_SameNameResolvesDeterministically proves that 16 concurrent writers proposing the SAME (identity, name) resolve to exactly one persisted state with the remaining 15 reporting idempotent (their hash matches the first writer's). Coverage: 92.2% on internal/skills/generator (target 90%).


D-052 — Phase 39 virtual directory: dual-source pinning (DirectoryConfig.Pinned + Skill.Extra["pinned"]); pinned partition exempted only from the MaxEntries cap (when it fits), never from the capability filter; IncludeFields deferred; deterministic Name ASC tie-break

Date: 2026-05-12 Status: Settled Where it lives: RFC §6.7, docs/plans/phase-39-virtual-directory.md, internal/skills/directory.go (Directory, DirectoryConfig, SkillView, SelectionPinnedThenRecent, SelectionPinnedThenTop, NewDirectory, View, partitionByPinning, sortBySelection, filterByCapability, projectToSkillView), internal/skills/capfilter/capfilter.go (BuildSet, Subset, DisallowedNames, Replacement, Scrub — the shared capability-filter / scrub primitives, see the correction note below), brief 04 §3 + §4.5 + §4.6 + §6.

Why: Phase 39 lands the planner-facing virtual-directory snapshot of the SkillStore. Four design calls warrant a settled entry.

  1. Dual-source pinning: DirectoryConfig.Pinned (config-declared name list) PLUS Skill.Extra["pinned"] == true (runtime-stamped boolean). Brief 04 §3 sketches VirtualDir.Pinned []string as a static config field; Phase 39 keeps the config-declared list (operator-authored, survives restart) AND honours a runtime-stamped boolean on the skill itself (Extra["pinned"]) that a future operator tool / Console action will set. The two channels are OR'd at partitionByPinning time — a skill marked pinned by EITHER channel is in the pinned partition. The LocalDB driver's marshalExtra / unmarshalExtra round-trips Extra through JSON unchanged, so no schema change is required at this phase; a future skill_pin planner tool can stamp the boolean without touching the storage shape. The dual source means operators can pin via config (declarative, version-controlled) and the runtime can pin via skill update (dynamic, identity-scoped) without two parallel implementations of the same concept (§13).
  2. Pinned skills are exempted ONLY from the MaxEntries cap, NEVER from the capability filter. Brief 04 §4.5 documents the injection-time concerns (capability filter + redaction + budgeter). The V1 stance: a pinned skill that fails the capability filter under the run's identity is NOT in the View. Reason: if a misconfigured allowed-set could leak a high-capability skill via the pin channel, the pin channel becomes a security bypass. The pin is a prominence signal, not a visibility signal. The pinned partition is filled in declaration order (the config-declared Pinned list first, then Extra["pinned"] skills sorted by the selection rule), then the unpinned remainder is filled until MaxEntries. When count(pinned-after-filter) > MaxEntries, pinned skills truncate to the first MaxEntries (in declaration order, then per-selection sort on the Extra tail) and no unpinned skill appears. This is the load-bearing invariant Property_PinnedAlwaysIncluded_WhenFitsBudget asserts.
  3. IncludeFields is deferred. Brief 04 §3 lists IncludeFields []string on VirtualDir; Phase 39 always emits the four SkillView projection fields (Name, Title, Trigger, TaskType). Rationale: the projection is consumer-side; the cost of carrying the four strings per entry is negligible (≤ 200 rows × four strings); a per-call field knob would introduce a hidden-state branch (some callers see Title, some don't) that breaks the SkillView's wire-stability for downstream consumers. If a future caller surfaces a real need to drop a field (e.g. to keep a Console projection under a render budget), the knob lands then with one test per included combination. The deferred knob matches D-048's stance that operator-facing surfaces narrow at V1 to avoid hidden-state branches.
  4. Deterministic ordering: Name ASC is the tie-break on both selection rules. pinned_then_recent sorts the unpinned remainder by UpdatedAt DESC, Name ASC; pinned_then_top by UseCount DESC, Name ASC. The tie-break is load-bearing because two skills with the same UpdatedAt (or UseCount) would otherwise produce a non-deterministic View across calls, breaking the byte-stability promise downstream Console projections rely on. The pinned partition follows the same per-selection sort on its Extra["pinned"] tail (after the declaration-order config pins). MaxEntries default = 30, range [1, 200] per brief 04 §3 verbatim; pinned by the smoke script so a silent change surfaces here.

Correction (Wave 8 §17.5 checkpoint audit, 2026-05-14). This entry originally claimed Phase 39 "reuses Phase 38's tools.Filter and tools.Redact by direct import — no parallel filter / redactor implementation." That was factually wrong about the import mechanics: internal/skills/tools imports internal/skills, so internal/skills (where directory.go lives) cannot import internal/skills/tools — an import cycle. As shipped, Phase 39 duplicated the subset/scrub logic inline in directory.go ("the two implementations MUST stay in lockstep" comments and all) — the exact CLAUDE.md §13 "two parallel implementations of one feature" anti-pattern. The audit closed it for real (per §17.6 — fix the bug where it lives): the subset gate, disallowed-name computation, replacement selection, and word-boundary scrub were extracted into a new stdlib-only leaf package internal/skills/capfilter. Both internal/skills and internal/skills/tools import capfilter (no cycle — it depends on neither). The capability-filter logic now lives in exactly one place; tools.Filter / tools.Redact keep their skills.Skill-typed signatures and the directory does its own per-Skill field plumbing over the shared primitives. The decision (capability filter is integrity-critical, default-deny, pinned skills not exempt) is unchanged — only the false claim about how the code is shared is corrected.
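The following sketch shows what such a stdlib-only leaf could contain. The names Subset and Scrub come from the entry, but the signatures below are assumptions. Subset is a default-deny gate: every capability a skill requires must appear in the allowed set. Scrub replaces whole-word occurrences of disallowed names using a regexp word boundary. Go's RE2 defines `\b` in ASCII word terms, so the sketch assumes names begin and end with word characters.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// Subset reports whether every required capability is in allowed (default-deny).
func Subset(required []string, allowed map[string]bool) bool {
	for _, r := range required {
		if !allowed[r] {
			return false
		}
	}
	return true
}

// Scrub replaces whole-word occurrences of each disallowed name with replacement,
// so "search" is scrubbed but "research" is left alone.
func Scrub(text string, disallowed []string, replacement string) string {
	if len(disallowed) == 0 {
		return text
	}
	names := append([]string(nil), disallowed...)
	// Go's regexp prefers earlier alternatives at the same position, so list
	// longer names first (e.g. "web-search" before "web").
	sort.Slice(names, func(i, j int) bool { return len(names[i]) > len(names[j]) })
	quoted := make([]string, len(names))
	for i, n := range names {
		quoted[i] = regexp.QuoteMeta(n)
	}
	re := regexp.MustCompile(`\b(?:` + strings.Join(quoted, "|") + `)\b`)
	return re.ReplaceAllLiteralString(text, replacement)
}

func main() {
	allowed := map[string]bool{"net.read": true}
	fmt.Println(Subset([]string{"net.read"}, allowed))             // true
	fmt.Println(Subset([]string{"net.read", "fs.write"}, allowed)) // false
	fmt.Println(Scrub("call search, not research", []string{"search"}, "[redacted]"))
	// call [redacted], not research
}
```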

The directory is the consumer of the catalog primitives the planner already trusts. Identity-mandatory: every View call reads the identity quadruple from ctx (matching internal/skills/tools/'s shape), returns wrapped skills.ErrIdentityRequired on a missing component, AND emits skill.identity_rejected via skills.EmitIdentityRejected so the rejection is observable on the bus, not silent (§13).

The internal/skills/directory_concurrent_test.go ships the D-025 stress: N=128 goroutines invoking View against ONE shared *Directory, per-goroutine identity quadruples, per-goroutine expected pin sets. The shared *Directory is immutable after NewDirectory; per-call state lives in ctx + the CapabilityContext value-type input. Property tests (testing/quick) on three invariants: pinned-always-included when count ≤ MaxEntries, View length ≤ MaxEntries, identity scoping (a skill scoped to identity A is NEVER in the View of identity B).


D-055 — Phase 46 trajectory summariser: Summariser interface + CompressionRunner live in internal/planner/; TrajectorySummary alias on Phase 43's Summary; chars/4 estimator mirrors LLM-edge surface; compression replaces step history in prompt builds; ReAct is the in-PR consumer satisfying the §13 primitive-with-consumer rule

Date: 2026-05-13 Status: Settled Where it lives: RFC §6.2, brief 02 §4, docs/plans/phase-46-trajectory-summariser.md, internal/planner/compression.go (Summariser, TrajectorySummary, TokenEstimator, DefaultTokenEstimator, CompressionRunner, NewCompressionRunner, WithTokenEstimator, MaybeCompress, ErrNilTrajectory, ErrEmptySummary), internal/planner/events.go (EventTypeTrajectoryCompressed, EventTypeTrajectoryCompressionFailed, TrajectoryCompressedPayload, TrajectoryCompressionFailedPayload), internal/planner/planner.go (Budget.TokenBudget), internal/planner/react/prompt.go (defaultBuilder.Build summary-replaces-step-history swap), internal/planner/react/compression_integration_test.go (the §13 consumer gate).

Why: Phase 46 closes the runtime-side trajectory summariser primitive. Six design calls warrant a settled entry.

  1. Summariser interface + CompressionRunner live in internal/planner/ (NOT in internal/planner/trajectory/). The master plan's Subsystem column for Phase 46 reads planner. The Summariser's signature requires planner.RunContext; the trajectory subpackage CANNOT import the planner package without an import cycle (Phase 43's D-049 settled that the planner package imports trajectory via aliases, not the reverse). The compression primitive sits alongside Decision, Planner, and RunContext in the planner package — the same level the rest of Phase 42's load-bearing types sit at. The TrajectorySummary type is a type TrajectorySummary = trajectory.Summary alias declared at the planner-package level so callers outside the trajectory subpackage use the RFC's canonical name without ambiguity. Underlying struct stays in internal/planner/trajectory/trajectory.go (Phase 43's D-049 location); the JSON tag ("summary") is unchanged so wire compatibility is preserved across the pause-record contract (Phase 51's future consumer).

  2. defaultBuilder.Build swaps the per-step assistant/user pair loop for the summary block when rc.Trajectory.Summary != nil. Phase 45 shipped the builder reading Summary ADDITIVELY (the summary appeared as an extra block alongside step history). Phase 46 departs from that shape: when Summary is non-nil, the per-step loop is SKIPPED entirely; the summary IS the trajectory representation. Brief 02 §4 explicitly says: "The compressed digest replaces the raw step history in subsequent prompt builds." Rendering both would double-count tokens and defeat the compression. The Phase 45 additive shape was a forward-compatibility seam against Phase 46 (the master plan called Phase 46 "compression / summariser" and reserved the field); Phase 46 closes the seam by tightening the rendering rule. The existing Phase 45 TestDefaultBuilder_RendersSummary test still passes (it never supplies Summary AND non-empty Steps simultaneously); the new TestDefaultBuilder_WithSummary_SkipsStepHistory test pins the Phase 46 contract. Background-task outcomes (the D-032 push-wake seam) still surface as a trailing user turn regardless of compaction — they are the LATEST signal the planner has and must reach it on the next step. A future phase MAY route background outcomes through the summariser; Phase 46 keeps them as a separate trailing turn.

  3. DefaultTokenEstimator uses chars/4 over Trajectory.Serialize bytes — mirroring internal/llm/tokens.go::chars4Estimator. §13 bans two parallel implementations of the same conceptual feature; the LLM-edge estimator is the canonical chars/4 surface, and the trajectory-compression estimator deliberately mirrors its len/4 + 1 per-fragment formula. The trajectory is treated as one fragment by the runner since Serialize produces the planner-facing JSON projection. The chars/4 algorithm under-counts multimodal content compared to the LLM-edge estimator (which adds 256 tokens per non-text part); trajectories don't typically carry multimodal parts directly in LLMContext — heavy content is upstream of the trajectory per the D-026 safety pass — so the simpler flat estimate is sufficient at Phase 46. A future estimator that structurally walks the trajectory (re-using the LLM-edge tokeniser to count multimodal parts at 256 tokens each) is a Phase 47+ refinement; the TokenEstimator functional-option seam is the unwind point. Estimator errors propagate verbatim through MaybeCompress — a Phase 43 ErrUnserializable from Serialize is the typical failure mode and is surfaced loudly with the trajectory.compression_failed emit carrying ErrorCode="estimator_error".

  4. Fail-loudly contract at the summariser boundary (§13). A non-nil error from Summariser.Summarise propagates verbatim through CompressionRunner.MaybeCompress; the runner does NOT fall through to "skip compression and use raw history" — silent degradation is the bug §13 explicitly bans. A (nil, nil) return from the summariser is also a contract violation (the implementation MUST return a non-nil summary on success OR a non-nil error); the runner surfaces this as ErrEmptySummary so the bug is loud, not silent. Both failure paths emit trajectory.compression_failed BEFORE returning, classified by an error-code bucket (summariser_error / empty_summary / estimator_error). The success path emits trajectory.compressed. Together the two emits make compression observable in both directions — companion to Phase 44's planner.repair_exhausted and Phase 45's planner.max_steps_exceeded. Identity is mandatory at the runner boundary (§6 rule 9 + D-001): a partial quadruple returns wrapped llm.ErrIdentityMissing — the same sentinel the rest of the runtime uses.

  5. Idempotency short-circuit when tr.Summary != nil. A second MaybeCompress call against an already-compressed trajectory returns nil without invoking the summariser. The engine that owns the cadence policy (Phase 47+ planner-runtime stitch) is the layer responsible for clearing tr.Summary when re-compaction is needed. Phase 46 ships the V1 idempotency contract; cadence + re-compaction triggers land at the engine wire-up phase. Concretely: the unit test TestMaybeCompress_AlreadyCompressed_Idempotent pins the current behaviour; a future engine that decides "after the trajectory grows 2× past the last compression, re-summarise" will clear the field via tr.Summary = nil before re-calling MaybeCompress. This keeps the runner stateless across calls (per-call inspection of Trajectory.Summary is enough) while leaving the cadence seam open for the engine to fill.

  6. ReAct is the in-PR consumer that satisfies CLAUDE.md §13's primitive-with-consumer rule. A primitive that lands without a concrete that exercises it bit-rots. Phase 46's primitive is the Summariser interface + CompressionRunner; the in-PR consumer is the Phase 45 ReAct planner via the prompt.go::defaultBuilder swap (call sites: internal/planner/react/prompt.go → reads rc.Trajectory.Summary; internal/planner/react/compression_integration_test.go → drives the end-to-end test). The integration test wires real events.EventBus + real CompressionRunner + real ReActPlanner: an over-budget trajectory triggers compression, the planner's next prompt is built from the summary only (zero raw-step assistant turns; the LLM is called exactly once). The failure-mode scenario (errSummariser) surfaces trajectory.compression_failed on the real bus with the run's identity. Without this consumer the primitive would have no test-time witness that the prompt builder actually reads Trajectory.Summary correctly; the integration test IS the gate.

The internal/planner/compression_concurrent_test.go ships the D-025 N=128 concurrent-reuse stress: shared *CompressionRunner, per-goroutine identity quadruples + per-goroutine trajectories, the countingSummariser stamps the goroutine's RunID into the summary's Note field for context-bleed detection (no other goroutine's RunID surfaces). Pre-cancelled ctxes on i%5==0 verify cancellation honoring without cross-talk; baseline runtime.NumGoroutine restored within 500ms of WaitGroup join. The supplementary TestCompressionRunner_SharedAcrossGoroutines_NoRaceOnEstimator exercises the idempotent read path under -race (single shared trajectory with pre-stamped summary; 64 goroutines short-circuit cleanly). The TestCompressionRunner_EmitClosure_ConcurrentSafe asserts the emit closure receives every event without drops when N=64 goroutines emit through a shared runner. Coverage targets met: internal/planner ≥ 80% (Phase 46 incremental); internal/planner/react ≥ 85% (Phase 45 surface preserved; the Phase 46 prompt-builder swap is covered by TestDefaultBuilder_WithSummary_SkipsStepHistory + TestDefaultBuilder_NoSummary_RendersStepHistory regression guard).


D-057 — Phase 48 deterministic planner: DecisionTreeStep abstraction over typed Decision returns; WakePoll non-blocking WatchGroup semantics; iface-validation lens proving Planner swappability; §13 primitive-with-consumer closed by in-scenario SpawnTask + AwaitTask emission

Date: 2026-05-13 Status: Settled Where it lives: RFC §6.2 + RFC §11 Q-6, docs/plans/phase-48-deterministic-planner.md, internal/planner/deterministic/deterministic.go (DeterministicPlanner, Option, NewDeterministicPlanner, Next, WakeMode, WithSteps, WithRegistry, WithName), internal/planner/deterministic/steps.go (DecisionTreeStep, CallToolStep, FinishStep, PauseStep, SpawnAndAwaitStep, WatchGroupStep), internal/planner/errors.go (ErrIdentityRequired, ErrInvalidConfig, ErrDeterministicStep), brief 02 §1 + §2 + §5 + §7, brief 05 §1.

Why: Phase 48 lands Harbor's second concrete Planner implementation. Four design calls warrant a settled entry.

  1. DecisionTreeStep interface — typed step abstraction over a sealed-sum Decision return; no magic-string opcodes. Each step exposes Decide(ctx, rc) (planner.Decision, bool, error). The boolean reports whether the step claimed the call: true → walker returns the decision verbatim; false → walker advances; non-nil error → walker propagates wrapped planner.ErrDeterministicStep (fail-loudly per §13 — no silent skip on error). The predecessor's "magic strings as next_node" anti-pattern (brief 02 §2) is rejected: every step returns one of the six sealed-sum Decision shapes directly. The interface is exported so operators can implement custom steps; five in-package types ship (CallToolStep, FinishStep, PauseStep, SpawnAndAwaitStep, WatchGroupStep). A tree that exhausts every step without a claim returns Finish{NoPath, Metadata["deterministic"]="no_step_matched"} — fail-loudly per §13 (a silently looping planner is the worst kind of misconfiguration shape).
  2. WakePoll semantics — non-blocking receive on tasks.WatchGroup from the planner side; emit AwaitTask on not-ready. The deterministic planner declares planner.WakeAware returning planner.WakePoll. The on-disk realisation lives in SpawnAndAwaitStep and WatchGroupStep: each Decide call performs a select { case completion := <-ch: ...; default: AwaitTask } against the channel returned by tasks.TaskRegistry.WatchGroup. When the channel hasn't fired, the planner emits AwaitTask{TaskID: <owner>} and the runtime sleeps the step until the next deterministic boundary; when it has fired, the operator-supplied OnResolved([]MemberOutcome) callback returns the next decision. No LLM, no eager wake — a clean deterministic shape that proves the TaskRegistry's WatchGroup surface is mode-neutral (D-032). The registry has no knowledge that a poller is reading its channel non-blockingly; no WakeMode field on registry types, no Supports* capability protocol.
  3. The deterministic planner is the iface-validation lens — proves CLAUDE.md §1 property 3 ("the Planner is swappable") on disk. RFC §11 Q-6 settled the second V1 planner concrete as deterministic precisely because it exercises a non-LLM Decision shape end-to-end: same Runtime, same Planner interface, same RunContext view, same Decision sum — but no LLM, no prompt builder, no retry / downgrade / corrections / safety / governance composition. If the interface were structurally biased toward an LLM-driven concrete, the deterministic planner would surface the bias loudly at construction time. It does not. Phase 48 is the on-disk proof, not a doc claim.
  4. §13 primitive-with-consumer policy — SpawnTask + AwaitTask emission closes the policy for the deterministic-planner side of the wave. Phase 42 shipped the Decision sum's SpawnTask and AwaitTask shapes; Phase 20/21 shipped the TaskRegistry + TaskGroup + WatchGroup mechanism. The Phase 48 SpawnAndAwaitStep's scenario test (spawn_await_scenario_test.go) wires a real tasks.TaskRegistry (in-process driver) + real events.EventBus (inmem driver) and asserts the planner emits SpawnTask → AwaitTask → CallTool → Finish across four Next calls, with the registry's task lifecycle driven through Spawn → SealGroup → MarkRunning → MarkComplete between calls. The §13 rule (added to CLAUDE.md via PR #67) is binary: a primitive lands with its first consumer in the same wave. Phase 48 supplies the deterministic-planner side; future planner upgrades (or Phase 47's ReAct emission upgrade) close the ReAct side. Phase 49's conformance pack uses Phase 48's concrete as the second leg of cross-planner round-trip scenarios — the deterministic planner exercises each of CallTool, SpawnTask, AwaitTask, Finish in its scenario test so Phase 49 has the cross-planner coverage.

Identity is mandatory (§6 rule 9 + D-001). The deterministic planner returns wrapped planner.ErrIdentityRequired on a partial quadruple at Next entry — defensive in depth alongside the runtime engine's identity propagation. Fail-loud construction (§13). NewDeterministicPlanner returns wrapped planner.ErrInvalidConfig when the configured step set is empty, when any configured step is nil, or when a group-aware step is configured without WithRegistry. Configuration errors NEVER surface at Next time; the constructor is the boundary.

Concurrent reuse pinned (D-025). DeterministicPlanner is a reusable artifact: the receiver is read-only after construction; per-run state lives on the stack and in RunContext. SpawnAndAwaitStep holds an internal sync.Map keyed by (SessionID, StepID) so per-run spawn-tracking state is safe across N concurrent runs against the shared planner. internal/planner/deterministic/d025_test.go pins N=128 concurrent Next invocations against one shared instance under -race — asserts no races, no identity bleed (each call's Finish.Metadata["run_id"] matches the goroutine's RunID), no cancellation cross-talk (pre-cancelled ctx on i%5==0 returns ctx.Err() without affecting siblings), no goroutine leak (baseline runtime.NumGoroutine restored within 500ms of WaitGroup join). Coverage: 90.1% on internal/planner/deterministic (target 85%).


D-056 — Phase 47 parallel executor + ReAct CallParallel / SpawnTask / AwaitTask emission: three reserved tool names as V1 emission surface; reduceToSingleAction deletion timing; AbsoluteMaxParallel = 50; JoinSpec enum semantics (JoinAll / JoinFirstSuccess / JoinN); atomic-setup vs in-flight failure handling; §13 primitive-with-consumer compliance

Date: 2026-05-13 Status: Settled Where it lives: RFC §6.2, docs/plans/phase-47-parallel-emission.md, internal/runtime/parallel/parallel.go (Executor, Resolver, Result, New, Execute, normaliseJoin, validateJoin, dispatchAll, dispatchFirstSuccess, dispatchN, invokeBranch), internal/planner/decision.go (JoinN, JoinSpec.N), internal/planner/errors.go (AbsoluteMaxParallel, ErrParallelCapExceeded, ErrParallelInvalidJoin, ErrParallelBranchInvalidArgs, ErrParallelPauseUnsupported), internal/planner/react/react.go (SpawnTaskToolName, AwaitTaskToolName, translateSpawnCall, translateAwaitCall, mapDecision, DefaultSystemPrompt), test/integration/phase47_spawn_await_test.go, scripts/smoke/phase-47.sh.

Why: Phase 47 closes three primitive-with-consumer gaps in one PR, as required by CLAUDE.md §13, which lists "shipping a primitive without its first consumer in the same wave" as a forbidden practice. Six design calls warrant a settled entry.

  1. Three reserved tool names as the V1 emission surface — _finish (Phase 45 / D-051), _spawn_task and _await_task (Phase 47 / D-056). The reserved-name convention follows D-051: prompt-time strings translated by mapDecision to typed Decisions BEFORE return; the Decision sum stays sealed (no "magic string as next_node" anti-pattern). Two design rationales:

    • Why a reserved-tool convention vs. a top-level JSON envelope {"decision":"spawn","args":{...}}. The reserved-tool shape lets the LLM stay in one prompt-schema mode — it ALWAYS emits a {"tool":..., "args":..., "reasoning":...} shape, never switching between "tool envelope" and "decision envelope" mid-conversation. Single-mode prompts compress better in the LLM's representation (fewer competing patterns to navigate) and reduce repair-loop pressure: the parser already handles tool envelopes; spawn/await go through the same path. The downside (the LLM can in principle emit a _spawn_task shape that looks identical to a real tool with that name) is mitigated by the leading-underscore convention — future runtime catalog registration MAY reject underscore-prefixed tool names; today the dispatcher would reject any _-prefixed tool that wasn't intercepted by mapDecision first.
    • Why fail-loudly on malformed args (vs. silent emit of the literal _spawn_task CallTool the dispatcher would reject anyway). The §13 silent-degradation ban means errors must be explicit. mapDecision returning (Decision, error) surfaces the translation failure at the planner boundary; the runtime sees a clean error rather than a CallTool-shaped pseudo-decision the catalog cannot dispatch.
  2. reduceToSingleAction deletion timing — Phase 47, NOT later. The Phase 45 plan named the deletion timing explicitly ("the reduceToSingleAction method is the unwind point — Phase 47 deletes the override"). The Phase 47 PR honours the hand-off because the §13 "two parallel implementations of the same conceptual feature" rule is active: Phase 44's RepairLoop already produces CallParallel{Join: JoinAll} when multi-action salvage triggers; the Phase 45 collapse override was a V1 stop-gap. Carrying both shapes (the Phase 44 emission + the Phase 45 collapse) past Phase 47 would mean two parallel implementations of "what happens when the LLM emits multiple actions" — Phase 47 picks the deepening (let the executor dispatch) and deletes the override. The smoke script asserts the absence of the symbol via grep-v as the drift gate.

  3. AbsoluteMaxParallel = 50 system cap rationale. RFC §6.2 settled the value at 50. Three rationales:

    • Defence in depth against a runaway emission. A buggy LLM emitting 1000 branches must not consume 1000 goroutines + 1000 tool-dispatch budgets. 50 is comfortably above the "I want to parallelise this small fan-out" use case (3-10 branches typical) while staying below "the LLM ran away."
    • The soft cap is the planner's PlanningHints.MaxParallel. Operators tune the soft cap per session / per tenant; the hard cap is system-wide. An operator-tunable hard cap would reintroduce the "two parallel implementations" smell (a config-driven cap + a code-driven cap); the system cap stays settled.
    • Defence against a malicious / adversarial LLM emission. A jailbreak prompt that coerces the LLM into emitting "1000 branches of delete_everything" gets rejected at the executor boundary; even if every branch's validator was permissive, the cap fires first. The cap is the last line of defence before goroutine + descriptor multiplication.
  4. JoinSpec enum semantics — JoinAll / JoinFirstSuccess / JoinN. Three explicit shapes ship; JoinKeyed remains a documented future surface but is rejected at dispatch with ErrParallelInvalidJoin (the "not implemented at Phase 47" message names the deferral). Per-shape rationales:

    • JoinAll (the default). The most common shape: fan out, collect every observation, surface them all back to the planner for the next step. The Phase 44 repair loop's multi-action salvage uses this as its default join.
    • JoinFirstSuccess. The "race to first success" shape: the planner emits N alternate tool calls and wants whichever succeeds first (e.g. three different search providers, taking the first one to return a result). Cancellation: the executor derives a child ctx; on the first success the child ctx is cancelled, and slow branches that honour ctx exit promptly. A failure does NOT cancel the remaining branches: the executor keeps waiting until some branch succeeds or every branch has terminated, because a slow success can still arrive after a fast failure.
    • JoinN. The "fault-tolerant fan-out" shape: emit N+M branches, wait for N successes, cancel the rest. Setup validates 0 < N ≤ len(Branches). JoinN returns successes in COMPLETION order (each Result still carries its original branch Index for the deterministic merge key downstream — the merge ordering is the branch's input position, NOT completion order).
  5. Atomic-setup vs in-flight failure handling. RFC §6.2's "atomicity contract": atomic setup validation (any branch's invalid args fails the whole call BEFORE execution); in-flight failures land per-branch on Result.Err. Two failure modes, two different shapes:

    • Setup-time failures (atomic): branch count cap exceeded, JoinSpec malformed, descriptor not registered, args validator rejects — ALL surface as the executor's return error. The slice is nil. NO branch executes. This is the load-bearing "atomicity contract" surface.
    • In-flight failures (per-branch): a branch's desc.Invoke returns an error; the executor catches it, populates Result.Err, surfaces the result alongside the successful peers. The call-level error stays nil for JoinAll (mixed-success-and-failure is a normal observation shape); JoinFirstSuccess and JoinN exhaustion return a joined error wrapping every failure when no branch met the threshold. The distinction prevents the planner from seeing a "whole call failed" when in fact one tool returned a soft error the LLM can incorporate into its next step's reasoning.
  6. §13 primitive-with-consumer policy compliance — three primitives, three consumers in one PR. CLAUDE.md §13 forbids shipping a primitive without its first consumer in the same wave. Phase 47 closes three gaps in one wave:

    • Parallel-call executor (the master-plan Phase 47 row's original scope). Consumer: ReAct emits CallParallel (pass-through); Phase 44's repair loop already produces the shape from multi-action salvage. Both ends ship in this PR.
    • SpawnTask Decision shape (shipped Phase 42 without emitter). Consumer: ReAct's _spawn_task reserved tool translation + the integration test's spawn → group → wake → re-entry round-trip.
    • AwaitTask Decision shape (shipped Phase 42 without emitter). Consumer: ReAct's _await_task reserved tool translation.

    The §13 rule explicitly names SpawnTask and AwaitTask as the pair that MUST land together: "a planner that can spawn a background task but cannot join it produces orphan work the runtime cannot recover." Phase 47's PR bundles them per the binding rule. The unified pause/resume primitive (Phase 50) is the next §13 application of the same rule — it will land with a RequestPause-emitting consumer in the same wave.

The internal/runtime/parallel/concurrent_test.go ships the D-025 N=128 reuse stress: one shared *parallel.Executor, per-goroutine identity quadruples (no bleed), pre-cancelled ctxes on i%17==0 (no cross-talk), goroutine baseline restored within 2s of WaitGroup join (no leak). The Phase 45 internal/planner/react/d025_test.go test already covers the upgraded ReAct emission paths transitively (any Next call exercises mapDecision's new cases). Coverage: internal/runtime/parallel ≥ 85% (master-plan target).


D-058 — Phase 49 planner conformance pack: shared scenario suite both Wave 8 concretes pass; capability-gated subtests; wake-mode round-trip wired against real tasks.TaskRegistry + real events.EventBus (D-032 binding); Wave 8 wave-end E2E bundled in same PR per §17.5

Date: 2026-05-13 Status: Settled Where it lives: RFC §6.2, docs/plans/phase-49-conformance-pack.md, internal/planner/conformance/conformance.go (Phase 42 skeleton scenarios + Phase 49 scenario bodies: Capability flags + ScenarioName constants + Harness extensions: ScenarioFactory, Capabilities, TaskRegistryFactory, PrebuiltPlannerFactory; WakeRoundTripDeps + DefaultTaskRegistryFactory + DefaultRunContext + DefaultReactContentMap + SecondStepContent + scenarioContentTrim; per-scenario implementations runTopPromptsScenario, runMalformedLLMScenario, runParallelAtomicityScenario, runWakeRoundTripScenario (with push/poll dispatch), runBudgetAwareScenario, runPauseBoundsScenario, runSteeringDrainScenario, runConcurrentReuseScenario), internal/planner/react/conformance_test.go + internal/planner/react/conformance_helpers_test.go (ReAct's full-suite invocation; scripted multi-response LLM for the push wake-round-trip), internal/planner/deterministic/conformance_test.go (Deterministic's full-suite invocation; parallelEmitStep + SpawnAndAwaitStep-based PrebuiltPlannerFactory), test/integration/wave8_test.go (Wave 8 wave-end E2E — three focused tests covering the push round-trip across the assembled surface, the missing-identity fail-closed scenario, and the N=10 concurrency stress).

Why: Phase 49 closes the planner-track wave by filling the Phase 42 conformance harness skeleton AND landing the Wave 8 wave-end E2E in one PR. Three design calls warrant a settled entry.

  1. Capability-gated scenarios: Capability flags + Harness.Capabilities bitmask let one Run entrypoint drive both LLM-driven and non-LLM concretes without dual-suite drift. A non-LLM concrete (Deterministic) calling the LLM-only scenario (MalformedLLM_Salvage) would either (a) skip silently if we picked permissive defaults — §13's silent-degradation ban catches this — or (b) fail with a nil LLM client shape that's a configuration bug, not a planner bug. The capability flags (CapabilityLLMDriven, CapabilityCanPause, CapabilityWakeRoundTrip, CapabilityHonoursCancelControl) gate scenarios at the entry point; a scenario whose required capability is absent calls t.Skip WITH A REASON — never silently. Phase 49 ships two pre-built capability sets (CapabilitySetReAct, CapabilitySetDeterministic); future concretes (Plan-Execute, Workflow, Graph, Supervisor) pick the capability set that matches their shape, and the conformance pack scales without modification.

  2. The WakeMode_RoundTrip scenario is the LOAD-BEARING D-032 binding — real TaskRegistry + real EventBus, no mocks at the seam. RFC §6.2 + master plan Phase 49 detail block ship the wake-mode round-trip as the unmissable scenario: "Failure to wire tasks.WatchGroup is the test's failure mode, not silent deadlock." Phase 49 wires the scenario against the production inprocess tasks.TaskRegistry driver, the production inmem events.EventBus driver, and the production inmem state.StateStore. Mocks at this seam would defeat the test's purpose (a mock that always delivers GroupCompletion instantly would mask a real wiring bug that delays the delivery; a mock that always blocks would mask a planner that fails to honour the non-blocking receive contract). The harness's TaskRegistryFactory field exposes the production-driver factory (DefaultTaskRegistryFactory); the harness's WakeMode field dispatches the round-trip to push or poll. For push (ReAct): the scenario simulates the runtime engine's role at Phase 60+, spawning the real task and surfacing the MemberOutcome through RunContext.Trajectory.Background. For poll (Deterministic): the scenario uses PrebuiltPlannerFactory to construct the planner WITH the registry bound (via deterministic.WithRegistry), then calls Next repeatedly — observing the non-blocking receive pattern: SpawnTask → AwaitTask (group open) → resolved Decision (group complete). The §17.4 "no time.Sleep for synchronisation" rule holds: bounded eventually-style waits with 2s deadlines and runtime.Gosched yields between retries.

  3. Wave 8 wave-end E2E bundled in same PR per §17.5 — three focused tests covering happy path, failure mode, and concurrency stress. §17.5 makes the wave-end checkpoint audit + wave-end E2E binding at every wave boundary. The wave-end E2E exercises the same primitives the conformance pack tests, but ACROSS the full assembled surface: Skills (localdb) + Planner (ReAct) + Tools (in-process) + Tasks (inprocess) + Memory (inmem) + LLM (mock) + Events (inmem) + State (inmem). Three tests cover §17.3's mandatory dimensions:

    • TestE2E_Wave8_ReactSpawnWakeRoundTrip_AssembledSurface: real ReAct planner emits _spawn_task against scripted mock LLM → real registry spawns and resolves → planner re-enters with MemberOutcome surfaced → emits Finish. Memory captures the turn. Skill store presence on the surface is asserted via an Upsert + Get round-trip.
    • TestE2E_Wave8_MissingIdentity_FailsClosed: ReAct's identity-mandatory pre-check rejects a Next call with no identity in the RunContext quadruple, returning wrapped llm.ErrIdentityMissing BEFORE the LLM completion fires. Memory + skill stores also reject missing identity — same fail-loudly contract. The scenario is the §17.3 #3 "at least one failure mode" requirement.
    • TestE2E_Wave8_Concurrency_NoCrossTalk: N=10 concurrent ReAct runs against ONE shared planner + ONE shared catalog + ONE shared registry + ONE shared memory store. Every 3rd goroutine derives a pre-cancelled ctx (cancellation cross-talk gate); the race detector is the gate for data races. Baseline goroutine count restored on teardown (within a +16 tolerance for driver-retained background workers).

    The §17.6 fix-in-same-PR rule was honoured, but no cross-phase bug surfaced — the per-package tests landed in Phases 42-48 had already covered the seam interactions that Phase 49's E2E exercises. The conformance pack DID surface a usability gap (the Phase 42 Harness shape needed extension to drive scenario-specific planner configurations); the extension is additive (existing fields preserved verbatim), so no per-concrete test regresses.
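The capability gate in item 1 can be pictured as a bitmask check that returns a skip reason. In this sketch the four flag names come from this entry. The bit layout, the missing helper, and the capability set assumed for a Deterministic-shaped planner are all hypothetical. A real harness would pass the returned reason to t.Skip rather than print it.

```go
package main

import (
	"fmt"
	"strings"
)

// Capability is a bit flag a planner concrete declares. Flag names follow
// D-058; the bit layout is illustrative.
type Capability uint32

const (
	CapabilityLLMDriven Capability = 1 << iota
	CapabilityCanPause
	CapabilityWakeRoundTrip
	CapabilityHonoursCancelControl
)

var capOrder = []Capability{
	CapabilityLLMDriven, CapabilityCanPause, CapabilityWakeRoundTrip, CapabilityHonoursCancelControl,
}

var capNames = map[Capability]string{
	CapabilityLLMDriven:            "LLMDriven",
	CapabilityCanPause:             "CanPause",
	CapabilityWakeRoundTrip:        "WakeRoundTrip",
	CapabilityHonoursCancelControl: "HonoursCancelControl",
}

// missing (hypothetical helper) returns a human-readable reason naming every
// required capability the planner lacks, or "" when the scenario may run.
// A non-empty reason is never swallowed: it becomes the t.Skip message.
func missing(have, need Capability) string {
	var absent []string
	for _, c := range capOrder {
		if need&c != 0 && have&c == 0 {
			absent = append(absent, capNames[c])
		}
	}
	if len(absent) == 0 {
		return ""
	}
	return "planner lacks capability: " + strings.Join(absent, ", ")
}

func main() {
	// Assumed Deterministic-shaped set: no LLM, but pause / wake / cancel.
	deterministic := CapabilityCanPause | CapabilityWakeRoundTrip | CapabilityHonoursCancelControl
	fmt.Println(missing(deterministic, CapabilityLLMDriven))
}
```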

Identity is mandatory across every scenario (§6 rule 9 + D-001). The conformance pack's DefaultRunContext factory stamps a populated quadruple; per-concrete tests build their own factories on the same shape. §13 import-graph contract preserved: the conformance package imports internal/audit, internal/config, internal/events, internal/state, internal/tasks, internal/identity, internal/planner — NONE of which are internal/runtime/.... The Phase 42 importgraph_test.go walks the planner subtree and gates the contract; Phase 49 adds no internal/runtime/... imports.

Concurrent reuse pinned (D-025). The ConcurrentReuse_D025 scenario in the pack runs N=64 parallel Next calls against ONE shared planner from the harness factory; the race detector is the gate, and a per-goroutine RunID round-trip checks for context bleed. The Wave 8 E2E's concurrency stress (N=10) provides the cross-package complement. Coverage: internal/planner/conformance ≥ 80% (Phase 49 target). The pack's coverage comes from invoking Run against both concretes: each concrete's test exercises every non-skipped scenario, and the capability-gating fallthrough exercises the skip paths.


D-059 — Agent identity model: agent_id is a runtime registration identity, NOT an isolation principal; the isolation tuple stays (tenant, user, session, run); agents carry a three-ID model (agent_id / incarnation / version_hash)

Date: 2026-05-14 Status: Settled Where it lives: RFC §6.16 (Agent Registry), CLAUDE.md / AGENTS.md §6 (clarifying note), docs/plans/phase-53a-agent-registry.md, docs/glossary.md (agent_id, incarnation, version_hash, registration identity).

Why: During Console information-architecture planning the question "is agent_id a fourth element of the identity tuple?" surfaced repeatedly and threatened to leak implicit assumptions into every identity-touching phase. It is settled here so it does not get re-litigated.

  1. agent_id is a registration identity, not an isolation principal. Harbor's isolation boundary is and stays the tuple (tenant, user, session, run) (RFC §4, §6 rules, D-001). An agent is a runtime entity — it has a planner, tool bindings, memory bindings, policies, health — but it runs within (tenant, user, session); it does not widen the isolation boundary. Memory drivers, state drivers, event subscribers continue to scope by the tuple, never by agent_id. This dissolves the recurring confusion: there are two orthogonal concepts — "agent as a registered, runtime-tracked entity with a stable ID and lifecycle" (this decision) and "agent as an isolation boundary" (explicitly rejected for V1).
  2. Agents carry a three-ID model. agent_id — stable, "which logical agent," minted once at first registration, persisted, rehydrated on restart. incarnation — ephemeral, "which boot of it," bumps on every process start. version_hash — content-derived, "which configuration," a deterministic hash over (prompt set, tool set + schemas, planner config, model policy); bumps only when configuration content changes. The three answer different questions and must not be collapsed: a restart with no config change yields the same agent_id + same version_hash + a new incarnation; a restart after a prompt edit bumps both incarnation and version_hash.
  3. version_hash is load-bearing for the post-V1 Evaluations / version-control program (D-064). If every agent carries a config hash from V1, Evaluations can later attribute success-rate changes to a specific configuration version with zero retrofit. It is cheap to compute at registration and is the free precursor to prompt/tool evolution work.
  4. Consumers. Phase 30 (tool-side OAuth) keys agent-bound tokens by the registration agent_id — never by an isolation-tuple element. The Console Agents page (RFC §7, D-062) renders the three-ID model. See [[D-060]] for the subsystem that owns minting and persistence, and [[D-061]] for the Console-DB boundary.
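The restart-versus-edit behaviour in item 2 reduces to a small transition. This sketch assumes an AgentIdentity struct with a numeric incarnation counter and treats the hash as an opaque string. How the real registry represents incarnation is not specified in this entry.

```go
package main

import "fmt"

// AgentIdentity holds D-059's three IDs; the field types are assumptions.
type AgentIdentity struct {
	AgentID     string // stable: minted once, rehydrated on restart
	Incarnation uint64 // ephemeral: bumps on every process start
	VersionHash string // content-derived: changes only with config content
}

// boot models a process start for an already-registered agent: agent_id is
// kept, incarnation always bumps, and version_hash is whatever the current
// configuration hashes to (unchanged config means an unchanged hash).
func boot(prev AgentIdentity, currentHash string) AgentIdentity {
	return AgentIdentity{
		AgentID:     prev.AgentID,
		Incarnation: prev.Incarnation + 1,
		VersionHash: currentHash,
	}
}

func main() {
	a := AgentIdentity{AgentID: "agent-example", Incarnation: 1, VersionHash: "hash-v1"}
	fmt.Println(boot(a, "hash-v1")) // restart with no config change
	fmt.Println(boot(a, "hash-v2")) // restart after a prompt edit
}
```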

D-060 — Agent Registry is an in-process, per-runtime-instance, StateStore-backed subsystem; it covers both creation cases (locally-hosted + connect-to-remote); restart rehydrates (restart ≠ recreate)

Date: 2026-05-14 Status: Settled Where it lives: RFC §6.16, CLAUDE.md / AGENTS.md §3 (internal/runtime/registry/ layout entry), docs/plans/phase-53a-agent-registry.md, master plan phase 53a.

Why: "How can the registry mint an agent_id if Harbor can be used by anybody?" — Harbor is a Go-native SDK + single static binary; there is no central Harbor service, and there must not be one. The minting and ownership model is settled here.

  1. The registry is not a central authority — it is an in-process subsystem inside each harbor instance. Every harbor process (or every embedding of the library) maintains its own registry.AgentRegistry, persisted via that instance's configured StateStore driver (in-mem / SQLite / Postgres, §4.4 seam). This is the same shape Harbor already uses for (tenant, user, session): Harbor never mints identity globally — it receives identity from the operator's auth boundary and scopes state locally. agent_id never needs to be globally unique; it only needs to be unique within the runtime instance that issued it, which is collision-free by construction (ULID/UUID).
  2. Two creation cases, both landing in the registry. Locally-hosted agent — the runtime instance is running the agent; it mints a local agent_id. Connect-to-remote agent — the agent runs in someone else's Harbor (or is any A2A-speaking peer); the local runtime assigns a handle (agent_id local to this instance), and the canonical identity of the remote agent is its A2A AgentCard, owned by the remote operator. See [[D-061]] — neither case puts the agent list in a Console DB.
  3. Restart rehydrates; restart ≠ recreate. With a durable StateStore driver, a process restart rehydrates the registry and the agent comes back with the same agent_id (a stable fleet view depends on this). The in-mem driver loses the registry on restart and is documented as dev-only — the "id changes on restart" behaviour is a dev-mode artifact, not the intended fleet posture. Teardown-and-recreate is distinct from restart: recreate genuinely mints a fresh agent_id because it is a new logical entity; restart keeps the StateStore record.
  4. The registry emits agent.* events (agent.registered, agent.restarted, agent.health, agent.drained, agent.deregistered) so the Console renders runtime state. See [[D-059]] for the three-ID model the registry owns and [[D-066]] for the fleet-control privilege tier.
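The restart ≠ recreate rule can be sketched with a store that outlives registry instances. The assumptions are these: the map stands in for a durable StateStore, agents are looked up by a hypothetical logical name on re-registration, and newID uses random hex as a placeholder for the ULID minting this entry describes.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// durableStore stands in for a SQLite/Postgres-backed StateStore: it
// outlives any single registry instance. Keying by name is an assumption.
type durableStore map[string]string // logical name -> agent_id

// registry is a hypothetical per-process view over the store.
type registry struct{ store durableStore }

// newID is a placeholder for ULID minting (random hex, not a ULID).
func newID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// Register returns the persisted agent_id when one exists (rehydrate) and
// mints a fresh one only on first registration.
func (r *registry) Register(name string) string {
	if id, ok := r.store[name]; ok {
		return id
	}
	id := newID()
	r.store[name] = id
	return id
}

// Deregister tears the logical entity down; a later Register is a
// recreate and therefore mints a new agent_id.
func (r *registry) Deregister(name string) { delete(r.store, name) }

func main() {
	store := durableStore{}
	first := (&registry{store}).Register("billing-agent")

	// "Restart": a brand-new registry instance over the same durable store.
	restarted := (&registry{store}).Register("billing-agent")

	r := &registry{store}
	r.Deregister("billing-agent")
	recreated := r.Register("billing-agent")

	fmt.Println(first == restarted, first == recreated)
}
```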

D-061 — A Console DB holds Console-local state only; it is never a shadow source of truth for runtime entities

Date: 2026-05-14 Status: Settled Where it lives: RFC §7, CLAUDE.md / AGENTS.md §13 (forbidden practice), docs/glossary.md (Console DB).

Why: The instinct to track "which agents exist" in a Console-side database is exactly how the predecessor's Console drifted into re-implementing runtime APIs. The boundary is settled before any Console phase is authored.

  1. If a Console DB exists, it holds Console-local state only — saved views, dashboard layouts, per-operator preferences, annotations. It must never be the source of truth for runtime entities (agents, sessions, tasks, tools, events, artifacts). Those live in the Runtime and reach the Console exclusively through the Protocol's canonical events / state snapshots / control commands.
  2. Rationale: a Console DB as a shadow source of truth breaks the "Console is a Protocol client" rule (RFC §5, §7, CLAUDE.md §4.5). If the Console DB owned the agent list, a third-party Console would have a different agent list, and the Agents page would be a standalone app rather than a runtime lens. The Agent Registry ([[D-060]]) is the runtime-side owner; the Console renders it.
  3. A runtime-side control-plane client allowlist is the legitimate inverse and is a separate concern from a Console DB — see [[D-066]].

D-062 — Harbor Console is a 14-page observability + control plane organized as runtime lenses; Live Runtime ≠ Sessions; Agents ≠ chatbots; no Console page phase ships without its feeding Protocol-surface phase

Date: 2026-05-14 Status: Settled Where it lives: RFC §7 (expanded), docs/research/11-console-feature-surface.md, docs/rfc/assets/console-agents-page.png, master plan README.md (Console-wave re-decomposition note), CLAUDE.md / AGENTS.md §13 (forbidden practice — Console page without Protocol surface).

Why: The Console is not "the Playground plus widgets" — it is a full control/observability plane. Its information architecture is settled so the (currently under-scoped) phases 72–75 can be re-decomposed against a fixed target.

  1. Fourteen pages, five clusters, all runtime lenses. Runtime (Overview, Live Runtime); Execution (Sessions, Tasks, Agents, Tools, Events, Background Jobs); Resources (Flows, Memory, MCP Connections, Artifacts); Evaluation (Evaluations); Settings. Every page is a projection over state snapshots + realtime events + control commands — never a standalone app feature. The canonical Agents-page mockup is docs/rfc/assets/console-agents-page.png.
  2. Live Runtime ≠ Sessions. Live Runtime is the present-tense interactive workbench (initiate / observe / steer / debug a live execution — the spiritual replacement of the predecessor's Playground, with the chat as one panel among many). Sessions are the past-and-active durable execution records (replay / continue / clone / convert-to-eval). Conflating them produces two half-built versions of the same surface.
  3. Agents ≠ chatbots. Agents are runtime execution entities with planners, tool bindings, memory bindings, policies, task ownership, event streams, and operational health — not personas. The Agents page is fleet management, not an assistant gallery; it is a lens over the Agent Registry ([[D-060]]).
  4. Structuring rule: no Console page phase ships without its feeding Protocol-surface phase landing first or in the same wave. This is the §13 "no primitive without its consumer" rule read backwards — it keeps the Console honest as a Protocol client instead of letting it grow private hooks. The notification.* topic (Overview intervention queue) and search.* Protocol methods (global ⌘K) land as named acceptance criteria of their consuming page phases, not as free-floating primitives.
  5. MCP Apps DisplayMode (inline / fullscreen / pip) is a Protocol-level concern — the MCP app declares its preferred mode, the runtime forwards it, the Console honours it. DisplayMode lives in internal/protocol/types/, not in Console-only state.

D-063 — The Console Flows page is a view over engine graphs scoped to graph-family planners; V1 = read / run / inspect-history; authoring / versioning / import-export is post-V1

Date: 2026-05-14 Status: Settled Where it lives: RFC §7, docs/glossary.md (Flows), master plan README.md (Console-wave note).

Why: "Flows" risked being scoped as a new runtime subsystem when it is really a projection. Settled to bound it.

  1. Flows are engine graphs, not a new subsystem. A "Flow" in the Console is the graph structure that a graph-family planner (Graph / Workflow / Deterministic — RFC §6.2, §12) runs on. It is a view over internal/runtime/engine/ node graphs, filtered to agents whose planner is graph-shaped.
  2. V1 Flows page = read / run / inspect-run-history — a pure lens, needing only a Protocol method that exposes the engine graph structure + run history.
  3. Authoring / versioning / import-export is post-V1 — that is the part that may need a real subsystem, and it is deliberately deferred. This splits Flows along the same present-vs-authoring line as [[D-062]]'s Live Runtime ≠ Sessions.

D-064 — Evaluations is a post-V1 subsystem built as a §4.4 extensibility seam; it depends on fully-replayable sessions, which makes the durable event log (Phase 57) a hard dependency; a premium/hosted variant must be a driver, not a fork

Date: 2026-05-14 Status: Settled Where it lives: RFC §12 (post-V1 / future work), master plan README.md (phase 57 detail block, V1 cut line), docs/glossary.md (Evaluations).

Why: The Console IA lists an Evaluations page; that page implies a substantial runtime subsystem. Its scope and dependencies are settled so V1 does not foreclose it.

  1. Evaluations is a subsystem, not a page. "Eval suites, golden sessions, replay-based evaluation, regression diffs, baseline promotion" is a runtime program — an eval runner, eval storage, replay machinery — with the Console page as its thin front-end. It is explicitly post-V1.
  2. It is the foundation for post-V1 agent version-control — success-rate-over-version_hash ([[D-059]]), prompt evolution, tool evolution.
  3. It depends on fully-replayable sessions, so Phase 57 (durable event log) is a hard dependency. "Create eval from session" / "mark as test case" only work if a session's event log is durable and gap-free. Lossy V1 sessions (ring-buffer-only) would foreclose Evaluations entirely — you cannot retrofit completeness into already-shipped sessions. Phase 57's durability guarantees are therefore binding, not optional.
  4. Built as a §4.4 seam from day one — interface + drivers — so a premium / hosted / enterprise variant is a driver, not a fork of the runtime. This keeps a future monetization path open without polluting the V1 open-source surface.

D-065 — The session priority dimension is dropped from V1

Date: 2026-05-14 Status: Settled Where it lives: RFC §6.9 (sessions), docs/research/11-console-feature-surface.md.

Why: The operator Console mockup showed a "Priority: Normal / High / Low" field on the session detail panel. Brief 11 recommends dropping it from V1 unless real load patterns demand it.

  1. Dropped from V1. No session-level priority field; no router or task-registry plumbing for it. Task-level prioritization via the PRIORITIZE steering control (Phase 52/53) already exists and covers the concrete operator need.
  2. Revisit only on evidence. If post-V1 load patterns show a genuine need for session-level priority, it is a scoped phase touching sessions (RFC §6.9) and routers (Phase 14) — not a V1 retrofit.

D-066 — Fleet control is a distinct, more-elevated privilege tier than fleet observation; a runtime-side control-plane client enrollment allowlist is deferred ("decide later")

Date: 2026-05-14 Status: Settled Where it lives: RFC §6.16 (Agent Registry security), RFC §5.5 (Protocol authentication), docs/plans/phase-53a-agent-registry.md.

Why: A Console deployed to manage a fleet of Harbor runtimes is a control plane; the security model for that needs to be explicit so it is not discovered late.

  1. Fleet control is a distinct, more-elevated privilege tier than fleet observation. Observation (read events, view topology, list agents) and control (pause / drain / restart / force-stop) are different privilege tiers. Control requires a more-elevated scope claim than observation — this extends the §6.5 / §6 elevated-scope-claim concept to the fleet surface. A leaked read-only Console token must not be able to force-stop a fleet. Every fleet-control command is audit-redacted and emitted ("who restarted which agent, from which Console, when").
  2. The Console is just another Protocol client — it authenticates to each runtime with an operator-issued JWT (asymmetric algorithms only, §7), and the Protocol never accepts a request without an identity scope (§8). Deployment posture (private subnet, Console as the only reachable client, optional transport mTLS) is defense-in-depth and is mostly the operator's responsibility, not a runtime feature.
  3. A runtime-side control-plane client enrollment allowlist is deferred. A runtime recording "control-plane client with key-fingerprint F is authorized at scope S" is stronger than per-request JWT scope alone, but the JWT scope covers the core V1 need. This is a "decide later" item, not V1 scope. It is the legitimate inverse of a Console DB ([[D-061]]) — a runtime-side record of authorized controllers, not a Console-side record of agents.

D-067 — Pause/Resume Coordinator: opaque runtime-owned Token, process-local handle/pause registry at V1, durability rides on state.StateStore (no parallel persistence seam), §13 primitive-with-consumer obligation discharged by Phase 53

Date: 2026-05-14 Status: Settled Where it lives: RFC §6.3 + §3.3 + §3.4, docs/plans/phase-50-pauseresume-coordinator.md, internal/runtime/pauseresume/pauseresume.go (Coordinator interface + Pause / PauseRequest / Status value types + Token + Reason typedef bridge), internal/runtime/pauseresume/coordinator.go (process-local coordinator + New + Options), internal/runtime/pauseresume/checkpoint.go (checkpointRecord envelope over state.StateStore), internal/runtime/pauseresume/errors.go (sentinels), internal/runtime/pauseresume/events.go (pause.requested / pause.resumed payloads), test/integration/phase50_durability_test.go, scripts/smoke/phase-50.sh, brief 02 §3 + §4 + §5 + §6.

Why: Phase 50 lands Harbor's ONE pause/resume primitive (CLAUDE.md §7 rule 4) — the master plan flags it a highest-risk critical-path phase ("if it leaks abstractions to planner code, the swappable-planner property regresses"). Four design calls warrant a settled entry.

  1. Token is opaque, runtime-owned, ULID-encoded. RFC §6.3 specifies type Token string "opaque to clients; the runtime owns the encoding". Phase 50 mints tokens via ulid.MustNew(ulid.Now(), crypto/rand.Reader) — monotonic-ish, lexicographically sortable, crypto-strong entropy, concurrency-safe with a stateless entropy source. There is no exported parse/construct helper: clients receive a Token from Request and hand it back to Resume / Status verbatim. The encoding is an implementation detail the Protocol projection (a later phase) never exposes.

  2. The handle registry and the pause registry are both process-local at V1. RFC §6.3 already settled this for the handle directory: "V1: process-local. Resume must run in the same Runtime process. The seam for a distributed handle directory exists … but no production driver ships at V1." Phase 50 reuses Phase 43's trajectory.HandleRegistry (already an interface with a process-local sync.Map driver — D-049) rather than minting a parallel registry, and re-attaches the non-serialisable half of ToolContext through it on Resume — a lost handle surfaces trajectory.ErrToolContextLost verbatim, never a silent nil context. The Coordinator's OWN pause registry (Token → live pause record, behind a documented-invariant sync.Mutex per the D-025 concurrent-reuse contract) is likewise process-local. A distributed handle/pause directory is a post-V1 RFC concern (RFC §6.3 + RFC §12).

  3. Durability rides on the existing state.StateStore — no parallel persistence-driver seam. A literal reading of §4.4 might suggest pauseresume needs its own CheckpointStore interface + driver registry. Phase 50 deliberately does NOT mint one: state.StateStore (Phase 07) is ALREADY the §4.4 persistence seam, with three V1 drivers (in-mem / SQLite / Postgres) at conformance parity (CLAUDE.md §9 + internal/state/conformancetest). A second persistence seam would be the §13 "two parallel implementations of the same conceptual feature" smell. The Coordinator takes an OPTIONAL state.StateStore at construction via WithCheckpointStore; when present, Request serialises the pause record (via trajectory.Trajectory.Serialize; trajectory.ErrUnserializable propagates verbatim, no half-persist) into a minimal checkpointRecord{FormatVersion int; ...} envelope keyed on a per-token Kind (pauseresume.checkpoint:<token>) with the Token doubling as the state.EventID so LoadByEventID(token) resolves a pause from a Token alone. The acceptance criterion is verbatim from the master plan: "pauses survive Runtime restart only when StateStore-backed checkpoint is configured." When NO store is configured, pauses are process-local only and a fresh Coordinator returns ErrPauseNotFound for the token. The FormatVersion field is the forward-compatibility hinge: Phase 51 authors the full RFC §6.3 format_version: 1 pause-state serialise contract by deepening this envelope's typed fields, not rewriting it.

  4. The §13 primitive-with-consumer obligation is discharged by Phase 53 — tracked, not forgotten. CLAUDE.md §13 names the unified pause/resume primitive explicitly: "Phase 50 (the primitive) cannot ship without at least one planner (or planner upgrade) emitting RequestPause for a real reason … in the same wave." The Wave 9 coordinator decided the first end-to-end RequestPause-driven-through-the-Coordinator consumer is Phase 53 (steering wiring), same wave, Stage 3 — when a planner emits the RequestPause Decision shape, Phase 53's steering/executor wiring calls Coordinator.Request, drives the protocol-level pause event, and resumes via Coordinator.Resume on an inbound RESUME steering control. Phase 51 (Stage 2) also consumes Phase 50's surface (the pause-record serialise contract on top of the checkpointRecord envelope). The producer side already exists: PauseStep (Phase 48, internal/planner/deterministic/steps.go) emits the planner.RequestPause Decision shape — Phase 53 closes the loop by wiring that emission into the Coordinator. Phase 50's own tests are NOT a substitute for the §13 consumer; they are the direct exercise of the primitive (round-trip, durability across all three StateStore drivers, Status, idempotent/concurrent Request/Resume). The obligation is satisfied at the wave level by Phase 53, which lands before Wave 9 closes.

The D-025 concurrent-reuse contract is pinned in concurrent_test.go: N=200 goroutines (≥100 per the contract) run the full Request → Status → Resume → Status lifecycle against one shared Coordinator under -race — distinct per-goroutine identity quadruples (a context bleed surfaces as a wrong quadruple or foreign payload), a pre-cancelled-ctx subset (no cross-cancellation), baseline runtime.NumGoroutine restored after join (no leak). A companion test races N=32 goroutines to Resume the same token and asserts exactly one success + N-1 ErrAlreadyResumed (idempotent, no double-apply). The §11 mandatory pause/resume serialisation test (TestRequest_FailsLoudlyOnUnserializableTrajectory) constructs a PauseRequest whose trajectory carries a live channel and asserts Request returns trajectory.ErrUnserializable with a non-empty field path — never a half-persisted checkpoint. Coverage on internal/runtime/pauseresume is 93.9% (master-plan target 90%).


D-068 — Agent Registry implementation calls: version_hash is SHA-256 over canonical JSON of AgentConfig; the connect-to-remote "handle" is a normal locally-minted ULID agent_id discriminated by Hosting; the registry consumes the existing StateStore seam (no registry-specific driver seam)

Date: 2026-05-14 Status: Settled Where it lives: RFC §6.16 (Agent Registry — the algorithm and handle encoding are implementation calls within the RFC's envelope), docs/plans/phase-53a-agent-registry.md (Risks / open questions), internal/runtime/registry/versionhash.go, internal/runtime/registry/registry.go (the Hosting discriminator + AgentRecord.AgentCardRef), internal/runtime/registry/registry_impl.go (StateStore-backed persistence via the generic (Quadruple, Kind, Bytes) surface).

Why: Phase 53a implements against the already-settled D-059 / D-060 / D-061 / D-062 / D-066 — the three-ID model, the in-process per-instance subsystem shape, the Console-DB boundary, the 14-page Console IA, and the fleet-control privilege tier are all design decisions that were NOT re-litigated. But the implementation surfaced three concrete calls the RFC deliberately left open; they are settled here so they do not get re-decided at the next refactor.

  1. version_hash is SHA-256 over a canonical JSON encoding of AgentConfig, hex-encoded. RFC §6.16 specifies a "deterministic content hash over (prompt set, tool set + schemas, planner config, model policy)" but pins neither the algorithm nor the canonicalisation rule. Phase 53a settles: the four configuration dimensions are folded into a canonicalConfig struct with fixed JSON field names; the Prompts slice and the Tools slice are sorted (their order is not semantic), and PlannerConfig / ModelPolicy maps are flattened to key-sorted {k,v} slices (so the encoding is reproducible independent of any encoder's map-key-ordering behaviour); the canonical form is json.Marshal-ed and sha256.Sum256-ed; the result is lowercase hex. This makes version_hash deterministic — same content in, same hash out, regardless of caller-side construction order — which is the property the post-V1 Evaluations program (success-rate-over-version_hash, D-064) depends on. The hash function is pure and holds no package-level state, so it is concurrent-safe by construction (D-025).

  2. The connect-to-remote "handle" is a normal locally-minted ULID agent_id, discriminated by Hosting, not a distinct type. D-060 says the local agent_id for a connect-to-remote agent is a "handle" whose canonical identity lives in the remote A2A AgentCard. Phase 53a settles the encoding: a RegisterRemote mints the same ULID agent_id as Register does, sets AgentRecord.Hosting = HostingRemote, and stores the remote operator's AgentCard reference in AgentRecord.AgentCardRef (empty for local agents). The handle is therefore not a separate type or a separate id-space — it is the same agent_id field, with Hosting as the discriminator and AgentCardRef as the pointer to the authoritative remote record. This keeps the three-ID model uniform across both creation cases (a remote agent still has agent_id + incarnation; version_hash is empty because the configuration is owned remotely) and avoids a parallel "handle table." It is runtime-instance-local and never assumed globally unique, exactly as D-060 requires.

  3. The registry consumes the existing state.StateStore seam — it does NOT define a registry-specific driver seam. The master plan says "StateStore-backed (in-mem / SQLite / Postgres, §4.4 seam)"; this could be read as "the registry needs its own registry/drivers/ seam." It does not. Driver pluralism already lives at the StateStore layer (D-027): the registry persists through the generic (identity.Quadruple, Kind, Bytes) surface — one per-identity agent.index document (the enumeration source for List, since the StateStore surface has no scan operation — the same typed-wrapper-owns-enumeration shape as sessions.Registry) plus one agent.record.<agent_id> document per agent. This is the same call D-027 settled for every persistence-shaped consumer; the registry is not special. "Rehydration on restart" is therefore automatic — a fresh *Registry over a durable StateStore (SQLite / Postgres) sees the prior process's agents because it reads the store on every operation; a fresh *Registry over a fresh in-mem store does not (the in-mem driver is dev-only and non-persistent, D-060).
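The canonicalisation rule in item 1 can be sketched with the standard library alone: sort the order-insensitive slices, flatten maps into key-sorted pairs, marshal, hash, and hex-encode. The AgentConfig field types (plain strings) and the JSON field names are assumptions. The real struct carries richer tool schemas.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// AgentConfig mirrors the four dimensions named in D-068; field types are assumed.
type AgentConfig struct {
	Prompts       []string
	Tools         []string // tool name + schema, pre-serialised
	PlannerConfig map[string]string
	ModelPolicy   map[string]string
}

type kv struct {
	K string `json:"k"`
	V string `json:"v"`
}

// canonicalConfig has fixed JSON field names so the encoding is stable.
type canonicalConfig struct {
	Prompts       []string `json:"prompts"`
	Tools         []string `json:"tools"`
	PlannerConfig []kv     `json:"planner_config"`
	ModelPolicy   []kv     `json:"model_policy"`
}

func sortedCopy(s []string) []string {
	out := append([]string(nil), s...)
	sort.Strings(out)
	return out
}

// flatten turns a map into key-sorted pairs so the encoding never depends
// on map iteration or on the encoder's own map-key ordering.
func flatten(m map[string]string) []kv {
	out := make([]kv, 0, len(m))
	for k, v := range m {
		out = append(out, kv{k, v})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].K < out[j].K })
	return out
}

// VersionHash is pure (no package state), so concurrent callers are safe.
func VersionHash(c AgentConfig) (string, error) {
	b, err := json.Marshal(canonicalConfig{
		Prompts:       sortedCopy(c.Prompts),
		Tools:         sortedCopy(c.Tools),
		PlannerConfig: flatten(c.PlannerConfig),
		ModelPolicy:   flatten(c.ModelPolicy),
	})
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a := AgentConfig{Prompts: []string{"sys", "style"}, Tools: []string{"search", "fetch"},
		PlannerConfig: map[string]string{"max_steps": "8", "kind": "react"}}
	b := AgentConfig{Prompts: []string{"style", "sys"}, Tools: []string{"fetch", "search"},
		PlannerConfig: map[string]string{"kind": "react", "max_steps": "8"}}
	ha, _ := VersionHash(a)
	hb, _ := VersionHash(b)
	fmt.Println(ha == hb, len(ha))
}
```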

None of these three calls reaches into RFC territory — they are implementation decisions within the envelope RFC §6.16 + D-059 / D-060 left open. Phase 53a's design is otherwise fully covered by the five pre-settled decisions; D-068 records only the implementation-level calls that warranted a durable home.


D-069 — Pause-state serialise contract closes on the pause-record envelope by SHARING Phase 43's reflective walker (exported as trajectory.ValidateEncodable), not forking a second fail-loudly serialiser; format_version: 1 is stamped by SerializeRecord and enforced by DeserializeRecord; the negative tests are the acceptance gate

Date: 2026-05-14 Status: Settled Where it lives: RFC §6.3 + §3.4, docs/plans/phase-51-pause-serialise-contract.md, internal/planner/trajectory/serialize.go (ValidateEncodable — the exported reflective walker), internal/runtime/pauseresume/pauserecord.go (FormatVersion + SerializeRecord + DeserializeRecord), internal/runtime/pauseresume/errors.go (ErrUnsupportedFormatVersion), internal/runtime/pauseresume/checkpoint.go (saveCheckpoint / loadCheckpoint routed through the pair), internal/runtime/pauseresume/coordinator.go (unconditional Payload-encodability check in Request), internal/runtime/pauseresume/pauserecord_test.go + pauserecord_contract_test.go (the negative-test gate), test/integration/phase51_pause_serialise_test.go (conformance-with-phase-43), brief 02 §4.

Why: Phase 51 closes the load-bearing predecessor-bug — the silent-context-loss path — for the pause record's OWN wire envelope. Phase 43 (D-049) closed it for the trajectory; Phase 50 (D-067) propagated trajectory.ErrUnserializable verbatim out of Coordinator.Request for the trajectory. But the pause record carries one more caller-controlled, JSON-tree-shaped field — Payload map[string]any — and Phase 50's checkpoint save reached it via a bare json.Marshal, which on a non-encodable leaf returns a plain *json.UnsupportedTypeError: technically loud, but without the actionable dotted field path RFC §3.4 mandates ("MUST return ErrUnserializable naming the offending field path"). Three design calls warrant a settled entry.

  1. The pause-record serialise contract SHARES Phase 43's reflective walker — it does NOT fork a second fail-loudly serialiser. The master plan's Phase 51 "Tests" line reads "Conformance with phase 43 Trajectory.Serialize" — a strong steer that Phase 51 must not re-implement the walker. Phase 51 exports Phase 43's walkEncodable as trajectory.ValidateEncodable(v any, root string) error — a pure, stateless, reusable primitive — and Trajectory.Serialize's own pre-flight pass is re-pointed at the exported entry so there is exactly ONE walker entry point. pauseresume.SerializeRecord pre-flight-walks the whole checkpointRecord envelope via trajectory.ValidateEncodable(rec, "PauseRecord") and propagates the SAME trajectory.ErrUnserializable struct sentinel verbatim. The observable proof the walker is shared, not copy-pasted, is asserted in TestSerializeRecord_SharesTrajectoryWalker and TestE2E_PauseSerialise_ConformsWithPhase43: a non-encodable leaf in EITHER the trajectory or the pause record's Payload surfaces the same error type out of Request. A second fail-loudly serialiser would be the CLAUDE.md §13 two-parallel-implementations anti-pattern — exactly the shape the Wave 8 §17.5 checkpoint audit's capfilter extraction (D-052) spent a chore PR killing. Exporting one function is the cheap, correct alternative to forking ~170 lines of reflective walker.

  2. format_version: 1 is stamped by SerializeRecord and enforced by DeserializeRecord. RFC §6.3 settles the pause-state serialisation format as "JSON with format_version: 1" (resolves brief 02 Q-2). Phase 50 shipped the checkpointRecord.FormatVersion int field as the forward-compat hinge but did not yet author the contract on it. Phase 51 settles: pauseresume.FormatVersion (= 1) is the single source of truth; SerializeRecord STAMPS it on every write regardless of the caller's value (so "what version did we write" is single-sourced, not caller-trusted); DeserializeRecord ENFORCES it on load — a record whose format_version is not FormatVersion (a zero/absent version from a corrupt or pre-contract write, OR a higher unknown version from a forward-incompatible newer-Runtime write) is rejected loud with ErrUnsupportedFormatVersion, never silently mis-decoded against the current schema. Bumping FormatVersion is an RFC change. Phase 51 ships the guard, not a multi-version decoder — V1 has exactly one format.

  3. The negative tests ARE the acceptance gate; the serialise check in Coordinator.Request is unconditional. The master plan's Phase 51 "Acceptance" line is verbatim: "Negative tests are the gate. CI fails on any silent-drop regression." Phase 51 takes this literally — the in-package pauserecord_test.go constructs checkpointRecord envelopes with non-encodable Payload leaves (function / channel / complex / nested-function) and asserts SerializeRecord returns (nil, trajectory.ErrUnserializable) with a PauseRecord.payload.<key> field path, never half-encoded bytes, never (nil, nil); the black-box pauserecord_contract_test.go asserts the same through the real Coordinator.Request surface. A consequential implementation call: the Payload-encodability check in Request is unconditional — it runs whether or not a checkpoint store is configured, before a Token is minted. Phase 50 only serialised the envelope when store != nil; that left a no-store pause free to silently carry a Payload field that could never round-trip. The Payload is the pause record's wire shape regardless of persistence, so RFC §3.4's "no silent degradation" applies unconditionally — a non-encodable Payload fails the Request loud either way, and no Token / pause / checkpoint is ever produced.
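The three calls can be sketched in one self-contained file. This is an illustration under stated assumptions, not the shipped code. UnserializableError stands in for the trajectory.ErrUnserializable struct sentinel. validateEncodable stands in for trajectory.ValidateEncodable; unlike the real function, it takes a reflect.Value rather than any. The Token field on checkpointRecord is hypothetical. SerializeRecord walks the record before encoding it, so a failure returns no bytes and a dotted PauseRecord.payload.<key> path. On success it stamps FormatVersion regardless of the caller's value. DeserializeRecord rejects any other version.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"reflect"
	"strings"
)

const FormatVersion = 1 // single source of truth; stamped on every write

// UnserializableError stands in for trajectory.ErrUnserializable.
type UnserializableError struct{ Path, Kind string }

func (e *UnserializableError) Error() string {
	return fmt.Sprintf("unserializable %s at %s", e.Kind, e.Path)
}

var ErrUnsupportedFormatVersion = errors.New("unsupported format_version")

// validateEncodable rejects func, chan, complex and unsafe-pointer leaves,
// naming struct fields by their JSON tag so paths match the wire shape.
func validateEncodable(v reflect.Value, path string) error {
	switch v.Kind() {
	case reflect.Func, reflect.Chan, reflect.Complex64, reflect.Complex128, reflect.UnsafePointer:
		return &UnserializableError{Path: path, Kind: v.Kind().String()}
	case reflect.Interface, reflect.Pointer:
		if v.IsNil() {
			return nil
		}
		return validateEncodable(v.Elem(), path)
	case reflect.Map:
		for it := v.MapRange(); it.Next(); {
			if err := validateEncodable(it.Value(), fmt.Sprintf("%s.%v", path, it.Key())); err != nil {
				return err
			}
		}
	case reflect.Slice, reflect.Array:
		for i := 0; i < v.Len(); i++ {
			if err := validateEncodable(v.Index(i), fmt.Sprintf("%s[%d]", path, i)); err != nil {
				return err
			}
		}
	case reflect.Struct:
		for i := 0; i < v.NumField(); i++ {
			f := v.Type().Field(i)
			name := strings.Split(f.Tag.Get("json"), ",")[0]
			if !f.IsExported() || name == "-" {
				continue
			}
			if name == "" {
				name = f.Name
			}
			if err := validateEncodable(v.Field(i), path+"."+name); err != nil {
				return err
			}
		}
	}
	return nil
}

type checkpointRecord struct {
	FormatVersion int            `json:"format_version"`
	Token         string         `json:"token"`
	Payload       map[string]any `json:"payload,omitempty"`
}

// SerializeRecord walks first, so a failure never yields half-encoded bytes.
func SerializeRecord(rec checkpointRecord) ([]byte, error) {
	if err := validateEncodable(reflect.ValueOf(rec), "PauseRecord"); err != nil {
		return nil, err
	}
	rec.FormatVersion = FormatVersion // stamped, never caller-trusted
	return json.Marshal(rec)
}

// DeserializeRecord rejects a zero/absent version and an unknown newer one.
func DeserializeRecord(b []byte) (checkpointRecord, error) {
	var rec checkpointRecord
	if err := json.Unmarshal(b, &rec); err != nil {
		return checkpointRecord{}, err
	}
	if rec.FormatVersion != FormatVersion {
		return checkpointRecord{}, fmt.Errorf("%w: got %d, want %d", ErrUnsupportedFormatVersion, rec.FormatVersion, FormatVersion)
	}
	return rec, nil
}

func main() {
	b, err := SerializeRecord(checkpointRecord{Token: "t1", Payload: map[string]any{"reason": "approve?"}})
	fmt.Println(string(b), err)
	_, err = SerializeRecord(checkpointRecord{Token: "t2", Payload: map[string]any{"cb": func() {}}})
	fmt.Println(err)
	_, err = DeserializeRecord([]byte(`{"format_version":2,"token":"t3"}`))
	fmt.Println(errors.Is(err, ErrUnsupportedFormatVersion))
}
```

When one map contains several bad leaves, which path gets reported depends on Go's randomised map iteration order. The contract is only that some offending path is named and nothing is encoded.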

Phase 51 ships no new reusable artifact: SerializeRecord, DeserializeRecord, and ValidateEncodable are pure stateless functions with no receiver and no package-level mutable state — concurrent-safe by construction, no new D-025 test required. Phase 50's concurrent_test.go (N=200 goroutines against one shared Coordinator) now exercises the Phase 51 serialise path on every Request and still passes under -race. Coverage on internal/runtime/pauseresume is 94.0% (master-plan Phase 51 target 90%); the trajectory package stays at 90.8% after the ValidateEncodable export (the full Phase 43 suite passes unchanged — the export does not alter the trajectory's observable contract).


D-070 — Steering inbox: per-run Runtime-owned inbox + process-wide Registry, the nine-type control taxonomy is a fixed enum (no Register escape hatch), payload bounds enforced at the edge fail-loud (never truncate), per-event scope is a three-tier trust-based claim, the Phase-52-vs-53 boundary is taxonomy+inbox+validation here / run-loop wiring there

Date: 2026-05-14 Status: Settled Where it lives: RFC §6.3, docs/plans/phase-52-steering-inbox.md, internal/runtime/steering/taxonomy.go (ControlType enum + IsValidControlType / ControlTypes), internal/runtime/steering/validate.go (the five payload-bound constants + ValidatePayload), internal/runtime/steering/scope.go (Scope enum + requiredScope mapping + CheckScope), internal/runtime/steering/inbox.go (ControlEvent + per-run Inbox), internal/runtime/steering/registry.go (process-wide Registry + Open / Lookup / Retire), internal/runtime/steering/events.go (control.rejected event + EmitRejection), internal/runtime/steering/errors.go (sentinels), test/integration/phase52_steering_test.go, scripts/smoke/phase-52.sh, brief 02 §2 + §3 + §4 + §6.

Why: Phase 52 lands the steering primitive — the control taxonomy + the per-run inbox + the Protocol-edge validation + the per-event scope checks. RFC §6.3 settles the what (the nine types, the payload bounds, the per-event scope mapping); five implementation calls warrant a durable home so they are not re-decided at the next refactor.

  1. The inbox is per-run and Runtime-owned; a process-wide Registry owns inbox lifecycle. RFC §6.3 says "the Runtime owns the inbox" and brief 02 §3 says the inbox is "per-run" with the planner observing "RunContext.Control only; it does not receive the inbox." Phase 52 settles the shape: a per-run Inbox (FIFO Enqueue + atomic Drain, identity-scoped to exactly one identity.Quadruple, mutex-guarded so N Protocol-edge goroutines may Enqueue while the run loop Drains) is minted / looked up / retired by a process-wide Registry keyed on the run quadruple. The run component is part of the key — two runs in the same session get distinct inboxes (TestRegistry_SameTripleDifferentRun). Open on a live run fails loud with ErrInboxExists rather than orphaning the first inbox's queued events; Lookup / Drain after Retire fail loud with ErrInboxNotFound. The Registry is the D-025 compiled artifact (the run→inbox map behind a documented-invariant mutex); the Inbox is itself concurrent-safe. Per-run state never leaks across runs — an event whose Identity is not the inbox's own quadruple is rejected with ErrIdentityRequired (the per-run isolation gate, CLAUDE.md §6).

  2. The nine-type control taxonomy is a fixed enum — there is no RegisterControlType escape hatch. RFC §6.3 names the nine types "(Settled)". Phase 52 reads "Settled" strictly: canonicalControlTypes is a fixed package-level map, not a write-once registry like events.RegisterEventType. A tenth control type is an RFC change, not a phase addition — so there is no registration seam to drift through. IsValidControlType is the O(1) gate; ControlTypes() returns the deterministic sorted snapshot for the Phase 54 Protocol allow-list. The wire strings are the RFC's verbatim uppercase identifiers (INJECT_CONTEXT, … USER_MESSAGE).

  3. Payload bounds are enforced at the edge and fail loud — they NEVER truncate. RFC §6.3 settles the five caps (depth ≤ 6, ≤ 64 keys, ≤ 50 list items, ≤ 4096 chars/string, ≤ 16 KiB total). Phase 52 ships them as named constants and ValidatePayload, with two calls settled: (a) the depth cap counts containers only — a scalar leaf inside a depth-6 map does not push the count to 7; a seventh nested map/list is the rejection (this matches "depth ≤ 6" meaning six nesting levels, not five-plus-a-leaf); (b) the key cap is per-map, not cumulative — two 64-key maps in one payload is valid, one 65-key map is not; the 16 KiB total-bytes cap (checked first, via the canonical JSON encoding) bounds cumulative size regardless. A string is rune-counted, not byte-counted. A leaf whose Go type is outside the JSON-shaped accepted set (chan, func, complex, a typed container) is rejected loud with ErrUnsupportedPayloadValue. There is no silent-truncation path (CLAUDE.md §5 "fail loudly") — ValidatePayload returns only an error and never mutates the caller's payload.

  4. Per-event scope is a three-tier, totally-ordered, trust-based Scope claim. RFC §6.3's per-event scope mapping (resolving brief 02 Q-3) is shipped as a Scope enum (ScopeSessionUser < ScopeOwnerUser < ScopeAdmin) and a requiredScope map: INJECT_CONTEXT / USER_MESSAGE → ScopeSessionUser; REDIRECT / CANCEL / PAUSE / RESUME / APPROVE / REJECT → ScopeOwnerUser (admin satisfies it by rank, per RFC §6.3's "originating user/admin"); PRIORITIZE → ScopeAdmin. CheckScope also enforces "Cross-tenant steering requires admin": a caller whose tenant differs from the run's tenant needs ScopeAdmin regardless of the per-type minimum, and an empty caller tenant fails closed. The Scope claim is trust-based at Phase 52, exactly as events.Filter.Admin is until Phase 61 Protocol auth; the control.rejected audit emit on every rejected submission makes abuse retroactively detectable. The Protocol edge (Phase 54) derives the Scope from the caller's JWT and maps ErrScopeMismatch to a 403; Phase 52 ships the check and the audit emit, not the HTTP status.

  5. The Phase-52-vs-53 boundary: taxonomy + inbox + validation + scope checks here; engine run-loop wiring there. Phase 52 is the primitive. It does NOT drain the inbox in the run loop, propagate CANCEL, block on PAUSE, project onto RunContext.Control, or cap control-history — all of that is Phase 53 (Wave 9, Stage 3), the §13 first consumer, landing in the SAME wave. PAUSE / RESUME / APPROVE / REJECT are taxonomy + scope-check entries in Phase 52; their side effects wire onto the unified pause/resume primitive (internal/runtime/pauseresume, Phase 50) in Phase 53 — Phase 52 mints no parallel pause coordinator (CLAUDE.md §7 rule 4). Phase 52 also ships no §4.4 driver seam: an in-process per-run inbox has no plausible alternate backend, so a drivers/ tree would be ceremony. Phase 52's own tests exercise the primitive directly; the §13 obligation is discharged at the wave level by Phase 53.
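The two settled calls in item 3 (containers-only depth, per-map key cap) are easiest to check in code. What follows is a minimal sketch of ValidatePayload under assumptions. The constant names and the BoundError type are hypothetical. The accepted scalar set (nil, bool, string, float64, int, int64) is a guess at the JSON-shaped set, not the shipped list. The total-bytes cap runs first on the JSON encoding, and strings are counted in runes. The function only returns an error; it never truncates.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"unicode/utf8"
)

// Hypothetical names for the five RFC §6.3 bounds.
const (
	MaxDepth       = 6
	MaxKeysPerMap  = 64
	MaxListItems   = 50
	MaxStringRunes = 4096
	MaxTotalBytes  = 16 * 1024
)

// BoundError names the cap that was exceeded (a hypothetical type).
type BoundError struct {
	Bound      string
	Got, Limit int
}

func (e *BoundError) Error() string {
	return fmt.Sprintf("payload %s %d exceeds cap %d", e.Bound, e.Got, e.Limit)
}

var ErrUnsupportedPayloadValue = errors.New("unsupported payload value")

func check(bound string, got, limit int) error {
	if got > limit {
		return &BoundError{Bound: bound, Got: got, Limit: limit}
	}
	return nil
}

// ValidatePayload only reports; it never truncates or mutates p.
func ValidatePayload(p map[string]any) error {
	b, err := json.Marshal(p) // the total-bytes cap is checked first
	if err != nil {
		return fmt.Errorf("%w: %v", ErrUnsupportedPayloadValue, err)
	}
	if err := check("total_bytes", len(b), MaxTotalBytes); err != nil {
		return err
	}
	return walk(p, 0)
}

// walk counts containers only: the top-level map is depth 1, and a scalar
// leaf never adds a level.
func walk(v any, depth int) error {
	switch x := v.(type) {
	case map[string]any:
		if err := check("depth", depth+1, MaxDepth); err != nil {
			return err
		}
		if err := check("keys", len(x), MaxKeysPerMap); err != nil { // per map, not cumulative
			return err
		}
		for _, c := range x {
			if err := walk(c, depth+1); err != nil {
				return err
			}
		}
	case []any:
		if err := check("depth", depth+1, MaxDepth); err != nil {
			return err
		}
		if err := check("items", len(x), MaxListItems); err != nil {
			return err
		}
		for _, c := range x {
			if err := walk(c, depth+1); err != nil {
				return err
			}
		}
	case string:
		return check("string_runes", utf8.RuneCountInString(x), MaxStringRunes) // runes, not bytes
	case nil, bool, float64, int, int64:
		// accepted JSON-shaped scalar leaves (assumed set)
	default:
		return fmt.Errorf("%w: %T", ErrUnsupportedPayloadValue, v)
	}
	return nil
}

func main() {
	fmt.Println(ValidatePayload(map[string]any{"note": "re-check the invoice", "tags": []any{"a", "b"}}))
	fmt.Println(ValidatePayload(map[string]any{"ch": make(chan int)}))
}
```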

The D-025 concurrent-reuse contract is pinned in concurrent_test.go: TestConcurrentReuse_Registry runs N=200 goroutines (≥100 per the contract) through the full Open → Enqueue → Lookup → Drain → Retire lifecycle against one shared Registry under -race — distinct per-goroutine run quadruples (a context bleed surfaces as a foreign RunID on a drained event), and baseline runtime.NumGoroutine restored after join (no leak). TestConcurrentReuse_SingleInbox races N=120 concurrent producers against a concurrent draining consumer on one shared Inbox and asserts every event is drained exactly once (no loss, no duplication). The auth-scope-per-event integration test (test/integration/phase52_steering_test.go) wires the real events.EventBus (in-mem driver) + the real patterns redactor on the seam, walks every one of the nine control types at min scope (accepted) and below-min (rejected → control.rejected audit event), and covers two failure modes (oversize payload, cross-tenant non-admin) plus cross-run isolation. Coverage on internal/runtime/steering is 96.6% (master-plan target 85%).
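The exactly-once property asserted by TestConcurrentReuse_SingleInbox falls out of a simple shape. Enqueue appends under a mutex, and Drain swaps the whole queue out under the same mutex, so no event can be returned twice or skipped. This is a minimal sketch, assuming a ControlEvent pared down to a run id and a sequence number; the real event carries a full identity quadruple, a ControlType and a payload. raceProducers is a hypothetical helper that mirrors the producers-versus-drainer race.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"sync"
)

// ControlEvent is pared down: the real event carries a full identity
// quadruple, a ControlType and a payload.
type ControlEvent struct {
	RunID string
	Seq   int
}

var ErrIdentityRequired = errors.New("event identity does not match the inbox")

// Inbox is a mutex-guarded FIFO bound to one run. Drain swaps the whole
// queue out under the lock, so each event is returned by exactly one Drain.
type Inbox struct {
	runID string
	mu    sync.Mutex
	queue []ControlEvent
}

func NewInbox(runID string) *Inbox { return &Inbox{runID: runID} }

func (in *Inbox) Enqueue(ev ControlEvent) error {
	if ev.RunID != in.runID { // per-run isolation gate
		return fmt.Errorf("%w: run %q", ErrIdentityRequired, ev.RunID)
	}
	in.mu.Lock()
	defer in.mu.Unlock()
	in.queue = append(in.queue, ev)
	return nil
}

func (in *Inbox) Drain() []ControlEvent {
	in.mu.Lock()
	defer in.mu.Unlock()
	out := in.queue
	in.queue = nil
	return out
}

// raceProducers starts n producers against one concurrently draining
// consumer and returns how many times each Seq was delivered.
func raceProducers(n int) map[int]int {
	in := NewInbox("run-1")
	seen := map[int]int{} // touched only by the consumer until drained closes
	done, drained := make(chan struct{}), make(chan struct{})
	go func() {
		defer close(drained)
		for {
			select {
			case <-done:
				for _, ev := range in.Drain() { // final sweep after all producers returned
					seen[ev.Seq]++
				}
				return
			default:
				for _, ev := range in.Drain() {
					seen[ev.Seq]++
				}
				runtime.Gosched()
			}
		}
	}()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(seq int) {
			defer wg.Done()
			if err := in.Enqueue(ControlEvent{RunID: "run-1", Seq: seq}); err != nil {
				panic(err)
			}
		}(i)
	}
	wg.Wait()
	close(done)
	<-drained
	return seen
}

func main() {
	fmt.Println(len(raceProducers(120)), "distinct events delivered")
	fmt.Println(NewInbox("run-1").Enqueue(ControlEvent{RunID: "run-2"}))
}
```

The final sweep after done is closed matters. A drainer that exits on the signal without one more Drain could strand events enqueued just before it.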


D-071 — Steering wiring: the RunLoop per-run planner-step loop is the §13 first consumer of BOTH the Phase 50 Coordinator and the Phase 52 inbox; drain-between-steps is the binding invariant; CANCEL is soft-by-default with an optional hard-cancel hook; RequestPause routes through the unified Coordinator; control-history is a per-session newest-wins ring

Date: 2026-05-14 Status: Settled Where it lives: RFC §6.3, docs/plans/phase-53-steering-wiring.md, internal/runtime/steering/runloop.go (RunLoop + NewRunLoop + RunSpec + Run + the WithXxx options + requestPause + mergeSignals), internal/runtime/steering/apply.go (applier + stepControl + applyEvent + the per-control-type side-effect functions), internal/runtime/steering/history.go (controlHistory + MaxControlHistory), internal/runtime/steering/inbox.go (Inbox.WaitForEvent + the coalesced notify channel), internal/runtime/steering/events.go (control.received / control.applied + ControlLifecyclePayload), internal/runtime/steering/errors.go (ErrNoPlanner / ErrRunLoopMisconfigured / ErrNoOutstandingPause / ErrMaxStepsExceeded), test/integration/phase53_steering_wiring_test.go, scripts/smoke/phase-53.sh, brief 02 §3 + §4 + §5 + §6.

Why: Phase 53 closes Wave 9's steering cluster — it wires the Phase 52 steering inbox and the Phase 50 pause/resume Coordinator into a real per-run planner loop. Five implementation calls warrant a durable home.

  1. Phase 53 builds the per-run planner loop (RunLoop) — there was no existing run loop to "wire steering into." The Phase 53 dispatch frame and several upstream phase plans carried an assumption that an engine-level planner run loop already existed. It did not: internal/runtime/engine is a typed graph executor (Phase 10–14), and the only code driving Planner.Next before Phase 53 was the Phase 49 conformance harness and per-planner unit tests. Phase 53 therefore builds the loop — RunLoop in internal/runtime/steering — as the wiring vehicle. This is a §4.3 reasonable plan deviation (a speculative framing turned out wrong once the code was read), not an RFC departure: RFC §6.3 §4 explicitly says "the runtime implements this loop", and RFC §3 lists internal/runtime/steering as the steering home. The loop lives in internal/runtime/steering for V1 because steering wiring IS its reason to exist; the graph engine stays the substrate for graph-family planners. Explicit exit condition (added by the Wave 9 §17.5 audit): RunLoop is really a planner-runtime component — it imports planner / pauseresume / tasks / events, and steering is only one of the ~five things it does — so leaving it in internal/runtime/steering is a layering smell, accepted for V1 only. Issue #81 tracks relocating RunLoop (+ apply.go, runloop.go, history.go) to a dedicated planner-runtime package (internal/runtime/runloop or internal/planner/runtime) at the next planner-runtime phase; that issue IS the named exit condition, not an open-ended "MAY relocate." No new top-level directory at V1; no RFC change.

  2. Drain-between-steps is the binding invariant — the drain happens exactly ONCE per step boundary, never mid-tool-call. brief 02 §6 + sharp-edge #2 are explicit: the predecessor's _apply_steering drained a SteeringInbox inside the planner loop and mutated the trajectory directly, so every alternate planner had to replicate it. RunLoop.Run drains the per-run Inbox once at the top of each step (after the prior decision finished executing, before the next Planner.Next), applies each drained event's side effect, then projects the result onto a fresh RunContext.Control — the planner sees ONLY RunContext.Control, never the Inbox. The integration test TestE2E_Phase53_NoEventAppliedMidToolCall pins the invariant: a control event enqueued mid-step is observed on the NEXT step, never the current one.

  3. CANCEL is soft-by-default; hard: true additionally fires an optional hard-cancel hook. RFC §6.3 + brief 02 §6 distinguish soft and hard CANCEL. Phase 53 settles: a soft CANCEL sets RunContext.Control.Cancelled and the planner returns Finish{Cancelled} at the next boundary; a CANCEL whose payload carries hard: true ADDITIONALLY fires a func(ctx, runID) error hook (WithHardCancelHook) that propagates a cancellation context into an in-flight decision execution. The hook is a functional-option seam, not a hard import of internal/runtime/engine — production wiring passes engine.Cancel, but RunLoop holds only the closure, keeping the step-loop family decoupled from the graph engine. A nil hook is tolerated (a hard CANCEL still terminates the run at the next boundary; the hook only accelerates an in-flight tool's teardown). A CANCEL that arrives while a run is paused terminates the run with Finish{Cancelled} — there is no point waiting for a resume that will never come.

  4. RequestPause routes through the unified pauseresume.Coordinator; PAUSE/RESUME/APPROVE/REJECT side effects converge on the same one primitive — no parallel pause coordinator. This is the §13 obligation and CLAUDE.md §7 rule 4. When a planner emits the planner.RequestPause Decision shape, RunLoop calls Coordinator.Request (issuing a Token, and — when a checkpoint store is configured — a durable checkpoint), then blocks the loop via Inbox.WaitForEvent (a coalesced 1-buffered notify channel — no busy-spin) until a steering control arrives. A RESUME or APPROVE calls Coordinator.Resume and the planner re-enters; a REJECT calls Coordinator.Resume with a rejected: true marker and terminates the run with Finish{ConstraintsConflict} — a rejected HITL gate is a constraint conflict the planner cannot resolve. RFC §6.3 originally left the reject-vs-re-enter question unpinned; the Wave 9 §17.5 audit flagged that settling an open RFC question in a phase-plan decision is RFC drift, so RFC §6.3 was amended (in the same audit chore PR) with a "Rejected HITL gate is terminal" paragraph that pins this behaviour — D-071 now records the implementation of an RFC-settled rule, not the settling itself. Phase 53 mints no second coordinator — D-070 §5 explicitly deferred these side effects to Phase 53, and Phase 53 closes that onto the existing Phase 50 primitive.

  5. Control-history is a per-session, capped, newest-wins ring. RFC §6.3 says "control-history capped per session." Phase 53 settles the shape: controlHistory keys an applied-control ring by SessionID (the steering-relevant scope — a session hosts the operator's steering attention across its runs), caps each ring at MaxControlHistory (default 256, overridable via WithMaxControlHistory), and evicts oldest-first on overflow. A failed side-effect apply is STILL recorded (with its error) — the history is the audit trail, and a silent drop would violate CLAUDE.md §5 "fail loudly". The cap is per session, not per run, because that is what the RFC line says and because a session is the coherent steering-attention scope. Accepted V1 limitation (Wave 9 §17.5 audit): each ring is capped, but the session-keyed map gains one entry per distinct session and is never pruned — controlHistory.forget is implemented + tested but not yet wired, because run-end (the RunLoop.Run defer) is the wrong signal to forget on (a session hosts multiple runs) and no session-end signal exists yet. Issue #79 tracks wiring forget to a real session-end signal at the next session-lifecycle phase; for V1 the per-session map growth is a slow, bounded-per-entry leak deemed acceptable.
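This is a minimal sketch of the per-session, newest-wins history in item 5, assuming a hypothetical appliedControl entry shape. It uses a slice window in place of a true circular buffer, but eviction order is the same. A failed apply is recorded together with its error. forget is included even though, as item 5 notes, nothing calls it yet.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// MaxControlHistory is the default per-session cap named in D-071.
const MaxControlHistory = 256

// ErrNoOutstandingPause is a local stand-in for the steering sentinel.
var ErrNoOutstandingPause = errors.New("no outstanding pause")

// appliedControl is a hypothetical entry shape. A failed apply is still
// recorded, with its error, because the history is the audit trail.
type appliedControl struct {
	Type string
	Err  error
}

// controlHistory keeps one newest-wins window per SessionID. A slice window
// stands in for a true circular buffer; eviction order is the same.
type controlHistory struct {
	mu    sync.Mutex
	limit int
	rings map[string][]appliedControl
}

func newControlHistory(limit int) *controlHistory {
	if limit <= 0 {
		limit = MaxControlHistory
	}
	return &controlHistory{limit: limit, rings: map[string][]appliedControl{}}
}

// record appends and, once the session's ring is full, evicts oldest-first.
func (h *controlHistory) record(session string, c appliedControl) {
	h.mu.Lock()
	defer h.mu.Unlock()
	ring := append(h.rings[session], c)
	if len(ring) > h.limit {
		ring = ring[len(ring)-h.limit:]
	}
	h.rings[session] = ring
}

func (h *controlHistory) snapshot(session string) []appliedControl {
	h.mu.Lock()
	defer h.mu.Unlock()
	return append([]appliedControl(nil), h.rings[session]...)
}

// forget drops a session's ring; D-071 notes it is not yet wired to any signal.
func (h *controlHistory) forget(session string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	delete(h.rings, session)
}

func main() {
	h := newControlHistory(3)
	for _, typ := range []string{"PAUSE", "RESUME", "INJECT_CONTEXT", "CANCEL"} {
		h.record("sess-1", appliedControl{Type: typ})
	}
	h.record("sess-2", appliedControl{Type: "APPROVE", Err: ErrNoOutstandingPause})
	fmt.Println(h.snapshot("sess-1"))
	fmt.Println(h.snapshot("sess-2"))
}
```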

§13 primitive-with-consumer — discharged here for BOTH Wave-9 primitives. Phase 53 is the §13 first consumer of (a) the Phase 50 pauseresume.Coordinator — the integration test TestE2E_Phase53_PauseRoundTrip_ThroughCoordinator drives the Phase 48 deterministic.PauseStep (the RequestPause-emitting consumer) through Coordinator.Request → block → APPROVE via the Phase 52 inbox → Coordinator.Resume → planner re-enters; and (b) the Phase 52 steering inbox + nine-type taxonomy — the integration test TestE2E_Phase53_NineEventMatrix exercises all nine control-event side effects through a real steering.Inbox. Both obligations are discharged in-wave (Wave 9, Stage 3) per the binding coordinator decision recorded in D-067 §4 and D-070 §5.

The D-025 concurrent-reuse contract is pinned in concurrent_test.go: TestConcurrentReuse_RunLoop runs N=120 goroutines (≥100 per the contract) each driving ONE distinct run to completion against ONE shared RunLoop under -race — distinct per-goroutine run quadruples (a context bleed surfaces as a foreign run_id in Finish.Metadata), a ~20% pre-cancelled-ctx subset (no cross-cancellation — cancelled runs fail with context.Canceled, non-cancelled runs finish cleanly regardless), and baseline runtime.NumGoroutine restored after join (no leak). RunLoop is a compiled artifact: every field is set once at construction; per-run loop state (stepControl, the outstanding pause Token) lives on the run's own goroutine stack, never on the struct. Coverage on internal/runtime/steering is 92.4% (master-plan Phase 53 target 85%).
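Item 2's drain-between-steps invariant and the compiled-artifact property in the paragraph above fit in one small loop. This is a minimal sketch, assuming hypothetical Control, Decision and Planner shapes and an inbox carrying bare control-type strings; only the soft CANCEL side effect is modelled. The cancelMidStep planner enqueues a CANCEL from inside its own action. This is the situation TestE2E_Phase53_NoEventAppliedMidToolCall pins: the event becomes visible only at the next step boundary. All per-run state (ctl) is a local in Run, so the RunLoop struct is never written after construction.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Control is a pared-down RunContext.Control; field names are assumptions.
type Control struct {
	Cancelled bool     // sticky once a CANCEL has been applied
	Applied   []string // control types applied at this step boundary
}
type RunContext struct{ Control Control }

// Decision finishes the run when Finish is non-empty; otherwise Act runs (a tool call).
type Decision struct {
	Finish string
	Act    func()
}

type Planner interface {
	Next(ctx context.Context, rc RunContext) Decision
}

type inbox struct {
	mu sync.Mutex
	q  []string // bare control types, for brevity
}

func (in *inbox) Enqueue(t string) {
	in.mu.Lock()
	defer in.mu.Unlock()
	in.q = append(in.q, t)
}

func (in *inbox) Drain() []string {
	in.mu.Lock()
	defer in.mu.Unlock()
	out := in.q
	in.q = nil
	return out
}

var ErrMaxStepsExceeded = errors.New("max steps exceeded")
// RunLoop fields are set once; Run keeps per-run state in locals.
type RunLoop struct {
	planner  Planner
	maxSteps int
}

func (l *RunLoop) Run(ctx context.Context, in *inbox) (string, error) {
	var ctl Control // per-run state lives on this goroutine's stack
	for step := 0; step < l.maxSteps; step++ {
		if err := ctx.Err(); err != nil {
			return "", err
		}
		ctl.Applied = nil
		for _, t := range in.Drain() { // exactly one drain per step boundary
			ctl.Cancelled = ctl.Cancelled || t == "CANCEL"
			ctl.Applied = append(ctl.Applied, t)
		}
		d := l.planner.Next(ctx, RunContext{Control: ctl}) // the planner never sees the inbox
		if d.Finish != "" {
			return d.Finish, nil
		}
		if d.Act != nil {
			d.Act() // anything enqueued now waits for the next boundary
		}
	}
	return "", ErrMaxStepsExceeded
}

// cancelMidStep enqueues CANCEL inside its own action and records what it saw.
type cancelMidStep struct {
	in   *inbox
	seen []RunContext
}

func (p *cancelMidStep) Next(_ context.Context, rc RunContext) Decision {
	p.seen = append(p.seen, rc)
	if rc.Control.Cancelled {
		return Decision{Finish: "cancelled"}
	}
	return Decision{Act: func() { p.in.Enqueue("CANCEL") }}
}

func runCancelMidStep(maxSteps int) (string, []RunContext, error) {
	in := &inbox{}
	p := &cancelMidStep{in: in}
	reason, err := (&RunLoop{planner: p, maxSteps: maxSteps}).Run(context.Background(), in)
	return reason, p.seen, err
}

func main() {
	fmt.Println(runCancelMidStep(5))
}
```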


D-072 — Protocol task control surface: internal/protocol/ is created with the types/methods/errors single-source layout; Phase 54 ships the transport-AGNOSTIC surface (the ten methods + wire types + the in-process ControlSurface dispatcher) and the SSE+REST wire binding is Phase 60; the ten methods map onto the already-shipped runtime (start → tasks.TaskRegistry.Spawn, the nine controls → a steering.ControlEvent on the run's inbox); identity scope is enforced at the Protocol edge; no parallel pause coordinator, no §4.4 driver seam

Date: 2026-05-14 Status: Settled Where it lives: RFC §5.2 + §5.3 + §5.4 + §5.5 + §6.3, docs/plans/phase-54-protocol-control-surface.md, internal/protocol/types/version.go (ProtocolVersion pin), internal/protocol/types/control.go (IdentityScope + StartRequest / StartResponse + ControlRequest / ControlResponse wire types), internal/protocol/methods/methods.go (Method enum + the ten canonical method-name constants + IsValidMethod / IsControlMethod / Methods), internal/protocol/errors/errors.go (Code enum + the seven Protocol error codes + the Error wire type), internal/protocol/protocol.go (ControlSurface + NewControlSurface + ErrMisconfigured), internal/protocol/control.go (Dispatch + the per-method handlers + methodToControlType), internal/protocol/errors.go (the runtime-error → Protocol-code mapping), test/integration/wave9_test.go (the Wave 9 wave-end E2E), scripts/smoke/phase-54.sh, brief 02 §3 + §5, brief 06 §1 + §"Wire format", brief 07.

Why: Phase 54 creates the Harbor Protocol layer — internal/protocol/ did not exist before this phase — and ships its first surface, the task control surface (RFC §5.2 "Task control" row). It also closes Wave 9 (the steering / pause-resume cluster: 50 → 54) and authors the Wave 9 wave-end E2E. Five design calls warrant a durable home.

  1. internal/protocol/ is created with the types / methods / errors single-source layout from the start — Phase 58 is then a no-op formalisation, not a cleanup. CLAUDE.md §8 is binding: "All wire types live in internal/protocol/types/... Method names live in internal/protocol/methods/methods.go. No hardcoded method strings elsewhere... Error codes live in internal/protocol/errors/errors.go." Phase 54 lays the tree exactly per CLAUDE.md §3 — types/ (the ProtocolVersion pin + the four control wire types), methods/ (the Method enum), errors/ (the Code enum + the Error wire type) — so the Phase 58 lint ("Protocol types/methods/errors single source") finds nothing to fix. The protocol package itself defines NONE of those — it only consumes them. The transports/ subdirectory (CLAUDE.md §3) is deliberately NOT created here: it is Phase 60's home for the SSE+REST binding.

  2. Phase 54 ships the transport-AGNOSTIC surface; the SSE+REST wire binding is Phase 60. The master-plan acceptance line reads "all nine endpoints + start round-trip via SSE+REST (phase 60)", which is a forward reference. RFC §5.4 settles SSE+REST as the current lean but explicitly says "the relevant phase blocks until [Q-1] resolves", and brief 06's open "Wire format" question confirms the transport is not locked. Phase 54 takes the explicit consequence: it ships ControlSurface.Dispatch(ctx, method, req), a plain Go entry point that is in-process-invocable and fully testable today, and leaves the HTTP/SSE adapter to Phase 60. A Phase 60 HTTP handler is a thin adapter that decodes a request, calls Dispatch, and encodes the response (or maps a *protocol/errors.Error onto an HTTP status). This is not a plan departure: the master-plan detail block, RFC §5.4, and brief 06 all point the same way. The smoke script's HTTP/wire assertions skip with a reason per the 404/405/501 → SKIP convention; the surface is exercised in-process by the package and Wave 9 integration tests. The §13 in-wave consumer of the Phase 54 primitive is the Wave 9 E2E (TestE2E_Wave9_ProtocolDrivenRun_AssembledSurface), which drives a HITL-gated run end to end entirely through ControlSurface.Dispatch: start → the RunLoop reaches a pause → inject_context → approve → the planner re-enters and finishes. Phase 60 and the Console are the later consumers; the E2E is the in-wave one.

  3. The ten methods map onto the already-shipped runtime through the public Phase 20 / Phase 52 surfaces: start → tasks.TaskRegistry.Spawn, and each of the nine controls → a steering.ControlEvent enqueued on the run's steering.Inbox. Dispatch branches on the method. MethodStart builds a tasks.SpawnRequest (a KindForeground task, identity triple from the request's IdentityScope, RunID left empty because Spawn assigns the TaskID) and calls TaskRegistry.Spawn. Each of the nine control methods is bridged to its steering.ControlType via the fixed methodToControlType map; a steering.ControlEvent is constructed, the run's inbox is resolved via steering.Registry.Lookup, and the event is handed to Inbox.Enqueue. The Protocol method names are lowercase snake_case (inject_context, user_message; RFC §5.2 verbatim); the steering ControlType wire strings are uppercase (INJECT_CONTEXT, USER_MESSAGE; RFC §6.3 verbatim). The two namespaces are kept distinct on purpose: the Protocol surface owns its own client-facing method vocabulary (brief 07's "the runtime owns the protocol it speaks"), methodToControlType is the single bridge, and an init() assertion keeps it in lockstep with the methods package. Phase 54 does NOT re-implement validation: Inbox.Enqueue runs the whole Phase 52 gauntlet, namely the identity-match gate, the canonical-type check, CheckScope (per-event scope + cross-tenant-requires-admin), and ValidatePayload (the RFC §6.3 payload bounds). A second validator at the Protocol edge would be the CLAUDE.md §13 two-parallel-implementations anti-pattern. The surface's only job is to construct the event, hand it to Enqueue, and map the steering / tasks sentinel onto a stable Protocol error code (internal/protocol/errors.go is that one mapping site).

  4. Identity scope is enforced at the Protocol edge on every method; the scope claim is trust-based until Phase 61. RFC §5.5: "the Protocol rejects any request without an identity scope." Dispatch validates the identity triple before any method reaches the runtime: an incomplete triple fails closed with CodeIdentityRequired (CLAUDE.md §6 rule 9; there is no identity-downgrading knob). The nine control methods additionally require a non-empty Run (a steering control targets a specific run's inbox) and resolve the caller's steering.Scope from the request's IdentityScope.Scope claim. An unrecognised scope fails closed with CodeScopeMismatch, and Inbox.Enqueue → CheckScope does the per-event minimum-scope + cross-tenant enforcement. The Scope claim is trust-based at Phase 54, exactly the posture events.Filter.Admin (Phase 05) and steering.CheckScope (Phase 52) hold until Protocol auth (Phase 61) lands. Phase 54 takes the already-derived triple + scope as request inputs and enforces them; it does not parse a JWT. The seven Protocol error codes (invalid_request, identity_required, scope_mismatch, payload_invalid, unknown_method, not_found, runtime_error) are the stable client-facing contract a transport adapter maps onto HTTP statuses; *protocol/errors.Error implements error, so an in-process caller reaches the Code via errors.As until the wire transport lands.

  5. No parallel pause coordinator, no §4.4 driver seam. The pause-family control methods (pause / resume / approve / reject) do NOT reach pauseresume.Coordinator directly — they enqueue a steering.ControlEvent, and the Phase 53 RunLoop routes the side effect through the unified Coordinator (CLAUDE.md §7 rule 4; the same convergence D-070 §5 + D-071 §4 settled). Phase 54 mints no second coordinator. And the ControlSurface is an in-process handler with no plausible alternate backend — there is no internal/protocol/drivers/ tree; a §4.4 driver-registry would be ceremony, the same call D-070 / D-071 made for the steering primitives. The wire transport is a plausible alternate-implementation axis (gRPC vs SSE+REST vs WebSocket — RFC §5.4), but that pluralism lives at the transports/ layer Phase 60 builds, not at the ControlSurface dispatcher.
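The edge behaviour in items 3 and 4 can be sketched as a dispatcher that validates a request and then names its routing target. In this minimal sketch, the seven codes and the start, inject_context, user_message, pause, resume, approve and reject method names come from this entry. The redirect, cancel and prioritize names, the Scope strings, the choice of invalid_request for a missing Run, and the string-returning Dispatch signature are all assumptions. The real ControlSurface.Dispatch takes (ctx, method, req) and calls TaskRegistry.Spawn or Inbox.Enqueue rather than returning a description.

```go
package main

import (
	"errors"
	"fmt"
)

// Code values are the seven Protocol error codes listed in D-072.
type Code string

const (
	CodeInvalidRequest   Code = "invalid_request"
	CodeIdentityRequired Code = "identity_required"
	CodeScopeMismatch    Code = "scope_mismatch"
	CodePayloadInvalid   Code = "payload_invalid"
	CodeUnknownMethod    Code = "unknown_method"
	CodeNotFound         Code = "not_found"
	CodeRuntimeError     Code = "runtime_error"
)

// Error mirrors the shape described for *protocol/errors.Error: it
// implements error, and callers reach Code via errors.As.
type Error struct {
	Code    Code
	Message string
}

func (e *Error) Error() string { return string(e.Code) + ": " + e.Message }

// IdentityScope is simplified; the Scope strings below are hypothetical.
type IdentityScope struct{ Tenant, User, Session, Run, Scope string }

// methodToControlType bridges lowercase method names to uppercase steering
// wire strings; redirect, cancel and prioritize are assumed names.
var methodToControlType = map[string]string{
	"inject_context": "INJECT_CONTEXT", "user_message": "USER_MESSAGE",
	"redirect": "REDIRECT", "cancel": "CANCEL", "pause": "PAUSE", "resume": "RESUME",
	"approve": "APPROVE", "reject": "REJECT", "prioritize": "PRIORITIZE",
}

var knownScopes = map[string]bool{"session_user": true, "owner_user": true, "admin": true}

func init() { // keep the bridge in lockstep with the nine control types
	if len(methodToControlType) != 9 {
		panic("methodToControlType must cover exactly nine control methods")
	}
}

// Dispatch validates at the edge and returns where the request would be
// routed; the real surface calls TaskRegistry.Spawn or Inbox.Enqueue instead.
func Dispatch(method string, id IdentityScope) (string, error) {
	if id.Tenant == "" || id.User == "" || id.Session == "" {
		return "", &Error{CodeIdentityRequired, "tenant, user and session are required"}
	}
	if method == "start" {
		return "tasks.TaskRegistry.Spawn", nil
	}
	ct, ok := methodToControlType[method]
	if !ok {
		return "", &Error{CodeUnknownMethod, fmt.Sprintf("unknown method %q", method)}
	}
	if id.Run == "" { // assumed code: a control must name its target run
		return "", &Error{CodeInvalidRequest, "control methods require a run"}
	}
	if !knownScopes[id.Scope] {
		return "", &Error{CodeScopeMismatch, fmt.Sprintf("unrecognised scope %q", id.Scope)}
	}
	// Per-event minimum scope and payload bounds are Inbox.Enqueue's job.
	return "steering.Inbox.Enqueue(" + ct + ")", nil
}

// codeOf recovers the Protocol code from any error via errors.As.
func codeOf(err error) Code {
	var pe *Error
	if errors.As(err, &pe) {
		return pe.Code
	}
	return ""
}

func main() {
	id := IdentityScope{Tenant: "t1", User: "u1", Session: "s1", Run: "r1", Scope: "owner_user"}
	for _, m := range []string{"start", "approve", "teleport"} {
		route, err := Dispatch(m, id)
		fmt.Println(m, route, codeOf(err))
	}
}
```

Leaving per-event scope minimums and payload bounds out of this function is deliberate: per item 3, Inbox.Enqueue is the single validator.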

The D-025 concurrent-reuse contract is pinned in concurrent_test.go: TestConcurrentReuse_ControlSurface runs N=150 goroutines (≥100 per the contract) each driving a distinct identity quadruple through start + inject_context against ONE shared ControlSurface under -race — a context bleed would surface as a foreign triple on a drained inbox event or a wrong-tenant spawned task; baseline runtime.NumGoroutine is restored after join (no leak). ControlSurface is a compiled artifact: the TaskRegistry and the steering Registry are set once at NewControlSurface; Dispatch reads run-specific data from ctx + the request argument, never from the struct. The Wave 9 wave-end E2E (test/integration/wave9_test.go) wires real drivers across the full Wave 9 surface — pauseresume.Coordinator over a real in-mem state.StateStore checkpoint store, registry.AgentRegistry, steering.Registry + steering.RunLoop, the Phase 54 ControlSurface — and covers the Protocol-driven HITL round-trip, identity propagation through every layer, four fail-closed-at-the-edge failure modes, and an N=16 concurrency stress. Coverage on internal/protocol is 93.8%, internal/protocol/methods 100%, internal/protocol/errors 100% (master-plan Phase 54 target 85%).


D-073 — OTel traces: a Tracer wrapper deriving spans from events.Event (no public Start), W3C TraceContext propagation via three carrier idioms (traceparent HTTP / _meta MCP / HARBOR_TRACEPARENT env), and a §4.4 span-exporter driver seam (noop default + otlp)

Date: 2026-05-14 Status: Settled Where it lives: RFC §6.14, docs/plans/phase-55-otel-traces.md, internal/telemetry/tracing.go (Tracer + NewTracer + the SpanExporter interface + the exporter factory/registry + SpanFromEvent + LogAttrs + ErrTracerNotConfigured / ErrExporterUnknown), internal/telemetry/propagation.go (InjectHTTP / ExtractHTTP, InjectMeta / ExtractMeta, InjectEnv / ExtractEnv, EnvTraceparent / EnvTracestate), internal/telemetry/drivers/noop + internal/telemetry/drivers/otlp (the two self-registering exporter drivers), cmd/harbor/main.go (the two blank imports), test/integration/phase55_otel_test.go (the cross-subsystem E2E), scripts/smoke/phase-55.sh, brief 06 §1 + §2 + §6 + §"Key data shapes".

Why: Phase 55 closes the predecessor's "no OpenTelemetry in the runtime" gap — brief 06's lessons section is explicit that OTel traces must be a first-class derivation of the event bus, shipped from t=0, not retrofitted. Phase 04 already reserved trace_id / span_id as passthrough Logger attribute names for exactly this phase. Four design calls warrant a durable home.

  1. Tracer derives spans from events.Event; there is no public Start method. Spans are a derivation of the event bus, not a parallel instrumentation path (brief 06 §1). The Tracer's only span-creating entry point is SpanFromEvent(ctx, ev): the span name derives from ev.Type, the identity quadruple + ev.Extra (the bounded, low-cardinality, metric-label-safe map) become span attributes, and NO EventPayload bytes are stamped, because payload content is not span-safe and the audit redactor is the only sanitiser of payload bytes (D-020). A contributor cannot sprinkle tracer.Start(...) across subsystems and grow a second observability channel: subsystems emit events, and the event-to-span bridge produces the spans. Run/step alignment falls out for free: an event carrying a run_id produces a run-attributed span, and a step-granularity event derived under the run span's ctx becomes a child span, so the trace tree mirrors the run/step hierarchy.

  2. W3C TraceContext propagation ships as three standalone carrier idioms, each with an Inject* / Extract* half. RFC §6.14 names all three: traceparent HTTP header (HTTP southbound), _meta.traceparent per-request map (stdio MCP), HARBOR_TRACEPARENT env var (stdio child-process spawn). All three encode the SAME W3C bytes — the stdio idioms are just different carriers. The helpers are standalone functions, NOT Tracer methods, so the southbound transport drivers (Phase 27 tools/HTTP, Phase 28 tools/MCP — both already shipped) can wire them in without holding a *Tracer reference. Extraction is best-effort by W3C design: a malformed or absent traceparent yields a ctx with no valid span context (no panic, no partial state) — SpanFromEvent then starts a root span instead of a child. The loud failure mode is reserved for exporter misconfiguration (ErrExporterUnknown), not for extraction of a remote trace id.

  3. The span exporter sits behind a §4.4 driver seam: noop (default) + otlp. NewTracer selects the exporter by TelemetryConfig.OTelEndpoint — empty → noop (spans are still created so in-process propagation works; they are just never shipped), non-empty → otlp (OTLP/gRPC, lazy-connect, insecure transport at V1). The SpanExporter interface lives in the telemetry package; the two drivers live in internal/telemetry/drivers/{noop,otlp}/ and self-register from init(); the factory dispatches by name and its ErrExporterUnknown message lists the registered drivers. The OTLP/gRPC exporter connects lazily, so NewTracer is fast and a collector can come up after the Runtime — the acceptance criterion "integration with a Jaeger/OTLP collector" is satisfied structurally (the integration test uses an in-memory recorder exporter via the documented WithSpanExporter test seam; a real collector run is an operator smoke, not a CI gate). The OTel SDK trace dependency is RFC-sanctioned — RFC §6.14 names go.opentelemetry.io/otel/trace.Tracer explicitly — and go.opentelemetry.io/otel/* was already an indirect dependency; Phase 55 promotes the trace SDK + OTLP/gRPC trace exporter to direct. No RFC PR needed.

  4. Tracer.LogAttrs(ctx) closes the logs↔traces correlation loop without a parallel channel. It returns the trace_id / span_id slog.Attr pair from the span context in ctx (empty slice when no span is active — composes cleanly with the Phase 04 Logger, which elides absent attributes). logger.With(tracer.LogAttrs(ctx)...) stamps trace correlation onto every log line, so logs and traces share the trace id rather than being parallel channels (brief 06 lessons). NewTracer sets the OTel global TextMapPropagator once to the W3C composite (TraceContext + Baggage) — write-once mutable SDK state, idempotent on repeated construction. TelemetryConfig is NOT changed — the OTelEndpoint + ServiceName fields already existed (Phase 02); Phase 55 consumes them. Phase 55 ships traces only — metrics / OTLP-metrics / the Prometheus exporter are Phase 56 (RFC §6.14 §11 Q-5); the metrics-cardinality discipline ("never tag metrics by trace_id / run_id") is Phase 56's concern.

The D-025 concurrent-reuse contract is pinned in internal/telemetry/tracing_test.go: TestConcurrentReuse_Tracer runs N=150 goroutines (≥100 per the contract), each with a goroutine-unique identity quadruple, driving SpanFromEvent + the three propagation round-trips + LogAttrs against ONE shared *Tracer under -race. A context bleed surfaces as a foreign quadruple on a recorded span, and baseline runtime.NumGoroutine is restored after join + tracer shutdown (no leak). Tracer is a compiled artifact: every field (the SDK TracerProvider, the trace.Tracer, the TextMapPropagator) is set once at construction; per-span state lives in the returned ctx, never on the struct. The §17 integration test (test/integration/phase55_otel_test.go) wires the real events.EventBus (inmem) + the real telemetry.Logger + a real *Tracer backed by an in-memory recorder exporter, and covers: an event published on the bus → SpanFromEvent → a span carrying the event's identity quadruple; identity propagation ctx → event → span attributes → LogAttrs → a Logger line carrying the same trace_id; trace continuity across all three carriers; the ErrExporterUnknown failure mode; and an N=24 concurrency stress with no identity cross-talk and no goroutine leak. Coverage on internal/telemetry is 87.8%, internal/telemetry/drivers/noop 100%, internal/telemetry/drivers/otlp 85.7% (master-plan Phase 55 target 85%).


D-074 — Durable event log: a standalone "durable" events driver persists every event through StateStore keyed by (SessionID, Sequence) for exact gap-free replay across restarts; the events.Factory signature is unchanged (the registry-path factory opens its own StateStore from new optional EventsConfig.StateDriver/StateDSN fields); a missing StateStore degrades to a best-effort ring buffer LOUDLY (runtime.warning + slog.Warn); replayed payloads rehydrate as events.RedactedMap

Date: 2026-05-14 Status: Settled Where it lives: RFC §6.13 (the "durable log driver, StateStore-backed, Phase 57" line + the ring-buffer-vs-durable replay split), RFC §9 (the persistence triad the driver consumes), docs/plans/phase-57-durable-event-log.md, internal/events/drivers/durable/durable.go (the driver + the durable registry factory), internal/events/drivers/durable/record.go (the persisted-event + head-record codec), internal/events/drivers/durable/subscription.go (the live fan-out path), internal/config/config.go (EventsConfig.StateDriver / StateDSN), internal/config/validate.go (allowedEventDrivers gains "durable"; validateEvents validates the new fields), cmd/harbor/main.go (blank import), test/integration/durable_eventlog_test.go (the cross-StateStore-driver E2E), brief 06 §"Roadmap" item 4 + §"Replay semantics" + §"Persistence", brief 05 §"StateStore".

Why: Phase 57 makes replay-from-cursor exact and gap-free across a Runtime restart — the load-bearing dependency for the post-V1 Evaluations program (D-064), which is built on fully-replayable sessions. Six implementation calls warrant a durable home.

  1. The durable driver is a standalone §4.4 driver, not a wrapper over the inmem driver. It implements events.EventBus + events.Replayer directly, owning its own monotonic gap-free sequence counter, its own subscriber fan-out (drop-oldest + windowed bus.dropped), and its own persistence path. Wrapping the inmem driver was rejected: the inmem bus assigns Sequence internally and never returns it, and its internal sibling events (bus.dropped, bus.subscription_idle_closed) advance its counter — so a wrapper could not keep the persisted sequence in lockstep with the bus sequence. A standalone driver is the honest §4.4 shape and mirrors the independence of the internal/state drivers. The durable driver deliberately has NO idle reaper goroutine — idle-subscriber reaping is the inmem driver's concern; durability is this driver's concern.

  2. Keying scheme: the durable log is SESSION-scoped (matching events.Cursor = (SessionID, Sequence)), built from one mutable head record plus one immutable entry record per event, because StateStore has no list/scan method. Both record kinds are stored under the session triple with RunID="" — an event's own RunID is preserved INSIDE the persisted JSON, not in the storage key. The head record (Kind = "events.durable.head") holds the ordered list of bus-sequences persisted for that session; each entry record (Kind = "events.durable.entry/<zero-padded-seq>") holds the JSON-encoded event. Publish assigns the next bus sequence, writes the entry record, then read-modify-writes the head's sequence list — all under one publishMu acquisition, so the head list and the sequence counter never disagree and the persisted log is in strict sequence order. A torn write (entry persisted, head not yet advanced) never produces a gap in a served replay: Replay only ever returns sequences the head record lists, and the next Publish re-derives the head list. Replay reads from the StateStore — not an in-memory ring — so a late subscriber connecting after the Runtime was rebuilt against the same StateStore sees the full gap-free history.

  3. The events.Factory signature is unchanged; the registry-path factory opens its own StateStore from two new OPTIONAL EventsConfig fields. events.Factory is func(config.EventsConfig, audit.Redactor) (EventBus, error) — it carries no StateStore, and widening it would ripple through every existing driver and call site. Instead, EventsConfig gains StateDriver and StateDSN (both omitempty, StateDSN tagged secret:"true"), and the durable factory calls state.Open itself when StateDriver is set. This is a backward-compatible config addition (CLAUDE.md §10): every other driver ignores the fields, and an empty StateDriver is a valid config (it routes to the best-effort degradation, not a config error). validateEvents validates the pairing exactly as validateState does — a non-inmem StateDriver requires a StateDSN. Store ownership follows who opened it: the registry-path factory marks the bus as the store's owner (an unexported withOwnStore() option) so bus.Close closes the store; a caller that passes a store into the exported New owns the store's lifecycle and Close leaves it open.

  4. A missing StateStore degrades to a best-effort in-memory ring buffer LOUDLY — never silently. Brief 06 §"Persistence" says "without [the StateStore], replay degrades to ring-buffer-only" and the runtime "ships a usable in-memory experience without StateStore." Phase 57 honours that but tightens it per CLAUDE.md §13 "no silent degradation": when New is handed a nil store (or the registry factory sees an empty StateDriver), the driver runs in best-effort mode AND emits an slog.Warn at construction stating that replay is NOT durable across restarts. In best-effort mode ReplayBufferSize sizes the fallback ring and Replay applies the same ErrCursorTooOld / ErrReplayUnavailable semantics as the inmem driver. This is a strengthening of the brief finding, not a departure — there is no "Findings I'm departing from" entry on the phase plan because the brief is followed, just made loud.

  5. Replayed payloads rehydrate as events.RedactedMap, not their concrete typed shape. StateStore.Bytes is opaque; the durable driver JSON-encodes the event and, on replay, reconstructs the payload as events.RedactedMap{Data: ...} — the exact same generic post-redaction shape the inmem bus already produces for any payload that is not SafePayload. Concrete typed payloads are NOT round-tripped: reconstructing them would require a payload-type registry the durable log deliberately does not own. Replay consumers read fields via RedactedMap.Data. This matches the existing redaction-boundary contract (D-020 / D-028); revisit only if a future consumer needs typed replay.

  6. Persistence failures fail loudly; bus-internal notices are transient. A StateStore.Save failure surfaces from Publish as a wrapped error: the event is NOT enqueued and nextSeq is NOT advanced (the failed sequence is retried by the next Publish, keeping the persisted log gap-free). Silently dropping a persistence failure would foreclose the gap-free guarantee Phase 57 exists to provide (CLAUDE.md §5 + §13). Bus-internal sibling notices (audit.admin_scope_used, audit.redaction_failed) are per-call observability, NOT session event history. An admin_scope_used event for a fully-admin filter does not even carry a complete identity triple, so it cannot be a StateStore record. Such notices are assigned a bus sequence (for live ordering) and fanned out, but are NOT persisted to the durable log; the durable log is the gap-free session history, and transient per-call notices are not part of it.
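Items 2 and 6 describe one mutable head record plus immutable per-sequence entry records, with the counter advanced only after both writes succeed. That keying scheme can be sketched as follows. Everything here is hypothetical: memStore stands in for a StateStore that has Save but no list/scan method, Event is a small subset of events.Event, payload rehydration into events.RedactedMap is omitted, and recovering nextSeq after a restart is out of scope.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// key is a hypothetical storage key: the session triple (Run always "") plus a record kind.
type key struct{ Tenant, User, Session, Run, Kind string }

// memStore is a hypothetical StateStore stand-in: Save/Load by key, no list/scan.
type memStore struct {
	data     map[key][]byte
	failNext bool // test hook: the next Save fails
}
func (s *memStore) Save(k key, b []byte) error {
	if s.failNext {
		s.failNext = false
		return fmt.Errorf("store unavailable")
	}
	s.data[k] = b
	return nil
}

// Event is a hypothetical subset of events.Event; Run lives inside the JSON, not the key.
type Event struct {
	Tenant, User, Session, Run, Type string
	Sequence                         uint64
}

const headKind = "events.durable.head"
func entryKind(seq uint64) string { return fmt.Sprintf("events.durable.entry/%020d", seq) }

type durableLog struct {
	mu      sync.Mutex // plays publishMu's role; also serialises all memStore access
	store   *memStore
	nextSeq uint64 // last sequence whose entry AND head writes both succeeded
}

func (d *durableLog) head(t, u, s string) ([]uint64, error) {
	b, ok := d.store.data[key{t, u, s, "", headKind}]
	if !ok {
		return nil, nil
	}
	var seqs []uint64
	err := json.Unmarshal(b, &seqs)
	return seqs, err
}

// Publish writes the entry, then read-modify-writes the head, under one lock.
func (d *durableLog) Publish(ev Event) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	ev.Sequence = d.nextSeq + 1
	entry, _ := json.Marshal(ev) // cannot fail: strings and an integer only
	if err := d.store.Save(key{ev.Tenant, ev.User, ev.Session, "", entryKind(ev.Sequence)}, entry); err != nil {
		return fmt.Errorf("durable: save entry %d: %w", ev.Sequence, err)
	}
	seqs, err := d.head(ev.Tenant, ev.User, ev.Session)
	if err != nil {
		return err
	}
	hb, _ := json.Marshal(append(seqs, ev.Sequence))
	if err := d.store.Save(key{ev.Tenant, ev.User, ev.Session, "", headKind}, hb); err != nil {
		return fmt.Errorf("durable: save head: %w", err)
	}
	d.nextSeq = ev.Sequence // advanced only now, so a failed sequence is reassigned
	return nil
}

// Replay returns only sequences the head lists, so a torn write is never served.
func (d *durableLog) Replay(t, u, s string, after uint64) ([]Event, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	seqs, err := d.head(t, u, s)
	if err != nil {
		return nil, err
	}
	var out []Event
	for _, seq := range seqs {
		if seq <= after {
			continue
		}
		var ev Event
		if err := json.Unmarshal(d.store.data[key{t, u, s, "", entryKind(seq)}], &ev); err != nil {
			return nil, fmt.Errorf("durable: entry %d: %w", seq, err)
		}
		out = append(out, ev)
	}
	return out, nil
}

func main() {
	d := &durableLog{store: &memStore{data: map[key][]byte{}}}
	_ = d.Publish(Event{Tenant: "t", User: "u", Session: "s", Run: "r1", Type: "step"})
	fmt.Println(d.Replay("t", "u", "s", 0))
}
```

If one Save is made to fail and another event is then published, Replay still returns sequences 1 and 2 with no gap. The failed sequence never reached the head, and it was reassigned to the next event.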

The D-025 concurrent-reuse contract is pinned in concurrent_test.go: TestConcurrentReuse_DurableBus runs N=120 goroutines (≥100 per the contract) each publishing a batch for a DISTINCT identity quadruple against ONE shared durable bus under -race, then replaying its own session — a context bleed surfaces as a foreign triple in a replayed event; a ~20% pre-cancelled-ctx subset proves no cross-cancellation; baseline runtime.NumGoroutine is restored after Close. The driver is a compiled artifact: every field is set once at construction; per-publish state lives under publishMu, per-subscriber state on the subscription, and nothing run-specific is stored on the struct. The cross-subsystem integration test (test/integration/durable_eventlog_test.go) wires the durable driver against all three real StateStore drivers (in-memory, SQLite, Postgres — Postgres t.Skips with a reason when HARBOR_PG_DSN is unset), covering the publish→teardown→rebuild→replay-no-gaps acceptance scenario, identity propagation through every layer, cross-tenant isolation across replay, a closed-store-mid-publish failure mode, and an N=16 concurrency stress across the events↔state seam. Coverage on internal/events/drivers/durable is 85.2% (master-plan Phase 57 target 85%).

Amended (Wave 10 audit fixes, PR #91 / D-082) after the CLAUDE.md §13 "Test stubs as production defaults on operator-facing seams" rule: the registry-path factory (events.Register("durable", ...)) NO LONGER auto-degrades to the in-memory ring when EventsConfig.StateDriver is empty. It returns a wrapped error naming the missing config key and pointing the operator at events.driver=inmem (the explicit non-durable path). An operator who configures events.driver=durable has signalled they want durability; silently producing a non-durable bus is exactly the operator-confusion failure mode §13 forbids. The in-process durable.New(... store=nil ...) constructor still emits the loud slog.Warn and falls back to the in-memory ring; it is retained only as the test seam for exercising the degraded mode and is not reachable from the registry path. The Wave 10 audit's WARN-1 surfaced this drift; the registry path now fails closed at boot.


D-075 — Protocol single-source enforcement: a go/parser AST-walking go test (internal/protocol/singlesource) is the build gate; it lints internal/protocol/ only (not all of internal/); it lints _test.go files too; the Error wire type is single-sourced in internal/protocol/errors (not types); the master-plan "§8" citation resolves to CLAUDE.md §8

Date: 2026-05-14 Status: Settled Where it lives: RFC §5.1 + §5.2 + §5.3, CLAUDE.md §8 + §13, docs/plans/phase-58-protocol-single-source.md, internal/protocol/singlesource/singlesource.go (ScanProtocolTree + Violation + CanonicalMethods + CanonicalWireTypes + the Kind* constants + scanFile + dirAllowsKind + isProtocolErrorsCodeType), internal/protocol/singlesource/singlesource_test.go (the build-gating clean-tree lint + the per-kind detection / no-false-positive / lockstep tests), internal/protocol/singlesource/internal_test.go (the unexported-predicate unit tests), internal/protocol/control.go + internal/protocol/errors_internal_test.go + internal/protocol/types/types_test.go (the consolidated pre-existing method literals), scripts/smoke/phase-58.sh, brief 06 §1, brief 07.

Why: Phase 58 formalises the Harbor Protocol single-source discipline CLAUDE.md §8 mandates and Phase 54 (D-072 §1) laid the foundation for — the canonical packages internal/protocol/methods, internal/protocol/errors, internal/protocol/types are the only definition sites for Protocol method names, error codes, and wire types. Phase 54 built the layout correctly; Phase 58 adds the mechanical gate so the discipline cannot erode as Phases 59–62 extend the Protocol surface. Five design calls warrant a durable home.

  1. The enforcement is a go/parser AST-walking go test, not a custom golangci-lint analyzer and not a shell script. The repo already proves the pattern: internal/planner/conformance/importgraph_test.go is a go/parser walk that gates the §13 planner-does-not-import-runtime invariant with zero external-tool dependency. Phase 58 reuses that shape. A golangci-lint plugin would need a separate build + a .golangci.yml entry (a new linter needs a PR rationale per CLAUDE.md §5) and would not be runnable as a plain go test. A shell grep could not be precise — a method name inside a comment, a doc string, or a struct tag is not a violation; only a real string-literal expression / const declaration / type declaration is. go/parser sees the AST, so the checker flags a BasicLit STRING whose unquoted value is a canonical method name, a GenDecl CONST of type protocol/errors.Code, or a TypeSpec redeclaring a canonical wire type — and nothing else. The checker (ScanProtocolTree) is a reusable pure function over a filesystem root with no package-level mutable state; the Phase 58 test is its first consumer, and a later phase (a harbor lint subcommand, Phase 59's versioning discipline) can call it without a second implementation.

  2. The method-literal lint is scoped to internal/protocol/ only — not all of internal/. Strings like "cancel", "pause", "reject", "user_message" are legitimate, unrelated domain vocabulary in other subsystems: tasks/groups.go's GroupAction / PatchAction, runtime/registry's agent-command strings, planner/trajectory's trajectory-entry kinds. A repo-wide scan for those literals would be all false positives. CLAUDE.md §8's "No hardcoded method strings elsewhere" is read as "elsewhere in Protocol-surface code" — the checker walks internal/protocol/ (skipping vendor/, testdata/, and its own singlesource/ package, which necessarily names the canonical methods in its CanonicalMethods set). If a future phase ever grew Protocol-method handling outside internal/protocol/ — which would itself be a layering smell — the checker's root would need widening; this is the named exit condition, not an open question.

  3. The checker lints _test.go files too. A Protocol method string hardcoded in a test is the same drift as one hardcoded in production; the importgraph_test.go precedent treats test files identically (an import is an import). Phase 58 surfaced and consolidated three pre-existing literal sites this catches: internal/protocol/control.go's dispatchStart hardcoded "start" (now reads methods.MethodStart), internal/protocol/errors_internal_test.go passed "cancel" / "start" as method-context arguments (now string(methods.MethodCancel) / string(methods.MethodStart)), and internal/protocol/types/types_test.go used Method: "pause" as a JSON round-trip fixture (now string(methods.MethodPause)). None were bugs, but each was a second textual definition site the lint now forbids. Consolidating them in the same PR is the §17.6 "fix what the lint finds" discipline.

  4. The Error wire type is single-sourced in internal/protocol/errors, not internal/protocol/types. Single-sourcing means "exactly one home", not "all wire types in the same directory". Phase 54 (D-072 §1) deliberately placed the Error wire struct in internal/protocol/errors/errors.go alongside the Code constants it carries — the error wire type and the error codes are one cohesive surface. So the checker's CanonicalWireTypes is a map[string]string (type name → home package), not a flat set with one assumed home: IdentityScope / StartRequest / StartResponse / ControlRequest / ControlResponse live in types, Error lives in errors. A TestSingleSource_CanonicalWireTypesInLockstep test parses both canonical packages and asserts every exported struct type they declare is recorded in the checker under the right home — so a wire type moving home, or a new one landing, fails CI.

  5. The master-plan Phase 58 row cites "§5, §8"; RFC-001 has no §8 — "§8" is CLAUDE.md §8. RFC-001-Harbor.md's section numbering stops at the runtime/protocol/console chapters; there is no RFC §8. The master-plan citation "§8" resolves to CLAUDE.md §8 "Harbor Protocol rules" — the binding operational spec this phase mechanically enforces ("All wire types live in internal/protocol/types/... Method names live in internal/protocol/methods/methods.go. No hardcoded method strings elsewhere... Error codes live in internal/protocol/errors/errors.go"). RFC §5 is the design anchor; CLAUDE.md §8 is the rule Phase 58 turns into a build gate. The phase plan's "RFC anchor" therefore lists RFC §5.1/§5.2/§5.3 (which resolve), with a documentation note explaining the "§8" resolution. This is a §4.3 citation clarification, not a design departure.

§13 primitive-with-consumer — N/A by construction. Phase 58 ships no primitive (no interface, no control instruction, no decision shape, no runtime mechanism) — it ships an enforcement checker over an already-shipped layout. The checker's first and only consumer is its own build-gating test, which exercises it against the real internal/protocol/ tree in the same PR. There is no §17 integration test: Phase 58 is a build-time static source checker that wires no runtime drivers and opens no cross-subsystem seam (CLAUDE.md §17.1 — a phase that consumes no shipped subsystem's runtime surface is exempt). There is no D-025 concurrent-reuse test: ScanProtocolTree is a pure function with no construction-time dependencies and no per-invocation goroutines — it is not a "compiled artifact" in the D-025 sense. Coverage on internal/protocol/singlesource is 94.5% (master-plan Phase 58 target 90%).


D-076 — OTel metrics: a MetricsRegistry deriving the canonical harbor_events_total counter from events.Event (labels event_type / producer / node only — ev.Identity is physically unreachable), the metric exporter behind a §4.4 driver seam (prometheus default + otlpmetric), a built-in Prometheus /metrics http.Handler, and a go/parser static cardinality-lint as the CI gate

Date: 2026-05-14 Status: Settled Where it lives: RFC §6.14, docs/plans/phase-56-metrics.md, internal/telemetry/metrics.go (MetricsRegistry + NewMetricsRegistry + the MetricExporter interface + the exporter factory/registry + RegisterEvent + PrometheusHandler + the PromGatherer contract + ErrMetricsNotConfigured / ErrMetricExporterUnknown / ErrPrometheusHandlerUnavailable), internal/telemetry/drivers/otlpmetric + internal/telemetry/drivers/prometheus (the two self-registering metric-exporter drivers), internal/telemetry/cardinalitylint (ScanMetricsTree + Violation + the forbidden-label set + the go/parser AST walk + the testdata/badmetric negative fixture), cmd/harbor/main.go (the two blank imports), test/integration/phase56_metrics_test.go (the cross-subsystem E2E), scripts/smoke/phase-56.sh, brief 06 §1 + §"Lessons from the predecessor" / "Metrics cardinality footgun" + §"Roadmap" item 7.

Why: Phase 56 closes the second half of the predecessor's "no OpenTelemetry in the runtime" gap (Phase 55 closed traces) and makes the brief 06 "metrics cardinality footgun" lesson mechanically un-violatable. RFC §6.14 settles the exporter set: OTLP default, a built-in Prometheus /metrics endpoint at V1 (resolves brief 06 Q-2). Five design calls warrant a durable home.

  1. MetricsRegistry derives the canonical counter from events.Event — there is no public Counter / Meter accessor. Metrics are a derivation of the event bus, not a parallel instrumentation path (brief 06 §1) — the same load-bearing decision Phase 55 applied to spans. The registry's only recording entry point is RegisterEvent(ctx, ev), which increments harbor_events_total with labels event_type (ev.Type), producer (ev.Extra["producer"], "unknown" when absent), and node (ev.Extra["node"], "" when absent). RegisterEvent reads NO field of ev.Identity — there is no code path on the registry that touches the run quadruple, so a metric tagged by RunID / TraceID is impossible by construction. This is the cardinality firewall: the predecessor's docs warn "Never tag metrics by trace_id"; Harbor makes the production boundary closed by structure, not by discipline. NewMetricsRegistry touches NO OTel global — it builds a private MeterProvider and the registry is passed to callers explicitly; a global MeterProvider would be an ambient parallel metrics channel.

  2. Producer / NodeName are realised as the reserved Event.Extra["producer"] / Event.Extra["node"] keys — not new events.Event struct fields. Brief 06's Event shape sketch has Producer / NodeName fields, but the Phase 05 events.Event doc explicitly reserves the Extra map "for Phase 56's bounded low-cardinality metric labels" — and Event.Extra is documented as "bounded, low-cardinality; safe for metric labels". Phase 56 consumes that reserved slot rather than widening the Event struct (which would ripple through every Publish call site and every driver). This is a §4.3 plan-shape clarification, not a brief departure: the two values stay inside the bounded Extra map, which is the same cardinality boundary the brief intends. The phase plan's "Findings I'm departing from" is "None".

  3. The metric exporter sits behind a §4.4 driver seam mirroring Phase 55's SpanExporter seam exactly: prometheus (default) + otlpmetric. NewMetricsRegistry selects the exporter by TelemetryConfig.OTelEndpoint — empty → prometheus (the built-in /metrics pull endpoint; no collector needed), non-empty → otlpmetric (OTLP/gRPC push, lazy-connect, insecure transport at V1, identical stance to Phase 55's otlp span driver). The MetricExporter interface lives in the telemetry package; the two drivers live in internal/telemetry/drivers/{otlpmetric,prometheus}/ as sibling dirs to Phase 55's {noop,otlp}/, self-register from init(), and the factory dispatches by name with an ErrMetricExporterUnknown message that lists the registered drivers. The prometheus driver builds a FRESH prometheus.Registry per NewMetricsRegistry (via WithRegisterer) — NOT the process-global prometheus.DefaultRegisterer — so two registries in one process never collide on the global, keeping the D-025 contract. telemetry.PrometheusHandler(reg) recovers that per-registry Gatherer through the PromGatherer contract and returns a promhttp http.Handler; called on an otlpmetric-backed registry it fails loudly with ErrPrometheusHandlerUnavailable (an OTLP-push registry has no pull surface — a genuine fact, not an optional-capability toggle). The OTel metrics SDK + the OTLP-metric/Prometheus exporters are RFC-sanctioned new direct dependencies — RFC §6.14 names the OTel Metrics SDK AND the built-in Prometheus /metrics endpoint explicitly; go.opentelemetry.io/otel/metric was already an indirect dependency.

  4. The cardinality discipline is enforced by a go/parser AST-walking go test (internal/telemetry/cardinalitylint), scoped to metric.WithAttributes(...) so span attributes are untouched. Brief 06 calls for "a static check that fails CI if any metric registers a label deriving from TraceID/RunID/free-form input". ScanMetricsTree reuses the proven repo pattern (internal/protocol/singlesource from Phase 58, internal/planner/conformance/importgraph_test.go): a precise AST walk, zero external-tool dependency, no golangci-lint plugin. It flags two kinds — KindForbiddenLabelKey (an attribute.* constructor whose literal key matches run_id / trace_id / span_id / task_id / the identity-triple keys) and KindIdentitySourcedLabel (an attribute.* constructor whose value argument is a selector ending in an events.Event Identity field). The critical scoping call: BOTH spans and metrics build attribute lists with the same attribute.String(...) constructors, and a span legitimately carries run_id (Phase 55 / D-073 stamps the run quadruple onto event-derived spans on purpose — correct trace correlation). So the checker only inspects attribute.* calls lexically nested inside a metric.WithAttributes(...) call; a span's attribute.String("run_id", …) inside trace.WithAttributes is left alone. TestCardinalityLint_TelemetryTreeIsClean is the build gate; a testdata/badmetric negative fixture (NOT compiled into the build) proves the checker actually catches both violation kinds AND does NOT flag the fixture's span-like attribute slice.

  5. The /metrics endpoint ships as the standalone telemetry.PrometheusHandler http.Handler constructor; the live Runtime server that mounts it is Phase 60+. There is no internal/server/ package and cmd/harbor is a stub until the Phase 09+/60 bootstrap. Phase 56 ships PrometheusHandler as a standalone constructor — the same pattern Phase 55's propagation carriers used (standalone helpers, wired by a later phase). The §13 primitive-with-consumer obligation is discharged by the Phase 56 unit + integration tests exercising the handler via httptest end-to-end. The master-plan "§11 Q-5" citation is a §4.3 clarification: RFC §11's Q-5 is the skill-versioning question — unrelated to metrics; the metrics-exporter question is brief 06 Q-2, resolved by RFC §6.14, and "§11 Q-5" is read as "the §11-tracked metrics-exporter question is settled".
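Together, items 1 and 2 mean the label derivation never sees ev.Identity. The stdlib-only sketch below assumes a hypothetical Event whose Extra is a map[string]string. It also replaces the OTel Int64Counter behind harbor_events_total with a mutex-guarded map, and it drops the ctx parameter of the real RegisterEvent. Because labelsFor receives only the event type and Extra, events from many different runs collapse into one series.

```go
package main

import (
	"fmt"
	"sync"
)

// Identity is a hypothetical stand-in for the run quadruple.
type Identity struct{ Tenant, User, Session, Run string }

// Event is a hypothetical subset of events.Event.
type Event struct {
	Type     string
	Identity Identity
	Extra    map[string]string // bounded, low-cardinality keys only
}

// labelSet is the complete label schema: event_type, producer, node.
type labelSet struct{ EventType, Producer, Node string }

// labelsFor is the only label derivation. It receives Type and Extra, not the
// Event itself, so Identity cannot reach a label by construction.
func labelsFor(eventType string, extra map[string]string) labelSet {
	producer, ok := extra["producer"]
	if !ok {
		producer = "unknown"
	}
	return labelSet{EventType: eventType, Producer: producer, Node: extra["node"]}
}

// registry is a hypothetical stand-in for MetricsRegistry.
type registry struct {
	mu     sync.Mutex
	counts map[labelSet]int64
}

// RegisterEvent derives the labels per call and increments exactly one series;
// nothing about the individual event is retained except the count.
func (r *registry) RegisterEvent(ev Event) {
	ls := labelsFor(ev.Type, ev.Extra)
	r.mu.Lock()
	defer r.mu.Unlock()
	r.counts[ls]++
}

func main() {
	r := &registry{counts: map[labelSet]int64{}}
	for i := 0; i < 3; i++ {
		r.RegisterEvent(Event{
			Type:     "run.started",
			Identity: Identity{Run: fmt.Sprintf("run-%d", i)},
			Extra:    map[string]string{"producer": "planner", "node": "n1"},
		})
	}
	// Three distinct runs, one series: the run ID never became a label.
	fmt.Println(len(r.counts), r.counts[labelSet{EventType: "run.started", Producer: "planner", Node: "n1"}])
}
```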

The D-025 concurrent-reuse contract is pinned in internal/telemetry/metrics_test.go: TestConcurrentReuse_MetricsRegistry runs N=150 goroutines (≥100 per the contract), each with a goroutine-unique identity quadruple AND goroutine-unique Extra producer/node, calling RegisterEvent against ONE shared *MetricsRegistry under -race — a label cross-talk surfaces as a foreign producer/node series or an identity value leaking into a label; baseline runtime.NumGoroutine is restored after join + registry shutdown (no leak). MetricsRegistry is a compiled artifact: every field (the SDK MeterProvider, the reader, the Int64Counter) is set once at construction; per-call state (one event's attribute set) is built on the stack in RegisterEvent, never stored on the struct. The §17 integration test (test/integration/phase56_metrics_test.go) wires the real events.EventBus (inmem) + the real telemetry.Logger + a real *MetricsRegistry backed by the REAL prometheus driver — events on the bus → a real bus subscription → RegisterEvent → the /metrics httptest body carries harbor_events_total with the right per-label counts AND is identity-free even though the published events carried full quadruples (the cardinality firewall, end-to-end); the ErrMetricExporterUnknown + ErrPrometheusHandlerUnavailable failure modes; and an N=16 concurrency stress across the events↔metrics seam asserting no label cross-talk and no goroutine leak (the inmem bus is drop-oldest, so the stress asserts the seam's concurrency-safety contract, not lossless totals — the exact-count contract is the D-025 unit test's job). Coverage on internal/telemetry is 88.6% (master-plan Phase 56 target 85%); internal/telemetry/cardinalitylint 89.6%, internal/telemetry/drivers/prometheus 85.7%, internal/telemetry/drivers/otlpmetric 85.7%.

Amended (Wave 10 audit fixes — PR #91 / D-082) per CLAUDE.md §13: the events→metrics bridge helper was extracted from the test-only drainToMetrics goroutine into the production-side telemetry.BridgeBusToMetrics(ctx, bus, reg, filter) (stop func(), err error). This pins one canonical contract for the future Phase 64 harbor dev server bootstrap to consume rather than letting it reinvent the wiring — the existing integration test now consumes telemetry.BridgeBusToMetrics so the same code path is exercised end-to-end. The Wave 10 audit's WARN-6 surfaced this drift.


D-077 — Protocol versioning discipline: the ProtocolVersion string pin stays the RFC-change trip-wire; Phase 59 adds the parsed Version (semver Major/Minor/Patch, same-major Compatible), the structured Deprecation note format + empty Deprecations() registry, and the Capability set + VersionHandshake capability-negotiation shape — all in the single canonical home internal/protocol/types, no version bump

Date: 2026-05-14 Status: Settled Where it lives: RFC §5.3 (+ §5.2 the surface table the capability set mirrors), CLAUDE.md §8, docs/plans/phase-59-protocol-versioning.md, internal/protocol/types/version.go (ProtocolVersion pin unchanged + Version / ParseVersion / ErrInvalidVersion / CurrentVersion / Compare / Compatible; DeprecationKind + Deprecation / Validate / String / ErrInvalidDeprecation + Deprecations(); Capability / CapTaskControl / Capabilities / IsValidCapability; VersionHandshake / CurrentHandshake / Accepts), internal/protocol/types/version_test.go (the versioning-discipline unit suite), internal/protocol/singlesource/singlesource.go (CanonicalWireTypes += Version / Deprecation / VersionHandshake under home types), scripts/smoke/phase-59.sh, brief 06 §1, brief 07.

Why: Phase 59 turns the Harbor Protocol version pin — the ProtocolVersion string Phase 54 (D-072 §1) placed in internal/protocol/types/version.go — into a versioning discipline: the mechanism a Protocol surface needs to live with versions, deprecate elements, and let a client negotiate which surfaces are live. Four design calls warrant a durable home.

  1. The ProtocolVersion string constant stays the RFC-change trip-wire; the parsed Version is derived from it, never a second source. CLAUDE.md §8 + RFC §5.3 are binding: "The Protocol version is pinned in internal/protocol/types/version.go. Bumping the version is an RFC change." Phase 59 does not bump the version — it stays "0.1.0", and the Phase 54 TestProtocolVersion_Pinned trip-wire is untouched. The new Version struct (Major/Minor/Patch) is the parsed form a client uses to reason about a version — Compare for ordering, the same-major Compatible rule for skew detection — instead of string-comparing. CurrentVersion is mustParseVersion(ProtocolVersion) evaluated at package-init: one source (ProtocolVersion), one derived parse, and TestCurrentVersion_MatchesProtocolVersion pins CurrentVersion.String() == ProtocolVersion so the two can never drift. This is not the CLAUDE.md §13 "two parallel implementations" smell — it is a single source plus a derivation with a test gating the derivation. Compatible is same-major because a Major bump is the breaking change (which is why bumping it is an RFC change); Minor/Patch differences are backward-compatible by construction. ParseVersion fails loud with a wrapped ErrInvalidVersion on any malformed input (empty, wrong arity, non-numeric or negative component, surrounding whitespace, a pre-release suffix) — no silent zero-Version degradation (CLAUDE.md §5).

  2. The deprecation window gets a structured Deprecation note format with a single-home registry — even though the registry is empty at 0.1.0. RFC §5.3: "Breaking changes require a deprecation window so third-party Consoles aren't whipsawed." Phase 59 settles that window's format: a typed Deprecation wire struct (Subject / Kind / DeprecatedIn / RemovedIn / Replacement / Note) with Validate (fail-loud on an empty subject, an unknown DeprecationKind, or a RemovedIn not strictly after DeprecatedIn — an empty or inverted window is malformed) and a canonical String rendering (<kind> "<subject>" is deprecated in <deprecated_in>, removed in <removed_in>[; use <replacement>][ — <note>]). DeprecationKind is a fixed four-value enum — the four kinds of Protocol element the versioned surface exposes (method / error_code / wire_field / capability). Deprecations() is the registry — empty at Protocol 0.1.0 because the task control surface just shipped and nothing has been superseded — but it exists, with its consumer, from day one: per CLAUDE.md §13's primitive-with-consumer rule, shipping the Deprecation format without a home that returns it would let the format bit-rot. The first real deprecation lands in the registry in the phase that supersedes a Protocol element, populating the format Phase 59 settled rather than inventing a new free-text comment convention.

  3. Capability negotiation ships as the Capability set + the VersionHandshake wire shape: the vocabulary, not the enforcement. brief 06 §1's decoupling rule ("Console, third-party consoles, and harbor dev see exactly the same data shape") needs a negotiable surface: a third-party Console built against Protocol 0.1.0 must be able to ask the Runtime "which surfaces are live?" and get a structured answer, not discover a missing surface by a 404. Phase 59 ships Capability (a fixed string enum mirroring methods.canonicalMethods; CapTaskControl, the Phase 54 surface, is the one V1 capability, and RFC §5.2's other five surfaces add their constant as their phase lands), Capabilities() (the deterministic, sorted advertised set), and VersionHandshake (ProtocolVersion + advertised Capabilities, with CurrentHandshake() building the Runtime's handshake and Accepts(cap) reporting whether a capability is advertised). Phase 59 does not ship capability enforcement: rejecting a request for an un-advertised capability is the job of the Phase 60/61 transport + auth surface. Phase 59 gives those phases the vocabulary, transport-agnostically: a Phase 60 SSE+REST adapter serves CurrentHandshake(), and the harbor version subcommand (Phase 63) renders it. Phase 59 binds to no transport and adds no CLI subcommand; the master-plan acceptance line ("version constant returned on harbor version (after phase 63)") is an explicit forward reference, not a Phase 59 deliverable.

  4. Everything lands in the single canonical home internal/protocol/types, and the three new wire structs are registered in the Phase 58 single-source checker's lockstep map in the same PR. CLAUDE.md §8: the version is pinned in internal/protocol/types/version.go; CLAUDE.md §13: no second definition site for Protocol wire types. Version, Deprecation, and VersionHandshake are exported struct types declared in internal/protocol/types, so D-075 §4's TestSingleSource_CanonicalWireTypesInLockstep — which parses the canonical packages and asserts every exported struct type is recorded in singlesource.CanonicalWireTypes under the right home — fails until the checker's map records all three under home types. That map edit is not scope creep: it is the mandatory CLAUDE.md §17.6 "fix what the lint finds" coupling — the Phase 58 lockstep test is designed to fail when a new wire type lands without updating the checker, and Phase 59 is the first phase to exercise that coupling. There is no §4.4 driver seam (Version / Deprecation / Capability are value types + pure functions — no plausible alternate backend, the same call D-072 §5 and D-075 made for the Protocol layer's other surfaces).

§13 primitive-with-consumer. Phase 59's primitives ship with their consumers in the same PR: the Version parse/compare/compatible surface is consumed by CurrentVersion (the derived pin) and the version_test.go skew/ordering suite; the Deprecation format is consumed by Deprecations() (its single-home registry — empty, but present, so the format cannot bit-rot) and the Validate / String / round-trip tests; the Capability set + VersionHandshake shape is consumed by CurrentHandshake() + Accepts and their tests. The later consumers — a Phase 60 SSE+REST negotiation endpoint, a Phase 63 harbor version subcommand — are forward references the master plan already pins; the in-PR consumers discharge the rule. There is no §17 integration test: Phase 59 ships additive value types + pure functions inside internal/protocol/types; it wires no runtime drivers and opens no cross-subsystem seam (CLAUDE.md §17.1 exempts a phase that consumes no shipped subsystem's runtime surface — the only thing Phase 59 consumes is the build-time Phase 58 single-source checker's lockstep map, a static-source coupling proven by re-running the Phase 58 suite, not a runtime seam). There is no D-025 concurrent-reuse test: Version / Deprecation / Capability / VersionHandshake are immutable value types and the functions over them are pure with no construction-time state and no goroutines — none is a "compiled artifact" in the D-025 sense. Coverage on internal/protocol/types is 86.1% (master-plan Phase 59 target 85%).


D-078 — Protocol wire transport: SSE for the event stream + REST/JSON for the control surface, both as http.Handlers under internal/protocol/transports/{stream,control} composed by NewMux; identity-scope enforced at the edge; the listen/shutdown lifecycle deferred to the harbor dev server phase

Date: 2026-05-14 Status: Settled Where it lives: RFC §5.4 (Q-1 RESOLVED 2026-05-14 — SSE + REST) + §5.5, CLAUDE.md §3 + §8, docs/plans/phase-60-protocol-wire-transport.md, internal/protocol/transports/transports.go (NewMux + the Option knobs), internal/protocol/transports/control (NewHandler + the REST/JSON control handler + status.go's errors.Code → HTTP status table + RoutePattern), internal/protocol/transports/stream (NewHandler + the SSE event handler + frame.go's SSE framing + the identity-carrier header names + RoutePattern), internal/protocol/transports/concurrent_test.go (the D-025 N≥120 concurrent-reuse + the goroutine-leak test), test/integration/phase60_wire_transport_test.go (the both-directions wire E2E), scripts/smoke/phase-60.sh, brief 06 §1 + §6, brief 07.

Why: RFC §11 Q-1 resolved on 2026-05-14 to SSE for the event stream + REST/JSON for the control surface (owner sign-off; RFC §5.4 + §11 Q-1 updated), so Phase 60 is a normal implementation phase, not a decision gate. It binds the transport-agnostic surfaces two prior phases shipped — Phase 54's protocol.ControlSurface and Phase 05's events.EventBus — onto the wire. Four design calls warrant a durable home.

  1. The two transports are sibling sub-packages under internal/protocol/transports/, composed by NewMux — there is no driver-registry / factory ceremony. CLAUDE.md §3 pins internal/protocol/transports/{stream,control}; Phase 60 fills exactly that layout: transports/control is the REST/JSON control handler over ControlSurface.Dispatch, transports/stream is the SSE event handler over events.EventBus, and transports/transports.go's NewMux wires both onto one *http.ServeMux. RFC §5.4 explicitly leaves WebSocket as an additive alternate transport; the seam that makes it additive is the package layout itself — a third sub-package (transports/websocket) plus one more mux.Handle line in NewMux, with neither control nor stream reshaped. This is NOT a §4.4 driver seam: there is no Register / factory / blank-import dispatch-by-name, because the transport set is small, closed, and mounted in code at boot — the same posture Phase 54 took for the ControlSurface (D-072 §3). A transports/drivers/ tree would be the ceremony §4.4's "no optional-capability ceremony" clause warns against.

  2. Identity-scope enforcement sits at a single per-transport choke point — ControlSurface.Dispatch for REST, resolveIdentity for SSE — and fails closed. RFC §5.5: "the Protocol rejects any request without an identity scope." On the REST transport the handler decodes the request and hands the whole request to ControlSurface.Dispatch, which already fails closed on an incomplete triple with CodeIdentityRequired (D-072 §2) — status.go maps that to HTTP 401. The handler does NOT re-validate identity (CLAUDE.md §13 forbids a second validator). On the SSE transport the triple travels in carrier headers (X-Harbor-Tenant / X-Harbor-User / X-Harbor-Session — headers, not a query string, so the triple is not logged in access logs by default) and resolveIdentity validates it via identity.Validate before any events.Subscription is opened — a missing component is HTTP 401, fail-closed. Both choke points are deliberately single: Phase 61's JWT validation replaces the header reads / inherits the Dispatch gate without reshaping either ServeHTTP. The identity claim is trust-based until Phase 61 — exactly the posture events.Filter.Admin and types.IdentityScope.Scope hold today; events.Filter.Admin (cross-tenant fan-in) is NOT exposed on the wire in Phase 60, so the SSE stream is always triple-scoped.

  3. SSE framing + keepalive + reconnect-cursor discipline. Each events.Event is one SSE frame: an event: line (the event type), an id: line (the per-bus monotonic Sequence — this IS the reconnect cursor), and a data: line (a flat, Protocol-owned wireEvent JSON projection — never a re-export of the internal events.Event struct, per the RFC §5.1 no-1:1-internal-mapping rule). An idle stream emits a : keepalive SSE comment on an interval (default 15s, under the common 30–60s proxy idle-timeout) so intermediaries do not reap the connection. On reconnect a client echoes the last frame's id: back as the standard SSE Last-Event-ID header; parseLastEventID maps it onto an events.Cursor and, when the bus driver implements events.Replayer, the handler replays everything strictly newer than the cursor before live-tailing — so the reconnecting client does not miss the gap. When replay is unavailable (the driver does not implement Replayer, the ring is configured off, or the cursor is too old) the gap is SURFACED with an explicit : stream.replay_unavailable <reason> comment frame — never a silent nil stream that looks complete but skipped events (CLAUDE.md §5 — fail loudly, no silent degradation). The keepalive path is testable without a time.Sleep-as-synchronisation antipattern because the keepalive frame is observable on the wire: a test supplies a short WithKeepalive interval and scans for the comment frame.

  4. Phase 60 ships the transport http.Handlers, NOT the server that listens. There is no net.Listener, no graceful-shutdown lifecycle, and no /healthz in Phase 60 — that is the harbor dev subcommand's job (Phase 64, master-plan Deps: 63, 60). NewMux returns a plain *http.ServeMux a future server mounts. Keeping the listen/shutdown lifecycle out of internal/protocol/transports/ means the transports are exercised end-to-end today via httptest — the package tests and test/integration/phase60_wire_transport_test.go open real httptest.Servers over a real ControlSurface (over a real inprocess tasks.TaskRegistry) + a real in-mem events.EventBus, submit start over REST, and observe the resulting task.spawned lifecycle event arrive on the SSE stream — both directions, no mocks at any seam. Consequently scripts/smoke/phase-60.sh runs those httptest-backed tests + static layout/single-source/Console-boundary guards, and skips the live-HTTP assertions per the 404/405/501 → SKIP convention until Phase 64 lands the server — the identical posture scripts/smoke/phase-54.sh took for the transport-agnostic surface ahead of its wire binding.

§13 primitive-with-consumer — discharged in-phase. Phase 60's wire transport is itself the consumer of Phase 54's transport-agnostic ControlSurface and Phase 05's events.EventBus — it ships no new primitive, it binds two existing ones onto the wire. The obligation is discharged by test/integration/phase60_wire_transport_test.go, which exercises both directions end-to-end (SSE event stream out, REST control in) against the real runtime surface, plus the missing-identity fail-closed mode and an N≥16 full-duplex concurrency stress under -race. The D-025 concurrent-reuse test (transports/concurrent_test.go, N=120 mixed REST + SSE requests against one shared mux) + the goroutine-leak test (baseline runtime.NumGoroutine restored after every stream drains and the server closes) cover the reusable-artifact contract — the mux and both handlers are compiled artifacts, immutable after construction, with every per-request goroutine joined before its ServeHTTP returns. Coverage on the three touched packages meets the master-plan Phase 60 target (85%).

Amended (Wave 10 audit fixes — PR #91 / D-082) after the CLAUDE.md §13 "Test stubs as production defaults on operator-facing seams" rule: transports.NewMux now requires an explicit auth choice — EITHER WithValidator(v) (the production posture; Phase 61 JWT bearer auth at the edge) OR WithoutValidator() (the explicit, test-only escape hatch for the Phase 60 trust-based posture). Omitting both fails closed at boot with ErrMisconfigured. The Phase 60 surface still serves the trust-based posture verbatim — WithoutValidator opts into it explicitly — but a future Runtime that builds a mux without considering auth no longer ships an unauthenticated production surface by default. transports.WithValidator(nil) is also rejected (treated as "WithValidator not supplied") rather than being silently treated as a no-op. The Wave 10 audit's WARN-2 surfaced this drift; the audit assertion is in internal/protocol/transports/transports_test.go::TestNewMux_MissingAuthChoice_FailLoud.


D-079 — Protocol auth: a JWT validator + http.Handler middleware at the Phase 60 transport edge; asymmetric algorithms only (RS256/RS384/RS512/ES256/ES384/ES512); HS* and none rejected at the parser level via jwt.WithValidMethods; (tenant, user, session) from claims into ctx via identity.With; scope claims (admin, console:fleet) gated through auth.HasScope; new CodeAuthRejected Protocol error; every rejection audited

Date: 2026-05-15 Status: Settled Where it lives: RFC §5.5 + §4.2, CLAUDE.md §7 (rule 1) + §6 + §8 + §13, docs/plans/phase-61-protocol-auth.md, internal/protocol/auth/auth.go (the Validator interface + the jwtValidator concrete + the eight typed sentinels + the KeySet interface + Verified + the WithIssuer/WithAudience/WithClock/WithLogger/WithRedactor options + the AllowedAlgorithms slice), internal/protocol/auth/middleware.go (the Middleware decorator + extractBearer + protocolErrorFor + reasonForWire), internal/protocol/auth/scopes.go (Scope + ScopeAdmin/ScopeConsoleFleet + WithScopes/ScopesFrom/HasScope), internal/protocol/auth/security_test.go (the algorithm-confusion + alg:none + scope-escalation + kid-substitution + expired + tampered-body + audit-no-leak suite), internal/protocol/auth/concurrent_test.go (D-025 N≥128 + goroutine-leak), internal/protocol/auth/testdata/ (the documented dummy RS256 + ES256 keypairs + their README), internal/protocol/errors/errors.go (the new CodeAuthRejected constant + canonicalCodes registration), internal/protocol/transports/transports.go (the WithValidator NewMux option), internal/protocol/transports/control/status.go (the CodeAuthRejected → 401 mapping), internal/protocol/transports/control/control.go (the assertBodyMatchesAuthedIdentity defence-in-depth check + the body-backfill path), internal/protocol/transports/stream/stream.go (the ctx-first resolveIdentity preference + the ?admin=1 scope gate), test/integration/phase61_auth_test.go (the end-to-end real-keypair test exercising every rejection mode + the N≥16 concurrency stress), scripts/smoke/phase-61.sh, brief 09 §"Identity-scoped JWT enforcement on resume", brief 07 §"the runtime owns the protocol it speaks", brief 06 §"server-enforced identity".

Why: RFC §5.5 settled the authentication primitive ("JWT, asymmetric algorithms only ... the triple (tenant, user, session) is in the JWT claims; the Protocol rejects any request without an identity scope") and CLAUDE.md §7 rule 1 + §13 made the algorithm allowlist a rejection-on-sight rule. Phase 60 left the actual JWT validation as Phase 61 work, with both transport handlers wired so that "Phase 61's JWT validation replaces the header reads / inherits the Dispatch gate without reshaping either ServeHTTP" (D-078 §2). Phase 61 makes good on that. Five design calls warrant a durable home.

  1. The validator + middleware split is the single transport-agnostic shape, sitting at the Phase 60 edge — not a per-handler reinvention. internal/protocol/auth/auth.go ships the Validator interface (Validate(ctx, raw string) (Verified, error)), a transport-agnostic surface that takes a raw JWT string and returns the verified identity + scopes. middleware.go is the net/http binding: it reads Authorization: Bearer <token>, calls Validator.Validate, and on success injects identity into r.Context() via identity.With + the verified scope set via auth.WithScopes. The middleware is the ONE choke point where identity transitions from "untrusted carrier" to "verified ctx claim" — and transports.NewMux wraps BOTH the Phase 60 control + stream handlers in it via the new WithValidator option (RFC §5.5: "the Protocol rejects any request without an identity scope" — so the gate is at the mux composition, not per-handler). When WithValidator is NOT supplied, the Phase 60 trust-based posture is preserved verbatim — the option is additive and opt-in, the same posture Phase 60 took for WebSocket as an alternate transport. This keeps every Phase 60 test passing without modification.

  2. The asymmetric-algorithm allowlist is enforced at the PARSER level via jwt.WithValidMethods, not post-parse. CLAUDE.md §7 rule 1 + §13 require HS* and alg:none to be rejected on sight; the load-bearing question is where the rejection happens. golang-jwt/jwt/v5's parser, when configured with WithValidMethods([RS256,RS384,RS512,ES256,ES384,ES512]), rejects any token whose alg header is outside the list BEFORE the keyfunc runs. That makes the classical algorithm-confusion CVE family (an HS256 token whose verifier-side keyfunc would otherwise hand it the RSA public key as the HMAC secret) structurally impossible: the HS-signed token never reaches the verification step. The keyfunc itself is belt-and-braces: it re-asserts isAllowedMethod(t.Method) AND structurally rejects a non-asymmetric public key (a *rsa.PublicKey or *ecdsa.PublicKey is the only acceptable shape). The load-bearing gate is still the parser's, and security_test.go pins the exact CVE shape (an HS256 token signed with the RS256 public key as the HMAC secret) plus alg:none, a kid-substitution attack, and a tampered-body attack. The eight typed sentinels (ErrTokenMissing / ErrTokenMalformed / ErrAlgNotAllowed / ErrSignatureInvalid / ErrTokenExpired / ErrTokenNotYetValid / ErrUnknownKey / ErrIdentityClaimMissing, plus the optional ErrAudienceMismatch / ErrIssuerMismatch) cover every rejection path. mapParserError is the deliberate translation table from golang-jwt/jwt/v5's vocabulary to ours, with an order-sensitive cascade (our keyfunc-returned sentinels first, then the WithValidMethods message, then the alg:none message, then the standard signature/malformed cases) so that the sentinel a Validator caller branches on is the one we documented.

  3. The new CodeAuthRejected Protocol error code lands ONLY in internal/protocol/errors/; the Phase 58 single-source checker is the gate. CLAUDE.md §8: "Error codes live in internal/protocol/errors/errors.go. Add new codes there and only there." Phase 61 adds exactly one new code (CodeAuthRejected = "auth_rejected"), distinct from CodeIdentityRequired (which signals an absent identity scope, what RFC §5.5 calls "request without an identity scope") because a present-but-invalid JWT is a different operator-actionable failure: a client that gets identity_required needs to attach a token, while a client that gets auth_rejected has one that failed cryptographic / structural verification. The HTTP-status mapping pins both at 401 (the request is unauthenticated at the Protocol edge); the Code is what a Protocol client branches on. The middleware's protocolErrorFor is the deliberate sentinel → Code map: ErrTokenMissing and ErrIdentityClaimMissing → CodeIdentityRequired; everything else → CodeAuthRejected. The singlesource.CanonicalWireTypes lockstep map needs no update: Verified, Scope, and KeySet live in internal/protocol/auth, NOT in types/ or errors/, and the lockstep test only audits the canonical packages; auth-internal types are not Protocol wire types.

  4. Identity flows JWT-claim → r.Context() via identity.With; the Phase 60 handlers prefer ctx-identity but fall back to their existing carriers — the Phase 60 surface is preserved verbatim. The middleware calls identity.With(ctx, verified.Identity) on success. The SSE handler's resolveIdentity prefers identity.From(r.Context()) when present, falling back to the X-Harbor-* carrier headers (Phase 60 trust-based) when no middleware ran — every Phase 60 test still passes. The control handler adds assertBodyMatchesAuthedIdentity: when ctx carries a verified identity, the body's IdentityScope MUST match (or be empty, in which case the handler backfills from ctx — the JWT IS the source of truth). A body claiming a different (tenant, user, session) than the JWT is rejected 401 before Dispatch runs — defence in depth so a caller cannot present a valid JWT for tenant T1 while submitting a control body claiming tenant T2. This is deliberately defence-in-depth, not the primary gate: the primary gate is the JWT verification; the body-match is the second perimeter. Together they discharge RFC §5.5's "the Protocol rejects any request without an identity scope" with no escape hatch.

  5. Scopes (admin, console:fleet) are a closed canonical set; an unknown scope on a JWT is silently dropped from the verified set, NOT honoured as a privilege. auth.Scope is a typed string with two constants — ScopeAdmin and ScopeConsoleFleet. IsValidScope is the membership check; WithScopes filters the supplied slice through IsValidScope so an attacker-injected unknown scope (a token claiming "future:scope") cannot reach HasScope and trigger a privilege check we did not document. The closed set means an attacker cannot grant themselves an undocumented privilege by inventing a scope name — every privilege check is against a constant the runtime knows. The SSE handler's ?admin=1 gate is the first consumer (RFC §6.13 admin subscriptions): a request with ?admin=1 AND a verified ScopeAdmin (or ScopeConsoleFleet) gets events.Filter.Admin = true; without the scope it is rejected 403. The Phase 50 unified pause/resume primitive's OAuth callback handler (Phase 30) will be the second consumer — admin-bound OAuth flows require ScopeAdmin on the agent's tenant per brief 09 §"Admin-scope authz on agent-bound flows". ScopesFrom returns a defensive copy so a downstream caller cannot mutate the in-context slice; the bare-context check returns (nil, false) so a Phase 60 (no-middleware) request explicitly has no scopes attached.

§10 dependency posture. golang-jwt/jwt/v5 was already an indirect dependency: the library was pulled in transitively by aws-sdk-go-v2/credentials (which uses it for its STS token exchange). Phase 61 promotes it to a direct dependency via go mod tidy, so no new module joins the build, and there is no new license surface and no new transitive footprint. golang-jwt/jwt is the de-facto Go JWT library (pure Go, well-maintained, ~6k stars, used by Caddy, HashiCorp Boundary, Grafana, …) and satisfies CLAUDE.md §13's "Pulling in heavy frameworks ... additions require RFC update" via a documentation-only RFC expansion (the indirect dep was already in go.sum; the surface change is the import in internal/protocol/auth/auth.go, not a new module download). The validator implementation is small (≈400 LOC including the eight sentinel mappings) and could be written against crypto/rsa + crypto/ecdsa directly if golang-jwt/jwt/v5 is later removed. But the library's WithValidMethods parser-level allowlist is the load-bearing gate against the alg-confusion CVE family, and reimplementing that in-tree is exactly the silent-degradation risk CLAUDE.md §5 cautions against.

§13 primitive-with-consumer: discharged in-phase. Phase 61's Validator and auth.Middleware are both primitives; their consumers ship in the same PR. Validator is consumed by Middleware, by the integration test's real-key end-to-end suite, and by the security suite's five attack shapes. Middleware is consumed by transports.NewMux (via the WithValidator option), by the Phase 60 handlers' ctx-first identity resolution, and by test/integration/phase61_auth_test.go. The Scope + HasScope primitives are consumed by the SSE handler's ?admin=1 gate; a future Phase 30 OAuth callback handler is the second consumer the master plan already pins (Phase 30's Deps lists Phase 61). CodeAuthRejected is consumed by the middleware (as the rejection wire code), by the transport status table (status.go's 401 mapping), and by every rejection-mode test. The D-025 concurrent-reuse test (auth/concurrent_test.go, N=128 with distinct per-goroutine identity quadruples + a goroutine-baseline assertion) covers the reusable-artifact contract: Validator is a compiled artifact, immutable after NewValidator returns, with no per-call state on the struct (every per-call value lives on the function stack or in the returned Verified). Coverage on the five touched packages meets the master-plan Phase 61 targets (auth 90%, errors 100%, control 89.5%, transports 94.3%, stream 86.6%; all at or above target).

Amended (Wave 10 audit fixes — PR #91 / D-082) per CLAUDE.md §7 rule 6 + §13:

  • WithRedactor is mandatory. NewValidator now returns a wrapped ErrMisconfigured when an audit.Redactor is not supplied (the prior in-package noopRedactor permissive stub default was a §13 "Test stubs as production defaults on operator-facing seams" violation). The noopRedactor type moved to internal/protocol/auth/testhelpers_test.go and is reachable only from _test.go files. Production callers wire audit/drivers/patterns.New() as the redactor.
  • A new auth.rejected canonical event type (internal/protocol/auth/events.go) — registered via events.RegisterEventType in the package's init(). The WithEventBus(b) validator option (optional — a nil bus preserves the prior slog-only contract) makes the audit emit ALSO publish the canonical event onto the bus, so a Console subscribing through the Protocol's canonical event channel sees auth rejections alongside every other rejection-class signal. Conformance + Wave 10 integration wiring inject the bus by default; per-package tests remain slog-only. The Wave 10 audit's WARN-3 surfaced this drift; the assertion lives in internal/protocol/auth/events_test.go::TestValidate_BusEmit_PublishesAuthRejectedEvent (real bus, real subscriber, end-to-end with the published payload's Reason asserted).

D-080 — Protocol conformance suite: a single RunSuite(t, factory) shape under internal/protocol/conformance that exhaustively exercises every Protocol method, every error code, every documented event-filter shape, the Phase 59 VersionHandshake, and the Phase 61 auth pipeline against TWO transports (in-process ControlSurface.Dispatch AND the over-the-wire Phase 60 mux under httptest.Server); matrix exhaustiveness asserted at boot so a new method/code/capability lands in the same PR as its scenario or fails the suite

Date: 2026-05-15 Status: Settled Where it lives: RFC §5 (+ §5.2 + §5.3 + §5.4 + §5.5), CLAUDE.md §8 (single-source) + §11 (conformance suites + D-025) + §13 (no-primitive-without-consumer) + §17.5 (wave-end E2E) + §17.7 (wave delivery cadence), docs/plans/phase-62-protocol-conformance.md, internal/protocol/conformance/conformance.go (the Stack + Factory + RunSuite + the method matrix + the error-code matrix + the event-filter matrix + runVersionHandshake + runAuthPipeline + runWireStatusMapping + the D-025 runConcurrentReuse scenario + every helper), internal/protocol/conformance/conformance_test.go (the package-local consumer TestProtocol_Conformance), internal/protocol/conformance/internal_test.go (the helper-surface unit tests), test/integration/wave10_test.go (the Wave 10 wave-end E2E: the suite consumed against the assembled Wave 10 surface + the unknown-kid failure mode + the N=16 concurrency stress with full identity-isolation cross-checking + the VersionHandshake contract pin), scripts/smoke/phase-62.sh, docs/glossary.md (the "Protocol conformance suite" entry), brief 07 §"the runtime owns the protocol it speaks", brief 06 §"server-enforced identity".

Why: RFC §5 settled the Protocol surface ("streaming events, task control surface, observability APIs") and CLAUDE.md §8 settled single-source discipline (methods/errors/types live in their canonical packages and nowhere else). Phases 54 / 58–61 shipped the Protocol layer's primitives in waves; Phase 62 is Wave 10's primitive-with-consumer closer for the entire Protocol layer — a single binding pass/fail definition of "the Protocol surface works at version 0.1.0" that mechanically prevents silent surface drift. Four design calls warrant a durable home.

  1. One conformance suite lives under internal/protocol/conformance. RunSuite(t, factory) is the consumer entry point, and a Stack carries the real-driver Protocol stack through which a scenario reaches the runtime. The suite IS the consumer of Phase 58 (single-source) + Phase 60 (wire transport) + Phase 61 (auth) — the master plan pins it that way. The shape mirrors the StateStore / MemoryStore / RemoteTransport / planner conformance suites already in the repo: one package, one RunSuite, one Factory seam per consumer profile. A future Protocol transport (WebSocket, stdio) consumes the same suite via the same Factory — no second conformance implementation. The Stack is a D-025 compiled artifact: every field is set once at construction (the surface, the bus, the steering registry, the task registry, the mux, the per-instance token-minting closures, the cleanup), and nothing is mutated afterwards. runConcurrentReuse runs N=100 mixed-method invocations against ONE shared Stack under -race with distinct per-goroutine identity quadruples and the goroutine-baseline assertion. The default factory (NewDefaultFactory(testdataRoot)) wires real drivers everywhere on the seam (real tasks.TaskRegistry inprocess + real events.EventBus inmem + real state.StateStore inmem + real protocol.ControlSurface + real protocol/auth.Validator over the real ES256 keypair from internal/protocol/auth/testdata/ + real Phase 60 transports.NewMux) — no mocks at the boundary, exactly the §17.3 + §11 conformance discipline.

  2. The matrix asserts exhaustiveness at boot — a new method/code/capability landing without a corresponding scenario fails the suite at the top of RunSuite. assertMethodMatrixExhaustive walks methods.Methods() and pins the ten canonical methods (the Phase 54 set); a new canonical method that lands without a wantSet entry fails the suite. assertErrorCodeMatrixExhaustive does the same for the eight errorCodeMatrix entries (the Phase 54 + Phase 61 set); a new code that lands without a matrix entry fails the suite. runVersionHandshake pins types.Capabilities() returning exactly {task_control} at 0.1.0 and the deprecation registry being empty, so a new capability or a new deprecation surfaces as a conformance failure, not as silent surface drift. The suite runs the same scenario bodies against TWO transports (the in-process ControlSurface.Dispatch and the over-the-wire mux under httptest.Server). A conformance pass therefore means the surface is consistent across the two consumer profiles through which a Console reaches the Runtime — the same consistency property the §13 primitive-with-consumer rule exists to guarantee.

  3. The error-code matrix asserts each canonical code's wire-status mapping AND pairs every code that has a production path with at least one failure scenario. expectedHTTPStatus mirrors the mapping in internal/protocol/transports/control/status.go. runWireStatusMapping walks the matrix and asserts every code has a 4xx/5xx entry (a 2xx with an error body would be a silent-degradation shape banned by §13). Every canonical code except CodeRuntimeError is exercised by at least one named scenario: CodeInvalidRequest_NilStartBody, CodeIdentityRequired_MissingTriple, CodeScopeMismatch_PrioritizeWithSessionUser (PRIORITIZE's RFC §6.3 admin minimum vs. session_user), CodePayloadInvalid_OversizeString (the §6.3 4096-rune cap exceeded), CodeUnknownMethod_NonCanonicalName, CodeNotFound_GhostRunInbox (a steering control for a run with no live inbox), CodeAuthRejected_HS256Token_AlgConfusion (the classical alg-confusion attack — an HS256-signed token rejected at the parser). CodeRuntimeError is the catch-all for unclassified runtime-side failures. Every production path lands on a typed sentinel, so the test pins the constant's presence in the canonical set + the 500 mapping in runWireStatusMapping rather than fabricating a coerced runtime fault (which would require a fault-injecting factory — out of scope for V1 + acceptable per §15 "smallest change that solves the problem").

  4. The Wave 10 wave-end E2E (test/integration/wave10_test.go) consumes the conformance suite from a different consumer profile against the assembled real-driver Wave 10 surface — per §17.5 step 5 of the wave delivery cadence. TestE2E_Wave10_Conformance_AgainstAssembledSurface calls conformance.RunSuite(t, conformance.NewDefaultFactory(wave10TestdataRoot(t))) — same suite, different cwd, real testdata path resolution. TestE2E_Wave10_VersionHandshake_ContractStable pins the negotiation contract end-to-end (types.ProtocolVersion == "0.1.0", types.Capabilities() returning {task_control}, deprecation registry empty). TestE2E_Wave10_FailureMode_UnknownKidTokenRejected is the §17.3 #3 failure-mode coverage — a JWT with an unknown kid is rejected at the auth edge with CodeAuthRejected before the runtime is reached, AND tasks.List under the rejected identity is empty (defence-in-depth assertion that the middleware short-circuits before Dispatch). TestE2E_Wave10_Concurrency_NoCrossTalk does three things: it runs N=16 distinct identity stacks against one shared mux, it asserts identity isolation by reading every spawned task back via tasks.Get and checking that its tenant matches the originating goroutine's triple, and it checks that the goroutine baseline is restored on teardown. Consuming the suite from two profiles (in-package + integration) is deliberate: a regression that affects only one profile's wiring (e.g. testdata-path resolution) surfaces here, not in a follow-up.

§4.3 deviation — realised statement coverage 81.2% vs. the master-plan 85% target. Matches the precedent set by internal/planner/conformance (Phase 49) which shipped at 70.8% under the same 85% target. Conformance suites are dominated by t.Fatalf rollback branches that fire only on assertion failure — branches that are correct production code but cannot be exercised by a passing test. Tooling-side attempts to lift the number further (helper consolidation via a deferred-rollback pattern in buildDefaultStack, branch combining in helper bodies) bumped the number from an initial 79.5% to 81.2%; the remaining gap is the irreducible floor of "assertion-rich test code." The assertion density — 10 methods × 2 transports × {happy, malformed}; 8 error codes × ≥1 failure path; every event-filter shape; the version handshake; the auth pipeline; the wire-status mapping; an N=100 D-025 stress — is the load-bearing surface, not the percentage. Documented per §4.3 + §15.

§13 primitive-with-consumer — closed for the entire Protocol layer. Phase 62 IS the consumer for Phase 54's ControlSurface + Phase 58's single-source enforcement + Phase 59's VersionHandshake + Phase 60's transports.NewMux + Phase 61's auth.Validator and auth.Middleware simultaneously. The conformance suite exercises all of these end-to-end against real drivers; the Wave 10 wave-end E2E consumes the same suite from a second profile. A future Protocol-surface phase (state snapshots, topology, artifacts, traces, metrics) extends the suite rather than adding a parallel surface to validate — the matrix-exhaustiveness check at the top of RunSuite is the trip-wire.

Amended (Wave 10 audit fixes — PR #91 / D-082) per the Wave 10 audit's WARN-7: the conformance matrix gained a TracePropagation scenario (runTracePropagation) that exercises the Phase 55 W3C TraceContext carriers end-to-end against the assembled Protocol stack — an inbound traceparent header rides a real REST start request through to a real task.spawned event, and tr.SpanFromEvent on the receiver side produces a span sharing the inbound TraceID. The matrix exhaustiveness side now derives the canonical error-code set from protoerrors.Codes() (the new accessor added in PR #91) rather than a hardcoded count, so a new code landing without a matrix entry surfaces by NAME. The event-filter matrix gained RunScoped_StreamOpens exercising the new X-Harbor-Run carrier header (WARN-5). The Wave 10 audit's WARN-4/-5/-7 surfaced these gaps; the assertions live in internal/protocol/conformance/conformance.go::runTracePropagation, runEventFilterMatrix/RunScoped_StreamOpens, and assertErrorCodeMatrixExhaustive (rewritten to use protoerrors.Codes()).


D-081 — Governance config consolidation: remove the pre-Phase-36a default_max_tokens / cost_ceiling_usd / rate_limit_tps knobs from GovernanceConfig; the loader emits a structured config.deprecated_field slog warning when the legacy YAML keys appear and drops the value; all enforcement flows through IdentityTiers

Date: 2026-05-15 Status: Settled Where it lives: CLAUDE.md §10 + §13 (the "Test stubs as production defaults on operator-facing seams" entry), internal/config/config.go (the GovernanceConfig field set — DefaultMaxTokens / CostCeilingUSD / RateLimitTPS removed, RepairAttempts / DefaultTier / IdentityTiers retained), internal/config/deprecations.go (the stripDeprecatedGovernanceKeys YAML AST pre-processor + the deprecatedGovernanceKeys closed set + the config.deprecated_field warning emitter + the deprecatedFieldReplacement / deprecatedFieldRemovedIn constants), internal/config/loader.go (LoadOption + WithLogger + Load / LoadFromBytes opts vararg + the strip-before-strict-decode wiring in loadFromBytesNamed), internal/config/validate.go::validateGovernance (the default_max_tokens / cost_ceiling_usd / rate_limit_tps fieldError calls removed), internal/config/deprecations_test.go (the per-field warning tests + the all-three-at-once test + the negative test + the identity_tiers-wins test + the llm.model_profiles[...].default_max_tokens-not-stripped test), examples/harbor.yaml (the governance block's legacy lines replaced with a migration pointer to identity_tiers), internal/config/testdata/{valid_minimal,invalid_enum,invalid_missing_required}.yaml (the governance.default_max_tokens line dropped from each fixture).

Why: The three legacy fields were validated-but-ignored stubs preserved (per the godoc comment they carried) "so existing yaml files don't break." The governance enforcement engine that landed in Phase 36a/36b reads exclusively from IdentityTiers — it has never consumed default_max_tokens / cost_ceiling_usd / rate_limit_tps from the GovernanceConfig block. An operator setting cost_ceiling_usd: 100 in YAML saw the value pass validation, saw it land in the in-memory *Config, and then saw silent no-op behaviour at runtime: no event, no warning, no enforcement. This is the same confusion trap CLAUDE.md §13's "Test stubs as production defaults on operator-facing seams" entry closes one layer up — there the concern was a stub LLM that the binary defaulted to, here the concern is a YAML knob that promises behaviour the runtime does not deliver. Validated-but-ignored fields on an operator-facing seam are an even purer expression of the failure mode: the validator's success message tells the operator the value is being consumed.

Two design calls and one explicit non-choice warrant a durable home.

  1. The three keys are removed from the Go struct AND from the strict-decode path; appearance in YAML is a deprecation warning, not a validation error. Keeping the fields on the struct with yaml:"-" would still be a confusion trap — an operator could set them in YAML, see the warning, AND see the value end up on cfg.Governance.X via env override or a hand-built test config; the post-decode consumer would still ignore them. Removing the fields is the only shape that makes the no-op behaviour structurally impossible. The strict decode rejects unknown keys, so the loader's deprecation path is a YAML AST pre-processor (internal/config/deprecations.go::stripDeprecatedGovernanceKeys): parse the byte stream with goccy/go-yaml/parser.ParseBytes, walk the top-level governance: mapping, drop each child whose key matches the closed deprecatedGovernanceKeys set, emit one slog.Warn("config.deprecated_field", field=..., replacement="governance.identity_tiers", removed_in="v0.x", source=...) per stripped key, and re-marshal the cleaned AST for the strict decode that follows. The AST walker is targeted by position (the top-level governance: block only), not by key name globally — a real llm.model_profiles[<name>].default_max_tokens field (a Phase 36b knob on ModelProfile, a different struct) is untouched. A pre-existing config that sets the legacy keys still loads cleanly today; an operator's logs surface one warning per legacy key per load (the strip path emits at the AST walk, not at every subsequent validator pass).

  2. Migration is via IdentityTiers, not via a "default tier" auto-promotion. The prompt for this work considered and explicitly rejected the alternative of consuming the three legacy fields as a synthesised default tier under IdentityTiers so a pre-Phase-36a config kept working without operator intervention. The reason is the one behind §13's ban on parallel implementations of the same conceptual feature: a second config surface ("knobs that map onto a default tier") would inevitably diverge from the primary one (identity_tiers.<name>.{budget_ceiling_usd, max_tokens, rate_limit.{capacity, refill_tokens, refill_interval}}), and the divergence would re-introduce the confusion the removal is closing. The example yaml's governance block now shows the exact migration: build a default tier under IdentityTiers, place the equivalent values under max_tokens (was default_max_tokens) and budget_ceiling_usd (was cost_ceiling_usd), and convert the rate-limit-per-second knob into the token-bucket shape (capacity + refill_tokens + refill_interval). The token-bucket model is strictly more expressive than rate_limit_tps, so there is no mechanical one-to-one conversion and the loader attempts no partial migration. That is the right call because the post-Phase-36b enforcement engine has no "TPS" notion at all — it has a per-(identity, model) token bucket.

  3. No suppression flag, no env-var escape hatch, no per-deployment opt-out for the warning. Operators wanting to silence the warning remove the field from their YAML — which is the point. A --no-deprecation-warnings CLI flag (or HARBOR_DEPRECATIONS_QUIET=1) would defeat the migration signal, and the §13 forbidden-practice entry on "identity-downgrading knobs" pins the general posture: capabilities are mandatory, and so are the warnings that surface a capability gap.

§14 pre-merge checklist — config-schema change posture. internal/config/config.go's GovernanceConfig field set changed, so the pre-merge checklist's "config schema changed" gate fires. Backward compatibility for pre-Phase-36a YAML is preserved via the strip-then-warn path: an existing operator YAML with governance.default_max_tokens: 4096 loads cleanly, the legacy value is dropped, the operator sees one warning per legacy key in their logs, and the rest of the config is honoured. Forward compatibility is unchanged: the IdentityTiers shape that landed in Phase 36a/36b is untouched, and the default_tier cross-check still requires the named tier to exist in the map. The example config (examples/harbor.yaml) is updated in this PR per §10's "Example configs in examples/ updated whenever the schema gains a top-level field" rule — applied here to a removal, which is the same surface change in reverse.

§13 primitive-with-consumer — discharged in-PR. This PR ships no new primitive; it removes three operator-facing fields and adds one in-loader pre-processor. The "consumer" of the pre-processor is the strict YAML decoder that runs immediately afterwards, exercised by internal/config/deprecations_test.go (per-field, all-three-at-once, none-present, identity_tiers-still-wins, and the cross-section "real model_profiles.default_max_tokens is NOT stripped" guard) and by every pre-existing config test that round-trips a YAML with no legacy keys present (the no-op path). The WithLogger option is the testable surface — callers capture warnings via an in-memory JSON handler, asserting the field / replacement / removed_in / source attrs match the documented shape exactly, so a future drift in the warning text fails the test. No new compiled artifact lands; no D-025 concurrent-reuse test is required (stripDeprecatedGovernanceKeys is a pure function over its inputs with no construction-time state). Coverage on internal/config is unaffected — the test suite is strictly enlarged.


D-082 — Wave 10 audit fixes: Phase 57 durable + Phase 61 WithValidator + Phase 61 noopRedactor flipped to fail-loud per the §13 amendment (PR #91); follow-up filed for Phase 55 carrier wiring

Date: 2026-05-15 Status: Settled Where it lives: CLAUDE.md §13 (the "Test stubs as production defaults on operator-facing seams" entry), internal/protocol/auth/auth.go (the WithRedactor mandatoriness + the WithEventBus optional bus injection + the removal of the in-package noopRedactor permissive stub), internal/protocol/auth/events.go (the canonical auth.rejected events.EventType + the AuthRejectedPayload shape + the events.RegisterEventType(EventTypeAuthRejected) init), internal/protocol/auth/testhelpers_test.go (the test-only testNoopRedactor + the withTestRedactor() helper — replaces the production stub), internal/protocol/auth/events_test.go (the end-to-end auth.rejected bus-emit assertion), internal/protocol/transports/transports.go (the WithValidator + new WithoutValidator() option + the mandatory auth-choice fail-loud in NewMux), internal/events/drivers/durable/durable.go (the registry-path factory fails loud when cfg.StateDriver == "" + the optWithOwnedStore() rename for clarity), internal/protocol/errors/errors.go (the new Codes() accessor mirroring methods.Methods()), internal/protocol/conformance/conformance.go (the exported FixedNow, the runTracePropagation scenario, the RunScoped_StreamOpens event-filter case, the protoerrors.Codes()-derived assertErrorCodeMatrixExhaustive rewrite, the WithEventBus wiring on the conformance Validator), internal/protocol/singlesource/singlesource_test.go (the lockstep test rewritten to surface "missing in checker map" / "extra in checker map" by NAME), internal/protocol/transports/stream/stream.go (the HeaderRun X-Harbor-Run carrier + the events.Filter.Run wiring), internal/events/events.go (the Filter.Run field + the Matches predicate update), internal/telemetry/metrics.go (the production BridgeBusToMetrics helper extracted from the test-only drainToMetrics + ErrBridgeMisconfigured), internal/telemetry/cardinalitylint/cardinalitylint.go (the PromGatherer scope-note expansion), test/integration/wave10_test.go + test/integration/phase60_wire_transport_test.go + test/integration/phase56_metrics_test.go + test/integration/phase61_auth_test.go (the consumer-side wiring updates that pass WithRedactor + WithEventBus / WithoutValidator / BridgeBusToMetrics + the conformance.FixedNow alias), README.md (Phase 61 / Phase 62 row order corrected), docs/decisions.md (the D-074 / D-078 / D-079 / D-076 / D-080 in-place amendments + this entry), docs/plans/phase-60-protocol-wire-transport.md (the X-Harbor-Run post-PR amendment note), docs/plans/phase-55-otel-traces.md (the DEFER-1 follow-up reference), docs/plans/README.md (the Phase 64 pre-plan note's "First production consumer of Phase 55's W3C carriers" paragraph + the issue #94 reference), GitHub issue #94 (the Phase 55 carrier-into-transport wiring follow-up).

Why: The Wave 10 checkpoint audit (§17.5) surfaced two FAIL, seven WARN, and five NIT findings across Phases 56–62. Three of the WARN items (WARN-1, WARN-2, FAIL-2 as it was reclassified) are the same shape one layer down: an operator-facing seam landed with a permissive default that silently degrades to a non-production posture (the durable bus's auto-degrade-to-ring on empty StateDriver; NewMux accepting no validator and producing an unauthenticated mux; NewValidator defaulting to a noopRedactor stub when WithRedactor was omitted). All three match the §13 "Test stubs as production defaults on operator-facing seams" entry (introduced in PR #91 as the §13 amendment). That trip-wire did not exist when the phases originally shipped, so the §13 amendment is RETROACTIVELY APPLIED to those three seams in the same audit-fix PR. The remaining items (WARN-3 → WARN-7, NIT-1 → NIT-5) close adjacent drift the audit caught: an auth rejection that emitted only to slog instead of the canonical event bus; a conformance matrix exhaustiveness check hardcoding len(matrix) != 8; an SSE transport with no X-Harbor-Run carrier; a test-only events→metrics bridge that the Phase 64 server bootstrap would otherwise reinvent; a conformance suite that didn't observe trace propagation end-to-end; a README phase-order glitch; a name disambiguation; and two minor cleanups.

Three durable design calls warrant being recorded here so they don't get re-litigated.

  1. The §13 amendment retroactively applies to operator-facing seams shipped before its introduction. This audit IS the worked example. Before PR #91 amended §13, the audited seams (Phase 57's durable factory, Phase 61's noopRedactor default, Phase 60+61's optional-WithValidator) were drift-free by the rule-set in force at their shipping. The §13 amendment changed the rule-set. The retrospective fix is mandatory — §17.6 ("Fix what the integration test finds — no matter where the bug lives") applies the same way to a wave-end audit: when the audit surfaces a drift against a NEW rule, fix the OLD seam in the audit PR rather than filing follow-ups. The amended D-074 / D-078 / D-079 entries record the "amended after the §13 amendment" lineage so a future reader sees the rule's history.

  2. Events the runtime emits for auth-edge rejections use a sentinel identity triple (harbor-auth/auth-edge/auth-edge). Auth rejections happen BEFORE identity verification — there is no verified (tenant, user, session) to publish under. events.ValidateEvent requires the full triple. The options considered were (a) skip the bus emit on missing identity (silent degradation — §13 violation), (b) use the request-claimed identity (echoes unverified claims back, lets an attacker confirm valid triples), or (c) a documented sentinel triple. (c) is the only shape that publishes loudly without leaking the unverified claim or skipping the emit; a Console subscribes via Admin filter (or directly by tenant harbor-auth) to surface auth.rejected events. The values are constants in internal/protocol/auth/auth.go::authEdgeIdentity so no operator data is mistaken for the sentinel.

  3. The carrier-into-transport wiring (Phase 55 follow-up) is RIGHTLY deferred to Phase 64 + sibling phases, not folded into this audit PR. DEFER-1 surfaced that Phase 55's standalone InjectHTTP/ExtractHTTP/InjectMeta/etc. helpers have no production consumer in the repo today — the Tracer's SpanFromEvent IS a consumer of the Tracer primitive, and the integration test exercises the carriers via unit-test-shaped round-trips, but the actual wiring into tools/drivers/{http,mcp,a2a} is the next-wave concern. Folding that into a checkpoint-audit PR would conflate two concerns (audit hygiene vs. a multi-phase wiring change) and would block the audit on a deeper refactor. The follow-up issue #94 is the canonical tracking surface; Phase 64's pre-plan note now names the issue explicitly so the plan author wires the carriers in the same PR as the server bootstrap.

§13 primitive-with-consumer — closed in-PR. No new primitive lands. The auth.rejected event type is the only new canonical surface; its first consumer is the same PR's internal/protocol/auth/events_test.go::TestValidate_BusEmit_PublishesAuthRejectedEvent (end-to-end real-bus assertion) plus the conformance suite's WithEventBus(bus) wiring on the default Validator factory. The BridgeBusToMetrics helper is also new. It lives in production code, and its first consumer is the existing integration test (drainToMetrics now delegates to the helper), which discharges the §13 primitive-with-consumer obligation in-PR. The WithoutValidator() option is the explicit, grepable test-only escape hatch — its consumers are the Phase 60 + audit-rewritten transports + wave10 + phase60_wire_transport tests, all named in the "Where it lives" section above. Coverage on every touched package remains ≥ the master-plan target — the audit fixes are strictly additive (new tests, new helpers) and remove no existing assertions.


D-083 — Tool-side OAuth subsystem: TokenStore as a typed wrapper over state.StateStore (D-067 / D-068 precedent), AES-256-GCM encryption at rest with mandatory KEK, single OAuthProvider covering both binding scopes, ErrAuthRequired typed sentinel converges on the Phase 50 Coordinator

Date: 2026-05-15 Status: Settled Where it lives: docs/plans/phase-30-tool-oauth.md (the per-phase plan + the §4.3 deviation language), docs/plans/README.md (the Phase 30 row flipped Pending → Shipped + the detail block's "§4.3 deviation (shipped)" paragraph), docs/glossary.md (the new entries: auth.OAuthProvider, auth.TokenStore, auth.BindingScope, auth.ErrAuthRequired, PKCE, RFC 7591 dynamic client registration, tool.auth_required, tool.auth_completed), internal/tools/auth/auth.go (the OAuthConfig + Token + ErrAuthRequired + TokenStore + OAuthProvider interfaces), internal/tools/auth/sealer.go (the AES-256-GCM Sealer + EnvelopeVersion + KEKSizeBytes constants), internal/tools/auth/tokenstore.go (the stateStoreTokenStore typed wrapper + the tools.auth.access.<scope>.<subject>.<source> / tools.auth.refresh.<scope>.<subject>.<source> Kind shapes), internal/tools/auth/pkce.go (RFC 7636 verifier + S256 challenge), internal/tools/auth/provider.go (the concrete *Provider: Token / InitiateFlow / CompleteFlow / Revoke / Close + per-(scope, subject, source) single-flight refresh + .well-known/oauth-authorization-server discovery + RFC 7591 dynamic registration), internal/tools/auth/events.go (the tool.auth_required / tool.auth_completed events.EventType + ToolAuthRequiredPayload / ToolAuthCompletedPayload SafePayload shapes + the init() registrations), internal/tools/auth/conformancetest/conformancetest.go (the shared Run(t, factory) cross-driver suite: Put/Get round-trip both scopes, cross-tenant / cross-user / cross-agent isolation, mixed-scope coexistence, encryption-at-rest, delete idempotency, missing-identity fail-loud, tamper-rejection), internal/tools/auth/conformance_test.go (the in-mem leg call site), internal/tools/auth/concurrent_test.go (the D-025 N=128 concurrent-reuse test + the single-flight-refresh storm test), internal/tools/auth/{sealer,tokenstore,provider}_test.go (the unit-test coverage), test/integration/phase30_tool_oauth_test.go (the §17 wave-end integration test: cross-driver E2E for both binding scopes against in-mem + SQLite + Postgres, real httptest.Server authorization server emulating PKCE + RFC 7591 dynamic registration + metadata discovery, A2A AUTH_REQUIRED shape-parity, initiate-then-cancel goroutine-leak check, cross-identity CompleteFlow failure mode, N=16 concurrency stress), scripts/smoke/phase-30.sh (the smoke counter), README.md (the Status table row Phase 30 → Shipped).

Why: The Phase 30 master-plan line — "TokenStore interface (InMem + SQLite + Postgres drivers) with encryption-at-rest" — taken at face value implies the standard §4.4 driver-registry shape (three init() blank-imports, a tokenstore.Open factory, a registry of named drivers). Brief 09's design sketch followed that shape verbatim. Three durable calls warrant being recorded here so they don't get re-litigated.

  1. TokenStore is a typed wrapper over state.StateStore, not a fresh §4.4 driver registry. The Phase 50 (D-067) and Phase 53a (D-068) decisions already settled this precedent for the runtime's persistence-shaped subsystems: when a subsystem's persistence needs are satisfied by (Quadruple, Kind, Bytes) slots, the state.StateStore §4.4 seam (D-027) is the §9 persistence triad — there is no need for a parallel driver registry. A second registry would be the §13 two-parallel-implementations smell ("a tokenstore driver registry AND a state-store driver registry, both saying 'three V1 drivers, three init() blank-imports'"). The shape Phase 30 ships:

    • One concrete *stateStoreTokenStore consumes whatever state.StateStore the binary opened at boot. NewTokenStore(store state.StateStore, sealer Sealer) (TokenStore, error) — no Open / Register ceremony.
    • The composite-key encoding lives in the Kind suffix: tools.auth.access.<scope>.<subject_id>.<source> for the access-token record, tools.auth.refresh.<scope>.<subject_id>.<source> for the refresh-token sibling. subject_id is user_id for ScopeUser, agent_id for ScopeAgent. The StateStore's (Quadruple, Kind) key gives identity-scoped isolation automatically; no Phase-30-specific WHERE clause math.
    • Refresh tokens are stored, encrypted, under a separate Kind. A (post-V1) caller that reads only the access-token TTL then does not pay the refresh-decode cost, and a compromise of the access-token cache does not yield refresh capability (brief 09 §"Encryption at rest").
    • Driver pluralism (in-mem / SQLite / Postgres) is inherited from the state.StateStore triad; the Phase 30 conformance suite (internal/tools/auth/conformancetest) runs the same TokenStore assertions against every StateStore driver to prove parity. The cross-driver wave-end integration test (test/integration/phase30_tool_oauth_test.go) drives the suite three times — once per V1 StateStore driver, with the Postgres leg skipping per the existing HARBOR_PG_DSN convention. This is the precedent set by test/integration/phase50_durability_test.go (Phase 50's pause checkpoints) and test/integration/agent_registry_test.go (Phase 53a's registration records).
    • The §4.3 deviation language is recorded in the per-phase plan's "Findings I'm departing from" section AND in docs/plans/README.md's detail block.
  2. AES-256-GCM encryption at rest with a 4-byte version + 12-byte fresh-nonce envelope; KEK is mandatory at construction. The master-plan acceptance criterion is "token material is encrypted at rest (driver conformance asserts ciphertext on disk)." The shape:

    • Envelope: [4-byte BE version][12-byte fresh nonce][AES-GCM ciphertext + 16-byte tag]. EnvelopeVersion = 1. The version header is mandatory so a post-V1 KEK-rotation driver can decrypt legacy records before re-encrypting under a new KEK.
    • KEK is 32 raw bytes (AES-256). NewAESGCMSealer(kek []byte) returns wrapped ErrKEKMissing on a wrong-length input — a missing / empty / wrong-length KEK fails the boot loud per CLAUDE.md §13's "Test stubs as production defaults on operator-facing seams" amendment (PR #91 / D-082). There is no degraded-no-encryption mode.
    • NewTokenStore(store, sealer) also rejects nil store / nil sealer at construction. Encryption-at-rest is structurally mandatory. An operator who wants no encryption would have to write and configure their own pass-through Sealer (we do not ship a NoOpSealer).
    • The conformance suite (runEncryptionAtRest) plants a known marker string in the access token, calls Put, then peeks at the raw StateStore.Bytes and asserts the marker does NOT appear. The same assertion runs against in-mem + SQLite + Postgres; the SQLite leg implicitly proves ciphertext-on-disk because the raw bytes the conformance suite reads are the same bytes the SQLite driver stores.
    • The fresh-nonce-per-Seal invariant is pinned by TestSealer_FreshNoncePerCall: two Seals of identical plaintext produce different ciphertext. Nonce reuse with AES-GCM is catastrophic; the test is the trip-wire.
    • Tampered-ciphertext + wrong-KEK + bad-version + too-short blob all surface ErrTokenCipherCorrupt — never a half-decoded record. auth.IsCipherCorrupt(err) is the call-site convenience.
  3. Single OAuthProvider covers both binding scopes; BindingScope is a declared config field, never inferred. Brief 09 §"What Harbor must add" item 2 is explicit: Bifrost models user vs server as two separate halves of one interface; Harbor's OAuthConfig.BindingScope is the single discriminator that drives lookup keying, pause-record targeting, and Console UX. The shape:

    • OAuthConfig.BindingScope is ScopeUser | ScopeAgent. Required at construction; an invalid value rejects the config via Validate.
    • For ScopeAgent, OAuthConfig.AgentID is also required (must match a registered agent in Phase 53a's registry). The check fails closed at config-validation time.
    • Agent-bound tokens key on (tenant, agent_id, source); user-bound on (tenant, user_id, source). Per CLAUDE.md §6 + D-059, agent_id is NOT an isolation principal; the isolation tuple stays (tenant, user, session). The composite-key suffix includes both BindingScope AND subject_id so a user-bound and agent-bound token for the same source coexist (the master-plan acceptance criterion the conformance suite's runMixedScope pins).
    • The *ErrAuthRequired typed sentinel carries BindingScope verbatim; the tool.auth_required event's payload exposes it as a string. The Console branches on the field to render either a user-facing prompt (ScopeUser) or an admin-targeted banner (ScopeAgent).
    • ScopeAgent flows (InitiateFlow / CompleteFlow / Revoke) require registry.HasControlScope(ctx) — Phase 53a's existing control-scope claim primitive. Token (the read path) does not require the claim: it returns *ErrAuthRequired to whomever asked, and the BindingScope on the error tells the Console which principal to prompt. The admin-scope authz gate fails closed (wrapped ErrAdminScopeRequired) with no opt-out.
    • The single-flight refresh gate keys on (scope, subject_id, source) per brief 09's mitigation for "concurrent refresh storm on agent-bound tokens shared across N sessions." The TestProvider_ConcurrentReuse_RefreshSingleFlight test asserts ≤ 4 /token round-trips for N=32 concurrent callers — a refresh storm would produce N round-trips.
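The envelope rules in item 2 can be sketched with Go's standard crypto/aes and crypto/cipher packages. This is a minimal sketch, not Harbor's sealer. The names newSealer, seal, open, isCipherCorrupt and the two sentinels are hypothetical stand-ins for NewAESGCMSealer, ErrKEKMissing and ErrTokenCipherCorrupt. The sketch also assumes no additional authenticated data is bound, because the entry does not say either way. It shows the 4-byte big-endian version header, a fresh 12-byte nonce on every call, and every decode failure collapsing into one corrupt-ciphertext error.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/binary"
	"errors"
	"fmt"
)

// envelopeVersion mirrors EnvelopeVersion = 1 from the entry above.
const envelopeVersion uint32 = 1

// Hypothetical stand-ins for ErrKEKMissing and ErrTokenCipherCorrupt.
var (
	errKEKMissing    = errors.New("auth: KEK missing or not 32 bytes")
	errCipherCorrupt = errors.New("auth: token ciphertext corrupt")
)

type sealer struct{ aead cipher.AEAD }

// newSealer fails loud on anything but a 32-byte (AES-256) KEK; there is no
// degraded no-encryption fallback.
func newSealer(kek []byte) (*sealer, error) {
	if len(kek) != 32 {
		return nil, fmt.Errorf("%w: got %d bytes", errKEKMissing, len(kek))
	}
	block, err := aes.NewCipher(kek)
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block) // standard 12-byte nonce, 16-byte tag
	if err != nil {
		return nil, err
	}
	return &sealer{aead: aead}, nil
}

// seal emits [4-byte BE version][12-byte fresh nonce][ciphertext + 16-byte tag].
func (s *sealer) seal(plaintext []byte) ([]byte, error) {
	nonce := make([]byte, s.aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil { // fresh nonce on every call
		return nil, err
	}
	out := make([]byte, 4, 4+len(nonce)+len(plaintext)+s.aead.Overhead())
	binary.BigEndian.PutUint32(out, envelopeVersion)
	out = append(out, nonce...)
	return s.aead.Seal(out, nonce, plaintext, nil), nil
}

// open maps every failure (short blob, unknown version, wrong KEK, tampered
// bytes) to the single corrupt-ciphertext sentinel and never returns partial data.
func (s *sealer) open(blob []byte) ([]byte, error) {
	ns := s.aead.NonceSize()
	if len(blob) < 4+ns+s.aead.Overhead() {
		return nil, fmt.Errorf("%w: blob too short", errCipherCorrupt)
	}
	if v := binary.BigEndian.Uint32(blob[:4]); v != envelopeVersion {
		return nil, fmt.Errorf("%w: unsupported version %d", errCipherCorrupt, v)
	}
	pt, err := s.aead.Open(nil, blob[4:4+ns], blob[4+ns:], nil)
	if err != nil { // the GCM tag check fails for both a wrong KEK and tampering
		return nil, fmt.Errorf("%w: %v", errCipherCorrupt, err)
	}
	return pt, nil
}

func isCipherCorrupt(err error) bool { return errors.Is(err, errCipherCorrupt) }

func main() {
	kek := make([]byte, 32)
	if _, err := rand.Read(kek); err != nil {
		panic(err)
	}
	s, err := newSealer(kek)
	if err != nil {
		panic(err)
	}
	blob, err := s.seal([]byte("access-token"))
	if err != nil {
		panic(err)
	}
	pt, err := s.open(blob)
	fmt.Println(len(blob), string(pt), err) // 44 access-token <nil>
}
```

A 12-byte token therefore produces a 44-byte envelope: 4 bytes of version, 12 of nonce, 12 of ciphertext and 16 of tag.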

§13 primitive-with-consumer — discharged in-PR. Phase 30 ships three new primitives:

  • The OAuthProvider interface + *Provider concrete. First consumer: test/integration/phase30_tool_oauth_test.go::TestE2E_Phase30_FullPauseResumeCycle_BothBindingScopes exercises the full pause/resume cycle end-to-end against real Phase 50 Coordinator + real audit Redactor + real events EventBus + an httptest.Server authorization server emulating PKCE + RFC 7591 + discovery. Both binding scopes covered.
  • The TokenStore interface + stateStoreTokenStore concrete. First consumer: the OAuthProvider itself (every Token / CompleteFlow / Revoke call routes through the store). The conformance suite at internal/tools/auth/conformancetest is the cross-driver consumer.
  • The tool.auth_required + tool.auth_completed event types. First consumers: provider.go::emitEvent (the producer) + provider_test.go::TestProvider_CompleteFlow_Emits_ToolAuthCompletedEvent (subscribes to the bus and asserts the payload shape end-to-end through real inmem.New + real patterns.New() redactor).

The §13 obligation is closed in-PR for every new shape; no consumer is deferred.

§17 integration test in same PR. test/integration/phase30_tool_oauth_test.go wires real drivers across every seam (CLAUDE.md §17.3 #1): real state.StateStore (in-mem + SQLite + Postgres), real audit.Redactor (the patterns driver), real events.EventBus (the inmem driver), real pauseresume.Coordinator, real httptest.Server authorization server. Identity propagation is asserted via runOAuthCycle's tenant/user/agent checks (CLAUDE.md §17.3 #2). Failure mode covered: cross-identity CompleteFlow → ErrStateMismatch (CLAUDE.md §17.3 #3). Concurrency stress at N=16 distinct identity stacks (CLAUDE.md §17.3 final). The Postgres leg skips with a reason when HARBOR_PG_DSN is unset, following the standard Phase 16 / Phase 50 / Phase 53a convention.

Coverage on internal/tools/auth: go test -cover reports above the master-plan 85% target on the touched package — the test surface is dense (per-method happy path + per-method failure mode + cross-scope conformance + cross-driver conformance + D-025 + single-flight refresh + goroutine-leak).
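The single-flight refresh gate from item 3 can be illustrated with a hand-rolled in-flight map keyed on a (scope, subject_id, source) struct. This is a minimal sketch under stated assumptions. The names refreshKey, refreshGate and simulateStorm are hypothetical, the real provider may use a library such as golang.org/x/sync/singleflight instead, and the sleep stands in for a /token round-trip. Overlapping callers for the same key wait on the first caller's result, so 32 concurrent callers typically produce a single round-trip rather than 32.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// refreshKey is a hypothetical composite key shaped like (scope, subject_id, source).
type refreshKey struct {
	Scope     string // "user" or "agent"
	SubjectID string // user_id for ScopeUser, agent_id for ScopeAgent
	Source    string
}

type inflightCall struct {
	done  chan struct{}
	token string
	err   error
}

// refreshGate collapses concurrent refreshes for the same key into one round-trip.
type refreshGate struct {
	mu       sync.Mutex
	inflight map[refreshKey]*inflightCall
}

func newRefreshGate() *refreshGate {
	return &refreshGate{inflight: make(map[refreshKey]*inflightCall)}
}

// do runs refresh once per key among overlapping callers; the rest block and
// share its result instead of issuing their own /token request.
func (g *refreshGate) do(k refreshKey, refresh func() (string, error)) (string, error) {
	g.mu.Lock()
	if c, ok := g.inflight[k]; ok {
		g.mu.Unlock()
		<-c.done
		return c.token, c.err
	}
	c := &inflightCall{done: make(chan struct{})}
	g.inflight[k] = c
	g.mu.Unlock()

	c.token, c.err = refresh()

	g.mu.Lock()
	delete(g.inflight, k) // the next expiry starts a fresh flight
	g.mu.Unlock()
	close(c.done) // publishes token/err to every waiter
	return c.token, c.err
}

// simulateStorm fires n concurrent refreshes for one agent-bound key and
// returns how many actually reached the (simulated) token endpoint.
func simulateStorm(n, latencyMs int) int32 {
	g := newRefreshGate()
	k := refreshKey{Scope: "agent", SubjectID: "agent-1", Source: "github"}
	var roundTrips atomic.Int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = g.do(k, func() (string, error) {
				roundTrips.Add(1)
				time.Sleep(time.Duration(latencyMs) * time.Millisecond)
				return "fresh-token", nil
			})
		}()
	}
	wg.Wait()
	return roundTrips.Load()
}

func main() {
	fmt.Println("round-trips for 32 callers:", simulateStorm(32, 50))
}
```

Because the key includes the scope and the subject, a user-bound refresh and an agent-bound refresh for the same source never share a flight.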


D-084 — Harbor CLI skeleton: cobra-rooted binary registering the seven RFC §8 subcommands; only harbor version fully implemented; six stubs exit non-zero with structured CLIError{Code: "not_implemented", Hint} pointing to their implementing phase; CLI structured-error type single-sourced in cmd/harbor/errors.go (NOT under internal/protocol/errors); --quiet / --json global flags; cobra promoted to direct go.mod dep; preflight tolerates the §13-mandated non-zero stub exit

Date: 2026-05-15 Status: Settled Where it lives: cmd/harbor/main.go (the cobra-rooted entry point: the driver blank-import block is preserved; NewRootCmd().Execute() replaces the old empty main), cmd/harbor/root.go (NewRootCmd + global flag wiring + the emitCLIError hook every stub body calls + HarborVersion = "v0.0.0-dev" pin), cmd/harbor/errors.go (the CLIError struct with the pinned JSON tags {"error","code","hint"} + the single sink PrintCLIError(w, jsonMode, err) + CodeNotImplemented constant), cmd/harbor/cmd_version.go (the only fully-implemented subcommand; assembles versionInfo{Harbor, Protocol, BuildHash} from HarborVersion + types.ProtocolVersion + runtime/debug.ReadBuildInfo's vcs.revision setting with "unknown" sentinel on absence), cmd/harbor/cmd_{dev,scaffold,validate,inspect_events,inspect_runs,inspect_topology}.go (the six stub subcommands), cmd/harbor/{errors_test,root_test,cmd_version_test,cmd_stub_test}.go (the test surface: CLIError shape, root golden + global-flag inheritance, version human + JSON, stub structured-error shape both modes), cmd/harbor/testdata/golden/help.txt (the harbor --help golden, regenerable via go test -update), go.mod (cobra promoted from indirect to direct: github.com/spf13/cobra v1.10.1; brings in indirect pflag v1.0.9 + mousetrap v1.1.0, both already present as indirect via Bifrost), scripts/preflight.sh (the boot-detection block amended to recognise a structured "code":"not_implemented" stderr from a Phase 63+ bin/harbor dev and treat it as the stub posture, exactly as the existing clean-exit-zero branch already does), scripts/smoke/phase-63.sh (the new smoke: cmd/harbor tests under -race, harbor --help golden match, harbor version human shape + --json shape + protocol-version pin, every stub subcommand's non-zero exit + structured code: not_implemented + phase-hint regex, the direct-cobra go.mod guard, the no-internal/protocol/errors-import guard), docs/plans/phase-63-cli-skeleton.md (the Phase 63 plan), docs/plans/README.md (Phase 63 row Pending → Shipped), README.md ("Harbor CLI" prose updated; Status table Phase 63 row added), docs/glossary.md (the CLIError / Golden file (CLI) / Stub subcommand terms), this decisions entry.
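The sketch below shows the CLIError shape and its single sink. Only three pieces come from this entry: the JSON tags {"error","code","hint"}, the CodeNotImplemented value and the PrintCLIError(w, jsonMode, err) signature. The Go field names, renderCLIError, exitCode, the "internal" fallback code and the exact human-mode line format are all hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"os"
	"strings"
)

// CodeNotImplemented mirrors the constant named in the entry.
const CodeNotImplemented = "not_implemented"

// CLIError sketches the operator-facing exit surface. The JSON tags follow the
// pinned {"error","code","hint"} shape; the Go field names are assumptions.
type CLIError struct {
	Message string `json:"error"`
	Code    string `json:"code"`
	Hint    string `json:"hint"`
}

func (e *CLIError) Error() string { return e.Message }

// renderCLIError formats err for stderr: one JSON object in --json mode, a
// single "Error: ..." line otherwise.
func renderCLIError(jsonMode bool, err error) string {
	var ce *CLIError
	if !errors.As(err, &ce) {
		// Hypothetical fallback code for errors that are not CLIErrors.
		ce = &CLIError{Message: err.Error(), Code: "internal"}
	}
	if jsonMode {
		b, _ := json.Marshal(ce)
		return string(b) + "\n"
	}
	var sb strings.Builder
	sb.WriteString("Error: " + ce.Message)
	if ce.Hint != "" {
		sb.WriteString(" (" + ce.Hint + ")")
	}
	sb.WriteString("\n")
	return sb.String()
}

// PrintCLIError is the single sink every subcommand's error path goes through.
func PrintCLIError(w io.Writer, jsonMode bool, err error) {
	_, _ = io.WriteString(w, renderCLIError(jsonMode, err))
}

// exitCode: any error, including a stub's not_implemented, is a non-zero exit.
func exitCode(err error) int {
	if err == nil {
		return 0
	}
	return 1
}

func main() {
	stub := &CLIError{Message: "harbor dev: not yet implemented", Code: CodeNotImplemented, Hint: "see phase 64"}
	PrintCLIError(os.Stdout, true, stub)
	fmt.Println("exit code:", exitCode(stub))
}
```

Keeping a single render function means the human line and the JSON object cannot drift apart. A script only has to check the exit code and, in --json mode, the code field.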

Why: RFC §8 settles the seven Harbor CLI subcommands (dev / scaffold / validate / inspect-events / inspect-runs / inspect-topology / version) and pins cobra as the CLI library (RFC §10 stack table). The master plan splits the work across phases — Phase 63 is the skeleton (cobra root + the seven subcommand registrations + global flag conventions + the structured-error vocabulary + golden tests + the only fully-working subcommand, version), Phase 64 populates dev, Phases 65–70 the rest. The split exists because a single phase that shipped both the skeleton AND a working dev (which itself wires the LLM, the Phase 60 transports onto a listener, identity injection, hot-reload, draft saving) would be unbounded; brief 06 §7 #8 explicitly sizes the skeleton at ~1 phase.

Five durable design calls warrant being recorded so they don't get re-litigated.

  1. The CLI's structured-error type is cmd/harbor.CLIError, NOT a new protocol/errors.Code. Two different surfaces are at stake. internal/protocol/errors is the single-source home for Protocol wire error codes Protocol clients consume over REST/SSE (CLAUDE.md §8, D-075). The CLI's structured error is the operator-facing exit surface — stderr JSON + non-zero exit code from the harbor binary. The two surfaces evolve independently: a Protocol client reading protocol/errors.Code cannot meaningfully consume a CLI exit code (it never sees the harbor binary), and a script reading cmd/harbor.CLIError cannot meaningfully consume a Protocol error code (Protocol responses go over the wire, not via process exit). Mixing them — adding a not_implemented Protocol code to satisfy the CLI, or routing CLI exits through protocol/errors.Error — would conflate the surfaces and violate CLAUDE.md §8's single-source pin on Protocol error codes. The smoke script's static guard (grep -rIn '"github.com/hurtener/Harbor/internal/protocol/errors"' cmd/harbor/) makes the boundary mechanically enforced.

  2. Stub subcommands exit non-zero with a structured CLIError{Code: "not_implemented"}, NOT a clean os.Exit(0). The §13 amendment "Test stubs as production defaults on operator-facing seams" (introduced in PR #91, D-082) requires this posture: a harbor dev against a Phase 63 build that returned exit 0 with a "not yet implemented" stderr message would fool a deployment script into thinking the boot succeeded. The combination (non-zero exit + structured code: not_implemented + hint: "see phase NN — <slug>") makes the stub state unambiguous to both humans (the Error: ... stderr line) and scripts (the exit code + the structured-error JSON in --json mode). This is the §13 amendment's intended posture; Phase 64's real dev body will of course exit 0 on the happy path.

  3. scripts/preflight.sh is amended to recognise the §13-mandated stub exit code. Before Phase 63, preflight treated any non-zero harbor dev exit as a hard failure (the binary was either a clean-exit stub or a working server). The §13 amendment forces a non-zero exit on stub subcommands, which conflicts with the old preflight contract. The amendment adds a single grep on the captured server log: when the stub's structured error ("code":"not_implemented" OR the literal human-mode "not yet implemented (see phase 64" marker) appears, preflight treats the non-zero exit as the "stub binary" posture — same as the existing clean-exit-zero branch. This is forward-compatible: Phase 64's real harbor dev will not exit with that code, so preflight reverts to the original boot-and-wait posture without further changes. The alternative (treating the structured error as a hard failure and forcing Phase 63 to ship harbor dev as exit 0) was rejected: exit 0 + "not implemented" violates §13.

  4. The cobra dependency is promoted from indirect to direct, with no new modules. Cobra (github.com/spf13/cobra v1.10.1) was already a transitive dep before Phase 63 (Bifrost pulls in cobra-using internal packages); promoting it to direct is a go.mod cleanliness change, not a new dependency. Phase 61's golang-jwt/jwt/v5 promotion (D-079) took the same posture: the module was already in the indirect set, and declaring it direct documents the binary's intent and pins the version explicitly. Cobra's own dependencies (spf13/pflag + inconshreveable/mousetrap) remain indirect and were likewise already present via Bifrost. Both are permissively licensed (BSD-3-Clause and Apache-2.0 respectively) and CGo-free, so the static-binary invariant (CGO_ENABLED=0 go build -ldflags='-s -w') is preserved. RFC §10's stack table lists cobra as Settled, so this is not an RFC change.

  5. harbor --help is a golden-file test; the -update flag is the regeneration path. Brief 06 §6 explicitly names "CLI golden tests" as a Harbor CLI requirement. Phase 63 establishes the pattern: cmd/harbor/testdata/golden/help.txt is the golden, cmd/harbor/root_test.go::TestRoot_Help_MatchesGolden is the diff, the -update flag rewrites the golden in place (the standard go-test idiom). Every future phase that ADDS a subcommand (64 → 70) mutates the help golden in the same PR — a phase that lands without regenerating the golden fails its own CI. The golden is intentionally simple text (not JSON) because that is what harbor --help emits today and what brief 06 sized the test against; the test does NOT pin internal cobra render details.
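The -update idiom from item 5 usually reduces to a small compare-or-rewrite helper. The sketch below assumes a shape; it is not the contents of root_test.go. The names checkGolden and goldenLifecycle are hypothetical and the help text is made up. The update flag is passed as a parameter so the example runs outside go test; a real test would read a package-level flag.Bool, as the comment shows.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// checkGolden compares got with the golden file at path. In a real test,
// update would come from a package-level flag such as
//
//	var update = flag.Bool("update", false, "rewrite golden files")
//
// so that `go test -update` regenerates the golden in place.
func checkGolden(path string, got []byte, update bool) error {
	if update {
		if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
			return err
		}
		return os.WriteFile(path, got, 0o644)
	}
	want, err := os.ReadFile(path)
	if err != nil {
		return fmt.Errorf("read golden %s (regenerate with -update): %w", path, err)
	}
	if !bytes.Equal(got, want) {
		return fmt.Errorf("%s drifted:\n--- want\n%s--- got\n%s", path, want, got)
	}
	return nil
}

// goldenLifecycle walks a golden through missing, regenerated, match and drift.
func goldenLifecycle() (missing, match, drift error) {
	dir, err := os.MkdirTemp("", "harbor-golden")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "testdata", "golden", "help.txt")
	help := []byte("Usage:\n  harbor [command]\n") // hypothetical help text
	missing = checkGolden(path, help, false)
	if err := checkGolden(path, help, true); err != nil {
		panic(err)
	}
	match = checkGolden(path, help, false)
	drift = checkGolden(path, append(help, "  harbor new-subcommand\n"...), false)
	return missing, match, drift
}

func main() {
	missing, match, drift := goldenLifecycle()
	fmt.Println(missing != nil, match == nil, drift != nil) // true true true
}
```

The drift case is the one that makes a phase fail its own CI when it changes the help surface without regenerating the golden.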

§13 primitive-with-consumer — closed in-PR. The primitive Phase 63 introduces is the cmd/harbor.CLIError structured-error type + the PrintCLIError sink. Its first consumer is cmd/harbor.emitCLIError, the hook every subcommand body calls. The six stub subcommands consume it on every invocation, and cmd_stub_test.go's table-driven assertions exercise that path end-to-end in both human and --json modes. The version subcommand is also a first consumer of currentVersionInfo + renderVersionHuman + renderVersionJSON: cmd_version_test.go round-trips the JSON shape and pins the field labels. The CLI-as-Protocol-version-consumer surface (the types.ProtocolVersion constant) is also discharged in-PR. harbor version's --json .protocol field round-trips the value through the JSON output, which is what harbor inspect-* and a third-party Protocol client over Phase 60's wire will do later. The golden-file test pattern is its own consumer: a meta-primitive that future phases will inherit when they extend the help surface (the -update flag is the documented regeneration path).
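The version assembly described for cmd_version.go can be approximated with runtime/debug.ReadBuildInfo, which exposes VCS stamping as a build setting keyed "vcs.revision". In this sketch the protocol version is a parameter rather than types.ProtocolVersion. The "harbor" and "build_hash" JSON tags and the human-mode layout are assumptions. Only the .protocol field and the "unknown" fallback come from the entry.

```go
package main

import (
	"encoding/json"
	"fmt"
	"runtime/debug"
)

// harborVersion mirrors the HarborVersion = "v0.0.0-dev" pin.
const harborVersion = "v0.0.0-dev"

// versionInfo mirrors versionInfo{Harbor, Protocol, BuildHash}. The "protocol"
// tag matches the .protocol field above; the other tag names are assumptions.
type versionInfo struct {
	Harbor    string `json:"harbor"`
	Protocol  string `json:"protocol"`
	BuildHash string `json:"build_hash"`
}

// revisionFrom returns the vcs.revision build setting, or "unknown" when the
// binary was built without VCS stamping (e.g. outside a repo or with -buildvcs=false).
func revisionFrom(settings []debug.BuildSetting) string {
	for _, s := range settings {
		if s.Key == "vcs.revision" && s.Value != "" {
			return s.Value
		}
	}
	return "unknown"
}

// currentVersionInfo takes the protocol version as a parameter here; the real
// command reads the types.ProtocolVersion constant instead.
func currentVersionInfo(protocol string) versionInfo {
	hash := "unknown"
	if bi, ok := debug.ReadBuildInfo(); ok {
		hash = revisionFrom(bi.Settings)
	}
	return versionInfo{Harbor: harborVersion, Protocol: protocol, BuildHash: hash}
}

// renderVersionHuman uses a hypothetical three-line layout.
func renderVersionHuman(v versionInfo) string {
	return fmt.Sprintf("harbor %s\nprotocol %s\nbuild %s\n", v.Harbor, v.Protocol, v.BuildHash)
}

func renderVersionJSON(v versionInfo) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

func main() {
	v := currentVersionInfo("example-protocol") // hypothetical protocol value
	fmt.Print(renderVersionHuman(v))
	js, _ := renderVersionJSON(v)
	fmt.Println(js)
}
```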

§14 pre-merge checklist — cmd/harbor coverage 79.0% (target 70%); cobra direct-promotion documented; preflight amendment forward-compatible; no multi-isolation paths touched; no Protocol types changed; no config schema changed; no migrations added. The CLI is a one-shot process that does not load identity, so the multi-isolation checklist row is N/A. The CLI consumes internal/protocol/types.ProtocolVersion only — a pure constant with no I/O — so no integration test is required (the §17.1 trigger list does not fire; the cross-subsystem seam first opens in Phase 64). No reusable artifact lands: each bin/harbor invocation constructs a fresh cobra root in main() and exits; there is no long-lived state that crosses goroutine boundaries, so the D-025 concurrent-reuse obligation is N/A (Phase 64's long-lived server picks it up).


D-085 — Phase 71 harbortest test kit: public top-level package; deterministic default identity + Admin-scope capture; subsequence semantics on AssertSequence; RunID-ownership + reflective payload check on AssertNoLeaks; FIFO failure queue on FaultInjector; no stub LLM (CLAUDE.md §13 amendment posture)

Date: 2026-05-15 Status: Settled Where it lives: harbortest/doc.go + harbortest/agent.go + harbortest/testing.go + harbortest/eventlog.go + harbortest/runonce.go + harbortest/assertions.go + harbortest/reflect.go + harbortest/simulate.go (the public package surface), harbortest/agent_test.go + harbortest/assertions_test.go + harbortest/simulate_test.go + harbortest/concurrent_test.go + harbortest/extra_test.go + harbortest/testhelpers_test.go (self-tests + the deliberate-cross-session-bug regression), scripts/smoke/phase-71.sh (the smoke), docs/plans/phase-71-harbortest.md (the plan), docs/plans/README.md (the Status row flip), docs/glossary.md (the seven new entries — RunOnce, RecordedEvents, EventLog, AssertSequence, AssertNoLeaks, SimulateFailure, FaultInjector), README.md (Status row + testing pointer), CLAUDE.md §3 / AGENTS.md §3 (the new harbortest/ top-level entry in the canonical layout block).

Why: Phase 71 ships Harbor's first-class authoring surface for flow-level agent tests, the §6.13 / brief 06 §3 obligation. The acceptance criterion is binary: a flow-level test fits in ten lines or fewer, and AssertNoLeaks catches a deliberate cross-session bug in a regression test. The five public entry points (RunOnce, AssertSequence, AssertNoLeaks, SimulateFailure, RecordedEvents) are settled in the brief. The design calls below pin the shape choices the brief leaves open, so a future reader doesn't re-litigate them.

Six durable design calls are recorded here so they don't drift.

  1. The package lives at the top level (harbortest/), not under internal/. Go's toolchain restricts internal/ packages to importers inside the owning module subtree, so a test kit that test authors are meant to consume from their own modules CANNOT live under internal/. The brief 06 §3 wording ("a public harbortest package consumers import") is unambiguous on this. The precedent inside the Go ecosystem is consistent: golang.org/x/tools/go/analysis/analysistest and the stdlib's net/http/httptest both live at public (non-internal) import paths, and ecosystem testing kits generally follow the same shape. CLAUDE.md §3's layout block previously enumerated cmd/ + internal/ + examples/ + test/integration/ + scripts/ + docs/ as the canonical top-level homes, with the explicit closing rule "Anything that doesn't have a home above is wrong. If you need a new top-level directory, propose it in the RFC first." This phase's PR is the RFC change for the addition: the addition is justified, in scope for the phase plan, and aligned with the brief. Three alternatives were considered and rejected. (a) A test-only (_test.go) fixture inside internal/runtime/... is defeated by the cross-module import requirement. (b) A pkg/harbortest/ subdirectory was rejected because the pkg/ convention is non-standard in modern Go and would require a separate layout-update RFC. (c) A testkit/ rename was rejected because the harbor* prefix is the project-naming convention (harbor binary, harbor-events rename, etc.), and harbortest is the brief-specified name. CLAUDE.md §3 is updated in this PR to document harbortest/ as a top-level package.

  2. Default identity is canonical "harbortest" across the triple, NOT randomised or unset. The kit's RunOnce builds a default identity.Identity{TenantID:"harbortest", UserID:"harbortest", SessionID:"harbortest"} when the caller passes no Deps.Identity. Two alternatives were considered and rejected. (a) Randomised UUIDs per call are unhelpful: test authors who want to predict identity in their assertions cannot grep for a value that is fresh on every run, and the kit's identity-propagation tests need a stable value to assert against. (b) A zero triple fails identity.Validate and would force RunOnce to return ErrStackConstruction for every zero-Deps call, defeating the "flow-level test in ten lines" acceptance criterion. The canonical "harbortest" string is deterministic, grep-friendly, and unmistakeable in production audit logs if it ever leaks (it shouldn't: the kit is harbortest, not harborprod). Test authors who want a different identity supply Deps.Identity explicitly. RunIDs, separately, MUST differ across concurrent calls because the D-025 reuse test relies on this, so the package-level runCounter synthesises a fresh ID per call via harbortest-run-<seed>-<n>. The seed is monotonic per package init; the counter is mutex-guarded.

  3. RunOnce subscribes with events.Filter{Admin: true} — the kit's only way to observe events ACROSS identity triples. Brief 06 §4 documents the bus's identity-triple-mandatory subscription rule: a non-Admin filter without (tenant, user, session) is rejected with ErrIdentityScopeRequired. But AssertNoLeaks MUST see cross-triple events to detect leaks — if the kit subscribed only to its own triple, a leak FROM the kit's run TO a different triple would be invisible (the foreign-triple event would simply not arrive at the subscriber). The Admin subscription is the only filter shape that gives AssertNoLeaks the data it needs. The bus emits one audit.admin_scope_used event per RunOnce in response (CLAUDE.md §6 rule 5 + Phase 05) — the captured EventLog naturally contains this entry; test authors should expect to see it. The audit emit is documented in the EventLog's godoc and is part of the kit's contract; production code never subscribes Admin without a verified scope claim (Phase 61 wires the cryptographic verification; the test kit's Admin claim is unverified and audit-emitted just like any other Admin Subscribe — defence-in-depth).

  4. AssertSequence uses ordered-subsequence semantics, NOT strict prefix or strict equality. A captured EventLog from a real RunOnce contains the agent's emits PLUS the bus-internal audit.admin_scope_used event PLUS any bus.dropped / bus.subscription_idle_closed / runtime.warning events the bus or runtime layers emit. A strict-equality AssertSequence would force every test author to enumerate the full set every time — impractical and brittle. A strict-prefix variant would fail any test where the agent emits more than the caller asserted. The ordered-subsequence shape lets the caller name only the events they care about, in the order they care about, and the assertion succeeds if those events appear in that order (possibly interleaved with others). The semantics match the brief's "make a flow-level test ten lines or fewer" goal: a typical assertion is [tool.invoked, tool.completed] and the test author doesn't need to know what bus-internal events fire alongside. Test authors who want strict equality can grep log.All() themselves. The error message on a missing entry names the first unmatched type and the captured sequence so the diff is actionable.

  5. AssertNoLeaks uses RunID-ownership inference (first-publisher wins) PLUS reflective payload-identity inspection. The brief specifies "cross-tenant/session leakage detector" without prescribing the algorithm. The choice considered: (a) trust the caller to declare the runs and their owners explicitly — verbose, defeats "ten lines"; (b) accept the bus's monotonic Sequence as the authoritative "which triple first published under this RunID" signal — works for any well-formed log; (c) require every test to use distinct buses per identity — defeats Deps.Bus sharing entirely. (b) is the only shape that works for the typical sharing pattern. The walk has two arms: outer-triple-vs-RunID-owner (run-id cross-talk) and payload-vs-outer-triple (payload cross-talk). The reflective payload check (reflect.go::reflectQuadruple) looks for an exported Identity field of type identity.Quadruple (the Harbor canonical name) or identity.Identity (widened to a zero-RunID Quadruple) plus an optional IdentityQuadruple() identity.Quadruple method (the type-assertion fast path for payloads that want to be explicit). Both checks call t.Errorf naming the offending event index + the triple disagreement; the message includes the substring "cross-talk" so test authors can grep for it. The regression test (TestAssertNoLeaks_CatchesCrossSessionLeak) IS the acceptance-criterion fixture: an Agent under triple A publishes an event tagged with triple A but carrying triple B's RunID; the assertion fires.

  6. SimulateFailure wraps the catalog at the Resolve boundary; FIFO failure queue; class-typed errors. The brief specifies "SimulateFailure(toolName, code, n) (next n calls fail with code)." The implementation choices that aren't in the brief: (a) WHERE the wrapper sits — at the catalog Resolve boundary, NOT at the tool descriptor's Invoke registration. The Resolve boundary wraps every consumer of the catalog uniformly; wrapping at Register would only catch tools registered AFTER the wrapper is installed. (b) WHAT shape the error takes — class-typed so the production policy shell classifies the failure correctly. Permanent → wrap tools.ErrToolInvalidArgs (the policy shell classifies wrapped invalid-args as permanent); timeout → wrap context.DeadlineExceeded (the policy shell classifies wrapped deadline-exceeded as timeout); transient + 5xx + unknown → wrap a new package-local ErrSimulatedFailure sentinel (the policy shell's classifyError() falls through to ErrClassTransient for unknown wraps — the right default for "give me a transient failure"). (c) The queue is FIFO across multiple SimulateFailure calls on the same tool: SimulateFailure(inj, "x", transient, 2) then SimulateFailure(inj, "x", permanent, 1) yields transient, transient, permanent. (d) Defensive guards: nil injector, empty toolName, n<=0 are silent no-ops (the caller intent is unclear and silently doing nothing is safer than panicking). (e) A nil catalog at NewFaultInjector panics with a grep-friendly message — this is a test-author bug at the kit boundary, not a production fail-loud concern.
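The two assertion algorithms in items 4 and 5 are small enough to sketch directly. The event and triple types here are hypothetical, pared-down stand-ins for Harbor's captured events and identity types. Only the RunID arm of AssertNoLeaks is shown; the reflective payload check is omitted. checkSequence greedily matches want as an ordered subsequence. runIDCrossTalk assigns each RunID to the first triple that publishes under it and reports any later event under that RunID from a different triple.

```go
package main

import "fmt"

// triple and event are hypothetical, pared-down shapes; the real kit works on
// identity types and captured bus events.
type triple struct{ Tenant, User, Session string }

type event struct {
	Seq   uint64 // bus-assigned, monotonic
	Type  string
	RunID string
	Scope triple // the outer triple the event was published under
}

// checkSequence passes when want occurs in log as an ordered subsequence:
// every wanted type appears, in order, with unrelated events allowed between.
func checkSequence(log []event, want []string) error {
	i := 0
	for _, ev := range log {
		if i < len(want) && ev.Type == want[i] {
			i++
		}
	}
	if i == len(want) {
		return nil
	}
	seen := make([]string, len(log))
	for j, ev := range log {
		seen[j] = ev.Type
	}
	return fmt.Errorf("missing %q (want[%d]); captured: %v", want[i], i, seen)
}

// runIDCrossTalk implements first-publisher-wins: assuming log is ordered by
// Seq, the first triple to publish under a RunID owns it, and any later event
// for that RunID under a different triple is reported.
func runIDCrossTalk(log []event) []string {
	owner := make(map[string]triple)
	var problems []string
	for idx, ev := range log {
		if ev.RunID == "" { // e.g. bus-internal audit events
			continue
		}
		o, seen := owner[ev.RunID]
		if !seen {
			owner[ev.RunID] = ev.Scope
			continue
		}
		if o != ev.Scope {
			problems = append(problems, fmt.Sprintf(
				"event %d (%s): cross-talk: run %s owned by %v, published under %v",
				idx, ev.Type, ev.RunID, o, ev.Scope))
		}
	}
	return problems
}

func main() {
	a := triple{"acme", "alice", "s-1"}
	b := triple{"acme", "bob", "s-2"}
	log := []event{
		{1, "audit.admin_scope_used", "", a},
		{2, "tool.invoked", "run-b", b},
		{3, "tool.invoked", "run-a", a},
		{4, "tool.completed", "run-b", a}, // A's agent writes under B's RunID
	}
	fmt.Println(checkSequence(log, []string{"tool.invoked", "tool.completed"}))
	fmt.Println(runIDCrossTalk(log))
}
```

Greedy left-to-right matching is sufficient for subsequence checks: if any interleaving satisfies want, taking the earliest match for each entry also does.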

§13 amendment posture — no stub LLM, no silent fallback, no default-driver footgun. The Wave 10 §13 amendment (PR #91 / D-082) forbids test stubs as production defaults on operator-facing seams. The kit is a TEST surface but it is OPERATOR-FACING in the sense that real test authors consume it. Three concrete consequences for this phase:

  • No stub LLM ships with harbortest. A test-only LLM driver bundled in the kit would encourage tests that exercise the stub instead of the real runtime. Test authors that need a mock LLM build their own (an llm.LLMClient mock with hand-rolled Complete is one-page-of-Go); the kit's Agent interface is intentionally narrow so the test author owns the boundary.
  • RunOnce fails loud on stack-construction errors. Missing audit redactor (impossible — auditpatterns.New() is parameterless) is not a path; missing bus driver (the _ "github.com/hurtener/Harbor/internal/events/drivers/inmem" blank import in runonce.go registers it at package init) is not a path either. A future events.Open failure surfaces as ErrStackConstruction wrapping the underlying error with the failing component named (fmt.Errorf("%w: events.Open: %w", ErrStackConstruction, err)).
  • The test-only stubRedactor lives in *_test.go. harbortest/testhelpers_test.go defines stubRedactor as a test-only fixture; the _test.go suffix already keeps it out of every non-test build. The production path (harbortest/runonce.go's auditpatterns.New() call) uses the real patterns redactor. The §13 trip-wire is clear.

§13 primitive-with-consumer — discharged in-PR. The phase ships five public functions plus two supporting types (FaultInjector and EventLog). The self-tests in harbortest/*_test.go are the first consumer of every public symbol:

  • RunOnce → TestRunOnce_RoundTrip_CapturesEvents + TestRunOnce_DefaultIdentity_IsCanonical + TestRunOnce_FailsLoudly_OnNilAgent + TestRunOnce_FailsLoudly_OnInvalidIdentity + TestRunOnce_CustomIdentity_FlowsThrough + TestRunOnce_OwnsBusLifecycle_WhenDepsBusOmitted + TestRunOnce_AgentError_ReturnsLog + TestRunOnce_RedactorOverride_Honoured + TestRunOnce_ConcurrentReuse_NoCrossTalk (the D-025 stress).
  • EventLog.RecordedEvents → TestEventLog_RecordedEvents_FiltersByRun.
  • AssertSequence → six scenarios (Happy, OrderedSubsequence_AllowsIntervening, Fails_OnMissingType, Fails_OnOutOfOrder, Empty_Want_Matches, NilLog_ErrorPath).
  • AssertNoLeaks → six scenarios (Happy, CatchesCrossSessionLeak [the load-bearing regression], NilLog_ErrorPath, PayloadCrossTalk_QuadrupleField, PayloadIdentityHolder_TypeAssertionPath, PayloadIdentityTripleField).
  • SimulateFailure + FaultInjector → twelve scenarios (FailsThenResumes, PermanentClass_WrapsInvalidArgs, TimeoutClass_WrapsDeadlineExceeded, PerToolIsolated, StacksFifo, NoInjection_PassesThrough, GuardsAgainstZeroAndNil, NilCatalog_Panics, UnknownTool_NotFound, Register_Forwards, List_Forwards, ConcurrentReuse [the D-025 N=100 stress]).
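The FIFO queue and class-typed errors from D-085's design call 6 can be sketched without the catalog wrapper. Everything here is hypothetical: errToolInvalidArgs stands in for tools.ErrToolInvalidArgs and errSimulatedFailure for ErrSimulatedFailure. The classify function only mimics how a policy shell might bucket wrapped errors. The real FaultInjector intercepts calls at the catalog's Resolve boundary rather than exposing a next method.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Hypothetical stand-ins for tools.ErrToolInvalidArgs and ErrSimulatedFailure.
var (
	errToolInvalidArgs  = errors.New("tools: invalid args")
	errSimulatedFailure = errors.New("harbortest: simulated failure")
)

type failureClass string

const (
	classTransient failureClass = "transient"
	classPermanent failureClass = "permanent"
	classTimeout   failureClass = "timeout"
)

// faultInjector holds one FIFO queue of pending failures per tool name.
type faultInjector struct {
	mu    sync.Mutex
	queue map[string][]failureClass
}

func newFaultInjector() *faultInjector {
	return &faultInjector{queue: make(map[string][]failureClass)}
}

// simulateFailure appends n failures of class c to tool's queue. A nil
// injector, an empty tool name, or n <= 0 is a silent no-op.
func simulateFailure(inj *faultInjector, tool string, c failureClass, n int) {
	if inj == nil || tool == "" || n <= 0 {
		return
	}
	inj.mu.Lock()
	defer inj.mu.Unlock()
	for i := 0; i < n; i++ {
		inj.queue[tool] = append(inj.queue[tool], c)
	}
}

// next pops the head of tool's queue and returns a class-typed error, or nil
// when nothing is queued (the call would pass through to the real tool).
func (inj *faultInjector) next(tool string) error {
	inj.mu.Lock()
	q := inj.queue[tool]
	if len(q) == 0 {
		inj.mu.Unlock()
		return nil
	}
	c := q[0]
	inj.queue[tool] = q[1:]
	inj.mu.Unlock()

	switch c {
	case classPermanent:
		return fmt.Errorf("%w: simulated %s failure on %q", errToolInvalidArgs, c, tool)
	case classTimeout:
		return fmt.Errorf("%w: simulated %s failure on %q", context.DeadlineExceeded, c, tool)
	default: // transient, 5xx, unknown
		return fmt.Errorf("%w: simulated %s failure on %q", errSimulatedFailure, c, tool)
	}
}

// classify mimics how a policy shell might bucket the wrapped errors.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, errToolInvalidArgs):
		return "permanent"
	case errors.Is(err, context.DeadlineExceeded):
		return "timeout"
	default:
		return "transient"
	}
}

func main() {
	inj := newFaultInjector()
	simulateFailure(inj, "search", classTransient, 2)
	simulateFailure(inj, "search", classPermanent, 1)
	for i := 0; i < 4; i++ {
		fmt.Println(classify(inj.next("search"))) // transient, transient, permanent, ok
	}
}
```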

Coverage: 88.4% statement coverage on harbortest/ against the 85% master-plan target. The race detector is the CI gate; every test runs under -race. The Phase 71 smoke script (scripts/smoke/phase-71.sh) runs go test -race ./harbortest/... so the kit's self-tests double as the phase's smoke surface.



D-086 — Phase 31 tool-side approval gates: sibling consumer of the Phase 50 Coordinator under internal/tools/approval; ApprovalGate reusable artifact + ApprovalPolicy interface + *ErrToolRejected typed sentinel + three SafePayload events (tool.approval_requested / tool.approved / tool.rejected); no silent-stub default at boot (§13 amendment); ORIGINAL args never on the bus

Date: 2026-05-15 Status: Settled Where it lives: internal/tools/approval/approval.go (the ApprovalPolicy interface + ApprovalRequest / ApprovalDecision + the *ErrToolRejected typed sentinel + all package sentinels), internal/tools/approval/gate.go (the *ApprovalGate concrete artifact + NewApprovalGate constructor + the RunGuarded entry surface + the ResolveApproval in-process helper + the per-pause waitingEntry registry), internal/tools/approval/events.go (the tool.approval_requested / tool.approved / tool.rejected events.EventType + ToolApprovalRequestedPayload / ToolApprovedPayload / ToolRejectedPayload SafePayload shapes + the init() registrations), internal/tools/approval/policies.go (the three bundled policies — AlwaysDenyPolicy for fail-safe-everywhere, AlwaysApprovePolicy for explicit dev-loop sandboxes, TaggedPolicy for the V1 production reference), internal/tools/approval/{approval,gate,events,policies}_test.go (the unit-test surface — 88.9% statement coverage), internal/tools/approval/concurrent_test.go (the D-025 N=128 concurrent-reuse stress), test/integration/phase31_approval_gates_test.go (the §17 wave-end integration test — full APPROVE + REJECT cycles + scope-gate failure + cross-identity failure + steering-inbox-shape contract + goroutine-leak + N=16 concurrency stress), scripts/smoke/phase-31.sh (the smoke), docs/plans/phase-31-tool-approval-gates.md (the per-phase plan), docs/plans/README.md (the Phase 31 row Status flip), docs/glossary.md (six new entries — approval.ApprovalGate, approval.ApprovalPolicy, approval.ErrToolRejected, tool.approval_requested, tool.approved, tool.rejected), README.md (Status row Phase 31 → Shipped).

Why: Phase 31's master-plan line — "synchronous 'approve this tool call' gates using the same pause/resume primitive — distinct from OAuth, simpler payload shape; APPROVE/REJECT round-trip via the protocol; reject path raises typed tool.rejected events" — is unambiguous on the surface but four durable design calls warrant being recorded here so they don't get re-litigated.

  1. The approval-gate package is a SIBLING of internal/tools/auth, not a subpackage. Two siblings under internal/tools/: auth/ for OAuth (Phase 30) and approval/ for HITL approval (Phase 31). They share the Coordinator + bus + redactor seams but nothing else: OAuth needs a TokenStore, an authorization-server URL, a PKCE verifier, dynamic client registration, metadata discovery, a Sealer for encryption-at-rest. Approval gates need NONE of that — just a policy, a pending-resolution channel, and a typed reject sentinel. Putting approval gates under auth/ would have forced operators to import the OAuth machinery to use an HITL gate; the sibling layout keeps each subsystem's surface area minimal. Three alternatives considered: (a) a subpackage internal/tools/auth/approval — defeated by the no-OAuth-baggage requirement; (b) a wrapping middleware in internal/runtime/dispatch — too tightly coupled to the dispatcher's evolving shape; (c) a tool-catalog hook — the catalog's interface is settled (RFC §6.4) and growing a per-descriptor approval hook would either re-shape the interface or grow optional-capability ceremony (forbidden by CLAUDE.md §4.4). The sibling-under-tools/ layout matches the natural conceptual grouping: tool-side authentication AND tool-side approval are both "things that gate a tool invocation."

  2. ReasonApprovalRequired, NOT ReasonExternalEvent. Phase 30 (OAuth) uses ReasonExternalEvent because the run waits on an external authorization-server callback. Phase 31 (HITL approval) uses ReasonApprovalRequired — the textbook RFC §6.3 reason for a HITL approval gate (brief 02 §"Pause-reason taxonomy"). Using ReasonExternalEvent for approval gates would conflate the two flows in audits / observability and make the four-reason taxonomy meaningless. The Coordinator's pause record carries the reason verbatim; observers branch on it. The smoke script's static guard pins this (grep ReasonApprovalRequired on gate.go).

  3. ORIGINAL args stay in the gate's pending map; the bus never sees them. The tool.approval_requested event carries an ArgsSummary field — the audit-redactor's output over a map[string]any shape of the args. The ORIGINAL json.RawMessage stays in the gate's pending[Token].req.Args. On APPROVE, the gate returns req.Args to the caller — the redactor's output is NEVER routed through to the executed tool invocation. This closes the failure mode where a redactor that elides a secret-shaped field would corrupt the post-approve tool call: the redactor's domain is audit / observability emit; the executed call's domain is the original caller's intent. Two-source-of-truth is forbidden in general (§13) but this is NOT two sources of truth for one piece of data — it is one piece of data (the args) at one site (the gate's pending map), with a separate-purpose redacted view (the event payload) derived from it. Brief 03 §"Audit redaction lives in the audit subsystem" is unambiguous: every payload runs through the redactor; the persisted artifact is the event, not the Go struct. The Phase 31 design composes cleanly with this — the persisted artifact (the event) is redacted; the in-memory execution path (the gate→tool invocation) uses the raw args.

  4. Resolution is double-gated: protocol/auth scope (admin OR console:fleet) + Coordinator identity scope. The §13 amendment dictates that BOTH gates fail loud at every boundary. ApprovalGate.ResolveApproval enforces auth.HasScope(ctx, ScopeAdmin) || auth.HasScope(ctx, ScopeConsoleFleet), the same scope set Phase 61's events.Filter{Admin:true} subscriptions use. A leaked observer token (read-only) cannot approve a tool call (ErrApprovalScopeRequired fires loud). The Coordinator's sameScope check separately enforces identity-triple equality: a tenant-B admin cannot resolve a tenant-A pause (pauseresume.ErrScopeMismatch propagates). Two independent gates, two independent failure modes, two independent test scenarios. The §17 integration test covers both. The Phase 54 Protocol edge will also enforce scope at the JWT boundary in a later phase, giving defence in depth across three layers (transport + gate + Coordinator). The NewApprovalGate(GateDeps{}) nil-policy / nil-coordinator / nil-bus / nil-redactor rejection is the §13-amendment fail-loud at construction time; the runtime CANNOT silently boot with a gate that auto-approves.

§13 primitive-with-consumer — discharged in-PR. Phase 31 ships three new primitives:

  • The ApprovalPolicy interface + bundled AlwaysDenyPolicy / AlwaysApprovePolicy / TaggedPolicy concretes. First consumers: every gate-level test (gate_test.go::TestRunGuarded_*) exercises the policy's Required=true and Required=false paths end-to-end; policies_test.go exercises each bundled concrete directly.
  • The *ApprovalGate concrete artifact. First consumers: test/integration/phase31_approval_gates_test.go::TestE2E_Phase31_FullApproveCycle + TestE2E_Phase31_FullRejectCycle exercise the full pause/resume cycle end-to-end against real Phase 50 Coordinator + real audit.Redactor + real events.EventBus. The §17.3 failure modes (scope-gate + cross-identity) + goroutine-leak + N=16 concurrency stress are also covered.
  • The three event types tool.approval_requested / tool.approved / tool.rejected. First consumers: gate.go::publishApprovalRequested / publishApproved / publishRejected (the producers) + gate_test.go::TestRunGuarded_ApproveRoundTrip / RejectRoundTrip (the subscribers asserting payload shape end-to-end through real inmem.New + real patterns.New() redactor).

The §13 obligation is closed in-PR for every new shape; no consumer is deferred.

§17 integration test in same PR. test/integration/phase31_approval_gates_test.go wires real drivers across every seam (CLAUDE.md §17.3 #1): real audit.Redactor (the patterns driver), real events.EventBus (the inmem driver), real pauseresume.Coordinator, real steering.Registry (the Phase 53 surface — Phase 31's resolution shape is exactly what Phase 53's RunLoop will dispatch). Identity propagation asserted via the Event envelope's Identity field (CLAUDE.md §17.3 #2). Failure modes covered: ErrApprovalScopeRequired on unscoped resolver (§17.3 #3) + pauseresume.ErrScopeMismatch on cross-tenant resolver (§17.3 #3) + ErrApprovalCancelled on caller-ctx-cancel + goroutine-leak on initiate-then-cancel. Concurrency stress at N=16 distinct identity stacks (CLAUDE.md §17.3 final, N>=10).

§13 "Test stubs as production defaults" amendment — observed. NewApprovalGate(GateDeps{}) with a nil Policy field fails loud at construction with ErrPolicyRequired. An approval gate with no policy attached would either auto-approve every call (the worst-case posture for an HITL surface) or pass calls through with no gate at all (the §13 amendment's "silent stub default" footgun). The constructor REFUSES both. The smoke script's static guard (grep ErrPolicyRequired && grep 'deps.Policy == nil' on gate.go) mechanically enforces the trip-wire.

Coverage on internal/tools/approval: go test -race -cover reports 88.9% statement coverage on the touched package against the master-plan 80% target. The test surface is dense (per-method happy path + per-method failure mode + cross-scope conformance + D-025 + goroutine-leak + the integration test's wave-shape stress).



D-087 — Phase 67 harbor scaffold: single embedded minimal-react template; production-shaped harbor.yaml; acceptance proven against internal/config.Load + Validate directly (Phase 68 harbor validate sibling-shipping)

Date: 2026-05-15 Status: Settled Where it lives: cmd/harbor/cmd_scaffold.go + cmd/harbor/cmd_scaffold_test.go (the cobra body + cobra-driver tests), cmd/harbor/scaffold/{doc,scaffold,render,scaffold_test}.go (the binary-internal engine package), cmd/harbor/scaffold/templates/minimal-react/{go.mod,harbor.yaml,README.md,agent.go,agent_test.go}.tmpl (the embedded template), cmd/harbor/testdata/golden/minimal-react/* (the per-file golden the scaffold tests diff against), cmd/harbor/testdata/golden/help.txt (regenerated — scaffold Short shed the "(Phase 67)" suffix), cmd/harbor/root.go (Long updated to drop the "Only harbor version is fully implemented at this phase" claim now that two subcommands are real), cmd/harbor/cmd_stub_test.go (scaffold removed from the stub-table — it is no longer a stub), scripts/smoke/phase-67.sh (the smoke), docs/plans/phase-67-scaffold.md (the plan), docs/plans/README.md (the Status row flip + the §4.3 deviation note in the Phase 67 detail block), docs/glossary.md (the four new entries — harbor scaffold, Template (scaffold), minimal-react, Scaffold output), README.md (the Status row + the testing-section pointer).

Why: Phase 67 turns harbor scaffold from the Phase 63 stub into a real subcommand that materialises a Harbor agent project skeleton from an embedded template. The master-plan acceptance criterion is binary: "harbor scaffold my-agent creates a buildable project; harbor validate returns 0." Six design calls warrant a durable home so they don't drift.

  1. Phase 67 ships exactly one template (minimal-react); the template-registry surface is wired generically. Brief 06 §7 #11 sizes the scaffold work as "1–2 phases" and pairs it with the harbor dev draft-save flow (Phase 66 — depends on Phase 64). Shipping multiple templates today is premature: there is exactly one shaped Harbor agent (the worked example built around harbortest.Agent), and a second template would either fork the production-shape constraints (a regression risk) or duplicate them (a §13 two-parallel-implementations smell). The registry — embed.FS over templates/<name>/, Templates() enumerating the directory at runtime, --template flag's allowed-value list derived from the same call — makes adding a template a single-directory operation; no command-body change required. The trade-off considered against (a) hand-rolling a switch in the cobra body and (b) requiring operators to author out-of-tree templates: (a) is brittle (every new template re-touches the command body — a §13 "two parallel implementations" smell read sideways); (b) is post-V1 (the public template-author surface needs an RFC entry that doesn't exist yet — the §16 workflow gate fires). The embedded-only registry preserves the CGo-free / single-static-binary invariant and keeps the surface small.

  2. The scaffolded harbor.yaml demonstrates the production shape — bifrost LLM driver, env.NAME API key reference, sqlite state, real audit redactor — NOT the mock driver default. The Phase 64 pre-plan note ("Phase 64 — harbor dev v1") is binding for the entire wave: "harbor scaffold produces a project — that project's examples/dev.yaml or equivalent MUST demonstrate the production-shaped config, NOT a --mock LLM path." The §13 amendment forbids test stubs as production defaults on operator-facing seams; the scaffold's OUTPUT is precisely such a seam. Three concrete consequences: (a) llm.driver: bifrost (not mock) — the production driver Phase 33 shipped; (b) llm.api_key: env.OPENROUTER_API_KEY — the env.NAME reference form Phase 33's bifrost driver resolves at construction time, not a literal value (literals would be the §7 "no hardcoded secrets" trip-wire even though scaffold output is per-operator); (c) state.driver: sqlite with an explicit DSN — the durable single-node default per RFC §9 + the Phase 15 SQLite StateStore driver, not inmem (which Phase 64's pre-plan note treats as test-shape). The TestScaffold_RenderedConfig_DemonstratesProductionShape test fires on every CI run; a future template author who quietly flips back to driver: mock fails the test immediately. The §13 trip-wire is mechanical, not advisory.

  3. The acceptance criterion is verified against internal/config.Load + Validate directly, not via harbor validate. This is the §4.3 deviation. The master-plan Phase 67 detail block names harbor validate as the validation surface; at scaffold-time Phase 68 is sibling-shipping (its harbor validate subcommand is still a Phase 63 stub). Three options considered: (A) call internal/config.Load + Validate directly from a cmd/harbor/scaffold/scaffold_test.go test — this PR's choice; (B) gate Phase 67 on Phase 68 merge — rejected because the user dispatched these in parallel for a reason: the Go code surfaces are non-overlapping (cmd_scaffold.go vs. cmd_validate.go); (C) wait for Phase 68's PR to update Phase 67's smoke step — rejected because §17.6 explicitly says cross-phase fixes ride with whoever discovers them, and the cross-phase work here is "add a CLI integration step", which is Phase 68's surface to add. Option A is the cleanest. The §13 primitive-with-consumer rule is satisfied: the scaffold's consumer of the config validator is the in-tree internal/config package — a real, shipped subsystem — not a planned future CLI subcommand. The cross-phase CLI integration smoke step (running harbor validate ./harbor.yaml after a scaffold) lands in Phase 68's PR per §17.6.

  4. scaffold.Scaffold is a pure function — no goroutines, no shared state, no D-025 obligation. Each invocation builds a templateVars value, walks the embedded templates/<name>/ tree, renders each file's bytes via text/template.Execute, and writes the output. There is no long-lived reusable artifact, no compiled-once-invoked-many surface. The §5 / §11 / D-025 concurrent-reuse contract is vacuous. Phase 64's harbor dev will ship the long-lived server and pick up the engine-level D-025 dance for the runtime; this phase's surface is per-invocation. Concurrent invocations against distinct output dirs are safe by construction (no shared state); concurrent invocations against the SAME output dir race on os.Stat but the loser cleanly fails with ErrOutputDirExists (idempotent fail-closed posture).

  5. No-partial-writes contract: any post-MkdirAll failure cleans up the output dir before returning. The §13 fail-loud posture extends to filesystem state: an operator who runs harbor scaffold --name foo and sees a non-zero exit MUST NOT then find a half-rendered foo/ directory on disk (the next scaffold invocation would fail with ErrOutputDirExists against the partial tree, masking the real cause). scaffold.Scaffold's body: validate name → resolve template → validate output dir absence → render via renderTemplate(...) → on render failure, os.RemoveAll(absOut) before returning the wrapped error. The cleanup itself is wrapped (also failed to clean up partial output: %w) so a cleanup failure surfaces in the error chain rather than silently disappearing. The shipped minimal-react template renders cleanly by construction, so this path is exercised today only via the synthetic engine-level tests; a future template that can fail partway will inherit the contract.

  6. go.mod template carries no replace directive; the rendered module assumes a publicly resolvable Harbor module. The scaffolded go.mod reads require github.com/hurtener/Harbor v0.0.0-dev. Phase 78 (release-engineering) will pin the first real semver; until then, operators who run go vet ./... on the scaffolded tree from outside the Harbor source checkout will see a module-resolution failure, which is expected and documented in the README. The §17.1 integration-test obligation is satisfied by TestScaffold_RenderedConfig_PassesConfigValidate (the config-validate seam). There is no in-PR go vet ./... test because it would require either a network call (excluded by §17.4) or a replace directive that is NOT representative of the shipped output. Pinning the semver + adding a real go.mod integration test is a Phase 78 follow-up.

Coverage: cmd/harbor at 82.1% (master-plan target 70%), cmd/harbor/scaffold at 78.2% (master-plan target inherited from cmd/harbor: 70%). The uncovered fraction is the os.MkdirAll / os.WriteFile / os.RemoveAll error branches that fire only on a real filesystem failure (out-of-disk, permission-denied); cmd_scaffold_test.go drives only the happy path through those calls. The smoke script (scripts/smoke/phase-67.sh) runs five OK assertions against the built binary and the engine-level config-validate test; preflight is green.

§17.6 cross-phase fix — Phase 63 stub-table trim. Phase 63's smoke (scripts/smoke/phase-63.sh) iterates a fixed stubs=(dev scaffold validate inspect-events inspect-runs inspect-topology) table and asserts each emits CodeNotImplemented on a --json invocation. The real Phase 67 scaffold returns CodeInvalidProjectName on an empty-name invocation (the smoke's invocation shape) — so this PR drops scaffold from the table. Per CLAUDE.md §17.6, cross-phase smoke maintenance rides with the PR that moves the subcommand out of stub status. Future PRs that ship validate/inspect-* will trim the stubs array further; the in-script comment documents the pattern so subsequent maintainers don't need to grep this decision entry.

§14 pre-merge checklist — green. make drift-audit clean (RFC §8 + brief 06 references resolve; the smoke script exists; the mirror invariant holds). make check-mirror clean. go test -race ./... clean. make preflight clean (the Phase 63 cross-phase fix above closes the only FAIL the gate surfaced). No multi-isolation paths touched (the scaffold is a one-shot CLI that writes files; identity is N/A). No Protocol types changed. No config schema changed (the scaffold WRITES a config file but does not extend the schema). No migrations added. No new heavy dependencies (embed + text/template are stdlib).


D-088 — Phase 68 harbor validate: pre-boot CLI surface over internal/config's validator; stable error categories (config.parse / config.semantic / io.not_found / io.read); file:line precision from goccy AST + dotted-path lookup; tri-state exit codes (0/1/2); --json body is {error, code, hint, errors[]} parseable by jq; cross-phase Phase 67 integration deferred via SKIPing smoke step (§17.6)

Date: 2026-05-15 Status: Settled Where it lives: cmd/harbor/cmd_validate.go (the subcommand body — replaces the Phase 63 stub) + cmd/harbor/validate_test.go (golden-pinned tests per category) + cmd/harbor/main.go::exitCodeFor (the tri-state exit-code mapping) + cmd/harbor/cmd_stub_test.go (the stub-cases list updated to drop validate) + cmd/harbor/testdata/validate/*.yaml (the five fixtures) + cmd/harbor/testdata/validate/golden/*.{txt,json} (the twelve goldens — six categories × two modes; .txt rather than .out because the latter is gitignored), scripts/smoke/phase-68.sh (the smoke), cmd/harbor/testdata/golden/help.txt (regenerated — the validate Short description drops the "(Phase 68)" suffix), .github/workflows/ci.yml (the new harbor validate examples/*.yaml step in the preflight job — closes the master-plan acceptance "CI uses validate as a pre-flight check"), docs/plans/phase-68-validate.md (the plan), docs/plans/README.md (the Status row flip), docs/glossary.md (Validation error category + Validation file:line precision), README.md (Status row + CLI pointer paragraph).

Why: Phase 68 lands the real harbor validate body — RFC §8's settled CLI subcommand — replacing the Phase 63 stub. The master-plan acceptance is "Each error category produces a stable message; CI uses validate as a pre-flight check." Five design calls warrant a durable home so a future contributor doesn't relitigate them.

  1. harbor validate is a pre-boot tool, NOT a Protocol client. Every other CLI subcommand (inspect-events, inspect-runs, inspect-topology, dev) is or will be a Protocol client of a running Runtime — the brief 06 §3 posture ("All of this is implemented as protocol clients of the same runtime — no private hooks"). validate is the deliberate exception: its purpose is to detect breakage BEFORE the Runtime boots, so it MUST be invocable against a half-broken config that would prevent the Protocol from coming up at all. The implementation only imports internal/config and calls config.LoadFromBytes; it never opens a network port, never resolves an LLM provider, never touches the StateStore. This positioning is the right one for CI gates — harbor validate examples/*.yaml on every PR is cheap, deterministic, and catches the largest class of operator-facing config drift.

  2. Stable error categories are wire contracts, not implementation details. The four categories the V1 surface ships (config.parse / config.semantic / io.not_found / io.read) are pinned by goldens at cmd/harbor/testdata/validate/golden/*.{txt,json} and by the smoke script's substring greps. Renaming any is a breaking change to the CLI surface that downstream scripts depend on. The taxonomy is intentionally narrow: config.parse covers anything goccy/go-yaml rejects (unknown field via yaml.Strict(), malformed YAML, type mismatch); config.semantic covers anything internal/config.Validate rejects (bad enum, missing required, out-of-range numeric); the two io.* categories cover anything os.ReadFile rejects. Skill / agent-definition categories will be added when those file surfaces materialise (they don't today; see item 4 below). The categories are documented in the glossary (Validation error category) and the format is documented in the godoc of cmd_validate.go's validationFinding type.

  3. File:line precision via goccy AST + dotted-path lookup; missing-field errors fall back to line=0 with documented semantics. config.parse errors come from goccy/go-yaml's typed errors (SyntaxError, UnknownFieldError, etc.) which include a [<line>:<col>] marker at the start of Error(). We extract the line via the message-format probe in parseLineFromGoccyMessage (rather than importing goccy's internal errors package directly — keeping the dependency surface narrow). config.semantic errors arrive with a dotted field path embedded in the loader's wrapped message ("config: invalid configuration: config.<a.b.c>: <reason> (source: ...)"); we extract the path in extractFieldPath, then walk the YAML AST in lineForFieldPath to find the offending key's token line. A semantic error whose path resolves to a field the operator OMITTED (the reason IS "must not be empty") has no token to point at — we report line=0 and the operator greps the field path in the message. This is a deliberate trade-off, not a bug: threading per-field token positions down through internal/config/loader.go would require a separate phase to plumb. The fallback is documented in Validation file:line precision (glossary) and in the godoc of lineForFieldPath. Three alternatives considered and rejected: (a) pin token positions on the Phase 02 Config struct itself — high-touch, requires changes to every validateXxx helper; (b) call the loader twice (once to parse, once to validate with the AST in hand) — doubles parse cost on the happy path; (c) skip line precision for semantic errors entirely — defeats the master-plan acceptance ("Errors include file:line").

  4. Skills + agent-definition validation are deliberately out of scope today. The master-plan goal sentence is "Validate config / skills / agent definitions without booting." Today, only "config" has a file-shaped surface to validate. Skill definitions are imported via the Phase 37 internal/skills/importer from Markdown-with-frontmatter (not YAML; the importer is a runtime path consumed by harbor dev, not a standalone file format the operator hand-edits). Agent definitions live in the Agent Registry (Phase 53a) and are produced through the Registry API, not loaded from files. When either surface lands a standalone file format (a harbor-skill.yaml schema, an agent-def.yaml schema), a successor phase wires them into harbor validate <path> behind a --kind config|skill|agent-def flag (or content-sniffing). The CLI's positional path argument is forward-compatible — today it accepts a single config path; tomorrow it'll accept any of the three kinds. This deferral is explicit in the phase plan's "Non-goals" and in this decision so a future reader does NOT take it as silent drift.

  5. Tri-state exit codes (0/1/2) flow through main.go::exitCodeFor; the validation_internal_error code is the lever. Phase 63 shipped exit-1-on-any-error. Phase 68 introduces the distinction "validation found issues" (exit 1) vs "the tool itself couldn't run" (exit 2). The mapping is centralised in cmd/harbor/main.go::exitCodeFor keyed on CLIError.Code: validation_internal_error → exit 2, everything else → exit 1. A CI script can therefore distinguish "the operator's config is broken" from "the file you pointed me at doesn't exist" without parsing the body. The two new codes (CodeValidationFailed, CodeValidationInternal) are wire contracts pinned by the JSON-mode golden tests. Future subcommands that want exit-2 semantics add their codes to exitCodeFor's switch — the mapping is the single source of truth.

§13 amendment posture — validate is structural, not live. The Wave 10 §13 amendment (PR #91 / D-082) says "Fail loudly at boot when a required external dependency is missing." Phase 68 deliberately does NOT enforce that rule: a config without a reachable LLM API key, without reachable Postgres, without a reachable JWKS URL still passes harbor validate if the structural shape is right. The distinction is by design — harbor validate is the pre-flight, harbor dev (Phase 64) is the live check. Three concrete consequences:

  • Placeholder API keys pass validate. A scaffolded harbor.yaml (Phase 67 will produce one) with api_key: replace-me satisfies validate's structural check (the field is non-empty). The first harbor dev boot fails loudly when the LLM client tries to authenticate. This is the right split: a developer who types harbor scaffold && harbor validate should see green; a developer who runs harbor dev against the unfilled scaffold should see a loud failure naming the missing key.
  • No "is the URL reachable" check. The Phase 02 validator checks that identity.issuer is non-empty and that jwks_url has the string form of a URL; it does NOT issue a HEAD against the JWKS endpoint. Adding network reachability to validate would couple a pre-flight check to network state, which is exactly the coupling the brief 06 §3 wording rejects.
  • The --json body is validation_failed-coded, even for missing fields that harbor dev will also flag. Operator-facing scripts that want to differentiate "validate found N findings" from "dev couldn't boot" use the CLI Code field, not the file content.

This distinction is documented in the phase plan ("Goals" / "Non-goals") and in the godoc of cmd_validate.go.

Phase 67 cross-phase integration — deferred via SKIPing smoke step (§17.6). Phase 67 (harbor scaffold) is shipping in parallel with Phase 68. The master-plan acceptance for Phase 67 is "scaffolded output passes harbor validate" — a cross-phase integration that can only be exercised when both subcommands are real. The smoke step at scripts/smoke/phase-68.sh step 7 probes the scaffold subcommand; when scaffold still emits {"code":"not_implemented"} (Phase 67 not merged), the step SKIPs cleanly. When Phase 67 merges, the step will execute against the real scaffold output and assert exit 0. Per CLAUDE.md §17.6, the fix landed-or-deferred posture is documented up-front so a reviewer can see what is connected today vs what's connected by the merge. The cross-phase scaffold output schema (which subdirectory, which filename for the rendered config) is a Phase 67 implementor's call; this smoke uses find with a wildcard pattern that should match any sensible layout.

CI integration — harbor validate examples/*.yaml in the preflight job. The .github/workflows/ci.yml's preflight job already builds ./bin/harbor via make preflight. Phase 68 adds a one-line shell loop that runs ./bin/harbor validate "$f" for every examples/*.yaml, failing the build if any returns non-zero. This closes the master-plan acceptance "CI uses validate as a pre-flight check" with a one-liner change. Operators who add a new example YAML get free CI coverage of "the example fixture loads cleanly" as a result.

§13 primitive-with-consumer — discharged in-PR. The phase ships one subcommand body + four new helper functions (classifyLoaderError, lineForFieldPath, parseLineFromGoccyMessage, extractFieldPath) + two new error codes (CodeValidationFailed, CodeValidationInternal). The unit + golden tests in cmd/harbor/validate_test.go are the first consumer of every new symbol:

  • runValidate (the cobra RunE) → TestValidate_Human_PinnedByGolden + TestValidate_JSON_PinnedByGolden (six categories × two modes = twelve goldens) + TestValidate_DefaultArgPath (no-arg → harbor.yaml resolution) + TestValidate_QuietFlag_DoesNotSuppressErrors + TestValidate_QuietFlag_SuppressesSuccessLine.
  • parseLineFromGoccyMessage → TestParseLineFromGoccyMessage (six shapes).
  • lineForFieldPath → TestLineForFieldPath (ten paths: scalar, nested, sequence-indexed, map-indexed, missing).
  • extractFieldPath → TestExtractFieldPath (five message shapes).
  • extractParseReason → TestExtractParseReason (parse + fallback).
  • exitCodeFor (in main.go) → indirectly verified by assertValidateExit in validate_test.go.
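A regexp-based sketch of the two message probes listed above, assuming the message shapes quoted in item 3: a goccy [line:col] marker, and a loader message of the form config: invalid configuration: config.<path>: <reason>. The names parseLine and fieldPath are hypothetical, and the real helpers may anchor or tokenize differently; the AST walk in lineForFieldPath is not reproduced because it depends on goccy/go-yaml's ast package.

```go
package main

import (
	"regexp"
	"strconv"
)

var (
	// goccy/go-yaml errors carry a "[line:col]" marker, e.g. "[3:5] unknown field".
	lineColRe = regexp.MustCompile(`\[(\d+):(\d+)\]`)
	// Semantic errors embed a dotted path: "...invalid configuration: config.<path>: <reason>".
	fieldPathRe = regexp.MustCompile(`invalid configuration: config\.([^:\s]+):`)
)

// parseLine returns the line from the first [line:col] marker, or 0 if absent.
func parseLine(msg string) int {
	m := lineColRe.FindStringSubmatch(msg)
	if m == nil {
		return 0
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return 0
	}
	return n
}

// fieldPath returns the dotted path (including [i] indices) or "" if absent.
func fieldPath(msg string) string {
	if m := fieldPathRe.FindStringSubmatch(msg); m != nil {
		return m[1]
	}
	return ""
}
```

A zero from parseLine, or an empty path whose key cannot be found in the AST, is what feeds the documented line=0 fallback.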

Coverage: cmd/harbor lands at ≥75% (master-plan target 75% — verified via go test -coverprofile). The race detector is the CI gate; every test runs under -race. The Phase 68 smoke (scripts/smoke/phase-68.sh) runs the package tests under -race plus the built-binary exit-code matrix and the Phase 67 cross-phase probe (SKIPing per §17.6 until scaffold merges).


D-089 — Phase 64 harbor dev v1: LLM-default flip + dev-only mock escape hatch + LLM-backed Summarizer + dev signer

Date: 2026-05-15 Status: Settled

Where it lives:

  • internal/llm/registry.go (flips DefaultDriver from "mock" to "bifrost")
  • internal/config/loader.go (defaults().LLM.Driver → "bifrost")
  • internal/config/validate.go (driver-default comment updated; allowlist unchanged so test fixtures keep working)
  • internal/llm/summarizer/ (new: production LLM-backed memory.Summarizer + unit tests + D-025 N=100 concurrent reuse)
  • cmd/harbor/cmd_dev.go (real harbor dev implementation — boot stack, fail-loud, dev-token mint, graceful shutdown)
  • cmd/harbor/cmd_dev_test.go (unit tests for validateLLMProvider, parsePortFromBind, newDevSigner, SignDevToken, bootErrorToCLIError)
  • cmd/harbor/devauth.go (ephemeral ES256 keypair + auth.KeySet + JWT signer)
  • cmd/harbor/devmock.go (conditional mock blank-import + banner helper)
  • cmd/harbor/cmd_stub_test.go (dev removed from the stubs table)
  • cmd/harbor/testdata/golden/help.txt (dev short description updated)
  • examples/dev.yaml (new: canonical harbor dev config; demonstrates driver: bifrost + api_key: env.OPENROUTER_API_KEY)
  • examples/harbor.yaml (flipped from driver: mock to driver: bifrost)
  • scripts/preflight.sh (exports HARBOR_DATA_DIR; passes --config examples/dev.yaml; sets HARBOR_DEV_ALLOW_MOCK=1)
  • scripts/smoke/phase-63.sh (dev graduated out of the stub table; dev live-server assertions moved to phase-64.sh)
  • scripts/smoke/phase-64.sh (new: 6-assertion smoke including LLM-seam round-trip + fail-loud-no-config)
  • test/integration/phase64_harbor_dev_test.go + phase64_harbor_dev_helpers_test.go (cross-subsystem E2E)
  • docs/plans/phase-64-harbor-dev.md (per-phase plan)
  • docs/plans/README.md (Phase 64 row Status flipped to Shipped)
  • docs/glossary.md (4 new entries — harbor dev, HARBOR_DEV_ALLOW_MOCK, HARBOR_DEV_TOKEN, dev signer)
  • README.md (Status row Phase 64 → Shipped; Quick Start updated)

The decision (cluster, recorded so a future planner cannot relitigate):

  • llm.DefaultDriver = "bifrost". The Phase 64 pre-plan note's constraint #1 settles this. A binary built before this PR with no llm: block resolved "mock"; after this PR the same config resolves "bifrost". The mock driver is not blank-imported in cmd/harbor/main.go, so a production binary that does NOT also build cmd_dev.go would not even have the mock registered — but every binary DOES build the dev cmd (it's part of the unified harbor binary), so the mock IS in the registry; what gates it is the validateLLMProvider runtime check which rejects driver: mock unless HARBOR_DEV_ALLOW_MOCK=1 fires. A "stricter" alternative — build-tagging the mock package — was scoped out because every test importing internal/llm/mock (≈7 files across internal/llm, internal/governance, internal/planner, test/integration/wave7b_test.go, test/integration/wave8_test.go) would need the same build tag, expanding the blast radius beyond the phase's scope. A later refactor that prefers the strict path is one PR; the §13 amendment in spirit is satisfied today because the mock cannot run as the default and the only path is the explicit env-var escape hatch with a banner.
  • Escape hatch = env var, not flag. HARBOR_DEV_ALLOW_MOCK=1 (the env var) was chosen over --mock (the flag) because the preflight harness invokes ./bin/harbor dev without arguments — an env var lets the preflight gate work without changing the harness. The flag form is one diff away if a future operator surface demands it.
  • Banner emit is unconditional, single-source. When HARBOR_DEV_ALLOW_MOCK=1 fires, the banner [DEV-ONLY MOCK LLM — DO NOT USE IN PRODUCTION] is printed exactly once on stderr at boot. The emit lives in cmd/harbor/devmock.go::registerMockIfDevAllowMock. Tests can inject a bytes.Buffer as the stderr sink and assert the banner literal.
  • LLM-backed Summarizer lives in internal/llm/summarizer/ (in-package, not a new top-level directory). The Summarizer is a thin composition over the existing LLMClient + a versioned compaction prompt (PromptVersion = "v1"). When the operator picks memory.strategy: rolling_summary, the dev cmd reaches for inmem.New(cfg, deps, Options{Summarizer: summarizer.New(client)}) directly, because the registry-path memory.Open does not accept a Summarizer injection (Phase 23 omission). Operators on sqlite / postgres memory drivers + rolling_summary are rejected at boot with a clear "not yet wired" error pointing to docs/plans/phase-25; closing that gap is a follow-up issue.
  • EchoSummarizer stays in internal/memory/strategy/ (NOT moved to a testfixtures/ subdir). The §13 amendment's "test stubs as production defaults" gate is satisfied because: (a) the production dev cmd never imports internal/memory/strategy.EchoSummarizer; (b) the rolling_summary path runs through llm/summarizer.New instead. Tests that need EchoSummarizer keep working without changes. Moving the type to a build-tagged subpackage would force every consuming test to declare the tag — scope creep without a real safety win.
  • Dev signer is ephemeral ES256. A fresh keypair on every harbor dev boot. The matching default-identity dev token is printed to stderr as HARBOR_DEV_TOKEN=<jwt>. The dev key is in-memory only — never persisted, never exposed via any Protocol endpoint. The token's identity triple is (tenant=dev, user=dev, session=dev) plus admin + console:fleet scopes so an operator can subscribe to fleet events out of the box. ES256 (vs RS256) for fast keygen — the dev loop regenerates per boot.
  • Smoke exercises the LLM seam, not just /healthz. scripts/smoke/phase-64.sh boots harbor dev with HARBOR_DEV_ALLOW_MOCK=1, parses HARBOR_DEV_TOKEN out of the preflight server log, submits a start over /v1/control/start with the Bearer token, and asserts a non-empty task_id in the JSON response. The mock LLM is the deterministic, network-free driver underneath the safety + corrections + downgrade + retry + governance chain — same wiring path bifrost uses. A second assertion boots a fresh bin/harbor in a tmp dir with no config and asserts non-zero exit + named-field error (constraint #5 fail-loud half).
  • The Protocol mux mounts under /v1/. The dev cmd builds a fresh http.ServeMux exposing /healthz + /readyz + a catch-all /v1/ delegating to transports.NewMux(surface, bus, WithValidator(validator)). The auth middleware fail-closes any request without a Bearer token. The Phase 60 trust-based identity carrier headers (X-Harbor-*) are still read but treated as a fallback by the middleware.
  • Preflight env-var contract. scripts/preflight.sh now exports HARBOR_DATA_DIR so per-phase smoke scripts (Phase 64+) can read the dev server's log file (${HARBOR_DATA_DIR}/server.log) to parse the dev token. Older smokes ignore the env; new ones read it. This is the §17.6 fix-it-where-you-find-it pattern: the smoke gate needed a way to surface the token without a separate HTTP endpoint.
  • §17.6 cross-phase fixes landed in this PR: internal/llm/coverage_test.go::TestApplyDefaults_FillsZeros was pinned to the pre-Phase-64 default (mock); after the flip the test fixture needed Driver: "mock" explicitly. The PR includes the one-line fix.
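The ephemeral-signer bullet above is compact, so here is a minimal Go sketch of what "a fresh ES256 keypair per boot, token printed to stderr" can look like using only the standard library. The claim names (tenant, user, session, scopes) and the helpers mintDevToken / verifyDevToken / newDevToken are hypothetical illustrations, not Harbor's signer API. The sketch also omits exp/iat for brevity. The detail it does pin down is real: a JWS ES256 signature is the fixed-width r||s concatenation, not ASN.1 DER.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"math/big"
	"os"
	"strings"
)

// devClaims is a hypothetical claim layout: the identity triple plus scopes.
// A real token would also carry exp/iat; they are omitted here for brevity.
type devClaims struct {
	Tenant  string   `json:"tenant"`
	User    string   `json:"user"`
	Session string   `json:"session"`
	Scopes  []string `json:"scopes"`
}

var b64 = base64.RawURLEncoding

// mintDevToken signs the claims as a compact ES256 JWS.
func mintDevToken(key *ecdsa.PrivateKey, c devClaims) (string, error) {
	header, err := json.Marshal(map[string]string{"alg": "ES256", "typ": "JWT"})
	if err != nil {
		return "", err
	}
	payload, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	signingInput := b64.EncodeToString(header) + "." + b64.EncodeToString(payload)
	digest := sha256.Sum256([]byte(signingInput))
	r, s, err := ecdsa.Sign(rand.Reader, key, digest[:])
	if err != nil {
		return "", err
	}
	// JWS ES256 signatures are fixed-width r||s (32 bytes each), not ASN.1 DER.
	sig := make([]byte, 64)
	r.FillBytes(sig[:32])
	s.FillBytes(sig[32:])
	return signingInput + "." + b64.EncodeToString(sig), nil
}

// verifyDevToken checks only the signature; claim validation is out of scope.
func verifyDevToken(pub *ecdsa.PublicKey, token string) bool {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return false
	}
	sig, err := b64.DecodeString(parts[2])
	if err != nil || len(sig) != 64 {
		return false
	}
	digest := sha256.Sum256([]byte(parts[0] + "." + parts[1]))
	r := new(big.Int).SetBytes(sig[:32])
	s := new(big.Int).SetBytes(sig[32:])
	return ecdsa.Verify(pub, digest[:], r, s)
}

// newDevToken generates a fresh in-memory P-256 key (never persisted) and
// mints the default-identity dev token with it.
func newDevToken() (*ecdsa.PublicKey, string, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, "", err
	}
	tok, err := mintDevToken(key, devClaims{
		Tenant: "dev", User: "dev", Session: "dev",
		Scopes: []string{"admin", "console:fleet"},
	})
	return &key.PublicKey, tok, err
}

func main() {
	_, tok, err := newDevToken()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// The smoke script greps this line out of the server log.
	fmt.Fprintf(os.Stderr, "HARBOR_DEV_TOKEN=%s\n", tok)
}
```

Because the key lives only in process memory, a restart invalidates every previously printed token. That is the intended behaviour for a dev loop.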

Phase 64a — sibling phase for catalog OAuth + approval wiring (D-090). Pre-plan constraint #7 (tool catalog wires Phase 30 OAuth + Phase 31 approval gates from operator config — issue #104) was scoped out of Phase 64 because the tool catalog extension touches internal/tools/catalog.go + every tool registration site + a new operator-facing config block. The split was permitted by the master-plan pre-plan note's "may split into sibling phase" clause. Phase 64a is dispatched in the same wave (Stage 3b); its decision entry is D-090 (pre-assigned but not yet authored).

Departures from the pre-plan note:

  • Constraint #1 — kept the mock package importable from the test surface (no harbor_testfixtures build tag). The pragmatic justification is recorded above; the §13 amendment in spirit is satisfied because the mock cannot run as the default and the only operator path is the explicit env-var escape hatch with a banner.

§13 primitive-with-consumer rule discharge:

  • The Phase 60 SSE/REST mux primitive (which had no production consumer until now) gains its first one: cmd/harbor/cmd_dev.go::bootDevStack builds a transports.NewMux(...), mounts it under /v1/, and serves it from a real http.Server.
  • The Phase 61 JWT auth validator primitive gains its first production consumer: the same dev stack builds an auth.NewValidator(devSigner.KeySet(), WithRedactor(red), WithEventBus(bus)) and passes it to transports.NewMux(..., WithValidator(validator)).
  • The memory.Summarizer interface (declared by Phase 23, with no production implementation until now) gains its first one: internal/llm/summarizer.New(client) is the first non-test implementation, and harbor dev is the first non-test caller.
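The mux-under-/v1/ wiring can be sketched with net/http alone: /healthz and /readyz stay open, and everything under the /v1/ subtree passes through a fail-closed bearer check. This is a minimal sketch under stated assumptions. requireBearer, newDevMux and statusFor are hypothetical names; the validate func stands in for the Phase 61 validator; a plain handler replaces transports.NewMux. The X-Harbor-* fallback headers are not modelled.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// requireBearer fail-closes any request without "Authorization: Bearer <token>".
// validate stands in for the real JWT validator (signature + claims).
func requireBearer(validate func(token string) bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		token, ok := strings.CutPrefix(r.Header.Get("Authorization"), "Bearer ")
		if !ok || token == "" || !validate(token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newDevMux mounts unauthenticated probes plus an authenticated /v1/ subtree.
func newDevMux(protocol http.Handler, validate func(string) bool) *http.ServeMux {
	mux := http.NewServeMux()
	probe := func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) }
	mux.HandleFunc("/healthz", probe)
	mux.HandleFunc("/readyz", probe)
	// The trailing slash makes /v1/ a subtree pattern: it also matches /v1/control/start.
	mux.Handle("/v1/", requireBearer(validate, protocol))
	return mux
}

// statusFor issues one in-process request and returns the response code.
func statusFor(path, authorization string) int {
	protocol := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, `{"task_id":"t-1"}`)
	})
	// "dev-token" is a placeholder for whatever the real validator accepts.
	mux := newDevMux(protocol, func(tok string) bool { return tok == "dev-token" })
	req := httptest.NewRequest(http.MethodPost, path, nil)
	if authorization != "" {
		req.Header.Set("Authorization", authorization)
	}
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("/healthz", ""))
	fmt.Println(statusFor("/v1/control/start", ""))
	fmt.Println(statusFor("/v1/control/start", "Bearer dev-token"))
}
```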

D-090 — Phase 64a tool catalog OAuth + approval wiring: per-tool tools.entries[] operator config + Builder + WrapWithApproval/WrapWithOAuth + approval-outermost composition order + §13 fail-loud on unknown policy/provider/tool + restart-required (hot reload deferred to Phase 65)

Date: 2026-05-15 Status: Settled

Where it lives:

  • internal/tools/catalog/catalog.go (new package: Builder + WrapWithApproval + WrapWithOAuth + Deps + sentinels)
  • internal/tools/catalog/catalog_test.go (unit + allowlist mirror tests)
  • internal/tools/catalog/concurrent_test.go (D-025 N=128 concurrent-reuse under -race)
  • internal/tools/tools.go (new CatalogReplacer optional interface)
  • internal/tools/catalog.go (in-memory *catalog gains Replace method)
  • internal/config/config.go (ToolsConfig.Entries + ToolEntryConfig + ToolApprovalConfig + ToolOAuthConfig)
  • internal/config/validate.go (tools.entries[] invariants + allowedApprovalPolicies + allowedOAuthBindingScopes)
  • internal/config/validate_test.go (per-entry validation cases)
  • cmd/harbor/cmd_dev.go (boot-stack wiring: constructs catalog + Coordinator, applies cfg.Tools.Entries, exposes catalog + Coordinator on devStack for future phases)
  • scripts/smoke/phase-64a.sh (7 assertions: package + integration + harbor validate accept/reject + static guards)
  • test/integration/phase64a_catalog_wiring_test.go (cross-subsystem E2E: APPROVE/REJECT round-trips + OAuth happy/auth-required + composition-order pin + concurrency stress + leak)
  • docs/plans/phase-64a-tool-catalog-wiring.md (per-phase plan)
  • docs/plans/README.md (Phase 64a row appended + pre-plan note constraint #7 marked closed)
  • README.md (Status row Phase 64a → Shipped)

The decision (cluster, recorded so a future planner cannot relitigate):

  • Operator config shape is tools.entries[]. Each entry is {name, approval?, oauth?}. The empty middleware block (approval AND oauth both nil) is rejected — an entry that wraps nothing is a configuration typo. Unknown policy / unknown binding scope / unknown OAuth provider / duplicate entry name / empty entry name all fail closed with a wrapped error naming the offending field path. Phase 68 harbor validate inherits every new rule through internal/config.Validate — no validator code in cmd/harbor.
  • Wrapper composition order: approval outermost, OAuth innermost. Rationale: approval is the gate operators expect to fire FIRST. A HITL "Approve call to <tool>?" prompt should pop BEFORE any OAuth dance starts; rejecting approval avoids consuming the user's OAuth-completion attention. OAuth's *ErrAuthRequired propagates UP through the approval wrapper unchanged (the gate's RunGuarded returns the inner tool's error verbatim post-APPROVE), so when OAuth is needed the planner still observes the typed sentinel and can pause. The reverse order (OAuth outermost) is rejected because it short-circuits the gate.
  • CatalogReplacer interface for atomic per-tool swap. The in-memory catalog adds a Replace([]ToolDescriptor) error method under its existing write lock — concurrent Resolve / List see either every old descriptor OR every new descriptor, never a partial mix. The optional interface ships on the ToolCatalog package surface; future catalog implementations either provide it OR document "no per-tool wiring at boot." Deregister is NOT added — the in-memory catalog stays write-once on Register; Replace is the seam for boot-time swap.
  • Builder is one-shot. New(entries, deps).Apply(ctx) runs once at boot; a second Apply returns ErrAlreadyApplied. Future hot reload (Phase 65) needs an UnApply path; the surface stays small for now.
  • AppliedGates out-channel for in-process resolution. The Deps.AppliedGates map is the optional surface that hands back the constructed gates keyed by tool name. The dev cmd captures this map so future phases (the dispatcher-side ApprovalDispatcher bridge) can route wire-side APPROVE/REJECT into the right gate's pending map. Phase 64a's integration test uses AppliedGates directly to drive the in-process ResolveApproval and exercise the wrapper end-to-end.
  • Wire-side APPROVE/REJECT bridge deferred to Wave 12 (tracked in issue #112). The Protocol approve / reject methods route through steering → Coordinator.Resume; the gate's pending channels are NOT yet observed by the steering apply path. Wiring them up needs either a dispatcher-side ApprovalDispatcher or a gate-side pause.resumed subscriber; either is substantive enough to warrant its own phase. Phase 64a's integration test exercises the gate's in-process ResolveApproval to PROVE the wrapper composition works end-to-end. PR #110's wave-end E2E uses an in-test bridge (runWave11WireBridge) to substitute for production wiring; the Wave 11 §17.5 audit confirmed the gap and filed issue #112.
  • OAuth provider construction deferred to a later phase (tracked in issue #116). Phase 64a wires the BINDING side (tools.entries[].oauth.provider resolves to a provider in Deps.OAuthProviders). The tools.oauth_providers[] operator config + the OAuth-provider-per-source construction lands when the OAuth-callback Protocol method ships. For now, the dev cmd hands applyToolCatalogWiring an empty providers map; an entry that declares oauth will fail closed at boot — the §13 fail-loud is the design.
  • Identity is mandatory in every wrapper. Both WrapWithApproval and WrapWithOAuth read identity via identity.From(ctx) at the wrapper boundary; a missing triple returns approval.ErrIdentityRequired / auth.ErrIdentityRequired. Defence in depth — the gate and the provider also enforce this, but the wrapper surfaces the error early.
  • Coverage: internal/tools/catalog lands at 89.3% (target 80%). Concurrent-reuse test under -race runs N=128 invocations against a single shared wrapped descriptor across both wrapper shapes; no leaks, no cross-talk.
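The composition-order bullet makes two claims that a small sketch can check: a rejected approval never reaches the OAuth layer, and an OAuth auth-required error surfaces unchanged through the approval wrapper. The sketch below is a hypothetical simplification. Tool, wrapWithApproval, wrapWithOAuth and run are illustrative names; Harbor's real ErrAuthRequired is a typed error rather than the sentinel used here; the approve/hasToken callbacks replace the real gate and provider.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Tool is a hypothetical, simplified tool signature.
type Tool func(ctx context.Context, args string) (string, error)

var (
	ErrRejected     = errors.New("approval: call rejected")
	ErrAuthRequired = errors.New("oauth: authorization required")
)

// wrapWithApproval asks the gate first; after an approve it returns the
// inner tool's result and error verbatim.
func wrapWithApproval(approve func(context.Context) bool, inner Tool) Tool {
	return func(ctx context.Context, args string) (string, error) {
		if !approve(ctx) {
			return "", ErrRejected
		}
		return inner(ctx, args)
	}
}

// wrapWithOAuth short-circuits with ErrAuthRequired when no token is available.
func wrapWithOAuth(hasToken func(context.Context) bool, inner Tool) Tool {
	return func(ctx context.Context, args string) (string, error) {
		if !hasToken(ctx) {
			return "", ErrAuthRequired
		}
		return inner(ctx, args)
	}
}

// run composes approval outermost and OAuth innermost, recording which
// layers were consulted so the ordering is observable.
func run(approveOK, tokenOK bool) (string, []string, error) {
	var trace []string
	base := Tool(func(_ context.Context, args string) (string, error) {
		trace = append(trace, "tool")
		return "ran " + args, nil
	})
	approve := func(context.Context) bool {
		trace = append(trace, "approve")
		return approveOK
	}
	hasToken := func(context.Context) bool {
		trace = append(trace, "oauth")
		return tokenOK
	}
	wrapped := wrapWithApproval(approve, wrapWithOAuth(hasToken, base))
	out, err := wrapped(context.Background(), "list_issues")
	return out, trace, err
}

func main() {
	cases := []struct{ approve, token bool }{{false, true}, {true, false}, {true, true}}
	for _, c := range cases {
		out, trace, err := run(c.approve, c.token)
		fmt.Printf("approve=%v token=%v -> out=%q trace=%v err=%v\n", c.approve, c.token, out, trace, err)
	}
}
```

Reversing the nesting would consult OAuth before the gate, which is exactly the ordering the decision rejects.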

Departures from the pre-plan note:

  • Constraint #7 of the Phase 64 pre-plan note says the Wave 11 wave-end E2E exercises APPROVE/REJECT via the real transports/control HTTP handler. Phase 64a's integration test exercises the gate's ResolveApproval in-process — the wrapper composition + identity propagation + event emission are proven end-to-end, but the wire-side bridge from the steering apply path back into the gate's pending map is NOT in this PR. Rationale: that bridge is a substantive subsystem (a pause.resumed subscriber on the gate, OR a dispatcher-side ApprovalDispatcher owning gates) and its design touches the steering and pause/resume subsystems' contracts. Splitting it out keeps Phase 64a tractable. The Wave 11 wave-end E2E in Stage 4 is the right home for the wire-side round-trip.

§13 primitive-with-consumer rule discharge:

  • The approval.ApprovalGate primitive (Phase 31 / D-086) gains its first non-test catalog consumer — every tools.entries[].approval entry constructs a gate via approval.NewApprovalGate and wraps the descriptor.
  • The auth.OAuthProvider primitive (Phase 30 / D-083) gains its first non-test catalog consumer — every tools.entries[].oauth entry binds the descriptor to a provider via the WrapWithOAuth wrapper.
  • The Phase 50 Coordinator gains a second catalog-shaped consumer (the gate routes pauses through it).
  • The Phase 26 ToolCatalog interface gains a sibling CatalogReplacer for atomic per-tool swap at boot.
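The CatalogReplacer's "all-old or all-new" guarantee comes from doing the whole swap under one write lock. The sketch below shows that shape on a map-backed catalog with sync.RWMutex. It assumes, hypothetically, that Replace rejects names that were never registered (consistent with Register staying write-once and no Deregister) and validates the entire batch before writing anything. ToolDescriptor here is a trimmed stand-in, not the real descriptor.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ToolDescriptor is a hypothetical, trimmed descriptor.
type ToolDescriptor struct {
	Name    string
	Wrapped bool
}

var ErrUnknownTool = errors.New("catalog: replace targets an unregistered tool")

type catalog struct {
	mu    sync.RWMutex
	tools map[string]ToolDescriptor
}

func newCatalog() *catalog { return &catalog{tools: map[string]ToolDescriptor{}} }

// Register is write-once: re-registering a name is an error.
func (c *catalog) Register(d ToolDescriptor) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, dup := c.tools[d.Name]; dup {
		return fmt.Errorf("catalog: duplicate tool %q", d.Name)
	}
	c.tools[d.Name] = d
	return nil
}

// Replace validates the whole batch, then swaps it in under one write lock,
// so concurrent readers observe all-old or all-new descriptors.
func (c *catalog) Replace(ds []ToolDescriptor) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, d := range ds {
		if _, ok := c.tools[d.Name]; !ok {
			return fmt.Errorf("%w: %q", ErrUnknownTool, d.Name)
		}
	}
	for _, d := range ds {
		c.tools[d.Name] = d
	}
	return nil
}

// Resolve takes the read lock, so it never interleaves with a Replace.
func (c *catalog) Resolve(name string) (ToolDescriptor, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	d, ok := c.tools[name]
	return d, ok
}

// countWrapped reports how many descriptors are currently wrapped.
func countWrapped(c *catalog) int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	n := 0
	for _, d := range c.tools {
		if d.Wrapped {
			n++
		}
	}
	return n
}

func main() {
	c := newCatalog()
	for _, n := range []string{"search", "gh_issues"} {
		if err := c.Register(ToolDescriptor{Name: n}); err != nil {
			panic(err)
		}
	}
	err := c.Replace([]ToolDescriptor{{Name: "search", Wrapped: true}, {Name: "unknown", Wrapped: true}})
	fmt.Println(err, countWrapped(c))
	err = c.Replace([]ToolDescriptor{{Name: "search", Wrapped: true}, {Name: "gh_issues", Wrapped: true}})
	d, _ := c.Resolve("search")
	fmt.Println(err, countWrapped(c), d.Wrapped)
}
```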

Wave-end E2E coupling:

  • Wave 11 Stage 4's test/integration/wave11_test.go will exercise APPROVE/REJECT through the real transports/control HTTP handler. That E2E closes issue #104's Protocol-wire round-trip half. Phase 64a closes the catalog-wiring half (constraint #7).

D-091 — Console deployment posture: separate harbor console subcommand serves the static build; harbor dev is headless at V1; shared chat module encapsulated for future packed dev UI

Date: 2026-05-15 Status: Settled (forward-binding — first consumer lands in the Console wave's harbor console phase)

Where it lives: CLAUDE.md §4.5 (updated in this PR) + AGENTS.md §4.5 (mirror) + docs/research/12-console-deployment-and-shared-ui.md (the supporting brief, landed in this PR) + docs/plans/README.md (Console-wave deployment + shared-library posture pre-plan note, landed in this PR). Future code lands at cmd/harbor/cmd_console.go, cmd/harbor/console_assets.go, web/console/, web/console/src/lib/chat/.

The decision (recorded so a future planner cannot relitigate):

  • Two surfaces, one stack. The full Console ships as a static SvelteKit build served by harbor console. A future single-agent developer UI (post-V1) ships embedded in harbor dev and reuses the Console's chat/playground components via a shared library. Both surfaces are Protocol clients (CLAUDE.md §4.5 #10); neither imports Runtime Go types.
  • harbor console serves the static build via embed.FS. One binary, no --static-dir flag in production (a developer-only escape hatch may be added later), no path-discovery bugs. The cost is a few-MB binary bloat — acceptable per CLAUDE.md §5 "Static binary." The subcommand stays foreground (ctrl-C exits, matches npm run dev's shape); a future --detach flag is one diff away if operator pressure demands it.
  • harbor console is multi-runtime by design. It reads a ~/.harbor/console.yaml listing runtime endpoints + auth, or --runtime <name>=<url> for ad-hoc additions, and bootstraps the browser-side multi-runtime context (Brief 11 §CC-1). The Console NEVER assumes a single runtime; the in-binary embedded build is identical to a remote-deployed build.
  • harbor dev does NOT serve the full Console. Phase 64 (shipped, D-089) is headless (Protocol + LLM seam only). Embedding the Console into harbor dev is rejected because it couples a developer's iteration loop to the operator-facing observability tool — wrong scope, wrong default. The future packed dev UI is a subset of the Console's surface (single-agent chat + traces + logs), shipped via a separate phase, opt-in via a harbor dev --ui flag (or equivalent). Post-V1.
  • Shared chat module: encapsulate first, extract on second consumer. The chat/playground/MCP-Apps-renderer module lives initially at web/console/src/lib/chat/ with two hygiene rules enforced by the introducing phase: (a) no imports of other Console internals from the chat module (only the typed Protocol client, design tokens, Skeleton primitives, and the chat-module's own internals); (b) the chat module exposes a typed ProtocolClient interface the caller injects, never imports a Console-specific singleton. When the packed dev UI phase lands, the extraction to web/shared/chat/ is mechanical (git mv). This pattern matches the §4.4 driver-seam rule: design as if multiple consumers exist; physically split when the second consumer arrives.
  • §13 primitive-with-consumer applies. The shared chat module's first consumer (the full Console's Playground or Live-Runtime page) lands in the SAME wave as the module itself; the module does not ship without its first call site. The future packed dev UI is the second consumer, not the first.
  • Auth-storage model (Brief 11 §"Open architectural questions" #6 resolved). Per-runtime JWTs stored in browser localStorage / IndexedDB, encrypted via WebCrypto with a passphrase the operator enters at first runtime-attach. Loss of passphrase invalidates stored tokens but does NOT corrupt other Console state. AES-GCM with PBKDF2-derived KEK is the obvious starting point; the harbor console phase plan owns the exact algorithm pin.
  • Cross-runtime fleet view is a Console-side aggregator for V1 (Brief 11 §"Open architectural questions" #7 resolved). The harbor console subcommand maintains N persistent Protocol connections; fleet views aggregate client-side. Gateway pattern is post-V1.
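Serving a static build from the binary reduces to http.FileServer over an fs.FS. In the shipped subcommand that FS would be an embed.FS populated by a //go:embed directive, typically narrowed with fs.Sub to the build directory. Because this sketch must run without a SvelteKit build on disk, it substitutes testing/fstest.MapFS. consoleHandler, demoBuild and get are hypothetical names, and the file contents are placeholders.

```go
package main

import (
	"fmt"
	"io/fs"
	"net/http"
	"net/http/httptest"
	"testing/fstest"
)

// consoleHandler serves a static build from any fs.FS. In a real binary the
// FS would come from a //go:embed directive (often narrowed with fs.Sub);
// fstest.MapFS stands in here so the sketch runs standalone.
func consoleHandler(build fs.FS) http.Handler {
	return http.FileServer(http.FS(build))
}

// demoBuild is a hypothetical two-file build output.
func demoBuild() fs.FS {
	return fstest.MapFS{
		"index.html":  {Data: []byte("<!doctype html><title>Harbor Console</title>")},
		"_app/app.js": {Data: []byte("console.log('harbor')")},
	}
}

// get issues one in-process GET and returns the status code and body.
func get(path string) (int, string) {
	rec := httptest.NewRecorder()
	consoleHandler(demoBuild()).ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	for _, p := range []string{"/", "/_app/app.js", "/missing"} {
		code, _ := get(p)
		fmt.Println(p, code)
	}
}
```

Since the handler only depends on fs.FS, the same code path serves the embedded build and a test fixture. That matches the decision's point that the in-binary build is identical to a remote-deployed one.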

Why these specifics matter. The "embed the Console into harbor dev" trap is the natural-feeling default (one binary, one boot), but it violates CLAUDE.md §4.5 #2's decoupled-deployment principle and Brief 11 §CC-1's multi-runtime design. Pinning the posture now prevents a future Console-phase planner from re-deriving the same wrong-feeling-right answer.


D-092 — web/console/ pins Svelte 5 with runes mode; legacy Svelte 4 reactivity is forbidden

Date: 2026-05-15 Status: Settled (forward-binding — applies from the first commit that creates web/console/)

Where it lives: CLAUDE.md §4.5 #1 (updated in this PR) + AGENTS.md §4.5 #1 (mirror). Future code lands at web/console/svelte.config.js (compilerOptions: { runes: true }) and web/console/package.json ("svelte": "^5.0.0", a caret range that pins the major version while allowing 5.x minor and patch updates).

The decision:

  • Svelte 5 + runes mode is the only supported reactivity model in web/console/. Components use $state, $derived, $effect, $props exclusively. Legacy Svelte 4 syntax ($: reactive statements, top-level let as reactive state, export let props, store auto-subscription via $store in scripts) is rejected by svelte-check --fail-on-warnings.
  • Rationale. Svelte 5 is the current major version, and runes are its reactivity model. Allowing a mixed codebase (some components in runes mode, some via <svelte:options runes={false}>) is the §13 "two parallel implementations" anti-pattern applied to reactivity. Pinning once at the start prevents parallel-agent dispatches from drifting silently: one agent uses let count = 0; $: doubled = count*2; another uses let count = $state(0); let doubled = $derived(count*2). Both compile, but the codebase fragments before anyone notices.
  • Mechanical enforcement. The first Console phase that creates web/console/ lands svelte.config.js with compilerOptions: { runes: true }; package.json pins "svelte": "^5.0.0"; npm run check in the frontend CI job uses --fail-on-warnings so any legacy-syntax usage fails the build.

D-093 — Protocol TypeScript client generated from internal/protocol/singlesource.CanonicalWireTypes; never hand-written

Date: 2026-05-15 Status: Settled — the "generate" half superseded in part by D-223 (the per-page TS split made the one-generated-file premise stale; D-223 ships a lockstep VERIFICATION gate instead and reserves the cmd/harbor-gen-protocol-ts name for the deferred full generator). The lockstep intent — no Go↔TS drift — stands and is now CI-enforced.

Where it lives: CLAUDE.md §4.5 #5 (updated in this PR) + AGENTS.md §4.5 #5 (mirror). Future code lands at cmd/harbor-gen-protocol-ts/ (the generator), web/console/src/lib/protocol.ts (generated artifact, committed), and Makefile (the protocol-ts-gen + protocol-ts-gen-check targets).

The decision:

  • Generated, not hand-written. The TS Protocol client is mirror-derived from Phase 58's CanonicalWireTypes registry (the Go-side single source). Hand-writing it creates the same mirror-drift trap §18 closes for AGENTS.md↔CLAUDE.md, but worse — Go field renames silently break the TS client at runtime, not at compile time.
  • CI gate. make protocol-ts-gen-check (called in the frontend CI job) re-runs the generator and asserts git diff --exit-code is clean. Any drift fails the build. Pattern: the Go-side primitive's owner regenerates in the same PR that changes the Go type.
  • Generator scope. Emit: (a) one TS interface per CanonicalWireTypes registered struct; (b) one method-call stub per methods.go constant; (c) one constant per errors.go error code. Do NOT emit: rendering logic, runtime helpers, or anything Console-specific. The generator is pure shape-translation; UX wrappers live in web/console/src/lib/protocol-helpers.ts (hand-written, calls into the generated types).
  • §13 primitive-with-consumer. The generator (primitive) and its first consumer (protocol.ts imported by the first Console SvelteKit page) ship in the same wave.
  • Why not deferred. "Hand-write now, switch to generated later" is the same trap as "stub now, real impl later" the §13 amendment closes: by the time the hand-written client has 80 method calls, the cost of switching is N×rewrites + drift backlog. Generating from t=0 amortises the tooling cost across the project's lifetime.
  • Hand-edits forbidden. The generated file starts with the header comment "// CODE GENERATED BY cmd/harbor-gen-protocol-ts. DO NOT EDIT."; any commit that modifies protocol.ts without a corresponding regeneration fails the CI check.
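The "pure shape-translation" scope can be illustrated with reflect. Walk each registered struct, read its json tags, and emit one TS interface per type, sorted by name so regeneration is byte-stable. Byte stability is what makes a git diff --exit-code gate meaningful. This is a minimal sketch, not cmd/harbor-gen-protocol-ts. StartRequest, StartResponse and wireTypes are hypothetical stand-ins for CanonicalWireTypes entries. The sketch covers only interfaces, not method stubs or error-code constants, and it skips edge cases such as []byte, which encoding/json emits as a base64 string.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"strings"
)

const generatedHeader = "// CODE GENERATED BY cmd/harbor-gen-protocol-ts. DO NOT EDIT.\n"

// Hypothetical wire types standing in for CanonicalWireTypes entries.
type StartRequest struct {
	Goal     string            `json:"goal"`
	Metadata map[string]string `json:"metadata,omitempty"`
}

type StartResponse struct {
	TaskID string `json:"task_id"`
}

func wireTypes() []reflect.Type {
	return []reflect.Type{reflect.TypeOf(StartResponse{}), reflect.TypeOf(StartRequest{})}
}

// tsType maps a Go type to a TypeScript type expression.
// []byte is not special-cased (encoding/json would emit a base64 string).
func tsType(t reflect.Type) string {
	switch t.Kind() {
	case reflect.String:
		return "string"
	case reflect.Bool:
		return "boolean"
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
		reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64,
		reflect.Float32, reflect.Float64:
		return "number"
	case reflect.Slice, reflect.Array:
		elem := tsType(t.Elem())
		if strings.Contains(elem, " ") {
			elem = "(" + elem + ")" // keep "T | null" unions intact inside arrays
		}
		return elem + "[]"
	case reflect.Map:
		return "Record<" + tsType(t.Key()) + ", " + tsType(t.Elem()) + ">"
	case reflect.Pointer:
		return tsType(t.Elem()) + " | null"
	case reflect.Struct:
		return t.Name()
	default:
		return "unknown"
	}
}

// emitInterface renders one exported struct as a TS interface, honouring
// json tag names and marking omitempty fields optional.
func emitInterface(t reflect.Type) string {
	var b strings.Builder
	fmt.Fprintf(&b, "export interface %s {\n", t.Name())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue
		}
		name, opts, _ := strings.Cut(f.Tag.Get("json"), ",")
		if name == "-" {
			continue
		}
		if name == "" {
			name = f.Name
		}
		optional := ""
		if strings.Contains(opts, "omitempty") {
			optional = "?"
		}
		fmt.Fprintf(&b, "  %s%s: %s;\n", name, optional, tsType(f.Type))
	}
	b.WriteString("}\n")
	return b.String()
}

// generate sorts by type name so the output is byte-stable across runs.
func generate(types []reflect.Type) string {
	sorted := append([]reflect.Type(nil), types...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name() < sorted[j].Name() })
	var b strings.Builder
	b.WriteString(generatedHeader)
	for _, t := range sorted {
		b.WriteString("\n")
		b.WriteString(emitInterface(t))
	}
	return b.String()
}

func main() {
	fmt.Print(generate(wireTypes()))
}
```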

D-094 — harbortest/devstack helper extracted for integration test stack assembly

Date: 2026-05-16 Status: Settled

Where it lives:

  • harbortest/devstack/devstack.go (the new helper: Assemble + AssembleOpts + DevStack)
  • harbortest/devstack/devstack_test.go (unit tests pinning the four Skip* shapes + the catalog-wiring + identity-override paths)
  • harbortest/devstack/export_test.go (in-package _test.go exposing TryAssemble so error paths are unit-testable without faking *testing.T)
  • test/integration/wave11_test.go (buildWave11Stack rewritten to call devstack.Assemble)
  • test/integration/phase64_harbor_dev_helpers_test.go (buildPhase64TestStack rewritten to call devstack.Assemble with an LLMConfigSnapshot override)
  • test/integration/phase64a_catalog_wiring_test.go (buildPhase64aEnv rewritten to call devstack.Assemble with SkipAuth + SkipTransports + SkipSteering)
  • test/integration/phase31_approval_gates_test.go (buildPhase31Env rewritten to call devstack.Assemble with SkipAuth + SkipTransports + SkipCatalog)

Decision. Per-test dev-stack assembly is now centralised in harbortest/devstack.Assemble. The four integration test files that previously duplicated ~100–200 LOC each (wave11_test.go, phase64_harbor_dev_test.go, phase64a_catalog_wiring_test.go, phase31_approval_gates_test.go) now call Assemble(t, cfg, AssembleOpts{...}) and consume a typed DevStack struct.

Why. The Wave 11 §17.5 audit (issue #115) pinned this as a drift risk: when the production boot order in cmd/harbor/cmd_dev.go::bootDevStack changes — and it will, as the #112 / #114 follow-ups land — every test stack would silently drift away from production. The helper makes "tests track production" mechanically enforceable: a change to bootDevStack that breaks Assemble's contract surfaces immediately as a test build error rather than a wave-end audit finding two PRs later.

How to apply. New integration tests construct stacks via harbortest/devstack.Assemble, never inline. When you change cmd/harbor/cmd_dev.go::bootDevStack, update Assemble in the same PR — the helper's top-of-file comment names bootDevStack as the source of truth. The §17.6 "fix what the integration test finds — no matter where the bug lives" rule applies: a production boot-order change that does not also update Assemble is a deferred fix, not a clean PR.

Acceptance:

  • harbortest/devstack package exists with Assemble + AssembleOpts + DevStack types.
  • All four integration test files use the helper.
  • go test -race ./test/integration/... ./harbortest/... passes.
  • The helper's coverage is 85.2% (well above the 80% target). The four Skip* shapes each have a dedicated test asserting the expected non-nil / nil field set; the error-returning core (tryAssemble) is exposed via an in-package export_test.go so cfg-validation + duplicate-registration + Builder-failure paths are unit-tested without faking *testing.T.

Departures from the pre-plan note:

  • The issue's API draft proposed an AssemblePhase64Style factory; the implementation lands a single Assemble with SkipAuth / SkipTransports / SkipCatalog / SkipSteering knobs because all four target tests want different layer subsets and a single helper with skip flags is shorter than four named factories. The naming convention "phase64-style" is informal; the production source of truth is bootDevStack regardless.
  • phase31 and phase64a originally constructed events.EventBus directly via eventsInmem.New(EventsConfig{...}, redactor) with tight per-test buffer knobs. The migration synthesises a full *config.Config per test and reuses the matching cfg.Events knobs verbatim — behaviour is unchanged; the indirection is acceptable per CLAUDE.md §4.3 ("a phase plan that deviated permanently … reflects the deviation in the master plan's detail block").
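The skip-flag design and the tryAssemble/Assemble split fit together as sketched below. The error-returning core is plain Go, so its failure paths are unit-testable. The test-facing wrapper only converts errors into Fatalf. Every type here (Auth, Transports, Catalog, Steering, Config, fatalT, printT) is a hypothetical stand-in. The real Assemble takes a *testing.T and a *config.Config and builds real subsystems in bootDevStack's order.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the assembled subsystems.
type (
	Auth       struct{}
	Transports struct{}
	Catalog    struct{}
	Steering   struct{}
)

// Config is a hypothetical stand-in for *config.Config.
type Config struct{ Valid bool }

type AssembleOpts struct {
	SkipAuth, SkipTransports, SkipCatalog, SkipSteering bool
}

type DevStack struct {
	Auth       *Auth
	Transports *Transports
	Catalog    *Catalog
	Steering   *Steering
}

var ErrInvalidConfig = errors.New("devstack: invalid config")

// tryAssemble is the error-returning core: every failure is a plain error,
// so it can be unit-tested without a *testing.T.
func tryAssemble(cfg Config, opts AssembleOpts) (*DevStack, error) {
	if !cfg.Valid {
		return nil, ErrInvalidConfig
	}
	s := &DevStack{}
	if !opts.SkipAuth {
		s.Auth = &Auth{}
	}
	if !opts.SkipTransports {
		s.Transports = &Transports{}
	}
	if !opts.SkipCatalog {
		s.Catalog = &Catalog{}
	}
	if !opts.SkipSteering {
		s.Steering = &Steering{}
	}
	return s, nil
}

// fatalT is the subset of testing.TB that Assemble needs; *testing.T satisfies it.
type fatalT interface {
	Helper()
	Fatalf(format string, args ...any)
}

// Assemble is the thin test-facing wrapper that turns errors into Fatalf.
func Assemble(t fatalT, cfg Config, opts AssembleOpts) *DevStack {
	t.Helper()
	s, err := tryAssemble(cfg, opts)
	if err != nil {
		t.Fatalf("devstack.Assemble: %v", err)
	}
	return s
}

// printT is a demo-only fatalT that prints instead of stopping a test.
type printT struct{}

func (printT) Helper() {}

func (printT) Fatalf(format string, args ...any) {
	fmt.Println("FATAL:", fmt.Sprintf(format, args...))
}

func main() {
	// The phase64a-style subset: catalog only.
	s := Assemble(printT{}, Config{Valid: true}, AssembleOpts{SkipAuth: true, SkipTransports: true, SkipSteering: true})
	fmt.Println(s.Auth == nil, s.Catalog != nil)
	Assemble(printT{}, Config{}, AssembleOpts{})
}
```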

§13 primitive-with-consumer. The helper (primitive) ships with four consumers in the same PR. No deferred consumer.


D-095 — tools.oauth_providers[] operator config + OAuth provider driver registry (closes #116, closes D-090's deferral)

Date: 2026-05-16 Status: Settled (shipping with this PR)

Where it lives: internal/config/config.go (ToolsConfig.OAuthProviders + ToolOAuthProviderConfig + ToolsConfig.OAuthTokenKEKEnv); internal/config/validate.go (allowedOAuthDrivers + the per-provider and cross-validation rules); internal/tools/auth/registry.go (driver registry); internal/tools/auth/drivers/oauth2/oauth2.go (V1 default driver); cmd/harbor/cmd_dev.go::applyToolCatalogWiring (the boundary that walks the config and populates the catalog builder's Deps.OAuthProviders); cmd/harbor/main.go (blank-import); examples/dev.yaml (operator-facing block).

Decision. OAuth provider construction now flows from the operator config: tools.oauth_providers[] declares named providers; each entry resolves to a driver via the §4.4 registry pattern (internal/tools/auth/drivers/<name>/). The V1 default driver is oauth2 — generic OAuth2/PKCE Authorization Code flow. cmd/harbor/cmd_dev.go::applyToolCatalogWiring walks the config and populates the catalog Builder's Deps.OAuthProviders map.

Why. D-090 deferred the provider-construction surface ("a tools.oauth_providers block lands in a later phase"). That gap meant any operator declaring tools.entries[].oauth got a fail-loud at boot — correct but useless: there was no way to actually configure a provider. Issue #116 from the Wave 11 §17.5 audit pinned the gap.

How to apply.

  • New OAuth flow types add a driver under internal/tools/auth/drivers/<name>/ following the §4.4 seam pattern: self-register via init() → auth.MustRegister(name, New) and add the name to internal/config/validate.go's allowedOAuthDrivers allowlist in the same PR.
  • Operators declare providers in harbor.yaml under tools.oauth_providers[]; each entry references its driver by name and uses env-var indirection for client_id_env / client_secret_env (§7 rule 2 — never hardcoded).
  • The KEK for AES-256-GCM token encryption at rest comes from one operator env var named in tools.oauth_token_kek_env. The dev stack constructs ONE shared auth.TokenStore + auth.Sealer and passes them into every factory call via auth.FactoryDeps.
  • Identity propagates through every provider call per §6; the registry never accepts a request without a triple.
  • Credentials enter via env-var indirection (client_id_env, client_secret_env); never hardcoded, never logged (§7).
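The driver seam described above follows the usual Go registry shape: a mutex-guarded map of factories, a MustRegister for init() functions, and a Resolve that fails loud on unknown names. The sketch is hypothetical throughout. Its Factory signature omits FactoryDeps (the shared TokenStore and Sealer), and Provider is reduced to a Name method. The "oauth2" init() lives in the same file only so the sketch is self-contained; in the layout above it would sit in its own driver package, pulled in by main.go's blank import. Identity propagation is not modelled.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// ProviderConfig is a hypothetical boundary type built from one
// tools.oauth_providers[] entry.
type ProviderConfig struct {
	Name            string
	Driver          string
	ClientIDEnv     string
	ClientSecretEnv string
}

// Provider is a minimal stand-in for the real OAuth provider interface.
type Provider interface{ Name() string }

// Factory builds a provider from config. The real factory also receives
// FactoryDeps (shared TokenStore + Sealer); omitted here.
type Factory func(ctx context.Context, cfg ProviderConfig) (Provider, error)

var ErrUnknownDriver = errors.New("auth: unknown OAuth driver")

var (
	registryMu sync.RWMutex
	factories  = map[string]Factory{}
)

// Register adds a driver factory; duplicate or empty names are errors.
func Register(name string, f Factory) error {
	registryMu.Lock()
	defer registryMu.Unlock()
	if name == "" || f == nil {
		return errors.New("auth: driver name and factory are required")
	}
	if _, dup := factories[name]; dup {
		return fmt.Errorf("auth: driver %q already registered", name)
	}
	factories[name] = f
	return nil
}

// MustRegister panics on error; it is meant for driver init() functions,
// where a duplicate name is a programming error.
func MustRegister(name string, f Factory) {
	if err := Register(name, f); err != nil {
		panic(err)
	}
}

// Resolve looks up the configured driver and invokes its factory.
func Resolve(ctx context.Context, cfg ProviderConfig) (Provider, error) {
	registryMu.RLock()
	f, ok := factories[cfg.Driver]
	registryMu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("%w: %q (provider %q)", ErrUnknownDriver, cfg.Driver, cfg.Name)
	}
	return f(ctx, cfg)
}

// What a drivers/oauth2 package might contain: a provider type and a
// self-registering init().
type oauth2Provider struct{ name string }

func (p *oauth2Provider) Name() string { return p.name }

func init() {
	MustRegister("oauth2", func(_ context.Context, cfg ProviderConfig) (Provider, error) {
		if cfg.ClientIDEnv == "" || cfg.ClientSecretEnv == "" {
			return nil, errors.New("oauth2: client_id_env and client_secret_env are required")
		}
		return &oauth2Provider{name: cfg.Name}, nil
	})
}

// resolveByDriver is a demo helper; the env-var names are hypothetical.
func resolveByDriver(driver string) (string, error) {
	p, err := Resolve(context.Background(), ProviderConfig{
		Name: "github", Driver: driver,
		ClientIDEnv: "GITHUB_CLIENT_ID", ClientSecretEnv: "GITHUB_CLIENT_SECRET",
	})
	if err != nil {
		return "", err
	}
	return p.Name(), nil
}

func main() {
	fmt.Println(resolveByDriver("oauth2"))
	_, err := resolveByDriver("saml")
	fmt.Println(err, errors.Is(err, ErrUnknownDriver))
}
```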

Acceptance.

  • internal/config schema declares OAuthProviders[] + OAuthTokenKEKEnv with a validator that rejects unknown drivers, duplicate names, empty env-var fields, the missing-KEK-env-when-providers-set case, and unresolved entries[].oauth.provider references.
  • internal/tools/auth adds the Factory type + Register / MustRegister / Resolve registry + ProviderConfig boundary type + FactoryDeps.
  • internal/tools/auth/drivers/oauth2/ ships the V1 default driver with a fail-loud constructor (empty client_id / client_secret / endpoints / redirect_url all return typed errors) and a D-025 concurrent-reuse test (N≥128 concurrent invocations under -race).
  • cmd/harbor/cmd_dev.go::applyToolCatalogWiring populates the map from config; the function's godoc no longer carries the "Phase 64a does NOT yet construct OAuth providers" deferral note.
  • cmd/harbor/main.go blank-imports _ "github.com/hurtener/Harbor/internal/tools/auth/drivers/oauth2".
  • examples/dev.yaml documents the block with one realistic GitHub entry.
  • scripts/smoke/phase-64a.sh asserts both the unknown-provider error path (harbor validate rejects with the unknown-provider name in the message) and the missing-KEK-env error path.
  • D-090's "Deferred" note about OAuth provider construction is now closed by D-095 (this entry); D-090 itself is left untouched as a historical record.
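The validator rules in the first acceptance bullet can be sketched as one pass that collects every violation with its field path. This is a minimal sketch under hypothetical assumptions. The config shapes are trimmed stand-ins for internal/config. The env-var names are placeholders. Aggregating errors with errors.Join is this sketch's choice; the entry above does not say whether the real validator aggregates or stops at the first error.

```go
package main

import (
	"errors"
	"fmt"
)

// allowedOAuthDrivers mirrors the registry's driver names.
var allowedOAuthDrivers = map[string]bool{"oauth2": true}

// Hypothetical trimmed config shapes.
type OAuthProviderConfig struct {
	Name, Driver, ClientIDEnv, ClientSecretEnv string
}

type ToolEntry struct {
	Name          string
	OAuthProvider string // "" when the entry has no oauth block
}

type ToolsConfig struct {
	OAuthProviders   []OAuthProviderConfig
	OAuthTokenKEKEnv string
	Entries          []ToolEntry
}

// validateTools collects every violation, each naming its field path.
func validateTools(c ToolsConfig) error {
	var errs []error
	declared := map[string]bool{}
	for i, p := range c.OAuthProviders {
		path := fmt.Sprintf("tools.oauth_providers[%d]", i)
		switch {
		case p.Name == "":
			errs = append(errs, fmt.Errorf("%s.name: must not be empty", path))
		case declared[p.Name]:
			errs = append(errs, fmt.Errorf("%s.name: duplicate provider %q", path, p.Name))
		default:
			declared[p.Name] = true
		}
		if !allowedOAuthDrivers[p.Driver] {
			errs = append(errs, fmt.Errorf("%s.driver: unknown driver %q", path, p.Driver))
		}
		if p.ClientIDEnv == "" {
			errs = append(errs, fmt.Errorf("%s.client_id_env: must not be empty", path))
		}
		if p.ClientSecretEnv == "" {
			errs = append(errs, fmt.Errorf("%s.client_secret_env: must not be empty", path))
		}
	}
	if len(c.OAuthProviders) > 0 && c.OAuthTokenKEKEnv == "" {
		errs = append(errs, errors.New("tools.oauth_token_kek_env: required when tools.oauth_providers is set"))
	}
	for i, e := range c.Entries {
		if e.OAuthProvider != "" && !declared[e.OAuthProvider] {
			errs = append(errs, fmt.Errorf("tools.entries[%d].oauth.provider: unknown provider %q", i, e.OAuthProvider))
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

// goodConfig is a hypothetical valid configuration.
func goodConfig() ToolsConfig {
	return ToolsConfig{
		OAuthProviders: []OAuthProviderConfig{{
			Name: "github", Driver: "oauth2",
			ClientIDEnv: "GITHUB_CLIENT_ID", ClientSecretEnv: "GITHUB_CLIENT_SECRET",
		}},
		OAuthTokenKEKEnv: "HARBOR_OAUTH_KEK",
		Entries:          []ToolEntry{{Name: "gh_issues", OAuthProvider: "github"}},
	}
}

func main() {
	bad := goodConfig()
	bad.OAuthTokenKEKEnv = ""
	bad.Entries = []ToolEntry{{Name: "gh_issues", OAuthProvider: "gitlab"}}
	fmt.Println(validateTools(bad))
}
```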

§13 primitive-with-consumer. The primitive (registry + iface + driver) and its first consumer (cmd/harbor/cmd_dev.go::applyToolCatalogWiring populating the catalog builder's Deps.OAuthProviders) ship in the same PR. The smoke surface exercises both the validator (pre-boot) and the boot path (harbor dev constructs the providers from config or fails loud).

Source-binding scope (V1 simplification). Phase 30's (*auth.Provider).Token(ctx, source) API is keyed by tools.ToolSourceID. The V1 oauth2 driver constructs ONE *Provider per tools.oauth_providers[] entry with a single OAuthConfig whose Source = ToolSourceID(cfg.Name). The catalog wrapper (internal/tools/catalog.WrapWithOAuth) passes the underlying tool's source ID, which may not match the provider name; the driver transparently retargets every Token / Revoke / InitiateFlow call onto the operator-configured source. Future per-vendor drivers (e.g. google-workspace, github-app) may implement more sophisticated multi-source mappings; the V1 default keeps the operator's mental model simple: one provider declaration → one OAuth attachment.
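The retargeting described above is a thin decorator. It ignores the source ID the caller passes and always looks up the operator-configured one. The sketch shows only Token; Revoke and InitiateFlow would follow the same shape. tokenSource, mapTokens, retargeted, viaProvider and direct are hypothetical names, and the source ID "mcp:github-issues" is a placeholder.

```go
package main

import (
	"context"
	"fmt"
)

type ToolSourceID string

// tokenSource is a trimmed, hypothetical view of a provider keyed by source.
type tokenSource interface {
	Token(ctx context.Context, source ToolSourceID) (string, error)
}

// mapTokens is an in-memory token table keyed by source ID.
type mapTokens map[ToolSourceID]string

func (m mapTokens) Token(_ context.Context, source ToolSourceID) (string, error) {
	tok, ok := m[source]
	if !ok {
		return "", fmt.Errorf("no token for source %q", source)
	}
	return tok, nil
}

// retargeted ignores the caller's source and always uses the configured one,
// i.e. ToolSourceID(cfg.Name).
type retargeted struct {
	inner  tokenSource
	source ToolSourceID
}

func (r retargeted) Token(ctx context.Context, _ ToolSourceID) (string, error) {
	return r.inner.Token(ctx, r.source)
}

// store holds one token under the provider name "github" (hypothetical).
var store = mapTokens{"github": "tok-123"}

// viaProvider models the wrapper calling through the retargeting driver.
func viaProvider(callerSource string) (string, error) {
	p := retargeted{inner: store, source: ToolSourceID("github")}
	return p.Token(context.Background(), ToolSourceID(callerSource))
}

// direct models the same call without retargeting.
func direct(callerSource string) (string, error) {
	return store.Token(context.Background(), ToolSourceID(callerSource))
}

func main() {
	fmt.Println(viaProvider("mcp:github-issues"))
	fmt.Println(direct("mcp:github-issues"))
}
```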


D-096 — PauseResumedPayload.Decision typed marker (closes #113)

Date: 2026-05-16 Status: Settled (binding — landed in the PR that closes issue #113)

Where it lives: internal/runtime/pauseresume/decision.go (the new typed enum), internal/runtime/pauseresume/events.go (the Decision Decision field on PauseResumedPayload), internal/runtime/pauseresume/coordinator.go (the extended Resume signature), RFC-001-Harbor.md §3.3 (the canonical-event-shape note).

Decision. internal/runtime/pauseresume.PauseResumedPayload gains a typed Decision field (values: approve, reject, resume, timeout). Coordinator.Resume's signature is extended to take the typed Decision parameter; all in-tree producers (steering.applier.advancePause for RESUME/APPROVE/REJECT controls; approval.ApprovalGate.ResolveApproval for HITL gates; auth.Provider.HandleCallback for OAuth flow completion) populate it. An unknown / empty Decision is rejected loud with the new ErrInvalidDecision sentinel — there is no untyped default. The wave-end E2E (test/integration/wave11_test.go) and any other consumer that previously parsed free-form Reason strings now switches on the typed field.
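A typed enum with no untyped default can be sketched as follows. The Resume signature matches the one in the acceptance list below, while the rest is hypothetical. Decision's underlying string type, the coordinator stub that records emitted payloads instead of publishing to a bus, and the "reason" payload key are all assumptions. The point is that validation happens before anything is emitted, so an empty or unknown Decision never reaches pause.resumed.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Decision is assumed to be a string-backed enum.
type Decision string

const (
	DecisionApprove Decision = "approve"
	DecisionReject  Decision = "reject"
	DecisionResume  Decision = "resume"
	DecisionTimeout Decision = "timeout"
)

var ErrInvalidDecision = errors.New("pauseresume: invalid decision")

// IsValidDecision accepts exactly the four canonical values.
func IsValidDecision(d Decision) bool {
	switch d {
	case DecisionApprove, DecisionReject, DecisionResume, DecisionTimeout:
		return true
	}
	return false
}

type PauseResumedPayload struct {
	Token    string
	Decision Decision // load-bearing, typed
	Reason   string   // human-readable context only
}

// coordinator is a stub that records what it would publish.
type coordinator struct{ emitted []PauseResumedPayload }

func (c *coordinator) Resume(ctx context.Context, token string, decision Decision, payload map[string]any) error {
	if !IsValidDecision(decision) {
		return fmt.Errorf("%w: %q", ErrInvalidDecision, decision)
	}
	reason, _ := payload["reason"].(string) // hypothetical payload key
	c.emitted = append(c.emitted, PauseResumedPayload{Token: token, Decision: decision, Reason: reason})
	return nil
}

func resume(c *coordinator, token string, d Decision) error {
	return c.Resume(context.Background(), token, d, map[string]any{"reason": "operator action"})
}

func isInvalidDecision(err error) bool { return errors.Is(err, ErrInvalidDecision) }

func main() {
	c := &coordinator{}
	fmt.Println(resume(c, "pause-1", DecisionTimeout))
	fmt.Println(resume(c, "pause-2", ""))
	fmt.Println(len(c.emitted))
}
```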

Why. PR #110's wave-end E2E worked around the gap by subscribing to tool.approved / tool.rejected and inferring the resolution kind from the per-tool event type. Wave 11 §17.5 audit (issue #113) pinned that as a §13 violation — overloading the typed event shape against a non-existent typed enum is a "parallel implementation of the same conceptual feature" smell read sideways: the typed enum that should exist gets simulated by tag dispatch on a sibling event. The runtime-level pause.resumed event is the canonical place to learn how a pause terminated; the per-tool events are routing surfaces, not classification surfaces.

Why a new enum (not reusing approval.ApprovalDecision). The approval package already exports ApprovalDecision with values approve / reject / pending. That enum is approval-specific by design — pending is the gate's implicit parked state. The pause/resume Coordinator needs a broader enum that covers tool-side OAuth completion (resume) and deadline-driven resumes (timeout) — neither belongs in an approval-decision vocabulary. Adding resume / timeout to ApprovalDecision would pollute approval-gate code with values that are nonsensical there; defining a parallel "ApprovalDecision vs PauseResumeDecision" split with overlapping approve/reject values would be the §13 two-parallel-implementations smell. The right factoring keeps the gate-internal enum narrow and the coordinator-edge enum broad; the approval gate maps its internal Approve/Reject onto pauseresume.DecisionApprove/DecisionReject at the Coordinator seam.

How to apply. New producers populating a pause.resumed event MUST set Decision. Wire consumers (the Console, third-party clients, integration tests) consume the typed value; do not regress to Reason-string parsing. The Reason field stays for human-readable context; Decision is the load-bearing type. Test stubs that implement pauseresume.Coordinator (e.g. steering.stubCoordinator) update their Resume method signature in the same PR — no parallel "with-Decision / without-Decision" overload.

Acceptance.

  • PauseResumedPayload carries Decision Decision.
  • Coordinator.Resume(ctx, token, decision Decision, payload map[string]any) is the new signature; an unknown/empty Decision is rejected with ErrInvalidDecision.
  • All in-tree producers populate the field: steering.applier.advancePause maps ControlResume/ControlApprove/ControlReject → DecisionResume/DecisionApprove/DecisionReject; approval.ApprovalGate.ResolveApproval maps ApprovalDecision → pauseresume.Decision; auth.Provider.HandleCallback populates DecisionResume.
  • test/integration/wave11_test.go subscribes to pause.resumed and switches on Decision, asserting the typed marker arrives alongside the per-tool events. The per-tool tool.approved / tool.rejected subscriptions are preserved because they carry the Tool name (the per-tool channel is orthogonal to the decision-discrimination channel); only the decision-discrimination workaround the audit flagged is removed.
  • pauseresume.IsValidDecision covers the four canonical values; a new unit test pins that each producer emits the right typed marker on the pause.resumed event.
  • RFC §3.3 documents the typed Decision field as part of the canonical resume event shape.
  • All tests -race green.
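The acceptance shape can be sketched as follows: a steering control maps to a typed Decision, Resume rejects an empty or unknown Decision instead of defaulting it, and a wire consumer branches on the typed field rather than parsing Reason. Everything here is a hypothetical stand-in (Control, controlToDecision, resume, describe). The real signature is Coordinator.Resume(ctx, token, decision, payload).

```go
package main

import (
	"errors"
	"fmt"
)

type Decision string

const (
	DecisionApprove Decision = "approve"
	DecisionReject  Decision = "reject"
	DecisionResume  Decision = "resume"
	DecisionTimeout Decision = "timeout"
)

var ErrInvalidDecision = errors.New("pauseresume: invalid decision")

// Control is a hypothetical stand-in for the steering control kinds.
type Control int

const (
	ControlResume Control = iota
	ControlApprove
	ControlReject
)

// PauseResumedPayload: Decision is load-bearing; Reason is human context only.
type PauseResumedPayload struct {
	Token    string
	Decision Decision
	Reason   string
}

func controlToDecision(c Control) (Decision, error) {
	switch c {
	case ControlResume:
		return DecisionResume, nil
	case ControlApprove:
		return DecisionApprove, nil
	case ControlReject:
		return DecisionReject, nil
	}
	return "", fmt.Errorf("%w: unmapped control %d", ErrInvalidDecision, c)
}

// resume validates the decision before building the event payload.
func resume(token string, d Decision, reason string) (PauseResumedPayload, error) {
	switch d {
	case DecisionApprove, DecisionReject, DecisionResume, DecisionTimeout:
		return PauseResumedPayload{Token: token, Decision: d, Reason: reason}, nil
	}
	return PauseResumedPayload{}, fmt.Errorf("%w: %q", ErrInvalidDecision, d)
}

// describe is a wire consumer: it switches on Decision, never on Reason.
func describe(p PauseResumedPayload) string {
	switch p.Decision {
	case DecisionApprove:
		return "approved"
	case DecisionReject:
		return "rejected"
	case DecisionTimeout:
		return "timed out"
	default:
		return "resumed"
	}
}

func main() {
	d, _ := controlToDecision(ControlReject)
	p, err := resume("tok-1", d, "operator rejected")
	fmt.Println(describe(p), err)
	_, err = resume("tok-2", "", "")
	fmt.Println(errors.Is(err, ErrInvalidDecision))
}
```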

Wave-11 cross-fix bundled. The wave-end test stack previously constructed pauseresume.New() with no WithBus(bus) option, so pause.requested / pause.resumed never landed on the bus — the same gap that motivated the audit's tool.approved / tool.rejected workaround. After PR #120 (D-094) extracted harbortest/devstack.Assemble, the omission moved from buildWave11Stack into the helper; this PR wired the bus into the Coordinator at the helper boundary (search for pauseresume.New(pauseresume.WithBus(bus)) in harbortest/devstack/devstack.go) so every devstack-built test inherits the fix (CLAUDE.md §17.6: integration tests fix what they find, regardless of which phase originally shipped the gap). The matching production wiring landed later as F1 of the Wave 11.5 §17.5 closeout audit — the test-only fix here perpetuated the test↔production divergence until F1 closed it in cmd/harbor/cmd_dev.go::bootDevStack.
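The gap this cross-fix closes can be reproduced in miniature with the functional-options pattern. The sketch assumes a hypothetical Coordinator that only publishes when a bus was supplied via WithBus. Constructing it with no options compiles and runs, but the pause event never reaches any subscriber. The types here are illustrative, not pauseresume's real ones.

```go
package main

import "fmt"

// Bus is a hypothetical minimal publish surface.
type Bus interface {
	Publish(eventType string, payload any)
}

// recordingBus records event types so a test can assert on them.
type recordingBus struct{ events []string }

func (b *recordingBus) Publish(eventType string, _ any) { b.events = append(b.events, eventType) }

// Coordinator is a toy stand-in for the pause/resume coordinator.
type Coordinator struct{ bus Bus }

// Option configures a Coordinator.
type Option func(*Coordinator)

// WithBus wires the event bus in; without it, events are never published.
func WithBus(b Bus) Option { return func(c *Coordinator) { c.bus = b } }

func New(opts ...Option) *Coordinator {
	c := &Coordinator{}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

// RequestPause publishes pause.requested only when a bus was supplied. The
// nil-bus branch is the silent gap: nothing fails, the event just never lands.
func (c *Coordinator) RequestPause(token string) {
	if c.bus != nil {
		c.bus.Publish("pause.requested", token)
	}
}

func main() {
	bus := &recordingBus{}
	New().RequestPause("t1")             // dropped silently
	New(WithBus(bus)).RequestPause("t2") // published
	fmt.Println(bus.events)
}
```

Fixing the option at the helper boundary means every caller of the helper inherits it. A constructor that fails loud on a missing bus would turn the silent branch into a boot error instead.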


D-097 — steering.RunLoop wired into harbor dev + bridges APPROVE/REJECT into ApprovalGate (closes #112 + #114)

Date: 2026-05-16 Status: Settled (shipping with this PR)

Where it lives: internal/runtime/steering/runloop.go (the new WithApprovalGates option); internal/runtime/steering/apply.go (applier.gates + routeThroughGate + wireGateTokenFromPayload + the option-A bridge wiring in advancePause); internal/runtime/steering/bridge_test.go (the in-package bridge tests against real gate + real Coordinator); cmd/harbor/cmd_dev.go::bootDevStack (originally constructed the planner via react.New(llmClient) as a hardcoded V1 default — closed by D-103, which moves the construction onto the internal/planner driver registry; the cmd_dev path now reads planner.Resolve(ctx, cfg.Planner, planner.FactoryDeps{LLM: llmClient})); cmd/harbor/cmd_dev_runloop.go (the new perTaskRunLoopDriver that subscribes to task.spawned and runs the RunLoop per spawned foreground task); cmd/harbor/cmd_dev_runloop_test.go (driver unit tests against real bus + real RunLoop); harbortest/devstack/devstack.go (mirrors the production wiring per D-094 — adds SkipRunLoop + RunLoop + RunLoopDriver fields, constructs both via newDevStackRunLoopDriver; the planner construction also moved to the registry per D-103 with a PlannerOverride test escape hatch); test/integration/wave11_test.go (the in-test runWave11WireBridge ~100 LOC bridge is GONE; replaced with startWave11RunLoopForRun — a small helper that constructs a production RunLoop with a scripted pausing planner so the production bridge fires); RFC-001-Harbor.md §3 (one-paragraph note on the bridge).

Decision. cmd/harbor/cmd_dev.go::bootDevStack now constructs a steering.RunLoop per spawned task: a per-stack RunLoop (shared, concurrent-safe per D-025) is wired with WithApprovalGates(appliedGates), and a new perTaskRunLoopDriver subscribes to task.spawned events bus-wide and launches a goroutine per spawned foreground task that calls runLoop.Run(ctx, spec) with the task's identity quadruple. The RunLoop's drain path observes ControlApprove / ControlReject from the steering inbox; for each event carrying a token key in its wire payload, the new routeThroughGate helper looks up the matching *approval.ApprovalGate in the gates map and calls gate.ResolveApproval(ctx, token, decision, reason). The map is sourced from applyToolCatalogWiring's AppliedGates out-channel (D-090).
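The driver's fan-out can be sketched with a channel standing in for the bus-wide task.spawned subscription. The sketch launches one goroutine per foreground task against a shared run function, standing in for runLoop.Run on the shared RunLoop, and skips background tasks. TaskSpawned, drive and demo are hypothetical names; the real driver reads tasks.TaskSpawnedPayload from bus events.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// TaskSpawned is a hypothetical stand-in for the task.spawned payload.
type TaskSpawned struct {
	TaskID     string
	Background bool
}

// drive consumes spawned events until the channel closes or ctx is cancelled.
// Each foreground task gets its own goroutine calling run; background tasks
// are skipped. drive waits for every launched run before returning.
func drive(ctx context.Context, events <-chan TaskSpawned, run func(context.Context, string)) {
	var wg sync.WaitGroup
	defer wg.Wait()
	for {
		select {
		case <-ctx.Done():
			return
		case ev, ok := <-events:
			if !ok {
				return
			}
			if ev.Background {
				continue
			}
			wg.Add(1)
			go func(id string) {
				defer wg.Done()
				run(ctx, id)
			}(ev.TaskID)
		}
	}
}

// demo feeds two foreground tasks and one background task through drive.
func demo() []string {
	events := make(chan TaskSpawned, 3)
	events <- TaskSpawned{TaskID: "t1"}
	events <- TaskSpawned{TaskID: "bg", Background: true}
	events <- TaskSpawned{TaskID: "t2"}
	close(events)

	var mu sync.Mutex
	var ran []string
	drive(context.Background(), events, func(_ context.Context, id string) {
		mu.Lock()
		defer mu.Unlock()
		ran = append(ran, id)
	})
	return ran
}

func main() {
	fmt.Println(len(demo()), "foreground runs")
}
```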

Why. Two gaps closed at once:

  • #114: steering.RunLoop had zero production consumers — harbor dev advertised itself as a runtime but the planner-step loop was unwired. A start request reached tasks.TaskRegistry.Spawn and the task sat there forever (the wave-11 §17.5 audit finding A3, applied to the broader composition: §13's "test stubs as production defaults on operator-facing seams" amendment, read sideways).
  • #112: a Console operator approving a paused tool got 200 OK from the wire and a steering.applied event, but the gate's pending map never saw the resume — the wrapped tool's Invoke stayed parked forever. PR #110's wave-end E2E worked around this with a ~100-LOC in-test bridge (runWave11WireBridge).

The audit recommended these ship together because A1's bridge is naturally the RunLoop's drain; splitting them creates either a primitive-without-consumer window (Phase 114 ships with no gate resolution) or a parallel ApprovalDispatcher service that duplicates the inbox-drain.

Bridge shape (decided, not re-litigated). Issue #112 named three valid shapes:

  1. RunLoop owns the bridge (DECIDED: this entry's implementation). The drain path RunLoop.Run → applier.applyEvent → applier.advancePause → applier.routeThroughGate is the wiring. internal/runtime/steering imports internal/tools/approval for the gate type. Both are runtime mechanism; the boundary is acceptable.
  2. Gate subscribes to pause.resumed events (rejected — adds bus dependency to every gate; D-096's typed Decision marker makes shape 2 possible later if needed, but YAGNI).
  3. Separate ApprovalDispatcher service (rejected — duplicates RunLoop's inbox drain).

How to apply.

  • New steering-driven side effects (HITL approval, future OAuth callback completion, future A2A INPUT_REQUIRED) follow the same pattern: extend the apply path with a typed look-up, call the relevant resolver, identity flows via ctx.
  • The AppliedGates map handed into RunLoop is the SAME map the catalog Builder populates (no copy, no shadow). Tests that exercise the bridge can inject their own gates via the same surface.
  • harbortest/devstack.Assemble constructs the RunLoop by default; SkipRunLoop: true opts out for tests that don't need it.

Double-resume guard (option A: gate-owned resume). gate.ResolveApproval calls Coordinator.Resume for the gate's pause token (wireToken). The RunLoop's own outstanding pause (token) is a DIFFERENT pause: the planner-side RequestPause token. After the bridge routes through a gate, the subsequent direct Coordinator.Resume call has two possible outcomes. In the usual case it resumes the RunLoop's own pause. The tokens differ, so the two Resumes are independent and safe. When wireToken == token, the planner itself called RequestPause AND wrapped a tool call in an approval gate that shares the token. In that case the direct call would trigger pauseresume.ErrAlreadyResumed. The bridge guards against this case with an explicit wireToken == token early-return after routing. For the common shape (planner runs idle, gate's pause is independent), both pauses resume cleanly in sequence: gate.ResolveApproval first, then the direct path on the RunLoop's token. The wire-side token payload key is the canonical channel. Its absence means the APPROVE/REJECT targets the RunLoop's own pause (OAuth, A2A AUTH_REQUIRED), preserving the pre-D-097 behaviour.
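The guard can be sketched against a toy coordinator in which each token may be resumed exactly once. A direct Resume on wireToken stands in for gate.ResolveApproval. The names coordinator, advance and ErrAlreadyResumed are hypothetical, mirroring the roles described above rather than the real steering code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrAlreadyResumed = errors.New("pause already resumed")

// coordinator is a toy pause registry in which each token resumes at most once.
type coordinator struct {
	mu      sync.Mutex
	resumed map[string]bool
}

func newCoordinator() *coordinator { return &coordinator{resumed: map[string]bool{}} }

func (c *coordinator) Resume(token string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.resumed[token] {
		return ErrAlreadyResumed
	}
	c.resumed[token] = true
	return nil
}

// advance routes one APPROVE/REJECT. A non-empty wireToken means the wire
// payload named a gate, so the gate-owned resume runs first. If the gate
// shares the RunLoop's token, return early instead of resuming the same
// pause twice. With no wireToken, only the RunLoop's own pause is resumed.
func advance(c *coordinator, runToken, wireToken string) error {
	if wireToken != "" {
		if err := c.Resume(wireToken); err != nil {
			return err
		}
		if wireToken == runToken {
			return nil // double-resume guard
		}
	}
	return c.Resume(runToken)
}

func main() {
	fmt.Println(advance(newCoordinator(), "run-1", "gate-1")) // independent pauses
	fmt.Println(advance(newCoordinator(), "shared", "shared")) // guarded
	fmt.Println(advance(newCoordinator(), "run-1", ""))        // RunLoop's own pause only
}
```

Without the early return, the shared-token case would surface ErrAlreadyResumed from the second Resume.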

Identity flow. The bridge runs in the steering apply path's identity ctx (the RunLoop hands runCtx carrying the run's quadruple). gate.ResolveApproval enforces protocolauth.HasScope(ctx, ScopeAdmin) || HasScope(ctx, ScopeConsoleFleet) as defence-in-depth. The Phase 54 Protocol edge already vetted the caller's scope at inbox-Enqueue time via CheckScope; routeThroughGate re-stamps the ctx with WithScopes([ScopeAdmin, ScopeConsoleFleet]) so the gate's check passes. The elevation is scoped to this single ResolveApproval call (a derived ctx, never propagated back to the caller). The Coordinator's identity-tuple scope check is unchanged — the run's triple already matches the gate's pause-identity tuple by construction.
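The scope elevation can be sketched with context.WithValue. The elevated scopes live on a derived ctx passed to a single call, and the caller's ctx never sees them. WithScopes and HasScope here are hypothetical local helpers shaped after the protocolauth calls named above. The scope string values are placeholders.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type scopesKey struct{}

// Placeholder scope names; the real constants live in protocolauth.
const (
	ScopeAdmin        = "admin"
	ScopeConsoleFleet = "console.fleet"
)

func WithScopes(ctx context.Context, scopes []string) context.Context {
	return context.WithValue(ctx, scopesKey{}, scopes)
}

func HasScope(ctx context.Context, want string) bool {
	scopes, _ := ctx.Value(scopesKey{}).([]string)
	for _, s := range scopes {
		if s == want {
			return true
		}
	}
	return false
}

var errForbidden = errors.New("forbidden: admin or console fleet scope required")

// resolveApproval stands in for the gate's defence-in-depth scope check.
func resolveApproval(ctx context.Context) error {
	if !HasScope(ctx, ScopeAdmin) && !HasScope(ctx, ScopeConsoleFleet) {
		return errForbidden
	}
	return nil
}

// routeThroughGate stamps scopes onto a derived ctx that exists only for this
// call; runCtx is never modified, so the elevation cannot leak to the caller.
func routeThroughGate(runCtx context.Context) error {
	elevated := WithScopes(runCtx, []string{ScopeAdmin, ScopeConsoleFleet})
	return resolveApproval(elevated)
}

func demo() (viaBridge, viaRunCtx error) {
	runCtx := context.Background() // no scopes attached
	return routeThroughGate(runCtx), resolveApproval(runCtx)
}

func main() {
	b, r := demo()
	fmt.Println(b, r)
}
```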

§13 primitive-with-consumer. Stage B's primitive (the bridge in routeThroughGate + WithApprovalGates) ships with its consumer (production harbor dev wires it in bootDevStack; the wave-end E2E exercises it WITHOUT the in-test bridge). The new cmd_dev_runloop_test.go pins the driver-side wiring; the new bridge_test.go pins the apply-side routing against the REAL approval gate + REAL Coordinator. No deferred consumer.

Bundled cross-fix — RunLoop lifecycle gotcha. The driver subscribes to task.spawned events with events.Filter{Admin: true} (per CLAUDE.md §6 rule 5's runtime-internal fan-in carve-out — the driver listens across every (tenant, user, session) triple). Per-task RunLoop goroutines inherit the driver's subCtx; driver.Close cancels that ctx and waits for the WaitGroup to drain. Close is idempotent. The task FSM bridge — translating a RunLoop's Finish decision into Mark{Complete,Failed} on the task — was intentionally NOT in this PR; the deferral was filed as issue #123 with the intent to land in Wave 12. Closed by D-098 (the FSM bridge now lives in the same per-task goroutine that calls runLoop.Run; the dispatch-executor framing in this note was abandoned for the simpler driver-direct shape — see D-098's "Why shape 1 over shape 2" section).

Acceptance:

  • steering.RunLoop accepts Gates map[string]*approval.ApprovalGate at construction via WithApprovalGates.
  • cmd/harbor/cmd_dev.go::bootDevStack constructs and drives the RunLoop per spawned task, with the AppliedGates wired.
  • harbortest/devstack.Assemble mirrors the production wiring; SkipRunLoop knob added.
  • test/integration/wave11_test.go drops runWave11WireBridge entirely; the production path is exercised end-to-end.
  • New bridge_test.go asserts the wire-side APPROVE/REJECT routes through the gate against the REAL Coordinator with NO in-test substitution.
  • New cmd_dev_runloop_test.go asserts the driver picks up task.spawned, drives the RunLoop, skips background tasks, drains cleanly on Close.
  • D-090 §"Deferred" recorded the wire-side bridge gap; this entry closes it (D-090 is left untouched as historical record).

D-098 — perTaskRunLoopDriver translates RunLoop exits into TaskRegistry.Mark{Complete,Failed} (closes #123)

Date: 2026-05-17 Status: Settled (shipping with this PR)

Where it lives: cmd/harbor/cmd_dev_runloop.go::perTaskRunLoopDriver.runOne (the per-task goroutine that now also calls MarkRunning before runLoop.Run and MarkComplete / MarkFailed based on the Run exit shape); cmd/harbor/cmd_dev.go::bootDevStack (the perTaskRunLoopDriverOpts.tasks field is wired with the boot's taskReg); cmd/harbor/cmd_dev_runloop_test.go (new tests TestPerTaskRunLoopDriver_FSMBridge_MarksComplete / _MarksFailed_OnPlannerError / _MarksFailed_OnCtxCancel pin the three terminal-state paths); harbortest/devstack/devstack.go (DevStackRunLoopDriver mirrors the production bridge per D-094's helper-tracks-production rule — gains a tasks field and a runOne that calls the same Mark* sequence); test/integration/phase64_task_fsm_bridge_test.go (end-to-end coverage through the devstack helper using the real ReAct planner + mock LLM, asserting the FSM reaches a terminal state).

Decision. cmd/harbor/cmd_dev_runloop.go::perTaskRunLoopDriver now owns the task FSM bridge: the per-task goroutine that calls runLoop.Run ALSO calls tasks.MarkRunning before Run (advancing Pending → Running so the registry's FSM table — which forbids Pending → Complete — accepts the eventual terminal Mark) and translates Run's exit shape into the matching Mark* call after Run returns. Three exit shapes map to three Mark* calls:

  • Run returned nil with Finish.Reason == FinishGoal → MarkComplete(ctx, taskID, TaskResult{}). The result Value is empty at V1; persisting Finish.Payload would require the LLM-edge heavy-content redaction + ArtifactRef shaping (D-026), which is a separate post-V1 concern.
  • Run returned nil with Finish.Reason ∈ {NoPath, Cancelled, DeadlineExceeded, ConstraintsConflict} → MarkFailed with the FinishReason as the error code and a human-readable message naming the reason. The registry FSM has no "no-goal-but-not-failed" status; Failed is the closest terminal match for "the run terminated but did not satisfy the goal."
  • Run returned a non-nil error → MarkFailed with code runloop_error (or cancelled for context.Canceled; see "Cancellation handling" below) and the error string as the message.
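The three exit shapes, including the context.Canceled special case, can be sketched as one classification function. The FinishReason string values, the Outcome type and classify are hypothetical. Only the mapping rules come from this entry. Checking the error before the finish reason matters: a run that errors out has no meaningful Finish.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// FinishReason values are hypothetical spellings of the reasons named above.
type FinishReason string

const (
	FinishGoal                FinishReason = "goal"
	FinishNoPath              FinishReason = "no_path"
	FinishCancelled           FinishReason = "cancelled"
	FinishDeadlineExceeded    FinishReason = "deadline_exceeded"
	FinishConstraintsConflict FinishReason = "constraints_conflict"
)

type Finish struct{ Reason FinishReason }

// Outcome describes the Mark* call the driver would make.
type Outcome struct {
	Complete bool   // true means MarkComplete
	Code     string // MarkFailed error code
	Message  string
}

// classify maps Run's exit shape onto a terminal FSM transition.
func classify(fin Finish, err error) Outcome {
	switch {
	case errors.Is(err, context.Canceled):
		// Forced shutdown: Failed with code cancelled, not StatusCancelled.
		return Outcome{Code: "cancelled", Message: err.Error()}
	case err != nil:
		return Outcome{Code: "runloop_error", Message: err.Error()}
	case fin.Reason == FinishGoal:
		return Outcome{Complete: true}
	default:
		return Outcome{Code: string(fin.Reason), Message: "run ended without reaching its goal: " + string(fin.Reason)}
	}
}

var (
	errCancelledRun = fmt.Errorf("runloop: %w", context.Canceled)
	errPlanner      = errors.New("planner: step failed")
)

func main() {
	fmt.Println(classify(Finish{Reason: FinishGoal}, nil).Complete)
	fmt.Println(classify(Finish{Reason: FinishNoPath}, nil).Code)
	fmt.Println(classify(Finish{}, errCancelledRun).Code)
	fmt.Println(classify(Finish{}, errPlanner).Code)
}
```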

Why. Closes the deliberate carve-out D-097 documented and issue #123 filed. Before this PR a foreground task spawned under harbor dev reached StatusPending, the perTaskRunLoopDriver picked it up and ran the planner to completion, and the task stayed at StatusPending forever — the FSM diverged from reality. A Console operator querying tasks.List(StatusPending) would see tasks accumulate even after their planners had finished cleanly. This is the same shape §13's "test stubs as production defaults" amendment closes one layer up: an operator-facing seam (the task FSM) that ships looking complete but is silently broken on the happy path.

Why shape 1 over shape 2 (driver-direct over bus-driven FSM bridge). Issue #123 named two valid shapes:

  1. perTaskRunLoopDriver calls TaskRegistry.Mark directly (CHOSEN). Driver imports internal/tasks and dispatches on the RunLoop exit. Pro: lives next to existing driver code; the driver already owns the per-task lifecycle (it constructed the goroutine that calls runLoop.Run); extending that goroutine to translate the return value into Mark* is one-step coupling at the same layer. Pro: the driver already imported internal/tasks (it reads tasks.TaskSpawnedPayload from the bus event); adding tasks.TaskRegistry.Mark* is a small additional surface, not a new layer.
  2. Bus-driven FSM bridge (rejected). Would require RunLoop to emit a typed exit event (steering.runloop.finished or equivalent) AND a separate subscriber AND that subscriber owns the task-keyed mapping the driver already has. More moving parts for marginal separation — and the only "win" would be decoupling the driver from internal/tasks, which it is already coupled to.

If shape 1 turns out wrong for a reason a future PR can articulate, the driver-direct path can be lifted into a bus-driven subscriber by introducing the typed exit event then — but until that reason materialises, shape 1 is the simplest correct shape.

How to apply. Future task-lifecycle subscribers (the Console, third-party Protocol clients, audit consumers) consume the canonical task.completed / task.failed events the TaskRegistry already emits when MarkComplete / MarkFailed fires — no new event type needed. The driver's bridge is internal mechanism; the canonical observability surface stays unchanged.

Cancellation handling. context.Canceled is the third terminal shape runLoop.Run can return (driver shutdown OR an explicit cancel of the run's ctx). The registry FSM has no "auto-cancelled by ctx" path — TaskRegistry.Cancel(ctx, id, reason) is the external-caller surface and requires a reason. We map ctx-cancelled runs to MarkFailed with code cancelled and the ctx error string as the message. Rationale: the run did not reach a goal; Failed is the correct terminal state. An operator who wants the deliberate-cancel semantics (which transition to StatusCancelled, not StatusFailed) calls TaskRegistry.Cancel directly — that path routes through the explicit Cancel API and uses StatusCancelled. The driver's ctx-cancel is a forced-shutdown signal (the binary is going down), not a deliberate cancel decision, so collapsing it onto MarkFailed{code=cancelled} is the truthful FSM state.

Mark* failures post-Run are logged, not escalated. If MarkComplete / MarkFailed errors after Run returns (e.g. the task was concurrently transitioned to StatusCancelled via the external Cancel path, or the registry is unhealthy), the driver logs Warn and returns. The per-task goroutine continues; the next spawned task is unaffected. Tearing down the driver on a per-task Mark* failure would be a denial of service: one race with an external Cancel would stop the whole runtime. This is the standard "fail loudly per-call, do not crash the shared artifact" shape D-025 names.

Identity propagation. The driver constructs taskCtx := identity.With(d.subCtx, q.Identity) once per run and passes it to MarkRunning / MarkComplete / MarkFailed. The TaskRegistry rejects calls missing the triple per §6 rule 9 (identity is mandatory); the explicit identity.With here is the same call site §6 mandates for every identity-scoped storage method. The driver does NOT call identity.MustFrom, because d.subCtx is the long-lived subscription ctx and never carries identity; the identity is attached explicitly per run instead.
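The per-run attachment can be sketched with hypothetical stand-ins for identity.With / identity.From and an identity-scoped registry call. Calling the registry on the bare subscription ctx fails. The derived taskCtx carries the triple, and because it is derived from subCtx, cancelling subCtx still cancels it.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type Identity struct{ Tenant, User, Session string }

type identityKey struct{}

// With and From are hypothetical stand-ins for the identity package helpers.
func With(ctx context.Context, id Identity) context.Context {
	return context.WithValue(ctx, identityKey{}, id)
}

func From(ctx context.Context) (Identity, bool) {
	id, ok := ctx.Value(identityKey{}).(Identity)
	return id, ok && id.Tenant != "" && id.User != "" && id.Session != ""
}

var ErrIdentityMissing = errors.New("identity triple missing from ctx")

// markRunning stands in for an identity-scoped registry method.
func markRunning(ctx context.Context, taskID string) error {
	if _, ok := From(ctx); !ok {
		return ErrIdentityMissing
	}
	return nil
}

func demo() (onSubCtx, onTaskCtx error) {
	subCtx, cancel := context.WithCancel(context.Background()) // long-lived; no identity
	defer cancel()
	taskCtx := With(subCtx, Identity{Tenant: "dev", User: "dev", Session: "dev"}) // once per run
	return markRunning(subCtx, "task-1"), markRunning(taskCtx, "task-1")
}

func main() {
	a, b := demo()
	fmt.Println(a, b)
}
```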

§13 primitive-with-consumer. This PR closes a gap, not introduces a primitive. The primitive (TaskRegistry.Mark{Running,Complete,Failed}) already had consumers (the per-driver test suites under internal/tasks/drivers/inprocess/; the conformance suite under internal/tasks/conformancetest/); the new consumer added here is the production driver — the one that was previously missing and whose absence kept the FSM stuck at StatusPending. The "no primitive without its consumer in the same wave" rule is not relevant for this PR (the primitive is from Phase 20).

Bundled invariant — concurrent-reuse contract holds. The driver's per-task goroutines now block on runLoop.Run AND a post-Run Mark* call; both honour d.subCtx. Close cancels subCtx, waits for subLoopWG, then waits for runsWG. The Mark* call uses the SAME ctx-derived taskCtx so a cancelled subCtx cancels the registry call too — the WaitGroup-drain pattern from D-097's lifecycle gotcha still holds. The new TestPerTaskRunLoopDriver_Close_DrainsRunningRuns and _ConcurrentReuse_NoRaceUnderLoad tests pin this; both run under -race.
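The Close ordering can be sketched with a toy driver: cancel subCtx, wait on subLoopWG, then wait on runsWG, all inside a sync.Once so Close is idempotent. The order is not arbitrary. Only the subscription loop calls runsWG.Add. Waiting for that loop to exit first guarantees no Add races with the final runsWG.Wait. The driver type and demo are hypothetical. In the real driver the post-Run Mark* call would also use a ctx derived from subCtx.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// driver is a toy version of the per-task driver's lifecycle.
type driver struct {
	cancel    context.CancelFunc
	subLoopWG sync.WaitGroup // the subscription loop
	runsWG    sync.WaitGroup // per-task run goroutines
	closeOnce sync.Once
}

func newDriver(parent context.Context, spawned <-chan string, run func(context.Context, string)) *driver {
	subCtx, cancel := context.WithCancel(parent)
	d := &driver{cancel: cancel}
	d.subLoopWG.Add(1)
	go func() {
		defer d.subLoopWG.Done()
		for {
			select {
			case <-subCtx.Done():
				return
			case id := <-spawned:
				// Only this goroutine calls runsWG.Add.
				d.runsWG.Add(1)
				go func(id string) {
					defer d.runsWG.Done()
					run(subCtx, id)
				}(id)
			}
		}
	}()
	return d
}

// Close cancels subCtx, waits for the subscription loop, then for in-flight runs.
func (d *driver) Close() {
	d.closeOnce.Do(func() {
		d.cancel()
		d.subLoopWG.Wait()
		d.runsWG.Wait()
	})
}

// demo starts one blocking run, closes twice, and reports whether the run
// observed cancellation before Close returned.
func demo() bool {
	spawned := make(chan string)
	started := make(chan struct{})
	sawCancel := false
	d := newDriver(context.Background(), spawned, func(ctx context.Context, _ string) {
		close(started)
		<-ctx.Done()
		sawCancel = true
	})
	spawned <- "task-1"
	<-started
	d.Close()
	d.Close() // idempotent
	return sawCancel
}

func main() { fmt.Println(demo()) }
```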

Bundled invariant — D-094 helper-tracks-production rule holds. harbortest/devstack/devstack.go was updated in the SAME PR as cmd/harbor/cmd_dev.go. DevStackRunLoopDriver now carries a tasks field and a runOne method that mirrors the production bridge. Skipping the helper update would have left the wave-end E2E (and every future integration test) silently divergent from production — the F1 failure mode §17.6 explicitly calls out.

Acceptance:

  • perTaskRunLoopDriverOpts.tasks is mandatory; newPerTaskRunLoopDriver fails loud on nil.
  • runOne calls MarkRunning before runLoop.Run and MarkComplete / MarkFailed based on the Run exit shape (FinishGoal → Complete; other FinishReason → Failed with the reason as the code; non-nil error → Failed with runloop_error or cancelled).
  • bootDevStack wires taskReg into the driver opts.
  • harbortest/devstack.Assemble mirrors the production wiring — DevStackRunLoopDriver gains a tasks field, the constructor fails loud on nil, and the per-task goroutine runs the same bridge.
  • New unit tests pin the three terminal-state paths: _MarksComplete, _MarksFailed_OnPlannerError, _MarksFailed_OnCtxCancel. Each asserts on reg.Get(taskID).Status reaching the expected terminal.
  • New integration test TestTaskFSMBridge_ProductionPath_ReachesTerminalState exercises the bridge end-to-end through devstack.Assemble with a real ReAct planner + mock LLM, asserting the FSM reaches a terminal state within a bounded timeout.
  • Issue #123 closed; D-097's "deliberate carve-out" note updated to point at D-098.
  • All tests -race green; make preflight PASS; make drift-audit clean; make check-mirror clean; npx markdownlint-cli2 docs/decisions.md 0 errors.

D-099 — harbor dev hot-reload supervisor (fsnotify-driven graceful-drain restart, Phase 65)

Date: 2026-05-17 Status: Settled (shipping with this PR)

Where it lives: cmd/harbor/cmd_dev_hot_reload.go (the supervisor + watcher + bus emission); cmd/harbor/cmd_dev.go::runDev (hands off to the supervisor when hot-reload is enabled; adds --no-hot-reload flag); internal/config/config.go (CLIConfig.DevHotReload + DevHotReloadConfig + policy constants); internal/config/loader.go::defaults (CLI.DevHotReload defaults); internal/config/validate.go::validateCLI (the new validator); cmd/harbor/cmd_dev_hot_reload_test.go (unit + in-package integration tests); harbortest/devstack/devstack.go (godoc carve-out note); scripts/smoke/phase-65.sh; docs/plans/phase-65-harbor-dev-hot-reload.md; go.mod (github.com/fsnotify/fsnotify v1.10.1).

Decision. harbor dev boots an fsnotify-driven hot-reload supervisor that wraps bootDevStack and serves the active devStack until a watched file changes OR ctx cancels. On a file change the supervisor: (1) emits dev.hot_reload.triggered on the active bus; (2) drains the active stack per cli.dev_hot_reload.policy (drain / cancel); (3) calls bootDevStack again with the original boot opts; (4) starts a fresh serve goroutine against the new stack; (5) emits dev.hot_reload.completed on the new bus. The supervisor owns the serve loop AND the rebuild loop in one goroutine; a debounced (250ms) channel collapses fsnotify event bursts so an editor save fires one rebuild, not N.
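The debounce step can be sketched without fsnotify itself, using a plain string channel as the event source. Each new event restarts the quiet-period timer. A single value, the latest path, is emitted once the window passes with no further events. The debounce function is a hypothetical illustration of the technique, not the supervisor's code. The demo uses a 50ms window instead of the 250ms default to keep it fast.

```go
package main

import (
	"fmt"
	"time"
)

// debounce emits the latest path once in has been quiet for window, so a burst
// of events from one editor save yields a single signal. A pending value is
// flushed when in closes; out is then closed.
func debounce(in <-chan string, window time.Duration) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		var (
			fire    <-chan time.Time // nil until an event arrives; a nil channel never fires
			pending string
		)
		for {
			select {
			case p, ok := <-in:
				if !ok {
					if fire != nil {
						out <- pending
					}
					return
				}
				pending = p
				fire = time.After(window) // restart the quiet-period clock
			case <-fire:
				out <- pending
				fire = nil
			}
		}
	}()
	return out
}

// demo sends a five-event burst, waits past the window, then one more event.
func demo() []string {
	in := make(chan string)
	out := debounce(in, 50*time.Millisecond)
	go func() {
		for i := 0; i < 5; i++ {
			in <- ".harbor/agents/a.yaml"
		}
		time.Sleep(250 * time.Millisecond)
		in <- "harbor.yaml"
		close(in)
	}()
	var got []string
	for p := range out {
		got = append(got, p)
	}
	return got
}

func main() { fmt.Println(demo()) }
```

Replacing the channel with time.After on every event, rather than resetting one Timer, sidesteps the stale-tick pitfall of Timer.Reset on older Go versions.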

Hot-reload shape: in-process devStack rebuild (NOT binary re-exec). The §4.3 "smaller approach that still satisfies acceptance" carve-out applies. Acceptance criterion 1 is "new code picked up"; at dev-time granularity (operator edits a file, restart picks up the change), the in-process rebuild satisfies it for every config / scaffold change. Binary re-exec was considered and rejected:

  • It requires an out-of-process supervisor (the binary cannot re-exec itself without losing the current http.Server's connections).
  • It costs a Go build per cycle (~5s on a warm machine) — the developer feedback loop is the load-bearing UX here.
  • An operator iterating on a YAML config file does NOT need a binary rebuild; an operator iterating on Go source rebuilds + re-launches the binary manually (the same cycle they'd run today without hot-reload).

A binary-rebuild path can be layered on as a future opt-in (policy: rebuild or similar) without changing the supervisor's shape — the rebuild step becomes "run go build, then re-exec" instead of "re-call bootDevStack". V1 ships the in-process shape only.

Why. Phase 64 (harbor dev v1) closed the embedded-runtime + Protocol boot path; the §13 amendment for the LLM seam landed there. The remaining dev-loop UX item from RFC §8 is hot-reload: "watches the project directory for changes, hot-reloads on Go-source changes (graceful-stop in-flight runs first; configurable)." Without it, an operator iterating on a scaffolded agent kills + restarts harbor dev per change — slow, kills SSE subscribers, drops the dev token. The supervisor closes this surface.

Why fsnotify (RFC §10 confirmation). fsnotify is the de-facto pure-Go FS-watching library (no CGo, no platform-specific extensions in the consuming code; the library handles inotify on Linux, kqueue on macOS and the BSDs, ReadDirectoryChangesW on Windows). It is the implicit RFC §10 candidate that brief 06 §7 item 10 already named ("fsnotify watcher"). It is added to go.mod as a direct dependency (it was previously an indirect dependency via cobra/viper, but cobra alone does not pull it in).

How to apply.

  • Operators declare hot-reload behaviour under cli.dev_hot_reload in harbor.yaml: enabled (default true), policy (drain / cancel / disabled; default drain), drain_timeout (default 5s), watch_roots (default [".harbor/agents"]).
  • The CLI flag --no-hot-reload is the operator-facing escape hatch — overrides the config's enabled to false for that boot.
  • Wire consumers (the Console, integration tests, third-party Protocol clients) subscribe to dev.hot_reload.triggered / dev.hot_reload.completed on the canonical bus to observe the restart cycle. The events carry the dev identity triple (tenant=dev, user=dev, session=dev) so subscribers consume them via the standard triple-scoped filter or via the §6 rule 5 admin path.
  • A reboot failure returns up to runDev; the operator sees a CLIError with code boot_internal_error. Per §13, the supervisor does NOT silently degrade to "stay on the old stack" on a reboot failure — the old stack is already drained at that point and the operator's intent (pick up the new state) is unsatisfiable.
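The config surface above could be modelled roughly as follows. The struct, its Go field names, applyFlags and the specific validation rules are assumptions for illustration. Only the YAML key names, the policy values and the defaults come from this entry. The flag is modelled as one-directional: it can only switch hot-reload off.

```go
package main

import (
	"fmt"
	"time"
)

// DevHotReloadConfig is a hypothetical Go shape for the cli.dev_hot_reload block.
type DevHotReloadConfig struct {
	Enabled      bool          `yaml:"enabled"`
	Policy       string        `yaml:"policy"` // drain | cancel | disabled
	DrainTimeout time.Duration `yaml:"drain_timeout"`
	WatchRoots   []string      `yaml:"watch_roots"`
}

// defaultDevHotReload returns the defaults listed in this entry.
func defaultDevHotReload() DevHotReloadConfig {
	return DevHotReloadConfig{
		Enabled:      true,
		Policy:       "drain",
		DrainTimeout: 5 * time.Second,
		WatchRoots:   []string{".harbor/agents"},
	}
}

// Validate rejects unknown policies and non-positive drain timeouts
// (illustrative rules, not the real validateCLI).
func (c DevHotReloadConfig) Validate() error {
	switch c.Policy {
	case "drain", "cancel", "disabled":
	default:
		return fmt.Errorf("cli.dev_hot_reload.policy: unknown value %q", c.Policy)
	}
	if c.DrainTimeout <= 0 {
		return fmt.Errorf("cli.dev_hot_reload.drain_timeout: must be positive, got %s", c.DrainTimeout)
	}
	return nil
}

// applyFlags applies --no-hot-reload, which can only switch hot-reload off.
func applyFlags(c DevHotReloadConfig, noHotReload bool) DevHotReloadConfig {
	if noHotReload {
		c.Enabled = false
	}
	return c
}

func main() {
	c := applyFlags(defaultDevHotReload(), true)
	fmt.Println(c.Enabled, c.Validate())
}
```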

Bus events — canonical shapes. Both payloads are SafePayload by construction (every field is internal bookkeeping; no secrets).

```text
EventTypeDevHotReloadTriggered = "dev.hot_reload.triggered"
DevHotReloadTriggeredPayload { Path string; Op string; Policy string }

EventTypeDevHotReloadCompleted = "dev.hot_reload.completed"
DevHotReloadCompletedPayload {
    Path string; Op string; Policy string;
    DurationMS int64; Success bool; ErrorMessage string
}
```

§13 primitive-with-consumer. The primitive (the supervisor + the canonical bus events) ships with its first consumer in the same PR: runDev constructs and runs the supervisor when hot-reload is enabled. The TestHotReloadSupervisor_FileChangeTriggersRebuild test is the first wire-side consumer of the canonical events — it subscribes to dev.hot_reload.triggered against a real bus and asserts the typed payload arrives. No deferred consumer; the supervisor is dead code without runDev's integration.

§13 silent-degradation discipline. Three fail-loud surfaces in the supervisor:

  1. fsnotify.NewWatcher errors fail the boot loud.
  2. watcher.Add(path) errors on a watch root fail the boot loud — except os.ErrNotExist, which is logged Info and skipped (the default .harbor/agents does not exist for first-time projects; failing-loud there would block the dev loop on day one).
  3. A reboot failure propagates up; the supervisor exits with the wrapped error. The operator sees the CLIError. No "silently keep the old stack" fallback.
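Surface 2 can be sketched with an injected add function standing in for the watcher's Add method, so the example does not depend on fsnotify. A root whose error wraps os.ErrNotExist is logged at Info and skipped. Any other error is wrapped and returned. addRoots and demo are hypothetical names. The sketch does not claim how fsnotify itself reports a missing path.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
	"os"
)

// addRoots registers each watch root through add. A root that does not exist
// is logged at Info and skipped; any other error aborts the boot (fail loud).
func addRoots(roots []string, add func(string) error, log *slog.Logger) error {
	for _, root := range roots {
		err := add(root)
		switch {
		case err == nil:
			// now watching root
		case errors.Is(err, os.ErrNotExist):
			log.Info("hot-reload watch root missing; skipping", "path", root)
		default:
			return fmt.Errorf("hot-reload: watch %q: %w", root, err)
		}
	}
	return nil
}

func demo() (missing, denied error) {
	log := slog.New(slog.NewTextHandler(os.Stderr, nil))
	// os.Stat reports a missing path with an error wrapping os.ErrNotExist.
	statAdd := func(p string) error { _, err := os.Stat(p); return err }
	missing = addRoots([]string{"no-such-watch-root-for-demo"}, statAdd, log)
	denied = addRoots([]string{"locked"}, func(string) error { return os.ErrPermission }, log)
	return missing, denied
}

func main() {
	m, d := demo()
	fmt.Println(m, d)
}
```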

D-094 helper-tracks-production carve-out. The supervisor wraps bootDevStack at the runDev layer (not inside bootDevStack itself). The harbortest/devstack helper mirrors bootDevStack per D-094's source-of-truth invariant, NOT the surrounding supervisor: a "helper that owns the rebuild loop" would duplicate the cmd-side orchestrator with no integration test consuming it. The devstack package's godoc documents this scope choice explicitly so the next contributor doesn't read the omission as drift. When the supervisor's shape next changes, both files are revisited together — same precedent as D-094's invariant, applied to a scope where the helper deliberately does NOT mirror.

Concurrency contract. The supervisor's Run method runs ONE goroutine that owns both the fsnotify event loop and the serve-goroutine spawning. A second serve goroutine is spawned per rebuild (against the new stack); the previous serve goroutine is cancelled before the rebuild starts and drained via the shared serveErr channel. At any instant exactly one serve goroutine is active (or zero, during a rebuild). Per CLAUDE.md §5 "Concurrent reuse contract": the supervisor IS a per-boot artifact (not shared across boots), so D-025's "N concurrent invocations against a single shared instance" does not apply — the relevant test is the lifecycle-drain test, which TestHotReloadSupervisor_CtxCancel_ReturnsCleanly covers.
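A stripped-down version of that contract follows. Function arguments stand in for bootDevStack and the serve loop, and a reload channel stands in for the debounced fsnotify signal. On reload the live serve goroutine is cancelled and drained through serveErr before boot runs again, so at most one serve goroutine is ever live. supervise and demo are hypothetical. Unlike the real supervisor, this sketch creates a fresh serveErr channel per cycle rather than sharing one.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// supervise boots once, then serves until ctx is cancelled, rebuilding on each
// reload signal. A boot error is returned as-is: there is no fallback to the
// already-drained old stack.
func supervise(ctx context.Context, reload <-chan struct{}, boot func() error, serve func(context.Context) error) error {
	if err := boot(); err != nil {
		return err
	}
	for {
		serveCtx, cancelServe := context.WithCancel(ctx)
		serveErr := make(chan error, 1)
		go func() { serveErr <- serve(serveCtx) }()

		select {
		case <-ctx.Done():
			cancelServe()
			<-serveErr
			return nil
		case err := <-serveErr:
			cancelServe()
			return err
		case <-reload:
			cancelServe()
			<-serveErr // the old serve goroutine has exited
			if err := boot(); err != nil {
				return fmt.Errorf("reboot: %w", err)
			}
		}
	}
}

// demo triggers two reloads and records the peak number of live serves.
func demo() (peakLive, boots int32, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	reload := make(chan struct{})
	var live, peak, booted atomic.Int32
	serve := func(ctx context.Context) error {
		n := live.Add(1)
		for {
			p := peak.Load()
			if n <= p || peak.CompareAndSwap(p, n) {
				break
			}
		}
		<-ctx.Done()
		live.Add(-1)
		return nil
	}
	boot := func() error { booted.Add(1); return nil }

	done := make(chan error, 1)
	go func() { done <- supervise(ctx, reload, boot, serve) }()
	reload <- struct{}{}
	reload <- struct{}{}
	cancel()
	err = <-done
	return peak.Load(), booted.Load(), err
}

func main() { fmt.Println(demo()) }
```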

Acceptance:

  • cli.dev_hot_reload config block + loader defaults + validator land in internal/config.
  • cmd/harbor/cmd_dev_hot_reload.go implements the supervisor; cmd/harbor/cmd_dev.go::runDev constructs and runs it when enabled.
  • --no-hot-reload flag added to harbor dev.
  • dev.hot_reload.triggered / dev.hot_reload.completed registered as canonical event types.
  • Unit tests + in-package integration test under cmd/harbor/cmd_dev_hot_reload_test.go.
  • scripts/smoke/phase-65.sh asserts the watcher log line + the --no-hot-reload flag + the canonical event-type strings in the binary.
  • docs/plans/phase-65-harbor-dev-hot-reload.md documents the phase per §16.
  • docs/plans/README.md + README.md flip Phase 65 to Shipped.
  • harbortest/devstack/devstack.go godoc documents the supervisor-scope carve-out.
  • go.mod adds github.com/fsnotify/fsnotify v1.10.1 as a direct dependency.
  • All tests -race green; make preflight PASS; make drift-audit clean; make check-mirror clean; npx markdownlint-cli2 docs/decisions.md 0 errors.

D-100 — Phase 66 harbor dev draft-save scaffolding: /v1/dev/drafts/ over the existing dev mux + identity-scoped on-disk layout + Phase 67 scaffold engine round-trip

Date: 2026-05-17 Status: Settled (shipping with this PR)

Where it lives: internal/devdraft/ (new package — devdraft.go Store + sentinels, events.go 5 EventTypes + SafePayload structs, http.go Handler + wire shapes + error mapping, path_safety.go §7 rule 5 helper mirror of internal/skills/importer/path_safety.go, devdraft_test.go + http_test.go + concurrent_test.go); cmd/harbor/cmd_dev.go::bootDevStack (the draftStore + draftHandler are constructed and mounted on the dev router at devdraft.RoutePrefix under the same auth.Middleware wrapper the Phase 60 transports use); harbortest/devstack/devstack.go (D-094 mirror — Assemble always constructs a DraftStore and mounts the handler when transports are enabled); test/integration/phase66_draft_save_test.go (cross-subsystem E2E through the devstack helper); scripts/smoke/phase-66.sh (live-binary smoke).

Decision. The draft scratchpad surface is an HTTP-only subsystem mounted on the existing harbor dev server at /v1/dev/drafts/. The on-disk layout is <root>/<tenant>/<user>/<session>/<draft_id>/ where root is <cwd>/.harbor/drafts by default; the layout is identity-scoped so concurrent harbor dev instances against the same operator working directory cannot collide. The save path round-trips through the Phase 67 scaffold engine: Store.Create invokes scaffold.Scaffold to seed the draft tree (so a freshly-created draft IS a harbor scaffold-shaped output); Store.Save runs internal/config.Load + Validate against the rendered harbor.yaml BEFORE any promoted output is written, then copies the draft tree byte-for-byte to the operator-supplied output dir (preserving the operator's PATCH edits). The HTTP surface is six endpoints:

  • POST /v1/dev/drafts/ — create a fresh draft from a name + template.
  • GET /v1/dev/drafts/{id} — list files + content for the Console editor.
  • PATCH /v1/dev/drafts/{id}/files/{path} — write a file's content.
  • POST /v1/dev/drafts/{id}/preview — validation-only dry-run.
  • POST /v1/dev/drafts/{id}/save — promote to scaffold layout.
  • DELETE /v1/dev/drafts/{id} — discard (idempotent).

Five lifecycle events land on the canonical event bus per round-trip — dev.draft.created, dev.draft.updated, dev.draft.previewed, dev.draft.saved, dev.draft.discarded — registered with internal/events's exhaustive registry at init(); SafePayload by construction (no file contents on the bus).

Why HTTP-only at V1 (no harbor dev draft ... sub-CLI). The intended V1 consumer is the Console editor that lands in Phases 72–75; a CLI surface would add operator friction without solving a load-bearing use case at V1 (operators who want to script the flow can curl the JSON surface directly; the dev-token surface already supports it). A future PR can add a sub-CLI in cmd/harbor/cmd_dev_draft.go against the same Store without breaking the wire contract.

Why identity-scoped on disk (and not via opaque ULIDs alone). The operator's working directory is a shared filesystem surface: two concurrent harbor dev instances bound to different (tenant, user, session) triples MUST be able to author drafts under the same .harbor/drafts/ root without seeing each other's work. The <tenant>/<user>/<session>/ subpath enforces isolation at the filesystem layer — Store.Get cannot return another identity's draft because the path is composed from the identity before any file open. CLAUDE.md §6 rule 2 ("Every storage method that touches an identity-scoped table takes the triple and filters with the appropriate WHERE clause") applied to a filesystem-backed store: the equivalent of a WHERE clause is the path-component prefix.

Path-traversal safety (CLAUDE.md §7 rule 5). Every operator-supplied path component — the {path} in PATCH /v1/dev/drafts/{id}/files/{path}, the operator-supplied output dir on save, the identity-component subpath — is filtered through internal/devdraft.resolveSafe, which mirrors internal/skills/importer/path_safety.go's shape: filepath.Clean + lexical-prefix verification + a symlink-evaluation pass. Escape attempts fail loud with ErrUnsafePath (HTTP 400 + CodeUnsafePath on the wire). The importer helper is unexported so we duplicate; a future refactor that lifts the helper into a shared package collapses both call sites.
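The lexical half of the check can be sketched as follows, together with the identity-scoped directory composition from the previous paragraph. resolveSafe cleans the joined path and rejects anything whose path relative to root starts with "..". draftDir rejects separators and dot components in identity and draft-ID segments. Both functions are hypothetical illustrations, not devdraft's code. The symlink-evaluation pass, for example filepath.EvalSymlinks on existing paths, is deliberately omitted.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

var ErrUnsafePath = errors.New("unsafe path")

// resolveSafe joins an operator-supplied relative path onto root and rejects
// anything that lands outside root after cleaning (lexical check only).
func resolveSafe(root, rel string) (string, error) {
	if filepath.IsAbs(rel) {
		return "", fmt.Errorf("%w: absolute path %q", ErrUnsafePath, rel)
	}
	cleanRoot := filepath.Clean(root)
	full := filepath.Join(cleanRoot, rel) // Join also cleans the result
	r, err := filepath.Rel(cleanRoot, full)
	if err != nil || r == ".." || strings.HasPrefix(r, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("%w: %q escapes %q", ErrUnsafePath, rel, root)
	}
	return full, nil
}

// draftDir composes <root>/<tenant>/<user>/<session>/<draft_id>, refusing any
// component that could change the directory structure.
func draftDir(root, tenant, user, session, draftID string) (string, error) {
	p := root
	for _, c := range []string{tenant, user, session, draftID} {
		if c == "" || c == "." || c == ".." || strings.ContainsAny(c, `/\`) {
			return "", fmt.Errorf("%w: bad path component %q", ErrUnsafePath, c)
		}
		p = filepath.Join(p, c)
	}
	return p, nil
}

func main() {
	fmt.Println(resolveSafe("/srv/drafts", "agents/../harbor.yaml"))
	fmt.Println(resolveSafe("/srv/drafts", "../etc/passwd"))
	fmt.Println(draftDir("/srv/drafts", "dev", "dev", "dev", "d1"))
}
```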

Pre-promotion validation (fail-loud at the seam). Store.Save runs internal/config.Load + Validate against the draft's harbor.yaml BEFORE any file is written to the operator-supplied output dir. An invalid draft is refused with ErrValidationFailed (HTTP 400 + CodeValidationFailed) and the wire envelope's hint points the operator at the preview surface. This closes the seam at the boundary instead of "save succeeds but the next harbor validate fails" — the §13 "fail loudly at boot" posture extended to the next operator-facing seam.

§13 primitive-with-consumer. The wave that introduces the draft endpoints (the primitive) ships the same wave's consumer that exercises every endpoint end-to-end — the test/integration/phase66_draft_save_test.go round-trip drives create → patch → preview → save → delete through the real HTTP handler under a real Bearer token, observes all five lifecycle events on the bus, and exercises the path-traversal + missing-bearer failure modes. The handler's wire contract is therefore validated against a real call site in the same PR that lands it.

§13 fail-loud (no test-stub-as-default). NewStore fails loud at construction when either Options.Root or Options.Bus is missing — a Store with no bus would silently drop the observability surface (the Wave 11.5 §17.6 F1 lesson applied here). The Store has no fallback to an in-memory backing store: the filesystem is the load-bearing surface (operators inspect drafts via their text editor; an in-memory fallback would silently break that affordance). NewHandler fails loud at construction when the Store is nil.

Composition order on the dev router. The handler is registered BEFORE the router.Handle("/v1/", mux) Protocol catch-all. With Go's http.ServeMux the order is not what routes requests: the mux dispatches to the longest matching pattern regardless of registration order, so /v1/dev/drafts/... routes to the draft handler and the rest of /v1/ flows to the Protocol mux. The two surfaces are non-overlapping by design.
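The routing claim can be checked with a small, self-contained sketch using only net/http and net/http/httptest. The two handlers are placeholders that only report which surface answered; the request paths are illustrative and nothing here is Harbor's router code.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// route sends a GET for path through mux and returns the response body.
func route(mux *http.ServeMux, path string) string {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	body, _ := io.ReadAll(rec.Result().Body)
	return string(body)
}

func newRouter() *http.ServeMux {
	router := http.NewServeMux()
	// Placeholder for the draft handler, registered first as in the dev router.
	router.HandleFunc("/v1/dev/drafts/", func(w http.ResponseWriter, _ *http.Request) {
		io.WriteString(w, "drafts")
	})
	// Placeholder for the Protocol catch-all.
	router.HandleFunc("/v1/", func(w http.ResponseWriter, _ *http.Request) {
		io.WriteString(w, "protocol")
	})
	return router
}

func main() {
	r := newRouter()
	fmt.Println(route(r, "/v1/dev/drafts/01J/preview")) // drafts
	fmt.Println(route(r, "/v1/events"))                 // protocol
}
```

Swapping the two HandleFunc calls yields the same routing, so registering the draft handler first is a readability choice rather than a correctness requirement.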

Auth-wrap inherited from the Protocol mux. The draft handler is wrapped in auth.Middleware(validator, auth.MWLogger(logger)) — the same wrapper the Phase 60 transports use. Every draft request requires a Bearer token; the middleware injects the verified identity into ctx; the Store's mustIdentity helper reads it via identity.From and rejects missing-triple requests with ErrIdentityMissing (HTTP 401 + CodeIdentityRequired). There is no "skip auth in dev" knob — the §13 amendment closes that surface.

D-094 helper-tracks-production rule. harbortest/devstack/devstack.go::Assemble now always constructs a DraftStore (under a per-test os.MkdirTemp root) and mounts the handler on the helper's router when transports are enabled — the helper mirrors production. The wave-end E2E (test/integration/phase66_draft_save_test.go) and every future integration test that touches the draft surface inherits the wiring; skipping the helper update would have left the test silently divergent from production. AssembleOpts.DraftRoot lets a test override the root when it needs to assert specific on-disk paths.

Concurrent-reuse contract (D-025). The Store is a compiled artifact — every field is set at construction and immutable afterwards (the entropyMu guards the ULID entropy reader, which is not goroutine-safe per its godoc; everything else is set-once). internal/devdraft/concurrent_test.go::TestStore_ConcurrentReuse_NoRaceUnderLoad runs N=128 concurrent invocations against one shared Store under -race, each goroutine creating + writing + previewing + getting + cross-identity-probing its own draft. The test also asserts runtime.NumGoroutine returns to baseline after every invocation returns (no goroutine leak).

dev.draft.previewed semantics. The V1 preview path is a config-validation pass against the rendered harbor.yaml — not a real dry-run that boots the draft against a sandboxed runtime. The bus event + the wire shape ({ok, errors[]}) are stable across a future upgrade that adds the dry-run; the surface is forward-compatible.

Acceptance:

  • Five HTTP endpoint shapes under /v1/dev/drafts/ with stable wire codes (identity_required, invalid_request, not_found, unsafe_path, unknown_template, output_dir_exists, validation_failed, internal_error).
  • On-disk layout <root>/<tenant>/<user>/<session>/<draft_id>/ per CLAUDE.md §6.
  • Cross-identity reads return ErrNotFound (pinned by TestStore_Get_CrossIdentityIsolation).
  • Path-traversal attempts return 400 + CodeUnsafePath (pinned by TestStore_WriteFile_RejectsPathTraversal + TestHandler_Patch_RejectsTraversal).
  • Save refuses to promote an invalid draft with ErrValidationFailed (pinned by TestStore_Save_RejectsInvalidYAML + TestHandler_Save_InvalidYAML_Returns400_WithCodeValidationFailed).
  • Save round-trips through the Phase 67 scaffold engine; the promoted harbor.yaml passes internal/config.Load (pinned by TestStore_Save_RoundTrip + TestE2E_Phase66_DraftSave_RoundTripThroughHTTP).
  • Five lifecycle events emit per round-trip (pinned by TestStore_LifecycleEvents + the integration test's bus-event drain).
  • scripts/smoke/phase-66.sh exercises the round-trip against the live binary; the 404/405/501 → SKIP convention keeps the smoke harmless on builds that pre-date Phase 66.
  • harbortest/devstack/devstack.go::Assemble mirrors the production wiring per D-094.
  • internal/devdraft/concurrent_test.go::TestStore_ConcurrentReuse_NoRaceUnderLoad passes under -race with N=128.

D-101 — Phase 69 harbor inspect-events + harbor inspect-runs: graduate the two Phase 63 inspect-* stubs into SSE Protocol-client consumers; auth via HARBOR_TOKEN env or ~/.harbor/token file; no new Protocol methods (consume existing Phase 60 surface)

Date: 2026-05-17 Status: Settled (shipping with this PR)

Where it lives: cmd/harbor/inspect_common.go (shared token discovery, endpoint composition, SSE parser, identity-header injection, fail-loud CLIError codes — CodeAuthRequired / CodeBindInvalid / CodeStreamFailed / CodeIdentityIncomplete / CodeRunNotFound); cmd/harbor/cmd_inspect_events.go (the SSE consumer + human/JSON renderers); cmd/harbor/cmd_inspect_runs.go (per-run aggregator for list mode + trajectory replay for single-run mode); cmd/harbor/testdata/golden/inspect-{events,runs}-*.txt (six new goldens locking the human + JSON shapes for both subcommands); cmd/harbor/inspect_common_test.go + cmd_inspect_events_test.go + cmd_inspect_runs_test.go (unit + golden coverage); test/integration/phase69_inspect_cli_test.go (wire round-trip: builds bin/harbor, exec's it against an httptest Protocol mux, asserts a real task.spawned arrives in the CLI's stdout); scripts/smoke/phase-69.sh (new smoke); scripts/smoke/phase-63.sh (graduated subcommands removed from the stubs array — §17.6 cross-phase fix); docs/plans/phase-69-harbor-inspect.md.

Decision. The Phase 63 stubs harbor inspect-events and harbor inspect-runs graduate to real implementations that consume the Phase 60 SSE event stream as Protocol clients. Two shape decisions land here:

  1. Bearer-token discovery: HARBOR_TOKEN env preferred; ~/.harbor/token file fallback; missing token at BOTH sources fails loud with auth_required. The CLI is a Protocol client of a running Runtime; the Runtime requires Authorization: Bearer <jwt> on every Protocol request (Phase 61 / D-079 — the auth.Middleware wraps both /v1/control and /v1/events). The discovery shape mirrors what every Protocol client will need; the file fallback is operator convenience (harbor dev prints HARBOR_DEV_TOKEN=... to stderr; the operator pipes that into the file once and the CLI works from any terminal). Failing loud per §13's "fail loudly when a required external dependency is missing" — no silent anonymous probe, no implicit fallback to a "dev mode" CLI that talks to an unauthenticated Runtime.

  2. No new Protocol methods — both subcommands derive their views by REPLAYING the existing SSE event stream. Phase 60 already shipped /v1/events with SSE replay-from-cursor (Last-Event-ID); both commands stand up an HTTP+SSE client against that, read the canonical wireEvent shape, and project. inspect-events is a thin pass-through (human or NDJSON); inspect-runs (list) aggregates events keyed by run identifier; inspect-runs <run-id> filters client-side by run identifier and projects each event into a trajectory step. The §13 primitive-with-consumer rule reads backwards here: don't ship a primitive (a runs.list / runs.trajectory Protocol method) without a consumer in the same wave. The Console (RFC §7's Sessions / Live Runtime pages) will eventually want richer per-run query methods, but THOSE pages' Protocol-surface phase is Phase 72+ — the Phase 69 CLI does not lead that primitive. The CLI is the consumer of an EXISTING primitive (the SSE stream) instead.

Why. Closes Phase 69 acceptance criteria 1 + 2 + the §13 amendment closure for the two stubs (operator-facing seams that previously exited with CodeNotImplemented). Gives operators wire-level debug observability against harbor dev — the same surface the Console will provide once the Console pages ship, but available today via the CLI for scripting / triage / smoke gates. The §17.6 cross-phase fix in scripts/smoke/phase-63.sh (removing the two graduated subcommands from the stubs array) is bundled per the rule.

Protocol additions. None. The CLI uses the existing GET /v1/events SSE handler (Phase 60), the existing X-Harbor-Tenant / X-Harbor-User / X-Harbor-Session / X-Harbor-Run / X-Harbor-Event-Type carrier headers (Phase 60 + Phase 61), the existing Last-Event-ID reconnect cursor, and the existing wire wireEvent shape. No new method registered in internal/protocol/methods/methods.go; no new error code in internal/protocol/errors/errors.go; no new wire type in internal/protocol/types/. The CLI-side CodeAuthRequired / CodeBindInvalid / CodeStreamFailed / CodeIdentityIncomplete / CodeRunNotFound are CLIError codes (operator-facing exit surface) — distinct from Protocol wire codes by design (CLAUDE.md §8 — the CLI structured-error surface is single-sourced in cmd/harbor/errors.go; D-084).

§13 primitive-with-consumer. The two subcommands ARE the first CLI consumers of the Phase 60 SSE stream. The Phase 60 wire-transport tests exercised the stream as a Go test client; this PR adds the first operator-facing wire consumer. No new primitive lands.

Why the run-id projection is client-side. The Protocol start method dispatches tasks.SpawnRequest with Identity: identity.Quadruple{Identity: id}, so RunID is empty at spawn time (the per-task RunLoop driver sets RunID = TaskID when it picks the task up, but the FIRST event, task.spawned, lands on the bus before that with Event.Identity.RunID == ""). The Phase 60 SSE handler filters server-side by events.Filter.Run (i.e. by Event.Identity.RunID), which would DROP the spawn event from a --run R filtered subscription. The CLI works around this by NOT passing X-Harbor-Run server-side; it filters client-side via a small projection helper (runIDFromEvent) that falls back to the payload's TaskID when the identity-tuple RunID is empty. This is the same projection the Console will need when its Sessions page ships; exporting the helper here documents the contract. A future Phase 60 stream upgrade that adds payload-aware run filtering on the server would let the CLI move to server-side filtering; this is tracked as a source-comment enhancement.
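A minimal sketch of the client-side projection, assuming a simplified frame in which the identity tuple and the task.spawned payload are already decoded. The Frame, Identity, and filterRun names and the Payload.TaskID field layout are illustrative, not the wireEvent schema.

```go
package main

import "fmt"

type Identity struct{ Tenant, User, Session, RunID string }

type Frame struct {
	Type     string
	Identity Identity
	Payload  struct{ TaskID string }
}

// runIDFromEvent returns the run an event belongs to. task.spawned reaches
// the bus before the RunLoop sets RunID = TaskID, so fall back to the payload.
func runIDFromEvent(f Frame) string {
	if f.Identity.RunID != "" {
		return f.Identity.RunID
	}
	return f.Payload.TaskID
}

// filterRun keeps only the frames for one run (the client-side --run filter).
func filterRun(frames []Frame, run string) []Frame {
	var out []Frame
	for _, f := range frames {
		if runIDFromEvent(f) == run {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	spawn := Frame{Type: "task.spawned"}
	spawn.Payload.TaskID = "R1"
	invoked := Frame{Type: "tool.invoked", Identity: Identity{RunID: "R1"}}
	other := Frame{Type: "tool.invoked", Identity: Identity{RunID: "R2"}}
	fmt.Println(len(filterRun([]Frame{spawn, invoked, other}, "R1"))) // 2
}
```

A server-side filter on Identity.RunID alone would return only the tool.invoked frame for R1, which is the drop the paragraph describes.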

Acceptance:

  • harbor inspect-events --bind H:P --tenant T --user U --session S --type X --since C --follow=false snapshots the SSE replay and exits 0 on success, non-zero with auth_required / stream_failed / identity_incomplete on the documented failure modes.
  • harbor inspect-events ... --json emits NDJSON: one canonical wireEvent JSON object per line + a sentinel {"comment":"…"} line on :-comment frames (keepalive, replay-gap markers).
  • harbor inspect-runs --bind H:P --tenant T --user U --session S [--json] aggregates per-run rows from the SSE replay and emits either a human table or a single-line JSON array of {run_id, status, started_at, last_event_at, event_count, failure_code?}.
  • harbor inspect-runs <run-id> ... [--json] filters the replay to one run and emits a trajectory (one row per event) or a {run_id, steps[]} JSON object; CodeRunNotFound fires when no event with the target run identifier appears in the replay window.
  • Bearer JWT discovery: HARBOR_TOKEN env preferred, ~/.harbor/token fallback. Both empty → auth_required BEFORE any network call.
  • Identity triple (--tenant/--user/--session) is mandatory at the CLI edge — fails CLI-side with identity_incomplete so the error message names the missing flag rather than relying on the Runtime's generic 401.
  • Six new goldens (cmd/harbor/testdata/golden/inspect-{events,runs}-*.txt) lock both human and --json shapes for both subcommands. go test -update ./cmd/harbor/... regenerates them.
  • The Phase 63 stubs array in scripts/smoke/phase-63.sh no longer includes inspect-events / inspect-runs (§17.6 cross-phase smoke maintenance).
  • test/integration/phase69_inspect_cli_test.go builds bin/harbor, stands up an httptest Protocol mux over real events / state / tasks drivers, drives a real start, exec's the binary, and asserts the canonical task.spawned event surfaces in the CLI's stdout. Includes a fail-loud-no-token assertion.
  • All tests -race green; make preflight PASS; make drift-audit clean; make check-mirror clean; npx markdownlint-cli2 docs/decisions.md 0 errors.

D-102 — harbor inspect-topology renderer + trajectory-synthesised source + Wave 12 wave-end E2E (closes Phase 70)

Date: 2026-05-17 Status: Settled (shipping with this PR)

Where it lives: cmd/harbor/cmd_inspect_topology.go (the cobra body — replaces the Phase 63 stub); cmd/harbor/topology_render.go (the pure ASCII + JSON renderer); cmd/harbor/topology_synthesise.go (the wire-frame → Topology builder); cmd/harbor/topology_render_test.go (renderer + synthesiser unit tests + golden round-trip); cmd/harbor/cmd_inspect_topology_test.go (cobra-driver + transport-layer tests); cmd/harbor/testdata/golden/inspect-topology-happy.{txt,json} (the golden pinning the renderer shape — regeneratable via go test -update); cmd/harbor/testdata/golden/help.txt (regenerated — (Phase 70) suffix dropped from the subcommand's Short); cmd/harbor/cmd_stub_test.go (drops inspect-topology from stubCases); scripts/smoke/phase-70.sh (new); scripts/smoke/phase-63.sh (drops inspect-topology from the stub-subcommands array); test/integration/wave12_test.go (the new Wave 12 wave-end E2E per §17.7 step 5); docs/plans/phase-70-inspect-topology.md (the phase plan); docs/plans/README.md (Phase 70 flips to Shipped); README.md (Status table flip + Phase 70 row).

Decision. Two intertwined design calls land in this PR — the renderer shape AND the topology source. Both are documented here so a future PR cannot quietly retrofit them.

1. Renderer shape — indent-based ASCII, not box-drawing. Two valid shapes for run graphs: box-drawing (├──, └──, │) and indent-based (+--, plain spaces). Indent-based wins on three operational criteria the Phase 70 prompt asked us to settle:

  • Terminal portability: indent + +-- renders on every terminal — Windows cmd's CP437, Linux TTYs without ncurses, CI log capture tools that strip ANSI / Unicode. Box-drawing characters are multi-byte UTF-8 that some captures mangle.
  • Deterministic byte length: fixed-width ASCII makes the golden comparison trivial under diff; multi-byte UTF-8 bloats the golden's byte surface and complicates the truncation rule (we'd need rune-aware width math instead of len()).
  • Readability: the visual hierarchy is one space per level + the +-- connector; readers from a wider terminal family see the same shape as readers in an 80-col SSH window.

Sort order is (Sequence, EventID): Sequence is per-bus monotonic and gap-free (events.Event.Sequence), so two snapshots of the same run produce byte-stable ASCII. EventID (ULID-shaped) is the tie-break if a future driver ever issues parallel sequences. The renderer applies the sort defensively at render time so an out-of-order input slice still produces deterministic output (the renderer test TestRender_OutOfOrderNodes_SortedDeterministically pins this).
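A minimal sketch of the defensive sort plus indent rendering, assuming a flat Node slice with a precomputed Depth. Node, render, and the field names are illustrative rather than the topology_render.go types. Because (Sequence, EventID) is a total key when EventIDs are unique, forward and reversed inputs render to identical bytes.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type Node struct {
	Sequence uint64
	EventID  string
	Depth    int
	Label    string
}

// render sorts a copy by (Sequence, EventID) and indents one space per level.
func render(nodes []Node) string {
	sorted := append([]Node(nil), nodes...) // never reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Sequence != sorted[j].Sequence {
			return sorted[i].Sequence < sorted[j].Sequence
		}
		return sorted[i].EventID < sorted[j].EventID
	})
	var b strings.Builder
	for _, n := range sorted {
		b.WriteString(strings.Repeat(" ", n.Depth))
		b.WriteString("+-- ")
		b.WriteString(n.Label)
		b.WriteByte('\n')
	}
	return b.String()
}

func main() {
	nodes := []Node{
		{Sequence: 3, EventID: "01C", Depth: 1, Label: "approval_requested"},
		{Sequence: 1, EventID: "01A", Depth: 0, Label: "task.spawned R1"},
		{Sequence: 2, EventID: "01B", Depth: 0, Label: "tool web.search [ok]"},
	}
	fmt.Print(render(nodes))
}
```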

2. Topology source — trajectory-synthesised from existing events, NOT topology.snapshot. The master-plan Phase 70 goal cites "topology.snapshot events"; the canonical producer of those is Phase 74 (Console topology projection events, RFC §6.13), which has not yet landed. Two source options were on the table:

  • (a) Extend an existing event with topology fields — rejected. Topology is its own concern; bolting it onto e.g. tool.invoked couples two payload schemas and makes Phase 74's later canonical producer awkward (it would need a back-compat path against the bolt-on shape).
  • (b) Trajectory-synthesise from existing event types — CHOSEN. The synthesiser (BuildTopologyFromEvents) walks tool.invoked / tool.completed / tool.failed / tool.invalid_args / tool.approval_requested / tool.auth_required / task.spawned / pause.requested / planner.finish and produces a Topology value. Paired events merge (tool.invoked + tool.completed → one node with Status=ok); depth is inferred (tool.approval_requested is a child of the last open tool.invoked). When Phase 74 lands, the synthesiser gains one additional case branch for topology.snapshot, and the renderer prefers that source when present — the V1 path stays as a fallback for older runs.
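The pairing and depth rules from option (b) can be sketched as below, assuming each tool event carries a call identifier linking tool.invoked to its completion. BuildTopologySketch, TopoNode, and the CallID field are hypothetical, and only four of the listed event types are handled.

```go
package main

import "fmt"

// Event is a reduced wire event; CallID is a hypothetical link between a
// tool.invoked and its matching completion.
type Event struct {
	Type   string
	CallID string
	Tool   string
}

type TopoNode struct {
	Label    string
	Status   string // "running", "ok", "failed", "pending_approval"
	Children []*TopoNode
}

// BuildTopologySketch merges invoked/completed (or failed) pairs into one node
// and hangs approval requests under the most recent still-open invocation.
func BuildTopologySketch(events []Event) []*TopoNode {
	var roots []*TopoNode
	var open []*TopoNode // still-running invocations, oldest first
	byCall := map[string]*TopoNode{}
	for _, e := range events {
		switch e.Type {
		case "tool.invoked":
			n := &TopoNode{Label: e.Tool, Status: "running"}
			roots = append(roots, n)
			open = append(open, n)
			byCall[e.CallID] = n
		case "tool.completed", "tool.failed":
			n, ok := byCall[e.CallID]
			if !ok {
				continue // completion whose invocation is outside the replay window
			}
			n.Status = "ok"
			if e.Type == "tool.failed" {
				n.Status = "failed"
			}
			delete(byCall, e.CallID)
			for i := len(open) - 1; i >= 0; i-- {
				if open[i] == n {
					open = append(open[:i], open[i+1:]...)
					break
				}
			}
		case "tool.approval_requested":
			child := &TopoNode{Label: "approval", Status: "pending_approval"}
			if len(open) > 0 {
				parent := open[len(open)-1]
				parent.Children = append(parent.Children, child)
			} else {
				roots = append(roots, child)
			}
		}
	}
	return roots
}

func main() {
	roots := BuildTopologySketch([]Event{
		{Type: "tool.invoked", CallID: "c1", Tool: "search"},
		{Type: "tool.approval_requested", CallID: "c1"},
		{Type: "tool.completed", CallID: "c1"},
	})
	fmt.Println(roots[0].Label, roots[0].Status, len(roots[0].Children)) // search ok 1
}
```

Adding a topology.snapshot source later would be one more case in the switch, which matches the upgrade path the option describes.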

The CLI is a Protocol client per CLAUDE.md §8 + RFC §7: it consumes the canonical wire shape (internal/protocol/transports/stream.wireEvent), not the in-process events.Event struct. The wire shape is re-declared as WireEventFrame in cmd/harbor/topology_synthesise.go so the cmd does not import internal/protocol/transports/stream — the contract is the WIRE shape, not the Go struct (a future third-party CLI in any language can synthesise the same topology from the same SSE bytes).
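A minimal sketch of re-declaring only the wire fields a client reads. The JSON keys below are assumptions for illustration, not the stream.wireEvent schema; the point is that encoding/json ignores unknown keys by default and json.RawMessage keeps the payload bytes untouched, so the local struct tolerates additive server changes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WireEventFrame re-declares the subset of wire fields the CLI reads.
// Field keys are illustrative assumptions.
type WireEventFrame struct {
	ID       string          `json:"id"`
	Type     string          `json:"type"`
	Sequence uint64          `json:"sequence"`
	RunID    string          `json:"run_id"`
	Payload  json.RawMessage `json:"payload"` // decoded later, per event type
}

func decodeFrame(data []byte) (WireEventFrame, error) {
	var f WireEventFrame
	err := json.Unmarshal(data, &f) // unknown keys are ignored by default
	return f, err
}

func main() {
	f, err := decodeFrame([]byte(`{"id":"01J","type":"tool.invoked","sequence":4,"payload":{"tool":"search"},"new_field":true}`))
	fmt.Println(f.Type, f.Sequence, string(f.Payload), err)
}
```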

Why. Closes Phase 70 acceptance ("Sample run produces stable ASCII matching golden") AND closes the §13 primitive-with-consumer rule for the topology subsystem: shipping the renderer now (with a synthesise-from-existing-events source) means the operator-facing subcommand WORKS the day it lands, instead of waiting for Phase 74 and risking the renderer drifting from the design that motivated it. When Phase 74 ships, the renderer's existing test surface validates the new event-kind case.

Wave 12 wave-end E2E (per CLAUDE.md §17.7 step 5). test/integration/wave12_test.go boots the assembled dev stack via harbortest/devstack.Assemble (D-094) and exercises the composed Wave 12 surface end-to-end. Scenarios:

  • TestE2E_Wave12_InspectTopology_HappyPath — boot devstack; spawn a foreground task (POST /v1/control/start); the per-task RunLoop drives the planner; tool/finish events flow on the bus; inspect-topology against the live runtime produces non-empty ASCII naming the run id and at least one tool node.
  • TestE2E_Wave12_InspectTopology_CrossTenantIsolation — two distinct identity-triple stacks share the SAME assembled runtime; each tenant's inspect-topology sees ONLY its own run's events. Asserts the identity-tuple flow through SSE filter → bus subscribe → renderer header.
  • TestE2E_Wave12_InspectTopology_RunNotFound_FailureMode — invoke inspect-topology against a deliberately nonexistent run-id; assert the structured CodeInspectTopologyRunNotFound exit (≥1 failure mode per §17.3).
  • TestE2E_Wave12_InspectTopology_Concurrency_NoCrossTalk — N=10 concurrent operators run inspect-topology against the same runtime (each with their own identity tuple); assert no goroutine leak after all clients close, no cross-tenant data appearing in any client's output (the §17.3 concurrency-stress shape, ratchet from the §17.3 minimum N≥10).

Parallel-PR coverage caveat. Wave 12 also contains Phase 65 (hot-reload events), Phase 66 (draft endpoints), Phase 69 (inspect-events / inspect-runs), and #126 (cfg.Planner driver registry). Per the prompt's "Strategy" note, the wave-end E2E exercises ONLY the surface this PR ships (inspect-topology + the inherited Phase 60/63/64 dev stack). When the parallel PRs merge, the audit follow-up chore(checkpoint): wave-12 audit fixes PR extends the wave-end E2E with scenarios that exercise the merged surfaces — not in this PR, because the Stage-2 PRs may land in any order and a fragile cross-PR dependency here would block this PR's merge on theirs. Documented here so the audit owner knows to backfill.

§13 primitive-with-consumer. The renderer (the primitive) and the cmd (the consumer) ship in the same PR. The renderer's pure-function shape AND the cmd's transport-layer SSE integration are BOTH exercised end-to-end by the wave-end E2E. The §13 rule is discharged: no "renderer without consumer" or "cmd without renderer" window exists.

§17.6 cross-phase smoke maintenance. scripts/smoke/phase-63.sh's stub-subcommands array drops inspect-topology. Phase 69 (parallel-merge) will drop inspect-events / inspect-runs from the same array in its own PR; whichever PR lands first wins, and the second rebases. The array goes to empty when Phase 69 merges. cmd/harbor/cmd_stub_test.go's stubCases table mirrors the same drop.

Recurring-failure-mode pre-empt. The cmd's SSE fetcher (fetchSSEUntilIdle) owns ONE reader goroutine per invocation; that goroutine is joined via ctx-cancel when the fetcher returns. The wave-end E2E's N=10 concurrent invocations + post-test goroutine-baseline assertion is the D-025 stress proof for the cmd's reusable shape. No mutable state on any cmd-side artifact; the renderer + synthesiser are pure.

Acceptance:

  • cmd/harbor/cmd_inspect_topology.go::runInspectTopology graduates from the Phase 63 not_implemented stub; renders a run's node graph as deterministic ASCII (golden-pinned) and JSON (--json).
  • cmd/harbor/topology_render.go::Render + RenderJSON are byte-stable for a given input (the renderer test asserts forward vs reverse input ordering yields identical output).
  • Every failure mode emits a stable CLIError code (inspect_topology_bind_invalid, inspect_topology_width_invalid, inspect_topology_auth_missing, inspect_topology_run_id_missing, inspect_topology_connect_failed, inspect_topology_http_status, inspect_topology_run_not_found).
  • test/integration/wave12_test.go ships with this PR — real drivers, identity propagation, ≥1 failure mode, N=10 concurrency stress, all -race green.
  • scripts/smoke/phase-70.sh runs the cmd/harbor tests + binary-mode assertions + (when HARBOR_DEV_TOKEN is set in the environment) a live-server run-not-found check; OK > 0, FAIL = 0.
  • scripts/smoke/phase-63.sh drops inspect-topology from the stubs array; cmd/harbor/cmd_stub_test.go::stubCases mirrors the drop.
  • cmd/harbor/testdata/golden/help.txt regenerated; the (Phase 70) suffix drops from the inspect-topology row.
  • docs/plans/README.md Phase 70 row flips to Shipped; README.md Status table flips Phase 70 row to Shipped.

D-103 — cfg.Planner schema + internal/planner driver registry (closes #126, closes D-097's "future phases will read cfg.Planner" note)

Date: 2026-05-17 Status: Settled (shipping with this PR)

Where it lives: internal/config/config.go (PlannerConfig + the top-level Config.Planner field); internal/config/validate.go (allowedPlannerDrivers + validatePlanner); internal/config/loader.go::defaults (the Planner: PlannerConfig{Driver: "react"} default); internal/planner/registry.go (the driver registry: Factory + Register / MustRegister / Resolve / RegisteredDrivers + PlannerConfig + FactoryDeps boundary types + the four sentinel errors); internal/planner/registry_test.go (registry-bookkeeping tests); internal/planner/react/init.go (the react driver's self-registration via init() → planner.MustRegister("react", factory)); internal/planner/react/registry_allowlist_test.go (the drift guard between the planner registry and the config validator's allowlist); cmd/harbor/main.go (blank-import _ "github.com/hurtener/Harbor/internal/planner/react"); cmd/harbor/cmd_dev.go::bootDevStack (the hardcoded react.New(llmClient) call is replaced with planner.Resolve(ctx, plannerConfigFromConfig(cfg.Planner), planner.FactoryDeps{LLM: llmClient})); cmd/harbor/cmd_dev.go::plannerConfigFromConfig (the boundary mapper from config.PlannerConfig to planner.PlannerConfig); harbortest/devstack/devstack.go (mirrors the production wiring per D-094; adds AssembleOpts.PlannerOverride for tests that need a stub planner; constructs via planner.Resolve otherwise); examples/dev.yaml and examples/harbor.yaml (operator-facing planner: block with the react default and a commented max_steps: knob); test/integration/phase_d103_planner_registry_test.go (end-to-end coverage); test/integration/wave11_test.go + harbortest/devstack/devstack_test.go (the new _ "github.com/hurtener/Harbor/internal/planner/react" blank imports the tests need now that devstack reaches the planner only through the registry).

Decision. The planner concrete is resolved at boot via the internal/planner driver registry: operators declare planner.driver: <name> in harbor.yaml; the binary's blank-import block in cmd/harbor/main.go self-registers each available driver; cmd/harbor/cmd_dev.go::bootDevStack calls planner.Resolve(ctx, plannerConfigFromConfig(cfg.Planner), planner.FactoryDeps{LLM: llmClient}) to construct the concrete. The V1 default driver is react (the reference LLM-driven ReAct planner, Phase 45 / D-051) and remains the no-config-needed default: cfg.Planner.Driver == "" resolves to "react" at both the loader-side defaults() and the validator-side default branch, so an operator config that omits the planner: block boots unchanged from the pre-D-103 hardcoded path. cfg.Planner.MaxSteps is the optional planner-side circuit-breaker step cap; zero means "use the driver's internal default" (the react driver's react.DefaultMaxSteps = 12); negative is rejected at the validator edge. The registry mirrors D-095's OAuth-provider registry structurally: same shape, same sentinel set, same Register / MustRegister / Resolve / RegisteredDrivers quartet, same allowlist-mirror pattern in the config validator. The structural precedent is deliberate.

Why. CLAUDE.md §1.3 names the swappable planner one of the three non-negotiable product properties of Harbor's runtime — yet the V1 boot path hardcoded the concrete via react.New(llmClient) in cmd/harbor/cmd_dev.go::bootDevStack. D-097's planner-construction site explicitly carried the note "the cfg.Planner schema + driver registry (per the §4.4 seam pattern that D-095 uses for OAuth providers) is tracked in issue #126 — until it lands, ReAct is hardcoded here." Issue #126 (surfaced by the Wave 11.5 §17.5 closeout audit's finding N2) is that gap. Closing it closes the product-property gap and the §13 "shipping a primitive without its first consumer" smell, twice: (a) the §4.4 seam pattern is now exercised on the planner subsystem the same way it's exercised on tools/auth, memory, state, artifacts, events, telemetry, and tasks; (b) the registry's first consumer (cmd/harbor/cmd_dev.go::bootDevStack's production planner construction) ships in the same PR as the primitive.

How to apply.

  • New planner concretes add a package under internal/planner/<name>/ and a func init() { planner.MustRegister(DriverName, factory) } block following the §4.4 seam pattern. The factory adapter maps the planner.FactoryDeps + planner.PlannerConfig boundary onto the concrete's option-applied constructor (the react driver's init.go is the reference shape).
  • The driver's canonical name MUST be added to internal/config/validate.go's allowedPlannerDrivers allowlist in the same PR. The internal/planner/<name>/registry_allowlist_test.go drift guard mirrors the D-095 OAuth-provider pattern (every test loads _ "internal/config" AND _ "internal/planner/<name>", then asserts the validator accepts the driver's name).
  • cmd/harbor/main.go blank-imports the new driver package: _ "github.com/hurtener/Harbor/internal/planner/<name>".
  • Operators opt into a new driver by setting planner.driver: <name> in harbor.yaml. The V1 reference planner (react) stays the no-config-needed default for the foreseeable future — flipping the default to a different concrete would be a §13 violation of the operator-stability contract.
  • Per-driver tuning knobs land in planner.PlannerConfig.Extra (opaque map[string]string); a driver that grows tuning beyond MaxSteps reads from Extra at factory time. Future drivers may petition for a typed Extras block on the YAML schema (the same shape D-095 considered for OAuth providers); the V1 register-side surface stays narrow on purpose.

§4.4 conformance. Interface lives in internal/planner/planner.go (Planner.Next). Drivers live in internal/planner/<name>/ — Phase 45's react/ is the V1 default; Phase 48's deterministic/ is the existing second concrete (deterministic planner — see D-073 + the spawn-await scenario test); finish/ ships the stub planner used by the conformance pack. Factory + registry live in internal/planner/registry.go. Blank-import in cmd/harbor/main.go fires the self-registration. The factory's error message lists registered drivers so misconfigurations are obvious. The internal/config package MUST NOT import internal/planner (§4.4 — drivers depend on interfaces, not the other way round). The allowedPlannerDrivers allowlist in the validator is a deliberate duplication; the internal/planner/react/registry_allowlist_test.go test catches drift between the two surfaces.

§13 fail-loud. Three loud-failure paths land in this PR:

  1. Unknown driver names rejected at internal/config/validate.go. The pre-boot validator surfaces a clear error (config.planner.driver: must be one of [react], got "...") so harbor validate flags the typo before the binary attempts to boot. The error message lists the allowed values so the operator sees the fix.
  2. Negative MaxSteps rejected at the same place. Zero is the documented "use driver default" sentinel; positive integers are honoured.
  3. The factory rejects missing required deps. The react driver's factory returns fmt.Errorf("planner/react: LLM client is required (FactoryDeps.LLM was nil)") when deps.LLM == nil; silent fallback to a stub is forbidden. The registry's own Resolve rejects unknown / empty driver names with ErrDriverUnknown and includes the registered-driver list in the error message.
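A minimal sketch of the validator-side checks in items 1 and 2, assuming the allowlist is a plain string slice. The driver error reproduces the message format quoted in item 1; the max_steps message and the validatePlanner signature are illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

// allowedPlannerDrivers duplicates the registry on purpose; a drift-guard
// test keeps the two in sync without config importing planner.
var allowedPlannerDrivers = []string{"react"}

type PlannerConfig struct {
	Driver   string
	MaxSteps int
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// validatePlanner treats an empty Driver as the "use default" sentinel.
func validatePlanner(p PlannerConfig) error {
	var errs []error
	if p.Driver != "" && !contains(allowedPlannerDrivers, p.Driver) {
		errs = append(errs, fmt.Errorf("config.planner.driver: must be one of %v, got %q", allowedPlannerDrivers, p.Driver))
	}
	if p.MaxSteps < 0 {
		errs = append(errs, fmt.Errorf("config.planner.max_steps: must be >= 0 (0 = driver default), got %d", p.MaxSteps))
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	fmt.Println(validatePlanner(PlannerConfig{Driver: "reakt", MaxSteps: -1}))
}
```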

§13 primitive-with-consumer. The primitive (the internal/planner driver registry — Factory + Register / MustRegister / Resolve + the PlannerConfig / FactoryDeps boundary types) lands with its first consumer (cmd/harbor/cmd_dev.go::bootDevStack's production planner construction) in the same PR. The wave-end E2E (test/integration/phase_d103_planner_registry_test.go) exercises the registry round-trip end-to-end through devstack.Assemble — the production wiring per D-094's source-of-truth invariant. No deferred consumer.

MaxSteps knob over per-driver Extras. The V1 schema ships a typed MaxSteps int field at the top level of PlannerConfig rather than burying it inside the opaque Extra map[string]string. Two reasons: (a) MaxSteps is the one knob the V1 reference planner already exposes via react.WithMaxSteps(n) — surfacing it as a typed field lets operators tune the circuit breaker without per-driver Extras ceremony; (b) future drivers (Plan-Execute, Workflow, Graph, Deterministic, Supervisor, MultiAgent, HumanApproval per RFC §6.2) will likely all carry a step-cap-equivalent knob, so the field captures a cross-driver universal rather than a react-specific quirk. Per-driver knobs (a deterministic planner's scripted step sequence, a supervisor planner's sub-agent list) land in Extras until a future RFC pulls them up.

Devstack mirror shape (PlannerOverride, not SkipPlanner). harbortest/devstack.Assemble reaches the planner only through the registry now. The test escape hatch is AssembleOpts.PlannerOverride planner.Planner — when non-nil, the helper uses the injected instance instead of calling planner.Resolve. There is NO SkipPlanner knob: skipping the planner entirely would leave stack.RunLoop / stack.RunLoopDriver nil even when the caller has not opted out via the existing SkipRunLoop / SkipCatalog / SkipSteering flags, which would be a confusing failure mode. The PlannerOverride shape is the minimal escape hatch tests that need a stub / scripted / pausing planner can use without re-implementing the wiring; production code never sets the override. The existing SkipRunLoop flag remains the way to opt out of the entire planner-RunLoop construction stack.

Acceptance.

  • internal/config/config.go declares Config.Planner + PlannerConfig{Driver, MaxSteps, Extra} with the documented defaults.
  • internal/config/validate.go adds validatePlanner to the validator chain + the allowedPlannerDrivers allowlist; rejects unknown drivers + negative MaxSteps; accepts empty Driver as the "use default" sentinel.
  • internal/config/loader.go::defaults populates Planner: PlannerConfig{Driver: "react"}.
  • internal/planner/registry.go adds the Factory type + Register / MustRegister / Resolve / RegisteredDrivers quartet + the four sentinel errors + the PlannerConfig / FactoryDeps boundary types.
  • internal/planner/react/init.go self-registers the react driver via init() → planner.MustRegister("react", factory). The factory adapter rejects nil LLM clients.
  • cmd/harbor/main.go blank-imports _ "github.com/hurtener/Harbor/internal/planner/react".
  • cmd/harbor/cmd_dev.go::bootDevStack replaces plnr := react.New(llmClient) with plnr, err := planner.Resolve(ctx, plannerConfigFromConfig(cfg.Planner), planner.FactoryDeps{LLM: llmClient}); the direct internal/planner/react import is dropped from cmd_dev.go (it's reached via the registry now).
  • harbortest/devstack/devstack.go mirrors the production wiring per D-094; adds AssembleOpts.PlannerOverride for tests that need a stub planner.
  • examples/dev.yaml and examples/harbor.yaml document the planner: block with the react default + a commented max_steps: knob.
  • internal/planner/registry_test.go pins the registry's bookkeeping behaviour (empty-name / nil-factory / duplicate-name rejection; unknown-driver / empty-driver loud failure with the registered-driver list; ctx-cancellation honoured; MustRegister panics on error; RegisteredDrivers sorted; factory dispatch).
  • internal/planner/react/registry_allowlist_test.go pins the validator↔registry drift guard: the validator accepts react, rejects unknowns, accepts empty as default, rejects negative MaxSteps; the registry's Resolve(react) returns a non-nil planner; the factory rejects nil LLM.
  • internal/config/validate_test.go adds six planner-specific assertions covering the same paths from the config side.
  • test/integration/phase_d103_planner_registry_test.go exercises the registry round-trip end-to-end through devstack.Assemble (the production wiring per D-094) — a config with planner.driver: react boots the devstack and produces a non-nil RunLoop; an unknown driver is rejected pre-boot; the direct planner.Resolve(react) call succeeds.
  • D-097's "Where it lives" paragraph + cmd/harbor/cmd_dev.go::bootDevStack's comment are updated to point at D-103 as the closure.
  • All tests -race green; make vet clean; go build ./... clean.

Structural precedents. D-095 (tools.oauth_providers[] + internal/tools/auth/registry.go) is the direct structural precedent — same shape, same sentinel set, same allowlist-mirror pattern, same Register / MustRegister / Resolve / RegisteredDrivers quartet. D-090 (tools.entries[] operator config + the Deps.OAuthProviders construction pattern) is the broader §4.4 boundary precedent. D-097 (the RunLoop wrap around the planner) is the consumer the new registry path feeds; D-098 (the per-task FSM bridge) is the downstream driver that closes the RunLoop's exit shape onto the task FSM. Together, these precedents settled the cmd_dev wiring this PR retargets.


D-104 — preflight parallelisation + ephemeral-port allocation (closes #135)

Date: 2026-05-17 Status: Settled (shipping with this PR)

Where it lives: cmd/harbor/cmd_dev.go::devStack.serve (replaces http.Server.ListenAndServe with an explicit net.Listen + server.Serve(listener); emits the parseable HARBOR_DEV_BOUND=<host:port> line on stderr exactly once per boot; refreshes s.bindAddr to the OS-resolved address so host:0 binds report the actual port to subsequent loggers); scripts/preflight.sh (the orchestrator — classifies smokes by header, runs static-only + unit-tests batches in parallel before the dev boot, boots harbor dev ONCE on HARBOR_BIND=127.0.0.1:0, parses the bound port from the server log, exports HARBOR_BIND / HARBOR_PORT / HARBOR_BASE_URL / HARBOR_DEV_PORT / HARBOR_DEV_TOKEN to every live-server smoke, runs the live-server batch serially, tears down); scripts/drift-audit.sh (check 9 — every scripts/smoke/phase-*.sh MUST carry one of the three classification values, FAIL with a clear directive on miss); scripts/smoke/_template.sh (new # PREFLIGHT_REQUIRES: header line + a paragraph documenting the three values); every scripts/smoke/phase-NN.sh and phase-NNa.sh / phase-NNb.sh (78 files — the # PREFLIGHT_REQUIRES: header lands on line 2 of each; 6 are live-server, 11 are static-only, 61 are unit-tests); scripts/smoke/phase-64.sh (the fail-loud-no-config secondary boot now passes HARBOR_BIND=127.0.0.1:0 rather than --port 18198 so two sibling worktrees running preflight concurrently cannot collide on the pinned port); scripts/smoke/phase-69.sh (replaces the hardcoded --bind "127.0.0.1:${HARBOR_DEV_PORT:-18080}" with --bind "${HARBOR_BIND:-127.0.0.1:18080}" at the three inspect-events / inspect-runs invocations); scripts/smoke/phase-70.sh + scripts/smoke/phase-64.sh (docstring updates pointing at the ephemeral-port default).

Decision. Preflight's wall time and its port-pinning are the same problem viewed from two angles, and the fix is one structural change. Three intertwined design calls land here.

1. Ephemeral-port allocation by default. The harness binds 127.0.0.1:0 and reads the actual bound port back from the server log. The dev binary emits a parseable HARBOR_DEV_BOUND=<host:port> line on stderr immediately after net.Listen returns — this is a NEW contract on the dev binary, single-source: the line is emitted exactly once, with that exact prefix, on stderr. Preflight greps it out of the captured log and exports HARBOR_BIND, HARBOR_PORT, HARBOR_BASE_URL, and HARBOR_DEV_PORT (mirrored for any legacy reader) to every live-server smoke. scripts/smoke/common.sh::api_url already reads HARBOR_BASE_URL, so the existing live-server smokes flow through unchanged once the env is set. Two sibling worktrees running make preflight concurrently no longer collide because each binds an OS-assigned port. Operators who NEED a pinned port (an external integration test attaching to a known address) can still set HARBOR_DEV_PORT=18080 explicitly; the orchestrator honours the override and the resolution-from-log path works either way (the bound port matches the requested port when it's free; the harness reports whichever the OS hands back).

2. Smoke classification via the # PREFLIGHT_REQUIRES: header. Every scripts/smoke/phase-*.sh carries one classification on line 2:

  • static-only — pure file/text greps, golden compares, file-existence assertions. Runs in the parallel batch BEFORE the dev server boots; needs no shared state.
  • live-server — hits the booted dev server over HTTP (api_url, assert_status, skip_if_404, assert_json_path) or reads the preflight server log. Runs serially against the booted instance because the smokes observe and sometimes mutate shared dev state (SSE streams, the in-mem bus, the singleton draft store).
  • unit-tests — runs go test for one or more packages. Parallelisable at the bash-fan-out level; go test schedules its own internal parallelism on top.

The grammar is INTENTIONALLY inflexible: the orchestrator parses the header with a single grep + sed and demands one of the three exact values. A missing or unrecognised header fails preflight loud (and the same check fires in make drift-audit standalone). The fail-loud is per §13: silent classification defaults are forbidden because a server-mutating smoke misclassified as static-only would land in the parallel batch and produce nondeterministic flakes — exactly the failure mode the wave-end checkpoint audits (Wave 12 §17.5 N1) keep surfacing one PR late. Phases that ship a new smoke MUST classify it correctly in the same PR (the §4.2 phase-implementor contract grows item 12 in spirit; the template at scripts/smoke/_template.sh documents the convention).
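The orchestrator itself does this with grep and sed in bash. The Go sketch below mirrors the same grammar to make the rules explicit. It assumes the header is spelled exactly "# PREFLIGHT_REQUIRES: " with one space after the colon. It trims nothing, so trailing whitespace counts as an unrecognised value. classify and validClasses are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

const headerPrefix = "# PREFLIGHT_REQUIRES: "

var validClasses = map[string]bool{"static-only": true, "live-server": true, "unit-tests": true}

// classify reads line 2 of a smoke script and returns its class. There is
// deliberately no default: a missing or unknown value is an error naming the file.
func classify(name, src string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(src))
	lineNo, line := 0, ""
	for lineNo < 2 && sc.Scan() {
		lineNo++
		line = sc.Text()
	}
	if lineNo < 2 || !strings.HasPrefix(line, headerPrefix) {
		return "", fmt.Errorf("%s: missing %q header on line 2 (see scripts/smoke/_template.sh)",
			name, strings.TrimSpace(headerPrefix))
	}
	class := strings.TrimPrefix(line, headerPrefix) // exact match, no trimming
	if !validClasses[class] {
		return "", fmt.Errorf("%s: unrecognised PREFLIGHT_REQUIRES value %q (want static-only, live-server or unit-tests)",
			name, class)
	}
	return class, nil
}

func main() {
	src := "#!/usr/bin/env bash\n# PREFLIGHT_REQUIRES: live-server\nset -euo pipefail\n"
	fmt.Println(classify("phase-60.sh", src))
	fmt.Println(classify("phase-99.sh", "#!/usr/bin/env bash\nset -e\n"))
}
```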

3. Two-batch parallel driver in scripts/preflight.sh. The orchestrator:

  1. Builds ./bin/harbor (unchanged from pre-D-104).
  2. Classifies every scripts/smoke/phase-*.sh by header; FAILs loud on any missing / unrecognised header before doing further work.
  3. Runs scripts/drift-audit.sh (unchanged).
  4. Runs the static-only batch in parallel — no server needed, so the wall-time win is the full parallelism. Cap defaults to sysctl -n hw.ncpu (macOS) / nproc (Linux) / 4 (last-resort fallback). Override via MAX_PARALLEL_SMOKES=N. Outputs are captured per-smoke and replayed in sorted (deterministic) order after the batch finishes so the operator sees a consistent log layout.
  5. Runs the unit-tests batch in parallel using the same machinery. These run BEFORE the boot because they don't need it and the wall-time win compounds with batch 1.
  6. Boots ./bin/harbor dev ONCE on HARBOR_BIND=127.0.0.1:0 (or the pinned HARBOR_DEV_PORT if the operator set it). Parses HARBOR_DEV_BOUND= from the server log to discover the actual bound address. Waits for /healthz to return 200 against THAT address (not a hardcoded :18080). The pre-Phase-64 stub-binary branch is preserved unchanged — when the binary exits cleanly OR emits "code":"not_implemented" matching the Phase 63 stub, the boot is skipped without failure.
  7. Exports HARBOR_BIND, HARBOR_PORT, HARBOR_BASE_URL, HARBOR_DEV_PORT, HARBOR_DEV_TOKEN to every live-server smoke.
  8. Runs the live-server batch SERIALLY. Serial because the smokes observe shared dev-server state and a parallel run would produce nondeterministic order-dependent failures (an inspect-events snapshot from one smoke would carry the prior smoke's task.spawned events; a draft round-trip from one smoke would race against another smoke's draft GET). The N=6 live-server smokes finish in a few seconds combined; the wall-time win is concentrated on the parallel batches above.
  9. Tears down (graceful TERM, then KILL, then cleanup) — unchanged.

The bash-3.2-compatible fan-out (drain-head rather than wait -n) is deliberate: macOS still ships bash 3.2 as /bin/bash and a chunk of Harbor's contributors run there. The harness MUST work without a bash 4+ install.

Why. Two coupled operator-facing problems:

  1. Wall time. ~70+ phase smokes ran serially against one shared harbor dev instance. Each wave added new smokes at 1–5s apiece; cumulative wall time was substantial, sometimes longer than the development cycle. Wave 12 §17.5 closeout audit's "Recommendations for Wave 13" pinned this as a structural item ("Recommend scheduling early in Wave 13 — every wave that lands without this added another 10–20s to the gate"). The parallel-batch path cuts wall time by the typical bash-fan-out factor (~3–5x on a 4-core laptop), comfortably clearing the issue's ≥50% target.
  2. Port collision across worktrees. Two sibling worktrees running make preflight concurrently both tried to bind 127.0.0.1:18080 and one would fail. Wave 12 used HARBOR_PREFLIGHT_SKIP=1 three times for this reason (PRs #129, #130, #131); the Wave 12 §17.5 closeout audit itself couldn't run preflight cleanly. The ephemeral-port default removes the contention class entirely — N concurrent worktrees each get a distinct OS-assigned port.

Helper-tracks-production invariant (per D-094). cmd/harbor/cmd_dev.go::devStack.serve and harbortest/devstack.Assemble's test-side serve path use the same net.Listen + Serve(listener) shape. The dev binary is the only producer of the HARBOR_DEV_BOUND= log line; tests that need the bound port from the helper read it the same way the harness does. No second source of truth.

§13 fail-loud. Three loud-failure paths land in this PR:

  1. Missing classification header rejects preflight before the dev boot. Error message names the offending file(s) and points to scripts/smoke/_template.sh + CLAUDE.md §4.2.
  2. Unrecognised classification value rejects at the same gate with the same shape. Silent defaults to static-only would let a server-touching smoke leak into the parallel batch — exactly the nondeterministic-flake source we're closing.
  3. Drift between drift-audit and preflight. The same classification check runs in scripts/drift-audit.sh (check 9) so make drift-audit standalone surfaces the issue without needing a full preflight run. A header drift is caught at the cheapest possible gate.

§13 primitive-with-consumer. The primitive (the # PREFLIGHT_REQUIRES: header grammar + the orchestrator's batch dispatcher) lands with its first consumer (the orchestrator parses, classifies, and parallelises 78 existing smokes) in the same PR. The grammar is exercised end-to-end the moment the PR lands; there is no "header without consumer" window. The ephemeral-port primitive (the HARBOR_DEV_BOUND= log line + the net.Listen switch) lands with its first consumer (the preflight orchestrator reads the line and resolves the port from it) in the same PR — no orphan primitive.

CI matrix sharding (issue #135 step 4) deferred. The four-step plan in #135's body included a CI matrix-sharding step ("GitHub Actions can shard the static batch across runners"). Step 4 is deferred to a follow-up issue for three reasons. (a) The primary win is local-dev wall time, where this PR's parallel-batch path delivers the bulk of the gain. (b) CI's preflight job already runs on a single runner, and its wall time is not what operators wait on (the GitHub Actions queue dwarfs the test runtime), so the marginal benefit of sharding is small. (c) Sharding adds matrix-bookkeeping complexity (per-shard result aggregation, shared bin/harbor build, deterministic shard assignment) that is best evaluated standalone. The follow-up issue tracks the work without blocking the local-dev win.

Recurring-failure-mode pre-empts (per §17.7 step 3): the orchestrator uses the ${arr[@]+"${arr[@]}"} empty-array guard everywhere a classification bucket might be empty (so set -u doesn't trip on an absent STATIC_ONLY bucket — the same shape that bit Phase 63's smoke in Wave 12); the fan-out drain uses the drain-head pattern rather than wait -n (bash 3.2 compatibility on macOS); per-smoke output capture writes to a tempfile so SIGPIPE-shaped early-exit failures don't corrupt the aggregated log; the HARBOR_DEV_BOUND parse handles both IPv4 and IPv6-bracketed forms (the existing dev-bind LastIndex(':') convention applies).

Acceptance.

  • scripts/preflight.sh boots harbor dev with HARBOR_BIND=127.0.0.1:0 by default; two sibling worktrees can run make preflight simultaneously without collision (the PR body documents the cross-worktree concurrency test).
  • Every scripts/smoke/phase-*.sh carries exactly one # PREFLIGHT_REQUIRES: live-server|static-only|unit-tests header on line 2.
  • A missing or unrecognised header fails make preflight AND make drift-audit loud, with an actionable error message naming the file(s).
  • The static-only and unit-tests batches run in parallel up to MAX_PARALLEL_SMOKES (CPU-count default); the live-server batch runs serially after the dev boot.
  • Total preflight wall time drops by ≥50% on a clean checkout (the PR body reports before/after numbers from time make preflight).
  • scripts/smoke/_template.sh documents the classification convention so new phases inherit the rule.
  • The HARBOR_DEV_BOUND=<host:port> line is emitted exactly once per harbor dev boot, on stderr, with that exact prefix.
  • CI matrix sharding is deferred to a follow-up issue; this PR's body links the issue.

D-105 — Phase 72 Console subscription protocol surface: events.subscribe canonical method + CodeIdentityScopeRequired wire code + closed-scope re-affirmation (D-079)

Date: 2026-05-19 Status: Settled (shipping with this PR)

Where it lives: internal/protocol/methods/methods.go (the new MethodEventsSubscribe Method = "events.subscribe" constant + its registration in canonicalMethods + the IsControlMethod exclusion — the streaming-events method is NOT a steering-control method, the Phase 54 control nine stays exclusive); internal/protocol/methods/methods_test.go (the new TestMethods_EventsSubscribe_Registered + the updated IsControlMethod exhaustiveness test + the wantMethods slice extension); internal/protocol/errors/errors.go (the new CodeIdentityScopeRequired Code = "identity_scope_required" canonical wire code + registration in canonicalCodes); internal/protocol/errors/errors_test.go (the new TestCodes_IdentityScopeRequired pinning the wire string + the wantCodes extension); internal/protocol/transports/stream/stream.go (the new writeProtocolError helper that emits the canonical JSON Protocol error envelope; the ?admin=1 reject path now returns CodeIdentityScopeRequired instead of free-form scope_mismatch: prose; the Subscribe-error switch maps events.ErrIdentityScopeRequired AND events.ErrAdminScopeRequired onto CodeIdentityScopeRequired at HTTP 403); internal/protocol/transports/stream/stream_test.go (the new TestServeHTTP_CrossTenantWithoutScope_Returns403 + the io import); internal/protocol/transports/stream/internal_test.go (the existing TestServeHTTP_SubscribeScopeRequired_* renamed to _403 + body-Code assertion + the new TestServeHTTP_BusReturnsAdminScopeRequired_Maps403 defensive case); internal/protocol/transports/stream/concurrent_scope_test.go (NEW — the D-025 contract: N=128 concurrent SSE subscribers against ONE shared Handler under -race, half triple-scoped, half admin-scoped (alternating ScopeAdmin / ScopeConsoleFleet); asserts no context bleed, audit.admin_scope_used emitted per admin subscribe, goroutine-baseline restored); internal/protocol/transports/control/status.go (the new CodeIdentityScopeRequired → 403 HTTP-status mapping); internal/protocol/transports/control/status_test.go (table extension + exhaustiveness check switched to derive from protoerrors.Codes() so a future code without a status entry surfaces by NAME + the new TestStatusFor_CodeIdentityScopeRequired_Returns403); internal/protocol/conformance/conformance.go (matrix exhaustiveness: methods.Methods() count to 11; errorCodeMatrix adds the new code; expectedHTTPStatus adds 403; new CodeIdentityScopeRequired_CrossTenantWithoutScope scenario in runErrorCodeMatrix + new EventsSubscribe_HappyPath top-level subtest; runMethodMatrixHappyPath + runMethodMatrixMalformedRequest skip MethodEventsSubscribe because the streaming-events method is served by the SSE transport, not REST control); internal/protocol/conformance/internal_test.go (errorCodeMatrix size pin updated 8 → 9); internal/protocol/control.go (the existing Dispatch rejects MethodEventsSubscribe at the REST surface with CodeInvalidRequest + a "wrong transport" message — the streaming-events vocabulary is NOT served by the REST control surface; the compile-time init() exhaustiveness check switched from if m == MethodStart to if !IsControlMethod(m) so non-control methods don't need a methodToControlType entry); internal/protocol/singlesource/singlesource.go (CanonicalMethods adds "events.subscribe" — the duplication-pin lockstep with internal/protocol/methods); test/integration/events_subscribe_scope_test.go (NEW — the §13 primitive-with-consumer discharge: six scope-degradation scenarios with real events/drivers/inmem + real protocol/auth.Middleware over the real ES256 testdata keypair + real transports.NewMux); scripts/smoke/phase-72.sh (already authored when the plan landed; covers the surface assertions per the plan); docs/plans/README.md (Phase 72 row Pending → Shipped + the detail block's new plan-file pointer); docs/plans/phase-72-console-subscription-scope.md (the binding spec — pre-existing); docs/glossary.md (extends the existing events.subscribe / identity_scope_required entries + adds Scope-degradation regression suite); README.md (the Phase 72 status row).

Decision. Phase 72 elevates the already-shipped events.subscribe substrate (Phase 05 bus + Phase 06 replay + Phase 60 /v1/events SSE + Phase 61 JWT scope-claim gate) into a first-class canonical Protocol surface. Four binding calls land here.

1. events.subscribe is a canonical Protocol method name. The wire-transport route is still GET /v1/events (Phase 60 SSE); the method-name constant methods.MethodEventsSubscribe = "events.subscribe" is the new wire-contract anchor third-party Console implementations branch on. It is NOT a steering-control method: IsControlMethod("events.subscribe") is false, the Phase 54 control nine stays exclusive, and the REST ControlSurface.Dispatch rejects events.subscribe with CodeInvalidRequest + "use the SSE transport at GET /v1/events instead" (the streaming-events vocabulary is served by the SSE transport, not the steering inbox). The wire string is final; bumping it is a Protocol version change (RFC §5.3).

2. CodeIdentityScopeRequired is the canonical wire-rejection code. Returned at the SSE edge when a Subscribe request's scope set is insufficient for the requested cross-tenant fan-in — typically a ?admin=1 request from a JWT lacking auth.ScopeAdmin or auth.ScopeConsoleFleet (D-079). HTTP status 403 (the request is authenticated; the scope set does not authorize the operation). Distinct from CodeIdentityRequired (missing triple, 401), CodeAuthRejected (token invalid, 401), and CodeScopeMismatch (reserved for the steering-control scope-claim path per RFC §6.3). The wire transport collapses BOTH events.ErrIdentityScopeRequired (Subscribe filter elided the triple AND Admin was false) and events.ErrAdminScopeRequired (Admin requested without the verified scope claim) onto this single Code: from the third-party Console's perspective the operator-actionable answer is the same — attach a scope-bearing token. The Go-level distinction stays available for in-process callers. The ?admin=1 reject path now emits the canonical JSON Protocol error envelope ({code:"identity_scope_required", message:...}) instead of the pre-Phase-72 free-form scope_mismatch: plain-text body — a third-party Console branches on body.code, not a prose grep.

3. The closed-scope set (D-079) holds — no new events.crosstenant scope. The Wave 13 decomposition §4 row 72 phrasing hinted at a dedicated events.crosstenant scope; this PR explicitly rejects that. D-079 settled the closed scope set as ScopeAdmin + ScopeConsoleFleet and brief 11 §CC-2 maps both to the cross-tenant case (admin = full fleet; console:fleet = fleet observation). Introducing a third scope would re-litigate D-079 without new evidence and would also leak into Phase 73's state-inspection methods (sessions.inspect, tasks.get, etc.) which are also scope-gated. If a finer-grained scope vocabulary is needed at some future point, that is an RFC PR + a new decisions entry, not a Phase 72 deliverable. The integration test pins this: both ScopeAdmin AND ScopeConsoleFleet satisfy the cross-tenant gate (scenarios 3 + 4); a JWT lacking both is rejected (scenarios 2 + 6).

4. §13 primitive-with-consumer discharge in-phase. Three same-PR consumers exercise the primitive end-to-end:

  • The conformance suite's EventsSubscribe_HappyPath + CodeIdentityScopeRequired_CrossTenantWithoutScope matrix scenarios — the wire transport carries a triple-scoped subscribe to 200 + an ?admin=1-without-scope subscribe to 403 + body-Code identity_scope_required, real httptest.Server, real ES256 keypair.
  • The D-025 concurrent-reuse test (concurrent_scope_test.go::TestStreamHandler_ConcurrentScopedReuse) — N=128 concurrent subscribers under -race, no context bleed, audit-emit-per-admin pinned, goroutine baseline restored. The test uses an in-package scopeMiddleware (header-driven) instead of a full JWT validator so the stress run doesn't have to mint 128 JWTs per iteration; the JWT path is exercised by the integration test below.
  • The integration test (test/integration/events_subscribe_scope_test.go) — six scope-degradation scenarios with real events/drivers/inmem + real protocol/auth.Middleware over the real ES256 testdata keypair (internal/protocol/auth/testdata/) + real transports.NewMux: triple-scoped no-cross-tenant-leak, ?admin=1 without scope → 403, ?admin=1 with ScopeAdmin → 200 + cross-tenant fan-in + audit.admin_scope_used observable, ?admin=1 with ScopeConsoleFleet → 200, expired token → 401 auth_rejected (auth layer fail-closes BEFORE the scope gate so the new Code is NOT reached), dropped-middleware shape → 403, body-vs-token identity mismatch (JWT wins per Phase 61's ctx-attached identity).

Filter-shape extensions deferred to Phase 72a. The plan's non-goals list is binding: event-type set / time-window / run-set predicates beyond the existing triple + types ship in Phase 72a per the Wave 13 decomposition §4 row 72a. events.aggregate (time-bucketed counts for sparklines) also lands in 72a. This phase is the scope-claim foundation 72a / 73c / 73d / 73g / 73j / 73k all compose on top of.

internal/protocol/conformance matrix amendment (drift guard). The matrix exhaustiveness check now expects methods.Methods() to return 11 entries (10 task-control + 1 streaming-events anchor); errorCodeMatrix exhaustiveness adds the new code; expectedHTTPStatus adds the 403 mapping; runMethodMatrixHappyPath + runMethodMatrixMalformedRequest skip MethodEventsSubscribe because its happy-path + reject-path live under dedicated subtests (the streaming-events method is served by the SSE transport, not REST control — a happy-path Dispatch against the control surface would be wrong). The runEventsSubscribeNegotiation subtest pins the wire round-trip + the "wrong transport" guard + the registration invariants.

Wire-status mapping decision: 403 not 401. Phase 61's ?admin=1 gate was already returning HTTP 403 with a free-form scope_mismatch: body string — Phase 72 preserves that status and adds the typed Code so a client can branch on the Code rather than the prose. 403 is correct semantically (the request IS authenticated; only the scope set is insufficient); 401 would imply the request lacks authentication entirely, which would be wrong.

Acceptance.

  • methods.MethodEventsSubscribe declared + registered + IsValidMethod returns true + IsControlMethod returns false + Methods() returns 11 entries in sorted order.
  • errors.CodeIdentityScopeRequired declared + registered + IsValidCode returns true + Codes() returns it in lexicographic order.
  • transports/stream maps events.ErrIdentityScopeRequired AND events.ErrAdminScopeRequired onto CodeIdentityScopeRequired at HTTP 403 with the canonical JSON Protocol error envelope. The ?admin=1 gate's pre-existing 403 reject path now emits the typed Code in the body.
  • transports/control/status.go maps CodeIdentityScopeRequired → 403; the exhaustiveness check derives from protoerrors.Codes() (D-082 amendment).
  • protocol.ControlSurface.Dispatch(MethodEventsSubscribe) returns CodeInvalidRequest with a "use the SSE transport" message — the streaming-events vocabulary is NOT served by the REST control surface.
  • internal/protocol/conformance matrix exhaustiveness includes the new method + the new code; EventsSubscribe_HappyPath + CodeIdentityScopeRequired_CrossTenantWithoutScope matrix scenarios pass against the assembled wire stack.
  • internal/protocol/singlesource.CanonicalMethods adds "events.subscribe" — the duplication-pin lockstep test passes.
  • test/integration/events_subscribe_scope_test.go exercises six scope-degradation scenarios under -race with real drivers everywhere on the seam.
  • internal/protocol/transports/stream/concurrent_scope_test.go::TestStreamHandler_ConcurrentScopedReuse exercises N=128 concurrent subscribers against one shared Handler + one shared EventBus under -race, asserts no context bleed + the audit-emit invariant + baseline goroutine count restored.
  • scripts/smoke/phase-72.sh (pre-existing, classified live-server per D-104) passes against harbor dev; the 404/405/501 → SKIP convention keeps it harmless on pre-Phase-72 builds.
  • docs/plans/README.md Phase 72 row Pending → Shipped + detail-block plan-file pointer.
  • README.md Phase 72 status row added.
  • docs/glossary.md extends the existing events.subscribe / identity_scope_required entries; adds Scope-degradation regression suite.

Structural precedent. D-079 (Phase 61 Protocol auth — ScopeAdmin + ScopeConsoleFleet + CodeAuthRejected) is the immediate predecessor and the source of the closed scope set Phase 72 consumes. D-077 (Phase 59 Protocol versioning + capability handshake) is the structural shape for the canonical-constant pattern — Capability + Method + Code all live in fixed package-level maps with NO registration escape hatch. D-082 (Wave 10 audit fixes) is the source of the protoerrors.Codes()-derived exhaustiveness check the new code's status mapping inherits.


D-106 — EventFilter wire shape + events.aggregate Protocol method consume the D-079 closed two-scope set (no events.crosstenant scope)

Date: 2026-05-19 Status: Settled (shipping with this PR — Phase 72a)

Where it lives: internal/protocol/types/events.go (the four new wire types — EventFilter, EventBucket, EventAggregateRequest, EventAggregateResponse); internal/protocol/methods/methods.go (the two new method constants — MethodEventsSubscribe, MethodEventsAggregate — and the new IsStreamingEventsMethod predicate that keeps IsControlMethod exclusive to the Phase 54 nine); internal/protocol/errors/errors.go (the new CodeIdentityScopeRequired canonical code, mapped to HTTP 403 in internal/protocol/transports/control/status.go); internal/protocol/types/version.go (the new CapEventsSubscribe capability constant + its canonicalCapabilities entry — the second Protocol surface advertised in VersionHandshake); internal/events/filter.go (the FilterFromWire converter that backfills the caller's tuple and flags RequiresAdminScope when the wire filter names a tenant other than the caller's, plus the pure MatchWire predicate that filters on header fields only — payload-byte predicates are out of scope per Brief 11 §CC-4 + D-026); internal/events/aggregate.go (the Aggregator compiled artifact + Aggregate(ctx, req) that snapshots the bus's Replayer, bucket-counts in Go, fails loudly on bad Window/Bucket pairs); internal/protocol/transports/stream/handlers.go (the AggregateHandler wire adapter mounted at POST /v1/events/aggregate); internal/protocol/transports/transports.go (the new WithAggregateClock test option + the AggregateHandler mounted alongside the existing control + stream handlers); internal/protocol/singlesource/singlesource.go (the lockstep map gains entries for both new method names + the four new wire types); internal/protocol/conformance/conformance.go (the method matrix bumps to 12, the error-code matrix gains CodeIdentityScopeRequired, the capability matrix gains CapEventsSubscribe; streaming-events methods are excluded from runMethodMatrixHappyPath / runMethodMatrixMalformedRequest via methods.IsStreamingEventsMethod); test/integration/events_filter_aggregate_test.go (the §13 primitive-with-consumer integration test under -race: real events/drivers/inmem + real protocol/auth.Middleware + real transports.NewMux; six scenarios (happy path, cross-tenant rejection without scope, cross-tenant acceptance with admin, cross-tenant acceptance with console:fleet, missing-bearer rejection, bad Window/Bucket rejection) plus an N=16 concurrent-clients run under -race); test/integration/wave10_test.go::TestE2E_Wave10_VersionHandshake_ContractStable (extended to assert the post-Wave-13 capability set — task_control + events_subscribe); scripts/smoke/phase-72a.sh (real assertions over POST /v1/events/aggregate: surface probe, missing-bearer 401, happy-path 200 + 60-bucket length, bad-window 400, cross-tenant-without-scope SKIP-deferred-to-integration); docs/glossary.md (new entries: events.aggregate, EventFilter, EventBucket); docs/plans/README.md (Phase 72a row added at line 97 + detail block); docs/plans/phase-72a-events-filter-aggregate.md (the binding plan that pre-dated this implementation).

Decision. The Wave 13 Console surface needs two Protocol primitives that the existing events.subscribe (Phase 60 / Phase 72) cannot supply. The first is a structured filter shape that scoped Console clients pass at subscription time (event-type set + identity narrowing + time-window bounds). The second is an events.aggregate time-bucket method that powers the per-event-type stacked-area sparkline at the top of the Console Events page (page-events.md §12). Phase 72a ships both as a single coherent Protocol-surface addition.

Three intertwined design calls land here.

1. The wire filter is a struct of four fixed narrowing axes (event types, identity, runs, time window), not a free-form predicate. EventFilter carries: EventTypes []string, TenantIDs []string, UserIDs []string, SessionIDs []string, RunIDs []string, Since time.Time, Until time.Time. Empty axes default to "any" (or, for identity axes, to the caller's own component). This shape composes naturally with the Phase 05 events.Filter (which is single-valued on triple components, plus an admin boolean for cross-tenant): FilterFromWire resolves the wire request to a bus-facing Filter and flags RequiresAdminScope when the wire filter names a tenant other than the caller's. Substring/payload predicates are explicitly out of scope (Brief 11 §CC-4: high-cardinality runtime-side search is post-V1 because it would force the runtime to materialise heavy payloads through the D-026 LLM-edge safety net).

2. Cross-tenant filters consume the D-079 closed two-scope set, NOT a new events.crosstenant scope. The Wave 13 decomposition §4 row 72 hints at a "cross-tenant claim (D-079)" wording; the natural temptation is to mint a third scope (events:crosstenant) for the events surface specifically. We resolve this in the negative: D-079 already settled the closed scope set (auth.ScopeAdmin + auth.ScopeConsoleFleet), and Brief 11 §CC-2 maps BOTH to the cross-tenant fan-in case (admin = full fleet; console:fleet = fleet observation). Introducing a third scope here would re-litigate D-079 without new evidence and would also leak into the Phase 73 state-inspection methods (which are gated on the same closed scope set). The wire edge in internal/protocol/transports/stream/handlers.go gates on auth.HasScope(ScopeAdmin) OR auth.HasScope(ScopeConsoleFleet); a request lacking BOTH is rejected with CodeIdentityScopeRequired (HTTP 403). The operator may revisit this in a future RFC PR with a new decisions entry; it is not a Phase 72a deliverable.

3. events.aggregate snapshots the bus's Replayer and counts in Go. The aggregator is a D-025-safe compiled artifact: bus + clock are set once at construction, never mutated; each Aggregate(ctx, req) allocates its own bucket slice and returns when done. Bucket arithmetic is deterministic — Window % Bucket == 0 is mandatory (else ErrAggregateBadWindow → HTTP 400 / CodeInvalidRequest); every bucket is present even when empty so a rendering client sees a contiguous time axis. A bus that does not implement Replayer (or whose ring is disabled) fails loudly with ErrReplayUnavailable rather than producing an empty series that looks like "no events" (CLAUDE.md §5: fail loudly, no silent degradation). The concurrent-reuse contract (D-025) is pinned by an N=100+-goroutine test at internal/events/aggregate_test.go::TestAggregate_ConcurrentReuse + an N=16 wire-level test at test/integration/events_filter_aggregate_test.go::TestE2E_Phase72a_ConcurrentAggregateClients.

Why. The Console Events page (page-events.md) is the canonical Stage-2 consumer for both primitives. Without EventFilter, the Events page would either fetch every event in the runtime and filter client-side (the predecessor sharp edge Brief 11 §CC-4 names explicitly as "runtime-side high-cardinality") or paper over the gap with a hand-rolled query-string convention on /v1/events. Without events.aggregate, the per-event-type sparkline would either be missing or be hand-rolled from the live SSE stream (a heavy, latency-bound, and unrepresentative approach for the multi-hour windows operators actually care about). Both gaps would push the Console toward its own persistence, exactly the shape D-061 forbids.

The §13 primitive-with-consumer rule binds in-phase: the integration test consumer (test/integration/events_filter_aggregate_test.go) exercises the full wire surface end-to-end with real drivers under -race, and the per-package conformance tests pin the filter matrix, bucket arithmetic, and concurrent-reuse contract. The Stage-2 Console consumer (Phase 73g Events page) lands in the same wave per the Wave 13 staging.

Helper-tracks-production invariant (per D-094). The AggregateHandler lives in internal/protocol/transports/stream/ alongside the SSE handler — same package, same identity-resolution helpers (resolveIdentity), same auth-middleware fallback contract. There is no second identity-resolution shape on the events.aggregate surface.

§13 fail-loud paths.

  1. Missing identity at the wire edge → CodeIdentityRequired (HTTP 401) before the handler runs (Phase 61 auth.Middleware) OR before any aggregate work (resolveIdentity failure).
  2. Cross-tenant filter without the closed-set scope claim → CodeIdentityScopeRequired (HTTP 403). Distinct from CodeIdentityRequired (no identity at all) and CodeAuthRejected (token invalid).
  3. Non-dividing Window/Bucket pair → CodeInvalidRequest (HTTP 400). The aggregator never silently rounds.
  4. Bus without Replayer (forward-only driver or ReplayBufferSize=0) → CodeRuntimeError (HTTP 500) with a clear "no historical aggregation" message. Never an empty series that looks like "no events".

§13 primitive-with-consumer. The four primitives (the wire types, the two method constants, the new error code, the new capability) all land with their first consumers in the same PR:

  • EventFilter + MatchWire are consumed by Aggregator.Aggregate (the per-event filter loop) AND by FilterFromWire (the bus-edge converter); the unit tests at internal/events/filter_test.go pin the predicate's full axis matrix.
  • events.aggregate + AggregateHandler are consumed by the integration test (TestE2E_Phase72a_*) over real httptest at the wire edge.
  • CodeIdentityScopeRequired is consumed by the cross-tenant-without-scope integration test scenario AND by the httpStatus mapping table in internal/protocol/transports/control/status.go (HTTP 403).
  • CapEventsSubscribe is consumed by TestE2E_Wave10_VersionHandshake_ContractStable (the wave-10 E2E gains a pin on the post-Wave-13 capability set) AND by the conformance suite's runVersionHandshake check.

Recurring-failure-mode pre-empts (per §17.7 step 3 + the locked-in coordinator-verify protocol):

  • Method matrix exhaustiveness. Adding two methods means bumping the assertMethodMatrixExhaustive count from 10 to 12 AND adding the new constants to the wantSet AND skipping streaming-events methods in runMethodMatrixHappyPath / runMethodMatrixMalformedRequest (they route through their own transports, not the REST control surface). The IsStreamingEventsMethod predicate is the structural way to express this — a future Protocol-surface phase that adds another non-control method extends the same predicate, never re-implementing the skip logic.
  • Error code matrix exhaustiveness. Adding CodeIdentityScopeRequired means bumping errorCodeMatrix count (the lockstep test in internal/protocol/conformance/internal_test.go::TestInternal_ErrorCodeMatrix_AllCanonical fails loudly until the count matches) AND adding the new entry to expectedHTTPStatus (HTTP 403) AND mapping it in internal/protocol/transports/control/status.go.
  • Singlesource lockstep. Adding two methods + four wire types means extending singlesource.CanonicalMethods (the duplicated set the checker uses) AND singlesource.CanonicalWireTypes (the type-home map). The TestSingleSource_CanonicalMethodsInLockstep / TestSingleSource_CanonicalWireTypesInLockstep lockstep tests fail loudly on drift.
  • Wave 10 handshake E2E drift. The pre-Wave-13 wave-10 E2E failed whenever len(caps) != 1 (task_control only), so landing a new capability without touching it would either fail the wave-10 E2E or, worse, keep the assertion green by accident. The wave-10 E2E gains a second pin (Accepts(CapEventsSubscribe)) AND a count of 2 in the same PR per §17.6 ("fix what the integration test finds — no matter where the bug lives").
  • Aggregator clock for tests. Production aggregator clock is real-time UTC; tests with backdated events need a deterministic clock. The WithAggregatorClock aggregator option + the new WithAggregateClock mux option close this — the integration test injects a fixedNowPhase72a instance and the wire-level bucket arithmetic is deterministic.

Acceptance:

  • internal/protocol/types/events.go declares the four wire types; the singlesource lockstep recognises them.
  • internal/protocol/methods/methods.go declares MethodEventsSubscribe + MethodEventsAggregate; IsValidMethod returns true for both; IsControlMethod returns false for both (they route through their own transports); IsStreamingEventsMethod is the predicate the transport router uses to classify.
  • internal/protocol/errors/errors.go declares CodeIdentityScopeRequired; the canonical-set ordering is stable; IsValidCode returns true; the HTTP-status map returns 403.
  • internal/protocol/types/version.go declares CapEventsSubscribe; the version handshake advertises it alongside CapTaskControl.
  • internal/events/filter.go::MatchWire is a pure predicate over event headers; the filter matrix unit test exercises every axis combination including the empty-filter no-op case.
  • internal/events/aggregate.go::Aggregator.Aggregate returns a deterministic bucket series; the bucket-arithmetic test pins the per-bucket counts; the concurrent-reuse test pins N≥100 invocations against ONE shared aggregator under -race.
  • POST /v1/events/aggregate is mounted on the wire mux; the integration test exercises happy + reject paths end-to-end against a real httptest.Server with real ES256-signed JWTs (no mocks at any seam per §17.3).
  • Cross-tenant requests without auth.ScopeAdmin OR auth.ScopeConsoleFleet return 403 + CodeIdentityScopeRequired. NO new events.crosstenant scope.
  • scripts/smoke/phase-72a.sh shows OK > 0 on a live Phase 72a build (the surface probe, missing-bearer 401, happy-path 200 + 60-bucket length, bad-window 400 assertions all pass) and SKIPs cleanly on pre-Phase-72a builds.
  • The wave-10 handshake E2E is extended to assert the post-Wave-13 capability set (task_control + events_subscribe).
  • docs/glossary.md carries the new vocabulary entries; docs/plans/README.md Phase 72a row flips to Shipped; docs/decisions.md carries this D-106 entry.
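The cross-tenant acceptance criterion reduces to a small pure gate. In this sketch the scope strings, the code string and the function name gateCrossTenant are hypothetical; only the rule itself comes from this entry: either closed-set scope suffices, lacking both yields CodeIdentityScopeRequired with HTTP 403, and a filter naming only the caller's tenant needs no scope.

```go
package main

import (
	"net/http"
	"slices"
)

// Hypothetical strings standing in for auth.ScopeAdmin, auth.ScopeConsoleFleet
// and CodeIdentityScopeRequired.
const (
	ScopeAdmin                = "admin"
	ScopeConsoleFleet         = "console:fleet"
	CodeIdentityScopeRequired = "identity_scope_required"
)

// gateCrossTenant returns ("", 200) when the request may proceed and
// (CodeIdentityScopeRequired, 403) when a cross-tenant filter lacks both scopes.
func gateCrossTenant(callerTenant string, filterTenants, scopes []string) (string, int) {
	crossTenant := false
	for _, t := range filterTenants {
		if t != callerTenant {
			crossTenant = true
			break
		}
	}
	if !crossTenant {
		return "", http.StatusOK
	}
	// The closed two-scope set from D-079: either scope is sufficient.
	if slices.Contains(scopes, ScopeAdmin) || slices.Contains(scopes, ScopeConsoleFleet) {
		return "", http.StatusOK
	}
	return CodeIdentityScopeRequired, http.StatusForbidden
}
```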

D-107 — Phase 72b IdentityScope admin-impersonation extension: Actor / Requester / Impersonating triplet on the wire; auth.ScopeAdmin gate at the Protocol edge; typed auth.AdminScopeUsedPayload on the existing audit.admin_scope_used event

Date: 2026-05-19 Status: Settled (shipping with this PR)

Where it lives: internal/protocol/types/control.go (IdentityScope.Actor / Requester / Impersonating as *IdentityScope pointers with omitempty JSON tags + the IsImpersonating() predicate); internal/protocol/types/types_test.go (JSON round-trip + omitempty regression + StartRequest impersonation round-trip); internal/protocol/auth/events.go (AdminImpersonationReason constant + IdentityTriple flat audit shape + AdminScopeUsedPayload typed payload composing events.SafeSealed); internal/protocol/auth/events_test.go (compile-time SafePayload assertion + shape pinning + sentinel stability); internal/protocol/transports/control/control.go (the impersonation gate assertImpersonationShape + the audit emit emitAdminScopeUsed + the WithEventBus / WithRedactor / WithClock handler options + the redactedString helper); internal/protocol/transports/control/impersonation_test.go (17-shape gate table covering the five primary cases + actor-mismatches-JWT + requester-diverges-from-actor + missing-actor + missing-requester + actor-incomplete + requester-incomplete + no-ctx-identity + bare-handler-no-bus + non-impersonation-mismatch-still-rejected + cross-method + redactor-error-paths); internal/protocol/transports/transports.go (WithRedactor mux option threading the redactor into the control handler); test/integration/identityscope_impersonation_test.go (REAL Phase 60 transport mux + REAL Phase 61 ES256 validator + REAL audit/drivers/patterns redactor + REAL events/drivers/inmem bus, 6 end-to-end scenarios under -race); scripts/smoke/phase-72b.sh (smoke assertions per the master plan; auto-SKIPs until the Protocol JSON-RPC stub lands); docs/plans/README.md (Phase 72b row added as Shipped); docs/plans/phase-72b-identityscope-impersonation.md (the plan); docs/glossary.md (the four impersonation entries already landed pre-PR); docs/decisions.md (this entry).

Decision. The Phase 72b extension introduces three new *IdentityScope pointer fields on internal/protocol/types.IdentityScope (Actor, Requester, Impersonating) to carry the admin-on-behalf-of-user triplet on every Protocol request. The fields are mutually required: an IdentityScope MAY carry zero impersonation fields (today's behaviour: the verified JWT identity IS the request identity) OR all three set (admin-on-behalf-of-user). The runtime rejects any other shape loudly at the Protocol edge and never silently degrades. Five gating invariants on the impersonation path:

  1. auth.ScopeAdmin is mandatory. A non-admin token with Impersonating set is rejected with CodeScopeMismatch (HTTP 403) before Dispatch runs. The closed scope set from D-079 is reused; no new scope is minted.
  2. The impersonated triple is identity. Impersonating.Tenant / User / Session must all be non-empty — identity is mandatory, with no identity-downgrading knob (CLAUDE.md §6 rule 9). A missing component rejects with CodeIdentityRequired (HTTP 401).
  3. The Actor MUST equal the verified JWT identity. The Actor is the audit anchor — faking it is a privilege-escalation attempt. Actor.Tenant / User / Session must equal the JWT's verified triple; a mismatch rejects with CodeScopeMismatch (HTTP 403).
  4. V1 invariant: Requester == Actor. Delegated-impersonation chains ("admin A acting on behalf of admin B's audited request") need the two fields to diverge; V1 supports single-hop impersonation only. A divergence at V1 rejects with CodeScopeMismatch. The field exists on the wire so post-V1 delegated impersonation does not require a wire-shape break.
  5. Top-level Tenant/User/Session == Impersonating triple. The run executes as the impersonated identity; the top-level triple carries that identity. A mismatch rejects with CodeIdentityRequired.
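The five invariants can be read as an ordered check. A minimal sketch, assuming simplified IdentityScope and Triple types and that the checks run in the order listed (the real assertImpersonationShape may order them differently); the rejection code for a partial triplet with Impersonating unset is also an assumption.

```go
package main

// IdentityScope is a trimmed, hypothetical mirror of the wire type (JSON tags omitted).
type IdentityScope struct {
	Tenant, User, Session           string
	Actor, Requester, Impersonating *IdentityScope
}

// Triple is a hypothetical flat identity used for comparisons.
type Triple struct{ Tenant, User, Session string }

const (
	CodeScopeMismatch    = "CodeScopeMismatch"    // HTTP 403
	CodeIdentityRequired = "CodeIdentityRequired" // HTTP 401
)

func tripleOf(s *IdentityScope) Triple { return Triple{s.Tenant, s.User, s.Session} }

func complete(s *IdentityScope) bool {
	return s != nil && s.Tenant != "" && s.User != "" && s.Session != ""
}

// assertImpersonationShape returns "" to accept, or the rejection code.
func assertImpersonationShape(body IdentityScope, jwt Triple, hasAdmin bool) string {
	if body.Impersonating == nil {
		if body.Actor != nil || body.Requester != nil {
			return CodeIdentityRequired // partial triplet; code choice is an assumption
		}
		return "" // plain request: the Phase 61 body-vs-JWT check applies instead
	}
	switch {
	case !hasAdmin: // 1. auth.ScopeAdmin is mandatory
		return CodeScopeMismatch
	case !complete(body.Impersonating): // 2. the impersonated triple is identity
		return CodeIdentityRequired
	case body.Actor == nil || tripleOf(body.Actor) != jwt: // 3. Actor == verified JWT
		return CodeScopeMismatch
	case body.Requester == nil || tripleOf(body.Requester) != tripleOf(body.Actor): // 4. V1: Requester == Actor
		return CodeScopeMismatch
	case tripleOf(&body) != tripleOf(body.Impersonating): // 5. top level runs as the impersonated triple
		return CodeIdentityRequired
	}
	return ""
}
```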

On accept, the transport emits an audit.admin_scope_used event with a typed auth.AdminScopeUsedPayload (Actor + Requester + Impersonating as flat IdentityTriple + Reason="impersonation" + Method=<protocol method>) onto the wired event bus. The payload runs through the wired audit.Redactor BEFORE the publish per CLAUDE.md §7 rule 6 + D-020: the redactor walks a map[string]any of the fields, the gate extracts the redacted strings, and the typed payload is assembled and published. The event Identity is the IMPERSONATED triple so a Console subscribing to events for the impersonated session sees the audit emit alongside the run's own events; the Actor on the payload provides audit-side correlation.

Why. Brief 11 §PG-5 ("Run as another identity") names the verbatim triplet (actor=admin, requester=admin, impersonating=user_id). The Sessions page mockup (PR #138) needs an identity column that surfaces "who initiated this run" vs. "who the run executes as" — the wire primitive is the load-bearing extension that makes the column meaningful end-to-end. Brief 12 §"two-surface model" pins the wire-shape side: the triplet MUST live on internal/protocol/types.IdentityScope, not in web/console/, so a third-party Console implementing harbor console from scratch sees the same shape Harbor's own code does.

§13 primitive-with-consumer. The primitive (the Actor / Requester / Impersonating wire fields + the transport-edge gate + the typed audit payload) lands with its first consumer (test/integration/identityscope_impersonation_test.go — six end-to-end scenarios through the production wire path + the in-package gate table covering every defensive branch) in the same PR. Per the plan's non-goal carve-out, the Console UI consumer (the Sessions page identity column extraction stub) lands in 73c later in Wave 13 Stage 2.2; the same-wave consumer requirement is satisfied by the integration test consumer here. No deferred consumer.

§13 fail-loud. Five loud-failure paths land in this PR:

  1. Impersonating set without auth.ScopeAdmin rejects at the transport edge with CodeScopeMismatch BEFORE Dispatch runs. Defence in depth at the transport edge mirrors Phase 61 D-079 §4.
  2. Incomplete impersonation triple (any of Tenant / User / Session empty) rejects with CodeIdentityRequired. Identity is mandatory; the impersonated triple is identity too.
  3. Actor != verified JWT identity rejects with CodeScopeMismatch. The audit trail's accountability stands or falls here.
  4. Requester != Actor rejects with CodeScopeMismatch. V1 invariant; delegated impersonation is post-V1.
  5. Bus / Redactor not wired on the transport + an impersonation request → refuse with CodeRuntimeError. The audit emit is the load-bearing accountability surface; without it the gate refuses fail-closed rather than silently accepting (CLAUDE.md §13 "Silent degradation").

§13 closed scope set. No new scope is minted. The gate uses auth.ScopeAdmin from the closed two-scope set settled in D-079 (auth.ScopeAdmin + auth.ScopeConsoleFleet); a future expansion of the scope set is a Protocol-surface phase, not an ad-hoc addition.

Audit payload shape — typed auth.AdminScopeUsedPayload co-located with AuthRejectedPayload, NOT the pre-existing events.AdminScopeUsedPayload. The pre-existing emit site (the Phase 05 events.Subscribe admin-filter, internal/events/drivers/inmem) continues to use the lighter events.AdminScopeUsedPayload shape (Tenant / User / Session / SubscriberID). Phase 72b's impersonation emit needs a richer typed payload (Actor + Requester + Impersonating + Reason + Method), so a new auth.AdminScopeUsedPayload lives next to AuthRejectedPayload in internal/protocol/auth/events.go. The two payload types share the canonical audit.admin_scope_used event type; subscribers branch on the payload shape. Promoting the existing emit site to the new typed payload is a follow-up deferred to a Wave 13 audit cleanup PR.

IdentityTriple separate from identity.Identity. The audit payload uses a flat IdentityTriple{Tenant, User, Session} rather than re-using identity.Identity because the audit payload lives on the wire-adjacent bus surface, not on the runtime's identity-quadruple surface. Mirroring the runtime type 1:1 would couple the audit shape to internal storage refactors (the same anti-pattern RFC §5.1 names for the wire IdentityScope).

Defence in depth: assertBodyMatchesAuthedIdentity bypasses impersonation-shaped bodies. When a body carries Impersonating, the top-level Tenant/User/Session is the IMPERSONATED identity (deliberately != JWT). The Phase 61 body-vs-JWT check would otherwise reject the request. The impersonation gate is the authoritative check for that shape and runs immediately after the Phase 61 check; the Phase 61 check returns nil for impersonation-shaped bodies and lets the impersonation gate take over.

Backward compatibility. When all three impersonation fields are empty (Impersonating == nil), the behaviour is identical to today's StartRequest / ControlRequest surface — the verified JWT identity IS the request identity, no audit emit, no gate. Existing tests pass unchanged. New tests cover the impersonation paths; the bare NewHandler(surface) constructor still compiles and works for non-impersonation paths.

Acceptance.

  • internal/protocol/types/control.go extends IdentityScope with the three pointer fields + IsImpersonating() predicate; godoc pins V1 semantics.
  • internal/protocol/types/types_test.go JSON round-trip + omitempty regression + StartRequest cross-check.
  • internal/protocol/auth/events.go adds AdminImpersonationReason constant + IdentityTriple flat shape + AdminScopeUsedPayload typed payload.
  • internal/protocol/auth/events_test.go compile-time SafePayload + shape pinning + sentinel stability.
  • internal/protocol/transports/control/control.go extends ServeHTTP with the impersonation gate + the audit emit; adds WithEventBus / WithRedactor / WithClock handler options.
  • internal/protocol/transports/control/impersonation_test.go 17-shape table covering every gate edge case + every emit-path defensive branch.
  • internal/protocol/transports/transports.go adds WithRedactor mux option threading the redactor into the control handler.
  • test/integration/identityscope_impersonation_test.go runs the round-trip end-to-end through the REAL Phase 60 transport mux + REAL Phase 61 ES256 validator + REAL audit/drivers/patterns redactor + REAL events/drivers/inmem bus; six scenarios under -race; N=16 concurrency stress.
  • scripts/smoke/phase-72b.sh upgraded from skeleton to real assertions (auto-SKIPs until the Protocol JSON-RPC stub lands per the protocol_call convention).
  • Coverage on internal/protocol/transports/control: 90.7% (target 89.5%); internal/protocol/auth: 90.0% (target 90%); internal/protocol/types: 86.3% (the new code is 100% covered; the version.go drag is pre-existing).
  • make vet test lint build clean; go test -race -count=1 ./... green.
  • docs/decisions.md D-107 entry filed; docs/plans/README.md row 72b flipped to Shipped; docs/glossary.md already carries the four entries (impersonation, actor, requester, impersonating).

D-108 — Phase 72c search.* cluster (5 methods, one phase): runtime-side search over sessions / tasks / events / artifacts; search.query palette dispatcher fans out + merges; cross-tenant gated on auth.ScopeAdmin (D-079 reuse, NO new search.crosstenant scope); CodeScopeMismatch for the cross-tenant rejection (CodeAuthRejected stays pinned to 401 per D-079)

Date: 2026-05-19 Status: Settled Where it lives: RFC §5.2 (state snapshots row) + §6.13 (typed event bus) + §7 (Console layer), docs/plans/phase-72c-search-cluster.md, docs/plans/wave-13-decomposition.md §4 / §12 lock-in #4 ("keep as one phase"), CLAUDE.md §6 (multi-isolation) + §13 (primitive-with-consumer + identity-mandatory), internal/protocol/methods/methods.go (the five MethodSearch* constants + canonicalSearchMethods + IsSearchMethod), internal/protocol/types/search.go (the SearchRequest / SearchResponse / SearchResultRow / SearchFilter / SearchFacet / SearchArtifactRef wire shapes + the DefaultSearchPageSize=20 / MaxSearchPageSize=200 bounds), internal/protocol/singlesource/singlesource.go (the CanonicalMethods + CanonicalWireTypes extensions in lockstep), internal/protocol/search.go (the transport-agnostic SearchSurface dispatcher + mapSearchError), internal/protocol/transports/control/control.go (the WithSearchSurface option + the IsSearchMethod-routing path in ServeHTTP), internal/protocol/transports/control/search_handler.go (the REST decoder + Protocol-error wire mapper for the five methods), internal/search/ (the per-index seam + the Query aggregator + per-index packages internal/search/{sessions,tasks,events,artifacts}/), internal/sessions/registry.go (the new SessionLister capability ListSnapshots), test/integration/search_cluster_test.go (the §17.1 cross-subsystem integration test), scripts/smoke/phase-72c.sh, docs/glossary.md (the seven new entries), Brief 11 §CC-4 (runtime-side vs Console-side split), Brief 12 §"two-surface model".

Why: Brief 11 §CC-4 settled the design split: runtime-side search for the four high-cardinality entity classes (sessions, tasks, events, artifacts), Console-side adapters for the slow-moving catalog data (tools, agents, flows, MCP connections). The wave-13 decomposition doc §12 lock-in #4 settled the shape ("keep as ONE phase" — the methods share the same conformance surface). Six design calls warrant a durable home so a future auditor doesn't churn them.

  1. The five methods land together as one phase, not split into "primitive then UI" or "palette then per-index." §13's primitive-with-consumer rule reads here as "the palette dispatcher and the per-index Searchers are each other's first consumer." Splitting the cluster would create two halves where each half's tests would still pass (the dispatcher fans out to mocks; the per-index Searchers run standalone) but the seam between them would be unexercised. The decomposition doc §12 lock-in #4 ("Keep as one phase. The methods share the same conformance surface — identity filtering, redaction, pagination, scope claim") is the binding answer; this entry records the rationale for the next plan author who sees five methods and thinks "split."

  2. Cross-tenant gating reuses auth.ScopeAdmin (D-079), NOT a new search.crosstenant scope. Phase 72's plan Non-goals explicitly forbid a third scope; the audit in PR #142 closed the proposal. The closed two-scope set (ScopeAdmin + ScopeConsoleFleet) is sufficient: search reuses the same admin entitlement that events.subscribe Admin=true consults. Minting a per-subsystem scope at each cross-tenant call site would re-fragment the auth surface into N entitlements where one suffices. The search.ErrCrossTenantRequiresAdmin sentinel surfaces in the API; its wire mapping is settled by the next design call.

  3. The cross-tenant rejection wire code is CodeScopeMismatch (HTTP 403), NOT CodeAuthRejected. The Phase 72c plan's acceptance criteria asked for "403 with CodeAuthRejected", but CodeAuthRejected is pinned to HTTP 401 by D-079 (the wire mapping for a JWT that failed cryptographic verification), and one code cannot map to two different HTTP statuses. The search subsystem's "authenticated but lacking the cross-tenant claim" shape is the same shape as the steering-control "authenticated but lacking the admin scope" rejection (RFC §6.3 PRIORITIZE), which is already CodeScopeMismatch (403). Reusing it keeps the wire taxonomy stable: 401 = "your token failed to verify"; 403 = "your token verified but you lack the privilege for this action." A reader of the plan should treat the "403 with CodeAuthRejected" line as the plan's intent (any 403 for the cross-tenant case is OK); the implementation realises it via the existing CodeScopeMismatch code. The Wave 13 audit can amend the plan text in place if it surfaces this drift.

  4. Per-index searchers consume the existing read-side surface; no new Protocol method shapes for the underlying entities. Sessions search reads from a new sessions.SessionLister.ListSnapshots capability on *sessions.Registry (additive — the SessionRegistry interface is unchanged). Tasks search reads from the existing tasks.TaskRegistry.List per session, iterating sessions visible to the caller. Events search reads from the existing events.Replayer interface (Phase 06). Artifacts search reads from the existing artifacts.ArtifactStore.List with a scope filter. V1 ships in-memory linear-scan semantics; the wire shape (SearchRequest / SearchResponse) is index-strategy-agnostic, so a post-V1 FTS sidecar (SQLite FTS5 / Postgres pg_trgm) is an additive swap of the implementation behind the same Searcher interface (the §4.4 seam shape).

  5. Heavy-payload bypass at the row-construction site (D-026), not at the wire boundary. Every per-index Searcher calls search.RedactAndCapPreview(ctx, redactor, preview); the helper redacts, checks the byte-length against HeavyPreviewThreshold (32 KiB, mirroring the LLM-edge safety net), and either returns a capped preview (≤ PreviewMaxRunes=256 runes after redaction) or signals the caller to populate a *SearchArtifactRef instead. The artifacts index ALWAYS carries a Ref (artifacts are by-reference by construction); the other three indexes carry one only when the preview byte-length would breach the threshold. The wire layer never has to introspect row bytes — the row arrives correctly shaped.

  6. The search.query palette dispatcher carries no index of its own and emits no events. It is a pure aggregator: identity + scope validation runs at the aggregate edge; then concurrent fan-out to every selected index via per-index goroutines with a PerIndexTimeout=5s cap; then merge + sort + paginate the union. Per-index hard errors (identity / scope) propagate as request failures; per-index soft errors (an upstream blew up) degrade gracefully — the dispatcher returns the union of the surviving indexes' rows. This is the ONE deliberate exception to the §13 fail-loud rule, and only AFTER the aggregate identity + scope gates have passed: those rejections stay loud at the dispatcher's own boundary.

§13 primitive-with-consumer — discharged in-phase. The Searcher interface + SearcherRegistry + the four per-index implementations + the Query aggregator + the SearchSurface Protocol dispatcher + the WithSearchSurface transport option are all primitives; their consumers ship in the same PR. Per-package query-shape conformance tests (one per runtime-side index, plus the aggregate test for search.query) exercise each Searcher end-to-end with the real driver dependencies (real *sessions.Registry, real tasks.TaskRegistry, real events.Replayer, real artifacts.ArtifactStore). The test/integration/search_cluster_test.go consumes the full chain — real Protocol transport + real auth shape + real cross-subsystem fan-out — against the assembled surface. The D-025 concurrent-reuse test (internal/search/concurrent_reuse_test.go, N≥100 against the shared registry under -race) closes the reusable-artifact contract for the entire subsystem.

§4.4 seam posture. One Searcher interface per index, one implementation per index in V1, no driver pluralism. The §4.4 seam shape (interface + factory + registry) is present for the post-V1 FTS-sidecar swap; it is NOT used to add a "default" stub Searcher to the registry (that would violate the §13 "stubs as production defaults" rule). The aggregate dispatcher silently skips unregistered indexes (a partial deployment is acceptable); a missing per-method registry entry surfaces at the Protocol surface as CodeUnknownMethod ("no Searcher registered for index X on this Runtime"), not as a silent empty result.
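The two lookup modes described above can be sketched with a plain map-backed registry. Registry, staticSearcher and the Index method are hypothetical; the behaviour they illustrate is from this entry: nothing is registered by default, search.query skips missing indexes, and a per-index call with no registered Searcher fails with an unknown-method error.

```go
package main

import "fmt"

// Searcher is trimmed to a name for this sketch.
type Searcher interface{ Index() string }

// staticSearcher is a toy Searcher used only to populate the registry.
type staticSearcher string

func (s staticSearcher) Index() string { return string(s) }

// Registry is a hypothetical SearcherRegistry; no default stub is ever registered.
type Registry struct{ byIndex map[string]Searcher }

func NewRegistry(ss ...Searcher) *Registry {
	r := &Registry{byIndex: map[string]Searcher{}}
	for _, s := range ss {
		r.byIndex[s.Index()] = s
	}
	return r
}

// Lookup serves a per-index method: a missing entry is a loud unknown-method error.
func (r *Registry) Lookup(index string) (Searcher, error) {
	s, ok := r.byIndex[index]
	if !ok {
		return nil, fmt.Errorf("CodeUnknownMethod: no Searcher registered for index %s on this Runtime", index)
	}
	return s, nil
}

// Selected serves search.query: unregistered indexes are skipped (partial deployment).
func (r *Registry) Selected(indexes []string) []Searcher {
	var out []Searcher
	for _, ix := range indexes {
		if s, ok := r.byIndex[ix]; ok {
			out = append(out, s)
		}
	}
	return out
}
```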

Conformance suite extension is in-phase only for the matrix exhaustiveness check (internal/protocol/conformance/conformance.go::assertMethodMatrixExhaustive now expects 15 methods; the five search methods are explicitly registered in wantSet). The per-method happy-path / malformed-request scenarios for the search cluster are deferred to Phase 80 (the Phase 80 plan extends the suite per its existing scope); until then, the conformance suite's MethodMatrix_* runners explicitly t.Skip the five search methods with an issue-style reason ("phase-72c: search.* methods exercised by their per-package conformance + integration tests; conformance-suite scenario lands in Phase 80"). The skip is observable (per CLAUDE.md §5: no silent skips); the per-package + integration tests cover the surface in the interim.

Wave-13 staging note. This phase is in Wave 13 Stage 1 Batch A (the operator's locked answer to §9 question 2 in docs/plans/wave-13-decomposition.md §12 #2). It depends on the shipped Phase 60 (transport), Phase 61 (auth + ScopeAdmin), Phase 06 (events Replayer), Phase 08 (sessions registry), Phase 20 (tasks registry), Phase 17/18/19 (artifacts store). It does NOT depend on Phase 73 (state inspection) — the decomposition doc §4 lists 73 as a Deps entry, but the dependency is on the ALREADY-SHIPPED SessionRegistry / TaskRegistry interfaces; Phase 73 will extend them additively (the contingency, if 73 reshapes them backward-incompatibly, is recorded in the phase plan's Risks section).


D-109 — Phase 72d notification.* event topic: per-class topic naming + runtime-internal mapper/Subscriber + Stage-1 binding test consumer

Date: 2026-05-19 Status: Settled (shipping with this PR)

Where it lives: internal/runtime/notifications/notifications.go (the V1 event-type constants + RegisterEventType from init() + V1NotificationClasses / V1TriggerEventTypes snapshots); internal/runtime/notifications/payloads.go (NotificationPayload — non-SafeSealed so the redactor walks the caller-controlled Summary — plus IdentityRejectedPayload for the fail-loudly path); internal/runtime/notifications/mapper.go (the pure-function Map(ctx, ev) translator with one case per V1 trigger type — task.failed, tool.approval_requested, governance.budget_exceeded, tool.auth_required, pause.requested); internal/runtime/notifications/subscriber.go (the long-lived Subscriber that opens an Admin-scope subscription on V1TriggerEventTypes, runs Map per delivered event, and republishes the synthesised notification.* events through the same bus — fail-loudly on identity-rejection via the D-033 <missing> sentinel and on mapper errors via runtime.error); internal/runtime/notifications/errors.go (ErrUnmappable); internal/runtime/notifications/mapper_test.go (5 V1 mapping unit tests + unmapped-returns-nil-nil + ErrUnmappable + TestMap_ConcurrentReuse N=100 under -race); internal/runtime/notifications/subscriber_test.go (the BINDING §13 Stage-1 round-trip — TestSubscriber_TaskFailedSynthesisesNotificationTaskFailed — plus TestSubscriber_Run_GoroutineLeak + nil-bus / nil-log constructor guards); test/integration/notifications_topic_test.go (the §17 integration suite — every V1 mapping round-trip with identity propagation + the missing-identity fail-loudly mode + the N=20 concurrent-producers stress); scripts/smoke/phase-72d.sh (live unit-tests-classified smoke that runs the binding + mapper + leak + integration suites); docs/glossary.md (the seven notification.* taxonomy entries already landed via the wave-13 plan-doc PR); docs/plans/phase-72d-notification-event-topic.md (the binding phase plan); docs/plans/wave-13-decomposition.md (72d row annotated Shipped — D-109).

Decision. Three intertwined shape calls land here. All three are documented so a future PR cannot quietly retrofit them.

1. Per-class topic naming — notification.task_failed / notification.tool_approval_requested / notification.governance_budget_exceeded / notification.auth_required / notification.pause_requested. Two valid shapes exist for a notification family: per-class topics (one events.EventType per notification class) and per-instance emit (a single notification.emit event type with a class-discriminator field on the payload). The wire-shape decision was left open in the Wave 13 decomposition (PR #141 §12) for a Phase 72d call; per-class topic naming wins on three criteria:

  • Composes naturally with the existing event taxonomy — every other Harbor event family (tool.*, task.*, governance.*, pause.*, audit.*, bus.*) uses per-class topics. A per-instance shape would be the lone outlier and would force every Console / CLI / third-party consumer that already knows the taxonomy to special-case notifications.
  • Composes naturally with events.subscribe's topic-filter shape (Phase 72a) — subscribers narrow by event_types: [...], which gives them a built-in per-class filter for free. A per-instance shape would force the subscribe filter to consume the payload's class field (a payload-aware filter doesn't exist in V1) or to over-deliver and require client-side filtering.
  • Replay semantics line up — Phase 06's ring + Phase 57's durable log filter by EventType. Per-class topics give the durable log per-class indexing without payload-walks.

The tradeoff: adding a new notification class is a one-line RegisterEventType + one mapper case (visible in this PR's notifications.go + mapper.go). A per-instance shape would have grown a class enum on the payload instead; both are O(1) to extend. The compose-with-existing-taxonomy argument is the load-bearing one.
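A minimal Go sketch of how per-class topics line up with a type-only subscribe filter. It assumes a hypothetical in-process registry: registerEventType, isValidEventType, filterByTypes and the Event envelope are illustrative stand-ins for RegisterEventType, events.IsValidEventType and the events.subscribe event_types filter, not Harbor's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// EventType is a hypothetical stand-in for events.EventType.
type EventType string

// registry is a hypothetical process-wide set of valid event types,
// populated from init() and only read afterwards.
var (
	registryMu sync.RWMutex
	registry   = map[EventType]struct{}{}
)

// registerEventType adds one per-class topic to the registry.
func registerEventType(t EventType) {
	registryMu.Lock()
	defer registryMu.Unlock()
	registry[t] = struct{}{}
}

// isValidEventType reports whether t was registered.
func isValidEventType(t EventType) bool {
	registryMu.RLock()
	defer registryMu.RUnlock()
	_, ok := registry[t]
	return ok
}

func init() {
	// One topic per notification class; adding a class is one more entry
	// here plus one mapper case.
	for _, t := range []EventType{
		"notification.task_failed",
		"notification.tool_approval_requested",
		"notification.governance_budget_exceeded",
		"notification.auth_required",
		"notification.pause_requested",
	} {
		registerEventType(t)
	}
}

// Event is a hypothetical envelope. The filter below reads only Type.
type Event struct {
	Type    EventType
	Payload any
}

// filterByTypes models an event_types subscribe filter: a set-membership
// test on the envelope's type, with no payload inspection.
func filterByTypes(evs []Event, want []EventType) []Event {
	set := make(map[EventType]struct{}, len(want))
	for _, t := range want {
		set[t] = struct{}{}
	}
	var out []Event
	for _, ev := range evs {
		if _, ok := set[ev.Type]; ok {
			out = append(out, ev)
		}
	}
	return out
}

func main() {
	evs := []Event{
		{Type: "notification.task_failed"},
		{Type: "task.failed"},
		{Type: "notification.auth_required"},
	}
	got := filterByTypes(evs, []EventType{"notification.task_failed"})
	fmt.Println(len(got), isValidEventType("notification.pause_requested"))
}
```

Because each class is its own EventType, narrowing to notification.task_failed is a set lookup on the envelope; a single notification.emit type would instead force the filter to open the payload.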

2. Runtime-internal mapper + Subscriber wire — pure function + long-lived bus consumer. The mapper (notifications.Map(ctx, ev) []events.Event) is a pure function with no I/O, no global state, and no time.Now() dependency (the bus's Publish path fills OccurredAt). The Subscriber is a long-lived component that opens an Admin: true filter on V1TriggerEventTypes() and republishes each match through the same bus. Two design consequences:

  • D-025 concurrent-reuse is trivially satisfied: Map has no state to share, so N concurrent invocations against a single instance are correct by construction. The mandatory N≥100 concurrent-reuse test still ships (TestMap_ConcurrentReuse) so the contract is observable and CI catches any future shape regression that introduces state.
  • The unified bus stays unified: notification.* rides the same EventBus.Publish path every other event ships through. Brief 06 §1's "one bus, not two" rule is preserved; there is no parallel notification channel. The mapper does NOT introduce a parallel observability surface; it republishes onto the existing bus with a different event-type prefix.

The Subscriber's Admin-scope subscribe is necessary because the trigger events span the full identity space (every tenant's task.failed should map to a notification). The bus emits audit.admin_scope_used on every Admin-true Subscribe, so the Subscriber is observable as a privileged consumer the same way every other Admin-scope subscriber is.

3. NotificationPayload is events.Sealed (not SafeSealed), so Summary walks the redactor. The mapper derives a human-readable Summary string from each trigger's typed payload (e.g. "Task task-abc failed (error_code=tool_invocation_failed)"). The Summary is caller-controlled in the sense that the trigger's payload bytes feed into it: even though every V1 trigger payload is itself SafePayload, the principle that "caller-derived strings walk the redactor" stays load-bearing (CLAUDE.md §7 rule 6 / D-020). Consumers therefore see the post-redaction shape, events.RedactedMap{Data: {class, severity, summary, deeplink, origineventtype, origineventsequence}}, with the redactor's reflective field-name lowering (Severity → severity, OriginEventType → origineventtype, etc.). The IdentityRejectedPayload IS SafeSealed because every field is a bounded enum or a constant string ("Subscriber.Run" / "tenant_id <missing>" / a known EventType); its typed shape survives the bus.

§13 primitive-with-consumer compliance — the Stage-1 binding test consumer is BINDING per Wave 13 decomposition §12 item 5. A notification.* topic with no consumer until 73a Overview's alert ribbon would have introduced a primitive-without-consumer window across stages 1 and 2 of Wave 13. The Stage-1 test consumer at internal/runtime/notifications/subscriber_test.go::TestSubscriber_TaskFailedSynthesisesNotificationTaskFailed closes this — it fires a deliberate task.failed through the real in-mem bus + real audit redactor and asserts a separately-scoped subscriber receives the synthesised notification.task_failed with the trigger's identity preserved. The UI consumers (73a Overview alert ribbon, 73m Settings notification-routing matrix) land in Stage 2 and cannot substitute for the Stage-1 runtime test consumer — that's the operator amendment locked in §12 item 5, and this PR honours it.

Why. Closes Wave 13 §12 item 5 + the 72d acceptance criteria. Lands the runtime side of Brief 11 §CC-3's "separate notification topic populated by a runtime-internal event→notification mapper for the small subset of event types that surface to users." A third-party Console implementation gets the same taxonomy out of the box (D-061 + D-091) because the mapper lives in the runtime, not the Console.

Protocol additions. None. notification.* is an event topic consumed via the existing events.subscribe Protocol surface (Phase 60 + Phase 72 + Phase 72a). Phase 72d ships zero new HTTP routes, zero new Protocol method names, and zero new wire types — the per-class subscribe-filter shape exists naturally because events.subscribe already accepts an arbitrary event_types filter.

Trigger types covered at V1. Five: task.failed (Phase 20), tool.approval_requested (Phase 31), governance.budget_exceeded (Phase 36a), tool.auth_required (Phase 30), pause.requested (Phase 50). Brief 11 §CC-3's starter list also names agent.credentials_expired and runtime.health_degraded; those event types are NOT shipped at V1, so the mapper's switch leaves them unmapped (the default branch returns (nil, nil) for any event type outside the V1 set, which is the expected outcome for the vast majority of bus traffic). Adding a mapping for either is a one-line change in a future phase when the input event type lands.

Identity-rejection fail-loudly path. When a trigger event arrives with the D-033 <missing> sentinel substituted into any identity component (which CAN happen if an upstream identity-rejection emitter — memory.identity_rejected, skill.identity_rejected — produced an event whose own identity carries <missing>), the Subscriber emits a notification.identity_rejected event mirroring MemoryIdentityRejectedPayload's shape AND logs at Error. It does NOT silently publish a malformed notification.* event with a <missing> identity, and it does NOT silently drop the input. This is the §13 fail-loudly contract applied at the boundary of the new subsystem; the integration test TestE2E_NotificationsTopic_MissingIdentityFailsLoudly pins the behaviour.

Mapper-error fail-loudly path. When Map returns a non-nil error (always a wrapped ErrUnmappable, meaning the trigger event was structurally invalid, e.g. the payload type doesn't match the declared event type), the Subscriber logs at Error AND emits a runtime.error event for audit observability. No notification.* event is synthesised for that trigger. This is the §13 fail-loudly contract applied to upstream invariant violations.

Acceptance:

  • The five V1 notification classes register from init(); events.IsValidEventType returns true for each; V1NotificationClasses() returns them in deterministic order; V1TriggerEventTypes() returns the five trigger types.
  • Map(ctx, ev) returns (synth, nil) for any V1 trigger event with the correct typed payload; (nil, nil) for any other event type; (nil, wrapped ErrUnmappable) for a V1 trigger with the wrong payload type.
  • Concurrent-reuse test (TestMap_ConcurrentReuse) runs N=100 concurrent Map calls against a single mapper instance under -race; every call returns the right output; baseline runtime.NumGoroutine() is restored.
  • The §13 binding round-trip test (TestSubscriber_TaskFailedSynthesisesNotificationTaskFailed) fires a deliberate task.failed, asserts a separately-scoped subscriber receives the synthesised notification.task_failed with the trigger's identity preserved + the correct severity + correlation back to the trigger's bus sequence.
  • The Subscriber's Run goroutine returns within 2s of ctx cancel; baseline goroutine count is restored after teardown.
  • NewSubscriber(nil, log) and NewSubscriber(bus, nil) panic loudly (no silent no-op consumer).
  • Integration suite (test/integration/notifications_topic_test.go) covers every V1 mapping round-trip with identity propagation, the missing-identity fail-loudly mode, and an N=20 concurrent-producers stress, all under -race.
  • scripts/smoke/phase-72d.sh runs the binding test + mapper unit tests + leak test + integration suite; OK ≥ 4, FAIL = 0.
  • All tests -race green; make vet clean; go build ./... clean.

Structural precedents. D-020 (audit redactor as the bus boundary) is the redaction contract NotificationPayload rides. D-025 (concurrent reuse contract) is the invariant the pure mapper trivially satisfies. D-028 (sealed EventPayload interface) is the seal NotificationPayload embeds. D-033 (the <missing> identity sentinel) is the convention the identity-rejection path mirrors. D-074 (durable event log) is the replay surface notifications inherit for free (per-class topic naming makes the index trivial). Brief 06 §1 (one bus, not two) is the rule the republish path honours. Brief 11 §CC-3 (notification topology) is the design source the wire-shape decision implements.

Out of scope (post-V1 follow-ups). Notification routing fan-out (email / Slack / web-push — lives in 73m Settings + Phase 72h's notifications_routing Console DB table); severity escalation policy; snooze / dismiss / mute-this-trigger user actions (Console DB only — D-061); anomaly detection (would consume events.aggregate from 72a and re-emit synthetic notifications); per-instance notification de-duplication. None of these is blocked by the V1 shape; each can land additively without breaking the per-class topic contract.


D-110 — Phase 72e pause.list snapshot: paginated identity-scoped projection of the unified pause/resume Coordinator; D-079 closed-scope reuse; D-026 heavy-content bypass

Date: 2026-05-19 Status: Settled (shipping with this PR)

Where it lives: internal/protocol/methods/methods.go (MethodPauseList = "pause.list" constant + the canonicalMethods map entry + the new IsPauseMethod predicate + pauseMethods set; IsControlMethod returns false for pause.list, keeping the Phase 54 steering-control nine exclusive); internal/protocol/types/pause.go (the wire shapes PauseSnapshot / PauseFilter / PauseListRequest / PauseListResponse / PauseSnapshotState / PauseArtifactRef + the DefaultPauseListPageSize (50) / MaxPauseListPageSize (200) pagination bounds, single source per CLAUDE.md §8 + D-002); internal/protocol/singlesource/singlesource.go (the five new wire types + the pause.list method string registered in CanonicalWireTypes / CanonicalMethods so the Phase 58 checker stays in lockstep); internal/runtime/pauseresume/pauseresume.go (the Coordinator.List interface extension + the ListRequest / ListFilter / ListResponse runtime-internal projection); internal/runtime/pauseresume/list.go (the List implementation: snapshots the in-memory registry under the mutex, filters/sorts/paginates lock-free, fails closed on identity / pagination / cross-tenant); internal/runtime/pauseresume/events.go (the pause.payload_artifact_routed event type + PausePayloadArtifactRoutedPayload); internal/runtime/pauseresume/errors.go (ErrInvalidPage + ErrCrossTenantScope); internal/protocol/transports/stream/pause_list_handler.go (the POST /v1/pause/list HTTP handler: identity at the edge, D-079 cross-tenant scope gate, D-026 heavy-content bypass per row); internal/protocol/transports/transports.go (the WithPauseList mux option + the route mount); cmd/harbor/cmd_dev.go + harbortest/devstack/devstack.go (the production + fixture wiring; both mount the route so the test fixture never diverges from production, CLAUDE.md §17.6); the test files list_test.go / list_concurrent_test.go / pause_list_handler_test.go / test/integration/pause_list_test.go; scripts/smoke/phase-72e.sh; docs/plans/phase-72e-pause-list-snapshot.md; docs/plans/README.md (the 72e row); docs/glossary.md.

What: pause.list is a read-only, paginated, identity-scope-filtered snapshot of currently-paused runs, projected from the shipped Phase 50 Pause/Resume Coordinator's in-memory registry. It is the snapshot half of the Console intervention-queue contract — live deltas continue to flow through events.subscribe on the existing pause.requested / pause.resumed topics; no pause.list_delta topic is minted (that would be the §13 two-parallel-implementations smell). pause.list does NOT mutate the registry, does NOT call Resume, does NOT clear checkpoints — resume actions stay on the Phase 54 resume / approve / reject control methods.

Why these calls:

  1. The unified pause/resume primitive is never bypassed. pause.list reads the shipped Coordinator state through a new read-only List method on the same interface — it does not reinvent pause coordination (CLAUDE.md §7 rule 4, §13). The Coordinator's in-memory registry IS the index (brief 05 — the runtime owns the index; the Protocol exposes a paginated method, never a client-side filter over a full dump).

  2. D-079 closed two-scope set, reused — no new scope minted. A cross-tenant filter (a TenantIDs value outside the caller's own tenant, or len>1) requires the verified auth.ScopeAdmin claim; the reject path returns CodeIdentityScopeRequired (HTTP 403). This mirrors the events.subscribe (D-105) / events.aggregate (D-106) cross-tenant posture exactly — one scope-claim story, not two. No pause.crosstenant scope.

  3. D-026 heavy-content bypass applied to Protocol read snapshots. A pause-record Payload whose json.Marshal-ed byte length meets or exceeds the configured HeavyOutputThresholdBytes is routed through the ArtifactStore; the snapshot row ships a PayloadRef (a flat PauseArtifactRef wire type) and the inline Payload is left nil. The runtime emits a pause.payload_artifact_routed observation event so the bypass is loud, never a silent truncation — the context-window safety-net principle applied to Protocol read snapshots, not just LLM prompts.

  4. Identity-mandatory + fail-loudly pagination. A request with an incomplete (tenant, user, session) triple is rejected 401 (CodeIdentityRequired); a PageSize of 0 defaults to 50, but a negative PageSize, a PageSize > 200, or a negative Page is rejected 400 (CodeInvalidRequest) — NEVER silently clamped, since a silent clamp would defeat the per-row identity boundary the integration test asserts.

Deviations from the phase plan (§4.3, documented):

  • Flat PauseArtifactRef wire type, not *artifacts.Ref. The phase plan's Public API surface sketch shows PayloadRef *artifacts.Ref. The established Protocol convention (RFC §5.1 / CLAUDE.md §13 single-source rule) — already followed by Phase 72c's SearchResultRow.Ref — is a flat wire type that never re-exports a runtime Go struct. pause.list follows that precedent: PauseArtifactRef is a flat subset mirror of artifacts.ArtifactRef. (internal/artifacts exports ArtifactRef, not Ref, anyway.)

  • No checkpoint-store enumeration fallback. The phase plan's acceptance text says List "falls back to checkpoint store enumeration when configured." The state.StateStore interface (Phase 07) is a key-value store — Load / Save / Delete by key, no enumeration method. pause.list therefore projects the in-memory registry only; a resumed record that aged out of the registry, or a pause from before a process restart, is signalled by the Truncated flag on a status=resumed query (operators inspecting historical resume activity use events.subscribe on pause.resumed). This matches the Phase 50 design ("pauses survive Runtime restart only when StateStore-backed checkpoint is configured" — and that survival is per-token Status rehydration, not enumeration).

  • Conformance-suite scenario deferred to the Phase 80 surface extension. pause.list routes through its own HTTP handler, not the REST ControlSurface, so the Phase 62 conformance suite's method-matrix happy-path / malformed-request runners t.Skip it with an explicit reason — the identical posture the suite already takes for the Phase 72c search.* cluster. pause.list is exercised end-to-end by pause_list_handler_test.go + test/integration/pause_list_test.go (the §13 primitive-with-consumer binding test). The conformance-suite exhaustiveness count moves 24 → 25.
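The List shape (copy under the mutex, then sort, paginate and project lock-free, with the D-026 bypass applied per row) might look like the following. All types are simplified hypothetical stand-ins: the store and onRouted hooks replace the ArtifactStore and the pause.payload_artifact_routed emission, and the ordering key (oldest pause first) is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
	"sync"
	"time"
)

// PauseRecord is a hypothetical registry entry; payloads are assumed immutable.
type PauseRecord struct {
	Token, Tenant string
	PausedAt      time.Time
	Payload       map[string]any
}

// PauseArtifactRef and PauseSnapshot are hypothetical flat wire shapes;
// exactly one of Payload and PayloadRef is set on a row.
type PauseArtifactRef struct {
	ID        string
	SizeBytes int
}

type PauseSnapshot struct {
	Token, Tenant string
	Payload       json.RawMessage
	PayloadRef    *PauseArtifactRef
}

type Coordinator struct {
	mu        sync.Mutex
	records   map[string]PauseRecord
	threshold int                          // HeavyOutputThresholdBytes
	store     func(data []byte) string     // hypothetical ArtifactStore put
	onRouted  func(token string, size int) // would emit pause.payload_artifact_routed
}

func (c *Coordinator) List(tenant string, page, size int) ([]PauseSnapshot, error) {
	c.mu.Lock() // copy under the mutex; everything below runs lock-free
	recs := make([]PauseRecord, 0, len(c.records))
	for _, r := range c.records {
		if r.Tenant == tenant {
			recs = append(recs, r)
		}
	}
	c.mu.Unlock()
	sort.Slice(recs, func(i, j int) bool { // oldest first, token as tiebreak
		if !recs[i].PausedAt.Equal(recs[j].PausedAt) {
			return recs[i].PausedAt.Before(recs[j].PausedAt)
		}
		return recs[i].Token < recs[j].Token
	})
	start, end := page*size, page*size+size
	if start >= len(recs) {
		return nil, nil
	}
	if end > len(recs) {
		end = len(recs)
	}
	out := make([]PauseSnapshot, 0, end-start)
	for _, r := range recs[start:end] {
		b, err := json.Marshal(r.Payload)
		if err != nil {
			return nil, fmt.Errorf("marshal payload for %s: %w", r.Token, err)
		}
		row := PauseSnapshot{Token: r.Token, Tenant: r.Tenant}
		if len(b) >= c.threshold { // heavy: route through the store, loudly
			row.PayloadRef = &PauseArtifactRef{ID: c.store(b), SizeBytes: len(b)}
			c.onRouted(r.Token, len(b))
		} else {
			row.Payload = b
		}
		out = append(out, row)
	}
	return out, nil
}

func newDemo() (*Coordinator, *int) {
	routed := 0
	return &Coordinator{
		records: map[string]PauseRecord{
			"p1": {Token: "p1", Tenant: "acme", PausedAt: time.Unix(1, 0), Payload: map[string]any{"q": "ok?"}},
			"p2": {Token: "p2", Tenant: "acme", PausedAt: time.Unix(2, 0), Payload: map[string]any{"blob": strings.Repeat("x", 64)}},
			"p3": {Token: "p3", Tenant: "globex", PausedAt: time.Unix(0, 0)},
		},
		threshold: 32,
		store:     func(b []byte) string { return fmt.Sprintf("art-%d", len(b)) },
		onRouted:  func(string, int) { routed++ },
	}, &routed
}

func main() {
	c, routed := newDemo()
	rows, err := c.List("acme", 0, 50)
	fmt.Println(len(rows), err, *routed)
}
```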

§13 primitive-with-consumer compliance. The pause.list wire surface lands with its first consumer in the same PR — test/integration/pause_list_test.go exercises the method end-to-end at the wire boundary (two-tenant scope, the cross-tenant reject without the admin claim, the admin-claim accept path, the D-026 heavy-payload bypass with a bus assertion). The Overview-page intervention queue (Phase 73a, Stage 2) is the UI consumer; it lands in the next stage and does not substitute for the in-PR binding test.

Structural precedents. D-002 (single-source wire types). D-025 (concurrent-reuse contract — the List path snapshots under the mutex then works lock-free; list_concurrent_test.go pins N=128). D-026 (context-window safety net — the heavy-content bypass). D-067 (Phase 50 Coordinator + no second persistence seam). D-079 (the closed two-scope set). D-105 / D-106 / D-108 (the Wave 13 cross-tenant scope-gate posture pause.list mirrors). Brief 05 (runtime-side high-cardinality reads — runtime owns the index, Protocol paginates). Brief 11 §LR-4 / §CC-2 (the intervention sub-panel + identity-aware UI).


D-111 — Phase 72f runtime-posture surface: five read-only runtime.* / metrics.* Protocol methods on a sibling PostureSurface; CapRuntimePosture advertised; cross-tenant gated on the D-079 closed scope set

Date: 2026-05-19 Status: Settled (shipping with this PR)

Where it lives: RFC §5.3 (Protocol versioning) + §6.15 (runtime observability) + §7 (Console layer), docs/plans/phase-72f-runtime-posture.md, docs/plans/wave-13-decomposition.md §4 row 72f, CLAUDE.md §6 (multi-isolation) + §8 (Protocol single-source) + §13 (primitive-with-consumer + Console-never-reads-internals), internal/protocol/methods/methods.go (the five MethodRuntimeInfo / MethodRuntimeHealth / MethodRuntimeCounters / MethodRuntimeDrivers / MethodMetricsSnapshot constants + canonicalPostureMethods + IsPostureMethod + the IsControlMethod exclusion), internal/protocol/types/posture.go (the twelve posture wire structs — RuntimeInfoRequest + RuntimeInfo + SubsystemHealth + RuntimeHealth + RuntimeCounters + SubsystemDriver + RuntimeDrivers + NamedCounter + HistogramBucket + NamedHistogram + NamedGauge + MetricsSnapshot), internal/protocol/types/version.go (the new CapRuntimePosture capability + its canonicalCapabilities entry), internal/protocol/singlesource/singlesource.go (the CanonicalMethods + CanonicalWireTypes lockstep extensions), internal/protocol/posture.go (the transport-agnostic PostureSurface + PostureDeps + NewPostureSurface + the five handlers), internal/protocol/transports/control/control.go (the WithPostureSurface option + the IsPostureMethod-routing branch in ServeHTTP) + internal/protocol/transports/control/posture_handler.go (the REST decoder + identity backfill + Protocol-error wire mapper), internal/protocol/transports/transports.go (the transports.WithPostureSurface mux option), internal/protocol/conformance/conformance.go (method-matrix count 17 to 22 + the IsPostureMethod skip branches), test/integration/runtime_posture_test.go (the §13 same-PR consumer + §17.1 integration test), scripts/smoke/phase-72f.sh, docs/glossary.md (the eight posture entries landed via the wave-13 plan-doc PR), brief 11 §"Settings view" / §"footer counters" / §CC-1, brief 12 §"two-surface model", brief 06 §1 (one bus, decoupling rule).

Decision. Three shape calls land here.

1. PostureSurface is a SIBLING of ControlSurface, not an extension. The five posture methods are READ methods — they project the live Runtime's posture, they mutate nothing. The Phase 54 ControlSurface is a CONTROL surface; threading the build / health / counters / drivers / metrics seams through NewControlSurface would balloon its dependency set. The posture surface ships as its own type with its own Dispatch(ctx, method, req) entry point. The transport adapter (internal/protocol/transports/control) dispatches over the union via methods.IsValidMethod then IsControlMethod vs IsSearchMethod vs IsPostureMethod — the same per-surface branching the Phase 72c search cluster introduced. Each surface's dependency set stays narrow and reviewable, in line with §4.4's "no optional-capability ceremony."

2. The five methods carry a flat, Protocol-owned wire shape — never an internal Runtime Go re-export. RuntimeInfo / RuntimeHealth / RuntimeCounters / RuntimeDrivers / MetricsSnapshot are flat structs in internal/protocol/types/posture.go. In particular MetricsSnapshot is a Protocol-shaped PROJECTION over the Phase 56 telemetry.MetricsRegistry — flat counters / histograms / gauges as plain numbers — NOT a re-export of the OpenTelemetry SDK's metric types. internal/protocol/types/posture.go imports no go.opentelemetry.io/otel package; the Phase 72f static smoke guard pins it. This honours RFC §5.1's reject-on-sight smell ("a Protocol method that maps 1:1 onto an internal Go function signature") and brief 06 §1's "the Console NEVER reads runtime internals."

3. Identity-mandatory at the edge; cross-tenant reads gated on the D-079 closed two-scope set. Every posture handler fails closed with CodeIdentityRequired on an incomplete triple. A cross-tenant query (the request's Identity.Tenant differing from the caller's ctx-verified tenant) requires auth.ScopeAdmin OR auth.ScopeConsoleFleet (the D-079 closed set; NO new runtime.posture / posture.crosstenant scope is minted) and is otherwise rejected with CodeScopeMismatch (HTTP 403). When no auth middleware ran (the Phase 60 trust-based posture, no identity on ctx), the cross-tenant gate is a no-op and the body identity is authoritative, the same posture every other Protocol surface holds. No new Protocol error code: the surface reuses CodeUnknownMethod / CodeInvalidRequest / CodeIdentityRequired / CodeScopeMismatch.

§13 primitive-with-consumer compliance. The Stage-2 page consumers (the Overview counter cards via runtime.counters, the Settings Runtime Info card via runtime.info + runtime.drivers) land as 73a / 73m. Phase 72f's same-PR consumer is test/integration/runtime_posture_test.go — it boots a real assembled Runtime via harbortest/devstack.Assemble (real inmem events / state / tasks / artifacts drivers + a real ES256 auth keypair), constructs a real PostureSurface, mounts it through the real transports.NewMux with WithPostureSurface, and probes every posture method end-to-end with identity propagation + the cross-tenant rejection/admission failure mode + an N=16 concurrency stress. The primitive is exercised end-to-end before any UI page ships.

Versioning. CapRuntimePosture is a backward-compatible capability addition (RFC §5.3 minor-class change) — the ProtocolVersion pin is NOT bumped; a Protocol client negotiates the surface via VersionHandshake.Accepts(CapRuntimePosture).
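A minimal sketch of capability negotiation, assuming a hypothetical VersionHandshake with a flat capability list and made-up string values for CapRuntimePosture and the version pin. It illustrates why no ProtocolVersion bump is needed: an older runtime and a newer one report the same version and differ only in the advertised capability.

```go
package main

import "fmt"

// Capability models a Protocol capability flag; the value below is hypothetical.
type Capability string

const CapRuntimePosture Capability = "runtime_posture"

// VersionHandshake is a hypothetical stand-in for the handshake payload.
type VersionHandshake struct {
	ProtocolVersion string
	Capabilities    []Capability
}

// Accepts reports whether the server advertised c.
func (h VersionHandshake) Accepts(c Capability) bool {
	for _, have := range h.Capabilities {
		if have == c {
			return true
		}
	}
	return false
}

// countersSource picks the data source for an Overview-style counter card.
func countersSource(h VersionHandshake) string {
	if h.Accepts(CapRuntimePosture) {
		return "runtime.counters"
	}
	return "unavailable" // older runtime, same ProtocolVersion
}

func main() {
	older := VersionHandshake{ProtocolVersion: "v1"}
	newer := VersionHandshake{ProtocolVersion: "v1", Capabilities: []Capability{CapRuntimePosture}}
	fmt.Println(countersSource(older), countersSource(newer))
}
```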

Why. Closes the 72f acceptance criteria + the wave-13 decomposition §4 row 72f. Lands the Protocol surface the Console's Overview counter cards and Settings Runtime Info card read (brief 11 §"Settings view" / §"footer counters" / §CC-1). A third-party Console implementation gets the same posture surface out of the box (D-061 + D-091) because the methods live in the runtime, not the Console.

Structural precedents. D-072 (Protocol single-source foundation) is the discipline the new methods / types / capability honour. D-077 (Phase 59 versioning discipline) is the capability-negotiation mechanism CapRuntimePosture plugs into. D-078 (Phase 60 wire transport) is the REST route table the posture methods extend. D-079 (the closed two-scope set) is the cross-tenant gate — no new scope. D-080 (Phase 62 conformance) is the matrix the method count extends. D-094 (harbortest/devstack) is the assembled-runtime fixture the integration test consumes. D-108 (Phase 72c search cluster) is the sibling-surface pattern this phase mirrors. D-025 (concurrent-reuse contract) is the invariant PostureSurface satisfies — posture_concurrent_test.go pins N≥150.

Acceptance. Five method constants registered + IsPostureMethod predicate; twelve posture wire structs round-trip through JSON; CanonicalWireTypes lockstep stays green; CapRuntimePosture registered + advertised in CurrentHandshake(); PostureSurface fails closed on incomplete identity + cross-tenant without admin; the wire transport route table grows the five methods; the integration test exercises all five end-to-end with real drivers; posture_concurrent_test.go pins N≥150 under -race; scripts/smoke/phase-72f.sh OK ≥ acceptance count, FAIL = 0; make vet test lint build clean; go test -race ./... green.

Out of scope (post-V1 follow-ups). A runtime.info mutation surface (rename runtime, change region — a control method, not a read method); synthetic deep-health checks (per-subsystem latency probes, dependency-reachability synth tests — V1 reports structural readiness only); the governance.posture / llm.posture tier rollups (Phase 72g's scope); high-cardinality metric labels in metrics.snapshot (the Phase 56 cardinality firewall + a projection-boundary re-check keep the labels low-cardinality).


D-112 — Phase 72g governance.posture + llm.posture: read-only posture Protocol methods folded onto the Phase 72f PostureSurface dispatcher + D-089 mock-mode capture path

Date: 2026-05-19 Status: Settled (shipping with this PR)

Where it lives: internal/protocol/methods/methods.go (the MethodGovernancePosture / MethodLLMPosture constants + their registration in canonicalMethods + canonicalPostureMethods + the IsPostureMethod predicate now covering all seven posture methods); internal/protocol/types/governance.go (GovernancePostureRequest / GovernancePostureResponse / IdentityTierView / RateLimitView); internal/protocol/types/llm.go (LLMPostureRequest / LLMPostureResponse); internal/protocol/posture.go (the two new seams on PostureDeps + PostureSurfaceGovernance / LLM / Redactor / Bus — plus the handleGovernancePosture / handleLLMPosture Dispatch branches); internal/governance/posture.go (PostureProvider + Snapshot — the deep-copied read accessor over governance.Config); internal/governance/events.go (EventTypePostureReadAdmin + PostureReadAdminPayload); internal/llm/posture.go (PostureProvider + PostureSnapshot + RegisterMockModeCaptured — the boot-time mock-flag capture); internal/llm/events.go (EventTypePostureReadAdmin + PostureReadAdminPayload); cmd/harbor/devmock.go (the reciprocal llm.RegisterMockModeCaptured call at the banner-emit call site); cmd/harbor/cmd_dev.go + harbortest/devstack/devstack.go (the single NewPostureSurface call site per binary — wires the 72f runtime seams AND the 72g governance / llm / redactor / bus seams into one PostureDeps); internal/protocol/singlesource/singlesource.go (the CanonicalMethods + CanonicalWireTypes lockstep entries); scripts/smoke/phase-72g.sh (the live smoke); docs/glossary.md (the governance.posture / llm.posture / GovernancePosture / LLMPosture entries); docs/plans/phase-72g-governance-llm-posture.md (the binding phase plan); docs/plans/README.md + docs/plans/wave-13-decomposition.md (72g row Shipped — D-112).

Decision. Three intertwined shape calls land here.

1. Posture is a third Protocol-method class — not a control method, not a search method — and governance.posture / llm.posture EXTEND the one Phase 72f PostureSurface, never a parallel dispatcher. Phase 54 shipped the task-control surface; Phase 72c added the search.* cluster; Phase 72f added the five runtime.* / metrics.* reads on a PostureSurface. Phase 72g adds two *.posture methods that are read-only projections of runtime configuration. They are NOT steering controls (IsControlMethod returns false) and route through the SAME PostureSurface dispatcher Phase 72f shipped — there is exactly one PostureSurface type and one NewPostureSurface call site per binary (§13 "no two parallel implementations"). The IsPostureMethod predicate now covers all seven posture methods. All seven share the one read-only *types.RuntimeInfoRequest envelope — the governance / llm reads are also identity-only, so reusing the envelope avoids threading two near-identical wire types (the same economy the search cluster took with SearchRequest). The standalone GovernancePostureRequest / LLMPostureRequest wire types remain declared in internal/protocol/types for documentation + singlesource completeness, but Dispatch accepts the shared RuntimeInfoRequest.

2. governance.posture projects governance.Config, not the enforcement Subsystem. The governance.Subsystem interface is the enforcement seam (PreCall / PostCall); its three concrete enforcers each hold only the slice of Config their policy needs. The posture surface wants the WHOLE configured IdentityTiers shape, so its source of truth is the governance.Config value the binary built at boot. PostureProvider wraps that Config and exposes a Posture(ctx) accessor returning a deep-copied Snapshot — a caller mutating the returned map cannot reach back into the provider (D-025). The internal TierConfig shape is PROJECTED onto the wire IdentityTierView — never re-exported — so a future change to the internal struct cannot silently reshape the Protocol surface (single-source). The wire RateLimitView carries RefillIntervalMS int64 rather than a Go time.Duration (which marshals as a raw nanosecond integer a non-Go client cannot interpret).

3. llm.posture MockMode is captured ONCE at boot — never re-read at request time. D-089's dev-only mock escape hatch (HARBOR_DEV_ALLOW_MOCK=1) is a boot-time decision. llm.RegisterMockModeCaptured(bool) records the flag into a package-level atomic at the SAME call site in cmd/harbor/devmock.go::registerMockIfDevAllowMock that prints the [DEV-ONLY MOCK LLM — DO NOT USE IN PRODUCTION] stderr banner. The posture handler reads the captured atomic; it NEVER calls os.Getenv at request time. This keeps the banner and the LLMPosture.MockMode flag structurally reciprocal — a future PR that re-routes the dev-hatch path cannot desync one from the other without touching that one function. The package-level atomic is the §5-rule-5-permitted "write-once-at-boot" shape (the same posture as the driver registry).

Cross-tenant gating. Both methods are identity-mandatory (RFC §5.5). A request whose body Tenant differs from the caller's ctx-verified tenant is a cross-tenant read and requires auth.ScopeAdmin OR auth.ScopeConsoleFleet — the D-079 closed two-scope set, NO new posture-specific scope. The reject path returns CodeScopeMismatch (HTTP 403) — the same code the Phase 72f runtime reads use, folding the governance / llm reads onto the one cross-tenant gate. An accepted cross-tenant read emits a governance.posture_read_admin / llm.posture_read_admin audit event through the wired Redactor + Bus; an own-tenant read does NOT emit audit (matches the sessions.inspect convention). The five runtime.* / metrics.* reads never emit audit.

Request-envelope decision. The Phase 72f Dispatch asserts req.(*types.RuntimeInfoRequest) — a plain identity envelope shared by all five runtime methods. Phase 72g reuses that same envelope for the two governance / llm methods rather than branching the req assertion per-method: the governance / llm reads are also identity-only, the body's optional tenant_id is carried by RuntimeInfoRequest.Identity.Tenant, and one envelope keeps the transport adapter (servePosture) generic over all seven methods with zero per-method decode branches.

Read-only. No mutation method ships. Operators change governance ceilings or the LLM provider by editing harbor.yaml and restarting (RFC §6.15 "Hot-reloadable fields" carve-out + RFC §10 default). Post-V1 admin methods (governance.rotate_key, governance.swap_model) are separate phases.

§13 primitive-with-consumer compliance. The Stage-1 in-PR consumer is test/integration/phase72g_posture_test.go — it boots the real governance + llm posture providers, the real audit redactor, the real inmem bus, the real Phase 60 transport, and the real Phase 61 ES256 auth validator, and exercises the payload shapes end-to-end across two boot modes (production-shaped + HARBOR_DEV_ALLOW_MOCK=1) plus the cross-tenant + missing-identity rejection paths. The UI consumer (73m Settings Governance + LLM-Provider Posture cards) lands in Wave 13 Stage 2.

Deviations from the phase plan (§4.3). The plan named internal/protocol/transports/stream/posture_handler.go as the handler file. The posture methods route through POST /v1/control/{method} (the control transport), not the SSE stream transport — so the handler lives at internal/protocol/transports/control/posture_handler.go. The plan named internal/protocol/posture.go as a NEW standalone surface; Phase 72f landed first and shipped that file as the runtime-posture PostureSurface, so 72g EXTENDS the merged 72f surface (two new PostureDeps seams + two Dispatch branches) rather than redefining it — the §13 "no two parallel implementations" rule. The plan's llm.Registry.Posture becomes llm.PostureProvider.Posture because the llm package has no Registry type. All three are like-for-like swaps that satisfy every acceptance criterion.

Why. Closes Wave 13 §4's 72g row + RFC §6.15 (governance posture) + RFC §7 (Console Settings page is a Protocol client). A third-party Console gets the same posture surface out of the box (D-061 + D-091) because the wire types live in internal/protocol/types, not a Console-private struct.

Structural precedents. D-111 (the Phase 72f PostureSurface this phase extends — one surface, not two). D-079 (closed two-scope set — admin + console:fleet, no new scopes). D-081 (the IdentityTiers shape the governance posture projects). D-089 (the LLM-default flip + dev-only mock escape hatch — the MockMode capture path). D-072 (the methods / errors / types single-source layout). D-082 (the conformance exhaustiveness lint over methods.Methods()). D-108 (the Phase 72c search cluster — the non-control-method routing posture). D-025 (concurrent reuse contract). D-020 (audit redactor as the bus boundary).

Out of scope (post-V1 follow-ups). Posture mutation methods (governance.set_ceiling, governance.rotate_key, governance.swap_model); a per-model llm.models.list registry projection; per-tenant LLM routing (V1 ships a single provider per Harbor instance — D-088). None is blocked by the V1 shape; each can land additively.


D-113 — Phase 72h Console DB local schema + SvelteKit scaffold: IndexedDB-backed eight-table Console-local datastore + the web/console/ scaffold introduction

Date: 2026-05-19 Status: Settled (shipping with this PR)

Where it lives: the new web/console/ directory tree — the SvelteKit scaffold (package.json pinning svelte ^5.0.0 + @sveltejs/adapter-static, tsconfig.json, svelte.config.js with compilerOptions: { runes: true }, vite.config.ts with the Vitest/jsdom block, eslint.config.js, .stylelintrc.cjs, src/app.html, src/lib/tokens.css, src/lib/protocol.ts generated stub, src/routes/+layout.{svelte,ts}, src/routes/+page.svelte); the Console DB module web/console/src/lib/db/ (index.ts factory + re-exports, schema.ts the eight-table shapes + validateRow + operatorIdOf, migrations.ts forward-only migration list, crypto.ts the WebCrypto AES-GCM/PBKDF2 envelope, driver.ts the ConsoleDB interface, errors.ts, drivers/indexeddb.ts the default V1 driver, tests/*.spec.ts the Vitest suite); scripts/smoke/phase-72h.sh (static-only smoke upgraded from skeleton); docs/plans/phase-72h-console-db-schema.md (the binding plan); docs/plans/wave-13-decomposition.md (72h row annotated Shipped — D-113); docs/plans/README.md + README.md (status rows); docs/glossary.md (the eight table entries + "Encrypted-at-rest auth profile" pre-landed via the wave-13 plan-doc PR — no new entries needed here).

Decision. Three shape calls land here so a future PR cannot quietly retrofit them.

1. Phase 72h introduces web/console/ and owns the SvelteKit scaffold (the A5 audit fix). The Wave 13 decomposition split the Console wave into ~26 phases; every Stage-2 page phase needs the SvelteKit scaffold (package.json, svelte.config.js, vite.config.ts, tokens.css, .stylelintrc.cjs, protocol.ts, the +layout.svelte shell) in place. Rather than make those scaffold files a serial dependency among parallel Stage-2 phases, 72h — the Stage-1 phase that already creates web/console/src/lib/db/ — ships the scaffold as infrastructure. The dependency set in package.json is the union of what 72h's lib/db/ needs (TypeScript + Vitest + jsdom + fake-indexeddb) and what every downstream Stage-2 page needs (SvelteKit + Svelte 5 + Skeleton + Stylelint + ESLint). 72h itself ships no Stage-2 page routes and uses no Skeleton components — its own deliverable stays module-level (TypeScript + Vitest against lib/db/). The scaffold honours CLAUDE.md §4.5: Svelte 5 runes mode (D-092), npm only with a committed package-lock.json, web/console/build + node_modules + .svelte-kit gitignored, the design-token surface centralised in tokens.css with .stylelintrc.cjs rejecting raw color/spacing literals, and protocol.ts committed as a generated stub carrying the // CODE GENERATED BY cmd/harbor-gen-protocol-ts. DO NOT EDIT. header (D-093). web/console/ is a new top-level directory; CLAUDE.md §3 already anticipates it ("If it later monorepos into web/console/, the binding rules in §4.5 still apply") so no RFC change is needed.

2. The Console DB is an IndexedDB-backed eight-table datastore behind a ConsoleDB driver interface — Console-local state ONLY (D-061). The eight V1 tables — saved_filters, saved_views, profiles, runtime_registry, auth_profiles, pat_store, notifications_routing, keybindings — each hold Console-local state: the operator's saved filter chips, dashboard layouts, UI preferences, runtime address book, encrypted auth blobs, notification routing matrix, keybinding overrides. NONE mirrors a runtime entity (agents, sessions, tasks, tools, events, artifacts); those flow exclusively through the Protocol. The §13 / D-061 carve-out is mechanically enforced: schema.ts exports a FORBIDDEN_TABLE_NAMES list (the runtime-entity names), tests/schema-carveout.spec.ts fails the build if any forbidden name appears in the TABLE_NAMES registry, and scripts/smoke/phase-72h.sh re-scans the TABLE_NAMES block as defence in depth. The scan deliberately targets the TABLE_NAMES registry, not the whole file — LIST_PAGES legitimately contains page-enum values like 'agents' because the Console renders an Agents page from Protocol data; a page name is not a table. The driver is interface-first (§4.4): driver.ts declares ConsoleDB + TableScope<T>; drivers/indexeddb.ts is the V1 default; index.ts holds the factory + a write-once driver registry that dispatches by name and fails loud with ErrUnknownDriver (listing registered drivers) on a miss. V1 registers only "indexeddb"; the seam is ready for a post-V1 "server" driver without reshaping callers (every method takes operatorID first and returns Promise<...>, so an HTTP-backed driver fits the same shape).

3. Per-operator scoping is structural, and auth blobs are encrypted at rest with a fail-loud envelope. Every Console DB row carries operator_id = base64url(sha256(tenant_id || ':' || user_id)), keyed off the active Protocol identity. In the IndexedDB driver this is not an application-layer filter — each object store uses the compound key [operator_id, id], so an operator-A-scoped query (IDBKeyRange.bound([opA], [opA, []])) cannot reach operator B's rows by construction. upsert additionally rejects a row whose own operator_id does not match the write scope (ErrMissingOperator), and every method rejects an empty operatorID (no silent default — CLAUDE.md §5 fail-loud, the cross-operator-isolation analogue of the §6 multi-isolation contract). auth_profiles.encrypted_jwt_blob and pat_store.encrypted_token_blob are AES-GCM ciphertext: the KEK is PBKDF2-derived (≥100k iterations, SHA-256, 16-byte per-operator salt persisted on profiles.kdf_salt) from a passphrase the operator enters at first runtime-attach; the envelope is IV (12 bytes) || ciphertext+authTag. crypto.ts::decrypt raises ErrAuthDecryption loudly on a wrong key / corrupt blob — never a silent null. The Settings page (73m) MUST distinguish ErrAuthDecryption (re-enter passphrase / re-attach runtime) from a token-expired condition; treating it as "auth missing → redirect to login" would strand the operator's encrypted blobs.
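The compound key makes isolation structural because of how IndexedDB orders keys. Keys are ordered by type first (number < date < string < binary < array). Arrays are then compared element by element, and a shorter array sorts before any longer array it prefixes. The TypeScript model below implements that ordering for numbers, strings and arrays only (dates and binary keys are omitted). It then checks which [operator_id, id] keys fall inside IDBKeyRange.bound([opA], [opA, []]). The model follows the IndexedDB spec's comparison rules; it is not the drivers/indexeddb.ts code. The short operator ids are placeholders for real base64url digests.

```typescript
// Subset of IndexedDB key types: numbers, strings, and arrays of keys.
type Key = number | string | Key[];

// Type rank from the IndexedDB key ordering:
// number < date < string < binary < array (dates and binary are not modelled).
function rank(k: Key): number {
  if (typeof k === "number") return 0;
  if (typeof k === "string") return 2;
  return 4;
}

function cmp(a: Key, b: Key): number {
  const ra = rank(a);
  const rb = rank(b);
  if (ra !== rb) return ra < rb ? -1 : 1;
  if (typeof a === "number" && typeof b === "number") return a === b ? 0 : a < b ? -1 : 1;
  // JS string comparison is by UTF-16 code unit, which matches IndexedDB.
  if (typeof a === "string" && typeof b === "string") return a === b ? 0 : a < b ? -1 : 1;
  const xa = a as Key[];
  const xb = b as Key[];
  const n = Math.min(xa.length, xb.length);
  for (let i = 0; i < n; i++) {
    const c = cmp(xa[i]!, xb[i]!);
    if (c !== 0) return c;
  }
  // Equal common prefix: the shorter array sorts first.
  return xa.length === xb.length ? 0 : xa.length < xb.length ? -1 : 1;
}

// IDBKeyRange.bound(lower, upper) with both bounds inclusive (the default).
function inBound(k: Key, lower: Key, upper: Key): boolean {
  return cmp(lower, k) <= 0 && cmp(k, upper) <= 0;
}

// The per-operator range: [op] sorts below every [op, id], and [op, []] sorts
// above every [op, id] whose id is a number or string, because arrays rank highest.
function operatorRange(op: string): [Key, Key] {
  return [[op], [op, []]];
}
```

The range excludes an operator whose id merely shares a prefix with opA. Neither comparison is a substring check: "opAB" sorts after "opA", so ["opAB", ...] lands above the upper bound.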

§13 primitive-with-consumer compliance. Phase 72h ships a primitive (the Console DB schema + driver). Its first UI consumer is Phase 73f Tools' saved-filter chips, landing in Wave 13 Stage 2. To satisfy §13 within this PR, 72h ships the in-package integration test tests/integration.spec.ts — it opens the real IndexedDB driver against fake-indexeddb, runs migrations from empty, writes and reads back one row in each of the eight tables, asserts cross-operator isolation against a shared database (operator B's rows never surface to an operator-A read), round-trips the encrypted auth-profile + PAT blobs through crypto.subtle, and covers ≥1 failure mode (wrong-key decryption raises loudly; missing operatorID raises loudly). This is the §17 in-package integration test for 72h: real driver, real crypto, identity propagation, failure modes. The 73f saved-filter-chip handler in Stage 2 is the first UI consumer; it builds directly on the savedFilters TableScope this PR ships.

Why. Closes Wave 13 §9 item 8 + §12 item 6 (Console DB schema as Stage 1, first consumer 73f) + the A5 audit fix (72h owns the SvelteKit scaffold). Lands Brief 11 §CC-1 (multi-runtime registry), §CC-3 (notification routing matrix), §CC-4/§CC-5 (saved filters + keybindings) and Brief 12's auth-storage threat model (WebCrypto AES-GCM + PBKDF2 envelope) as the Console-local persistence foundation every Stage-2 page builds on. A third-party Console implementation gets the same D-061 carve-out discipline because the schema enumerates exactly what each table holds and disclaims the runtime entities it does NOT mirror.
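The carve-out check in item 2 is a set-membership scan over the table registry only. A TypeScript sketch follows. TABLE_NAMES is copied from the eight tables named in item 2. The FORBIDDEN_TABLE_NAMES contents are assumed from the runtime entities that item lists. The LIST_PAGES subset is hypothetical.

```typescript
// The eight V1 Console DB tables (D-113 item 2).
const TABLE_NAMES = [
  "saved_filters", "saved_views", "profiles", "runtime_registry",
  "auth_profiles", "pat_store", "notifications_routing", "keybindings",
] as const;

// Assumed contents: the runtime entities that D-061 says flow only through the Protocol.
const FORBIDDEN_TABLE_NAMES = ["agents", "sessions", "tasks", "tools", "events", "artifacts"] as const;

// Hypothetical subset of the page enum. Page names may legitimately match
// runtime entities, which is why the check never scans this list.
const LIST_PAGES = ["overview", "agents", "tools"] as const;

// Returns every registered table name that collides with a forbidden name.
function carveOutViolations(tables: readonly string[], forbidden: readonly string[]): string[] {
  const banned = new Set(forbidden);
  return tables.filter((t) => banned.has(t));
}
```

Running the same scan over the page enum would flag agents and tools. That is exactly the false positive the decision avoids by targeting the registry.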

Protocol additions. None. The Console DB is a browser-local datastore; it adds zero HTTP routes, zero Protocol method names, zero wire types. Runtime entities reach the Console exclusively through the existing Protocol surface (Phase 60 transport + Phase 72/72a events). The operator_id row-scope key is derived from the (tenant, user) identity the Protocol session already carries.

Deviations from the plan. Two minor, both documented here. (a) The plan's "Files added or changed" lists tests/*.spec.ts generically; the implementation ships crypto.spec.ts, schema.spec.ts, schema-carveout.spec.ts, migrations.spec.ts, integration.spec.ts plus shared fixtures.ts / idb-helpers.ts / setup.ts helpers — the named four specs from the test plan plus schema.spec.ts and the helpers. (b) The plan names Dexie as a possible IndexedDB wrapper "pinned in plan"; the implementation uses the native IndexedDB API directly — the ConsoleDB interface is the abstraction seam, and a thin native wrapper is sufficient for the eight-table CRUD shape, so adding the Dexie dependency would only enlarge the Console dependency surface for no interface-level benefit. Both deviations satisfy every acceptance criterion.

Out of scope (downstream phases / post-V1). The harbor console subcommand that boots the Console (D-091; bundled into Phase 73m Settings); the notification delivery transports (email / webhook / web-push — 72h persists only the routing matrix); a Console-side server-backed driver (the seam exists; no second driver lands here); cross-operator sharing of Console-local state (post-V1, and if it ever lands it ships as a Protocol surface, never a Console-DB cross-row read); any Stage-2 page UI.

Structural precedents. D-061 (Console DB is Console-local state only, never a shadow source of truth for runtime entities) is the carve-out this schema implements and mechanically enforces. D-091 (harbor console deployment posture + browser-local encrypted auth storage) is the deployment context the runtime_registry + auth_profiles + pat_store tables serve. D-092 (Svelte 5 runes mode) and D-093 (generated protocol.ts) are the scaffold conventions 72h's web/console/ honours. The §4.4 interface + factory + driver-registry pattern is the shape ConsoleDB + drivers/indexeddb.ts + the index.ts registry follow.
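The §4.4 registry shape can be reduced to a TypeScript sketch with three properties: write-once registration, dispatch by name, and a loud ErrUnknownDriver that lists what is registered. The ConsoleDB interface here is a one-field placeholder; the real one exposes a TableScope<T> per table. The function names registerDriver and openConsoleDB are hypothetical.

```typescript
// Placeholder for the real ConsoleDB interface (driver.ts), which exposes a TableScope<T> per table.
interface ConsoleDB {
  readonly driverName: string;
}

type DriverFactory = () => ConsoleDB;

const drivers = new Map<string, DriverFactory>();

// Write-once: a second registration under the same name is a boot bug, so fail loud.
function registerDriver(name: string, factory: DriverFactory): void {
  if (drivers.has(name)) throw new Error(`consoledb: driver "${name}" already registered`);
  drivers.set(name, factory);
}

// Dispatch by name; a miss names every registered driver instead of falling back to a default.
function openConsoleDB(name: string): ConsoleDB {
  const factory = drivers.get(name);
  if (!factory) {
    const known = Array.from(drivers.keys()).sort().join(", ") || "none";
    const err = new Error(`consoledb: unknown driver "${name}" (registered: ${known})`);
    err.name = "ErrUnknownDriver";
    throw err;
  }
  return factory();
}
```

With only "indexeddb" registered, asking for "server" fails with a message that names indexeddb. A post-V1 server driver would just be a second registerDriver call.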


D-114 — Phase 74 Console topology projection: dual-surface (topology.snapshot method + topology.changed event) over an engine-scoped TopologyProjection; identity-mandatory + admin-cross-tenant gating

Date: 2026-05-19 Status: Settled (shipping with this PR)

Where it lives: internal/protocol/types/topology.go (TopologyProjection + TopologyNode + TopologyNodeKind + TopologyEdge + TopologySnapshotRequest + the three NodeKind* constants + SortDeterministic); internal/protocol/methods/methods.go (MethodTopologySnapshot + the canonicalTopologyMethods set + the IsTopologyMethod predicate); internal/protocol/singlesource/singlesource.go (CanonicalMethods + CanonicalWireTypes extended in lockstep); internal/events/events.go (EventTypeTopologyChanged registered from init() + TopologyChangedPayload — a SafePayload per D-028); internal/runtime/engine/topology.go (the pure buildProjection builder, the engine.Topology(ctx) accessor, the construction-time publishTopologyChanged emit, newEngineID); internal/runtime/engine/engine.go (Engine.Topology on the interface + the engineID field + the New construction-time emit); internal/runtime/engine/options.go (WithEventBus option); internal/protocol/protocol.go (TopologyAccessor interface, ScopeChecker type, WithTopologyAccessor / WithScopeChecker / WithEventBus options); internal/protocol/control.go (dispatchTopology + emitAdminScopeUsed); internal/protocol/errors.go (mapTopologyError); internal/protocol/transports/control/control.go (the decodeRequest topology branch + the assertBodyMatchesAuthedIdentity topology case); internal/protocol/conformance/conformance.go (the method matrix extended to include topology.snapshot); harbortest/devstack/devstack.go (AssembleOpts.TopologyAccessor + AssembleOpts.ScopeChecker); cmd/harbor/cmd_dev.go (the documented engine-less posture); the unit / concurrent / integration tests (topology_test.go per package, topology_concurrent_test.go, topology_emit_test.go, test/integration/phase74_topology_test.go); scripts/smoke/phase-74.sh; docs/glossary.md (the three topology entries re-anchored to D-114); docs/plans/phase-74-console-topology.md (the binding phase plan).

Decision. Three shape calls land here.

1. Dual-surface posture — a topology.snapshot method AND a topology.changed event, both over the same TopologyProjection wire type. The wave-13 decomposition left open whether topology is event-only or also a request method. Phase 74 ships BOTH because they cover disjoint consumer needs: the event covers in-flight updates (a consumer already subscribed sees every adjacency change) and the snapshot covers cold-start (a fresh consumer draws the canvas immediately, without waiting for the next edge change). An event-only surface would force every consumer to block on an edge change to render anything. The two surfaces carry the byte-identical TopologyProjection shape (deterministic sort makes them byte-stable), so a consumer composes them without a shape translation.

2. Engine-scoped projection, not run-scoped. The TopologyProjection carries the static node graph + live per-edge channel depth — both stable across runs of the same engine. Per-run overlays (node status, latency, selection state) are NOT in the projection; the consumer composes those from the existing event taxonomy (tool.invoked, task.spawned, pause.requested, …) exactly as the Phase 70 CLI already proves viable (D-102). Splitting the projection this way keeps the snapshot byte-stable and keeps the wire type small. Brief 11 §LR-1's "per-run vs per-session" open question resolves as engine-scoped projection + per-run overlay from existing events.

3. Identity-mandatory + admin-cross-tenant gating. engine.Topology(ctx) rejects an unscoped ctx with ErrIdentityRequired (CLAUDE.md §6 rule 9). The topology.snapshot Protocol method rejects an incomplete identity triple with CodeIdentityRequired; a cross-tenant call (caller's tenant ≠ the engine's tenant) requires the verified auth.ScopeAdmin claim per D-079 — NO new topology.crosstenant scope is minted (the closed two-scope set holds). The cross-tenant rejection uses CodeAuthRejected; the granted admin path publishes audit.admin_scope_used (RFC §6.13). An audit-emit failure on the admin path fails the read closed — an un-auditable admin read is rejected, never silently granted (CLAUDE.md §5).
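The gating order in item 3, including the fail-closed audit branch, is modelled below in TypeScript. The model rests on three assumptions. The audit event is emitted before the projection is read. isAdmin stands in for a verified auth.ScopeAdmin claim. AuditEmitFailed is a hypothetical code name, since D-114 does not say which code the fail-closed path returns.

```typescript
interface VerifiedCaller {
  tenant: string;
  user: string;
  session: string;
  isAdmin: boolean; // stands in for a verified auth.ScopeAdmin claim
}

type SnapshotResult<T> =
  | { ok: true; value: T }
  | { ok: false; code: "CodeIdentityRequired" | "CodeAuthRejected" | "AuditEmitFailed" };

function gateSnapshot<T>(
  caller: VerifiedCaller,
  engineTenant: string,
  read: () => T,
  emitAdminScopeUsed: () => void, // publishes audit.admin_scope_used; throws on failure
): SnapshotResult<T> {
  if (!caller.tenant || !caller.user || !caller.session) {
    return { ok: false, code: "CodeIdentityRequired" };
  }
  if (caller.tenant !== engineTenant) {
    // Only admin may cross tenants; there is no topology-specific scope.
    if (!caller.isAdmin) return { ok: false, code: "CodeAuthRejected" };
    try {
      emitAdminScopeUsed();
    } catch {
      // An admin read that cannot be audited is refused, never silently granted.
      return { ok: false, code: "AuditEmitFailed" }; // hypothetical code name
    }
  }
  return { ok: true, value: read() };
}
```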

Deviations from the phase plan. (a) The plan's pre-assigned decision number was D-106; that number was taken by a parallel Wave 13 phase, so this PR uses D-114 per the wave-cadence collision-free assignment. The glossary's three topology entries (landed earlier by the wave-13 plan-doc PR referencing D-106) are re-anchored to D-114 in this PR. (b) The plan specified NewControlSurface(taskRegistry, steeringRegistry, topology, opts...) — a positional accessor argument. This PR wires the accessor via the WithTopologyAccessor functional option instead: it satisfies every acceptance criterion, keeps the Phase 54 two-argument NewControlSurface signature stable (zero ripple through the existing callers), and matches the option-shaped extensibility the surface's Option type already documents. (c) The plan named CodeMethodNotSupported for the nil-accessor (engine-less Runtime) path; no such code exists in internal/protocol/errors, so the nil-accessor path returns CodeUnknownMethod — the route effectively does not exist on an engine-less Runtime, and CodeUnknownMethod maps to HTTP 404 which the smoke's 404 → SKIP convention picks up. (d) harbor dev's runtime is planner/RunLoop-shaped and hosts no engine.Engine node-graph, so its ControlSurface is built without a topology accessor — topology.snapshot returns CodeUnknownMethod there. This is the documented nil-safe posture, not a wiring gap (§17.6): the Phase 74 integration test wires a real engine through protocol.WithTopologyAccessor and exercises the surface end-to-end.

§13 primitive-with-consumer compliance. The topology.snapshot method + topology.changed event are primitives; their Stage-2 page consumers (the Live Runtime topology canvas — Phase 73b; the Playground trace toggle — Phase 73n) land in a later stage. test/integration/phase74_topology_test.go is the in-wave consumer that exercises both surfaces end-to-end: a real engine.Engine constructed with WithEventBus emits a real topology.changed onto a real in-mem bus that a real SSE-capable subscriber receives, and the same engine's projection round-trips through a real topology.snapshot RPC over the Phase 60 wire transport. The construction-time emit + the snapshot byte-stability + the cross-tenant gate (with + without admin) + the edge-delta are all asserted.

Why. Lands RFC §5.2's "topology" Protocol-surface row + RFC §6.13's observability surface. Closes the wave-13 decomposition's Phase 74 row. A third-party Console gets the topology surface out of the box (D-061 + D-091) because the projection is a canonical Protocol wire type, never a re-export of engine's private adjacency / channel internals (brief 06 "visualization couples to private state" — explicitly avoided).

Concurrent reuse (D-025). engine.Topology(ctx) is a pure read against set-once-at-New engine state (engineID + nodes + adjs + channels); internal/runtime/engine/topology_concurrent_test.go pins N=128 concurrent calls against one shared engine under -race with byte-stability + goroutine-baseline assertions. internal/protocol/concurrent_test.go extends the ControlSurface N≥100 stress to the topology.snapshot dispatch path.
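The TypeScript sketch below shows why repeated reads are byte-stable. The builder copies set-once state and sorts nodes and edges with a locale-independent comparison, so input order never reaches the serialized output. The field names (id, kind, from, to, depth) and the sort keys are assumptions; D-114 names TopologyNode, TopologyEdge and SortDeterministic but not their layout.

```typescript
// Hypothetical field layout for the wire types.
interface TopologyNode { id: string; kind: string; }
interface TopologyEdge { from: string; to: string; depth: number; }
interface TopologyProjection { engineId: string; nodes: TopologyNode[]; edges: TopologyEdge[]; }

// Code-unit comparison gives the same order on every host, unlike localeCompare.
function byCodeUnit(a: string, b: string): number {
  return a === b ? 0 : a < b ? -1 : 1;
}

// Pure: reads set-once state, returns fresh sorted copies, mutates nothing.
function buildProjection(
  engineId: string,
  nodes: readonly TopologyNode[],
  edges: readonly TopologyEdge[],
): TopologyProjection {
  return {
    engineId,
    nodes: nodes
      .map((n) => ({ id: n.id, kind: n.kind }))
      .sort((a, b) => byCodeUnit(a.id, b.id)),
    edges: edges
      .map((e) => ({ from: e.from, to: e.to, depth: e.depth }))
      .sort((a, b) => byCodeUnit(a.from, b.from) || byCodeUnit(a.to, b.to)),
  };
}
```

A locale-aware sort could order ids differently on different machines, which would break a byte-equality assertion between two hosts.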

Acceptance: see docs/plans/phase-74-console-topology.md — every criterion is covered by the unit / concurrent / integration tests + the phase-74 smoke. The Phase 70 CLI retrofit stays out of scope per D-102; a follow-up phase adds the topology.changed-preferred-source branch.


D-115 — Phase 75 Console e2e Playwright harness baseline: targets harbor console (not harbor dev); baseline-only scope; harness-vs-aggregator split (75 / 75a)

Date: 2026-05-19 Status: Settled (shipping with this PR)

Where it lives: web/console/playwright.config.ts (the single-source harness config — Chromium-only projects, deterministic workers: 1, list reporter, screenshot-on-failure; the webServer block is intentionally absent because the per-run Runtime + harbor console lifecycle lives in the fixture, not in a static config block); web/console/package.json (the three e2e npm scripts — test:e2e / test:e2e:install / test:e2e:ui — plus the @playwright/test devDependency pin); web/console/package-lock.json (committed lockfile so npm ci is reproducible); web/console/tests/fixtures/harbor-runtime.ts (the worker-scoped Playwright fixture — boots bin/harbor console --bind 127.0.0.1:0 on an ephemeral port per the D-104 pattern, reads the bound URL + dev token back from the HARBOR_*_BOUND= / HARBOR_DEV_TOKEN= stderr lines, tears the child process down with SIGTERM-then-SIGKILL; exposes the sync consoleSubcommandAvailable() probe so specs can gate their describe block at collection time); web/console/tests/fixtures/page.ts (the single import every per-page spec uses — extends Playwright's test with the runtime fixture + seedAuth + gotoPage helpers + the 14-page CONSOLE_PAGES IA list); web/console/tests/pages/base-page.ts (the page-object base class — typed selectors map + gotoSlug + waitForHydration); web/console/tests/helpers/protocol.ts (lazily loads the generated web/console/src/lib/protocol.ts typed client — D-093 — so the harness baseline type-checks before the Stage-1 SvelteKit scaffold lands); web/console/tests/helpers/identity.ts (the deterministic test isolation triple — makeTestIdentity / DEFAULT_TEST_IDENTITY); web/console/tests/harness.spec.ts (the meta-test — boots the fixture, asserts the index serves 200 + the SvelteKit app hydrates + a tokenless load does not 5xx; SKIPs cleanly when harbor console is absent); web/console/tests/README.md (the per-page-spec authoring guide); .github/workflows/ci.yml (the new frontend-e2e job — builds bin/harbor, installs Node + npm deps + Chromium, runs npm run test:e2e; skips gracefully when web/console/ is absent); .gitignore (Playwright output-dir ignores); scripts/smoke/phase-75.sh (the static-only smoke — was already a fully-authored skeleton from the wave-13 plan-doc PR; this phase makes its 16 assertions flip from SKIP to OK); docs/plans/phase-75-playwright-harness-baseline.md (the binding phase plan); docs/plans/README.md (Phase 75 row → Shipped, goal/Deps corrected, new 75a row + detail block); docs/glossary.md (the "Playwright harness" + "frontend-e2e CI job" entries — already landed via the wave-13 plan-doc PR, re-pointed from the placeholder D-105 to D-115).
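The fixture's read-back step is sketched below in TypeScript. Spawning bin/harbor console and the SIGTERM-then-SIGKILL teardown are left out; only the stderr line parsing is shown. The concrete variable names (HARBOR_CONSOLE_BOUND, HARBOR_PROTOCOL_BOUND) and the helper names are hypothetical instances of the HARBOR_*_BOUND= and HARBOR_DEV_TOKEN= patterns the fixture reads.

```typescript
interface BootInfo {
  bound: Record<string, string>; // e.g. { CONSOLE: "http://127.0.0.1:54321" }
  devToken?: string;
}

const BOUND_LINE = /^HARBOR_([A-Z0-9_]+)_BOUND=(\S+)$/;
const TOKEN_PREFIX = "HARBOR_DEV_TOKEN=";

// Scan the child's stderr for the lines the fixture reads back after boot.
function parseBootLines(stderr: string): BootInfo {
  const info: BootInfo = { bound: {} };
  for (const raw of stderr.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.startsWith(TOKEN_PREFIX)) {
      info.devToken = line.slice(TOKEN_PREFIX.length);
      continue;
    }
    const m = BOUND_LINE.exec(line);
    const key = m?.[1];
    const addr = m?.[2];
    if (key && addr) info.bound[key] = addr;
  }
  return info;
}

// Fail loud if the expected listener never reported its bound address.
function requireBound(info: BootInfo, name: string): string {
  const addr = info.bound[name];
  if (!addr) throw new Error(`harbor console did not report HARBOR_${name}_BOUND= on stderr`);
  return addr;
}
```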

Decision. Three intertwined calls land here. All three are documented so a future PR cannot quietly retrofit them.

1. The harness targets harbor console, NOT harbor dev — a correction to the original master-plan row. The Phase 75 master-plan row read "Playwright suite … runs against harbor dev." D-091 (settled in PR #138) pins that the Console static build is served exclusively by the harbor console subcommand — the Harbor Runtime ships headless, and harbor dev does NOT serve the Console. A harness that booted the dev-loop subcommand would be exercising a surface the Console never runs against in production. Brief 12 §"Why harbor console, not harbor dev, serves the Console" gives the three reasons (decoupling, multi-runtime, audience separation). The harness config + fixture therefore boot harbor console; the master-plan row's goal text + detail block are corrected in the same PR. This is a documented departure from the master-plan wording per CLAUDE.md §4.3 — the RFC (§7) and D-091 win over the stale row text.

2. Baseline-only scope — per-page specs land with their page phase; the wave-end aggregator is 75a. The Wave 13 decomposition (docs/plans/wave-13-decomposition.md §12 item 7, operator-locked) narrows Phase 75 to the harness infrastructure only: config, fixtures, page-object base, helpers, the meta-test, the CI hook. It ships ZERO page-specific assertions. Each of the 14 Console page phases (73a–73n) ships its own <slug>-page.spec.ts in the same PR as the page (the §13 primitive-with-consumer rule is satisfied trivially — the page IS the consumer). The wave-end aggregator suite that walks all 14 pages and asserts a matching spec exists for each is Phase 75a, bundled into the Stage-3 PR per CLAUDE.md §17.5. This split keeps the test infrastructure from waiting on every page to land first, and keeps each page phase honest (a page without a spec is a §13 rejection). The §13 primitive-with-consumer obligation for the harness itself is discharged by the meta-test (harness.spec.ts) — the harness's own self-test — plus the named same-wave first consumer, Phase 73a Overview's overview-page.spec.ts.

3. Graceful degradation when web/console/ and harbor console are absent — the directory-/subcommand-missing → SKIP pattern. Phase 75 lands in Wave 13 Stage 1 Batch B alongside Phase 72h, which owns the web/console/ SvelteKit scaffold (svelte.config.js runes mode, vite.config.ts, tokens.css, the generated protocol.ts). The harbor console subcommand itself lands later, in Phase 73m (Stage 2.3). The harness must therefore not break the build or the CI gate before either dependency exists. Four mechanisms enforce this: (a) the harness is pure TypeScript under web/console/tests/ — it does not touch the Go build; (b) scripts/smoke/phase-75.sh is static-only and SKIPs every assertion when web/console/ is absent, flipping to 16 OK once the scaffold + harness are present; (c) the frontend-e2e CI job detects web/console/package.json and skips its whole step chain when absent; (d) the meta-test gates its describe block on a synchronous consoleSubcommandAvailable() probe — Playwright instantiates the page fixture (launching the browser) BEFORE a test body runs, so a body-level test.skip() cannot prevent a browser launch on a runner with no browser; gating the describe block at collection time is the only correct skip. When web/console/ and harbor console both land, every SKIP flips to OK with no harness change.

Why. Closes the Phase 75 acceptance criteria + the Wave 13 §12 item 7 operator lock-in. Lands the test substrate Brief 11 §"Findings summary" pins ("every operator-facing flow shipped in a phase has a matching .spec.ts") without making the substrate wait on all 14 pages. A third-party Console implementation can adopt the same harness shape — it depends only on @playwright/test (external) and the generated protocol.ts (D-093), never on Console internals.

Protocol additions. None. The harness is build-time + test-time infrastructure; it consumes the existing Phase 60 Protocol surface via the generated protocol.ts client (D-093). Zero new HTTP routes, zero new Protocol method names, zero new wire types.

Decision-number lineage. The wave-13 plan-doc PR pre-wrote the "Playwright harness" + "frontend-e2e CI job" glossary entries and the phase plan against a placeholder D-105; D-105 was subsequently consumed by Phase 72 (the Console subscription protocol surface). This phase files the real decision as D-115 (the coordinator-assigned, collision-free number for the Batch B dispatch) and re-points the glossary entries + phase plan from D-105 to D-115 in the same PR.

Acceptance:

  • web/console/playwright.config.ts exists, declares a single Chromium projects entry, and does NOT reference the dev-loop subcommand string.
  • web/console/tests/ ships the fixtures (harbor-runtime.ts, page.ts), the page-object base (pages/base-page.ts), the helpers (protocol.ts, identity.ts), the meta-test (harness.spec.ts), and the authoring guide (README.md).
  • web/console/package.json declares test:e2e / test:e2e:install / test:e2e:ui and pins @playwright/test; package-lock.json is committed.
  • .github/workflows/ci.yml declares a frontend-e2e job that runs after the go job, builds bin/harbor, installs Node + npm deps + Chromium, runs npm run test:e2e, and skips gracefully when web/console/ is absent.
  • The meta-test (harness.spec.ts) lists 4 tests and SKIPs all 4 cleanly when harbor console is absent; once the subcommand lands it boots the fixture, asserts the index serves + hydrates, and exercises the tokenless-load failure mode.
  • scripts/smoke/phase-75.sh shows OK = 16, FAIL = 0 with web/console/ present; SKIPs cleanly when absent.
  • docs/plans/README.md Phase 75 row → Shipped, Deps → 60, 72, goal text corrected; new 75a row + detail block added.
  • docs/glossary.md "Playwright harness" + "frontend-e2e CI job" entries re-pointed from D-105 to D-115.
  • No .spec.ts (or any web/console/tests/ file) hand-rolls a raw browser HTTP call — all Runtime access goes through the typed Protocol client (CLAUDE.md §4.5 #11).

Structural precedents. D-091 (harbor console subcommand serves the Console) is the deployment posture the harness targets. D-092 (Svelte 5 runes mode) + D-093 (generated protocol.ts) are the scaffold contracts the harness consumes. D-094 (harbortest/devstack.Assemble) is the Runtime-assembly seam the fixture shells out to. D-104 (preflight ephemeral-port allocation) is the bound-port-from-stderr pattern the fixture reuses. Brief 11 §"Findings summary" (every operator-facing flow has a matching spec) is the rule the harness mechanises. Brief 12 §"Why harbor console, not harbor dev" is the design source for call #1.

Out of scope (Phase 75a / post-V1). Per-page Playwright specs (each lands with its 73a–73n page phase); the wave-end aggregator suite + the page-coverage check (Phase 75a); the Go-side test/integration/wave13_test.go (Phase 75a); visual-regression / screenshot golden compares (post-V1, Brief 11); the Firefox / WebKit browser matrix (post-V1 — the projects array is structured so adding a browser is a one-line change); Playwright trace-viewer recording in CI (Phase 75a wires the failed-only upload). None is blocked by the V1 harness shape; each lands additively.


D-116 — Phase 73f Console Tools page: seven tools.* Protocol methods + the Tools-page UI; admin methods gate on the D-079 closed scope set (no tools.admin scope)

Date: 2026-05-20 Status: Settled (shipping with this PR)

Where it lives: internal/protocol/methods/methods.go (+MethodToolsList / MethodToolsGet / MethodToolsDescribe / MethodToolsMetrics / MethodToolsContentStats / MethodToolsSetApprovalPolicy / MethodToolsRevokeOAuth + the IsToolsMethod / IsToolsAdminMethod predicates — IsControlMethod now excludes the tools cluster); internal/protocol/types/tools.go (the seventeen Tools wire types — Tool, ToolFilter, ToolListRequest / ToolListResponse, ToolAggregates, ToolGetRequest, ToolDescribeRequest, ToolManifest, ToolMetricsRequest / ToolMetrics, ToolContentStatsRequest / ToolContentStats, ToolContentBucket, ToolSetApprovalPolicyRequest / ToolSetApprovalPolicyResponse, ToolRevokeOAuthRequest / ToolRevokeOAuthResponse — single source per D-002); internal/protocol/singlesource/singlesource.go (the seven method strings + seventeen wire-type homes registered in CanonicalMethods / CanonicalWireTypes); internal/tools/protocol/ (the new package — protocol.go the seven-method Service, filter.go the facet predicate + aggregate fold, catalog_projector.go the V1 CatalogProjector over a tools.ToolCatalog + the optional Annotator seam, events.go the ToolsAdminActionPayload + audit.admin_scope_used emit); internal/protocol/transports/stream/tools_handler.go (the POST /v1/tools/{method} wire handler — identity at the edge, admin-scope gate, error classification); internal/protocol/transports/transports.go (the WithToolsService mux option); cmd/harbor/cmd_dev.go + harbortest/devstack/devstack.go (both production + devstack boot paths wire the Tools service over the live tool catalog — CLAUDE.md §17.6); web/console/src/routes/tools/+page.svelte + web/console/src/lib/components/tools/ (the seven page components) + web/console/src/lib/protocol/tools.ts (the typed ToolsClient) + web/console/src/lib/protocol/session.ts + web/console/src/lib/tools/export.ts + web/console/src/lib/db/saved_filters_tools.ts (the typed wrapper over Phase 72h's saved_filters table); web/console/tests/tools-page.spec.ts (the per-page Playwright spec); scripts/smoke/phase-73f.sh.

Decision. Two calls land here.

1. The Tools page ships its seven-method Protocol surface + UI in one phase; the page IS the primitive's consumer (§13). The five read methods (tools.list / tools.get / tools.describe / tools.metrics / tools.content_stats) project the registered tool catalog for the Console operator lens; the two admin methods (tools.set_approval_policy / tools.revoke_oauth) mutate runtime tool state. The wire surface and the SvelteKit page land together so the §13 primitive-with-consumer rule is satisfied trivially — the page exercises every method end-to-end, and test/integration/tools_page_test.go is the same-PR end-to-end consumer (real tools.ToolCatalog + real wire transport + real ES256 auth). The methods route through a dedicated Tools dispatcher (IsToolsMethod), a sibling of the task-control / search / posture / topology surfaces — never the steering inbox.

2. The admin methods gate on the D-079 closed scope set (auth.ScopeAdmin) — there is NO tools.admin scope. The Phase 73f phase plan and the Wave 13 decomposition table (line 92) were authored before D-079 closed the scope set to exactly two scopes (admin + console:fleet). They named a tools.admin control-scope claim. D-079 is the higher-priority artifact (a settled decision); minting a third scope would re-litigate it. Departure from the phase plan, resolved per CLAUDE.md §4.3 + §15 (the RFC / settled-decision wins over the plan): the two Tools admin methods gate on the verified auth.ScopeAdmin claim, exactly as pause.list / events.subscribe / the posture methods do. A non-admin caller is rejected fail-closed with CodeIdentityScopeRequired (HTTP 403). The decomposition-table cell + the phase plan's relevant lines are corrected in the same PR. Consequently the deliverable set is the seven methods named above — the phase plan's speculative tools.invoke ("Try this tool" form) is NOT shipped in 73f; it is a heavier developer-scope surface deferred to a follow-up (the page-spec §6 "Try this tool" row stays [wave-13-extends] for that follow-up).
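A minimal sketch of the fail-closed gate in call #2, for orientation only. The method strings and code names come from this entry; the `Identity`, `WireError`, `gateToolsCall`, and `hasScope` names, and the literal scope value `"admin"`, are hypothetical stand-ins rather than the shipped internal/auth or internal/protocol API. Reads need only a complete identity triple; the two admin methods additionally need the verified admin claim, and there is no tools.admin scope to consult.

```go
package main

import "fmt"

// ScopeAdmin mirrors the "admin" member of the D-079 closed scope set.
// Name and value are illustrative, not the shipped auth package.
const ScopeAdmin = "admin"

// Canonical wire codes reused by the Tools surface (names from this entry).
const (
	CodeIdentityRequired      = "CodeIdentityRequired"
	CodeIdentityScopeRequired = "CodeIdentityScopeRequired"
)

// Identity is the mandatory (tenant, user, session) triple (D-001).
type Identity struct{ Tenant, User, Session string }

// Complete reports whether every component of the triple is present.
func (id Identity) Complete() bool {
	return id.Tenant != "" && id.User != "" && id.Session != ""
}

// WireError pairs a canonical code with the HTTP status the transport returns.
type WireError struct {
	Code   string
	Status int
}

func (e *WireError) Error() string { return fmt.Sprintf("%s (HTTP %d)", e.Code, e.Status) }

// toolsAdminMethods is the closed admin subset; a map lookup keeps the predicate O(1).
var toolsAdminMethods = map[string]struct{}{
	"tools.set_approval_policy": {},
	"tools.revoke_oauth":        {},
}

// IsToolsAdminMethod reports whether m is one of the two mutating methods.
func IsToolsAdminMethod(m string) bool {
	_, ok := toolsAdminMethods[m]
	return ok
}

// gateToolsCall fails closed: an incomplete triple is rejected with 401, and an
// admin method without the verified admin claim is rejected with 403.
func gateToolsCall(method string, id Identity, verifiedScopes []string) error {
	if !id.Complete() {
		return &WireError{Code: CodeIdentityRequired, Status: 401}
	}
	if IsToolsAdminMethod(method) && !hasScope(verifiedScopes, ScopeAdmin) {
		return &WireError{Code: CodeIdentityScopeRequired, Status: 403}
	}
	return nil
}

// hasScope is a linear scan; the closed scope set has at most two members.
func hasScope(scopes []string, want string) bool {
	for _, s := range scopes {
		if s == want {
			return true
		}
	}
	return false
}

func main() {
	id := Identity{Tenant: "t1", User: "u1", Session: "s1"}
	fmt.Println(gateToolsCall("tools.list", id, nil))
	fmt.Println(gateToolsCall("tools.revoke_oauth", id, nil))
	fmt.Println(gateToolsCall("tools.revoke_oauth", id, []string{ScopeAdmin}))
}
```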

Why. Closes the Phase 73f acceptance criteria + the Wave 13 §5 Stage-2.1 lock-in. The CatalogProjector is the §4.4 seam: the V1 implementation projects the planner-visible catalog (no-granted-scopes view) + reads OAuth / approval / metrics / content-stats through an optional Annotator; a future phase elevates it to the admin full-discovery view (page-tools.md §9). The saved-filter wrapper is a typed view onto Phase 72h's shipped saved_filters table scoped to page = 'tools' — it adds NO table, honouring D-061.
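The projector seam in the Why paragraph can be pictured as a component that always reads the catalog and consults the optional Annotator only when one is wired. This is a hypothetical sketch: `Catalog`, `Annotator`, and `ToolEntry` are illustrative interfaces (only approval policy is modelled), and the real signatures in internal/tools/protocol may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// ToolEntry is a hypothetical planner-visible catalog entry.
type ToolEntry struct{ Name, Description string }

// Catalog stands in for the read side of tools.ToolCatalog.
type Catalog interface{ Visible() []ToolEntry }

// Annotator is the optional seam for runtime state (only approval modelled here).
type Annotator interface {
	ApprovalPolicy(tool string) (string, bool)
}

// Tool is a simplified wire row.
type Tool struct{ Name, Description, ApprovalPolicy string }

// CatalogProjector projects catalog entries into wire rows.
type CatalogProjector struct {
	Catalog   Catalog
	Annotator Annotator // may be nil: rows then carry catalog data only
}

// List projects every visible entry, annotating when an Annotator is wired.
func (p CatalogProjector) List() []Tool {
	entries := p.Catalog.Visible()
	out := make([]Tool, 0, len(entries))
	for _, e := range entries {
		row := Tool{Name: e.Name, Description: e.Description}
		if p.Annotator != nil {
			if pol, ok := p.Annotator.ApprovalPolicy(e.Name); ok {
				row.ApprovalPolicy = pol
			}
		}
		out = append(out, row)
	}
	// Sort by name so repeated list calls return rows in a deterministic order.
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

type staticCatalog []ToolEntry

func (c staticCatalog) Visible() []ToolEntry { return c }

type mapAnnotator map[string]string

func (a mapAnnotator) ApprovalPolicy(tool string) (string, bool) {
	p, ok := a[tool]
	return p, ok
}

func main() {
	cat := staticCatalog{{"search", "web search"}, {"fetch", "http fetch"}}
	fmt.Println(CatalogProjector{Catalog: cat}.List())
	fmt.Println(CatalogProjector{Catalog: cat, Annotator: mapAnnotator{"fetch": "always"}}.List())
}
```

Keeping the Annotator optional means a later admin full-discovery view can swap in a richer projector without touching the wire handler.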

Protocol additions. Seven method names + seventeen wire types (listed above). One wire-transport route: POST /v1/tools/{method}. Zero new error codes — the surface reuses CodeIdentityRequired (401), CodeIdentityScopeRequired (403), CodeNotFound (404), CodeInvalidRequest (400), CodeUnknownMethod (404), CodeRuntimeError (500). Zero new scopes (D-079 holds).

Departures from the phase plan (CLAUDE.md §4.3). (a) Admin gating is auth.ScopeAdmin, not a tools.admin scope — D-079 wins (call #2). (b) tools.invoke is not shipped in 73f — the seven-method deliverable set is what the task scope + the closed-scope posture allow; the "Try this tool" form is a follow-up. (c) protocol.ts is NOT regenerated — the cmd/harbor-gen-protocol-ts generator (D-093) has not shipped yet (Phase 72h committed protocol.ts as a hand-written stub, and there is no make protocol-ts-gen target). The Tools typed client is therefore hand-authored in a SIBLING module (web/console/src/lib/protocol/tools.ts), keeping the generated stub untouched; when the generator lands, the types migrate into protocol.ts mechanically.

Acceptance:

  • internal/protocol/methods/methods.go declares the seven tools.* methods + IsToolsMethod / IsToolsAdminMethod; IsControlMethod returns false for all seven.
  • internal/protocol/types/tools.go is the single home for the seventeen Tools wire types; the singlesource lockstep test passes.
  • The seven methods enforce mandatory identity (CodeIdentityRequired on an incomplete triple); the two admin methods fail closed with CodeIdentityScopeRequired without the verified auth.ScopeAdmin claim.
  • internal/tools/protocol ships an N≥100 concurrent-reuse test (D-025) + an identity-isolation test; test/integration/tools_page_test.go exercises the surface end-to-end with real drivers + ES256 auth + an N≥10 stress run.
  • The Tools page (web/console/src/routes/tools/+page.svelte) renders the catalog table + detail panel + right-rail cards + run-history strip; it talks to the Runtime only through the typed ToolsClient — no hand-rolled fetch.
  • web/console/src/lib/db/saved_filters_tools.ts is a typed wrapper over the shipped saved_filters table scoped to page = 'tools' — no new table.
  • web/console/tests/tools-page.spec.ts covers the catalog render + facet toggle + drill-down + Approve path; it SKIPs cleanly pre-Phase-73m.
  • scripts/smoke/phase-73f.sh round-trips all seven methods and asserts the admin-scope reject path.

Structural precedents. D-079 (closed two-scope set) is the scope contract call #2 obeys. D-002 (single-source wire types) + D-072 (single-source method strings) are the protocol-package discipline tools.go / methods.go follow. D-024 (ToolPolicy reliability shell) + D-026 (heavy-content threshold) + D-062 (MCP-Apps DisplayMode) + D-083 (tool-side OAuth binding scope) + D-086 (tool-side approval gates) are the runtime concepts the wire types project. D-061 (Console DB local-only) is the carve-out the saved-filter wrapper honours. D-091 (harbor console deployment) + D-092 (Svelte 5 runes) + D-093 (generated protocol.ts) are the Console contracts the page rides on. D-110 (pause.list) is the structural precedent for the one-shot request/response wire handler in the stream package.


D-117 — Phase 73i Console Flows page: six flows.* Protocol methods on a dedicated stream-package handler family + a flow.Registry source-of-truth + the read-only Flows-page UI

Date: 2026-05-20 Status: Accepted.

Context. Phase 73i ships the Console Flows page — the read-only viewer for the runtime's engine-graph flows (D-063). The Wave 13 decomposition (docs/plans/wave-13-decomposition.md §12) and docs/design/console/page-flows.md §12 pin six [wave-13-extends] Protocol additions: flows.list (catalog with aggregate metrics), flows.describe (engine-graph payload), flows.runs.list (run history), flows.runs.describe (per-run timeline), flows.run (one-shot invocation), flows.metrics (sparkline aggregates). The page is the consumer (§13 satisfied trivially). It consumes the shipped Phase 73 state.history posture for per-run detail — no same-wave Protocol dependency.

Decision.

1. The six flows.* methods are a dedicated Protocol surface, not steering-control methods. They are declared in internal/protocol/methods/methods.go and registered in canonicalMethods; a closed canonicalFlowsMethods sub-set + an IsFlowsMethod O(1) predicate route them through the Flows-page handler — the same pattern Phase 72e (pause.list) and Phase 74 (topology.snapshot) established. IsControlMethod returns false for all six (the steering inbox stays exclusive to the Phase 54 nine).
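A sketch of the closed-set routing described in call #1, assuming map-backed sets. The six flows.* names come from this entry; the steering set below holds only a placeholder `"pause"` entry because the Phase 54 nine are not listed here, and `route` is a hypothetical helper.

```go
package main

import "fmt"

// canonicalFlowsMethods is the closed flows.* sub-set (names from D-117).
var canonicalFlowsMethods = map[string]struct{}{
	"flows.list":          {},
	"flows.describe":      {},
	"flows.runs.list":     {},
	"flows.runs.describe": {},
	"flows.run":           {},
	"flows.metrics":       {},
}

// steeringMethods is a placeholder for the Phase 54 steering set; the real
// set has nine members and is not reproduced here.
var steeringMethods = map[string]struct{}{
	"pause": {},
}

// IsFlowsMethod is a single map lookup, so its cost does not depend on how
// many methods the set holds.
func IsFlowsMethod(m string) bool {
	_, ok := canonicalFlowsMethods[m]
	return ok
}

// IsControlMethod answers only for the steering inbox; flows.* never match.
func IsControlMethod(m string) bool {
	_, ok := steeringMethods[m]
	return ok
}

// route picks the handler family for a method name.
func route(m string) string {
	switch {
	case IsControlMethod(m):
		return "steering-inbox"
	case IsFlowsMethod(m):
		return "flows-handler"
	default:
		return "unknown-method"
	}
}

func main() {
	for _, m := range []string{"flows.run", "pause", "flows.delete"} {
		fmt.Println(m, "->", route(m))
	}
}
```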

2. The wire types live in internal/protocol/types/flows.go only. Flow, FlowFilter, FlowDescription (nodes + edges + per-node FlowNodePolicy + per-flow FlowBudget per D-023), FlowRun, FlowRunDescription, FlowRunRequest, FlowMetrics and their request/response envelopes — all registered in singlesource.CanonicalWireTypes. The FlowNodePause wire value is "pause_point" (not "pause") so it never collides with the pause Protocol method name in the single-source checker.
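The reason for the "pause_point" spelling can be shown with a toy collision check: any wire enum value that equals a Protocol method name is flagged. This is a hypothetical sketch of the idea only; the `FlowNodeKind` type and the `collisions` helper are illustrative, and the real single-source checker may compare different inputs.

```go
package main

import (
	"fmt"
	"sort"
)

// FlowNodeKind is a hypothetical wire enum for engine-graph node kinds.
type FlowNodeKind string

// FlowNodePause uses "pause_point" so it cannot equal the pause method name.
const FlowNodePause FlowNodeKind = "pause_point"

// collisions returns every enum wire value that equals a method name.
func collisions(methods []string, enumValues []string) []string {
	seen := make(map[string]struct{}, len(methods))
	for _, m := range methods {
		seen[m] = struct{}{}
	}
	var out []string
	for _, v := range enumValues {
		if _, clash := seen[v]; clash {
			out = append(out, v)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	methods := []string{"pause", "flows.run"}
	fmt.Println(collisions(methods, []string{string(FlowNodePause)}))
	fmt.Println(collisions(methods, []string{"pause"}))
}
```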

3. The runtime side is a transport-agnostic flow/protocol.Surface over two interface seams (§4.4): Catalog (registered flows + run-history projections) and Invoker (one-shot run launcher). The production Catalog is RegistryCatalog, backed by a NEW flow.Registry — a real runtime subsystem (registered Definitions + a bounded per-flow run-history ring), NOT a test stub. The production Invoker is FuncInvoker, adapting a runtime-supplied LaunchFunc that delegates to the task registry's SpawnTool path. cmd/harbor dev and harbortest/devstack both wire an empty flow.Registry at boot — a fresh stack with no graph-family agents correctly serves an empty catalog (the right "no flows registered" empty state, not a missing surface).
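The bounded per-flow run-history ring in call #3 can be sketched as a fixed-capacity circular buffer per flow behind an RWMutex. Everything here (`FlowRun`, `runRing`, `Registry.Record`, `Registry.Runs`) is a hypothetical illustration, not the shipped flow.Registry API. The empty-registry case returns nil, which matches the "no flows registered" empty state rather than an error.

```go
package main

import (
	"fmt"
	"sync"
)

// FlowRun is a hypothetical, trimmed run-history record.
type FlowRun struct {
	RunID  string
	Status string
}

// runRing keeps the most recent runs for one flow, overwriting the oldest once full.
type runRing struct {
	buf   []FlowRun
	next  int // slot the next push writes to
	count int
}

func newRunRing(capacity int) *runRing {
	if capacity < 1 {
		capacity = 1 // a zero-capacity ring would divide by zero below
	}
	return &runRing{buf: make([]FlowRun, capacity)}
}

func (r *runRing) push(run FlowRun) {
	r.buf[r.next] = run
	r.next = (r.next + 1) % len(r.buf)
	if r.count < len(r.buf) {
		r.count++
	}
}

// newestFirst returns a copy so callers never alias the ring's storage.
func (r *runRing) newestFirst() []FlowRun {
	out := make([]FlowRun, 0, r.count)
	for i := 1; i <= r.count; i++ {
		out = append(out, r.buf[(r.next-i+len(r.buf))%len(r.buf)])
	}
	return out
}

// Registry holds one bounded ring per flow name.
type Registry struct {
	mu       sync.RWMutex
	capacity int
	history  map[string]*runRing
}

func NewRegistry(capacity int) *Registry {
	return &Registry{capacity: capacity, history: make(map[string]*runRing)}
}

// Record appends a run to the flow's ring, creating the ring on first use.
func (g *Registry) Record(flow string, run FlowRun) {
	g.mu.Lock()
	defer g.mu.Unlock()
	ring, ok := g.history[flow]
	if !ok {
		ring = newRunRing(g.capacity)
		g.history[flow] = ring
	}
	ring.push(run)
}

// Runs returns the flow's history newest-first; an unknown flow yields nil.
func (g *Registry) Runs(flow string) []FlowRun {
	g.mu.RLock()
	defer g.mu.RUnlock()
	ring, ok := g.history[flow]
	if !ok {
		return nil
	}
	return ring.newestFirst()
}

func main() {
	reg := NewRegistry(2)
	for _, id := range []string{"r1", "r2", "r3"} {
		reg.Record("triage", FlowRun{RunID: id, Status: "ok"})
	}
	fmt.Println(reg.Runs("triage"), reg.Runs("missing"))
}
```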

4. flows.run is the only mutating method; it is gated on auth.ScopeAdmin (D-079). No new scope is minted (D-079 closed two-scope set). The Surface fails closed with ErrRunScopeRequired, mapped to CodeScopeMismatch (HTTP 403), when the claim is absent. The other five methods are read-only; a cross-tenant catalog / run-history filter requires auth.ScopeAdmin and fails closed with CodeIdentityScopeRequired (HTTP 403) without it.

5. Heavy run outputs route by-reference through the ArtifactStore (D-026). flows.runs.describe ships a FlowArtifactRef for any run output meeting the configured heavy-content threshold — never inline bytes. The RegistryCatalog fails loud on a store failure.

6. The wire transport is a dedicated stream-package handler family. internal/protocol/transports/stream/flows_handler.go mounts six POST /v1/flows/* routes (list / describe / runs/list / runs/describe / run / metrics), wired via transports.WithFlows. Each dispatch emits a per-page audit event — flows.page_viewed for the five reads, flows.run_invoked for the mutating run — onto the canonical EventBus.

7. The Console UI is view-only (D-063). The Flows page (web/console/src/routes/flows/) renders the catalog table, Flow Metrics card, the read-only engine graph canvas, the Budget meter, the run-history table, and the selected-run summary panel. There is NO authoring affordance — Add node / Delete edge / Save graph / New flow do not render, by construction. The engine graph canvas (web/console/src/lib/components/graph/EngineGraphCanvas.svelte) is SHARED with the future Phase 73b Live Runtime topology view; this phase establishes the typed GraphInput interface. All Runtime access flows through the typed FlowsClient (web/console/src/lib/flows/client.ts) — no hand-rolled fetch in .svelte files.

Why. Closes the Phase 73i acceptance criteria + the page-flows.md §12 binding refinements. The dedicated-handler-family + interface-seam shape keeps the Flows surface testable with deterministic fixtures and decoupled from the task subsystem's concrete type, mirroring the proven Phase 72e / 74 patterns. The flow.Registry is a genuine runtime subsystem so the Console projects a real catalog, not a stub (CLAUDE.md §13).

Protocol additions. Six method names (flows.list / flows.describe / flows.runs.list / flows.runs.describe / flows.run / flows.metrics); the internal/protocol/types/flows.go wire-type cluster; six POST /v1/flows/* REST routes; two canonical event types (flows.page_viewed, flows.run_invoked).

Acceptance: see docs/plans/phase-73i-console-flows-page.md — every criterion is covered by the surface / catalog / handler unit tests, the N≥100 concurrent-reuse test, test/integration/flows_page_test.go, the Vitest suites, the Playwright spec, and scripts/smoke/phase-73i.sh.

Structural precedents. D-023 (Flow-as-Tool: Go-coded V1 + per-flow Budget) is the Budget surface the page reads. D-026 (context-window safety net) is the heavy-output bypass. D-061 (Console DB local-only) is the posture Save snapshot / Compare versions honour. D-063 (Flows page = view over engine graphs; authoring post-V1) is the view-only mandate. D-079 (closed two-scope set) is the scope the mutating flows.run reuses. D-091 / D-092 / D-093 are the Console deployment + Svelte 5 + generated-client contracts. Phase 72e (pause.list) and Phase 74 (topology.snapshot) are the dedicated-Protocol-surface precedents.

Out of scope (post-V1). Flow authoring / editor / versioning / import-export (D-063); flows.set_budget per-flow Budget edit (page-flows.md §10); declarative YAML flow descriptors (D-023 — V1.1); "Convert to evaluation" (D-064); the cross-runtime flows aggregator (D-091). None is blocked by the V1 Flows-page shape; each lands additively.


D-118 — Phase 73j Console Memory page: three read-only memory.* Protocol methods over the shipped MemoryStore.Snapshot surface; NO new memory scope (D-079 closed-set reuse — audit B1); per-turn projection model; D-026 heavy-value bypass mirrored at the memory-inspector edge

Date: 2026-05-20 Status: Settled (shipping with this PR)

Where it lives: RFC §5.2 (Protocol surface) + §6.6 (Memory subsystem) + §7 (Console layer), docs/plans/phase-73j-console-memory-page.md, docs/design/console/page-memory.md, internal/protocol/methods/methods.go (the MethodMemoryList / MethodMemoryGet / MethodMemoryHealth constants + the canonicalMemoryMethods set + the IsMemoryMethod predicate + the IsControlMethod exclusion), internal/protocol/types/memory.go (the thirteen wire structs — MemoryItem / MemoryFilter / MemoryListRequest / MemoryAggregates / MemoryListResponse / MemoryArtifactRef / MemoryMetadata / MemoryGetRequest / MemoryItemDetail / MemoryGetResponse / MemoryHealthRequest / MemoryHealthAggregate / MemoryHealthResponse — plus the MemoryScope / MemoryStrategyName / MemoryDriverName enums and the DefaultMemoryListPageSize 50 / MaxMemoryListPageSize 200 bounds), internal/memory/protocol/ (the List / Get / Health functions + the ErrContextLeak / ErrInvalidFilter / ErrPageOutOfRange sentinels + the per-turn projection helpers), internal/protocol/transports/stream/memory_handler.go (the three POST /v1/memory/{list,get,health} HTTP handlers), internal/protocol/transports/transports.go (the WithMemory mux option), internal/protocol/singlesource/singlesource.go (the CanonicalMethods + CanonicalWireTypes lockstep entries), internal/protocol/conformance/conformance.go (the matrix-exhaustiveness memory.* entries + skip branches), cmd/harbor/cmd_dev.go + harbortest/devstack/devstack.go (the production + fixture WithMemory wiring — §17.6 production-mirror), web/console/src/lib/protocol-memory.ts (the typed MemoryClient), web/console/src/lib/db/saved_filters_memory.ts (the typed wrapper over the Phase 72h saved_filters table), web/console/src/routes/memory/+page.svelte + web/console/src/lib/components/memory/ (the page route + eight components), web/console/tests/memory-page.spec.ts (the per-page Playwright spec), test/integration/memory_page_test.go, scripts/smoke/phase-73j.sh.

Why: Phase 73j is the Wave 13 Stage-2.1 Console Memory page — it bundles the per-page Protocol additions and the page UI into one phase per the decomposition doc §5. Four design calls warrant a durable home so a later auditor does not flag any of them as drift.

1. NO new memory scope — cross-tenant memory listing gates on the D-079 closed two-scope set (audit B1, binding). The phase plan originally proposed minting memory.read + memory.crosstenant scopes; the cross-reference audit's B1 finding closed that. D-079 settled the canonical scope surface as a CLOSED two-scope set — auth.ScopeAdmin + auth.ScopeConsoleFleet — and Phase 72's Non-goals explicitly forbid a third scope. Reading the caller's own identity quadruple requires only an authenticated JWT carrying (tenant, user, session); widening MemoryFilter.TenantIDs beyond the caller's own tenant requires the verified auth.ScopeAdmin (or auth.ScopeConsoleFleet) claim — exactly like every other Stage-2 page (72 / 72a / 72e / 72f / 72g / 73c). internal/auth/scopes.go is unchanged. A missing-claim cross-tenant request is rejected loudly with CodeIdentityScopeRequired (HTTP 403); a missing/incomplete identity triple fails closed with CodeIdentityRequired.

2. The per-turn projection model — memory.list lists conversation turns within an identity scope. The shipped memory.MemoryStore interface (Phases 23–25) is per-identity: it has no per-item enumeration method. It exposes Snapshot(ctx, id) — an opaque JSON memory.Record{Strategy, Turns} — and Health(ctx, id). Phase 73j's internal/memory/protocol package projects that record into the Console-page row shape: each conversation turn becomes one MemoryItem row, keyed by a deterministic content-addressed per-turn key (memTurnKey — SHA-256 over the identity quadruple + the turn ordinal + the turn timestamp). The rolling-summary text is folded into the strategy metadata, not surfaced as a separate row. This is the honest projection of the runtime's memory state; the alternative (a new MemoryStore.List method) would have widened the shipped Phase 23–25 interface for one consumer. Memory is session-scoped by default (CLAUDE.md §6 rule 4); the projected rows carry Scope = "session".

3. The D-026 heavy-value bypass is mirrored at the memory-inspector edge. memory.get MUST NOT return raw bytes ≥ the heavy-content threshold. The classification — the MemoryItem.HeavyContent flag — is computed ONCE in snapshotTurns (so memory.list and memory.get agree on which rows are heavy); a heavy row's value is routed through the shipped ArtifactStore and the detail ships a by-reference MemoryArtifactRef with the inline Value left empty (exactly one of Value / ValueArtifact is ever populated). A defence-in-depth branch in buildDetail fails loudly with ErrContextLeak when a value that was NOT classified heavy nonetheless carries heavy bytes — mirroring the LLM-edge enforcement pass in internal/llm/safety.go. The negative test (leak_internal_test.go) drives a deliberately mis-classified row through the internal BuildDetailLeakProbe seam and asserts the loud failure.

4. The 24-hour memory.* event counters are tenant-scoped, not triple-scoped. memory.list / memory.health derive the IdentityRejected24h / RecoveryDropped24h counters from the Phase 72a events.aggregate surface over the memory.identity_rejected (D-033) / memory.recovery_dropped (D-035) event types. A memory.identity_rejected event by construction carries a partial identity with <missing> substituted for the empty component(s) (D-033) — a triple-scoped filter would never match a rejection whose session was the missing component. The counters therefore scope to the caller's tenant only; the tenant stays the outer isolation boundary, and cross-tenant fan-in still requires the D-079 scope claim enforced at the wire edge before List runs. The rejection EVENTS still surface verbatim on the page's right-rail card; only the rolled-up 24h count is the tenant-scoped aggregate.
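Why a triple-scoped filter undercounts can be shown with two toy counters over a slice of events. The `Event` shape is a hypothetical trim of the canonical record, and the 24-hour window is left out to keep the contrast visible.

```go
package main

import "fmt"

// Event is a trimmed, hypothetical view of a canonical event record.
type Event struct {
	Type                  string
	Tenant, User, Session string
}

// missing is the marker D-033 substitutes for an empty identity component.
const missing = "<missing>"

// countTenantScoped counts one event type for a tenant, ignoring user and
// session, so rejections with a <missing> component still count.
func countTenantScoped(events []Event, eventType, tenant string) int {
	n := 0
	for _, e := range events {
		if e.Type == eventType && e.Tenant == tenant {
			n++
		}
	}
	return n
}

// countTripleScoped is the rejected alternative: an exact triple match that a
// partial-identity rejection can never satisfy.
func countTripleScoped(events []Event, eventType, tenant, user, session string) int {
	n := 0
	for _, e := range events {
		if e.Type == eventType && e.Tenant == tenant && e.User == user && e.Session == session {
			n++
		}
	}
	return n
}

func main() {
	events := []Event{
		{Type: "memory.identity_rejected", Tenant: "acme", User: "u1", Session: missing},
		{Type: "memory.recovery_dropped", Tenant: "acme", User: "u1", Session: "s1"},
	}
	fmt.Println(countTenantScoped(events, "memory.identity_rejected", "acme"))
	fmt.Println(countTripleScoped(events, "memory.identity_rejected", "acme", "u1", "s1"))
}
```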

Findings I'm departing from. The page spec docs/design/console/page-memory.md §12 (the mockup-aligned refinements table) names the recovery-dropped event memory.overflow_drop_oldest. The actually-shipped runtime constant is EventTypeMemoryRecoveryDropped with wire string memory.recovery_dropped (per D-035 + internal/memory/events.go). Phase 73j uses the shipped wire string — renaming the shipped event would be a D-035 re-litigation and is explicitly out of scope. A follow-up docs(design) PR reconciles the §12 wording. This is a §12-mockup-refinement drift, NOT an RFC-level departure.

Deviations (CLAUDE.md §4.3). The phase plan's public-API sketch typed the events dependency as events.Store; no such type exists — the shipped surface is *events.Aggregator (a compiled artifact wrapping the EventBus). The internal/memory/protocol functions take *events.Aggregator (a like-for-like signature refinement). The plan also specified the wire types regenerated into web/console/src/lib/protocol.ts by cmd/harbor-gen-protocol-ts (D-093); that generator command does not yet exist (Phase 72h committed protocol.ts as a hand-authored empty-but-typed stub; the generator lands in a later Console-tooling phase). Hand-editing the generated stub now would corrupt the generated-file contract. The Memory-page typed wire client therefore lives in a dedicated hand-authored web/console/src/lib/protocol-memory.ts module, kept in 1:1 lockstep with internal/protocol/types/memory.go; when the generator lands the module folds into protocol.ts mechanically. Neither deviation reaches RFC territory.

Protocol additions. Three method-name constants (memory.list / memory.get / memory.health), thirteen wire structs in internal/protocol/types/memory.go, three HTTP routes (POST /v1/memory/{list,get,health}). No new error code (the existing CodeIdentityRequired / CodeIdentityScopeRequired / CodeInvalidRequest / CodeNotFound / CodeRuntimeError cover every path). No Protocol version bump (additive surface per RFC §5.3).

Acceptance: see docs/plans/phase-73j-console-memory-page.md — every criterion is covered by the unit / leak / concurrent-reuse / integration tests + the phase-73j smoke + the Console-side Vitest + the Playwright spec.

Structural precedents. D-110 (the pause.list snapshot method) is the closest sibling — a read-only projection method mounted in the stream transport package, gating cross-tenant on the D-079 closed set, applying the D-026 heavy-content bypass per row. D-033 (memory.identity_rejected with <missing> substitution) + D-035 (memory.recovery_dropped / OverflowDropOldest) are the event surfaces the page consumes. D-026 (the context-window safety net) is the heavy-value-bypass posture mirrored here. D-061 (Console DB local-only) is the contract the saved_filters_memory.ts typed wrapper honours. D-065 (no session-level priority) is the invariant the Memory table preserves — no priority column, the Pinned chip is a Phase 24 strategy. D-079 (the closed two-scope set) is the binding scope posture (audit B1). D-091/D-092/D-093 are the Console scaffold contracts the page rides on.

Out of scope (Phase 73 / post-V1). The memory mutation surface (memory.put / memory.delete, the manual add/edit/evict UI — the bulk-action toolbar renders disabled-with-tooltip); memory.strategy_trace (the strategy debugger); memory.promotions (the cross-session promotion-policy viewer); TTL-based bulk eviction UI; the cross-runtime memory aggregator. None is blocked by the V1 read surface; each lands additively.


D-119 — Phase 73k Console MCP Connections page: twelve mcp.servers.* Protocol methods on a sibling MCPSurface; mcp.Registry read API; mcp.raw_html_trust_toggled audit event; D-079 closed-scope reuse

Date: 2026-05-20 Status: Settled (shipping with this PR)

Where it lives: internal/protocol/types/mcp_servers.go (the twelve mcp.servers.* wire types — request/response shapes, all flat Protocol-owned structs, never re-exports of the mcp driver); internal/protocol/methods/methods.go (the twelve MethodMCPServers* constants + IsMCPServersMethod / IsMCPAdminMethod O(1) predicates); internal/protocol/singlesource/singlesource.go (the twelve method strings + the wire-type homes); internal/protocol/mcp.go (the MCPSurface dispatcher — a sibling of the Phase 54 ControlSurface and the Phase 72f PostureSurface — plus the narrow MCPAccessor / MCPOAuthAccessor seam interfaces); internal/protocol/transports/control/mcp_handler.go (the serveMCP REST adapter — per-method decoder, identity backfill, error mapping); internal/protocol/transports/control/control.go + internal/protocol/transports/transports.go (the WithMCPSurface wiring + the IsMCPServersMethod route branch); internal/tools/drivers/mcp/registry.go (the new process-local mcp.Registry read API — ListServers / GetServer / ListResources / ListPrompts / RefreshDiscovery / Probe / Health / SetRawHTMLTrust, projection-only ServerView / ResourceView / ... shapes, D-025-safe under a sync.RWMutex); internal/events/events.go (the EventTypeMCPRawHTMLTrustToggled canonical event + the MCPRawHTMLTrustToggledPayload SafePayload); internal/mcpconsole/mcpconsole.go (the wiring-package adapters RegistryAccessor / OAuthAccessor that bridge the mcp driver + tools/auth provider to the protocol-owned interfaces — kept out of internal/protocol so the Protocol package stays driver-free); web/console/src/routes/(console)/mcp-connections/+page.svelte + [server]/+page.svelte (the list + six-tab detail views); web/console/src/lib/mcp-connections/api.ts + state.svelte.ts (the typed Protocol client + the Svelte 5 runes state owner); web/console/tests/mcp-connections-page.spec.ts (the per-page Playwright spec); test/integration/mcp_connections_page_test.go (the cross-subsystem integration test); scripts/smoke/phase-73k.sh (the live-server smoke).

Decision. Four calls land here.

1. The twelve mcp.servers.* methods dispatch through a sibling MCPSurface, not the task-control ControlSurface. The MCP-Connections methods reach the runtime's MCP driver registry + the tool-side OAuth provider, not the steering inbox — exactly the posture of the Phase 72f PostureSurface. IsMCPServersMethod is the O(1) routing predicate; ControlSurface.Dispatch rejects a stray MCP method loudly (CodeInvalidRequest — "dispatch through the MCPSurface instead") rather than silently routing it onto the steering inbox.
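A sketch of the loud misroute rejection in call #1. The twelve method names are assembled from the suffixes listed under Protocol additions, assuming each carries the `mcp.servers.` prefix. `controlDispatch` is a hypothetical stand-in for ControlSurface.Dispatch, and `"pause"` is only a placeholder steering method.

```go
package main

import "fmt"

// DispatchError is a hypothetical stand-in for a canonical Protocol error.
type DispatchError struct{ Code, Message string }

func (e *DispatchError) Error() string { return e.Code + ": " + e.Message }

// mcpServersMethods is the closed twelve-method set, built once at init.
var mcpServersMethods = func() map[string]struct{} {
	suffixes := []string{
		"list", "get", "resources", "prompts", "refresh_discovery", "probe",
		"health", "bindings.list", "policy", "refresh_binding", "revoke_binding",
		"set_raw_html_trust",
	}
	set := make(map[string]struct{}, len(suffixes))
	for _, s := range suffixes {
		set["mcp.servers."+s] = struct{}{}
	}
	return set
}()

// IsMCPServersMethod is the O(1) routing predicate.
func IsMCPServersMethod(m string) bool {
	_, ok := mcpServersMethods[m]
	return ok
}

// controlDispatch rejects a stray MCP method with a pointer to the right
// surface instead of enqueueing it onto the steering inbox.
func controlDispatch(method string, enqueue func(string)) error {
	if IsMCPServersMethod(method) {
		return &DispatchError{Code: "CodeInvalidRequest", Message: "dispatch through the MCPSurface instead"}
	}
	enqueue(method)
	return nil
}

func main() {
	var inbox []string
	enqueue := func(m string) { inbox = append(inbox, m) }
	err := controlDispatch("mcp.servers.probe", enqueue)
	fmt.Println(err, len(inbox))
	err = controlDispatch("pause", enqueue)
	fmt.Println(err, len(inbox))
}
```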

2. Phase 28 ships one Provider per MCP server — Phase 73k adds the process-local mcp.Registry. The plan assumed an existing mcp.Registry; Phase 28 in fact ships a single Provider per attachment with no fleet registry. Phase 73k adds Registry — it holds the named providers and tracks per-server runtime stats (state, latency, discovery counts, reconnect history, raw-HTML trust) behind a documented-invariant sync.RWMutex. It is a D-025 reusable artifact; registry_concurrent_test.go pins N=128. Documented departure per CLAUDE.md §4.3.

3. Control-plane verbs gate on auth.ScopeAdmin (D-079 closed set) — no new MCP scope. The three admin verbs (refresh_binding / revoke_binding / set_raw_html_trust) AND the two control-plane verbs (refresh_discovery / probe) require the verified auth.ScopeAdmin claim; a miss surfaces CodeScopeMismatch. No mcp.* scope is minted — D-079's two-scope set (auth.ScopeAdmin + auth.ScopeConsoleFleet) stays closed.

4. Raw HTML from an MCP server is untrusted by default; the per-server trust toggle emits a mcp.raw_html_trust_toggled audit event. Brief 11 §"Open architectural questions" #8 — default-deny, explicit per-source trust toggle, audit when toggled. A successful mcp.servers.set_raw_html_trust emits the new SafePayload audit event (server name + boolean + actor identity quadruple) through the wired Redactor + Bus. A failed audit emit fails the call closed (CodeRuntimeError) — an un-auditable trust toggle is refused, never silently applied. The event is registered in the canonical event taxonomy (internal/events/events.go, alongside topology.changed) — the plan referenced internal/audit/events.go, which does not exist; audit events live in the closed event-type registry.

Why. Closes the Phase 73k acceptance criteria + the page-mcp-connections.md §12 mockup-aligned refinements. The page IS the §13 first consumer of every Protocol method it introduces, end-to-end, in the same PR.

Protocol additions. Twelve new method names (mcp.servers.list / get / resources / prompts / refresh_discovery / probe / health / bindings.list / policy / refresh_binding / revoke_binding / set_raw_html_trust); the matching request/response wire types; the mcp.raw_html_trust_toggled canonical event. Zero new error codes — CodeIdentityRequired / CodeScopeMismatch / CodeNotFound / CodeInvalidRequest / CodeRuntimeError suffice.

Documented deviations (CLAUDE.md §4.3). (a) The plan assumed an existing mcp.Registry; Phase 73k adds it (call #2). (b) The plan referenced internal/audit/events.go; the audit event lands in internal/events/events.go (call #4). (c) D-093's generated protocol.ts + the cmd/harbor-gen-protocol-ts generator + the make protocol-ts-gen target do not yet exist in-repo — protocol.ts is the Phase 72h empty stub. web/console/src/lib/mcp-connections/api.ts is the hand-authored typed client stand-in: every type mirrors the Go wire shape verbatim, all Protocol calls funnel through one protocolCall choke point, and no .svelte file hand-rolls a fetch (the §13 rule the page satisfies today). When the generator lands, api.ts's types regenerate into protocol.ts. (d) The auth.Provider exposes no fleet-wide binding-enumeration API; OAuthAccessor.ListBindings projects the configured binding scope + the caller's own token freshness — page-mcp-connections.md §8 confirms non-admin operators see only their own ScopeUser binding regardless. (e) harbor dev hosts no MCP servers, so its mux leaves the MCP surface unwired — mcp.servers.* returns CodeUnknownMethod (404) and the smoke 404→SKIP convention keeps preflight green, same posture as Phase 74's engine-less topology accessor.

Acceptance:

  • internal/protocol/types/mcp_servers.go declares the twelve request/response wire types; all are listed in singlesource.CanonicalWireTypes.
  • internal/protocol/methods/methods.go declares the twelve MethodMCPServers* constants + the IsMCPServersMethod / IsMCPAdminMethod predicates; Methods() returns 38.
  • MCPSurface.Dispatch fails closed on a missing identity (CodeIdentityRequired), gates the admin/control verbs on auth.ScopeAdmin (CodeScopeMismatch), and maps an unknown server to CodeNotFound.
  • mcp.Registry exposes the seven-method read API + SetRawHTMLTrust; TestRegistry_ListServers_ConcurrentReuse runs N=128 under -race.
  • A successful set_raw_html_trust emits exactly one mcp.raw_html_trust_toggled event with a SafePayload body carrying the actor quadruple.
  • test/integration/mcp_connections_page_test.go wires real mcp.Registry + real auth.Provider + real control transport + real bus + real redactor; asserts identity propagation, admin-claim gating, the audit emit, the not-found failure mode, and N=16 concurrent SSE-style subscriber stress; runs under -race.
  • The Console MCP Connections list + detail pages render through the typed api.ts client (no hand-rolled fetch in .svelte); all colour/spacing values are design tokens.
  • web/console/tests/mcp-connections-page.spec.ts lands in the same PR; scripts/smoke/phase-73k.sh is upgraded from skeleton to real assertions.

Structural precedents. D-114 (Phase 74 topology) is the sibling-surface pattern this phase mirrors — a read-only Protocol surface dispatched outside the steering inbox, left unwired on the engine-less harbor dev stack. D-111 / D-112 (PostureSurface) is the dispatcher-sibling shape. D-079 (closed two-scope set) is the no-new-scope rule. D-083 (tool-side OAuth auth.BindingScope) is the binding-state contract the OAuth & Auth tab consumes. D-062 (MCP-Apps DisplayMode + canonical renderer registry) is the no-bespoke-renderer rule. D-061 (Console DB local-only) is the carve-out the raw-HTML trust runtime-mirror sits inside. D-093 (generated protocol.ts) is the contract api.ts stands in for until the generator lands.

Out of scope (post-V1). Adding/removing MCP servers from the Console (mcp.servers.register — needs a runtime-config-mutation surface); per-tool MCP-Apps renderer customization (forbidden — brief 11 §PG-3); cross-runtime MCP catalog aggregator (D-091); per-server scheduled health checks / alerting; editing ToolPolicy from the Policy tab; the bulk "Disable" action; a fleet-wide per-server OAuth binding catalog (needs an auth.Provider enumeration extension).


D-120 — Phase 73l Console Artifacts page: artifacts.list/put/get_ref Protocol surface on a sibling ArtifactsSurface; the canonical renderer-registry skeleton; CodePresignUnsupported fail-loud resolver

Date: 2026-05-20 Status: Settled (shipping with this PR)

Where it lives: internal/protocol/methods/methods.go (the three artifacts.* method constants + the canonicalArtifactsMethods set + the IsArtifactsMethod O(1) predicate + the IsControlMethod exclusion); internal/protocol/errors/errors.go (the two new canonical codes — CodePresignUnsupported / CodeRequestTooLarge); internal/protocol/types/artifacts.go (the eleven flat artifacts wire types — ArtifactScope, SizeRange, TimeRange, ArtifactRef, ArtifactRow, ArtifactsListRequest/Response, ArtifactsPutOpts/Request/Response, ArtifactsGetRefRequest/Response — plus the ArtifactSource closed enum); internal/protocol/artifacts.go (the ArtifactsSurface dispatcher — a sibling of the Phase 54 ControlSurface and the Phase 72f PostureSurface, not an extension — plus the artifacts.uploaded event type + ArtifactUploadedPayload); internal/protocol/transports/control/control.go + artifacts_handler.go (the WithArtifactsSurface option + the serveArtifacts REST adapter, which applies an 8 MiB transport-edge body cap for the upload payload); internal/protocol/transports/control/status.go (the CodePresignUnsupported → 501 / CodeRequestTooLarge → 413 status mappings); internal/protocol/transports/transports.go (the WithArtifactsSurface mux option); internal/protocol/singlesource.go (the lockstep map gains the three method names + the eleven wire types); internal/protocol/conformance/conformance.go (the method-count bump 26→29, the error-code matrix gains the two new codes + their status pins, the artifacts methods are excluded from the happy-path / malformed-request matrices the same way the search / posture / pause / topology clusters are); internal/config/config.go (the ProtocolConfig.MaxRequestBytes field + DefaultMaxRequestBytes (4 MiB) + ResolvedMaxRequestBytes); cmd/harbor/cmd_dev.go (the NewArtifactsSurface + WithArtifactsSurface boot wiring — the dev inmem artifact store has no Presigner, so artifacts.get_ref fails loud with CodePresignUnsupported); web/console/src/lib/chat/renderers/ (the canonical renderer-registry skeleton: the index.ts dispatch table + the six MIME renderers markdown/code/image/pdf/audio/json + a fallback renderer + the README.md dispatch-contract one-pager); web/console/src/lib/protocol.ts (hand-extended with the artifacts wire types + the ProtocolClient interface + HTTPProtocolClient — see the deviation note below); web/console/src/routes/console/artifacts/ (the page route — +page.svelte + filter_bar.svelte + artifacts_table.svelte + right_rail.svelte + preview_pane.svelte + bulk_toolbar.svelte); web/console/tests/artifacts-page.spec.ts (the Playwright per-page spec); internal/protocol/artifacts_test.go + artifacts_concurrent_test.go (the surface unit + D-025 concurrent-reuse tests); test/integration/artifacts_page_test.go (the §17.1 integration test across the in-mem / SQLite / fs drivers); web/console/src/lib/chat/renderers/registry.spec.ts (the renderer-registry Vitest); scripts/smoke/phase-73l.sh (the live-server smoke); docs/plans/README.md + README.md (Phase 73l → Shipped); docs/glossary.md (the four new vocabulary entries).

Decision. Four calls land here.

1. The artifacts methods route through a sibling ArtifactsSurface, not the task-control ControlSurface. The three artifacts.* methods are not steering controls — they do not reach the task registry, they carry their own flat wire shapes, and the upload method carries a payload larger than the 64 KiB control-body cap. They follow the exact pattern the Phase 72c SearchSurface, the Phase 72f PostureSurface, and the Phase 74 topology dispatcher established: a dedicated dispatcher, an O(1) IsArtifactsMethod predicate, an IsControlMethod exclusion, and an additive WithArtifactsSurface transport option that preserves the 404 → SKIP smoke path on a build without the surface wired. The phase plan placed the surface at internal/protocol/handlers/artifacts.go; the codebase has no handlers/ sub-package — every Protocol surface (ControlSurface, SearchSurface, PostureSurface) lives at internal/protocol/. Phase 73l follows the established convention and places the surface at internal/protocol/artifacts.go — a documented CLAUDE.md §4.3 deviation; the RFC §5.1 single-source rule and the existing surface pattern win over the stale plan path.

2. artifacts.get_ref fails loud with CodePresignUnsupported on a non-S3 driver. The read-side presigned-URL resolver type-asserts the ArtifactStore to artifacts.Presigner (the optional capability only the Phase 19 S3 driver implements). A driver without the capability returns the new CodePresignUnsupported code (HTTP 501) — never a silent fallback to byte-streaming (D-022 fail-loud posture, CLAUDE.md §13). The Console renders the typed error as a "Preview not available — driver does not support presigned URLs" placeholder plus a Download fallback. Heavy bytes never cross the Protocol inline (D-026): artifacts.list returns metadata-only rows, artifacts.get_ref returns a presigned URL, artifacts.put accepts upload bytes on the request leg only and returns a reference.

3. The canonical renderer-registry skeleton is an extensible dispatch table — open for registration, closed for modification. Phase 73l is the FIRST in-staging consumer of the shared renderer registry (web/console/src/lib/chat/renderers/, the canonical path per Brief 12). It ships the dispatch core (index.ts — a first-match-wins ordered rule list + registerRenderer / dispatchRenderer / mimeIs / mimePrefix) plus six MIME renderers + a fallback. Phase 73n (Playground, Stage 2.3) is the SECOND consumer and EXTENDS the registry with chat-bubble / tool-call / diff renderers by calling registerRenderer from its own module init — it does NOT edit the dispatch core. This closes the Wave 12 R5 audit finding (a hard-coded per-mime switch would bit-rot at the second consumer). The Artifacts preview pane dispatches through dispatchRenderer; the route directory carries NO bespoke per-mime .svelte renderer (the Playwright spec asserts this statically).

4. The artifacts surface uses ONLY the D-079 closed scope set. No new scope is minted. A cross-tenant artifacts.list (request scope tenant differs from the caller's verified tenant) requires auth.ScopeAdmin or auth.ScopeConsoleFleet; a cross-tenant artifacts.put requires auth.ScopeAdmin. Identity is mandatory at every boundary — a missing tenant/user/session returns CodeIdentityRequired, fail-closed (CLAUDE.md §6 rule 9). The mutation surfaces (artifacts.delete / set-retention) are NOT shipped — the Console renders them disabled-with-tooltip per the page spec §10 deferred list.
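Call #4's gate can be sketched as one pure function, checked in the order the entry describes: identity first, then tenant, then scope. This is a minimal Go sketch, not the runtime's code: the Scope wire strings, the Caller shape, treating an empty request tenant as the caller's own, and returning a scope-mismatch error on denial are all assumptions (this entry names the required scopes but not the denial code).

```go
package main

import "errors"

// Scope values are hypothetical wire strings for the two D-079 scopes.
type Scope string

const (
	ScopeAdmin        Scope = "admin"
	ScopeConsoleFleet Scope = "console.fleet"
)

var (
	ErrIdentityRequired = errors.New("identity_required")
	ErrScopeMismatch    = errors.New("scope_mismatch") // assumed denial code
)

// Identity is the mandatory (tenant, user, session) triple.
type Identity struct{ Tenant, User, Session string }

// Caller is a hypothetical verified-caller shape.
type Caller struct {
	ID     Identity
	Scopes map[Scope]bool
}

// authorize fails closed on a partial identity, allows same-tenant
// requests, then applies the per-method cross-tenant scope rule.
func authorize(c Caller, method, requestTenant string) error {
	if c.ID.Tenant == "" || c.ID.User == "" || c.ID.Session == "" {
		return ErrIdentityRequired
	}
	// Assumption: an empty request tenant means "the caller's own tenant".
	if requestTenant == "" || requestTenant == c.ID.Tenant {
		return nil
	}
	switch method {
	case "artifacts.list":
		if c.Scopes[ScopeAdmin] || c.Scopes[ScopeConsoleFleet] {
			return nil
		}
	case "artifacts.put":
		if c.Scopes[ScopeAdmin] {
			return nil
		}
	}
	// Any other cross-tenant request is denied in this sketch.
	return ErrScopeMismatch
}
```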

Why. Closes the Phase 73l acceptance criteria + the Wave 13 decomposition row 73l. The page IS the consumer of the extended artifacts.list filter shape and the new artifacts.put method (the §13 primitive-with-consumer rule, satisfied in the same wave). The renderer-registry skeleton lands with its first consumer (the Artifacts preview pane) so the dispatch contract is validated against a real call site before Phase 73n extends it.
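The dispatch contract the Artifacts pane validates can be sketched in a few lines. The shipped core is TypeScript in index.ts; this Go rendition only mirrors its shape (registerRenderer / dispatchRenderer / mimeIs / mimePrefix), and the Registry type, the string renderer names, and the parameter stripping in baseMime are illustrative assumptions.

```go
package main

import "strings"

// Matcher reports whether a rule applies to a MIME type.
type Matcher func(mime string) bool

// rule pairs a matcher with a renderer; a string name stands in for a component.
type rule struct {
	match    Matcher
	renderer string
}

// Registry is an ordered rule list with a fallback renderer.
type Registry struct {
	rules    []rule
	fallback string
}

// baseMime drops parameters such as "; charset=utf-8" and lower-cases the type.
func baseMime(mime string) string {
	base, _, _ := strings.Cut(mime, ";")
	return strings.ToLower(strings.TrimSpace(base))
}

func MimeIs(want string) Matcher {
	want = strings.ToLower(want)
	return func(mime string) bool { return baseMime(mime) == want }
}

func MimePrefix(prefix string) Matcher {
	prefix = strings.ToLower(prefix)
	return func(mime string) bool { return strings.HasPrefix(baseMime(mime), prefix) }
}

// RegisterRenderer only appends; it never rewrites existing rules.
func (r *Registry) RegisterRenderer(m Matcher, renderer string) {
	r.rules = append(r.rules, rule{match: m, renderer: renderer})
}

// DispatchRenderer returns the first matching renderer, else the fallback.
func (r *Registry) DispatchRenderer(mime string) string {
	for _, ru := range r.rules {
		if ru.match(mime) {
			return ru.renderer
		}
	}
	return r.fallback
}
```

Because dispatch is first-match-wins and registration only appends, a second consumer such as Phase 73n can cover MIME types the earlier rules miss, but it cannot silently shadow an earlier rule.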

Findings I'm departing from. Two documented deviations. (a) The phase plan's internal/protocol/handlers/artifacts.go path — the codebase has no handlers/ sub-package; the surface lands at internal/protocol/artifacts.go matching the SearchSurface / PostureSurface convention (call #1 above). (b) web/console/src/lib/protocol.ts is hand-extended — D-093 specifies protocol.ts is generated by cmd/harbor-gen-protocol-ts, but that generator binary has not yet landed (Phase 72h committed protocol.ts as a hand-shaped stub and noted "Downstream Console phases regenerate it"). Phase 73l hand-extends the file following the stub's shape and keeps the CODE GENERATED … DO NOT EDIT header; when the generator lands it regenerates the file verbatim from the Go CanonicalWireTypes. Both deviations are recorded in the phase plan.

Protocol additions. Three method names (artifacts.list, artifacts.put, artifacts.get_ref), two error codes (presign_unsupported → 501, request_too_large → 413), eleven wire types, one canonical event type (artifacts.uploaded). No new capability constant — the artifacts surface follows the search-cluster precedent (the search.* cluster advertises no Cap* constant either). The wire-transport route is the existing POST /v1/control/{method} REST surface.


D-121 — Console design-system foundation: route group, app shell, shared components/ui/ inventory, 4-state PageState async contract, unified HarborClient, token reconciliation

Date: 2026-05-20 Status: Settled (shipping with this PR)

Where it lives: docs/design/console/CONVENTIONS.md (the new binding conventions doc — the authority every future Console page phase cites); CLAUDE.md / AGENTS.md §4.5 (a binding bullet pointing at CONVENTIONS.md as the Console design authority); web/console/src/lib/components/ui/ (the eleven-component shared inventory — PageHeader, FilterBar, SavedViewChips, DataTable, BulkActionBar, DetailRail, RailCard, StatusChip, Pagination, ConnectionFooter, PageState); web/console/src/lib/protocol/ (the unified HarborClient class + the injectable ProtocolClient interface + the single ProtocolError class — namespaces tools / memory / flows / artifacts / mcp ported from the five legacy hand-authored clients); web/console/src/lib/connection.ts (the single {baseURL, token, identity} resolver, null when unattached); web/console/src/lib/tokens.css (reconciled — four border tokens collapsed to one --border-hairline, three rail-width tokens to one --size-rail, raw hex de-hardcoded, phase-stamped comment blocks removed, back-compat aliases retained so the five existing pages still resolve); web/console/src/routes/(console)/+layout.svelte (the app shell — sidebar with the 14-page IA in four clusters, top-bar breadcrumb + identity/connection indicator, shared footer); the five merged page routes relocated under routes/(console)/ (tools/, memory/, flows/ moved in; console/artifacts/ → (console)/artifacts/; mcp-connections/ already under (console)/); web/console/src/routes/(console)/overview/+page.svelte (the placeholder redirect target); web/console/src/routes/+page.svelte (the root / → /overview redirect).

Decision. A foundation audit of the first five merged Console pages (Tools, Memory, MCP Connections, Artifacts, Flows) found deep per-page drift: three conflicting route conventions plus broken cross-page links, no app shell, five incompatible async-state contracts, ~13 duplicated UI concepts, five hand-authored Protocol clients, fragmented design tokens. This PR lays the shared foundation; a later wave refactors each page's internals onto it. Six calls land here.

1. One route group. Every Console page is a top-level URL segment hosted under web/console/src/routes/(console)/, a SvelteKit route group whose sole purpose is to attach the shared app-shell layout. URLs carry no /console/ prefix and no group name. Detail views are (console)/<page>/[id]/+page.svelte uniformly. Root / redirects to /overview. All inter-page links use the unprefixed form.

2. One app shell. (console)/+layout.svelte renders the persistent sidebar (14-page IA in four clusters — Runtime / Execution / Resources / Settings; Playground is a session-level surface, not a sidebar entry), the top bar (breadcrumb + identity/connection indicator), the shared ConnectionFooter, and the content region.

3. One shared component inventory. web/console/src/lib/components/ui/ holds the eleven cross-page primitives, built on Skeleton, design-tokens-only, Svelte 5 runes. Page-specific components stay in components/<page>/. No two components share a name.

4. One four-state async contract. <PageState> owns Disconnected / Loading / Error / Empty as a mutually-exclusive if/else-if chain. Disconnected (no Runtime) is never conflated with Error. Loading renders a shape-matched skeleton. Error renders code: message plus a mandatory Retry button and suppresses any stale primary view. Detail rails get a nested PageState.

5. One typed client layer. web/console/src/lib/protocol/ ships the HarborClient class with method namespaces, the injectable ProtocolClient interface, and one ProtocolError class with uniform (code, message, status) — status is never dropped. connection.ts is the single {baseURL, token, identity} resolver. One fetch choke point. Each method targets whatever route the Runtime actually mounts for it.

6. One reconciled token scale. tokens.css is a single coherent scale extended in place — one hairline-border token, one rail-width token, no raw hex literals outside the base palette, no phase-stamped comment blocks. Back-compat aliases retain every existing var(--…) reference so the five relocated pages still build.
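Call #4's ordering is the part most likely to drift page by page, so it is worth pinning precisely. The real component is Svelte; this Go sketch uses hypothetical ViewInputs fields to show only the precedence. Ready is not one of the four contract states; it stands for the primary view.

```go
package main

// PageState enumerates what a page region renders.
type PageState int

const (
	Disconnected PageState = iota
	Loading
	Errored
	Empty
	Ready // not a contract state: the primary view
)

// ViewInputs are hypothetical fields a page would derive from its client.
type ViewInputs struct {
	Connected bool
	Loading   bool
	ErrCode   string // non-empty when the last request failed
	RowCount  int
}

// resolve uses a tagless switch: cases are tried top to bottom and the
// first true one wins, exactly like an if/else-if chain, so the states
// are mutually exclusive. Disconnected is checked before Error, and Error
// before the row count, so stale rows never render under an error.
func resolve(in ViewInputs) PageState {
	switch {
	case !in.Connected:
		return Disconnected
	case in.Loading:
		return Loading
	case in.ErrCode != "":
		return Errored
	case in.RowCount == 0:
		return Empty
	default:
		return Ready
	}
}
```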

Why. Closes the foundation-audit findings. The Console grew page-by-page with no shared spine; the drift was cumulative and would compound with every future page. CONVENTIONS.md is the forcing function — every future page phase plan cites it in a mandatory "Console consistency" section, and a divergent page PR is rejected on sight. This is the §13 "two parallel implementations" rule applied to the Console: one route convention, one shell, one client, one token scale.

Findings I'm departing from. This PR relocates and links the five pages onto the foundation but does NOT refactor their internal logic — each page keeps its existing per-page components and legacy client for now; the internal refactor onto components/ui/ and HarborClient is an explicitly deferred later wave. The unified HarborClient and the legacy per-page clients coexist transiently until that wave; this is a deliberate, time-boxed exception to the §13 "two parallel implementations" rule, scoped to the foundation→refactor transition and recorded here so it is not mistaken for permanent drift.

Protocol additions. None — this is a Console-only consolidation. No new Protocol methods, error codes, or wire types. The HarborClient namespaces port the union of the five existing hand-authored clients' method surfaces against the routes the Runtime already mounts.
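Because every HarborClient namespace funnels through call #5's single fetch choke point, error normalisation happens exactly once. The real client is TypeScript; this Go sketch assumes a {"code","message"} JSON error envelope and a runtime_error fallback code for unparseable bodies, neither of which this entry specifies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ProtocolError carries the uniform (code, message, status) triple.
type ProtocolError struct {
	Code    string
	Message string
	Status  int
}

func (e *ProtocolError) Error() string {
	return fmt.Sprintf("%s: %s (HTTP %d)", e.Code, e.Message, e.Status)
}

// decodeError turns a non-2xx response into a ProtocolError. The HTTP
// status is copied through on every path, including unparseable bodies.
func decodeError(status int, body []byte) *ProtocolError {
	var env struct {
		Code    string `json:"code"`
		Message string `json:"message"`
	}
	if err := json.Unmarshal(body, &env); err != nil || env.Code == "" {
		return &ProtocolError{Code: "runtime_error", Message: string(body), Status: status}
	}
	return &ProtocolError{Code: env.Code, Message: env.Message, Status: status}
}
```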


D-122 — Console Sessions page: sessions.list + sessions.inspect Protocol methods + the SvelteKit Sessions list/detail route

Date: 2026-05-20 Status: Settled (shipping with this PR)

Where it lives: internal/protocol/methods/methods.go (the two sessions.* method constants + the canonicalSessionsMethods set + the IsSessionsMethod O(1) predicate + the IsControlMethod exclusion); internal/protocol/types/sessions.go (the nine flat Sessions wire types — Window, SessionFilter, SessionsListRequest, SessionRow, SessionsListResponse, InterventionSummary, ArtifactRefSummary, SessionsInspectRequest, SessionsInspectResponse — plus the SessionStatus / SessionSort closed enums + the pagination-bound constants); internal/sessions/protocol/ (the Service — a sibling of the Phase 73f tools/protocol.Service — plus the Projector seam, the ListerProjector V1 implementation over the Phase 08 SessionLister, the opaque versioned cursor codec, the facet-filter predicate, and the SessionsAdminQueryPayload audit emit); internal/protocol/transports/stream/sessions_handler.go (the SessionsHandler wire adapter — POST /v1/sessions/{list,inspect}); internal/protocol/transports/transports.go (the WithSessionsService mux option); internal/protocol/singlesource/singlesource.go (the lockstep map gains the two method names + the nine wire types); internal/protocol/conformance/conformance.go (the method-count bump + the sessions-cluster skip, the same posture the search / pause / flows clusters take); cmd/harbor/cmd_dev.go (the sessions.New registry + NewListerProjector + NewService + WithSessionsService boot wiring); web/console/src/lib/protocol/client.ts + harbor.ts (the SessionsNamespace added to the unified HarborClient); web/console/src/lib/protocol/sessions.ts (the typed SessionsProtocol wrapper); web/console/src/lib/sessions/ (the wire types + the formatting helpers); web/console/src/lib/db/saved_filters_sessions.ts (the Console-DB saved-filter wrapper — page = 'sessions' scoped, NO new table); web/console/src/lib/components/sessions/ (the page-specific components — SessionFacetChips, SessionSummaryCard, RecentInterventionsCard, RecentArtifactsCard, BottomDockTabs, IdentityCell); web/console/src/routes/(console)/sessions/ (the list route + the [id]/ detail route); web/console/tests/sessions-page.spec.ts (the Playwright per-page spec); internal/sessions/protocol/protocol_test.go + concurrent_test.go (the Service unit + D-025 N≥100 concurrent-reuse tests); internal/protocol/transports/stream/sessions_handler_test.go (the handler unit tests); test/integration/sessions_page_test.go (the §17.1 integration test — real registry + real wire transport + real ES256 auth); web/console/src/lib/sessions/tests/format.spec.ts + web/console/src/lib/db/tests/saved_filters_sessions.spec.ts (the Console-side Vitest); scripts/smoke/phase-73c.sh (the live-server smoke); docs/plans/README.md + README.md (Phase 73c → Shipped); docs/glossary.md (the four new vocabulary entries).

Decision. Four calls land here.

1. The sessions methods route through a sibling Sessions handler, not the task-control ControlSurface. The two sessions.* methods are read-only projections — they do not reach the task registry, they carry their own flat wire shapes. They follow the exact pattern the Phase 73f tools.* handler and the Phase 73i flows.* handler established: a dedicated stream-package handler at POST /v1/sessions/{verb}, an O(1) IsSessionsMethod predicate, an IsControlMethod exclusion, and an additive WithSessionsService transport option that preserves the 404 → SKIP smoke path on a build without the surface wired. The phase plan placed the handler at internal/server/sessions_list.go; the codebase has no internal/server/ package — every Protocol wire handler lives under internal/protocol/transports/. Phase 73c follows the established convention — a documented CLAUDE.md §4.3 deviation; the RFC §5.1 single-source rule and the existing handler pattern win over the stale plan path.

2. sessions.list cursor pagination is opaque + version-prefixed; Truncated replaces a silent exact total. The cursor is a base64-url-encoded (version, sort-key, cost, session-id) tuple — opaque to clients, carrying a 1-byte version prefix so a future encoding change fails loudly with CodeInvalidRequest rather than silently degrading. The response emits a Truncated bool (D-026 fail-loudly) — never an exact O(N) total under high cardinality. The forward-then-filter resolution for the Query field (the runtime first forwards a free-text query to the search.sessions index; the SessionFilter axes are then applied as post-search refinements) is pinned here, resolving the phase plan's open question.

3. The sessions.inspect Row projection ships; the recent-interventions / recent-artifacts cards populate from the live event stream. The Phase 08 Session record does not model per-session cost / token / task / event counters or the agent binding — those are surfaced from the Console's own event-stream subscription on the detail route (the llm.cost.recorded aggregation is Console-local per the page spec §3). sessions.inspect ships the SessionRow projection + empty capped recent_interventions / recent_artifacts slices; the cards populate client-side from the event stream. This keeps sessions.list / sessions.inspect pure registry projections (no shadow aggregation store — D-061) and is a documented Phase 73c deviation. The phase plan framed sessions.inspect as an additive extension of a Phase 73 parent method; Phase 73 has not shipped sessions.inspect, so Phase 73c lands it whole.

4. The Sessions page uses ONLY the D-079 closed scope set. No new scope is minted. A cross-tenant sessions.list (a TenantIDs entry naming a tenant other than the caller's verified tenant) requires auth.ScopeAdmin (or auth.ScopeConsoleFleet); a missing claim fails closed with CodeScopeMismatch (HTTP 403) and a successful admin-scope query emits an audit.admin_scope_used event. Identity is mandatory at every boundary — a missing tenant/user/session returns CodeIdentityRequired, fail-closed (CLAUDE.md §6 rule 9). The bulk Cancel / Pause toolbar actions are control-plane verbs (D-066): the toolbar always renders them, disabled-with-tooltip — never a faked success string.

Why. Closes the Phase 73c acceptance criteria + the Wave 13 decomposition row 73c. The page IS the consumer of the new sessions.list method (the §13 primitive-with-consumer rule, satisfied in the same phase). The Sessions-page Identity column is the same-wave consumer of Phase 72b's IdentityScope impersonation triplet — it renders the verified actor triple plus a separate impersonating chip for admin-initiated runs, discharging Phase 72b's binding cross-reference.

Findings I'm departing from. Three documented deviations. (a) The phase plan's internal/server/ handler path — the codebase has no internal/server/ package; the handler lands at internal/protocol/transports/stream/sessions_handler.go matching the Phase 73f / 73i precedent (call #1). (b) sessions.inspect is shipped whole, not as an additive extension of a Phase 73 parent method that has not landed (call #3). (c) web/console/src/lib/protocol.ts is NOT hand-edited — D-093 pins it as generated, and the generator has not landed; Phase 73c follows the Phase 73i Flows-page precedent instead, placing the Sessions wire types in web/console/src/lib/sessions/types.ts and a typed SessionsProtocol wrapper over the unified HarborClient sessions namespace.

Protocol additions. Two method names (sessions.list, sessions.inspect), nine wire types, one reused canonical event type (audit.admin_scope_used, carrying a new SessionsAdminQueryPayload). No new error codes — the four mapped codes (CodeIdentityRequired, CodeScopeMismatch, CodeNotFound, CodeInvalidRequest) are all canonical. No new capability constant — the sessions surface follows the search / tools / flows precedent. The wire-transport route is POST /v1/sessions/{verb}.
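The additive mount and its 404 → SKIP path can be shown with the standard library alone. newMux and probe are hypothetical helpers, not Harbor functions, and the success branch is a placeholder that writes 200 instead of decoding and dispatching.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"strings"
)

// newMux mounts the Sessions route only when the service is wired, the way
// an additive WithSessionsService option would.
func newMux(withSessions bool) *http.ServeMux {
	mux := http.NewServeMux()
	if !withSessions {
		return mux // nothing mounted: every path answers 404
	}
	mux.HandleFunc("/v1/sessions/", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			w.WriteHeader(http.StatusMethodNotAllowed)
			return
		}
		switch strings.TrimPrefix(r.URL.Path, "/v1/sessions/") {
		case "list", "inspect":
			w.WriteHeader(http.StatusOK) // placeholder for decode + dispatch
		default:
			http.NotFound(w, r)
		}
	})
	return mux
}

// probe issues a request in-process and returns the status, which is what a
// smoke script inspects to choose between running checks and SKIP on 404.
func probe(h http.Handler, method, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code
}
```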


D-123 — Phase 73d Console Tasks page: tasks.list/tasks.get read surface on a sibling Tasks dispatcher; the kanban board as the primary view; the bulk toolbar consumes the shipped Phase 54 control verbs

Date: 2026-05-20 Status: Settled (shipping with this PR)

Where it lives: internal/protocol/methods/methods.go (MethodTasksList / MethodTasksGet constants + the canonicalTasksMethods set + the IsTasksMethod predicate); internal/protocol/types/tasks.go (the Tasks-page wire types — TaskRow, TaskFilter, TaskListAggregates, TaskListCursor, TaskListRequest, TaskListResponse, TaskDetail, TaskParentSessionRef, TaskParentTaskRef, TaskCostRollup, TaskCostStep, TaskPlannerSnapshotRef, TaskGetRequest + the TaskStatus / TaskKind wire enums); internal/tasks/protocol/ (the Tasks Protocol Service + the Projector seam + the V1 RegistryProjector + the Enricher seam + the TasksAdminActionPayload audit emit); internal/protocol/transports/stream/tasks_handler.go (the POST /v1/tasks/{method} wire handler); internal/protocol/transports/transports.go (the WithTasksService mux option); cmd/harbor/cmd_dev.go + harbortest/devstack/devstack.go (the production + devstack wiring); web/console/src/routes/(console)/tasks/ (the Tasks page route); web/console/src/lib/components/tasks/ (the kanban board + per-task action bar + detail tabs + rail-card bodies); web/console/src/lib/protocol/tasks.ts (the wire-type surface); web/console/src/lib/protocol/client.ts (the tasks + control HarborClient namespaces); web/console/src/lib/db/saved_filters_tasks.ts (the Console-DB saved-filter wrapper).

Decision. Phase 73d ships the Console Tasks page — the task-granularity counterpart to Sessions — as a Protocol client built on the D-121 design-system foundation. Three calls land here.

1. Two net-new READ methods, on a sibling Tasks dispatcher. tasks.list (paginated, faceted task-row projection + per-status aggregates + cursor pagination) and tasks.get (enriched single-task detail — parent-session ref, parent-task ref, per-step cost rollup, planner-snapshot ref). Both are READS. The wire-transport route is POST /v1/tasks/{method}, a sibling of tools.* / memory.*; IsTasksMethod is the O(1) predicate the stream transport branches on, and IsControlMethod excludes the pair. Identity is mandatory (CodeIdentityRequired); a cross-tenant tasks.list fan-in requires the verified auth.ScopeAdmin claim (D-079 closed two-scope set — CodeScopeMismatch otherwise) and emits audit.admin_scope_used; a cross-tenant tasks.get returns CodeNotFound (existence is never revealed). No new error code, no new scope.

2. The bulk toolbar consumes the SHIPPED Phase 54 control verbs. The Tasks page's bulk-action toolbar + per-task action bar + card-drag-across-columns all invoke the EXISTING Phase 54 cancel / pause / resume / prioritize / approve / reject methods through the control transport (POST /v1/control/{verb}) — there is NO tasks.* mutating method (CLAUDE.md §13 "no parallel implementations"). The Console-side ControlNamespace on HarborClient is a typed thin wrapper over those routes; a verb targets a task by its run id carried in identity.run. Card drag is wired to the matching verb (Running to Paused = pause, Paused to Running = resume, Running to Failed = cancel); Pending to Running is server-initiated and the drag is a no-op with an inline toast.

3. The kanban board is the depth-bar primary view. Per the Tasks page spec the primary view is a 4-column kanban board (Pending / Running / Paused / Failed), not a flat DataTable — but it still mounts inside the shared app shell, routes async state through the four-state <PageState>, carries Console-DB-backed SavedViewChips + real Pagination + a DetailRail, and offers a list-mode DataTable toggle. The kanban pieces live in components/tasks/ and compose ui/ primitives underneath; the bulk-action toolbar is the shared BulkActionBar — not a forked per-page toolbar.
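The drag-to-verb rule in call #2 is a small lookup table. Column and verb strings follow the entry; the type names are hypothetical, and treating every unlisted move (for example Paused to Failed) as a no-op is an assumption, since the entry only names the three wired moves and the Pending to Running case.

```go
package main

type column string

const (
	Pending column = "Pending"
	Running column = "Running"
	Paused  column = "Paused"
	Failed  column = "Failed"
)

type move struct{ from, to column }

// dragVerbs maps a card move to an existing Phase 54 control verb.
var dragVerbs = map[move]string{
	{Running, Paused}: "pause",
	{Paused, Running}: "resume",
	{Running, Failed}: "cancel",
}

// verbForDrag returns the verb to POST to /v1/control/{verb}, or ok=false
// when the move is a no-op (for example Pending to Running, which the
// runtime initiates) and the UI should show an inline toast instead.
func verbForDrag(from, to column) (verb string, ok bool) {
	verb, ok = dragVerbs[move{from, to}]
	return verb, ok
}
```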

Why. The Tasks page answers "what's running across all sessions right now?" / "every task that failed in the last hour" — questions one notch below Sessions. tasks.list is high-cardinality and runtime-side (brief 11): the runtime owns the index, not Console-side substring matching. The two methods are reads because the page is observation + control, and control is already a shipped surface (Phase 54) — re-minting a tasks.cancel would be the §13 "two parallel implementations" violation. The page is the §13 primitive-with-consumer discharge for the Tasks Protocol surface — it lands in the same phase as the methods.

Findings I'm departing from. None against the briefs. The internal/protocol/types/tasks.go wire types are Protocol-local (the TaskStatus / TaskKind enums are NOT the runtime-internal tasks.TaskStatus / tasks.TaskKind) — internal/protocol/types does not import internal/tasks, keeping the Protocol layer's vocabulary its own (CLAUDE.md §8). The Phase 73d plan sketched TaskDetail.Task as the internal tasks.Task; this PR projects a flat TaskRow instead so the Console never reads an internal Go type (the same posture tools.go took). RegistryProjector scopes tasks.list to the caller's own session — the realistic V1 surface, since tasks.TaskRegistry.List is session-scoped; the Projector seam admits a future cross-runtime aggregating projector without reshaping the Service.
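The boundary translation described here can be sketched as one projection function. Every name and wire value below is hypothetical (the entry does not list the TaskStatus values), and both sides share one file only to keep the example self-contained; in the codebase the wire enum lives in internal/protocol/types, which does not import internal/tasks. Returning an error for an unmapped status is a choice of this sketch, in keeping with the log's fail-loud posture.

```go
package main

import "fmt"

// internalStatus stands in for the runtime's own task status type.
type internalStatus int

const (
	internalQueued internalStatus = iota
	internalRunning
	internalSuspended
	internalFailed
)

type internalTask struct {
	ID      string
	Status  internalStatus
	Session string
}

// TaskStatus is the Protocol-local wire enum; the string values are assumed.
type TaskStatus string

const (
	TaskPending TaskStatus = "pending"
	TaskRunning TaskStatus = "running"
	TaskPaused  TaskStatus = "paused"
	TaskFailed  TaskStatus = "failed"
)

// TaskRow is flat: only strings and wire enums, never an internal type.
type TaskRow struct {
	ID        string     `json:"id"`
	SessionID string     `json:"session_id"`
	Status    TaskStatus `json:"status"`
}

// projectTask translates at the boundary instead of guessing.
func projectTask(t internalTask) (TaskRow, error) {
	var s TaskStatus
	switch t.Status {
	case internalQueued:
		s = TaskPending
	case internalRunning:
		s = TaskRunning
	case internalSuspended:
		s = TaskPaused
	case internalFailed:
		s = TaskFailed
	default:
		return TaskRow{}, fmt.Errorf("unmapped task status %d", t.Status)
	}
	return TaskRow{ID: t.ID, SessionID: t.Session, Status: s}, nil
}
```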


D-124 — Phase 73e Console Agents page: eight agents.* read-only Protocol methods on a sibling agents handler over a registry/protocol Service; control verbs stay the shipped registry.* surface

Date: 2026-05-20 Status: Settled (shipping with this PR)

Where it lives: internal/protocol/types/agents.go (the twenty-eight Agents-page wire types + the three string enums); internal/protocol/methods/methods.go (the eight agents.* method constants + canonicalAgentsMethods + IsAgentsMethod); internal/runtime/registry/protocol/ (the new package — Service over a Projector seam, the V1 RegistryProjector over a registry.AgentRegistry with an optional ConfigSource join); internal/protocol/transports/stream/agents_handler.go (the POST /v1/agents/{method} wire handler); internal/protocol/transports/transports.go (WithAgentsService); cmd/harbor/cmd_dev.go (the dev stack constructs the Agent Registry + the agents Service and mounts the route); web/console/src/lib/protocol/agents.ts + the agents namespace on HarborClient; web/console/src/routes/(console)/agents/ (list + [id] detail routes); web/console/src/lib/components/agents/; web/console/src/lib/db/saved_filters_agents.ts.

Decision. The Console Agents page consumes eight NEW agents.* Protocol methods. Three calls land here.

1. Eight read-only methods on a sibling handler, not the control surface. agents.list / agents.get / agents.tools / agents.memory / agents.governance / agents.skills / agents.permissions / agents.metrics are all read-only projections of the Agent Registry. They route through a dedicated POST /v1/agents/{method} wire handler in the stream package — the same posture as the Phase 73f tools.* cluster — not the task-control ControlSurface. IsAgentsMethod makes IsControlMethod return false for them so the steering inbox stays the Phase 54 nine.

2. Control verbs stay the shipped registry.* surface — Phase 73e mints NO control method. The five fleet-control verbs the Agents page exposes (Pause / Drain / Restart / Force-Stop / Deregister) are the EXISTING shipped registry.* control verbs (Phase 53a, D-066), gated on the elevated control-scope claim. Phase 73e adds no control Protocol method and no new wire type for them; the page renders the control buttons disabled-with-tooltip for an operator without the claim (CONVENTIONS.md §5 — no stubbed action). This is the §13 "no two parallel implementations" rule: the registry control surface already exists; the page consumes it rather than cloning it.

3. agent_id is NOT an isolation principal. Every agents.* method is identity-mandatory and filters by the (tenant, user, session) tuple read from the request context — never by agent_id (D-059, CLAUDE.md §6 clarifying note). The RegistryProjector delegates scoping to the registry's own tuple-scoped storage; a cross-tenant agents.get returns not_found, never another tenant's agent.
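Call #3's scoping can be sketched by making the tuple the outer key and agent_id only an inner index. The types are hypothetical stand-ins, not the Agent Registry's API; the point is that a lookup under the wrong tuple finds nothing, which the handler reports as not_found.

```go
package main

// Tuple is the (tenant, user, session) isolation key.
type Tuple struct{ Tenant, User, Session string }

type Agent struct{ ID, Name string }

// registry partitions agents by tuple first; agent_id only indexes within
// a partition and is never used to cross one.
type registry struct {
	byTuple map[Tuple]map[string]Agent
}

func (r *registry) put(t Tuple, a Agent) {
	if r.byTuple == nil {
		r.byTuple = map[Tuple]map[string]Agent{}
	}
	if r.byTuple[t] == nil {
		r.byTuple[t] = map[string]Agent{}
	}
	r.byTuple[t][a.ID] = a
}

// get returns ok=false when the agent id exists only under another tuple.
// Indexing a missing (nil) inner map is safe in Go and yields ok=false.
func (r *registry) get(t Tuple, agentID string) (Agent, bool) {
	a, ok := r.byTuple[t][agentID]
	return a, ok
}
```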

Why. Brief 11 / Brief 12 pin agent management as a binding V1 Console surface; the Agent Registry (D-059 / D-060) already owns the data. Splitting the page surface into eight specialised methods rather than overloading agents.get follows Brief 11's recommendation directly — each detail tab loads independently through its own nested PageState. Keeping the methods read-only and routing control through the shipped registry.* surface keeps the Console an honest Protocol client and avoids a parallel control path.

Findings I'm departing from. None on design. One implementation note: the registry persists only the version_hash of an agent's AgentConfig, not the config itself, so the configuration-derived projections (agents.tools / agents.memory / agents.governance / agents.skills and the AgentConfig on agents.get) join through an optional ConfigSource seam on the RegistryProjector. When no ConfigSource is wired the methods return an HONEST empty projection (an empty binding list, a zero-value memory binding) — they still validate identity and the agent's existence and still fail loud with not_found for a missing agent; this is not a stubbed success (CLAUDE.md §13). Production wiring supplies a ConfigSource as the subsystems that own that data (tool catalog, memory configs, Phase 36 governance, skills catalog) grow their join surfaces.
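The optional ConfigSource seam can be sketched as a nil-able interface field. The method shape, the version map, and staticSource are hypothetical; the behaviour taken from the entry is that a missing agent is still not_found while an unwired source yields an honest empty projection.

```go
package main

import "errors"

var ErrNotFound = errors.New("not_found")

type ToolBinding struct{ Name string }

// ConfigSource is the optional join seam; the method shape is assumed.
type ConfigSource interface {
	Tools(versionHash string) ([]ToolBinding, error)
}

type Projector struct {
	versions map[string]string // agent id -> AgentConfig version_hash
	configs  ConfigSource      // nil when nothing is wired
}

// AgentTools still proves the agent exists, then returns an empty,
// non-nil list when no ConfigSource is wired.
func (p *Projector) AgentTools(agentID string) ([]ToolBinding, error) {
	hash, ok := p.versions[agentID]
	if !ok {
		return nil, ErrNotFound
	}
	if p.configs == nil {
		return []ToolBinding{}, nil
	}
	return p.configs.Tools(hash)
}

// staticSource is a toy ConfigSource keyed by version hash.
type staticSource map[string][]ToolBinding

func (s staticSource) Tools(hash string) ([]ToolBinding, error) { return s[hash], nil }
```

Returning an empty, non-nil slice matters on the wire: encoding/json marshals a nil slice as null but an empty slice as [].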

Protocol additions. Eight method names (agents.list / agents.get / agents.tools / agents.memory / agents.governance / agents.skills / agents.permissions / agents.metrics), twenty-eight wire types in internal/protocol/types/agents.go. No new error code (the agents surface reuses identity_required / not_found / invalid_request / runtime_error). No new capability constant. No new scope — agent control/admin gates on the existing auth.ScopeAdmin (D-079 closed two-scope set). The wire-transport route is the new POST /v1/agents/{method}.
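The eight names and the control-surface exclusion from call #1 can be sketched as two set lookups. The map and function names echo identifiers named in this entry, but the bodies are illustrative, and the steering set is a placeholder subset rather than the real Phase 54 list.

```go
package main

// The eight agents.* method names recorded in this entry.
var canonicalAgentsMethods = map[string]struct{}{
	"agents.list": {}, "agents.get": {}, "agents.tools": {}, "agents.memory": {},
	"agents.governance": {}, "agents.skills": {}, "agents.permissions": {}, "agents.metrics": {},
}

// IsAgentsMethod is an O(1) set lookup.
func IsAgentsMethod(m string) bool {
	_, ok := canonicalAgentsMethods[m]
	return ok
}

// steeringInbox is a placeholder subset of the Phase 54 control verbs.
var steeringInbox = map[string]struct{}{"cancel": {}, "pause": {}, "resume": {}}

// IsControlMethod checks the sibling cluster first, so a read-only agents
// method can never be routed into the steering inbox.
func IsControlMethod(m string) bool {
	if IsAgentsMethod(m) {
		return false
	}
	_, ok := steeringInbox[m]
	return ok
}
```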


D-125 — Phase 73g Console Events page: composition-only UI over shipped events.subscribe / events.aggregate / artifacts.get_ref; no new Protocol method

Date: 2026-05-20 Status: Settled (shipping with this PR)

Where it lives: web/console/src/routes/(console)/events/+page.svelte + +page.ts (the page route — served at /events, no /console/ URL prefix, per CONVENTIONS.md §1 / D-121); web/console/src/lib/events/ (the page-local lib module — filters.ts, sparkline.ts, export.ts, taxonomy.ts, subscription.svelte.ts, aggregate.svelte.ts, state.svelte.ts, saved-views.svelte.ts); web/console/src/lib/components/events/ (the page-specific components — EventFilterChips, EventRateSparkline, EventTable rendered inline via the shared DataTable, EventDetailRail, PauseStreamToggle, ExportMenu, TruncatedPayloadLink); web/console/src/lib/protocol/events.ts (the Console-side events.* wire types) + the new EventsNamespace on HarborClient; web/console/src/lib/db/saved_filters_events.ts (the typed wrapper over the shipped Phase 72h saved_filters table, scoped to page = 'events' — NO new table); web/console/tests/events-page.spec.ts (the per-page Playwright spec); test/integration/events_page_test.go (the Go-side §17.1 integration test).

Decision. The Console Events page is the runtime event-bus stream as a full-screen, query-driven investigative surface. Four calls land here.

1. Composition-only — NO new Protocol method. Phase 73g ships zero new Protocol methods, wire types in internal/protocol/types/, or method names in internal/protocol/methods/methods.go. The page is a pure UI consumer of already-shipped surface: events.subscribe (the GET /v1/events SSE table feed — Phase 72), events.aggregate (the POST /v1/events/aggregate sparkline feed — Phase 72a), and artifacts.get_ref (the heavy-payload Open artifact resolver — Phase 73l). The §13 primitive-with-consumer rule is satisfied trivially: 73g IS the consumer Phase 72a's primitives waited for.

2. The EventsNamespace joins the unified HarborClient. New page surfaces add a namespace, never a new top-level client (CONVENTIONS.md §6). client.events.aggregate(...) wraps the events.aggregate POST through the single Transport choke point; client.events.subscribeURL(...) builds the SSE EventSource target (a long-lived GET the request/response Transport does not model — the bearer token rides as a query param because EventSource cannot set an Authorization header). No .svelte file constructs the URL or the EventSource by hand.

3. Saved views, pause-stream, export, pagination size are Console-local (D-061). The saved-filter chips persist a JSON-encoded EventFacetState in the shipped saved_filters Console DB table scoped to page = 'events' — no new table, no Protocol round-trip. The Pause-stream toggle is a Console-local render gate, distinct from the runtime pause Protocol method (which is task-scoped — RFC §5.2): while paused the SSE cursor keeps advancing per D-029 and incoming events buffer; resume flushes them in cursor order. Export ▾ serialises the loaded page to NDJSON / CSV client-side.
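The Pause-stream semantics (cursor keeps advancing, events buffer, resume flushes in cursor order) can be sketched as a small render gate. The `PauseGate` class, the `StreamEvent` shape, and the assumption that cursors compare numerically are all illustrative, not the shipped `subscription.svelte.ts` code.

```typescript
// Illustrative event shape; assumes cursors are numerically comparable.
interface StreamEvent {
  cursor: number;
  type: string;
}

// Console-local render gate: pausing stops rendering, never the subscription.
class PauseGate {
  private paused = false;
  private buffer: StreamEvent[] = [];
  private lastCursor = 0;
  private readonly render: (e: StreamEvent) => void;

  constructor(render: (e: StreamEvent) => void) {
    this.render = render;
  }

  // Called for every SSE message; the cursor advances whether or not we render.
  receive(e: StreamEvent): void {
    this.lastCursor = Math.max(this.lastCursor, e.cursor);
    if (this.paused) {
      this.buffer.push(e);
    } else {
      this.render(e);
    }
  }

  pause(): void {
    this.paused = true;
  }

  // Flush buffered events in cursor order, then return to live rendering.
  resume(): void {
    this.paused = false;
    const pending = this.buffer.sort((a, b) => a.cursor - b.cursor);
    this.buffer = [];
    for (const e of pending) this.render(e);
  }

  get cursor(): number {
    return this.lastCursor;
  }
}
```

Because the gate sits downstream of the subscription, no Protocol call is made on pause or resume, which is what keeps it distinct from the task-scoped runtime pause method.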

4. Heavy payloads flow by reference (D-026) — cross-tenant gated on the closed scope set (D-079). A truncated event payload carries an artifact_ref, never inline bytes; the TruncatedPayloadLink resolves it via artifacts.get_ref. No Svelte component inlines heavy bytes. The Tenant ▾ facet is gated on auth.ScopeAdmin / auth.ScopeConsoleFleet — the D-079 closed two-scope set; NO events.crosstenant scope is minted (the PR #142 audit closed that). Cross-tenant fan-in emits audit.admin_scope_used, which the page's own table surfaces.
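The Tenant ▾ gate reduces to a membership check against a closed set. This sketch assumes scope claims reach the Console as plain strings spelled `admin` and `console:fleet` (the D-079 set as written in D-129); the helper name is hypothetical. A Console-side check like this only decides what to render and is not an authorization boundary.

```typescript
// Closed two-scope set (D-079); nothing outside it unlocks cross-tenant fan-in.
const CROSS_TENANT_SCOPES: ReadonlySet<string> = new Set(["admin", "console:fleet"]);

// Hypothetical helper: true when any verified claim is in the closed set.
function canUseTenantFacet(scopes: readonly string[]): boolean {
  return scopes.some((s) => CROSS_TENANT_SCOPES.has(s));
}
```

An unrecognized claim such as `events.crosstenant` simply fails the check, so the closed set cannot be widened by minting a new scope name on the client.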

Why. Closes the Phase 73g acceptance criteria + the Wave 13 decomposition row 73g. The page validates Phase 72a's EventFilter / events.aggregate primitives against a real call site in the same wave, and routes every heavy payload through artifacts.get_ref so the D-026 leak shape is closed at the Console edge.

Findings I'm departing from. One documented deviation, consisting of two path corrections. The phase plan (authored before D-121 landed) lists the route at web/console/src/routes/console/events/+page.svelte. CONVENTIONS.md §1 (D-121) is the binding cross-cutting authority and pins the (console) route group with NO /console/ URL prefix; the page ships at web/console/src/routes/(console)/events/ accordingly. The phase plan's web/console/src/lib/events/components/ path is likewise corrected to web/console/src/lib/components/events/ to match the components/<page>/ convention CONVENTIONS.md §3 pins. Both corrections follow CLAUDE.md §15 (a plan that contradicts a higher-priority artifact yields to it).

Protocol additions. None — this is a Console-only page phase. No new Protocol methods, error codes, or wire types. The Console-side events.ts wire types mirror the shipped internal/protocol/types/events.go field-for-field.


D-126 — Phase 73b Console Live Runtime page: composition over shipped surfaces + the single tasks.list status-counter-strip aggregate; no new Protocol method

Date: 2026-05-20 Status: Settled (shipping with this PR)

Where it lives: internal/protocol/types/tasks.go (the new TasksListStatusCounterStrip wire type + the opt-in TaskListRequest.IncludeStatusCounterStrip field + the TaskListResponse.StatusCounterStrip field); internal/tasks/protocol/list.go (the server-side, identity-scoped aggregate computation); internal/protocol/singlesource/singlesource.go (the TasksListStatusCounterStrip CanonicalWireTypes entry); web/console/src/routes/(console)/live-runtime/ (the page route at /live-runtime + the [session_id] deep-link route — no /console/ URL prefix, per CONVENTIONS.md §1 / D-121); web/console/src/lib/components/live-runtime/ (the page components, incl. the composer/ subtree); web/console/src/lib/live-runtime/ (the pure strip.ts + topology-adapter.ts logic); web/console/src/lib/protocol/topology.ts + the TopologyNamespace on HarborClient; web/console/src/lib/db/saved_filters_live_runtime.ts (the typed wrapper over the shipped saved_filters table — NO new table); web/console/tests/live-runtime-page.spec.ts (the per-page Playwright spec); test/integration/live_runtime_page_test.go (the Go-side §17.1 integration test).

Decision. The Console Live Runtime page is the operator's present-tense execution workbench — the topology canvas as the centrepiece, a header status-counter strip, a tab strip (Topology / Timeline / Metrics / Health), a bottom-dock Event Stream + per-task detail / composer, and a session detail rail. Four calls land here.

1. Mostly composition-only — ONE net-new Protocol addition, and it is not a method. Phase 73b is overwhelmingly a UI consumer of already-shipped surface: topology.snapshot (Phase 74 / D-114), events.subscribe SSE (Phase 60 / 72), tasks.get + state.history (Phase 73), and the Phase 54 task-control verbs. The single net-new Protocol addition is the tasks.list status-counter-strip aggregate: TasksListStatusCounterStrip (a five-count pending / running / completed / paused / failed struct), opt-in via the new TaskListRequest.IncludeStatusCounterStrip flag, carried on TaskListResponse.StatusCounterStrip. It is a single-source CanonicalWireTypes extension; internal/protocol/methods/methods.go is unchanged — no new method name. The strip is computed server-side over the FULL identity-scoped task set (not the filtered view — the header strip is session-wide present-tense posture) and is identity-scoped: a second session never sees the first's counts.
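The counting rule above (full identity-scoped set, filters ignored) is sketched here in TypeScript for illustration only; the shipped computation is Go in internal/tasks/protocol/list.go. The `TaskRecord` shape and the `computeStrip` name are hypothetical.

```typescript
type TaskStatus = "pending" | "running" | "completed" | "paused" | "failed";
type StatusCounterStrip = Record<TaskStatus, number>;

interface Identity {
  tenant: string;
  user: string;
  session: string;
}

// Hypothetical task record: owner identity plus current status.
interface TaskRecord {
  id: string;
  owner: Identity;
  status: TaskStatus;
}

function sameIdentity(a: Identity, b: Identity): boolean {
  return a.tenant === b.tenant && a.user === b.user && a.session === b.session;
}

// Deliberately takes no view filter: the strip reflects the caller's whole task set.
function computeStrip(tasks: readonly TaskRecord[], caller: Identity): StatusCounterStrip {
  const strip: StatusCounterStrip = { pending: 0, running: 0, completed: 0, paused: 0, failed: 0 };
  for (const t of tasks) {
    if (!sameIdentity(t.owner, caller)) continue; // identity scoping on the full triple
    strip[t.status] += 1;
  }
  return strip;
}
```

Leaving the filter out of the signature makes the "not the filtered view" rule structural rather than a convention a caller could forget.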

2. The events.subscribe run filter is composition-only — NO new filter type. The phase plan sketched an events.subscribe RunID filter field as a [wave-13-extends] addition. The shipped internal/protocol/types.EventFilter ALREADY carries RunIDs []string, and events.FilterFromWire ALREADY maps it onto events.Filter.Run (the D-082 X-Harbor-Run carrier's structured counterpart). Minting a parallel scalar RunID field would be the §13 "no parallel implementations" violation. Phase 73b therefore ships NO new events wire type — the bottom-dock Trace tab narrows its subscription via the already-shipped run carrier. This is a documented departure from the plan's acceptance criterion wording; the run-scoped filter still ships and is exercised end-to-end (the integration test's trace_tab_run_scoped_filter arm + the internal/events TestFilter_RunScoped_TraceTab unit test).

3. The topology canvas REUSES the shared <EngineGraphCanvas> — it is not forked. The Topology tab's primary view is the Phase 73i shared engine-graph canvas (components/graph/), fed through a thin <TopologyCanvas> adapter that maps a Protocol TopologyProjection onto the canvas's typed GraphInput contract (the pure mapping is $lib/live-runtime/topology-adapter.ts, Vitest-tested). The Timeline tab is a sibling projection of the SAME topology.snapshot data laid out as swimlanes — no parallel topology store.

4. The composer is NOT the chat module (D-091). The bottom-dock Start / Redirect / Inject context / User message / Cancel / Pause / Resume composer is built with non-chat Skeleton primitives (composer/run-composer.svelte) calling the shipped Phase 54 control verbs through the typed Protocol client directly. The canonical chat module's V1 first consumer is 73n Playground; a second in-V1 consumer would force extraction to web/shared/chat/ (out of V1 scope per CLAUDE.md §4.5 #11's "encapsulate first, extract on second consumer" rule). The page proves no import from $lib/chat/. There is NO session-level priority field anywhere on the page (D-065 — task-level priority via the shipped prioritize method is the only V1 priority surface, exposed on the per-task detail pane).

Why. Closes the Phase 73b acceptance criteria + the Wave 13 decomposition row 73b. The page is the §13 primitive-with-consumer discharge for the status-counter-strip aggregate (it lands in the same PR as its only consumer) and validates the Phase 74 topology projection + the Phase 60 run-scoped SSE filter against a real call site.

Findings I'm departing from. Two documented deviations. (a) The plan's acceptance criterion "events.subscribe gains a RunID filter field" — the shipped EventFilter.RunIDs + FilterFromWire ALREADY provide the run-scoped filter; minting a parallel field is a §13 violation, so Phase 73b is composition-only on the events surface (decision call 2 above). (b) The plan (authored before D-121 landed) lists the route at web/console/src/routes/console/live-runtime/; CONVENTIONS.md §1 (D-121) is the binding cross-cutting authority and pins the (console) route group with NO /console/ URL prefix — the page ships at web/console/src/routes/(console)/live-runtime/ accordingly (CLAUDE.md §15 — a plan that contradicts a higher-priority artifact yields to it).

Protocol additions. One wire-type extension: TasksListStatusCounterStrip (struct) + TaskListRequest.IncludeStatusCounterStrip (opt-in field) + TaskListResponse.StatusCounterStrip (field), all in internal/protocol/types/tasks.go, registered in singlesource.CanonicalWireTypes. No new Protocol method names, no new error codes.


D-127 — Phase 73a Console Overview page: composition-only UI over shipped runtime.counters / runtime.health / pause.list / events.subscribe / Phase 54 approve/reject; no new Protocol method

Date: 2026-05-20 Status: Settled (shipping with this PR)

Where it lives: web/console/src/routes/(console)/overview/ (the page route at /overview — no /console/ URL prefix, per CONVENTIONS.md §1 / D-121; the / redirect target); web/console/src/lib/components/overview/ (the page components — counter card + sparkline, health-chip strip, cost-rollup card, intervention queue, recent-activity feed, Quick Links grid, + New menu, footer); web/console/src/lib/overview/ (the pure aggregations.ts / activity.ts / cost.ts projection logic); web/console/src/lib/protocol/posture.ts + pause.ts (the runtime.* / pause.* wire types) + the RuntimeNamespace / PauseNamespace on HarborClient; web/console/src/lib/db/saved_filters_overview.ts (the typed wrapper over the shipped saved_filters table — NO new table); web/console/tests/overview-page.spec.ts (the per-page Playwright spec).

Decision. The Console Overview page is the operator's at-a-glance hub — the default route on a fresh attach. It composes the 4-card counter row (Events/min, Tasks Running, Background Jobs, MCP Connections), the sub-header health-chip strip, the cost-rollup card, the intervention queue, the recent-activity feed, the 2×3 Quick Links grid, and the + New quick-create menu, all inside the shared app shell. Four calls land here.

1. Composition-only — NO new Protocol method, NO new Go-side surface. Phase 73a is a pure UI consumer of already-shipped surface: runtime.counters + runtime.health (Phase 72f / D-111, posture methods routed through the control transport at POST /v1/control/runtime.{counters,health}), pause.list (Phase 72e / D-110 at POST /v1/pause/list), events.subscribe SSE (Phase 60 / 72), and the Phase 54 approve / reject control verbs. internal/protocol/methods/methods.go, internal/protocol/types/, and internal/protocol/singlesource/ are unchanged — there is no new Go code in internal/. The TS wire types posture.ts / pause.ts mirror the already-shipped internal/protocol/types/{posture,pause}.go field-for-field; the new RuntimeNamespace / PauseNamespace join the unified HarborClient (CONVENTIONS.md §6 — a new page surface adds a namespace, never a new top-level client).

2. The counter sparklines + recent-activity feed + cost rollup are folded CLIENT-SIDE off the events.subscribe cursor. Per page-overview.md §12 these are [shipped] subscription-derived surfaces — no new Protocol method. aggregations.ts buckets the SSE Event[] into a windowed per-minute rate series (1m / 5m / 15m windows; per-minute, not per-second, per the mockup); activity.ts projects the operator-relevant event subset into the feed; cost.ts folds llm.cost.recorded events into a per-agent (default) / per-tenant (admin) rollup. All three are pure, Vitest-tested, and drop a malformed / un-bucketable event rather than mis-counting it (CLAUDE.md §13 — fail loudly, never silently mis-read).
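A minimal sketch of the per-minute bucketing follows. It assumes each event carries a `ts` string that `Date.parse` understands (e.g. RFC 3339); the `RawEvent` shape and `bucketPerMinute` name are hypothetical and not the shipped aggregations.ts API.

```typescript
// Assumed event shape: only the timestamp matters for rate bucketing.
interface RawEvent {
  ts?: string;
  type?: string;
}

const MINUTE_MS = 60000;

// Returns `windowMinutes` per-minute counts ending at the minute containing `now`,
// oldest first. Events with a missing or unparsable timestamp are dropped, not counted.
function bucketPerMinute(events: readonly RawEvent[], windowMinutes: number, now: Date): number[] {
  const buckets = new Array<number>(windowMinutes).fill(0);
  const endMinute = Math.floor(now.getTime() / MINUTE_MS);
  const startMinute = endMinute - windowMinutes + 1;
  for (const e of events) {
    if (typeof e.ts !== "string") continue; // malformed: no timestamp
    const ms = Date.parse(e.ts);
    if (Number.isNaN(ms)) continue; // malformed: unparsable timestamp
    const minute = Math.floor(ms / MINUTE_MS);
    if (minute < startMinute || minute > endMinute) continue; // outside the window
    buckets[minute - startMinute] += 1;
  }
  return buckets;
}
```

The 1m / 5m / 15m windows then correspond to `windowMinutes` of 1, 5 and 15 over the same event list.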

3. The intervention queue's Approve / Reject invoke the SHIPPED Phase 54 control verbs — no parallel implementation. The queue composes the pause.list snapshot; each row's Approve / Reject calls client.control.approve / client.control.reject (the shipped Phase 54 verbs) against the paused run. There is NO pause.* mutating method — pause.list is read-only. The verbs are control-plane (D-066): without the admin scope claim the buttons render disabled-with-tooltip, never hidden into a fake-success state (CONVENTIONS.md §5 — no stubbed action presented as done); the runtime re-checks server-side regardless (CodeScopeMismatch).

4. The Quick Links grid is exactly six tiles — no Evaluations. Sessions / Tasks / Background Jobs / Agents / Tools / Settings, each an unprefixed Console route (CONVENTIONS.md §1). There is NO Evaluations tile — D-064 pins Evaluations as post-V1. The + New menu is a Console-local navigation surface only: each item deep-links into the create flow owned by that page's phase plan; the Overview provides the menu, not the flows.

Why. Closes the Phase 73a acceptance criteria + the Wave 13 decomposition row 73a. The page is the §13 primitive-with-consumer discharge for the Stage-1 posture + pause-snapshot surfaces (it is the first UI consumer of runtime.counters / runtime.health / pause.list from a Console page), built entirely on the D-121 CONVENTIONS.md foundation: the four-state <PageState> async contract (with nested PageState per panel — the health strip and the intervention queue each get their own), the shared components/ui/ inventory, the unified HarborClient + connection.ts, Console-DB-backed SavedViewChips, and tokens.css with no raw literals.

Findings I'm departing from. None on design. One path-resolution note: the phase plan's "Files added or changed" block (authored before D-121 landed) lists the route at web/console/src/routes/overview/+page.svelte; CONVENTIONS.md §1 (D-121) is the binding cross-cutting authority and pins the (console) route group with NO /console/ URL prefix — the page ships at web/console/src/routes/(console)/overview/ accordingly (CLAUDE.md §15 — a plan that contradicts a higher-priority artifact yields to it). The plan's web/console/src/routes/overview/+page.svelte reference and its smoke-script /console/overview route are corrected to the unprefixed (console)-group form.

Protocol additions. None. Phase 73a mints no Protocol method, no wire type in internal/protocol/types/, no error code. The TS-side posture.ts / pause.ts are client-side mirrors of already-shipped Go wire types; the RuntimeNamespace / PauseNamespace are client-side namespace additions to HarborClient.


D-128 — Phase 73h Console Background Jobs page: tasks.list filter/row-shape extensions, Console-side orphan detector, no bulk control endpoint

Date: 2026-05-20 Status: Settled (shipping with this PR)

Where it lives: internal/protocol/types/tasks.go (the TaskFilter.GroupID + TaskFilter.HasPendingApproval filter extensions + the TaskRow.Progress / TaskRow.Tags / TaskRow.LastActivityAt / TaskRow.IsBackground / TaskRow.HasPendingApproval row-shape enrichments); internal/tasks/protocol/list.go (the GroupID + HasPendingApproval server-side facet filtering); internal/tasks/protocol/registry_projector.go (the IsBackground / LastActivityAt projection + the ListGroups task→group reverse-index that populates TaskRow.GroupID); internal/tasks/list_filter.go (the new pure ListFilterFromWire wire→runtime TaskFilter translator); web/console/src/routes/(console)/background-jobs/ (the page route at /background-jobs — no /console/ URL prefix, per CONVENTIONS.md §1 / D-121); web/console/src/lib/components/background-jobs/ (the page components — QueueTable, BulkToolbar, OrphanBadge, RightRail, SavedFilterChips); web/console/src/lib/background-jobs/orphan-detector.ts (the pure Console-side detector); web/console/src/lib/db/saved_filters_background_jobs.ts (the typed wrapper over the shipped saved_filters table — NO new table); web/console/tests/background-jobs-page.spec.ts (the per-page Playwright spec); test/integration/background_jobs_page_test.go (the Go-side §17.1 integration test).

Decision. The Console Background Jobs page is the queue view for planner-spawned background tasks — a focused tasks.list projection with kinds=["background"], queue-shaped affordances, a per-job right rail, and a bulk-action toolbar. Three calls land here.

1. No new Protocol method — tasks.list is filter/row-shape-extended only. Phase 73d shipped tasks.list / tasks.get as the two tasks.* read methods. Phase 73h adds NO method name to internal/protocol/methods/methods.go. It extends the existing types.TaskFilter with two facets (GroupID for the per-job "Related Sessions" sibling drill-in; HasPendingApproval for the facet chip) and enriches types.TaskRow with five fields (Progress, Tags, LastActivityAt, IsBackground, HasPendingApproval). The background-job filter is the canonical plural Kinds []TaskKind slice from 73d set to ["background"] — never a type=background scalar. The RegistryProjector populates TaskRow.GroupID from a ListGroups task→group reverse-index (the registry's Task record carries no GroupID field; the projector resolves membership through the identity-scoped ListGroups read). All extensions are new fields on already-registered single-source types structs, so no new CanonicalWireTypes registration is required.

2. The AwaitTask orphan detector lives Console-side — no new Protocol field. A background job whose parent_task_id is non-empty and absent from the same tasks.list snapshot's id set is an orphan — a planner SpawnTask whose parent finished / was GC'd without joining via AwaitTask. The detector is a pure Console-side O(N) cross-check (detectOrphans(rows): Set<TaskID>) — it adds no Protocol field and issues no Protocol call. It surfaces, at the UI, the §13 binding that SpawnTask + AwaitTask MUST emit in the same phase (Phase 47 / D-056 closed this for ReAct); the page is the observability surface for that property, not a re-implementation of the runtime join. A runtime-side parent_alive boolean would be the obvious post-V1 lift if the per-render cost ever bites — but at V1 the Console-side cross-check needs no Protocol surface change.
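The orphan rule above is small enough to sketch in full. The `detectOrphans(rows): Set<TaskID>` signature is taken from the entry; the minimal `TaskRow` shape (only `id` and `parent_task_id`) is an assumption for illustration.

```typescript
type TaskID = string;

// Minimal row shape: only the fields the detector reads.
interface TaskRow {
  id: TaskID;
  parent_task_id?: string; // empty or absent for top-level tasks
}

// O(N): one pass builds the snapshot's id set, a second pass checks each parent.
function detectOrphans(rows: readonly TaskRow[]): Set<TaskID> {
  const ids = new Set<TaskID>(rows.map((r) => r.id));
  const orphans = new Set<TaskID>();
  for (const r of rows) {
    const parent = r.parent_task_id;
    if (parent && !ids.has(parent)) orphans.add(r.id); // non-empty parent missing from snapshot
  }
  return orphans;
}
```

Because the check only reads the rows already fetched for the table, it adds no Protocol call per render; the cost is linear in the page size.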

3. The bulk-action toolbar invokes per-row Phase 54 control verbs — no bulk endpoint. The Cancel / Pause / Resume / Prioritize bulk actions invoke the SHIPPED Phase 54 control verbs (cancel / pause / resume / prioritize) ONCE PER selected row. A single-call bulk endpoint would be a §13 "no parallel implementations" violation. The toolbar gates on the operator's control scope claim (D-066 / D-079) and degrades to disabled-with-tooltip when the claim is missing (CONVENTIONS.md §5 — no stubbed action presented as done). Partial completion is rendered inline (per-row pass/fail), never a silent batch abort.
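The per-row fan-out with inline pass/fail can be sketched with `Promise.allSettled`, which never short-circuits on a rejection. Here `invoke` is a hypothetical stand-in for one shipped control verb on one row; the `RowResult` shape is also illustrative.

```typescript
type RowResult = { id: string; ok: true } | { id: string; ok: false; error: string };

// Calls the verb once per selected row and reports every row's outcome;
// a failing row never aborts the rest of the batch.
async function runBulk(
  ids: readonly string[],
  invoke: (id: string) => Promise<void>,
): Promise<RowResult[]> {
  const settled = await Promise.allSettled(ids.map((id) => invoke(id)));
  return settled.map((s, i): RowResult =>
    s.status === "fulfilled"
      ? { id: ids[i], ok: true }
      : { id: ids[i], ok: false, error: String(s.reason) },
  );
}
```

This sketch issues the calls concurrently; a toolbar that needs to bound load on the Runtime could serialize them instead without changing the per-row result shape.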

Why. Closes the Phase 73h acceptance criteria + the Wave 13 Stage 2.3 decomposition row 73h. The page is the §13 primitive-with-consumer discharge for the tasks.list filter/row-shape extensions (it lands in the same PR as its only consumer) and validates the orphan-detection property + the bulk-control degradation path against a real call site.

Findings I'm departing from. Three documented deviations; (a) and (b) yield to a higher-priority artifact (CLAUDE.md §15), and (c) is a decision-number reassignment. (a) The phase plan (authored before D-121 landed) lists the route at web/console/src/routes/background-jobs/ and serves it at /console/background-jobs; CONVENTIONS.md §1 (D-121) is the binding cross-cutting authority and pins the (console) route group with NO /console/ URL prefix — the page ships at web/console/src/routes/(console)/background-jobs/ served at /background-jobs. (b) The plan's "Files added" section (likewise pre-D-121) places page pieces under web/console/src/lib/pages/background-jobs/; CONVENTIONS.md §3 binds page components to components/<page>/ — the page components ship at components/background-jobs/ and the pure orphan-detector logic at lib/background-jobs/ (the Live Runtime page's lib/live-runtime/ precedent). (c) The plan pre-assigned D-114 to this phase; the dispatch reassigned D-128 to avoid a collision (D-114 is already taken by the Phase 74 topology decision) — this entry is D-128 and the glossary / plan references are reconciled to it.

Protocol additions. Filter extensions: TaskFilter.GroupID (string), TaskFilter.HasPendingApproval (*bool). Row-shape enrichments: TaskRow.Progress (*float64), TaskRow.Tags ([]string), TaskRow.LastActivityAt (time.Time), TaskRow.IsBackground (bool), TaskRow.HasPendingApproval (bool) — all in internal/protocol/types/tasks.go. No new Protocol method names, no new error codes, no new CanonicalWireTypes struct registrations.


D-129 — Phase 73m Console Settings page + the harbor console subcommand: one net-new auth.rotate_token method, the embedded-build subcommand, a TokenIssuer seam

Date: 2026-05-20 Status: Settled (shipping with this PR)

Where it lives: internal/protocol/methods/methods.go (the single new MethodAuthRotateToken constant + the IsAuthMethod predicate); internal/protocol/types/auth.go (the AuthRotateTokenRequest / AuthRotateTokenResponse wire types); internal/protocol/auth/rotate_token.go (the auth.TokenIssuer seam + the RotateSurface transport-agnostic handler); internal/protocol/transports/stream/auth_handler.go (the POST /v1/auth/{method} wire adapter); internal/protocol/transports/transports.go (the WithAuthSurface mux option); internal/protocol/singlesource/singlesource.go (the two CanonicalWireTypes entries + the CanonicalMethods entry); cmd/harbor/cmd_console.go (the harbor console subcommand); cmd/harbor/console_embed.go + cmd/harbor/consoledist/ (the embed.FS of the SvelteKit build); cmd/harbor/devauth.go (the dev signer's IssueToken — the V1 TokenIssuer); web/console/src/routes/(console)/settings/ (the page route at /settings, no /console/ URL prefix per CONVENTIONS.md §1 / D-121); web/console/src/lib/components/settings/ (the 12 section cards + the sub-nav rail + the mock-mode banner); web/console/src/lib/settings/ (the page state + Console-DB + saved-view controllers); web/console/src/lib/protocol/ (the PostureNamespace + AuthNamespace on HarborClient).

Decision. Phase 73m ships TWO bundled deliverables. Four calls land here.

1. The Settings page is a pure CONSUMER of the posture surfaces — exactly ONE net-new Protocol method. The 12-card Settings page composes 72f's runtime.info / runtime.drivers posture reads (D-111) and 72g's governance.posture / llm.posture reads (D-112), plus 72h's Console DB tables (runtime_registry / profiles / keybindings / notifications_routing / auth_profiles / pat_store). The ONLY net-new Protocol method is auth.rotate_token: internal/protocol/methods/methods.go's diff adds exactly one method name. 73m re-ships none of 72f / 72g / 72h's methods or tables.

2. auth.rotate_token is admin-gated and goes through a TokenIssuer seam. The method rotates the operator's Protocol-auth token: the Runtime re-mints a JWT for the caller's already-verified (tenant, user, session) identity, one-time-reveal. It requires the verified auth.ScopeAdmin claim (D-079 closed two-scope set — there is NO auth.admin scope); a request without it is rejected with CodeIdentityScopeRequired (HTTP 403). Every successful rotation emits a redacted audit.admin_scope_used event. A Runtime does not in general mint its own tokens (a real deployment's tokens come from an external OIDC provider), so the re-mint goes through an auth.TokenIssuer §4.4 seam — the harbor dev / harbor console dev signer is the V1 implementation; a post-V1 release-engineering phase fits an RFC 8693 token-exchange issuer behind the same shape. When no TokenIssuer is wired the surface fails loudly — never a silent no-op.
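The seam can be sketched as an interface plus a handler that checks scope first and refuses to run without an issuer. This is a TypeScript illustration of the shape only; the shipped seam is Go in internal/protocol/auth/rotate_token.go. The method name `issueToken`, the `ProtocolError` class, the string spelling of the codes, and mapping a missing issuer to `CodeRuntimeError` are assumptions. Audit emission is omitted.

```typescript
interface Identity {
  tenant: string;
  user: string;
  session: string;
}

// Hypothetical seam: the dev signer would implement this in V1; an RFC 8693
// token-exchange issuer could implement the same shape later.
interface TokenIssuer {
  issueToken(id: Identity): Promise<string>;
}

class ProtocolError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

async function rotateToken(
  caller: { identity: Identity; scopes: readonly string[] },
  issuer: TokenIssuer | undefined,
): Promise<{ token: string }> {
  if (!caller.scopes.includes("admin")) {
    throw new ProtocolError("CodeIdentityScopeRequired", "auth.rotate_token requires the admin scope");
  }
  if (issuer === undefined) {
    // Fail loudly: a missing issuer is a wiring error, never a silent no-op.
    throw new ProtocolError("CodeRuntimeError", "no TokenIssuer wired");
  }
  // Re-mint for the caller's already-verified identity only.
  return { token: await issuer.issueToken(caller.identity) };
}
```

Keeping the issuer behind an interface means the handler never knows whether a dev signer or an external exchange produced the token.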

3. harbor console is the embedded-build subcommand (D-091) — and is self-contained. harbor console bakes the static SvelteKit build into cmd/harbor via embed.FS and serves it at /. The Console build directory web/console/build/ is gitignored (CLAUDE.md §13); make console-build stages the bundle into cmd/harbor/consoledist/, the directory the embed resolves (only its .gitkeep is committed), so a bare checkout still builds (harbor console then serves a synthesized "run make console-build" placeholder). harbor console boots the SAME embedded Runtime stack harbor dev boots (reusing bootDevStack) AND additionally mounts the Console assets — the result is a single, self-contained Console deployment that is already attached to a live Runtime; the operator can re-point it at any remote Runtime from the Settings page. The Console build is served ONLY by harbor console, NEVER by harbor dev (harbor dev --help advertises no console-serving flag — the binding D-091 rule, smoke-asserted). Zero-config harbor console (no harbor.yaml) boots an embedded in-memory + mock-LLM default and prints the §13 dev-mock banner.

4. The Settings page clears the CONVENTIONS.md §5 depth bar. The page routes under (console)/settings/, renders inside the app shell, composes the shared ui/ inventory (PageHeader / FilterBar / DataTable / DetailRail / RailCard / SavedViewChips / Pagination / ConnectionFooter / PageState), routes all async state through the four-state <PageState>, and talks to the Runtime only through HarborClient + connection.ts. The LIST_PAGES enum gains a 'settings' entry (an additive constant extension — no new Console DB table, no migration) so the section-bookmark SavedViewChips are Console-DB-backed. The Rotate token action either invokes the real auth.rotate_token method or renders disabled-with-tooltip when the connection lacks the admin scope claim — no stubbed action presented as done.

Why. Closes the Phase 73m acceptance criteria + the Wave 13 decomposition §12 item 9 (harbor console lock-in). The page is the §13 primitive-with-consumer discharge for auth.rotate_token (the method lands in the same PR as its only consumer) and for the harbor console subcommand (the Connected-Runtimes card is its first user-facing consumer).

Findings I'm departing from. Two documented deviations. (a) The phase plan (authored before D-079 was re-read) names the auth.rotate_token scope as console.admin; D-079's closed two-scope set is {admin, console:fleet} only — there is no console.admin scope, so the method gates on auth.ScopeAdmin (CLAUDE.md §15 — a plan that contradicts a higher-priority decision yields to it). (b) The plan sketches harbor console as a thin static-asset server; to give the e2e harness + the Connected-Runtimes card a live Protocol surface to attach to, harbor console reuses bootDevStack and serves the Protocol surface co-resident with the static build — D-091's "the Console can also run attached to a remote Runtime" stays true (the operator re-points it from Settings); the co-resident Runtime is the zero-config default, not a constraint, and the Console build is still served exclusively by harbor console.

Protocol additions. One method: auth.rotate_token (MethodAuthRotateToken). Two wire types: AuthRotateTokenRequest / AuthRotateTokenResponse in internal/protocol/types/auth.go, registered in singlesource.CanonicalWireTypes. No new error codes (auth.rotate_token reuses CodeIdentityScopeRequired / CodeIdentityRequired / CodeRuntimeError).


D-130 — Phase 73n Console Playground page: runs.set_overrides Protocol method + the shared chat module (first consumer) + the chat-bubble renderer extension

Date: 2026-05-20 Status: Settled (shipping with this PR)

Where it lives: internal/protocol/methods/methods.go (the runs.set_overrides method constant + the IsRunsMethod predicate); internal/protocol/types/runs.go (the RunOverrides / RunSetOverridesRequest / RunSetOverridesResponse wire types — single source, D-002); internal/runtime/runs/protocol/ (the runs.set_overrides Service + the in-process override Store); internal/protocol/transports/stream/runs_handler.go (the POST /v1/runs/set_overrides wire handler); internal/events/events.go (the runs.overrides_set audit event type + RunOverridesSetPayload); internal/protocol/singlesource/singlesource.go (the method + wire-type CanonicalWireTypes entries); internal/protocol/conformance/conformance.go (the matrix entry + the per-surface skip); cmd/harbor/cmd_dev.go + harbortest/devstack/devstack.go (the WithRunsService wiring); web/console/src/lib/chat/ (the shared chat module — ChatPanel / ChatComposer / MessageBubble / the cards / the injected ChatProtocolClient interface); web/console/src/lib/chat/renderers/chat_bubble.ts (the chat-bubble renderer extension over the Phase 73l registry); web/console/src/routes/(console)/playground/ (the page route at /playground + the [session_id] deep-link route — no /console/ URL prefix, per CONVENTIONS.md §1 / D-121); web/console/src/lib/components/playground/ (the page components); web/console/src/lib/db/saved_filters_playground.ts (the typed wrapper over the shipped saved_filters table — NO new table); web/console/tests/playground-page.spec.ts (the per-page Playwright spec); test/integration/playground_overrides_test.go (the Go-side §17.1 integration test).

Decision. The Console Playground page is a real Harbor session workbench — a chat-style stream, a multimodal composer, a right rail of Controls / Pending Interventions / Recent Artifacts / Trace. Five calls land here.

1. One net-new Protocol method — runs.set_overrides. It records the reasoning-effort / temperature / max-tokens / system-prompt override the operator applies to the NEXT message in a session. The override is session-scoped (keyed by the (tenant, user, session) triple in an in-process Store) and one-shot — it is consumed by the next user_message / start and is never retroactive. Identity is mandatory; an override whose session_id names a session other than the caller's verified session is rejected with CodeScopeMismatch. The method routes through its own stream-package handler (POST /v1/runs/set_overrides) — IsRunsMethod is its own predicate; it is NOT a control method.
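The session-scoped, one-shot store can be sketched as a map keyed by the full identity triple whose read also deletes. The camelCase override field names, the `OverrideStore` class, and throwing a plain `Error` tagged with the code are illustrative assumptions; the shipped Store lives in Go under internal/runtime/runs/protocol/.

```typescript
interface Identity {
  tenant: string;
  user: string;
  session: string;
}

// Illustrative field names for the four override knobs.
interface RunOverrides {
  reasoningEffort?: string;
  temperature?: number;
  maxTokens?: number;
  systemPrompt?: string;
}

class OverrideStore {
  private readonly pending = new Map<string, RunOverrides>();

  // JSON-encoding the triple avoids separator collisions between fields.
  private static key(id: Identity): string {
    return JSON.stringify([id.tenant, id.user, id.session]);
  }

  set(caller: Identity, targetSession: string, o: RunOverrides): void {
    if (targetSession !== caller.session) {
      throw new Error("CodeScopeMismatch: session_id is not the caller's verified session");
    }
    this.pending.set(OverrideStore.key(caller), o);
  }

  // One-shot: the next message takes the override and removes it.
  consume(caller: Identity): RunOverrides | undefined {
    const k = OverrideStore.key(caller);
    const o = this.pending.get(k);
    this.pending.delete(k);
    return o;
  }
}
```

Deleting on read is what makes the override non-retroactive: a second message in the same session runs with defaults unless a new override is set.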

2. The shared chat module — encapsulate-first, first consumer (D-091, CLAUDE.md §4.5 #11). The chat module ships self-contained at web/console/src/lib/chat/: it imports NOTHING outside $lib/chat/ and depends on an injected ChatProtocolClient interface the Playground page adapts over the Console HarborClient. The Playground is the FIRST consumer; the future packed harbor dev UI is the second, at which point the git mv $lib/chat → web/shared/chat is mechanical. Every message round-trips through the SHIPPED Phase 54 user_message method — there is NO parallel chat protocol.
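The dependency direction can be sketched as an interface owned by the chat module plus a page-side adapter. The method names `sendUserMessage` and `control.userMessage`, and the `HarborClientLike` stand-in, are hypothetical; this entry does not list ChatProtocolClient's members.

```typescript
// Owned by $lib/chat/: the module depends only on this interface.
interface ChatProtocolClient {
  sendUserMessage(sessionId: string, text: string): Promise<void>;
}

// Hypothetical stand-in for the slice of HarborClient the adapter needs.
interface HarborClientLike {
  control: {
    userMessage(req: { session_id: string; text: string }): Promise<void>;
  };
}

// Lives in the Playground page, not in $lib/chat/, so the module imports nothing outside itself.
function chatClientFrom(client: HarborClientLike): ChatProtocolClient {
  return {
    sendUserMessage: (sessionId, text) =>
      client.control.userMessage({ session_id: sessionId, text }),
  };
}
```

Because the adapter sits on the page side, moving the module to web/shared/chat/ later only requires the second consumer to supply its own adapter.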

3. The chat-bubble renderers EXTEND the Phase 73l registry — they do not fork it. Phase 73l shipped the canonical renderer-registry dispatch core (renderers/index.ts) plus six MIME renderers. Phase 73n adds tool-call-trace / diff / artifact-reference renderers by calling registerRenderer from renderers/chat_bubble.ts — the dispatch core is open for registration, closed for modification. There is exactly one registry.
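"Open for registration, closed for modification" can be sketched as a single map plus a dispatch function that extensions never edit. The `Renderer` shape here returns a string for testability (the shipped renderers are .svelte components), and both the duplicate-registration error and the plain-text fallback are illustrative choices, not confirmed behavior of renderers/index.ts.

```typescript
type Renderer = { mime: string; render: (payload: unknown) => string };

// The one registry; extensions add entries but never touch dispatch().
const registry = new Map<string, Renderer>();

function registerRenderer(r: Renderer): void {
  if (registry.has(r.mime)) {
    throw new Error(`renderer already registered for ${r.mime}`); // assumed policy: no silent override
  }
  registry.set(r.mime, r);
}

function dispatch(mime: string, payload: unknown): string {
  const r = registry.get(mime);
  return r ? r.render(payload) : String(payload); // assumed fallback: safe plain text
}
```

A chat_bubble.ts-style module would then be nothing but `registerRenderer(...)` calls made at import time.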

4. The "Run as identity" header selector consumes Phase 72b's IdentityScope (Brief 11 §PG-5, D-107). When the operator carries the auth.ScopeAdmin claim, the header renders a "Run as identity" dropdown; selecting a target populates IdentityScope.Impersonating on the next user_message / start. Non-admin operators do not see the selector (rendered absent, not disabled). This lands the consumer alongside the 72b primitive (§13 primitive-with-consumer).

5. Heavy chat content flows by reference (D-026). A chat bubble never carries inline heavy bytes — artifacts ride as ChatArtifactRef and resolve to a presigned URL via artifacts.get_ref; the renderer fetches from that URL.

Why. Closes the Phase 73n acceptance criteria + the Wave 13 decomposition row 73n. The page is the §13 primitive-with-consumer discharge for both runs.set_overrides (its only consumer lands in the same PR) and the shared chat module (the Playground is the module's first real call site).

Findings I'm departing from. Three documented deviations. (a) The plan (authored before D-121 landed) lists the route at web/console/src/routes/playground/[session_id]/; CONVENTIONS.md §1 (D-121) is the binding cross-cutting authority and pins the (console) route group with NO /console/ URL prefix — the page ships at web/console/src/routes/(console)/playground/ accordingly (CLAUDE.md §15). (b) The plan names Shiki / KaTeX / Mermaid as renderer dependencies; those heavy frontend dependencies are not in web/console/package.json, and adding them is an RFC change (CLAUDE.md §13 "no heavy frameworks"). Phase 73n ships the same safe-text V1 posture the Phase 73l MIME renderers already take (code.svelte / markdown.svelte render raw text without a highlighter); the renderer-registry seam means a highlighter slots in later without a chat-module reshape. (c) The plan lists three .ts renderer files (tool_call_trace.ts / diff_view.ts / artifact_reference.ts); the shipped Phase 73l registry uses .svelte renderer components, so 73n ships three .svelte renderers plus one chat_bubble.ts registration module — same dispatch contract, matching the shipped registry shape.

Protocol additions. One method (runs.set_overrides), three wire types (RunOverrides / RunSetOverridesRequest / RunSetOverridesResponse), one event type (runs.overrides_set + RunOverridesSetPayload). No new error codes — the handler reuses CodeIdentityRequired / CodeScopeMismatch / CodeInvalidRequest / CodeUnknownMethod.


D-131 — Phase 75a Wave 13 wave-end suite: the console-build build-path fix, the dev-only fixture seeder, and the page-coverage gate

Date: 2026-05-21 Status: Settled (shipping with this PR)

Where it lives: .github/workflows/ci.yml (the frontend-e2e job's console-build → make build ordering + the wave13-coverage-check step); cmd/harbor/devseed.go (the dev-only runtime-entity fixture seeder); cmd/harbor/cmd_dev.go (the HARBOR_DEV_SEED_FIXTURES boot hook); cmd/harbor/console_default.yaml (memory.strategy: truncation); web/console/tests/wave13.spec.ts (the wave-end Playwright aggregator); web/console/tests/fixtures/harbor-runtime.ts (the harness sets HARBOR_DEV_SEED_FIXTURES=1 + HARBOR_DEV_ALLOW_MOCK=1); test/integration/wave13_test.go (the Go-side wave-end E2E); scripts/console/check-page-coverage.sh + the make wave13-coverage-check target (the page-coverage gate); scripts/smoke/phase-75a.sh.

Decision. Phase 75a is the Wave 13 (Console) closeout. Three things land.

1. The frontend-e2e CI job builds the Console bundle before the binary. harbor console (D-091) serves the Console via embed.FS of cmd/harbor/consoledist/, which is gitignored except .gitkeep — the bundle is a build artifact (§4.5 #9). make console-build runs npm ci && npm run build and stages web/console/build/ → cmd/harbor/consoledist/. Phase 73m shipped harbor console, but make build does not depend on console-build, and the CI frontend-e2e job ran make build without ever running make console-build — so the binary embedded an empty consoledist/, harbor console served the index but the SvelteKit app never hydrated, and ~50 Playwright page specs failed. The fix orders the job: install Node → make console-build (builds + stages the real bundle) → make build (embeds it). This is a §17.6 cross-phase fix of a Phase 73m build-pipeline gap.

2. A dev-only runtime-entity fixture seeder. A fresh harbor console runtime boots empty, so the per-page Playwright specs SKIP every data-shaped assertion. cmd/harbor/devseed.go adds seedDevFixtures, gated behind the explicit HARBOR_DEV_SEED_FIXTURES=1 env var (the §13 dev-only-escape-hatch posture — never the default; a production runtime boots empty; the binary prints a stderr banner when the hatch fires). It seeds sessions, agents, tasks, artifacts, tools, flows (+ run-history), and memory turns under the dev-token identity. The embedded console_default.yaml memory.strategy flips none → truncation so seeded memory turns persist and the Console Memory page renders rows. The e2e harness sets the env var when it spawns harbor console; the 25 SEED_DEPENDENT per-page skips are un-skipped and pass for real.

3. The wave-end suite + the page-coverage gate. web/console/tests/wave13.spec.ts walks all 14 V1 Console pages (Evaluations excluded — D-064), asserts the IA navigation, the scope-claim degradation, and the cross-page identity gate. test/integration/wave13_test.go exercises the consolidated Wave 13 observability seam (the SSE wire transport) with real drivers via devstack.Assemble (D-094): a wire-type identity round-trip, cross-tenant isolation, a missing-identity (D-033) failure mode, and an N=12 concurrent-SSE-subscriber stress. scripts/console/check-page-coverage.sh (via make wave13-coverage-check, wired into frontend-e2e) asserts every docs/design/console/page-<slug>.md has a matching web/console/tests/<slug>-page.spec.ts — the operator §12-lock-in-#7 binding rule expressed as a mechanical gate.
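The page-coverage rule in point 3 reduces to a file-name mapping: every page-<slug>.md must have a matching <slug>-page.spec.ts. The shipped gate is the shell script scripts/console/check-page-coverage.sh; the Go sketch below mirrors only that naming rule, and the missingSpecs helper is hypothetical. It takes bare file names rather than walking the two directories so it stays self-contained.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// missingSpecs returns the slugs whose design doc page-<slug>.md has no
// matching <slug>-page.spec.ts. Inputs are bare file names (hypothetical helper).
func missingSpecs(designDocs, specs []string) []string {
	covered := make(map[string]bool, len(specs))
	for _, f := range specs {
		if slug, ok := strings.CutSuffix(f, "-page.spec.ts"); ok {
			covered[slug] = true
		}
	}
	var missing []string
	for _, f := range designDocs {
		rest, ok := strings.CutPrefix(f, "page-")
		if !ok {
			continue // not a page design doc (e.g. README.md)
		}
		slug, ok := strings.CutSuffix(rest, ".md")
		if !ok {
			continue
		}
		if !covered[slug] {
			missing = append(missing, slug)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	docs := []string{"page-sessions.md", "page-tools.md", "page-memory.md", "README.md"}
	specs := []string{"sessions-page.spec.ts", "tools-page.spec.ts", "memory.spec.ts", "wave13.spec.ts"}
	if missing := missingSpecs(docs, specs); len(missing) > 0 {
		fmt.Println("uncovered pages:", missing) // uncovered pages: [memory]
	}
}
```

In this sketch a bare memory.spec.ts does not count as coverage, which is the <slug>-page.spec.ts naming that deviation (a) below settles on.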

Why. Closes the Wave 13 §17.5/§17.7-step-5 wave-end E2E and the red frontend-e2e CI step. The seeder is the §13 primitive-with-consumer discharge for the e2e harness: the harness's runtime-entity seam was a no-op stub; this lands a real consumer.

Findings I'm departing from. Two documented deviations. (a) The 75a plan's coverage script expects web/console/tests/<slug>.spec.ts; the per-page specs that actually shipped (73a-73n) use the <slug>-page.spec.ts suffix, so check-page-coverage.sh matches the real shipped naming. (b) Six per-page tests (Live Runtime tab content ×2, Playground chat module ×3, Events pause-stream toggle ×1) are NOT un-skipped: they render inside <PageState>, which renders children only when status === 'ready', and ready requires run-trajectory data (a non-empty topology.snapshot, a session chat history, an established SSE subscription) that is projected from a live planner/engine run — a larger fixture seam than registry entity seeding. They carry an explicit §17.6 deferral skip naming the distinct (non-seeding) blocker; tracked as a Phase 75a follow-up.

Protocol additions. None — Phase 75a ships no new Protocol method, error code, or wire type.


D-132 — Wave 13 §17.5 checkpoint: search / notifications / runtime-posture wired into the live binary; Agents control buttons disabled pending a registry.* Protocol surface; D-093 protocol-ts generator formally deferred post-Wave-13

Date: 2026-05-21 Status: Settled (shipping with the Wave 13 §17.5 checkpoint audit-fix PR)

Where it lives: internal/protocol/transports/transports.go (the new WithSearch mux option + the searchSurface mux-config field); internal/telemetry/metrics.go (the new MetricsRegistry.Snapshot + the OTel-SDK-free MetricSnapshot / CounterPoint shapes); internal/runtime/posture/posture.go (the shared CountersProvider / MetricsProvider posture-seam constructors); internal/search/scope.go (AdminScopeFromAuth now honours console:fleet); cmd/harbor/cmd_dev.go + harbortest/devstack/devstack.go (the search-surface + notifications-Subscriber + MetricsRegistry-bridge + live Counters/Metrics wiring); cmd/harbor/devseed.go (the events-seeding step); docs/plans/README.md (the 72c/72d rows + the 73k Shipped flip); docs/plans/phase-72e-pause-list-snapshot.md (the conformance §17.6 deferral note); docs/plans/phase-75a-wave13-wave-end-suite.md (the live-planner-run trajectory §17.6 deferral block); docs/design/console/page-tools.md (the tools.invoke V1 deferral); web/console/src/lib/tokens.css, web/console/src/lib/protocol.ts, web/console/src/lib/protocol/artifacts.ts, and the Console Agents / Tools / Memory / Artifacts / MCP-Connections pages.

Decision. The Wave 13 §17.5 wave-end checkpoint audit found six FAIL-severity drift items and seventeen WARN-severity items. The audit-fix PR lands all of them. The load-bearing calls:

1. The search.*, notification.*, and runtime-posture surfaces are wired into the LIVE binary. Before this checkpoint, internal/protocol.NewSearchSurface, the notifications.Subscriber, and the posture Counters/Metrics seams existed but were referenced only by tests — the five search.* methods 404'd on harbor dev, the notification.* topic had no producer, and runtime.counters / metrics.snapshot returned empty stubs. The fix adds a transports.WithSearch mux option (threading the search dispatcher through control.WithSearchSurface), constructs + Run()s the notifications.Subscriber as a joined background goroutine, and constructs a telemetry.MetricsRegistry + BridgeBusToMetrics bridge — all in BOTH cmd/harbor/cmd_dev.go::bootDevStack AND harbortest/devstack.Assemble (the §17.6 source-of-truth invariant). The posture Counters/Metrics seams read live runtime state through the new shared internal/runtime/posture package; the runtime-posture integration test asserts against the PRODUCTION seam (a real spawned task → non-zero TasksRunning), never a fabricated one.

2. The Console Agents control buttons are disabled-with-tooltip pending a registry.* Protocol surface. The five fleet-control verbs (Pause / Drain / Restart / Force-Stop / Deregister) are exposed by the shipped registry.* IN-PROCESS Go API — there is NO Protocol method a Console client can call. The previous wiring only set a controlFeedback string, reporting success for an action that never reached the runtime: a fake-success path (CLAUDE.md §13). ControlButtons.svelte now renders all five buttons disabled-with-tooltip REGARDLESS of scope claim. Re-enabling them is the job of a future fleet-control Protocol-surface phase that lands the registry.* methods AND flips the buttons live in the same wave (CLAUDE.md §13 primitive-with-consumer).

3. The D-093 cmd/harbor-gen-protocol-ts generator is formally deferred post-Wave-13. D-093 specified a generator that regenerates web/console/src/lib/protocol.ts from internal/protocol/singlesource.CanonicalWireTypes with a make protocol-ts-gen-check CI gate. The generator was never built — Phase 72h committed protocol.ts as a hand-shaped stub and later phases hand-extended it while the file carried a false // CODE GENERATED … DO NOT EDIT header. This checkpoint corrects the header to an accurate "hand-maintained — keep in lockstep with internal/protocol/singlesource.CanonicalWireTypes" notice; the per-page artifacts wire types move into web/console/src/lib/protocol/artifacts.ts mirroring every other page. Building the generator + the CI gate is tracked in issue #179. This is a documented amendment to D-093: the generator is a post-Wave-13 deliverable, not a Wave 13 one.

4. The tools.invoke Protocol method is a documented V1 deferral. The Console Tools page surfaces a disabled-with-tooltip "Try this tool" affordance naming the deferral rather than silently omitting it (CONVENTIONS.md §5). docs/design/console/page-tools.md §3 records the V1 deferral; the tools.invoke method + the live form land post-V1.

Why. Closes the Wave 13 §17.5 checkpoint punch list. The six FAILs were live-binary wiring gaps (the §17.6 "fix what the integration test finds" rule applied across phase boundaries — 72c / 72d / 72f surfaces shipped their seams but the boot path never connected them) plus a fake-success Console path and stale master-plan rows. The seventeen WARNs were Console hygiene (token-alias cleanup, component-name collisions, double footers, localStorage reads), a scope-predicate bug (search ignored console:fleet), and stale deferral references. The thirteen NITs are tracked in issue #180; the D-093 generator in #179; the live-planner-run trajectory e2e fixtures in #178.

Findings I'm departing from. None — this is a checkpoint audit-fix PR; it implements the punch list. The 72e plan's "conformance happy-path + malformed scenario in the same PR" line yielded to the §17.6 finding W9: the conformance matrix entry lands in lockstep, the scenario bodies defer to the Phase 80 conformance-Stack harness extension (the same architectural shape every other dedicated-surface method already takes in the suite).

Protocol additions. None — no new Protocol method, error code, or wire type. transports.WithSearch is a mux-wiring option over the already-shipped search.* methods; MetricsRegistry.Snapshot is a Go-internal read API.


D-133 — Phase 73 ("Console state inspection surface") was dissolved during Wave 13: consumed methods absorbed by the page phases, unconsumed methods deferred post-V1 per §13

Date: 2026-05-21 Status: Settled (shipping as the Wave 14 Stage-0 master-plan reconciliation)

Where it lives: docs/plans/README.md (the Phase 73 status-table row + detail block).

Decision. Phase 73 was scoped as a single "Console state inspection surface" phase bundling nine Protocol methods (sessions.inspect, tasks.get, state.history, state.list_trajectories, state.load_planner_checkpoint, artifacts.list, artifacts.get, artifacts.get_ref, artifacts.delete). It never landed as a standalone phase — there is no phase-73.md plan file and no phase-73 PR. During Wave 13 the surface was decomposed: each Console page phase that needed a slice landed that slice whole rather than depending on a separate Phase 73 PR. This was the correct application of CLAUDE.md §13 "no primitive without its consumer" — the methods shipped exactly when, and only when, a page consumed them. The decomposition was already recorded piecemeal in the Phase 73c / 72-cluster decisions entries ("Phase 73 has not shipped sessions.inspect, so Phase 73c lands it whole"); this entry consolidates it and reconciles the stale master-plan row.

What shipped (each absorbed by its consuming page phase, verified present in internal/protocol/methods/methods.go):

  • sessions.inspect — landed by Phase 73c (Console Sessions page).
  • tasks.get — landed by Phase 73d (Console Tasks page).
  • artifacts.list, artifacts.put, artifacts.get_ref — landed by Phase 73l (Console Artifacts page, D-120). Note artifacts.put was added by 73l and was not in the original Phase 73 list.

What did NOT ship, and why that is correct. state.history, state.list_trajectories, state.load_planner_checkpoint, artifacts.get, and artifacts.delete have no V1 Console consumer — no Wave 13 page needed them. Per §13, a primitive without a consumer must not land. They are therefore deferred post-V1: each lands additively in the same wave as the first Console surface that consumes it (a trajectory-inspector page, an artifact-detail/delete affordance, etc.). The runtime-side data they would project (StateStore history, trajectory records, the artifact store) is all already shipped; only the Protocol projection waits for a consumer.

Reconciliation. The docs/plans/README.md Phase 73 row flips from Pending to Shipped* with the asterisk resolved in the detail block: "dissolved — consumed methods absorbed by 73c/73d/73l; state.* + artifacts.get/artifacts.delete deferred post-V1 (no V1 consumer, §13)." This removes a stale Pending row that would otherwise read as an unshipped V1 phase blocking the V1 cut.

Why. CLAUDE.md §4.2 rule 11: "Stale Pending rows for shipped phases are a drift signal." Wave 14 is the V1-completion wave; an honest master plan is a precondition for the Phase 82 V1 cut. Leaving row 73 Pending would either falsely block the cut or force a misleading "all V1 phases shipped" claim.

Findings I'm departing from. None — this is a documentation reconciliation of an already-settled, already-executed decomposition. No code changes.

Protocol additions. None.


D-134 — Phase 76 cross-tenant isolation conformance harness: home, fast-vs-soak split, real-drivers-at-the-seam

Date: 2026-05-21 Status: Settled (shipping with Phase 76)

Where it lives: test/integration/isolation_conformance_test.go (the harness); .github/workflows/ci.yml (the isolation job); scripts/smoke/phase-76.sh; docs/plans/phase-76-cross-tenant-isolation-harness.md.

Decision. Phase 76 ships the master cross-tenant + cross-session isolation conformance harness — the V1 integrity gate (RFC §4.3). Three design calls are settled here.

1. The harness home is test/integration/, not a new test/conformance/ directory. AGENTS.md §3 is the binding repository layout; adding a top-level directory is an RFC change. AGENTS.md §17.2 already names test/integration/ as the canonical home for tests that span more than two subsystems — the harness spans six. It lives as a single _test.go file in package integration_test alongside the wave-end E2E suites it resembles. No new directory; no RFC churn.

2. The every-PR soak window is fast (~3 s); the master-plan 30 s soak is opt-in. The master plan specifies "100 sessions × random ops × 30 s under -race". A 30 s race-instrumented soak on every PR would dominate CI wall-clock. The split: the default window is isolationFastWindow (~3 s) — with 100 concurrent session-workers each running thousands of randomized op-cycles, a cross-scope leak surfaces with overwhelming probability inside it. The master-plan 30 s window is available via HARBOR_ISOLATION_SOAK=<go-duration>; testing.Short() (-short) forces the fast window regardless. Both windows drive the identical code path — only the soak duration changes; there is no "with-flag / without-flag" parallel implementation (AGENTS.md §13). The dedicated isolation CI job runs the fast window on every PR.

3. Real production drivers at every seam — no mocks (AGENTS.md §17.3 #1, §17.4). Every subsystem is opened through its production registry factory: state.Open, artifacts.Open, memory.Open, skills.OpenDriver, tasks.Open, events.Open. The harness drives the real V1 in-memory drivers for five subsystems; the SkillStore has a single V1 driver — localdb, SQLite-backed — which the harness runs against a :memory: DSN (the SQLite path is what operators ship; :memory: keeps the harness filesystem-free with identical isolation logic). A mock at the boundary would defeat the gate's purpose: the harness exists to prove the shipped drivers hold the isolation invariant under concurrent load against a single shared instance — the cross-subsystem composition of every subsystem's own D-025 + D-001 contracts.

Why. The per-subsystem conformance suites (internal/<subsystem>/conformancetest) each prove their own driver isolates correctly in isolation. They do NOT prove the six subsystems hold the invariant simultaneously — a shared-process race or a cross-subsystem identity-context bleed only surfaces when all six are hammered together under load. Phase 76 closes that gap with one gate that runs on every PR. A regression here is a security bug (master-plan Phase 76 "Risks"), so the gate is non-skippable and the harness fails loudly with a categorized breach report naming the subsystem and the expected-vs-observed identity.

Findings I'm departing from. None. The harness is a pure composition of patterns brief 05 (§"Concurrency tests", §"Cross-tenant isolation", §"Conformance test approach") and brief 06 (§124, §147) already established, plus the wave-end E2E shape from test/integration/wave*_test.go. It introduces no new design surface.

Protocol additions. None — Phase 76 ships no Protocol method, error code, REST endpoint, or wire type. It is a _test.go-only integration gate.


D-135 — Goroutine-leak conformance harness: one table-driven -race suite over every long-lived Runtime component

Date: 2026-05-21 Status: Settled (shipping with Phase 77)

Where it lives: test/integration/phase77_goroutine_leak_test.go (the harness); .github/workflows/ci.yml (the leak-harness job); scripts/smoke/phase-77.sh (static-only artefact smoke); docs/plans/phase-77-goroutine-leak-harness.md (the phase plan).

Decision. Phase 77 generalises the per-package goroutine-leak tests that Phases 10 / 12 / 13 / 50 / 52 each shipped individually into ONE table-driven conformance suite, TestE2E_Phase77_GoroutineLeakConformance. The load-bearing design calls:

1. The harness is table-driven — a future long-lived component is one new row. leakCases is a slice of {name, exercise} rows; each exercise closure constructs the real component with real drivers, drives a representative workload, and tears it down. The harness owns baseline capture, the bounded poll, and the assertion. V1 rows: runtime/engine.Engine, events/drivers/inmem.EventBus, events/drivers/durable.EventBus, sessions.Registry, tasks/drivers/inprocess.TaskRegistry — every long-lived component that starts goroutines and exposes Stop / Close / CloseRegistry. Components that are passive registries with no background goroutines (the pauseresume.Coordinator, the steering Registry / per-run Inbox, the per-run steering RunLoop) are deliberately NOT rows — they have no teardown seam to leak from; the Phase 50 dependency is satisfied by the pause primitive being exercised inside the Engine row's run lifecycle, not by a Coordinator Stop.

2. N cycles, not one. Each row runs leakCycles (12) construct → exercise → teardown iterations. A single cycle hides a slow leak (one stray goroutine per cycle); 12 cycles amplify a per-cycle leak to a delta of ≥ 12, far above the leakTolerance (4) absolute slack. A warm-up cycle runs before the baseline is captured so first-use lazy initialisation (driver registries, sync.Once globals) is not miscounted.

3. Bounded eventually-poll, never an instant snapshot. Go does not retire parked goroutines instantly; an instant runtime.NumGoroutine() check immediately after teardown is flaky (CLAUDE.md §17.4). The harness reuses the established bounded-poll pattern — a deadline plus a 10 ms interval plus runtime.Gosched — with the small absolute leakTolerance absorbing the test runner's own background goroutines. The harness does NOT call t.Parallel: NumGoroutine is process-global and a parallel sibling test would pollute the count.

4. CI runs it on every PR. A dedicated leak-harness job in .github/workflows/ci.yml runs the suite under -race on every PR. The job is isolated so a failure names the harness directly; the suite also runs inside the go job's make test.

Why. RFC §5 Go conventions require "goroutines started by long-lived components must be cancellable by a ctx and joined on shutdown"; RFC §3.5 guarantee #4 requires "no goroutine leaks — each invocation's goroutines are joined before the invocation returns". CLAUDE.md §11 made per-component leak tests mandatory but nothing asserted the contract across the whole component surface at once — a new long-lived component could ship without a leak test and nobody would notice. Phase 77 closes that gap with a single conformance gate that a future component opts INTO by adding a table row.

Findings I'm departing from. None. brief 01 (core runtime / streaming leak) and brief 05 (long-lived sweepers, background task goroutines) both flag the leak sources the harness pins; the harness follows them.

Leaks found. None — all five V1 component rows pass the conformance suite under -race on first run (make test and the dedicated leak-harness job). Had a component leaked, CLAUDE.md §17.6 (fix-where-you-find-it) would have required the fix in the Phase 77 PR.

Protocol additions. None — Phase 77 ships a test harness; no Protocol method, error code, wire type, REST endpoint, or CLI subcommand.


D-136 — Phase 79 performance benchmarks: the benchmark suite, the benchstat regression gate, and committed baselines

Date: 2026-05-21 Status: Settled (shipping with this PR)

Where it lives: test/benchmarks/ (the Benchmark* suite — engine_bench_test.go, bus_bench_test.go, memory_bench_test.go, doc.go); docs/perf/baseline.txt (the committed baseline numbers); scripts/perf/check-regression.sh (the regression gate); Makefile (the bench + bench-check targets); .github/workflows/ci.yml (the additive perf-regression job); docs/plans/phase-79-performance-benchmarks.md.

Decision. Phase 79 ships a go test -bench suite over Harbor's three hottest runtime seams plus a perf-regression gate. Four load-bearing calls.

1. The benchmark suite calls REAL components — no mocks (CLAUDE.md §13). BenchmarkEngineThroughput drives N concurrent runs (1 / 8 / 32) against a single shared engine.Engine — the D-025 concurrent-reuse shape — and reports a custom envelopes/sec metric; a companion BenchmarkEngineStreamingThroughput exercises the Phase 12 per-run capacity-waiter EmitChunk path (brief 01 §"Backpressure inside streaming"). BenchmarkBusFanOut sweeps subscriber counts {1, 8, 16} (capped at the default MaxSubscribersPerSession per brief 06 §"Filter expressions") against the real inmem EventBus driver wired with a real audit redactor, confirming brief 06 §"Fan-out"'s O(1)-publish claim empirically. BenchmarkMemoryStrategy covers truncation vs rolling_summary AddTurn latency against real inmem StateStore + EventBus drivers. The suite is itself the cross-subsystem integration exercise the §17 obligation requires — real drivers on every seam, identity propagated through every layer.

2. The regression gate uses benchstat confidence intervals + a noise-tolerant 30% threshold — so shared-CI-runner noise does not flake it. scripts/perf/check-regression.sh runs the suite with -count=6 (giving benchstat a sample to compute variance from), compares against the base run (point 4) via golang.org/x/perf/cmd/benchstat in CSV mode, and fails the build only on a delta that is both statistically significant (p < 0.05; benchstat's ~ verdict is always a pass) and past the threshold. The master-plan acceptance criterion gives "> 10% slowdown blocks" as an illustrative example ("e.g."); the gate's default threshold is 30%, an empirical calibration. During this phase's development, a self-comparison (same baseline, fresh run) on a contended developer machine produced apparent deltas of +80-100% on the bus benchmark from CPU contention alone — Go microbenchmarks of the concurrent engine/bus paths legitimately swing ±20-30% run-to-run. A literal 10% gate would flake on every PR; 30% stays above genuine jitter while still catching the regression class that matters — a refactor that halves throughput (-50%) or doubles latency (+100%). The full human-readable benchstat report is always printed so a reviewer can eyeball any sub-threshold drift the gate intentionally lets pass. The threshold is overridable via PERF_THRESHOLD. This is the master plan's own "design the gate to tolerate noise" directive, honoured honestly rather than shipping a gate that flakes.

3. benchstat is a dev/CI-only tool — it never enters the runtime binary's go.mod production require surface. The gate invokes benchstat via go run golang.org/x/perf/cmd/benchstat@<pinned-version>; the version is pinned in the script. This keeps the harbor binary's dependency surface untouched (CLAUDE.md §13 — heavy frameworks need an RFC, but a CI-only benchmarking tool invoked via go run is the reasonable, dependency-surface-neutral call). go run of a @version-pinned tool resolves into the build cache, not the module's require block.

4. CI compares base-vs-PR on the SAME runner — not against the committed baseline. Go embeds GOMAXPROCS in benchmark names (BenchmarkFoo-10 on a 10-core machine, BenchmarkFoo-4 on a 4-core runner); benchstat pairs rows by name, so a baseline file generated on different-core-count hardware cannot be paired against a CI run at all — the comparison yields zero rows, not a regression verdict. The perf-regression CI job therefore runs the suite twice on the one runner — once on the PR commit, once on the PR's base commit — and benchstat-compares those two same-hardware runs. scripts/perf/check-regression.sh takes the base run via PERF_BASE_FILE and the (pre-generated) PR run via PERF_PR_FILE; the CI job sets both. When the base commit has no test/benchmarks/ directory (the PR that introduces the suite — i.e. this one), there is no prior baseline to regress against and the comparison step is honestly skipped. The committed docs/perf/baseline.txt is retained as the local-dev reference: make bench-check with no env override compares a fresh run against it, valid on the machine that generated it (make bench > docs/perf/baseline.txt). The baseline is refreshed deliberately by a human in a reviewed PR — never auto-rewritten, since a silent auto-refresh would erase the very regression the gate exists to catch.
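The pass/fail rule in point 2 can be sketched as a filter over already-parsed comparison rows. This does not parse benchstat's CSV output. Row, threshold and failing are hypothetical; RegressionPct is assumed to be sign-normalised so that positive always means worse, and PERF_THRESHOLD is assumed to be a bare percentage. The benchmark names and numbers in main are illustrative, not measured.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Row is a hypothetical, already-parsed base-vs-PR comparison row.
// RegressionPct is sign-normalised: positive always means worse (slower
// time/op or fewer envelopes/sec). P is the comparison's p-value.
type Row struct {
	Name          string
	RegressionPct float64
	P             float64
}

const defaultThresholdPct = 30.0

// threshold parses PERF_THRESHOLD, assumed here to be a bare percentage.
func threshold(env string) (float64, error) {
	if env == "" {
		return defaultThresholdPct, nil
	}
	v, err := strconv.ParseFloat(env, 64)
	if err != nil || v <= 0 {
		return 0, fmt.Errorf("PERF_THRESHOLD=%q: want a positive percentage", env)
	}
	return v, nil
}

// failing keeps only rows that are BOTH significant and past the threshold.
// A non-significant row (benchstat's "~") always passes, however large.
func failing(rows []Row, thresholdPct float64) []Row {
	var out []Row
	for _, r := range rows {
		if r.P < 0.05 && r.RegressionPct > thresholdPct {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	th, err := threshold(os.Getenv("PERF_THRESHOLD"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	rows := []Row{ // illustrative values, not measurements
		{Name: "BenchmarkBusFanOut/noisy", RegressionPct: 85, P: 0.40},          // "~": passes
		{Name: "BenchmarkEngineThroughput/drift", RegressionPct: 12, P: 0.01},   // real but sub-threshold
		{Name: "BenchmarkMemoryStrategy/doubled", RegressionPct: 100, P: 0.002}, // blocks
	}
	for _, r := range failing(rows, th) {
		fmt.Printf("REGRESSION %s: +%.0f%% (p=%.3f)\n", r.Name, r.RegressionPct, r.P)
	}
}
```

Requiring both conditions is what lets a large but noisy delta pass while a smaller significant one is still reported in the printed benchstat table for human review.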

Why. Closes the Phase 79 master-plan acceptance loop ("Baseline numbers committed; perf regression threshold gates PRs"). The three benchmarked seams (engine §6.1, bus §6.13, memory §6.6) are the runtime's hot paths; a regression gate on them catches a whole class of "a refactor quietly halved throughput" drift that unit tests and coverage gates miss.

Findings I'm departing from. The master-plan acceptance line "perf regression threshold gates PRs (e.g. > 10% slowdown blocks)" — the gate ships with a 30% default threshold, not 10%. This is not a design departure but a calibration of the master plan's own explicitly-illustrative "e.g." figure against measured reality (see point 2). The master plan's binding requirement is "perf regression threshold gates PRs" and "design the gate to tolerate noise"; a 10% gate cannot satisfy the second clause. The 30% default is documented in the phase plan §Risks and is overridable via PERF_THRESHOLD.

Protocol additions. None — Phase 79 ships no Protocol method, error code, or wire type, and touches no production code. It adds a benchmark suite, a CI gate, and two Makefile targets.


D-137 — Phase 78 chaos / fault-injection harness: the harness home, the fault-injection-via-decorator approach, and the five failure modes

Date: 2026-05-21 Status: Settled (shipping with this PR)

Where it lives: test/integration/phase78_chaos_fault_injection_test.go (the harness — the table-driven TestE2E_Phase78_ChaosFaultInjection); test/integration/phase78_faults_test.go (the fault-injecting decorators); scripts/smoke/phase-78.sh (the static-only smoke); .github/workflows/ci.yml (the additive chaos job); docs/plans/phase-78-chaos-fault-injection-harness.md.

Decision. Phase 78 ships the master chaos / fault-injection harness — a table-driven -race integration suite that injects each of the five master-plan-named failure modes against the real Runtime components and asserts every fault produces its documented loud error / event AND the documented recovery path. Three load-bearing calls.

1. The harness lives in test/integration/, not a new top-level directory. The master plan says the harness is "used in integration tests; not on hot path" — it is test-scoped code, not production runtime code. Phases 76 (cross-tenant isolation, D-134) and 77 (goroutine-leak, D-135) each established the test/integration/ home for a master conformance harness; Phase 78 follows that precedent exactly. A new top-level directory would need an RFC change (CLAUDE.md §3); none is warranted — a chaos harness is an integration test, and test/integration/ is its canonical home. The harness is one *_test.go file plus a fault-injecting-decorator helper file in the same integration_test package.

2. Faults are injected by THIN DECORATORS over the real production components — this is the §17.3 "real drivers at the seam" pattern with a fault overlay, NOT the §13 "test stub as production default" anti-pattern. The harness opens every component through its production registry factory / constructor (events.Open, state.Open, engine.New, pauseresume.New, retry.Wrap). Where a fault must be induced, the harness wraps the real component in a thin decorator (faultyStateStore decorates a real state.StateStore; the kill-mid-run row uses a blocking node closure; the provider-quirk row wraps a quirkLLMDriver in the real retry.Wrap retry-with-feedback layer). The decorators DECORATE — they delegate every non-faulting call verbatim to the real driver — they never re-implement subsystem behaviour, and the fault auto-clears after a bounded number of calls so a single row can assert both the loud-failure half AND the recovery half. Crucially, the decorators live in *_test.go files in the integration_test package: they are never registered with a driver registry, never a DefaultDriver, never reachable by the harbor binary — the runtime resolves only real drivers at boot. CLAUDE.md §13's "test stub as production default" forbids a stub being the only shipped implementation an operator's binary resolves; it does not forbid a test-tree decorator that wraps a real driver to induce a controlled failure for an integration test. That is exactly what §17.3 #3 ("≥1 failure mode") asks for. The dispatch prompt's framing is adopted verbatim: a decorator that wraps (not replaces) a real driver, lives in the test tree, and is never a registry default is §13-compliant.

3. The five failure modes are the master plan's, each asserting the documented event/error AND the documented recovery path (CLAUDE.md §13 — no silent degradation):

  • (a) Kill mid-run — a run is held in-flight by a blocking node, then cancelled; the row asserts the engine's RunCancelledHandler seam fires (the production wiring publishes runtime.run_cancelled from this notice), FetchByRun observes ErrRunCancelled, and Engine.Stop tears down cleanly within a bounded deadline with no goroutine leak.
  • (b) Drop messages — a tiny-buffered subscription is saturated so the inmem bus's drop-oldest backpressure fires; the row asserts the typed bus.dropped event is delivered carrying a non-empty dropped sequence range. The notice is windowed (DropWindow), so the row publishes a burst, lets the window elapse, then publishes exactly ONE trigger event — one trigger, not a second burst, keeps the just-landed notice from being displaced back out of the small buffer.
  • (c) Provider quirks — a quirkLLMDriver returns malformed output; wrapped in the real retry.Wrap with a rejecting Validator, the row asserts the llm.retry_with_feedback event fires AND the call exhausts loudly with llm.ErrRetryExhausted when the quirk persists, plus a recovery sub-case where one bad response followed by a good one succeeds after one retry.
  • (d) StateStore disconnect — the faultyStateStore decorator returns a transport error; the row asserts the error surfaces loudly out of Save/Load (never silently swallowed) and the reconnect recovery path works once the fault budget clears.
  • (e) Pause-deserialize failure — a PauseRequest whose trajectory's LLMContext carries a live channel makes Coordinator.Request fail loudly with trajectory.ErrUnserializable naming a non-empty field path (the D-069 / RFC §3.4 fail-loud contract — never a half-persisted checkpoint, never (nil, nil)), plus a recovery sub-case where a clean trajectory Requests + Resumes successfully.

A dedicated chaos CI job runs the suite under -race on every PR.

Why. Closes the Phase 78 master-plan acceptance loop ("Each failure mode produces the documented event + recovery path"). Phases 76 / 77 prove the runtime holds a happy-path invariant under stress; Phase 78 is the complementary gate — it proves the runtime behaves correctly UNDER FAILURE, surfacing every fault loudly (never a silent degradation — CLAUDE.md §13) and recovering on the documented path. A chaos harness catches the class of resilience regression — a refactor that swallows a StateStore error, drops a cancellation event, or silently returns a malformed LLM response — that unit tests and the isolation / leak harnesses miss.

Findings I'm departing from. None.

Protocol additions. None — Phase 78 ships no Protocol method, error code, or wire type, and touches no production code. It adds an integration-test harness, a CI job, and a smoke script.


D-138 — Phase 80 documentation-hygiene polish: the enforced revive lint gate, the worked examples, and the recipe docs

Date: 2026-05-21 Status: Settled (shipping with this PR)

Where it lives: .golangci-revive.yml (the dedicated revive-only lint config); .golangci.yml (the revive exported rule gains disableStutteringCheck; an A2A var-naming exclude); Makefile (new lint-revive target); .github/workflows/ci.yml (the lint job now installs golangci-lint and runs make lint-revive; a new examples job); examples/agents/echo/ + examples/tools/weather/ + examples/README.md (worked examples); docs/recipes/ (recipe how-to docs); scripts/smoke/phase-80.sh (the static-only smoke); docs/plans/phase-80-documentation-hygiene-polish.md.

Decision. Phase 80 closes the documentation-hygiene loop the master plan names: every package has a doc comment, every exported symbol has godoc, the revive lint rules that enforce that are actually run in CI, and examples/ plus docs/recipes/ give a reader runnable, real-API-grounded entry points. Four load-bearing calls.

1. The lint gate is revive only, run via a dedicated .golangci-revive.yml, NOT the full make lint. The master plan's Phase 80 acceptance is worded precisely — "golangci-lint's revive exported and package-comments clean" — not "all linters clean". An audit found the repo-wide make lint carries ~1000 pre-existing issues across ~20 linters (govet fieldalignment, errcheck, gofmt, errorlint, …). That backlog accumulated because the CI lint job never actually ran the linter: it called make lint, whose command -v golangci-lint guard silently skipped because CI never installed the binary. Phase 80 fixes the silent skip (CI now installs golangci-lint v1.64.8) but scopes the enforced gate to revive — the documentation linter the phase mandate names. Clearing the broader backlog is a separate release-hardening effort (several errcheck fixes change error-handling behaviour, out of scope for a docs phase). The gate runs via a committed .golangci-revive.yml rather than a command-line --enable-only revive flag because --enable-only was found to bypass issues.exclude-rules processing in golangci-lint v1.64; a dedicated config file is the reliable way to run exactly one linter with its settings and excludes honoured. make lint (all linters) is kept as-is for local use and future hardening.

2. revive's exported rule keeps godoc enforcement but disables the stutter naming sub-check (disableStutteringCheck). revive's exported rule does two things: (a) flag any exported identifier missing a godoc comment — the actual Phase 80 mandate — and (b) flag exported type names that "stutter" against their package (state.StateStore, react.ReActPlanner, …). The repo had ~20 stutter hits and ZERO missing-godoc-on-types hits: the codebase already documents its exported surface well. Acting on the stutter sub-check would mean renaming ~20 exported types across package boundaries — explicitly out of scope for a documentation phase ("do not rename code") and a wide API-churn risk. disableStutteringCheck switches OFF only the naming-opinion sub-check; the godoc-presence enforcement (the binding mandate) stays fully on. The genuine documentation gaps the rule DID surface — a handful of exported const/var blocks missing a block comment, two malformed package comments, eight detached package comments — were all fixed in this PR.

3. The worked examples are buildable Go, not just config. examples/ previously held only two annotated YAML configs. Phase 80 adds examples/agents/echo/ (a worked harbortest.Agent + test — the same shape harbor scaffold produces) and examples/tools/weather/ (a worked inproc.RegisterFunc in-process tool + a register→resolve→invoke test). Both build and their tests pass under -race; a new CI examples job runs go build ./examples/... + go test -race ./examples/..., so a drift in a public surface (harbortest, the tool catalog) that breaks an example fails the build. The examples deliberately have trivial behaviour (echo input, return canned weather) — the value is the SHAPE a reader copies, not the data.

4. Recipe docs live under docs/recipes/, grounded in real current APIs. docs/recipes/ is a new subdirectory of the already-permitted docs/ tree (CLAUDE.md §3) — no new top-level directory, no RFC change. It ships five task-oriented how-to guides (scaffold an agent, define a tool, configure a planner, run harbor dev, test an agent). Every recipe references only symbols and flags that exist in the tree at this phase (the harbor dev flag set, inproc.RegisterFunc, the planner config block, the harbortest surface) — a recipe that cited a non-existent symbol would be worse than no recipe.

Why. Phase 80 is the documentation-hygiene gate for the V1 cut. The silent-skip discovery is the load-bearing finding: a lint gate that never runs is not a gate. Fixing the skip and scoping the enforced rule set to the phase's stated mandate gives Harbor a real, enforced godoc/package-comment gate without conflating it with a much larger pre-existing lint backlog. The worked examples and recipes give a first-time reader a runnable, real-API on-ramp that CI keeps honest.

Findings I'm departing from. None.

Protocol additions. None — Phase 80 ships no Protocol method, error code, or wire type. It changes no runtime behaviour: the production-code edits are documentation comments and whitespace only.


Date: 2026-05-21 Status: Settled (shipping with this PR)

Where it lives: cmd/harbor/root.go (HarborVersion becomes a var); cmd/harbor/cmd_version.go (header doc — the product-vs-Protocol distinction); CHANGELOG.md (the Keep-a-Changelog changelog); scripts/release-build.sh + scripts/release-dryrun.sh (the release tooling); Makefile (release-build / release-dryrun targets); .github/workflows/release.yml (the v*-tag release workflow); scripts/smoke/phase-81.sh (the static-only smoke); docs/plans/phase-81-release-engineering.md.

Decision. Phase 81 ships Harbor's release engineering — the tooling that turns a pushed v* git tag into a published release artifact, ahead of the Phase 82 v1.0.0 cut. Five load-bearing calls.

1. The product release version is stamped at link time via -ldflags -X 'main.HarborVersion=…'; HarborVersion changes from a const to a var. A Go const cannot be overridden by -ldflags -X — the linker can only rewrite a package-level string var. Phase 63 pinned HarborVersion as a const "v0.0.0-dev" and explicitly anticipated "a later release-engineering phase injects a real semver via -ldflags"; Phase 81 is that phase. The conversion is a one-symbol change — same name, same type, same default value — so the existing cmd_version_test.go (which reads the symbol, never assumes const-ness) passes unchanged. An un-stamped build (go build, go run, go test, a plain make build) keeps the v0.0.0-dev default — the load-bearing operator signal that "this is not a release artifact", the same fail-loudly sentinel discipline buildHash() already follows (CLAUDE.md §5). The version is derived from git describe --tags (or HARBOR_RELEASE_VERSION when the release workflow sets it from the pushed tag ref).

2. The product release version is STRICTLY DISTINCT from the Harbor Protocol version. HarborVersion is the binary's own product semver; internal/protocol/types.ProtocolVersion (RFC §5.3, D-077) is the Runtime↔Console wire-contract version. They are different things versioned independently: a Runtime refactor that bumps the release version need not bump the Protocol version, and a Protocol-surface addition need not bump the release version. harbor version already prints both as separate labelled fields (harbor / protocol, D-084); Phase 81 only makes the harbor field carry a real release value. The two are NOT conflated anywhere — scripts/release-build.sh stamps only main.HarborVersion, never the Protocol constant (whose bump is an RFC change). The cmd_version.go and CHANGELOG.md headers document the distinction so a future contributor does not collapse them.

3. The CHANGELOG follows Keep-a-Changelog and lives at the repo root. CHANGELOG.md is the conventional discoverable home for a release history; the Keep-a-Changelog format (an [Unreleased] section, ### Added / ### Changed / … subsections, version-link references) is the widely-understood default and needs no tooling. Content is grouped by delivery wave / subsystem (foundations, events/state/sessions, runtime engine, persistence, tools, LLM, skills, planner, steering, observability, Protocol, CLI, Console, release hardening) rather than as a flat 90-entry phase list — the wave grouping is how Harbor was actually built (CLAUDE.md §17.7) and how a reader best understands the V1 surface. Every V1 phase (01–81 plus the lettered 26a/33a/36a/36b/53a/64a/72*/73*) is covered. The [Unreleased] section is the living record; Phase 82's v1.0.0 cut moves it under a dated [1.0.0] heading.

4. The release build logic has ONE home — scripts/release-build.sh — consumed by both the Makefile target and the workflow. The -ldflags -X stamping incantation, the CGo-free static-build flags (CGO_ENABLED=0, -ldflags='-s -w', -trimpath), the version-resolution priority chain, the checksum emission, and the post-build stamp-verification all live in exactly one shell script. make release-build, make release-dryrun (via scripts/release-dryrun.sh), and .github/workflows/release.yml all delegate to it — there is no second copy of the build incantation, avoiding the CLAUDE.md §13 "two parallel implementations" smell. scripts/release-dryrun.sh is the master-plan "release dry-run" test: it runs the exact release-build path with a synthetic version and asserts the artifact + checksum exist, the checksum verifies, and the stamped binary's harbor version reports the stamped string — plus that an un-stamped build still reports v0.0.0-dev (the stamp is opt-in, never silently applied). No heavyweight release framework (goreleaser) is introduced — stdlib go build + a shell script + a GitHub Actions workflow is the deliberate dependency-light surface (CLAUDE.md §13).

5. SLSA-style build provenance lands NOW, not as a post-V1 deferral. The master plan names SLSA-style attestations as a stretch — "include it if it's clean to add, otherwise document the deferral". It is clean to add: GitHub's native actions/attest-build-provenance@v1 action generates a signed, verifiable provenance attestation for the release artifact with no extra runtime dependency and no framework — it needs only the id-token: write / attestations: write job permissions. Because the stretch lands cleanly with a first-party action, deferring it would be gratuitous; the release workflow attaches provenance to the artifact on every v* tag push. The release workflow also exposes a workflow_dispatch path that runs the dry-run, so the release build can be exercised in CI without a tag. Phase 81 itself creates NO v* tag — tagging is the operator's job in Phase 82.
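The two checks at the heart of the dry-run in item 4, the version stamp and the checksum, can be sketched in Go. This is an illustration, not scripts/release-dryrun.sh. The isStamped, checksumLine, and verifyChecksum helpers and the harbor-linux-amd64 artifact name are hypothetical, and the checksum line assumes the two-space layout that sha256sum prints.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// HarborVersion is a package-level string var, not a const: the Go linker's
// -X flag can only overwrite string variables. Un-stamped builds keep this.
var HarborVersion = "v0.0.0-dev"

// isStamped reports whether the binary was version-stamped at link time.
func isStamped() bool { return HarborVersion != "v0.0.0-dev" }

// checksumLine renders a SHA-256 digest in the two-space sha256sum layout.
func checksumLine(name string, data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]) + "  " + name
}

// verifyChecksum recomputes the artifact's digest and compares it with the
// recorded line, failing loudly on a mismatch or a malformed line.
func verifyChecksum(name string, data []byte, line string) error {
	fields := strings.Fields(line)
	if len(fields) != 2 || fields[1] != name {
		return fmt.Errorf("malformed checksum line for %s: %q", name, line)
	}
	sum := sha256.Sum256(data)
	if got := hex.EncodeToString(sum[:]); got != fields[0] {
		return fmt.Errorf("checksum mismatch for %s: got %s, want %s", name, got, fields[0])
	}
	return nil
}

func main() {
	fmt.Printf("harbor %s (stamped: %t)\n", HarborVersion, isStamped())
	artifact := []byte("pretend this is the release binary")
	line := checksumLine("harbor-linux-amd64", artifact)
	fmt.Println(verifyChecksum("harbor-linux-amd64", artifact, line))
}
```

Run as-is, the sketch reports the v0.0.0-dev sentinel. Building it with go build -ldflags "-X 'main.HarborVersion=v1.0.0-rc.1'" makes the linker overwrite the var, and the same binary then reports the stamped string. Had HarborVersion been declared as a const, the linker would have no variable to overwrite and the default would remain.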

Why. Phase 81 is the last build phase before the v1.0.0 cut. It closes the master-plan acceptance loop — a pushed v1.0.0-rc.1 tag produces a release artifact, and the CHANGELOG covers every V1 phase — without conflating the product release version with the Protocol wire-contract version, without a heavyweight release framework, and with build provenance attached from day one. The const → var conversion is the minimal production-code change that makes link-time version stamping possible; everything else is build tooling and documentation.

Findings I'm departing from. None.

Protocol additions. None — Phase 81 ships no Protocol method, error code, or wire type. The single production-code change is the const → var conversion of cmd/harbor.HarborVersion; it changes no runtime behaviour and no Protocol surface.


D-140 — Wave 14 §17.5 checkpoint: zero-FAIL V1-readiness verdict, research-brief predecessor-name scrub, drift-audit scan extension, and CI examples-job CGo alignment

Date: 2026-05-22 Status: Settled (shipping with this PR)

Where it lives: docs/research/01-core-runtime.md through 07-code-level-tool-calling.md (the seven source-distilled briefs); scripts/drift-audit.sh (forbidden-name scan file set); .github/workflows/ci.yml (the examples job test step).

Decision. Wave 14's read-only V1-readiness checkpoint audit (§17.5) returned zero FAIL, one WARN, and one actionable NIT. This entry records the verdict and the audit-fix PR that closes the WARN and the NIT. Three load-bearing calls.

1. Wave 14 is clean — zero FAIL. The checkpoint audit read every shipped phase in the wave (source, tests, plan, RFC reference) hunting for wiring gaps, RFC drift, depth issues, weak tests, and hygiene regressions. It found no FAIL-class issue: the V1 surface holds together. The only blocking-adjacent finding was a hygiene WARN (below); the rest of the wave passed without remediation.

2. The predecessor project name is scrubbed from the research briefs, not carved out. The WARN: the predecessor project's name appeared 62 times across the seven source-distilled briefs (docs/research/01 through 07), almost entirely as external source-path citations (~/Repos/<name>/.../core.py:1557) and a "Source path" table column. The scripts/drift-audit.sh forbidden-name scan deliberately excluded docs/research/, which is why the leak survived — CLAUDE.md §13 forbids the predecessor's name and any synonym ("the prior project", "the predecessor", "the reference implementation", "the source", abbreviations, author names) anywhere in committed text. The operator's call was explicit: scrub, do not carve out. Every name occurrence and every external repo-path citation is removed; each brief's actual design finding is kept and re-expressed as a standalone Harbor design statement (a finding attached to a source path becomes the finding alone — e.g. "DeadlineAt is wall-clock, not duration" with the trailing path dropped); "Source map" tables become non-referential "Concept map" tables. The briefs now read as Harbor's own design research with no allusion to any specific prior project. A case-insensitive scan for the predecessor's name across docs/research/ returns nothing.

3. The drift-audit forbidden-name scan now covers docs/research/*.md; the CI examples job pins CGO_ENABLED=0 on its test step. The scan-extension is the structural fix that makes the scrub permanent: scripts/drift-audit.sh previously scanned rule files, phase plans, indices, and Go source but not the research briefs — so the leak could recur silently. The scan now globs every docs/research/*.md brief, and the success message names the wider scope. The NIT: the examples job's go test -race step had no explicit CGO_ENABLED while its sibling go build step pinned CGO_ENABLED: '0'; the test step now pins it too, aligning with the repo-wide CGo-free discipline (CLAUDE.md §5). On Go 1.26 the race detector runs cgo-free, so the pin is harmless and consistent.

Why. The §17.5 checkpoint audit gates the next wave's planning. Recording the zero-FAIL verdict closes Wave 14; scrubbing the briefs and widening the drift-audit scan turn a one-time cleanup into an enforced invariant so the predecessor's name cannot leak back into the research tree.

Findings I'm departing from. None.

Protocol additions. None — this is a checkpoint audit-fix PR. It changes documentation (research briefs, this log), one CI workflow step, and one drift-audit shell script; it ships no Protocol method, error code, wire type, or runtime-behaviour change.


D-141 — Lint hardening before the v1.0.0 cut: govet drops fieldalignment + shadow; the full make lint becomes the enforced CI gate

Date: 2026-05-22 Status: Settled (shipping across the Wave 14 lint-hardening PRs)

Where it lives: .golangci.yml (the govet.disable list); .github/workflows/ci.yml (the lint job, flipped to the full make lint once the backlog clears); the fix(lint): ... burn-down PRs.

Context. Phase 80 (D-138) discovered the CI lint job had been a silent no-op since the project's start — golangci-lint was never installed on the runner, so make lint's command -v guard skipped it. Phase 80 fixed the silent skip but scoped the enforced gate to revive only (the doc-hygiene linter it named), tracking the wider backlog in issue #190. The operator's call for the v1.0.0 cut: burn the full backlog down and make the complete make lint the enforced gate.

Decision. Two load-bearing calls.

1. govet no longer enables fieldalignment or shadow. .golangci.yml carried govet.enable-all: true, which force-enables every govet analyzer including the two that are widely left off by deliberate choice:

  • fieldalignment reorders struct fields to minimise memory padding. Its autofixer destructively strips struct-field doc comments when it reorders — a burn-down agent measured 761 comment lines deleted from internal/protocol/types/ wire structs alone. The memory-packing win is negligible on structs that exist to be JSON-serialised onto the wire, and byte-packing actively fights logical field grouping and readability. Enforcing it trades documentation and clarity for a non-benefit.
  • shadow flags variable shadowing, including the idiomatic Go if err := f(); err != nil / per-iteration err :=. The mechanical "fix" (rewriting := to =) reintroduces data races when the shadowed variable is an err inside a goroutine body — each goroutine needs its own binding. A linter whose fix introduces races is net-negative.

Together these accounted for ~180 of the ~327 raw backlog issues — the low-value, harmful-to-enforce half. They are disabled via govet.disable; every other govet analyzer stays on. The remaining ~147 issues (errcheck, unparam, unused, gocritic, gosec, errorlint, nilerr, stylecheck, ineffassign, staticcheck, copyloopvar, intrange, …) are genuinely worth fixing and are burned down to zero.

2. The full make lint becomes the enforced CI gate. Once the backlog is zero, the CI lint job flips from make lint-revive (the Phase 80 interim narrow gate) back to the full make lint. The revive doc-hygiene rules remain part of that full run. This closes issue #190 — the gate can no longer silently rot, because every linter in .golangci.yml now runs on every PR.

Why. A v1.0.0 framework should enforce the lint rules that catch real defects and not enforce micro-optimisation noise whose autofix damages the codebase. Disabling fieldalignment/shadow is not lowering the bar — it is removing two rules that were never a quality signal, so the gate that remains is entirely load-bearing.

Findings I'm departing from. The naive reading of "burn the whole backlog down" would have hand-reordered 42+ structs for fieldalignment and rewritten ~90 shadow sites. That path destroys Protocol wire-type godoc and risks races; this decision rejects it in favour of disabling the two analyzers — the reasonable-deviation call (CLAUDE.md §4.3), recorded here because it is a permanent .golangci.yml policy change.

Protocol additions. None — .golangci.yml + .github/workflows/ci.yml only; no Protocol method, error code, wire type, or runtime-behaviour change.


D-142 — The v1.0.0 cut: a framework-quality root README, a de-jargoned CHANGELOG, and the release surfaces

Date: 2026-05-22 Status: Settled (shipping with the Phase 82 v1.0.0 cut)

Where it lives: README.md; CHANGELOG.md; docs/announcements/v1.0.0.md; docs/plans/phase-82-v1-cut.md; the v1.0.0 git tag.

Decision. Phase 82 cuts v1.0.0 — the line at which the V1 surface is complete and stable. Three calls are settled here.

1. The root README is rewritten as a framework front door, not a build log. The organically grown README had become a ~100-line phase-by-phase status table, each row a paragraph — a development artifact, not a product entry point. The v1.0.0 README leads with positioning and a three-command quickstart, then the four-layer architecture, the usage path, documentation pointers, and an honest V1 status; it carries the Harbor logo and a five-badge row (CI, release, Go Reference, Go version, license). The phase-status table is deleted — the master phase plan (docs/plans/README.md) is the canonical execution index, and the README links to it rather than mirroring it.

2. Public release surfaces carry no internal "phase" vocabulary. "Phase NN" is Harbor's internal development jargon. It belongs in docs/plans/, docs/decisions.md, and the per-phase artifacts — never in the README, the CHANGELOG, release notes, or the launch announcement. The CHANGELOG.md [1.0.0] section is grouped by subsystem and describes the product in feature terms; its section headers dropped the (Wave N, phases XX–YY) parentheticals. A scripts/smoke/phase-82.sh check enforces that the CHANGELOG carries no phase-N token.

3. v1.0.0 is the initial release — no migration notes. The master-plan Phase 82 goal lists "migration notes (if any)"; there is no prior released version to migrate from, so none apply. This is recorded rather than left as an open question.
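The no-phase-token rule in item 2 could be checked with something like the Go sketch below. Harbor's actual check is the shell script scripts/smoke/phase-82.sh, whose exact pattern is not recorded here; the regular expression (the word "phase" in any case, an optional space or hyphen, then digits) and the sample CHANGELOG line are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// phaseToken is an assumed pattern for internal phase jargon: "phase" in any
// case, an optional space or hyphen, then one or more digits.
var phaseToken = regexp.MustCompile(`(?i)\bphase[ -]?\d+`)

// findPhaseTokens returns every phase-N token found in a public release text.
func findPhaseTokens(text string) []string {
	return phaseToken.FindAllString(text, -1)
}

func main() {
	// A hypothetical CHANGELOG excerpt that leaks internal vocabulary.
	changelog := "## [1.0.0]\n### Added\n- Durable pause/resume checkpoints (Phase 58)\n"
	if tokens := findPhaseTokens(changelog); len(tokens) > 0 {
		fmt.Println("CHANGELOG leaks phase jargon:", tokens)
	}
}
```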

The v1.0.0 git tag is operator-run from main after this PR merges and main CI is green. The Phase 81 release.yml workflow then builds the version-stamped CGo-free static binary, attaches the SHA-256 checksum and SLSA build provenance, and publishes the GitHub Release.

Why. Harbor is a real product; its front door has to look the part, and its public change record has to read for a user, not a maintainer. The README rewrite and the CHANGELOG de-jargoning are the difference between a repo that looks shipped and one that looks mid-build.

Findings I'm departing from. None.

Protocol additions. None — Phase 82 is the release cut. No Protocol method, error code, wire type, runtime behaviour, or CLI subcommand changes.


D-143 — The ReAct system prompt is twelve XML-tagged structured sections; no reasoning field, no rich-output fields

Date: 2026-05-22 Status: Settled (shipping with Phase 83a — the foundation phase of the 83-band)

Where it lives: internal/planner/react/prompt.go (defaultBuilder.buildSystemContent + the twelve section constants); internal/planner/react/react.go (DefaultSystemPrompt sentinel + WithSystemPromptExtra Option); internal/planner/react/testdata/golden_default_prompt.txt (the normative fixture); internal/config/config.go (PlannerConfig.ExtraGuidance).

Decision. Phase 83a replaces the ReAct planner's flat one-string DefaultSystemPrompt (Phase 45/47) with the twelve XML-tagged sections inventoried in brief 13 §2.1 — <identity>, <output_format>, <action_schema>, <finishing>, <tool_usage>, <parallel_execution>, <reasoning>, <tone>, <error_handling>, <available_tools>, <additional_guidance>, <planning_constraints> — assembled in that fixed order, separated by \n\n. Four calls are settled here.

1. The twelve sections are the section anchors the rest of the 83-band builds on. XML tags make each section individually editable (brief 13 §2.1). Phase 83b replaces the <available_tools> body with per-tool args_schema + curated examples; 83c populates <planning_constraints> from RunContext.PlanningHints and merges per-turn repair guidance; 83d injects <read_only_*_memory> UNTRUSTED-framed blocks. Each is a localised edit against an established anchor, not a structural rewrite. The two optional sections (<additional_guidance>, <planning_constraints>) are omitted entirely — never emitted as empty tag pairs — when their content is absent.

2. The action JSON drops the reasoning field; the <tone> CRITICAL clamp is ported verbatim. Per brief 13 §2.6 (2026-05-19 revision), reasoning is captured from the provider's reasoning channel (Phase 83e) and persisted on the trajectory step — never required in the model's structured output. The rendered <action_schema> example is {tool, args} only; the trajectory replay renderer echoes {tool, args} only; <tone> carries the two CRITICAL lines instructing the model not to emit a thought / reasoning field. Phase 83a is the prompt-side alignment; Phase 83e narrows the runtime-side Decision sum.

3. Rich output is dropped from Harbor entirely — not reserved, not deferred. The <finishing> block carries only args.answer (plain text). No confidence / route / requires_followup / warnings finish-args fields; <error_handling> guides clarification via args.answer, not a requires_followup flag. Rich UI is delivered through MCP-Apps tools the planner invokes (brief 13 §5), never through a typed finish-payload.

4. DefaultSystemPrompt becomes a routing sentinel; operators inject guidance without forking the builder. The old single-string constant is removed (not renamed to a dangling legacyDefaultSystemPrompt — the golden fixture is the normative spec, a legacy constant would be dead code per CLAUDE.md §13). DefaultSystemPrompt is now a stable non-empty sentinel string the builder compares against to choose the structured layout vs. honouring a verbatim WithSystemPrompt override. The new WithSystemPromptExtra(s string) Option and the new planner.extra_guidance config key flow operator-supplied domain guidance into <additional_guidance> without writing Go.

Why. The Phase 45 flat prompt gave the LLM no schema discipline, no failure-recovery framing, and no explicit injection points. The twelve-section layout is the load-bearing structure the dynamic-augmentation, tool-schema, and memory-framing phases (83b/c/d) all depend on; landing it first as a content-only refactor de-risks the band.

Findings I'm departing from. None — this phase matches brief 13's 2026-05-19 revised design exactly. The brief's own §9 records the departure from the superseded "rich-output deferred to V2" note; Phase 83a inherits that closed departure rather than re-opening it.

Protocol additions. None — Phase 83a is a planner-internal prompt-content refactor plus one operator-facing config key (planner.extra_guidance) and one constructor Option (WithSystemPromptExtra). No Protocol method, error code, wire type, or CLI subcommand changes.


D-147 — The ReAct action schema is narrowed to {tool, args}; reasoning is captured on the provider channel, not the decision

Date: 2026-05-22 Status: Settled (shipping with Phase 83e)

Where it lives: internal/planner/decision.go (CallTool); internal/llm/llm.go (CompleteResponse.Reasoning); internal/llm/drivers/bifrost/reasoning.go + bifrost.go + translate.go; internal/planner/repair/parser.go + repair.go; internal/planner/trajectory/trajectory.go (Step.ReasoningTrace); internal/planner/react/react.go; internal/planner/events.go (DecisionPayload, ActionExtraFieldDroppedPayload).

Decision. The planner.CallTool decision shape drops its Reasoning field. The model emits {tool, args} only. The provider-side thinking trace — Anthropic extended thinking, OpenAI o-series, DeepSeek native, Gemini thought:true parts — is captured separately: llm.CompleteResponse gains a Reasoning string field, the bifrost driver reads BifrostChatResponse.Choices[0].Message.ReasoningDetails (bifrost's normalised canonical surface) on BOTH the unary and streaming paths, and the captured trace persists on trajectory.Step.ReasoningTrace. This closes two gaps pinned by brief 13 §2.6's empirical Bifrost probe: the unary-path gap (OnReasoning was streaming-only) and the Gemini-direct black hole (bifrost populated reasoning_details[] on the message but Harbor dropped it). Phase 44's schema-repair parser tolerates incoming reasoning / thought fields by stripping them instead of rejecting the action, emitting a planner.action_extra_field_dropped telemetry event per dropped field — the runtime fails OPEN for backward compatibility with older trained models.
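The fail-open handling of legacy fields can be sketched with encoding/json. This is not Harbor's repair parser: parseAction, the tolerated set, and the choice to reject any other unknown field (via Decoder.DisallowUnknownFields) are assumptions made to keep the example small. The returned names are what a caller would turn into one planner.action_extra_field_dropped event each.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
)

// action is the narrowed {tool, args} decision shape.
type action struct {
	Tool string          `json:"tool"`
	Args json.RawMessage `json:"args"`
}

// tolerated lists legacy fields that are stripped rather than rejected.
var tolerated = map[string]bool{"reasoning": true, "thought": true}

// parseAction strips tolerated legacy fields and returns their names so the
// caller can emit one telemetry event per dropped field. In this sketch any
// other unknown field is still an error.
func parseAction(raw []byte) (action, []string, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(raw, &fields); err != nil {
		return action{}, nil, fmt.Errorf("parse action: %w", err)
	}
	var dropped []string
	for name := range fields {
		if tolerated[name] {
			dropped = append(dropped, name)
			delete(fields, name) // deleting during range is safe in Go
		}
	}
	sort.Strings(dropped) // map order is random; keep event order stable
	clean, err := json.Marshal(fields)
	if err != nil {
		return action{}, nil, fmt.Errorf("parse action: %w", err)
	}
	dec := json.NewDecoder(bytes.NewReader(clean))
	dec.DisallowUnknownFields()
	var a action
	if err := dec.Decode(&a); err != nil {
		return action{}, nil, fmt.Errorf("parse action: %w", err)
	}
	return a, dropped, nil
}

func main() {
	raw := []byte(`{"tool":"search","args":{"q":"harbor"},"reasoning":"look it up"}`)
	a, dropped, err := parseAction(raw)
	fmt.Println(a.Tool, string(a.Args), dropped, err)
}
```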

Why. A reasoning string inside the structured decision conflates two concerns: the action the runtime executes, and the model's chain of thought. The conflation cost a schema field the model had to fill on every step, and it never carried the real provider thinking trace — only whatever free text the model echoed. Reading the provider's normalised reasoning channel captures the genuine trace; narrowing the action schema removes the model's expectation of an echo field. The "we need reasoning visible in the trajectory" use case is preserved by D-148's replay knob — by configuration, not by schema.

Findings I'm departing from. This is a binary departure from Phase 45 / D-051, which shipped CallTool{Tool, Args, Reasoning} as the V1 action shape. The departure is recorded here; the deterministic planner's CallToolStep.Reasoning field is dropped in the same change since it has nowhere to land.

Protocol additions. None — CompleteResponse and the planner Decision sum are internal Go types, not Protocol wire types. The planner.decision and planner.action_extra_field_dropped events are internal event-bus types (registered in internal/planner/events.go), surfaced to operators via harbor inspect-runs event replay — not new Protocol methods.


D-148 — Reasoning replay is a per-agent operator knob; never by default for ALL models, two modes only

Date: 2026-05-22 Status: Settled (shipping with Phase 83e)

Where it lives: internal/planner/planner.go (ReasoningReplayMode, RunContext.ReasoningReplay, EffectiveReasoningReplay); internal/config/config.go (PlannerConfig.ReasoningReplay) + internal/config/validate.go; internal/planner/react/react.go (WithReasoningReplay) + prompt.go; internal/planner/registry.go + react/init.go.

Decision. Whether a prior step's captured reasoning trace is re-injected into the next turn's prompt is an operator-controlled per-agent knob: config.PlannerConfig.ReasoningReplay, a string enum validated against the values never and text (empty resolves to never). The ReasoningReplayMode Go enum's zero value resolves to never — replay is OFF unless an operator opts in, for ALL models. When the mode is text, the ReAct trajectory renderer prepends each prior step's captured ReasoningTrace as a text block above the prior {tool, args} action JSON. A per-run RunContext.ReasoningReplay *ReasoningReplayMode override wins over the agent-configured value for tenant- or run-specific policy. V1 ships exactly two modes — there is NO provider_native mode.
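The resolution rules can be sketched as below. The sketch assumes a string-backed enum; parseReplayMode and effectiveReplay are hypothetical helpers loosely modelled on the validation and EffectiveReasoningReplay behaviour described above, not their real signatures. The property worth copying is that every path other than an explicit text value lands on never.

```go
package main

import "fmt"

// ReasoningReplayMode is assumed string-backed in this sketch.
type ReasoningReplayMode string

const (
	ReplayNever ReasoningReplayMode = "never"
	ReplayText  ReasoningReplayMode = "text"
)

// parseReplayMode validates a config value: empty resolves to never, and
// anything else outside {never, text} (provider_native included) is an error.
func parseReplayMode(s string) (ReasoningReplayMode, error) {
	switch ReasoningReplayMode(s) {
	case "", ReplayNever:
		return ReplayNever, nil
	case ReplayText:
		return ReplayText, nil
	default:
		return "", fmt.Errorf("reasoning replay: unsupported mode %q (want never or text)", s)
	}
}

// effectiveReplay lets a non-nil per-run override win over the agent value,
// then maps anything that is not explicitly text (zero value included) to never.
func effectiveReplay(agent ReasoningReplayMode, perRun *ReasoningReplayMode) ReasoningReplayMode {
	mode := agent
	if perRun != nil {
		mode = *perRun
	}
	if mode == ReplayText {
		return ReplayText
	}
	return ReplayNever
}

func main() {
	agent, err := parseReplayMode("")
	fmt.Println(agent, err)
	override := ReplayText
	fmt.Println(effectiveReplay(agent, &override))
}
```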

Why. The predecessor never replayed reasoning; Harbor's stance is the same default (never-replay for every model — thinking-class or not), with a deliberate per-agent opt-in for workloads where chain-of-thought continuity across turns measurably helps. Making it a knob rather than a hardcoded behaviour means the "reasoning visible in the trajectory" use case D-147 removed from the schema is recovered by configuration. The zero-value-resolves-to-never contract is load-bearing: a misconfigured or zero-value enum must NOT silently opt an agent into replay.

Findings I'm departing from. Brief 13 §2.6 noted three candidate modes (never, text, provider_native). V1 ships only the first two. provider_native would round-trip Anthropic's signature-bearing thinking blocks through bifrost as API constructs across turns; Bifrost's docs do not address that round-trip, so Harbor cannot guarantee correctness today. Deferred — revisit when (a) Bifrost documents the signed-thinking-block round-trip or (b) a real workload measurably benefits.

Protocol additions. None — PlannerConfig.ReasoningReplay is a harbor.yaml config key (restart-required; no reload:"live" tag). No Protocol method, error code, wire type, or CLI subcommand change.


D-145 — Repair counters live on RunContext, not on the ReActPlanner struct

Date: 2026-05-22 Status: Settled (shipping with Phase 83c)

Where it lives: internal/planner/planner.go (RepairCounters, PlanningHints, BudgetHints, RunContext.RepairCounters, RunContext.PlanningHints, PlanningNudges); internal/planner/events.go (EventTypePlannerRepairGuidanceInjected, RepairGuidanceInjectedPayload); internal/planner/react/repair_guidance.go + planning_hints.go + prompt.go; internal/planner/repair/repair.go (RunResult.Repair, RepairOutcome).

Decision. Phase 83c's per-run, across-step failure counters — RepairCounters{FinishRepair, ArgsRepair, MultiAction} — live on the per-run planner.RunContext, never on the shared ReActPlanner struct. The runtime constructs one RepairCounters per run and threads the same pointer through every per-step RunContext; the ReAct planner reads the counters in its prompt builder and updates them after each step (updateRepairCounters). A nil pointer means "no augmentation". The richer PlanningHints struct (constraints, preferred order, parallel groups, disallow/preferred tools, budget caps) also lives on RunContext as RunContext.PlanningHints and renders into the <planning_constraints> prompt section; the pre-existing parallel/transport nudge struct was renamed PlanningHints → PlanningNudges to free the name.
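The ownership split can be sketched as follows. This is a simplified illustration rather than the real planner: StepFailures and runConcurrently are hypothetical, and RunContext is trimmed to the single field that matters. It shows one shared, stateless ReActPlanner serving many concurrent runs, each of which owns its counters through a pointer that every per-step RunContext copy carries.

```go
package main

import "sync"

// RepairCounters uses the field names from the entry.
type RepairCounters struct {
	FinishRepair int
	ArgsRepair   int
	MultiAction  int
}

// RunContext is trimmed to one field. The runtime builds a fresh value per
// step, but every copy carries the same *RepairCounters pointer.
type RunContext struct {
	RepairCounters *RepairCounters
}

// StepFailures is a hypothetical summary of what needed repair in one step.
type StepFailures struct {
	Finish, Args, MultiAction bool
}

// ReActPlanner is shared by every run and holds no per-run mutable state.
type ReActPlanner struct{}

// updateRepairCounters writes only to the run-scoped counters; a nil
// pointer means "no augmentation" and is a no-op.
func (p *ReActPlanner) updateRepairCounters(rc RunContext, f StepFailures) {
	c := rc.RepairCounters
	if c == nil {
		return
	}
	if f.Finish {
		c.FinishRepair++
	}
	if f.Args {
		c.ArgsRepair++
	}
	if f.MultiAction {
		c.MultiAction++
	}
}

// runConcurrently drives n runs through one shared planner. Each run does
// `steps` sequential steps that each record an args repair, and the
// function returns every run's counters.
func runConcurrently(p *ReActPlanner, n, steps int) []*RepairCounters {
	out := make([]*RepairCounters, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		out[i] = &RepairCounters{} // one allocation per run
		wg.Add(1)
		go func(c *RepairCounters) {
			defer wg.Done()
			for s := 0; s < steps; s++ {
				rc := RunContext{RepairCounters: c} // per-step copy, same pointer
				p.updateRepairCounters(rc, StepFailures{Args: true})
			}
		}(out[i])
	}
	wg.Wait()
	return out
}
```

Because no goroutine ever touches another run's counters, the shared planner needs no lock; a counter field on ReActPlanner would need one and would still mix runs together.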

Why. The reference design the brief drew from stores failure counters on the planner instance, where they persist across runs ("no orchestrator wiring required"). Harbor cannot do that: the ReActPlanner is a shared compiled artifact (D-025), so a mutable counter field on it would be the §13-forbidden mutable-state-on-a-compiled-artifact bug, and two concurrent runs sharing the planner would cross-contaminate each other's counters. Scoping the counters to the per-run RunContext is the only shape that satisfies the concurrent-reuse contract. The cost is wiring: the counters must be threaded through RunContext rather than read off the planner's receiver, which is exactly the wiring the reference design saved; Harbor pays it deliberately. The cross-run isolation tests (TestE2E_React_RepairGuidanceCrossRunIsolation, plus TestUpdateRepairCounters_ConcurrentDisjointRunContexts at N=128) pin this contract.

Findings I'm departing from. Brief 13 §2.2's "planner-instance counters, no orchestrator wiring" — departed for the D-025 reason above. The departure is the whole point of this decision; the per-run scope is the chosen contract.

Protocol additions. None — RepairCounters / PlanningHints / RepairOutcome are internal Go types, not Protocol wire types. planner.repair_guidance_injected is an internal event-bus type (registered in internal/planner/events.go), surfaced to operators via the Console / harbor inspect-runs event replay — not a new Protocol method, error code, or CLI subcommand.


D-144 — ReAct tool catalog renders args_schema + side_effects + tag-ranked examples

Date: 2026-05-22 Status: Settled (shipping with Phase 83b)

Where it lives: internal/tools/tools.go (Tool.Examples, ToolExample); internal/tools/example_validation.go (validateExamples, ErrToolExampleInvalid) + internal/tools/catalog.go (Register calls it); internal/planner/react/prompt.go (renderTool, renderAvailableToolsSection, toolRenderConfig, rankedExamples, compactJSON); internal/planner/react/react.go (WithMaxToolExamplesPerTool); internal/planner/react/init.go; internal/config/config.go + internal/config/validate.go (PlannerConfig.MaxToolExamplesPerTool); internal/planner/registry.go; cmd/harbor/cmd_dev.go.

Decision. The ReAct system prompt's <available_tools> section renders each tool with its full args_schema (compact single-line JSON), declared side_effects class, and up to N curated examples — not the Phase 45 / 83a name + description shape. Examples are a new opt-in field Tool.Examples []ToolExample ({Args, Description, Tags}); they are tag-ranked minimal (rank 0) > common (1) > edge-case (2) > untagged (3), stable-sorted by (rank, originalIndex), and the renderer keeps the top MaxToolExamplesPerTool (operator knob, default 3). A tool that ships no examples renders through its side_effects line and omits the examples: line entirely — no registration-site code change is needed for existing tools. Curated examples are validated at catalog registration: an example whose Args names a key not declared in the tool's args_schema.properties fails Register loudly with ErrToolExampleInvalid (a passing example is a working example). args_schema is re-marshalled to compact JSON via encoding/json (deterministic map-key order) so the section stays KV-cache-stable across turns.

Why. The dominant ReAct failure mode the prompt-quality band closes is the args-validation-failure cascade: with only name + description exposed, the LLM guesses argument shapes, the catalog edge rejects the guess, and the planner burns steps recovering. Brief 13 §2.4 pins examples as the most token-efficient way to constrain args — "a single concrete example is worth several lines of schema prose." Surfacing the schema + examples gives the LLM the information to decide correctly the first time (brief 07 §3: runtime owns dispatch, the LLM is the decision-maker). Registration-time example validation closes the secondary risk that examples become performative — an example that contradicts the schema would teach a shape the runtime then rejects.

Findings I'm departing from. None — this matches brief 13 §2.4 exactly. The Tool.Examples field and ToolExample type pre-existed (added speculatively with Phase 26's catalog); Phase 83b gives them their first consumer (the renderer) and their first guard (the validator), satisfying the §13 primitive-with-consumer rule.

Protocol additions. None — Phase 83b is a planner-internal prompt-content change plus one operator-facing config key (planner.max_tool_examples_per_tool) and one constructor Option (WithMaxToolExamplesPerTool). No Protocol method, error code, wire type, or CLI subcommand change.


D-146 — ReAct memory + skills inject as separate UNTRUSTED-framed system messages; serialisation fails loudly

Date: 2026-05-22 Status: Settled (shipping with Phase 83d)

Where it lives: internal/planner/planner.go (MemoryBlocks type, RunContext.MemoryBlocks, RunContext.SkillsContext); internal/planner/errors.go (ErrMemoryBlockUnserializable); internal/planner/react/memory_wrappers.go (wrapper copy + render helpers); internal/planner/react/prompt.go (buildRequest / baseRequest); internal/planner/react/react.go (Next drives the error-returning build path).

Decision. Pre-fetched memory blobs and pre-retrieved skill bodies are injected into the ReAct planner's system prompt as separate llm.ChatMessage system-role entries, never concatenated into the twelve-section base system message. The Runtime populates RunContext.MemoryBlocks ({External any, Conversation any}) and RunContext.SkillsContext []any; the planner only renders. There are three wrappers, always in this order: <read_only_external_memory> → <read_only_conversation_memory> → <skills_context> (most-stable → least-stable → operator-curated, so the message-slice prefix stays KV-cache-stable across turns). Each memory wrapper carries the verbatim five-line anti-prompt-injection rule list from brief 13 §2.3; the skills wrapper carries an analogous, shorter operator-curated framing. Payloads are compact JSON (sorted keys, no whitespace, HTML escaping off). A nil tier, nil MemoryBlocks, or empty SkillsContext is omitted entirely: no empty wrapper is rendered.

A value json.Marshal rejects (a chan, a function, a cyclic structure) fails the planner step loudly with a typed planner.ErrMemoryBlockUnserializable naming the offending tier / index — never a silently dropped tier or an empty wrapper. The PromptBuilder.Build interface signature is unchanged (it cannot return an error); the planner instead drives the in-package defaultBuilder via the error-returning buildRequest and surfaces the sentinel from Next.

Why. Memory feeds (Phase 23 / 24) can carry user-contributed conversational content susceptible to prompt injection; the UNTRUSTED framing is the prompt-time mitigation that makes memory safe to inject. Distinct tag names per tier let the model use tier semantics and let Console traces and debugging tools grep a single tier. Separate messages (rather than one mega system prompt) keep each tier independently isolatable. Fail-loud serialisation closes the silent-context-loss failure mode the project explicitly forbids (CLAUDE.md §5 + §13): a dropped memory tier is invisible context loss.

Findings I'm departing from. None — brief 13 §2.3 + brief 04 are followed as written. Render-only is deliberate: runtime-side retrieval policy (when to fetch, what query, cardinality) stays on the runtime where it has identity + cost context.

Protocol additions. None — RunContext.MemoryBlocks / SkillsContext and ErrMemoryBlockUnserializable are internal Go types, not Protocol wire types. No new method, error code, config key, or CLI subcommand.


D-149 — Phase 83f: dev RunLoop driver populates the four 83-band primitives + session-scoped memory/skills fetch + fail-loud on store errors

Date: 2026-05-22 Status: Settled (shipping with Phase 83f)

Where it lives: cmd/harbor/cmd_dev_runloop.go (perTaskRunLoopDriver opts + runOne fetch path + projectMemoryBlocks / projectSkillsContext helpers); cmd/harbor/cmd_dev.go (bootDevStack opens skills.SkillStore when configured and threads MemoryStore / SkillStore / SkillsContextMax / projected PlanningHints into the driver; plannerHintsFromConfig is the YAML→Go projector); internal/config/config.go (PlannerConfig.SkillsContextMax + PlannerConfig.PlanningHints of type PlannerPlanningHintsCfg); internal/config/validate.go (validates the new fields); harbortest/devstack/devstack.go (mirror of the production driver per D-094); examples/harbor.yaml.

Decision. Phase 83f closes the §17.5 Wave 15 audit's W3/W4 finding (issue #208). Four calls are settled here.

1. Where the fetch happens — perTaskRunLoopDriver.runOne. After MarkRunning and before building the steering.RunSpec, the driver: (a) calls tasks.Get(taskCtx, taskID) to read the user-facing Query; (b) calls memory.GetLLMContext(taskCtx, sessionQ) when MemoryStore is configured; (c) calls skills.Search(taskCtx, sessionQ, task.Query, skillsContextMax) when SkillStore is configured AND task.Query != ""; (d) allocates &planner.RepairCounters{} per run; (e) projects the operator-supplied *planner.PlanningHints from config. The RunSpec.Base is then built with Query, Goal: Query, MemoryBlocks, SkillsContext, RepairCounters, PlanningHints — every field 83c/83d/83e require. The runtime side of the 83-band is now genuinely on the operator's golden path.

2. Memory + skills are session-scoped — sessionQ := {Identity: q.Identity}. Per RFC §6.6 ("Memory is session-scoped by default") + §6.7 (skills DB schema keys by (tenant, user, session) only), the fetch quadruple zeroes RunID so each run inherits the session's accumulated state rather than seeing only its own (empty) per-run slice. This matches the brief 02 §6 split — runtime owns identity-scoped fetch, planner is render-only. The MemoryStore inmem driver's internal keying currently includes RunID; the driver works around that by always handing RunID="" — a future memory-driver phase should normalise the inmem key to the session triple, but that is out of 83f's scope.

3. Fail-loud on store errors — runtime_fetch_error. On any non-nil error from tasks.Get / memory.GetLLMContext / skills.Search, the driver immediately calls tasks.MarkFailed with Code: "runtime_fetch_error" and a Message naming the failing call site, and bails BEFORE the LLM is called. There is no silent degradation to nil blocks and no provider cost burned on a degraded run. This matches CLAUDE.md §5 fail-loud and the §13 "silent degradation forbidden" rule. The integration test pins this with a forced MemoryStore.GetLLMContext error.

4. YAML surface is intentionally small for V1.1. planner.skills_context_max (int, default 5 via package const, validator rejects negatives) caps the Search result count. planner.planning_hints is a struct with constraints (free-form text) + preferred_tools ([]string); the richer Go-struct fields on planner.PlanningHints (ParallelGroups, DisallowTools, Budget) remain reachable through a custom planner Option but not via harbor.yaml. Empty YAML block ⇒ nil pointer projection ⇒ <planning_constraints> section omitted from the prompt. The richer surface lands in a follow-up when an operator actually needs it.

Why. Wave 15 shipped the four primitives (MemoryBlocks, SkillsContext, RepairCounters, PlanningHints) and the 83e reasoning-trace capture path, but the production dev binary never populated them — only the test code did. That made the wave's value real for library consumers building their own RunContext but invisible to operators running harbor dev. The audit (W3/W4) named this a §13 "test stubs as production defaults on operator-facing seams" failure mode read one level out: the seams exist but the production wiring doesn't fill them. 83f closes the consumer gap so the prompt-quality band's value reaches the operator on the golden path.

Findings I'm departing from. None — 83f is a pure consumer phase against already-shipped primitives. The memory keying observation (point 2) surfaces a divergence between the RFC's session-scope and the inmem driver's run-scope, documented here for a future memory-driver phase to normalise.

Protocol additions. None — 83f is internal wiring plus two operator-facing config keys. No new Protocol method, error code, wire type, or CLI subcommand.


D-150 — Phase 83g: dev binary spawns + registers MCP southbound providers at boot; fail-loud on connect/discover

Date: 2026-05-23 Status: Settled (shipping with Phase 83g)

Where it lives: cmd/harbor/cmd_dev.go (attachDevMCPServer helper + the bootDevStack per-server loop); harbortest/devstack/devstack.go (mirror per D-094); cmd/harbor-mcptest-stdio/ (the integration test's stdio MCP server fixture); test/integration/phase83g_mcp_dev_consumer_test.go.

Decision. Phase 83g closes the second consumer gap surfaced during the Phase 83f operator-validation work. The 83-band's gap (issue #208) was that the dev binary populated only Quadruple on RunContext; the MCP gap is the same shape one layer over: cfg.Tools.MCPServers[] is declared in the config schema, validated at boot, and exposed READ-ONLY by Phase 73h's Console mcp.servers.* Protocol methods, but nothing in bootDevStack calls mcpdrv.New to spawn an MCP server, open a session, discover tools, or register them into the tool catalog. Configuring an mcp_servers[] entry in harbor.yaml was silently ignored. Three calls are settled here.

1. Per-server attachment shape — attachDevMCPServer. For each cfg.Tools.MCPServers[i], the dev boot: (a) constructs mcpdrv.Config with the configured transport / URL / Command / Headers / KeepAlive + the dev token's identity (tenant=dev / user=dev / session=dev) for server-pushed mcp.resource_updated events; (b) calls mcpdrv.New(cfg) then provider.Connect(ctx); (c) calls provider.Discover(ctx) for the tool list; (d) registers each returned ToolDescriptor on the tool catalog via cat.Register(d); (e) registers the live Provider with the boot-time mcp.Registry so the Console MCP-page mount lands with no re-spawn when the surface wiring follows. The Provider's Close is appended to the dev stack's closer chain — stack teardown drains every subprocess; no orphan-process regression.

2. Fail-loud on Connect / Discover / Register errors. Any non-nil error from mcpdrv.New / provider.Connect / provider.Discover / cat.Register is returned wrapped (mcp[<name>]: <stage>: <err>), and bootDevStack calls closeAll(ctx) and returns. The dev binary exits non-zero with the operator-actionable error; there is no silent-degradation branch that boots without a configured MCP server. This matches the §5 / §13 / 83f convention. Failing loud rather than offering degradation knobs (such as an optional: true per-server flag or a --skip-mcp-on-error CLI flag) is deliberate for V1.1: an operator who declared an MCP server should see a clear failure if the server cannot be reached. Graceful-degradation knobs are a follow-up if pain accrues.

3. Console MCP-page mount is a follow-up, not part of 83g. Wiring the Registry onto the Protocol mux via mcp.NewRegistryAccessor + protocol.NewMCPSurface requires a single *auth.Provider accessor (per mcpconsole.NewOAuthAccessor's signature). The dev binary's OAuth side is a slice of per-tool-entry providers (returned from applyToolCatalogWiring), not a master *auth.Provider. Plumbing that is a small but separate phase. 83g constructs and populates the Registry so the follow-up only adds the surface mount — no re-spawning, no second source of truth. The integration test asserts on the Registry directly (via stack.MCPRegistry); operator visibility through the Console UI lands when the surface mount does.

Why. Without 83g, configuring an mcp_servers[] entry in harbor.yaml was a no-op: the operator's chat-with-MCP-tools story (the headline use case for the v1.1 cut) didn't work out of the box. The §17.5 audits to date didn't trace the MCP path end-to-end with real subprocess spawning, so the gap escaped Wave 15's checkpoint. 83g is the lift-and-cover phase: same shape as 83f (primitive without dev-binary consumer), same fail-loud posture, same D-094 devstack mirror discipline.

Findings I'm departing from. None — 83g is a pure consumer phase. The decision to defer the Console MCP-page mount is documented in the phase plan's risks section, not a departure from any brief.

Protocol additions. None — 83g consumes existing cfg.Tools.MCPServers config and the already-exported mcpdrv API. No new method, error code, wire type, or CLI subcommand.


D-151 — Phase 83h: hot-reload watcher skips DB sidecars; LLM safety wrapper defaults req.Model from cfg.Model

Date: 2026-05-23 Status: Settled (shipping with Phase 83h)

Where it lives: cmd/harbor/cmd_dev_hot_reload.go (dbSidecarSuffixes, isDBSidecar, shouldTrigger); cmd/harbor/cmd_dev_hot_reload_test.go (TestShouldTrigger_SkipsDBSidecars); internal/llm/safety.go (the req.Model = c.cfg.Model default-fill in safetyClient.Complete); internal/llm/safety_test.go (TestSafety_DefaultsModelFromConfigSnapshot).

Decision. Two hard-block bugs surfaced when the v1.1 operator validation booted harbor dev against a real bifrost LLM + the scaffolded sqlite-backed state + skills drivers. Both fixes are tiny; the audit lesson is bigger.

V1 — hot-reload watcher reboot-loops on SQLite WAL/SHM/journal sidecars. By default, harbor dev watches the cwd. SQLite (the scaffold default for state.driver and skills.driver) rewrites its *.sqlite-wal / *.sqlite-shm companions on every commit. fsnotify fires CREATE/WRITE on each rewrite; the watcher triggers a drain+reboot; the rebooted binary opens SQLite, the WAL gets rewritten, and the cycle repeats. The result is a ~700ms loop and an unusable dev binary. Fix: extend shouldTrigger with a fixed suffix-deny list: .sqlite-wal, .sqlite-shm, .sqlite-journal, .db-wal, .db-shm, .db-journal, -journal. Operator-supplied glob ignores stay deferred to a follow-up when pain accrues; the fixed list unblocks V1.1 against the scaffold defaults.
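The sidecar filter reduces to a suffix check on the file name. This sketch uses the suffix list from the entry; shouldTrigger here is a simplified, hypothetical version that applies only the sidecar rule, whereas the real function in cmd/harbor/cmd_dev_hot_reload.go also decides on the watcher's other inputs.

```go
package main

import (
	"path/filepath"
	"strings"
)

// dbSidecarSuffixes is the fixed deny list from the entry. The bare
// "-journal" suffix already covers the two longer journal entries.
var dbSidecarSuffixes = []string{
	".sqlite-wal", ".sqlite-shm", ".sqlite-journal",
	".db-wal", ".db-shm", ".db-journal",
	"-journal",
}

// isDBSidecar reports whether the file name ends in a known SQLite
// sidecar suffix.
func isDBSidecar(path string) bool {
	base := filepath.Base(path)
	for _, s := range dbSidecarSuffixes {
		if strings.HasSuffix(base, s) {
			return true
		}
	}
	return false
}

// shouldTrigger (simplified): a write to a database sidecar must never
// start a drain+reboot.
func shouldTrigger(path string) bool {
	return !isDBSidecar(path)
}
```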

V2 — LLM safety wrapper rejects requests with empty Model. The react planner (Phase 45 / 83a) builds llm.CompleteRequest{Messages: ...} without setting Model, and the safety wrapper's validateRequest rejects it with "CompleteRequest.Model is empty". The mock LLM driver used in every existing dev-binary integration test does not exercise the safety wrapper's structural validation path (the mock returns canned responses), so the gap escaped the Wave 13 / 14 / 15 checkpoints. A real bifrost client reaches validateRequest and fails at step 0. Fix: in safetyClient.Complete, before validateRequest, default req.Model = c.cfg.Model when the caller did not pin one. Callers that DO pin Model (multi-model agents, posture sub-clients) keep their pin.

Audit lesson — record explicitly so the §17.5 audits to come catch these earlier. Both V1 and V2 are §13 "test stubs as production defaults on operator-facing seams" failure modes read one layer over: the integration tests used the mock LLM and didn't spawn real subprocesses + write real sqlite + send real prompts, so two real-bifrost+real-sqlite-binding bugs sat untested through Wave 14's V1 cut + Wave 15's prompt-quality band. The post-83h checkpoint audit specifically targets the harbor dev + real-bifrost end-to-end path to find whatever V4/V5/V6 are waiting after the next prompt.

Why now. Without V1 the operator's first harbor dev boot enters an infinite reboot loop after the planner persists any state; without V2 the operator's first prompt is rejected before the LLM call. Together they are the difference between "v1.1 ships a working out-of-the-box framework" and "v1.1 ships only to operators who already know to set --no-hot-reload and pin Model upstream." The fixes are 10 lines each + a unit test apiece.

Findings I'm departing from. None — both fixes follow the project's fail-loud / fill-loud posture (V1: filter inputs that produce noise; V2: fill the documented default at the documented boundary).

Protocol additions. None — both fixes are inside implementation packages and preserve existing function signatures.


D-152 — Phase 83i: runloop ToolExecutor + Catalog/Trajectory/Emit/Memory wiring closes the v1.1 operator-validation blockers

Date: 2026-05-23 Status: Settled (shipping with Phase 83i)

Where it lives: internal/runtime/steering/runloop.go (ToolExecutor interface, ErrDecisionShapeUnsupported, RunSpec.ToolExecutor, the default-case dispatch + trajectory append); cmd/harbor/cmd_dev_executor.go (the dev binary's devToolExecutor); cmd/harbor/cmd_dev_catalog_view.go (the planner-facing runtimeCatalogView); cmd/harbor/cmd_dev_runloop.go::runOne (Catalog + Trajectory + Emit + Executor + MaxSteps wiring + memory.AddTurn writeback + extractAssistantAnswer); harbortest/devstack/devstack.go (D-094 mirror).

Decision. Wave 17's operator validation against harbor dev + real bifrost + mcp-youtube hit the "64 steps, 0 tool calls" failure mode. The §17.5 audit pinned four root causes, all the same shape: primitives shipped without their production runtime consumer. 83i closes all four. Four calls are settled here.

1. steering.ToolExecutor is the runloop's dispatch seam. Phase 53 left the runloop's default: case as dead code with a comment that "a later phase wires the executor." That phase never landed; every CallTool decision was observed and discarded, and the planner was structurally a "decide-without-doing" loop. 83i ships the interface (ExecuteDecision(ctx, rc, decision) (observation, llmObservation, error)) and the runloop's dispatch path. CallParallel / SpawnTask / AwaitTask remain executor-side errors (ErrDecisionShapeUnsupported) for V1.1 — the runloop seam supports them; the dev executor declines and the planner re-plans.

2. Trajectory is appended by the runloop after every dispatched step. Without an append, the planner's prompt was identical on every iteration — the live validation showed 30 LLM calls with byte-for-byte identical (PromptTokens, CompletionTokens). 83i's runloop append uses the per-run pointer on spec.Base.Trajectory (value-copy of spec.Base per step preserves the pointer, so mutations are visible to the next iteration's rc). Each step records Action (the planner's Decision), Observation (raw runtime result), and LLMObservation (the D-026 projection — see point 3).

3. D-026 heavy-content discipline lives in the dev executor. The first successful tool call returned a 1.5 MB JSON observation; rendered verbatim into the next prompt, it was rejected by the LLM safety wrapper with ErrContextLeak. 83i's executor encodes the raw result with json.Marshal, checks it against the configured cfg.Artifacts.HeavyOutputThresholdBytes, and on overflow stores the encoded bytes in the artifact store and returns a small summary map ({tool, size_bytes, truncated:true, preview, artifact_ref}) as llmObservation. Small results pass through with observation == llmObservation. When the artifact store is unavailable, the path degrades to a truncation summary plus a logged Warn: silent context loss is §13-forbidden, so the operator must see what was elided.

4. RunContext.Emit + MemoryStore.AddTurn close the observability + multi-turn affordances. runOne builds an Emit closure that stamps the run's identity quadruple and publishes through the bus; without it, the planner's planner.decision / planner.finish / planner.repair_guidance_injected events never leave the planner and never reach the Console / harbor inspect-runs. On FinishGoal the driver calls memory.AddTurn(taskCtx, sessionQ, ConversationTurn{user, assistant}) so the next session turn sees prior context. This is best-effort: a memory.AddTurn failure logs Warn but does NOT downgrade the run's status (the planner reached FinishGoal, so the operator should see Complete).

Why. Without 83i, every Wave 13/14/15/83f-h investment in the planner band is invisible to operators. The dev binary boots, accepts a prompt, and runs the planner against the LLM, but the planner can never CALL anything because the catalog projection is empty, can never make progress because the trajectory never grows, and can never persist context because memory writeback never fires. The §17.5 operator-validation audit was the surface that pinned this; live validation against mcp-youtube after 83i lands shows a two-LLM-call end-to-end run (decision: CallTool → executor runs the tool → planner sees the observation → decision: Finish).

Findings I'm departing from. None — 83i is a pure consumer phase.

Protocol additions. None — the runloop seam is internal Go; the dev executor + view are package-private; no wire shape changed.


D-153 — Phase 83n: harbor init + tiered yaml + docs/CONFIG.md drift gate + opt-in built-in tools

Date: 2026-05-23 Status: Settled (shipping with Phase 83n)

Where it lives: cmd/harbor/cmd_init.go (cobra wiring + CLIError code mapping); cmd/harbor/init/init.go (harborinit.Init engine + sentinels); cmd/harbor/init/templates/default/ (the four .tmpl files); internal/tools/builtin/ (the new package — builtin.go, clock.go, text.go, builtin_test.go); internal/config/config.go (ToolsConfig.BuiltIn); internal/config/validate.go (allowedBuiltInTools mirror + KnownBuiltInTools() + validation); internal/config/doc_drift_test.go (the CI drift gate); docs/CONFIG.md (the operator-facing reference); cmd/harbor/cmd_dev.go + harbortest/devstack/devstack.go (built-in registration + D-094 mirror).

Decision. V1.1's adoption-first posture demands a real first-clone entry point. Until 83n the operator path was: read the RFC → fork a YAML example → grep godoc → guess. harbor init collapses that into one command. Four settled calls.

1. harbor init ships exactly one template (default) with a tiered yaml: REQUIRED (identity placeholders that pass validation, plus four commented LLM-provider example blocks for OpenRouter / Anthropic / OpenAI / NVIDIA NIM, all reachable through bifrost), COMMON KNOBS (memory / planner / tools / skills / governance, all commented with sensible defaults shown), and ADVANCED (a pointer to docs/CONFIG.md). The operator uncomments exactly one provider block, sets the API key env var, runs harbor validate, then runs harbor scaffold (83o consumes the operator-edited yaml). Choosing "tiered + commented" over "fully populated, operator trims" is deliberate: a commented block invites editing, while a populated block reads as "this is fine as-is; don't touch."

2. The framework is prescriptive about correctness, unopinionated about taste. The yaml hard-codes nothing about provider / model / reasoning_effort / budget; the four examples are equivalent starting points. The init's bias is toward "easy on-ramp" not "ideal config." This is the V1.1 mantra: prescriptive about catalog wiring, trajectory append, fail-loud (D-152's ToolExecutor seam, D-026's heavy-output discipline); unopinionated about provider / model / utility tools.

3. Built-in tools are opt-in by name through tools.built_in []string. V1.1 ships two — clock.now and text.echo. They live at internal/tools/builtin/ and register through inproc.RegisterFunc the same way an operator's custom Go function would. The yaml field is purely additive: an empty list registers nothing. The §4.4 mirror pattern (internal/config carries allowedBuiltInTools; builtin_test.go asserts the mirror) means a typo fails at harbor validate time rather than at boot, while a new built-in addition requires both surfaces to update or the mirror test fails. Phase 83o consumes the same yaml field to materialise per-built-in Go imports in the scaffolded project.

4. docs/CONFIG.md ships with a Go drift gate. Every leaf yaml path on Config{} MUST have a corresponding ### <path> heading in docs/CONFIG.md. The gate (TestConfigDoc_AllFieldsDocumented) walks the struct via reflection — the same shape walkLeaves uses for env-overrides — and fails CI when a new field lands without an entry. The test is deliberately permissive about format (any line starting with ### path satisfies the assertion, trailing text allowed). This pattern is the operator-side companion to the brief-reading rule from §16: documentation lives next to the code it documents, and CI rejects drift.
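The drift gate's core can be sketched with reflect: walk the struct for dotted yaml paths, then require a heading line that starts with "### <path>". yamlLeaves and undocumented are hypothetical names, not the real walkLeaves or TestConfigDoc_AllFieldsDocumented, and the assumptions about tags and leaf types are stated in the comments.

```go
package main

import (
	"bufio"
	"reflect"
	"strings"
)

// yamlLeaves collects dotted yaml paths for every non-struct field.
// Assumptions: documented fields carry a yaml tag; untagged and `yaml:"-"`
// fields are skipped; slices, maps, and scalars are leaves.
func yamlLeaves(t reflect.Type, prefix string, out *[]string) {
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		name := strings.Split(f.Tag.Get("yaml"), ",")[0]
		if name == "" || name == "-" {
			continue
		}
		path := name
		if prefix != "" {
			path = prefix + "." + name
		}
		ft := f.Type
		if ft.Kind() == reflect.Ptr {
			ft = ft.Elem()
		}
		if ft.Kind() == reflect.Struct {
			yamlLeaves(ft, path, out)
			continue
		}
		*out = append(*out, path)
	}
}

// undocumented returns the leaf paths of cfg (a struct value) with no line
// starting "### <path>" in doc; trailing text on that line is allowed.
func undocumented(doc string, cfg any) []string {
	var headings []string
	sc := bufio.NewScanner(strings.NewReader(doc))
	for sc.Scan() {
		if line := sc.Text(); strings.HasPrefix(line, "### ") {
			headings = append(headings, line)
		}
	}
	var leaves, missing []string
	yamlLeaves(reflect.TypeOf(cfg), "", &leaves)
	for _, p := range leaves {
		found := false
		for _, h := range headings {
			if strings.HasPrefix(h, "### "+p) {
				found = true
				break
			}
		}
		if !found {
			missing = append(missing, p)
		}
	}
	return missing
}
```

In a CI test, a non-empty result from undocumented would fail the build and list exactly which paths need a CONFIG.md entry.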

Why. Without harbor init the V1.1 framework is undiscoverable: a fresh operator clones the repo, runs harbor --help, and sees dev / scaffold / validate / console — but there is no obvious "start here." Adding init as the first surface flips the discovery: harbor init → drops a workflow-explaining README.md → operator follows it. Without docs/CONFIG.md the operator's only path to discovering knobs is to read internal/config/config.go godoc — which is the §6 "DevX is binding" failure mode brief 06 calls out. Without the drift gate, CONFIG.md rots within two phases. Without built-in tools, the smoke-test path for a fresh agent depends on the operator authoring Go code or attaching an MCP server first — neither is a zero-friction first experience.

Findings I'm departing from. None.

Protocol additions. None — harbor init is operator-side; built-in tools are catalog-side; docs/CONFIG.md is documentation.


D-154 — Phase 83o: scaffold reads operator-edited yaml + materialises per-custom-tool Go stubs + --patch preserves operator code

Date: 2026-05-23 Status: Settled (shipping with Phase 83o)

Where it lives: internal/config/config.go (ToolsConfig.Custom, CustomToolConfig); internal/config/validate.go (allowedCustomToolTypes mirror + KnownCustomToolTypes() + validateCustomTools inline in validateTools); cmd/harbor/scaffold/scaffold.go (Options.FromConfigPath, Options.Patch, Result.Skipped, ErrUpstreamConfigInvalid); cmd/harbor/scaffold/render.go (renderProject rewrite + loadUpstreamConfig + renderCustomTools fan-out + copyUpstreamYAML + the projection helpers); cmd/harbor/scaffold/templates/minimal-react/{tool.go.tmpl,tool_test.go.tmpl} (NEW); cmd/harbor/scaffold/templates/minimal-react/agent.go.tmpl (the RegisterTools function); cmd/harbor/cmd_scaffold.go (--from-config / --patch flag wiring + CodeUpstreamConfigInvalid); cmd/harbor/scaffold/scaffold_from_yaml_test.go (the engine-level coverage); cmd/harbor/cmd_scaffold_test.go (the cobra-level coverage); docs/CONFIG.md (tools.custom).

Decision. Phase 83n landed harbor init and made the operator yaml the source of truth, but harbor scaffold still rendered its own self-contained yaml and ignored what the operator just edited. 83o closes the loop. Four settled calls.

1. Scaffold reads the operator yaml by default. An explicit --from-config <path> wins; with the flag empty, scaffold auto-detects ./harbor.yaml in the cwd; if neither resolves, it falls through to the template-only path (so the pre-83o "scaffold without init" workflow still works for one-shot quick starts). The yaml is loaded and validated via internal/config.Load; if it doesn't pass the validator, scaffold fails closed with ErrUpstreamConfigInvalid (CLI code upstream_config_invalid). The loaded yaml is then copied VERBATIM into the output project's harbor.yaml (the operator's comments and uncommented LLM block survive; the templated harbor.yaml is a placeholder that the copy overwrites).

2. Custom tools declared in tools.custom materialise as typed Go stubs. New CustomToolConfig shape: name / description / input (map of field: type) / output (same shape). V1.1 type allowlist is intentionally flat — string / integer / number / boolean / []string — operators with complex shapes write Go by hand via inproc.RegisterFunc (the schema deriver already handles arbitrary Go types). The yaml-shorthand cap is a deliberate scope cut: every shape the scaffold supports must round-trip through a deterministic Go type, and nested objects expand the test surface faster than they pay off for V1.1. Each entry produces tools/<name>.go (typed Input/Output structs + stub Handle) + tools/<name>_test.go (round-trip happy path). The validator catches name collisions between tools.custom and tools.built_in so the catalog never sees two registrations under the same name.

3. RegisterTools(cat tools.ToolCatalog) error is the operator's wiring entry point. The generated agent.go includes one function that registers each built-in (calling builtin.Register(cat, [...])) and each custom tool (calling inproc.RegisterFunc[Input, Output] with the operator's typed Handle). The runtime does NOT auto-discover the scaffolded tools — the operator imports the generated tools/ package + calls RegisterTools from their binary's bootstrap. This stays consistent with §1 ("no magic") and §13 ("primitive-with-consumer") — the generated wiring is a consumer the operator chooses to wire, not a runtime that silently scans tools/.

4. --patch is the operator-edit-survival invariant. When set: the existing output dir is accepted (no ErrOutputDirExists), existing files are SKIPPED (listed under Result.Skipped), only new files (newly-declared tools, missing scaffolded files) are written. The skipped list surfaces in the human + JSON output so the operator sees what scaffold left alone. The semantics are deliberately conservative: scaffold NEVER merges, NEVER modifies an existing file. Operators who want diff-and-merge use git. The rationale: an in-place merge would invite the silent-degradation failure mode CLAUDE.md §13 forbids — a "smart" scaffold that re-emits agent.go with a new RegisterTools body could overwrite hand-edited registration calls. Refuse to touch existing files; force the operator to delete (or git-rebase) if they want a fresh re-emit.

Why. Without 83o the init → edit → validate → scaffold → dev workflow collapses into "rewrite the yaml twice." The operator runs harbor init, edits the yaml, runs harbor scaffold, and the scaffold ships a fresh placeholder yaml that ignores everything the operator just edited. The result: operators distrust the framework and hand-author everything. 83o makes the operator's edit canonical end-to-end.

Findings I'm departing from. None.

Protocol additions. None — the scaffold flags + Options/Result fields are operator-side; the new yaml field (tools.custom) is internal config; no wire shape changed.
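The flat type allowlist and the name-collision check from call 2 reduce to a lookup table plus a set of taken names. This is a hedged sketch. validateCustomTools and customTool are hypothetical names. The Go types chosen for integer and number (int64, float64) are assumptions, not necessarily what tool.go.tmpl emits. The sketch also rejects duplicate names within tools.custom, which goes slightly beyond the built-in collision the decision describes.

```go
package main

import (
	"fmt"
	"sort"
)

// goTypeFor maps the flat V1.1 yaml allowlist to Go field types.
// int64 and float64 are illustrative choices (assumption).
var goTypeFor = map[string]string{
	"string":   "string",
	"integer":  "int64",
	"number":   "float64",
	"boolean":  "bool",
	"[]string": "[]string",
}

// customTool is a hypothetical reduction of CustomToolConfig.
type customTool struct {
	Name   string
	Input  map[string]string // field -> yaml type
	Output map[string]string
}

// validateCustomTools rejects non-allowlisted types and any name that is
// already taken by a built-in (or by an earlier custom entry).
func validateCustomTools(custom []customTool, builtIn []string) error {
	taken := make(map[string]bool, len(builtIn)+len(custom))
	for _, name := range builtIn {
		taken[name] = true
	}
	for _, t := range custom {
		if taken[t.Name] {
			return fmt.Errorf("tools.custom %q: name already registered", t.Name)
		}
		taken[t.Name] = true
		for _, shape := range []map[string]string{t.Input, t.Output} {
			fields := make([]string, 0, len(shape))
			for f := range shape {
				fields = append(fields, f)
			}
			sort.Strings(fields) // deterministic first error
			for _, f := range fields {
				if _, ok := goTypeFor[shape[f]]; !ok {
					return fmt.Errorf("tools.custom %q field %q: type %q is not in the allowlist", t.Name, f, shape[f])
				}
			}
		}
	}
	return nil
}
```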


D-155 — Phase 83l: real-bifrost integration tests + production bug fix (snapshot drops CustomProviders / NetworkDefaults / Corrections)

Date: 2026-05-23 Status: Settled (shipping with Phase 83l)

Where it lives: test/integration/phase83l_real_bifrost_test.go (the scripted-server helper + two end-to-end tests); cmd/harbor/cmd_dev.go (three new projection helpers copyCustomProviders / copyNetworkDefaults / disableCorrectionsFromConfig + the snapshot wiring); harbortest/devstack/devstack.go (D-094 mirror).

Decision. The 83l integration test was supposed to be a defensive backfill — the audit-lesson hole-plug from D-151. The first run of TestE2E_RealBifrost_PlannerExecutorTrajectory_HappyPath immediately failed with bifrost: invalid provider: "83l-fake" (allowed native: …; declared custom: (none)). The test was correct; the production code was wrong. cmd/harbor/cmd_dev.go::bootDevStack constructed the llm.ConfigSnapshot by hand, copying only Driver / Provider / Model / APIKey / BaseURL / Timeout / ContextWindowReserve / HeavyOutputThreshold / ModelProfiles — silently dropping cfg.LLM.CustomProviders, cfg.LLM.NetworkDefaults, cfg.LLM.Corrections. The config validator accepted the operator's yaml (a custom_providers[] entry is structurally valid); llm.Open then rejected at boot with the misleading "declared custom: (none)" error because the snapshot it received carried no custom providers. Two settled calls.

1. Fix the bug in the same PR that surfaces it (CLAUDE.md §17.6). §17.6 is unambiguous: when an integration test surfaces a bug, fix it in the same PR, even when the root cause is in a previously-shipped phase's code. The fix lands as three new projection helpers (copyCustomProviders, copyNetworkDefaults, disableCorrectionsFromConfig) at the bottom of cmd_dev.go next to the existing copyModelProfiles, wired into the llm.ConfigSnapshot literal at bootDevStack line 490. D-094 requires the devstack mirror — same three helpers + same wiring at the matching tryAssemble call site. The fix is ~80 lines (helpers + mirror); the integration test surfacing it is ~400.

2. The fake-server pattern is scriptedLLMServer, not a stub LLM driver. The 83l tests stand up a real httptest.NewServer that mimics OpenAI's /v1/chat/completions endpoint, records every request, and replays a scripted JSON-response sequence. This exercises the FULL production path — the bifrost driver opens its HTTP client, the safety wrapper validates the request, the correction layer optionally rewrites, the retry layer handles a hypothetical failure, the response parses back through the same chain. A stub LLM driver (the path the mock takes) skips every one of those layers. Wire-level assertions are the value-prop: Model field present (the 83h V2 regression), the second request's prompt contains the first request's observation (the 83i trajectory regression), the request body parses as OpenAI-compat (the snapshot-projection bug surfaced exactly here). The scripted-server pattern scales to any wire-level invariant a future test needs to assert.

Why. The audit lesson from D-151 was specifically that "test stubs as production defaults on operator-facing seams" is the failure mode CLAUDE.md §13 forbids. Read one layer over to integration tests, the mock LLM is a stub-as-default for the wire path. Wave 17 closeout cannot ship without at least one real-bifrost integration test, full stop. The bug surfaced here proves the lesson was real: every Wave 13/14/15 audit "passed" against the mock, and the moment a real wire-level assertion ran, the bug fell out within seconds.

Findings I'm departing from. None.

Protocol additions. None — the bug fix is internal Go; the integration test is operator-side coverage.
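The snapshot bug is easiest to see with trimmed-down types. In this sketch, LLMConfig, ConfigSnapshot, snapshotFrom, open, and the native provider list are all hypothetical simplifications. The real llm.ConfigSnapshot also carries NetworkDefaults, Corrections, and more. Only copyCustomProviders borrows a real helper's name, and its body here is an assumption. A snapshot literal that omits CustomProviders passes config validation but fails at open time, which is the shape of the 83l failure.

```go
package main

import (
	"fmt"
	"slices"
)

// Hypothetical, trimmed-down shapes (the real snapshot has many more fields).
type CustomProvider struct{ Name, BaseURL string }

type LLMConfig struct {
	Provider        string
	Model           string
	CustomProviders []CustomProvider
}

type ConfigSnapshot struct {
	Provider        string
	Model           string
	CustomProviders []CustomProvider
}

// copyCustomProviders returns a copy so the snapshot never aliases the
// operator's config slice.
func copyCustomProviders(in []CustomProvider) []CustomProvider {
	return slices.Clone(in)
}

// snapshotFrom projects every field; the pre-83l literal omitted CustomProviders.
func snapshotFrom(cfg LLMConfig) ConfigSnapshot {
	return ConfigSnapshot{
		Provider:        cfg.Provider,
		Model:           cfg.Model,
		CustomProviders: copyCustomProviders(cfg.CustomProviders),
	}
}

var nativeProviders = []string{"openai", "anthropic"} // illustrative subset

// open mimics the boot-time provider check.
func open(s ConfigSnapshot) error {
	if slices.Contains(nativeProviders, s.Provider) {
		return nil
	}
	for _, cp := range s.CustomProviders {
		if cp.Name == s.Provider {
			return nil
		}
	}
	return fmt.Errorf("invalid provider: %q (declared custom: %d)", s.Provider, len(s.CustomProviders))
}
```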


D-156 — Phase 83m: WARN-band cleanup (eight items, two-bucket parallel agent integration)

Date: 2026-05-23 Status: Settled (shipping with Phase 83m)

Where it lives: Bucket A: internal/tools/drivers/mcp/mcp.go (pushIdentity helper), cmd/harbor/cmd_dev.go (closer-chain appends + scopes wiring), cmd/harbor/cmd_dev_hot_reload.go (extended dbSidecarSuffixes), cmd/harbor/cmd_dev_runloop.go (extractSkillKeywords + grantedScopes plumb-through), internal/config/config.go + validate.go (ToolsConfig.GrantedScopes), internal/devdraft/devdraft.go (no-op Close). Bucket B: internal/llm/safety.go (cfg.Timeout-prefer fix), internal/tasks/tasks.go + internal/tasks/drivers/inprocess/inprocess.go + internal/tasks/conformancetest/conformancetest.go + internal/tasks/protocol/registry_projector.go (Task.ToolCount + IncrementToolCount + projector wire + conformance), internal/planner/planner.go + internal/planner/react/react.go (RunContext.OnReasoning callback), internal/runtime/steering/runloop.go (RunSpec.OnToolDispatched hook + per-step reasoning capture + Step.ReasoningTrace copy), cmd/harbor/cmd_dev_runloop.go + harbortest/devstack/devstack.go (wiring + D-094 mirror). Coordinator-owned: docs/CONFIG.md (### tools.granted_scopes), docs/plans/README.md, docs/glossary.md, scripts/smoke/phase-83m.sh.

Decision. Phase 83m closes the eight WARN-tier items the §17.5 audit + Wave 17 operator validation surfaced. The band ships them together because they share the failure mode "the surface works but a hygiene corner is dead." Three settled calls.

1. Two-bucket parallel-agent dispatch is the right scaling shape for ≥5 disjoint hygiene items. Each bucket maps to a package boundary so the agents touch disjoint files. Bucket A (cmd/harbor + tool drivers): items 1, 2, 3, 4, 6. Bucket B (internal/llm + tasks + steering + planner): items 5, 7, 8. Overlap is narrow + additive — cmd_dev_runloop.go and harbortest/devstack/devstack.go are touched by both, but the additions are in disjoint sections (Bucket A adds the keyword extractor + scopes plumb-through; Bucket B adds the tool-dispatched hook + reasoning callback). Cherry-pick integration auto-merges cleanly; the smoke script catches any wiring drift.

2. Item 8's design — RunContext.OnReasoning callback (option b), not a Decision field (option a). The agent picked the side-channel shape for sound reasons: (a) the Decision sum is the planner→runtime instruction contract every future planner concrete (Deterministic, Workflow, Plan-Execute, Supervisor, MultiAgent, HumanApproval) implements — adding a Reasoning field to most variants pollutes every consumer; (b) reasoning is per-step observation, not per-step instruction, and conceptually lives on the run context rather than the action; (c) D-025 holds because the runloop scopes a per-step stepReasoning local on the goroutine stack with a fresh closure each step (no planner-side mutable state, no stale leak from a prior step). The Decision sum stays sealed; the side-channel is opt-in (planners that don't populate reasoning leave Step.ReasoningTrace empty, exactly the pre-83m behavior).

3. Item 7's interface widening (TaskRegistry.IncrementToolCount) is acceptable because the conformance suite catches it. Adding a method to a long-lived registry interface is a contract change. Phase 83m accepts the cost because (a) the V1 inprocess driver is the only consumer today, (b) the internal/tasks/conformancetest suite gains a new subtest (TestIncrementToolCount) that every future driver must pass, and (c) the increment surface populates a wire field (prototypes.Task.ToolCount) that has been dead since Phase 73h; without it the Console renders 0 forever. The N=128 D-025 concurrent-reuse test in the conformance suite asserts atomic correctness.
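The side-channel shape chosen for item 8 can be sketched as follows. RunContext, Decision, Step, runStep, and the two toy planners are hypothetical reductions, not the planner package's types. The point is where stepReasoning lives: it is a local inside runStep, so every step gets a fresh closure. A planner that never calls OnReasoning yields an empty trace.

```go
package main

// RunContext is a hypothetical reduction of planner.RunContext.
type RunContext struct {
	OnReasoning func(text string) // side channel: observation, not instruction
}

// Decision stays a pure instruction; no Reasoning field.
type Decision struct{ Action string }

type Planner interface {
	Next(rc RunContext) Decision
}

type Step struct {
	Decision       Decision
	ReasoningTrace string
}

// runStep scopes stepReasoning to this call: each step gets a fresh closure,
// so nothing from a previous step can leak into this one.
func runStep(p Planner) Step {
	var stepReasoning string
	rc := RunContext{OnReasoning: func(text string) { stepReasoning += text }}
	d := p.Next(rc)
	return Step{Decision: d, ReasoningTrace: stepReasoning}
}

// Two stateless toy planners: one reports reasoning, one does not.
type thinkingPlanner struct{}

func (thinkingPlanner) Next(rc RunContext) Decision {
	rc.OnReasoning("look up the order before refunding")
	return Decision{Action: "call_tool"}
}

type silentPlanner struct{}

func (silentPlanner) Next(RunContext) Decision { return Decision{Action: "finish"} }
```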

Why. Each item is a small WARN individually but together they form a quality posture: identity reuse across MCP push events (item 1) was a multi-isolation footgun; the sqlite-main-file watcher (item 2) made harbor dev reboot-loop more often than expected; lifecycle closers (item 3) leaked goroutines; the FTS5 ranker (item 4) got bad recall on full-sentence queries; the per-call LLM timeout (item 5) ignored the operator's harbor.yaml choice; the GrantedScopes plumb-through (item 6) was a stubbed nil-pass that the catalog filter silently accepted; ToolCount (item 7) was a dead wire field; the reasoning trace (item 8) made the ReasoningReplay=text operator knob structurally ineffective. None block V1.1 individually; together they form the kind of "thousand-paper-cuts" backdrop that erodes operator trust.

Findings I'm departing from. None. Bucket A agent surfaced one MCP SDK behavior worth recording: the SDK's Client.callResourceUpdatedHandler does not propagate the per-call subscription ctx through to the registered handler today. The pushIdentity helper closes the latent multi-tenant cross-stamp bug at the boundary we control (preferring ctx-identity when present, falling back to cached default); a future SDK release that threads the subscription ctx through would land transparently with the helper unchanged.

Protocol additions. None — every item is internal. The prototypes.Task.ToolCount wire field existed pre-83m; this phase just produces a non-zero value for it.
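Item 7's registry contract and the shape of its concurrent conformance check can be sketched with a mutex-guarded map. registry, Get, and checkConcurrentIncrement are hypothetical. The real inprocess driver and TestIncrementToolCount may differ in locking strategy and error types. N=128 matches the concurrency figure the decision cites.

```go
package main

import (
	"fmt"
	"sync"
)

type Task struct {
	ID        string
	ToolCount int
}

// registry is a hypothetical in-process TaskRegistry reduced to what item 7 needs.
type registry struct {
	mu    sync.Mutex
	tasks map[string]*Task
}

func newRegistry(ids ...string) *registry {
	r := &registry{tasks: make(map[string]*Task)}
	for _, id := range ids {
		r.tasks[id] = &Task{ID: id}
	}
	return r
}

// IncrementToolCount bumps the counter under the lock; unknown IDs are an error.
func (r *registry) IncrementToolCount(id string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	t, ok := r.tasks[id]
	if !ok {
		return fmt.Errorf("task %q not found", id)
	}
	t.ToolCount++
	return nil
}

// Get returns a copy so callers never race with later increments.
func (r *registry) Get(id string) (Task, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	t, ok := r.tasks[id]
	if !ok {
		return Task{}, false
	}
	return *t, true
}

// checkConcurrentIncrement mirrors the conformance shape: n goroutines
// increment one task, and the final count must equal n exactly.
func checkConcurrentIncrement(r *registry, id string, n int) error {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = r.IncrementToolCount(id)
		}()
	}
	wg.Wait()
	got, _ := r.Get(id)
	if got.ToolCount != n {
		return fmt.Errorf("ToolCount = %d, want %d", got.ToolCount, n)
	}
	return nil
}
```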


D-157 — Phase 83k: make build + release pipeline rebuild Console; placeholder copy reframed for go install operators

Date: 2026-05-24 Status: Settled (shipping with Phase 83k)

Where it lives: Makefile (build target gains console-build dependency; new build-fast for iterative dev); scripts/release-build.sh (calls make console-build before go build); scripts/check-console-bundle.sh (NEW — staleness gate); .github/workflows/ci.yml (wires the gate into the frontend-e2e job); cmd/harbor/cmd_console.go (placeholder page copy refreshed); docs/plans/README.md + docs/plans/phase-83k-console-release-embed.md + docs/glossary.md + scripts/smoke/phase-83k.sh.

Decision. Operator validation surfaced that cmd/harbor/consoledist/* is gitignored except for .gitkeep (committed in Phase 73m to keep //go:embed happy on a bare checkout). A fresh git clone + go build ./cmd/harbor therefore produces a binary that embeds an empty Console: harbor console serves the synthesized placeholder page, not the real UI. Operators must remember to run make console-build before make build to get a working Console. The release pipeline (scripts/release-build.sh + .github/workflows/release.yml) skips the Console build too, so a tagged release carries an empty bundle unless a stale local artifact happens to be on disk. Three settled calls.

1. make build rebuilds the Console first; make build-fast preserves the iterative-dev shortcut. The default make build invocation (operators' "I cloned the repo, let's run it" path) MUST produce a working binary. Adding console-build as a prereq of build makes the dev loop slower but correct. make build-fast is the iterative shortcut for changes that don't touch web/console/; documented in the Makefile comment block. The go build ./cmd/harbor invocation (Go's own canonical command) bypasses Make and is documented as "embeds whatever consoledist/ holds on disk — caveat operator." This is the smallest possible footprint for the binding behavior change: operators who type make build always get a working binary; operators who reach for go build directly get the documented caveat.

2. The release pipeline rebuilds Console before go build. scripts/release-build.sh runs make console-build in its new step 2 before the existing build (renumbered to step 3). The release artifact ALWAYS carries a fresh Console — a tagged release shipping an empty bundle is exactly the "test stubs as production defaults" failure mode CLAUDE.md §13 forbids, applied one layer over to deployment artifacts. The fix is identical in shape to D-093's make protocol-ts-gen-check discipline: generated artifacts that drift from source fail the build LOUDLY.

3. The placeholder page copy is reframed for the go install reality. The pre-83k copy said "run make console-build, then make build". That was accurate for repo operators but useless for go install operators, who never cd into a Harbor checkout and so have no make targets to run. The new copy has three sections. "If you cloned the repo" now says just make build. "If you ran go install" gives a workaround (clone + make build) and a long-term path (wait for a tagged release with an embedded Console). "Configuration" points to harbor init + docs/CONFIG.md. The page also gains better visual hierarchy, dark mode, code blocks, and real links. The placeholder is now an operator-onboarding surface, not a "build the thing" stub.

Why. Without 83k, every operator who tries the framework outside a repo checkout sees a broken Console. The Protocol surface still works (Bearer-token RPC against the dev binary) but the in-browser UI (the entire DX bet) is dark. The release pipeline + Makefile fix makes "I want to try Harbor" a one-command experience. The placeholder rewrite makes "I tried it and the UI is missing" a 30-second self-recovery instead of a filed GitHub issue.

Findings I'm departing from. None.

Protocol additions. None — 83k is build pipeline + operator-facing copy.


D-158 — Phase 83p: Settings page two-group layout closes the F1 add-runtime-form regression

Date: 2026-05-24 Status: Settled (shipping with Phase 83p)

Where it lives: web/console/src/lib/settings/state.svelte.ts (SETTINGS_SECTIONS entries gain a group discriminator; two new exported helpers consoleLocalSections() + runtimePostureSections()); web/console/src/routes/(console)/settings/+page.svelte (cards loop splits into two groups — console-local outside <PageState>, runtime-posture inside it); web/console/tests/settings-page.spec.ts (new test asserting the add-form is reachable in the disconnected state).

Decision. The post-83k visual walkthrough surfaced Bug F1: a fresh harbor console operator sees "Not connected to a Harbor Runtime · Attach one in Settings." on every page. Clicking through to Settings shows the SAME placeholder + a circular link. The + Add Runtime form (ConnectedRuntimesCard) existed in the codebase but the page template wrapped the WHOLE cards loop in <PageState status={settings.status}>. When the operator had no Runtime attached, settings.status === 'disconnected' and <PageState> short-circuited the children render — hiding the form behind the same placeholder it was supposed to help the operator escape.

The bug is purely structural — SettingsState.load()'s own docstring already pinned the intended split: "The Console-local sections (Connected Runtimes, Per-Runtime Auth, Appearance, …) do NOT depend on the runtime posture, only the four read-only posture cards do." The state-machine code is correct; the template ignored its docstring.

Two settled calls.

1. The split lives on the section definitions, not in a template branch. Adding group: 'console-local' | 'runtime-posture' to each SETTINGS_SECTIONS entry — plus the two consoleLocalSections() / runtimePostureSections() helpers — pushes the discrimination into the data model. The template iterates each subset once. A future section addition just sets its group field; the template need not change. This is the §4.4 seam-pattern read one layer over: the data shape carries the dependency, not the consumer.

2. The Settings page is the ONE page where <PageState> cannot wrap everything. Every other Console page can degrade to "disconnected — attach in Settings"; Settings itself MUST NOT degrade the Connected Runtimes section because the connection happens there. The template now reflects this: console-local sections render unconditionally; runtime-posture sections still route through <PageState> (preserving the per-page four-state contract D-121 mandates).

Why. Without 83p the Console is unusable for a fresh harbor console operator: the only path to attach a Runtime through the UI is gated by Settings, but Settings hides the gate behind a "not connected" placeholder. Operators must edit console_default.yaml by hand pre-boot — defeating the entire harbor console zero-config DX bet. This is exactly the "two parallel implementations of the same conceptual feature" trap §13 forbids, read one layer over: two state contracts (state-machine docstring vs template wrapper) drifted, and the operator paid the cost.

Findings I'm departing from. None.

Protocol additions. None — the fix is template + state-file shape only.


D-159 — Phase 83q: Playground sidebar entry + breadcrumb derives from the NAV constant

Date: 2026-05-24 Status: Settled (shipping with Phase 83q)

Where it lives: web/console/src/routes/(console)/+layout.svelte (the NAV constant — adds the Playground entry to the EXECUTION cluster); docs/design/console/CONVENTIONS.md §2 (rewrites the Playground bullet that explicitly declared it was NOT a sidebar entry); web/console/tests/harness.spec.ts + web/console/tests/wave13.spec.ts (cardinality bump + Playground entry assertion).

Decision. The post-83k visual walkthrough surfaced Bug F2 + Nit N1. The Playground route existed but was unreachable from the sidebar nav, and the page's breadcrumb showed lowercase playground instead of Playground. The root cause for both: the Console's (console)/+layout.svelte defines the NAV constant (cluster → items) and derives the breadcrumb's crumbLabel from the SAME NAV by matching the first URL segment to an item.href. NAV was missing { label: 'Playground', href: '/playground' }, and adding that one entry closed both bugs simultaneously. F2 closed because the entry now renders in the sidebar. N1 closed because the breadcrumb lookup now returns "Playground" instead of falling through to the lowercase URL segment.

Why this matters. The fix is structurally satisfying — one entry in one constant closes both bugs. The pre-83q failure mode existed because CONVENTIONS.md §2 declared "Playground is NOT a sidebar entry" (a Phase 73n design call) without anyone updating the doc when the page actually shipped as a Console-bound surface. The decision is a 5-line code change + a doc-truth update.

Findings I'm departing from. None.

Protocol additions. None — Console-internal.


D-160 — Phase 83r: Disconnected-state hygiene + isDisconnected() predicate

Date: 2026-05-24 Status: Settled (shipping with Phase 83r)

Where it lives: web/console/src/lib/connection.ts (new isDisconnected() predicate + DISCONNECTED_TOOLTIP constant); web/console/src/lib/components/ui/PageState.svelte (vertical-centring CSS with min-height: 40vh on disconnected/empty/error branches); web/console/src/lib/components/ui/StatusChip.svelte + StateFacetChips.svelte (new desaturated prop that flips data-kind to neutral); web/console/src/lib/components/runtime/CostRollupCard.svelte (no synthetic $0.00 when disconnected); web/console/src/lib/components/live-runtime/run-composer.svelte (disabled textarea + buttons + tooltip when disconnected); 13 page Svelte files standardised on the predicate; web/console/tests/disconnected-state.spec.ts (new Playwright spec covering W1/W2/W3 + N5/N7/N8/N9/N2).

Decision. The post-83k walkthrough surfaced a cluster of disconnected-state failure modes: action buttons enabled with no Runtime (W2/W3), synthetic $0.00 cost data even when disconnected (W1), two stacked empty-state messages on Tools (N5), inconsistent KPI dashes between Agents and Tools (N4), full-state status chip colors when meaningless (N8), "— 0 artifacts" subtitles when no Runtime attached (N9), and empty-state placeholders hugging the top of the viewport instead of centring (N10). The pattern was the same: each page reached for its own disconnected check, sometimes none at all.

Two settled calls.

1. The predicate lives in connection.ts, not a new helper file. connection.ts already exposes resolveConnection() — adding isDisconnected() and DISCONNECTED_TOOLTIP next to it puts the predicate where consumers already look. Pages compose it via $derived(connection === null) locally. The shared tooltip constant prevents the five-different-strings drift the walkthrough pinned.

2. The shared <PageState> component stays the visual contract. 83r adds vertical-centring CSS to its disconnected / empty / error branches (min-height: 40vh) so the placeholder appears in the middle of the viewport, not hugging the top. Loading keeps min-height: auto so skeleton rows don't stretch.

Bundled production-bug fix (§17.6): during the 83r pass the agent surfaced a pre-83r ESLint break in web/console/src/routes/(console)/settings/+page.svelte line 94 — the placeholder const _ = [consoleLocalSections, runtimePostureSections]; I added in Phase 83p to keep the helper imports alive. ESLint flagged it as unused-variable. Fixed inline (void [...] instead of const _ = [...]) so the new svelte-check pass stays clean.

Findings I'm departing from. None.

Protocol additions. None.


Date: 2026-05-24 Status: Settled (shipping with Phase 83s)

Where it lives: Inline across 13 page Svelte files (canonical "Save view" button + "Save current as…" placeholder + removed inline Disconnected · no Runtime attached footers); web/console/src/routes/(console)/playground/[session_id]/+page.svelte (the playground detail page — only walkthrough-surfaced changes here are 83s-shaped); scripts/smoke/phase-83s.sh enumerates the 13 pages + asserts the canonical label + single-footer invariant per route.

Decision. The walkthrough N2 + N7 nits were both shape-consistent: the same concept (a saved-view save gesture, a disconnected indicator) drifted into eight different phrasings + two stacked indicators across pages. The pre-83s drift was real but not load-bearing — operators could still use the surfaces — but it eroded the visual contract CONVENTIONS.md §3 + §6 explicitly mandate.

Two settled calls.

1. The canonical pair is "Save view" (button) + "Save current as…" (input placeholder). Settled by enumerating the eight pre-83s phrasings and picking the shortest one that reads as an action (verb "Save", noun "view"). The eight drifted phrasings — "Save current as…" / "Save view as…" / "Save view" / "Save snapshot" / "Save filter" / "Save" / "Save current as…" / "Bookmark section" — are all derivatives or shortenings of the same gesture; the canonical pair is one form of the dominant shape.

2. The viewport-fixed ConnectionFooter is the single source of truth for the disconnected indicator. Every per-page inline copy of "Disconnected · no Runtime attached" is removed; pages now show ONE indicator per viewport (the fixed footer) instead of two stacked ones. The fixed footer's identity-aware shape (it already handles the disconnected, partial-scope, and full-attach states) is the canonical surface; the inline duplicates were vestigial.

Why now. N2 + N7 are tied to 83r's disconnected-state pass — both touch per-page footers + filter rows. Shipping them together avoids a second per-page edit pass for unrelated nits.

Findings I'm departing from. None.

Protocol additions. None.


D-162 — Phase 83v: Runtime CORS allowlist with default-deny posture + dev-only escape hatch

Date: 2026-05-24 Status: Settled (shipping with Phase 83v)

Where it lives: internal/protocol/transports/cors/ (new package — Wrap() middleware factory + Config shape + tests); internal/config/config.go (ServerConfig.AllowedOrigins []string + ServerConfig.CORSDevAllowAny bool); internal/config/validate.go (origin shape validation + * rejection unless dev flag); cmd/harbor/cmd_dev.go (wraps the protocol mux + SSE handler at bootDevStack); harbortest/devstack/devstack.go (D-094 mirror); test/integration/phase83v_cors_test.go (cross-origin preflight end-to-end); docs/CONFIG.md (server.allowed_origins + server.cors_dev_allow_any sections with production-security note); scripts/smoke/phase-83v.sh.

Decision. The round-2 walkthrough (Phase 83t) pinned F4: cross-origin requests from the Console (:18790) to a remote Runtime (:18080) were blocked at the browser CORS preflight stage. A repo-wide grep (grep -rn 'Access-Control\|cors' --include='*.go') returned zero matches, so no Go code handled CORS at all. The D-091 multi-process posture was advertised in the docs but structurally broken at the wire. 83v closes the gap with operator-configurable CORS that defaults to deny.

Three settled calls.

1. Default deny. Empty server.allowed_origins (the default) emits no CORS headers — same-origin only, which preserves the existing co-resident harbor console mode. Operators opt in by listing exact origins. No silent broadening of the wire's reachability.

2. Per-origin echo, never *, in production. The middleware echoes the request's Origin header verbatim after an exact-match check against the allowlist. Access-Control-Allow-Credentials: true (required for the Bearer-token + future cookie auth path) is incompatible with * per the CORS spec, which forces the per-origin shape. The validator rejects * in server.allowed_origins unless the operator ALSO sets the dev-only escape hatch.

3. Dev-only wildcard escape hatch is explicit + loud. server.cors_dev_allow_any: true is the single sanctioned * path, intended for iterative harbor dev workflows where the Console origin changes per browser tab. When enabled, every boot prints a stderr banner ([DEV-ONLY CORS WILDCARD — DO NOT USE IN PRODUCTION]). The validator + the banner + the explicit godoc warning on the field together prevent silent prod leakage.

Why now. The pre-83v state is a §13 forbidden-practice tripwire: two parallel postures (advertised "Console can attach any remote Runtime" + actual "blocked at preflight"). 83v makes the documented posture real and ships it in the same wave as the Console DB chicken-and-egg fix (D-163) so the multi-process surface works end-to-end in one cut.

Findings I'm departing from. None.

Protocol additions. None — this is a transport-layer change, not a method addition.


D-163 — Phase 83u: Console DB chicken-and-egg fix — attachConnection() helper writes localStorage first, DB upsert is best-effort

Date: 2026-05-24 Status: Settled (shipping with Phase 83u)

Where it lives: web/console/src/lib/connection.ts (new attachConnection(baseURL, opts) helper + AttachConnectionOptions interface); web/console/src/lib/settings/console_db.svelte.ts (addRuntime rewires through attachConnection() first + adds private #catchUpAddressBook() invoked from load()); web/console/src/lib/components/settings/ConnectedRuntimesCard.svelte (accepts addWarning + onaddsuccess props + renders info banner); web/console/src/routes/(console)/settings/+page.svelte (wires props + reload-on-success); web/console/src/lib/tests/connection.spec.ts (4 new unit tests); web/console/tests/settings-page.spec.ts (new test (h) — disconnected-boot → Add → reload → connected); scripts/smoke/phase-83u.sh.

Decision. The round-2 walkthrough pinned F3: console_db.svelte.ts::addRuntime called this.#db.runtimes.upsert(...) on a Console DB that required an active RuntimeConnection to derive its per-operator AES key. Operator without a Runtime → no connection → DB stays closed → addRuntime threw "Console DB not open — attach to a Runtime first". The form was reachable (Phase 83p) but structurally non-functional: operator could not attach a Runtime through the UI without first attaching a Runtime through the UI.

Two settled calls.

1. localStorage is the source of truth for the active connection; Console DB is the convenience address book. The Connected Runtimes form's two effects split cleanly: (a) "make the Console talk to this Runtime" → write harbor.runtime.* keys to localStorage (no DB dependency); (b) "remember this Runtime for later" → upsert into the Console DB's runtime_registry table (only works after the DB has unlocked via a connected operator). The form does (a) first, then attempts (b) and degrades to a non-fatal warning if the DB is still locked.

2. Page reload after attach; address-book catch-up on next DB load. A connection change requires a reload (every page subscribes to the connection on mount). The form triggers it explicitly. On the reloaded page the Console DB opens via the now-active connection, and #catchUpAddressBook() runs on load() — if the active connection is not yet in the address book, it's inserted with is_default: 1. The operator's first-attach gesture round-trips through to a persisted address-book entry without a second user gesture.
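The write ordering in call 1 and the catch-up in call 2 are language-agnostic. The sketch below therefore models them in Go, even though the real attachConnection() and #catchUpAddressBook() are TypeScript in the Console. kvStore stands in for localStorage. The harbor.runtime.baseURL key is a hypothetical member of the harbor.runtime.* family. addressBook, lockedDB, and openDB are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

var errDBLocked = errors.New("console DB not open")

// kvStore stands in for localStorage (hypothetical).
type kvStore map[string]string

// addressBook stands in for the Console DB's runtime registry (hypothetical).
type addressBook interface {
	Upsert(baseURL string, isDefault bool) error
}

type lockedDB struct{} // the DB before any Runtime has unlocked it

func (lockedDB) Upsert(string, bool) error { return errDBLocked }

type openDB struct{ entries map[string]bool } // baseURL -> is_default

func (d *openDB) Upsert(baseURL string, isDefault bool) error {
	d.entries[baseURL] = isDefault
	return nil
}

const activeKey = "harbor.runtime.baseURL" // hypothetical key name

// attachConnection writes the active connection first (no DB dependency),
// then tries to remember it; a DB failure is downgraded to a warning.
func attachConnection(ls kvStore, db addressBook, baseURL string) (warning string, err error) {
	if baseURL == "" {
		return "", errors.New("baseURL is required")
	}
	ls[activeKey] = baseURL
	if upErr := db.Upsert(baseURL, true); upErr != nil {
		return fmt.Sprintf("attached; address book will catch up after reload (%v)", upErr), nil
	}
	return "", nil
}

// catchUpAddressBook runs once the DB opens: insert the active connection
// as the default if it is not already recorded.
func catchUpAddressBook(ls kvStore, db *openDB) {
	baseURL, ok := ls[activeKey]
	if !ok {
		return
	}
	if _, known := db.entries[baseURL]; !known {
		db.entries[baseURL] = true
	}
}
```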

Why now. F3 is the load-bearing showstopper that blocks the multi-process posture from working at all. D-162 (CORS) makes the wire reachable; D-163 makes the form usable. Without both, the documented "Console attach to remote Runtime" flow doesn't work end-to-end. Shipping them in the same wave is the rule (§13 — no primitive without its consumer; here the consumer of the CORS allowlist is the Settings add-form).

Findings I'm departing from. None.

Protocol additions. None — this is a Console-local layering fix; no wire-shape change.


D-164 — Phase 83w: Friendly unknown_method info banner + mcp.servers.list wire surface

Date: 2026-05-24 Status: Settled (shipping with Phase 83w)

Where it lives:

  • F5 (Console side, Agent B): web/console/src/lib/components/ui/PageState.svelte (new 'info' branch added to PageStatus union); web/console/src/lib/protocol/errors.ts (new isUnknownMethod(err) helper); web/console/src/routes/(console)/live-runtime/+page.svelte + web/console/src/routes/(console)/playground/[session_id]/+page.svelte (special-case unknown_method on topology.snapshot → route to PageState info branch with "Topology view not available on this Runtime — planner/RunLoop runtime, not engine-graph" copy).
  • F6 (Go side, Agent A): cmd/harbor/cmd_dev.go::bootDevStack constructs the Phase 73k MCPSurface from the boot-time *mcp.Registry and threads it into transports.NewMux via transports.WithMCPSurface(mcpSurface); harbortest/devstack/devstack.go (D-094 mirror); internal/mcpconsole/mcpconsole.go (new NoOAuthAccessor type — read-only methods work, OAuth-flow methods fail loudly with ErrNoOAuthConfigured per §13 fail-loud); test/integration/phase83w_mcp_servers_list_test.go; scripts/smoke/phase-83w.sh.

Decision. The round-2 walkthrough pinned two wire-surface gaps that surfaced as scary red ERROR PageStates on the operator's most-used debugging surfaces (Live Runtime + Playground + MCP Connections). F5 was a Console-side error-mapping miss; F6 was a missing Go-side method handler. Both fit naturally in one phase because both are wire-surface coherence and both produce identical operator-visible symptoms (red error on a page that should render fine).

Two settled calls.

1. The 'info' branch is a first-class addition to PageState, not a per-page mapping. Agent B chose option (a) of the plan: add 'info' to the PageStatus union alongside the existing five states (disconnected/loading/error/empty/ready). Rationale: two existing call sites need the same shape today, and D-164-style "Runtime does not host this surface" cases are expected to recur. Adding ~12 LOC to the single async-state contract is cheaper than duplicating per-page mappings, and it preserves PageState as the canonical contract per CONVENTIONS.md §4. The info branch carries no Retry button, because "Retry" makes no sense for a surface that is fundamentally not applicable.

2. The mcp.servers.list handler reuses the existing Phase 73k MCPSurface — no new package. The *mcp.Registry already exists at boot (Phase 83g); the wire-side MCPSurface dispatcher already exists from Phase 73k. F6 is wiring-only: construct MCPSurface from the boot-time registry + thread it into transports.NewMux. For the V1 harbor dev posture (no OAuth providers), mcpconsole.NoOAuthAccessor provides the read-only access pattern; OAuth-flow methods (start/finish/refresh) fail loudly with ErrNoOAuthConfigured rather than returning a stub. Fail-loud per §13.

Why now. Both gaps surface as red errors on operator-visible pages that worked fine in every prior walkthrough except real-data round-2. The cluster of "Runtime is healthy but the Console shows error" is the most damaging UX regression in the post-Phase-83p surface. F5 + F6 together close it.

Findings I'm departing from. None.

Protocol additions. mcp.servers.list — the read-only list shape (existing prototypes.MCPServerRow / prototypes.MCPServersListResponse). Identity-required; no new scope.


D-165 — Phase 83x: Per-page real-data layout polish + cross-stack created_at / session-row fixes

Date: 2026-05-24 Status: Settled (shipping with Phase 83x)

Where it lives:

  • Console side: web/console/src/lib/components/live-runtime/status-counter-strip.svelte (N14 "(now)" suffix + W10 status derivation); web/console/src/lib/components/tasks/KanbanBoard.svelte + web/console/src/lib/protocol/tasks.ts (W7 Complete column); web/console/src/lib/components/tools/ToolOverviewCard.svelte (N12 "In-flight (now)" relabel + N13 --size-col-reliability width token); web/console/src/lib/tokens.css (new column-width token); per-page +page.svelte files for agents (W11), artifacts (W5 grid layout), events (W9 driver-name copy), live-runtime (W10), memory (W4 ellipsis), overview (N11 "(now)" suffixes), tools (N12/N13).
  • Go side: cmd/harbor/cmd_dev_executor.go::projectForLLM (W6 created_at: time.Now().UTC() on heavy-tool artifact promotion); internal/protocol/artifacts.go::handlePut (W6 created_at: s.clock() on artifacts.put upload); cmd/harbor/cmd_dev.go::bootDevStack (W8 idempotent dev-session Open after registry construction — swallows ErrSessionAlreadyOpen).
  • Tests + smoke: web/console/tests/tasks-page.spec.ts (5-column kanban); scripts/smoke/phase-83x.sh (170-line static tripwire across all 12 items).

Decision. The round-2 walkthrough pinned 12 polish items (W4-W11 + N11-N14). None is individually a showstopper, but together they form an "every page has a paper cut" backdrop that erodes operator trust. Two items (W6 + W8) span Console + Go because the symptom shows up in the Console while the root cause is in the Go-side data source. Per §17.6, both sides are fixed in the same PR.

Two settled calls.

1. Empty-state copy carries the operator hint, not a "fix it for you" code path. W9 (events) + W11 (agents) both surface "this Runtime is configured for a different posture; that's not a bug." Rather than auto-switching the events driver or auto-registering a synthetic agent row, the empty-state copy names the configuration knob (events.driver: durable) or the posture (synthetic-default agent). Operators learn the model rather than chasing a phantom bug.

2. The W10 status derivation reads the live status-counter strip, not the page-level PageStatus. A topology.snapshot failure pre-83x poisoned the session-detail right-rail with Status: error even though the task itself completed cleanly. 83x derives the session status from the strip's aggregate counts — Complete if any completed task is present, Running if any in-flight, etc. The page's own PageStatus (which reflects topology fetch outcome, not session state) no longer drives the rail's Status field.
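The W10 rule in call 2 can be modeled as a pure function of the strip's aggregate counts. The shipped logic lives in status-counter-strip.svelte; this is a Go rendering for illustration only. StripCounts is hypothetical, the Complete-before-Running precedence simply follows the order the entry lists them, and the Failed and Idle fallbacks are assumptions standing in for the entry's "etc.". The key property is that the page-level PageStatus (topology fetch outcome) is not an input at all.

```go
package main

import "fmt"

// StripCounts is a hypothetical stand-in for the aggregate counts the
// live status-counter strip renders.
type StripCounts struct {
	InFlight  int
	Completed int
	Failed    int
}

// deriveSessionStatus reads only the strip's counts; a topology.snapshot
// failure cannot leak into the right-rail Status field.
// Precedence (Complete, then Running) follows D-165's wording; the
// Failed and Idle branches are assumptions.
func deriveSessionStatus(c StripCounts) string {
	switch {
	case c.Completed > 0:
		return "Complete"
	case c.InFlight > 0:
		return "Running"
	case c.Failed > 0:
		return "Failed"
	default:
		return "Idle"
	}
}

func main() {
	fmt.Println(deriveSessionStatus(StripCounts{Completed: 1})) // Complete
}
```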

Why now. Round-3 walkthrough validates the multi-process posture (D-162 + D-163 + D-164) end-to-end; 83x ensures the per-page surfaces it lands on read honestly under real data. Shipping all four phases (83u + 83v + 83w + 83x) in one wave gives round-3 a clean target.

Findings I'm departing from. None.

Protocol additions. None — W6's created_at field already exists on prototypes.Artifact; the change is to populate it.


D-166 — Round-7 F11: Playground multimodal artifact input — runtime inlines image bytes, per-MIME dispatcher routes the rest

Date: 2026-05-25 Status: Settled (shipping with round-7 F11)

Where it lives:

  • Wire types: internal/protocol/types/control.go::StartRequest.InputArtifactIDs (new []string field, omitempty); round-trip test in internal/protocol/types/types_test.go.
  • Tasks subsystem: internal/tasks/tasks.go::SpawnRequest.InputArtifactIDs + tasks.Task.InputArtifactIDs (persisted on the FSM record); internal/tasks/drivers/inprocess/inprocess.go folds the slice into spawnRequestContentHash + spawnRequestsEqual so idempotency keys still distinguish same-key/different-attachments correctly.
  • Tool descriptor: internal/tools/tools.go::Tool.HandlesMIME (new []string field) + Tool.MatchesMIME(mime string) helper supporting type/* wildcards.
  • Planner materialization layer: internal/planner/multimodal.go::MaterializeInputContent(goal, []InputArtifactView, ToolCatalogView) llm.Content, the per-MIME dispatcher. internal/planner/planner.go::RunContext.InputArtifacts carries the pre-resolved views.
  • ReAct integration: internal/planner/react/prompt.go first-turn user message uses the materializer (replaces the unconditional textContent(userContent) wrap).
  • Run loop: internal/runtime/steering/runloop.go clears spec.Base.InputArtifacts after the first step so subsequent steps see an empty slice (no re-inlining of bytes across the run's planner loop).
  • Pre-fetch wiring: cmd/harbor/cmd_dev_runloop.go::perTaskRunLoopDriver.resolveInputArtifacts (reads task.InputArtifactIDs, calls ArtifactStore.GetRef for metadata + Get for image bytes); cmd/harbor/cmd_dev.go::bootDevStack plumbs the shared artStore into the driver; harbortest/devstack/devstack.go D-094 mirror.
  • Console: web/console/src/lib/protocol/client.ts::ControlNamespace.start gains inputArtifactIDs?: string[]; web/console/src/routes/(console)/playground/[session_id]/+page.svelte::buildChatClient.sendMessage plumbs the composer's chat-attach uploads through.
  • Tests: ten unit tests covering the materializer's per-MIME branches (image inline / pdf file-part / audio file-part / catch-all stub-text), nil-catalog defense, mixed-attachment ordering, empty-goal text-elision, and handles_mime Fetch.Tool population (internal/planner/multimodal_test.go); MIME-matcher test (internal/tools/tools_test.go::TestTool_MatchesMIME).
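Folding InputArtifactIDs into the idempotency content hash (the inprocess driver bullet above) can be sketched as follows. contentHash and requestsEqual are hypothetical analogues of spawnRequestContentHash and spawnRequestsEqual, and SpawnRequest is trimmed to two fields. Length-prefixing each field prevents ("ab","c") and ("a","bc") from hashing identically; treating attachment order as significant is an assumption.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"slices"
)

// SpawnRequest is a hypothetical two-field trim of tasks.SpawnRequest.
type SpawnRequest struct {
	Goal             string
	InputArtifactIDs []string
}

// contentHash length-prefixes every field and folds the artifact IDs in
// order, so same-key/different-attachment requests hash differently.
func contentHash(r SpawnRequest) string {
	h := sha256.New()
	var n [8]byte
	writeLen := func(l int) {
		binary.BigEndian.PutUint64(n[:], uint64(l))
		h.Write(n[:])
	}
	writeLen(len(r.Goal))
	h.Write([]byte(r.Goal))
	writeLen(len(r.InputArtifactIDs))
	for _, id := range r.InputArtifactIDs {
		writeLen(len(id))
		h.Write([]byte(id))
	}
	return hex.EncodeToString(h.Sum(nil))
}

// requestsEqual compares the same fields the hash covers.
func requestsEqual(a, b SpawnRequest) bool {
	return a.Goal == b.Goal && slices.Equal(a.InputArtifactIDs, b.InputArtifactIDs)
}

func main() {
	a := SpawnRequest{Goal: "summarize", InputArtifactIDs: []string{"art_1"}}
	b := SpawnRequest{Goal: "summarize", InputArtifactIDs: []string{"art_2"}}
	fmt.Println(contentHash(a) == contentHash(b), requestsEqual(a, b)) // false false
}
```

A nil slice and an empty slice hash and compare equal here, which matches the omitempty wire behaviour for text-only starts.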

Decision. When the Playground operator uploads a file alongside a chat message, the runtime materializes the multimodal llm.Content BEFORE handing the prompt to the planner. The per-MIME dispatcher routes:

  • image/* → llm.ImagePart{DataURL: data:<mime>;base64,<bytes>}: bytes are inlined so vision-capable providers actually see the image (Path 1 below).
  • application/pdf → llm.FilePart{Artifact: &ArtifactStub{...}}: providers with native PDF support (Anthropic) translate the ref; providers without it get the canonical ArtifactStub-JSON text description.
  • audio/* → llm.AudioPart{Artifact: &ArtifactStub{...}}: same graceful-degradation rule.
  • everything else → ArtifactStub text block on the user message — the LLM reads the stub JSON (ref + MIME + size + optional Fetch.Tool pointer) and routes to a matching tool via the catalog.

The Fetch.Tool annotation on every emitted ArtifactStub is populated from the supplied ToolCatalogView: the first tool whose HandlesMIME matches the artifact's MIME wins. Operators register audio.transcribe once with HandlesMIME: ["audio/*"] and the LLM gets an explicit "use this tool for this ref" hint — no LLM-side catalog-discovery guesswork.
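The routing table above, plus the heavy-output safety net that call 1 describes for large image inputs, can be sketched as a pure function. PartKind and its string values are hypothetical labels for which llm.Content part gets emitted; the real dispatcher returns llm.Content. The sketch assumes "32KB default" means 32 KiB measured on the data URL text, and models materializeRequest as a separate later pass rather than part of routing.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// heavyThreshold assumes the "32KB default" means 32 KiB of data-URL text.
const heavyThreshold = 32 * 1024

// PartKind is a hypothetical label for which llm.Content part gets emitted.
type PartKind string

const (
	ImageInline PartKind = "image-inline" // llm.ImagePart with a data: URL
	FileRef     PartKind = "file-ref"     // llm.FilePart carrying an ArtifactStub
	AudioRef    PartKind = "audio-ref"    // llm.AudioPart carrying an ArtifactStub
	StubText    PartKind = "stub-text"    // ArtifactStub JSON as a text block
)

// route is pure: it sees pre-resolved bytes and does no I/O.
func route(mime string, data []byte) (PartKind, string) {
	switch {
	case strings.HasPrefix(mime, "image/"):
		return ImageInline, "data:" + mime + ";base64," + base64.StdEncoding.EncodeToString(data)
	case mime == "application/pdf":
		return FileRef, ""
	case strings.HasPrefix(mime, "audio/"):
		return AudioRef, ""
	default:
		return StubText, ""
	}
}

// applySafetyNet models the later materializeRequest pass: an inlined
// image whose data URL crosses the threshold is rewritten to stub form.
func applySafetyNet(kind PartKind, dataURL string) PartKind {
	if kind == ImageInline && len(dataURL) > heavyThreshold {
		return StubText
	}
	return kind
}

func main() {
	for _, m := range []string{"image/png", "application/pdf", "audio/wav", "text/csv"} {
		kind, url := route(m, []byte("hi"))
		fmt.Println(m, applySafetyNet(kind, url))
	}
}
```

Because base64 expands input by 4/3, under these assumptions a raw PNG of about 24 KiB already produces a data URL just over 32 KiB and falls back to stub form.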

Three settled calls.

1. Path 1 (runtime inlines image bytes) over Path 2 (driver-side resolution). Path 1 keeps the LLM driver layer unchanged — bifrost's existing translateImagePart already forwards ImagePart.DataURL to the provider via its native image block. Path 2 would have required every driver to grow an ArtifactStore handle. The trade-off: Path 1 violates D-026 ("no inline bytes") in spirit FOR INPUTS. The carve-out is deliberate: D-026 was written for the heavy-output flood that returning a 50MB tool result inlined as text would cause; operator-uploaded inputs are explicit, single-shot, bounded by the upload size cap, and the bytes have to reach the provider one way or the other. The safety net's materializeRequest STILL fires when an input DataURL crosses the heavy-output threshold (32KB default) — large image inputs get rewritten back to ArtifactStub form, preserving the safety contract. The Path 1 carve-out is therefore: small inputs inline (the common case); large inputs round-trip through the existing materializer pass (graceful degradation to ref-as-text).

2. The per-MIME dispatcher lives in the planner package, not in the run loop or LLM driver. Three reasons. (a) The planner is the unit that owns the prompt-assembly contract; routing a Content sum-type by MIME is a prompt concern. (b) The dispatcher is pure (MaterializeInputContent takes pre-resolved views; no I/O), so it stays inside the planner's synchronous prompt-assembly path. (c) Future planners (PlanExecute, Workflow, ...) reuse the same dispatcher; pushing it into the runloop would force per-planner duplication.

3. HandlesMIME is an opt-in descriptor field, not a registry mechanism. Operators register the tool with the MIME(s) it consumes; the materializer reads the catalog at prompt-build time. No registry / no global map / no init-side hook. Wildcards are bounded (type/* only — no full-*/*, no subtype glob) so an operator typo can't accidentally claim every MIME on the planet. Empty HandlesMIME keeps the legacy V1 behaviour: the LLM finds the binding via the catalog description, the Fetch.Tool annotation stays nil.
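The bounded-wildcard rule in call 3 and the first-match Fetch.Tool selection described earlier can be sketched together. This Tool type and fetchToolFor are hypothetical trims, not the shipped tools.Tool.MatchesMIME. Exact matches and type/* patterns are honoured; */* and any other glob are rejected, and an empty catalog match leaves Fetch.Tool nil.

```go
package main

import (
	"fmt"
	"strings"
)

// Tool is a hypothetical trim of tools.Tool to the fields used here.
type Tool struct {
	Name        string
	HandlesMIME []string
}

// MatchesMIME honours exact MIME strings and "type/*" wildcards only;
// "*/*" and subtype globs never match.
func (t Tool) MatchesMIME(mime string) bool {
	for _, pattern := range t.HandlesMIME {
		if pattern == mime {
			return true
		}
		typ, ok := strings.CutSuffix(pattern, "/*")
		if ok && typ != "" && !strings.Contains(typ, "*") && strings.HasPrefix(mime, typ+"/") {
			return true
		}
	}
	return false
}

// fetchToolFor returns the first catalog tool that matches, or nil so the
// stub's Fetch.Tool annotation stays empty.
func fetchToolFor(catalog []Tool, mime string) *string {
	for _, t := range catalog {
		if t.MatchesMIME(mime) {
			name := t.Name
			return &name
		}
	}
	return nil
}

func main() {
	catalog := []Tool{
		{Name: "catch_all", HandlesMIME: []string{"*/*"}},
		{Name: "audio.transcribe", HandlesMIME: []string{"audio/*"}},
	}
	if name := fetchToolFor(catalog, "audio/wav"); name != nil {
		fmt.Println(*name) // audio.transcribe
	}
	fmt.Println(fetchToolFor(catalog, "text/csv") == nil) // true
}
```

The catch_all entry shows the typo guard: a */* registration is silently inert rather than claiming every MIME.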

The mid-run user_message gap (deliberate V1.1 scope). The Round-6 F10 queue-vs-steer feature lets an operator inject a user_message mid-run. The user_message payload today carries only {message: string} — extending it to carry attachments would (a) require an addition to the steering verb's wire shape (internal/runtime/steering/taxonomy.go), (b) thread the artifact refs through ControlSignals.UserMessages (currently []string), and (c) materialize on the appropriate planner turn (NOT the first turn — the carry-over is later). All three are tractable but they touch the steering inbox semantics which the V1.1 round-7 cut deliberately leaves alone. The Console's sendMessage throws a clear error when the operator selects 'steer' with attachments — no silent degradation.

Why now. F11 was the deferred half of the round-6 Playground walkthrough (the F7 commit left a TODO multimodal marker in sendMessage). The user explicitly asked for it; the F10 queue-vs-steer feature shipped the composer's attach control surface; the architectural blocker (D-026 inline-bytes interpretation for inputs vs outputs) was the only design question, which Path 1 resolves cleanly.

Findings I'm departing from. None.

Pre-existing limitation, not caused by F11. The bifrost+OpenRouter+anthropic/claude-haiku-4.5 vision path returns HTTP 400 from the upstream provider — the existing bifrost/conformance_test.go::TestE2E_Bifrost_LiveSixProviderConformance/multimodal subtest fails identically against this build, with or without F11's planner-side changes. The F11 materialization pipeline is verified correct via the unit-test branches AND by the live CSV round-trip (operator uploads text/csv; LLM sees the ArtifactStub correctly and responds about the attached file). Image-input via OpenRouter for the specific haiku-4.5 model is a separate provider/driver issue worth its own bug filing; image-input via a different vision-capable provider should work because the request shape is provider-canonical.

Protocol additions. StartRequest.InputArtifactIDs []string (json:"input_artifact_ids,omitempty") — opt-in; text-only starts elide the field from the wire body entirely (the omitempty tag honors the V1 wire shape).


D-167 — Phase 107c: native provider tool-calling cutover for the React planner + deferred-loading meta-tools

Date: 2026-05-28 Status: Settled (shipping with Phase 107c)

Where it lives:

  • LLM wire surface: internal/llm/llm.go — new types ToolDeclaration{Name, Description, Schema json.RawMessage} and ToolCallStructured{ID, Name, Args json.RawMessage}; new fields CompleteRequest.Tools []ToolDeclaration + CompleteRequest.ParallelToolCalls bool + CompleteResponse.ToolCalls []ToolCallStructured; new ChatMessage.ToolCallID *string field that round-trips a provider call id back into the next-turn RoleTool message.
  • LLM safety: internal/llm/safety.go extends the heavy-output guard so a ToolCallStructured.Args payload above the heavy threshold trips ErrContextLeak the same way an oversize tool result would; new internal/llm/errors.go sentinels (ErrToolCallArgsTooLarge etc.) keep the failure mode named.
  • bifrost driver: internal/llm/drivers/bifrost/translate.go maps CompleteRequest.Tools + ParallelToolCalls → upstream tool block and assembles CompleteResponse.ToolCalls from the upstream tool_calls array; translate_test.go pins the bidirectional shape against scripted JSON.
  • Tools subsystem: internal/tools/tools.go::LoadingMode enum (LoadingAlways default, LoadingDeferred) on Tool.Loading; CatalogFilter.LoadingModes defaults to [LoadingAlways] for the prompt-time view; internal/tools/catalog.go grows a Catalog.Search(ctx, query, tags, limit) method backed by the new SearchCache.
  • Search cache: internal/tools/drivers/searchcache/ — SQLite FTS5-backed driver (regex fallback for non-FTS5 builds) mirroring internal/skills/drivers/localdb/. Schema-migrated, fingerprint-deduped, refreshed on every catalog sync.
  • Built-in meta-tools: internal/tools/builtin/tool_search.go, tool_get.go, skill_search.go, skill_get.go, declarative_action.go (off by default; opt-in escape hatch dispatching through the existing repair.ActionParser), plus the always-loaded artifact_fetch.go for heavy-output recovery. All register through the existing builtin.Register seam (Phase 83n / D-153). Default-enabled four: tool_search, tool_get, skill_search, skill_get. Default-disabled: declarative_action.
  • Planner: internal/planner/planner.go — new per-run RunContext.DiscoveredTools []string + RunContext.PendingToolCalls []ToolCallDeferred + RunContext.OnPendingToolCalls callback (the runloop's stack-local bridge keeping AC-19's serialization fallback alive across steps without leaking onto the shared planner artifact, per D-025).
  • React planner: internal/planner/react/react.go swaps the JSON-from-Content ActionParser for a ToolCallProjector that reads resp.ToolCalls directly; the repair.ActionParser is retained but only fires through the declarative_action meta-tool. internal/planner/react/prompt.go (the 1k-LOC prompt assembler) drops <action_format>, narrows <available_tools> to {name, description} (schemas now live in req.Tools[]), and adds a <tool_discovery> section instructing the LLM about the deferred-loading two-turn cycle; prompt_test.go + testdata/golden_default_prompt.txt re-pin the rewritten shape. The reserved _finish discriminator is RETIRED from the prompt entirely — the model produces a Finish by returning Content with empty ToolCalls[].
  • Trajectory: internal/planner/trajectory/trajectory.go::Step.Action (still any) now stores the structured planner.CallTool, complete with the provider CallID; internal/planner/react/prompt.go's next-turn message builder projects each captured CallTool into the matching assistant-with-tool-calls + RoleTool message pair the providers expect.
  • Runloop wiring: internal/runtime/steering/runloop.go captures the new OnPendingToolCalls callback per step and writes the queue back into spec.Base so subsequent value-copy steps see the residue.
  • Config: internal/config/config.go adds ToolEntryConfig.LoadingMode (yaml: loading_mode); internal/config/validate.go rejects unknown values pre-boot; docs/CONFIG.md already carries ### tools.search_cache_dsn.
  • Bootstrap: cmd/harbor/cmd_dev.go::bootDevStack constructs the SearchCache + attaches it to the catalog; cmd/harbor/cmd_dev_executor.go wires the OnPendingToolCalls closure through the per-step driver; cmd_dev_executor_preview_test.go is the cross-driver regression gate.
  • Tests: internal/planner/react/projector_test.go (Decision mapping); internal/planner/react/integration_test.go (the AC-26 two-turn discovery cycle); internal/planner/react/concurrent_test.go (N=128 concurrent reuse against one planner under -race); internal/llm/drivers/bifrost/native_toolcall_integration_test.go (AC-28 live provider — SKIP without provider key); internal/llm/drivers/bifrost/translate_test.go; internal/tools/builtin/*_test.go for each new meta-tool.

Decision. Phase 107c is a deliberate cutover from prompt-engineered tool-calling (the brief 07 path — every Decision shape parsed out of resp.Content by repair.ActionParser) to native provider tool-calling for the React planner concrete. The LLM client now carries a structured Tools[] declaration on every turn and a structured ToolCalls[] array on every response; the React planner reads ToolCalls directly via a ToolCallProjector. The prompt sheds its <action_format> JSON-shape instruction, narrows <available_tools> to a quick-reference (the schemas live in the typed req.Tools[]), and grows a <tool_discovery> section the LLM uses to find deferred tools through four built-in meta-tools (tool_search, tool_get, skill_search, skill_get). The compatibility seam for providers without reliable native tool-calling is the optional declarative_action meta-tool — off by default, opt-in, deferred-loaded; when an operator enables it the LLM can fall back to the prompt-engineered {tool, args} shape exactly once per turn through a single deferred surface. Six settled calls.

1. Reverse brief 07's "LLM driver layer never touches tools=" principle — but only for the React planner concrete, and only at the bifrost mapping layer. Brief 07 was load-bearing for V1's uniformity guarantee: the planner emitted {tool, args} JSON, every provider that could speak JSON could be a Harbor target, and parallel tool calling worked uniformly through the runtime's CallParallel mechanism rather than per-provider tool-call wire shapes. The principle's value was a settled mapping layer that nothing else had to know about. After two waves of operator validation against the real bifrost+OpenRouter+Anthropic path, the cost side of the principle finally outweighed the benefit: every provider Harbor cares about has converged on a compatible structured tool-call shape, and the JSON-in-Content parser is brittle in exactly the ways native tool-calling solves by construction (no escape-character confusion, no fence/no-fence drift, no half-streamed-JSON edge cases, no <action_format> instructions to fight against RLHF). The reversal is targeted: only the React planner concrete adopts the native path; the Deterministic planner stays text-only (its Tools[] is nil, and AC-2's nil-short-circuit preserves its pre-107c behavior). The declarative_action escape-hatch tool preserves brief 07's parser path verbatim for the carve-out (local Llama / Mistral / weaker fine-tunes without reliable tool-calling) — the parser stays in tree, lifted into one tool's body instead of the planner's primary input shape. The cost of the reversal — uniformity is now a bifrost-layer concern rather than a planner-layer concern — is ~80 LOC of translate.go mapping plus translate_test.go's round-trip pins, paid once.

2. The CallParallel serialization fallback is the default V1.1.x behavior; the executor's parallel dispatch lands in a follow-up phase. The dev executor's ErrDecisionShapeUnsupported branch for CallParallel decisions is a documented post-V1.1 deferral (cmd/harbor/cmd_dev_executor.go:100); shipping the executor's goroutine fan-out + JoinSpec evaluation + per-branch identity propagation is a separate body of work outside this plan's scope. Phase 107c takes the defensive posture per the plan's "Critical scope constraint" section: when the LLM emits N>1 native ToolCalls in one response, the React planner emits CallTool for the FIRST call and records the rest on the new RunContext.PendingToolCalls. The next Next() step consumes the queue head before consulting the LLM again; the LLM perceives one call per turn, the runtime sees one CallTool per turn, and the operator gets correct semantics with sequential dispatch. The plumbing is the OnPendingToolCalls callback: a stack-local closure the runloop captures and writes back into spec.Base (the planner's value-copy step boundary), so per-run state crosses step boundaries without ever touching the shared planner artifact. D-025 holds cleanly. When the executor's CallParallel branch lands (Phase 110z or equivalent), the operator opts in via planner.react.parallel_tool_calls: true and the planner emits the native CallParallel decision instead: a single-line opt-in with no client-visible wire change.

3. Deferred loading travels through the catalog filter, not through prompt mode-switching. The plan's tools.Tool.Loading field plus CatalogFilter.LoadingModes (defaulting to [LoadingAlways] for the prompt-time view) were already declared as latent primitives in the tools package. Phase 107c wires them: the React planner builds req.Tools[] from the always-loaded subset + the always-loaded meta-tools + the per-run RunContext.DiscoveredTools; deferred tools are absent from the catalog unless the LLM names them through a tool_search result. The discovered set accumulates within one run and resets at run start (a fresh run rediscovers as needed). The two-turn discovery cycle is structural: turn N the LLM calls tool_search, turn N+1 the planner has appended the discovered tool to Tools[], the LLM calls it. Same-turn race (the LLM emits BOTH a tool_search AND a call to the tool it expects to find — provider rejects because the second tool isn't declared) is naturally guarded by the serialization fallback (only the head of PendingToolCalls dispatches per turn). The <tool_discovery> prompt section names the two-turn cycle explicitly.

4. The declarative_action escape-hatch is the only seam that preserves brief 07's parser, and it's a single deferred tool. Brief 15 sketched a two-planner-concretes carve-out (a react-native package alongside the existing react). Phase 107c collapses this to ONE concrete with one optional deferred meta-tool. The escape-hatch tool accepts a {tool, args} JSON body and dispatches through repair.ActionParser + the runtime tool executor, returning the dispatched tool's observation as if the LLM had called the tool natively. Operators with non-tool-calling providers opt the tool in via tools.built_in: [declarative_action] and the LLM discovers the structured-action shape when needed. The Decision sum (CallTool / CallParallel / SpawnTask / AwaitTask / RequestPause / Finish) is unchanged — brief 15 §6 "Decision-sum invariance" holds verbatim. The reserved discriminator _finish is RETIRED from the prompt and the projector; a model that wants to finish returns Content with empty ToolCalls[], which the projector maps to Finish{Goal, Payload: Content}. The _finish name survives ONLY inside declarative_action's body for the parser's backward compatibility with the brief-07 shape.

5. Tool results round-trip as provider-typed RoleTool messages, not as user-role text. Brief 07's prompt-engineered path rendered each prior tool observation as a user-role text block. Native tool-calling requires the provider's typed shape — an assistant message carrying the original ToolCalls[i] followed by a RoleTool message with matching ToolCallID. The React prompt builder projects each trajectory.Step whose Action is a CallTool AND whose Observation is non-nil into this pair; ChatMessage.ToolCallID *string (the new field) round-trips the id. trajectory.Step.Action (still any) carries the structured CallTool complete with CallID — no new trajectory wire-shape change, just field presence. The bifrost driver's existing translateMessages path handles the typed message shapes; the runtime emits tool.invoked events with the existing {name, args} payload sourced from the new structured CallTool (no wire-shape change to the event surface).

6. artifact_fetch is always-loaded — the LLM-edge heavy-output guard's recovery surface. D-026 plus the LLM-edge ErrContextLeak rewriter materialize heavy tool results to the artifact store and replace the LLM-facing observation with a short head-bytes preview + a positional footer naming the artifact_fetch built-in and the ref. Phase 107c registers artifact_fetch as an always-loaded built-in (LoadingMode: LoadingAlways) so operators who opt it in via tools.built_in: [artifact_fetch] get the recovery path without needing tool_search to find it. The tool takes {ref, max_bytes?} (default 64 KiB, hard cap 1 MiB), reads the artifact under the run's (tenant, user, session) scope, and returns {ref, mime, size_bytes, content, truncated}. Cross-tenant reads are rejected by the artifact store with a soft "not found" — the regression gate is internal/tools/builtin/artifact_fetch_test.go::TestArtifactFetch_CrossIdentity_RejectedByStore.
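The artifact_fetch limits in call 6 (64 KiB default, 1 MiB hard cap, a truncated flag in the result) can be sketched as below. FetchResult, effectiveMax and fetchSlice are hypothetical; treating a missing or non-positive max_bytes as "use the default" is an assumption; and the (tenant, user, session) scope check is left to the artifact store, as the entry describes, so it does not appear here.

```go
package main

import "fmt"

const (
	defaultMaxBytes = 64 << 10 // 64 KiB default per D-167
	hardCapBytes    = 1 << 20  // 1 MiB hard cap per D-167
)

// FetchResult mirrors the {ref, mime, size_bytes, content, truncated} shape.
type FetchResult struct {
	Ref       string
	MIME      string
	SizeBytes int
	Content   []byte
	Truncated bool
}

// effectiveMax applies the default when max_bytes is absent (or, by
// assumption, non-positive) and clamps oversized requests to the hard cap.
func effectiveMax(requested *int) int {
	if requested == nil || *requested <= 0 {
		return defaultMaxBytes
	}
	if *requested > hardCapBytes {
		return hardCapBytes
	}
	return *requested
}

// fetchSlice returns at most effectiveMax bytes and reports truncation;
// SizeBytes always reflects the full artifact size.
func fetchSlice(ref, mime string, body []byte, requested *int) FetchResult {
	limit := effectiveMax(requested)
	out := FetchResult{Ref: ref, MIME: mime, SizeBytes: len(body), Content: body}
	if len(body) > limit {
		out.Content = body[:limit]
		out.Truncated = true
	}
	return out
}

func main() {
	r := fetchSlice("art_1", "text/plain", make([]byte, 100<<10), nil)
	fmt.Println(len(r.Content), r.SizeBytes, r.Truncated) // 65536 102400 true
}
```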

Why. Two failure modes Phase 107c closes by construction: (1) <action_format> instructions in the prompt fight against modern LLMs' RLHF for native tool-calling, and the JSON-in-Content parser path accumulated a growing pile of repair-loop salvage patterns (multi-action salvage, fence detection, bare-array decoding) that exist solely to handle the LLM's drift away from the prompt-engineered shape. Native tool-calling is one structural fix for all of those drifts. (2) The catalog scales beyond the prompt-budget ceiling. A V1.1.x operator with 50+ tools — common for the dev-loop + a couple of MCP servers + a few skills — was already at the edge where rendering every tool's full schema in every prompt either blows the token budget or forces operators to trim catalog ambition. Deferred loading + meta-tools collapse the catalog to its always-on essentials; everything else is one tool_search away. Brief 15's path B is what this phase implements.
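How deferred loading shrinks the per-turn declaration (call 3) can be sketched as a filter over the catalog. Tool, LoadingMode and declaredTools are hypothetical trims of the tools package, and git.blame is an invented deferred tool. Turn N declares only always-loaded tools (including the meta-tools); once tool_search surfaces a deferred tool and it lands in the run's discovered set, turn N+1 declares it too.

```go
package main

import "fmt"

// LoadingMode mirrors the two modes D-167 names.
type LoadingMode int

const (
	LoadingAlways LoadingMode = iota
	LoadingDeferred
)

// Tool is a hypothetical trim of tools.Tool.
type Tool struct {
	Name    string
	Loading LoadingMode
}

// declaredTools builds the per-turn tool list: every always-loaded tool
// plus deferred tools this run has already discovered.
func declaredTools(catalog []Tool, discovered []string) []string {
	found := make(map[string]bool, len(discovered))
	for _, n := range discovered {
		found[n] = true
	}
	var out []string
	for _, t := range catalog {
		if t.Loading == LoadingAlways || found[t.Name] {
			out = append(out, t.Name)
		}
	}
	return out
}

func main() {
	catalog := []Tool{
		{Name: "tool_search", Loading: LoadingAlways},
		{Name: "git.blame", Loading: LoadingDeferred},
	}
	fmt.Println(declaredTools(catalog, nil))                   // [tool_search]
	fmt.Println(declaredTools(catalog, []string{"git.blame"})) // [tool_search git.blame]
}
```

Resetting discovered to nil at run start gives the "fresh run rediscovers as needed" behaviour without any extra state.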

Findings I'm departing from. Brief 07's settled "no tools= at the LLM driver layer" principle is the load-bearing reversal — the rationale is item (1) above and the carve-out is the declarative_action escape hatch (preserves brief 07's parser path for non-tool-calling providers). Brief 15's two-planner-concretes shape is the second departure — collapsed to one concrete with one optional escape-hatch meta-tool per item (4). Both departures are documented in the plan's "Brief findings incorporated" section as Phase 107c's deliberate scope.

Protocol additions. None — the LLM wire shape (Tools[], ToolCalls[], ToolCallID) lives in internal/llm/, not on the Protocol surface. The audit event payload (tool.invoked) carries unchanged {name, args} fields sourced from the new structured CallTool. The CLI / Console / Protocol method surfaces are unchanged.
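The ToolCallID round-trip from call 5 stays inside internal/llm and can be sketched as a projection over trajectory steps. Role, ChatMessage, Step and toMessages are hypothetical trims (the "assistant"/"tool" role strings are assumptions). Each step carrying both a CallTool and an observation becomes an assistant message holding the original call, followed by a RoleTool message whose ToolCallID points back at it; steps without an observation are skipped.

```go
package main

import "fmt"

// Role values here are assumptions, not the llm package's constants.
type Role string

const (
	RoleAssistant Role = "assistant"
	RoleTool      Role = "tool"
)

// ToolCall is a hypothetical trim of the structured call.
type ToolCall struct {
	ID, Name string
	Args     string
}

// ChatMessage carries the optional ToolCallID named in D-167.
type ChatMessage struct {
	Role       Role
	Content    string
	ToolCalls  []ToolCall
	ToolCallID *string
}

// Step is a hypothetical trim of trajectory.Step.
type Step struct {
	Action      *ToolCall // nil when the step was not a CallTool
	Observation *string   // nil until the tool has returned
}

// toMessages emits the assistant-with-tool-calls + RoleTool pair for
// every completed CallTool step.
func toMessages(steps []Step) []ChatMessage {
	var msgs []ChatMessage
	for _, s := range steps {
		if s.Action == nil || s.Observation == nil {
			continue
		}
		id := s.Action.ID
		msgs = append(msgs,
			ChatMessage{Role: RoleAssistant, ToolCalls: []ToolCall{*s.Action}},
			ChatMessage{Role: RoleTool, Content: *s.Observation, ToolCallID: &id},
		)
	}
	return msgs
}

func main() {
	obs := "3 matches"
	steps := []Step{{Action: &ToolCall{ID: "call_1", Name: "grep", Args: `{"q":"TODO"}`}, Observation: &obs}}
	for _, m := range toMessages(steps) {
		fmt.Println(m.Role, len(m.ToolCalls), m.ToolCallID != nil)
	}
}
```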

Known limitation, named here. Same-turn N>1 native ToolCalls serialize through PendingToolCalls; the executor's CallParallel branch lands in a follow-up phase (110z or equivalent). Operator yamls today rely on the runtime's sequential dispatch. The prompt-engineered multi-action salvage that Phase 47 surfaced is unreachable after Phase 107c (the parser no longer fires on the native path), but the salvage's downstream CallParallel emission becomes a no-op rather than a regression, because the executor was already rejecting that shape. The follow-up phase adds the executor branch and flips the planner's default to native CallParallel emission; the serialization fallback then becomes a single-knob opt-out.
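The serialization fallback (call 2) and the retired _finish discriminator (call 4) combine into a small projector, sketched here. ToolCall, Decision, CallTool, Finish, project and next are hypothetical trims of the planner types, not the shipped ToolCallProjector. An empty ToolCalls slice maps to Finish carrying the content; otherwise the head dispatches and the rest is queued, and the next step drains that queue before the LLM is asked again.

```go
package main

import "fmt"

// ToolCall is a hypothetical trim of llm.ToolCallStructured.
type ToolCall struct{ ID, Name string }

// Decision is a two-member slice of the planner's Decision sum.
type Decision interface{ isDecision() }

type CallTool struct{ Call ToolCall }
type Finish struct{ Payload string }

func (CallTool) isDecision() {}
func (Finish) isDecision()   {}

// project maps one LLM response onto a Decision plus the queue residue:
// empty ToolCalls means Finish, otherwise dispatch the head, keep the rest.
func project(content string, calls []ToolCall) (Decision, []ToolCall) {
	if len(calls) == 0 {
		return Finish{Payload: content}, nil
	}
	return CallTool{Call: calls[0]}, calls[1:]
}

// next drains the pending queue before consulting the LLM again, so the
// runtime sees exactly one CallTool per step.
func next(pending []ToolCall, ask func() (string, []ToolCall)) (Decision, []ToolCall) {
	if len(pending) > 0 {
		return CallTool{Call: pending[0]}, pending[1:]
	}
	return project(ask())
}

func main() {
	llmCalls := 0
	ask := func() (string, []ToolCall) {
		llmCalls++
		return "", []ToolCall{{ID: "a", Name: "fs.read"}, {ID: "b", Name: "fs.stat"}}
	}
	d1, pending := next(nil, ask)
	d2, pending := next(pending, ask)
	fmt.Println(d1, d2, len(pending), llmCalls)
}
```

In the real runloop the returned residue travels through the OnPendingToolCalls callback into spec.Base; here it is simply threaded through return values.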

Live-test coverage. AC-28 ships internal/llm/drivers/bifrost/native_toolcall_integration_test.go against a real provider — SKIP when no provider key. The integration test elicits one tool-call and asserts resp.ToolCalls is non-empty + resp.Content is the model's preamble (or empty). The two-turn discovery cycle is covered by AC-26's internal/planner/react/integration_test.go::TestReactPlanner_NativeToolCall_DiscoveryCycle (scripted streaming LLM). The N=128 concurrent-reuse test is internal/planner/react/concurrent_test.go::TestReactPlanner_NativeToolCall_NoCrossTalk (D-025 cross-package gate under -race).


D-168 — 85-band re-plan against the MCP 2026-07-28 release candidate: cut sampling / roots / original Tasks, slim logging, defer elicitation + conformance + Apps, add 85m for cross-cutting RC adoption

Date: 2026-05-28 Status: Settled (master plan updated; phase plan files retained as historical context for cut phases)

Trigger. The MCP Foundation published the 2026-07-28 release candidate on 2026-05-21 (the final spec drops 2026-07-28; Tier-1 SDKs ship support within a 10-week RC window, ≈ late July–early August 2026). The RC deprecates three capabilities Harbor's 85-band was building operator-facing surface against (sampling, roots, logging); redesigns Tasks from an experimental core feature into a standalone extension with a new method set (tasks/list removed; a new lifecycle around tools/call returning a task handle, then tasks/get / tasks/update / tasks/cancel); removes the protocol-level session handshake (initialize / initialized + Mcp-Session-Id); flips the resource-not-found error code (-32002 → -32602); restructures server-to-client requests (SSE elicitation replaced by InputRequiredResult with inputRequests / requestState + retry with inputResponses); and adds six authorization-hardening SEPs (iss validation per RFC 9207 / SEP-2468; DCR application_type / SEP-837; credential binding to issuer / SEP-2352; OIDC refresh-token docs / SEP-2207; scope accumulation during step-up / SEP-2350; .well-known suffix clarification / SEP-2351). The deprecations are annotation-only (functional for 12+ months), but committing operator surface to features on a 12-month death clock is a bad investment; the breaking changes (sessions, headers, errors, schema, cache, trace) need a dedicated adoption phase.

Where it lives.

  • Master plan (docs/plans/README.md): Phase index (lines 152–162) updated — 85a / 85b / 85f marked Ready now (with 85b's scope ↑ for the RC auth SEPs and 85f slimmed to drop logging); 85c / 85e / 85h / 85i marked Cut; 85d / 85m marked Revisit after SDK-RC; 85g / 85j marked Revisit after RC-final. New row 85m added. Per-phase detail block (lines 1055–1072 post-edit) rewritten with the RC re-plan header, per-phase verdict + readiness, and an explicit revisit trigger on each. Cross-cutting references (numbering line 9, V1 critical-path paragraph line 191, V1 conclusion line 1177, post-V1 deferrals block line 1182) updated for consistency.
  • Phase plan files for cut phases retained: docs/plans/phase-85c-mcp-sampling-provider.md, docs/plans/phase-85e-mcp-roots-provider.md, docs/plans/phase-85h-mcp-tasks-wire-types.md, docs/plans/phase-85i-mcp-tasks-client.md are NOT deleted — they remain as historical context. The master plan's Status column is the canonical "do not implement" signal.
  • New phase plan to author: docs/plans/phase-85m-mcp-rc-2026-07-28.md — stub created from _template.md; absorbs the RC's cross-cutting breaking changes.
  • Lettering note: 85l was skipped to avoid l/I/1 ambiguity next to the existing 85i row; the new phase is 85m.

Decision. The 85-band re-shapes against the MCP 2026-07-28 RC as follows. Five settled calls.

1. Cut sampling (85c) and roots (85e) entirely — the RC's replacements are what Harbor already has. The RC deprecates sampling/createMessage; the replacement is "direct LLM provider API integration" — which is what llm.LLMClient already is. Building a CreateMessageHandler, a pause-gated review surface, modelPreferences resolution, multimodal mapping and tool-enabled sampling would ship operator-facing surface (config knobs, Console review UI, audit shapes) for a feature with a 12-month EOL. Servers needing an LLM bring their own provider per the RC's guidance. Similarly, the RC deprecates roots; the replacement is "tool parameters, resource URIs, or server configuration." 85e was scoped to ship a real operator-config-driven roots provider that replaced 85a's honest-empty stopgap — but the honest-empty advertisement is now the permanent posture, not a stopgap. Both plan files stay as historical context; neither phase implements.

2. Cut the original Tasks pair (85h + 85i) — the RC's redesign makes the 2025-11-25 hand-transcription wrong. 85h hand-transcribed the experimental 2025-11-25 Tasks surface into Go types (the same pattern as the A2A wire shapes), and 85i built the requestor's poll loop, tasks/list consumption, and input_required → elicitation composition on top. The RC moves Tasks from an experimental core feature into a standalone extension; tasks/list is removed; the new lifecycle is tools/call returns a task handle that the client drives with tasks/get / tasks/update / tasks/cancel; the input_required state collapses into the new InputRequiredResult mid-call retry pattern rather than a status transition. Hand-transcribing the old shape now locks in code that the extension SEP + Dockyard's Go port + the SDK update will all diverge from. The right move is to wait for the extension SEP to stabilize, then Dockyard's port, then SDK support — and refile Tasks as a NEW band, not as 85h/85i. Both plan files stay as historical context; neither phase implements.

3. Slim 85f to drop the logging slice. 85f bundled four small server-side features Harbor's client driver currently ignores: completions, logging, resource templates, progress. The RC deprecates logging/setLevel + notifications/message and points to stderr (stdio) or OpenTelemetry (structured) as replacements — both of which Harbor already has via slog + the telemetry stack. The other three (completions, templates, progress) are unaffected and ship as planned. The slim 85f stays Ready now against go-sdk v1.6.0.

4. Add 85m as the cross-cutting RC adoption phase. Seven items the RC introduces are cross-cutting and don't fit any existing 85-band phase:

  • Remove initialize / initialized handshake plumbing and Mcp-Session-Id header dependence; client info moves to per-request _meta.
  • Streamable HTTP: set Mcp-Method and Mcp-Name headers on every outbound request; assert server reject-on-mismatch.
  • Error code flip: every -32002 (resource-not-found) callsite → -32602 (Invalid Params).
  • Server-to-client request restructuring: server-initiated requests only issuable while server is actively processing a client request; SSE elicitation polling removed (composes with 85d's rewrite).
  • JSON Schema 2020-12 (SEP-2106): full draft support in tool / resource-template schema validation.
  • Cache directives (SEP-2549): respect ttlMs and cacheScope on list / resource reads.
  • W3C Trace Context propagation (SEP-414): wire Harbor's existing OTel traceparent / tracestate / baggage into MCP _meta.

These need go-sdk RC support; the plan can be authored against the RC SEPs now so implementation can start the day the SDK lands.

5. Defer 85d (elicitation) until SDK-RC; defer 85g (Apps) and 85j (conformance) until RC-final. 85d's form vs URL mode distinction and the secret-rejection rule survive the RC; the wire mechanic does not — SSE-based elicitation is replaced by InputRequiredResult with a requestState echo and retry. The pause/resume primitive integration is still conceptually right but the round-trip flow needs a redesign pass against the RC and SDK support. 85g (MCP Apps host) is SDK-independent (Console-side TS), but the RC's extension-stabilization policy may reshape _meta.ui.resourceUri or move Apps into a versioned extension; verify after RC-final (2026-07-28). 85j (conformance harness + scoped compliance statement) targets the RC, not 2025-11-25 — the statement's wording obligation (never "fully compliant" unqualified) survives, but the enumerated capabilities drop the cut areas and add 85m's items; the harness lands after the dependent phases ship.

Why. Two failure modes the re-plan closes by construction: (1) committing operator-facing surface (config knobs, Console UI, audit shapes, identity-scoping work) to deprecated capabilities is debt that has to be retracted within 12 months — better to skip the investment entirely than to ship-then-deprecate. (2) Hand-transcribing the 2025-11-25 Tasks shape now produces code that the extension SEP, Dockyard's port, and SDK support will all diverge from — three independent moving targets, none stable. Cut, wait, refile. The re-plan also tightens 85b's scope to include the six new auth SEPs in one PR rather than two — the auth-hardening changes compose; splitting them gains nothing.

Findings I'm departing from. Brief 14 §4's "biggest gaps" list cited sampling (#5), roots (#26), and the full Tasks surface as priority closures for the 85-band. The RC's deprecation/redesign of all three reverses brief 14's prioritisation for those specific phases — the gaps remain real today but invest into surface that will be deprecated before any operator can rely on it. Brief 14 §4's compliance-statement wording ("MCP 2025-11-25 core-compliant, with stdio + Streamable HTTP transports, OAuth for remote servers, Roots, Sampling, Elicitation, Tasks, and MCP Apps support") is superseded by an RC-target statement that drops the cut areas and adds 85m's items.

Protocol additions. None at this layer — the re-plan only re-shapes the master plan and adds a new phase stub. 85m, 85b's expanded scope, and 85d's rewrite will each carry their own Protocol implications when authored.

Known limitation, named here. The deprecated capabilities (sampling, roots, logging) remain functional for 12+ months per the RC's lifecycle policy. Harbor still recognises and responds correctly to server-initiated roots/list, sampling/createMessage, and notifications/message traffic through the existing go-sdk wiring — the re-plan's cut is "do not invest more operator-facing surface," not "rip out wire-level recognition." Phase 85a's honest-empty roots capability advertisement is the permanent posture. If a future RC removes the deprecated methods outright, a new phase will retract the recognition; that's beyond this re-plan's horizon.

Cross-references. The MCP RC blog post is at https://blog.modelcontextprotocol.io/posts/2026-07-28-release-candidate/. The re-plan was prompted by the 2026-05-28 review of that post against the in-flight 85-band plans. Authored alongside docs/plans/phase-85m-mcp-rc-2026-07-28.md (the new phase stub).


D-169 — Phase 107d: native parallel tool-calls — executor CallParallel branch, JoinAll-on-native, non-atomic setup, default flip

Date: 2026-05-28 Status: Settled (planned; shipping with Phase 107d)

Where it lives:

  • Dev executor: cmd/harbor/cmd_dev_executor.go — the case planner.CallParallel ErrDecisionShapeUnsupported reject (line ~101) becomes a real dispatch through a shared *internal/runtime/parallel.Executor; the merge layer applies the existing projectForLLM (D-026) per branch and assembles an aggregate observation keyed by each branch's CallID. cmd/harbor/cmd_dev.go::bootDevStack constructs the executor (the catalog already satisfies parallel.Resolver) and plumbs the new config knob.
  • Parallel executor: internal/runtime/parallel/parallel.go — a per-call non-atomic setup mode where a branch's Resolve miss / Validate failure becomes that branch's Result.Err instead of the whole-call ErrParallelBranchInvalidArgs abort. dispatchAll is reused verbatim as the native-path engine.
  • React planner: internal/planner/react/projector.go emits planner.CallParallel{Branches, Join: nil} for N>1 ToolCalls when the knob is on (the default); the 107c serialization fallback (RunContext.PendingToolCalls) is preserved on the off path. internal/planner/react/prompt.go::renderNativeStepPair gains a CallParallel case: one assistant message with N tool_calls + N RoleTool messages keyed by branch CallID. internal/planner/react/react.go adds WithParallelToolCalls(bool).
  • Config: internal/config/config.go adds planner.parallel_tool_calls as a pointer-bool (*bool, yaml parallel_tool_calls) defaulting to true when omitted. The key lives flat under planner: (alongside reasoning_replay, max_steps, max_tool_examples_per_tool), NOT nested under a planner.react: block — the plan's planner.react.parallel_tool_calls dotted notation is shorthand for "the React planner's knob"; the config surface has no react: sub-block and adding one would fragment the React knobs across two places. The *bool flows through the planner.PlannerConfig boundary so the react factory distinguishes "unset" (nil → keep the planner's true default) from an explicit false.
  • Docs: glossary updates (ParallelExecutor, JoinAll, RunContext.PendingToolCalls) + new parallel_tool_calls; skill drift on add-an-in-process-tool + drive-the-playground.
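The following sketch shows the "unset versus explicit false" distinction that the *bool provides. Only the yaml key parallel_tool_calls and the default-true-when-omitted rule come from the entry. The struct, method, and helper names are hypothetical. The yaml tags are inert here; decoding is left to whatever YAML library config.go already uses.

```go
package main

// PlannerSection is a hypothetical slice of the flat planner: block.
type PlannerSection struct {
	MaxSteps          int   `yaml:"max_steps"`
	ParallelToolCalls *bool `yaml:"parallel_tool_calls"` // nil means the key was omitted
}

// ParallelToolCallsEnabled resolves the knob: omitted keeps the React
// planner's default (true); an explicit true or false wins.
func (p PlannerSection) ParallelToolCallsEnabled() bool {
	if p.ParallelToolCalls == nil {
		return true
	}
	return *p.ParallelToolCalls
}

// boolPtr builds explicit values in code and tests.
func boolPtr(b bool) *bool { return &b }
```

A plain bool could not tell an omitted key from an explicit false, because both decode to false. That is why the pointer crosses the planner.PlannerConfig boundary.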

Decision. Phase 107c (D-167) cut the React planner over to native provider tool-calling but deliberately deferred the N>1 case: rather than emit planner.CallParallel, it dispatched the head tool-call and queued the tail on RunContext.PendingToolCalls, draining one per step, because the dev executor still rejected CallParallel. Phase 107d closes that carve-out. Five settled calls.

1. Reuse the existing parallel.Executor — do not build a second dispatcher. internal/runtime/parallel.Executor shipped at Phase 47 (D-056) with goroutine fanout, the AbsoluteMaxParallel cap, identity-from-ctx propagation, the per-branch Result shape, and an N≥128 -race reuse test — but it has carried zero production consumers since (it is referenced only by its own tests). Phase 107d makes the dev ToolExecutor its first consumer, closing the §13 primitive-without-consumer gap the executor has carried for ~60 phases. Building a second fanout in the dev executor would violate §13 "two parallel implementations of one feature"; the dispatcher's engine (dispatchAll) is consumed verbatim.

2. On the native path JoinKind collapses to JoinAll; the other kinds are re-scoped as programmatic-planner surface. The JoinSpec machinery (D-056) was designed for a planner that authors the parallel call and chooses a merge strategy: the pre-107c prompt-engineered React parsed a structured plan out of LLM content. Native provider tool-calling gives the model no channel to request JoinFirstSuccess / JoinN. The provider returns N tool_calls with no join semantics, and the provider wire contract makes "every tool_call_id is answered exactly once before the next assistant turn" a correctness requirement. JoinFirstSuccess and JoinN cancel the losers and return fewer results than branches, which orphans the unanswered tool_call_ids and malforms the next request. So the React projector always emits Join: nil, which normaliseJoin maps to JoinAll. The other kinds are NOT removed (that would violate the swappable-planner principle, RFC §1 property 3): a future Deterministic / Workflow / Graph planner that authors a CallParallel programmatically, without an LLM round-trip, is a legitimate JoinFirstSuccess consumer. dispatchFirstSuccess, dispatchN, and JoinKeyed's ErrParallelInvalidJoin reject are all unchanged.

3. Native dispatch is non-atomic; atomic stays the programmatic default. D-056's atomic setup validation fails the WHOLE call if any one branch's args fail Validate, before any branch dispatches — a side-effect-safety guard for programmatic plans. That posture is wrong-shaped on the native path: the model chose N calls knowing it wanted all of them, and aborting the whole call orphans every tool_call_id. Phase 107d adds a per-call non-atomic mode where a branch's resolve/validate failure becomes that branch's Result.Err (→ one error RoleTool message the model reads and repairs next turn, exactly like a single failed CallTool), while valid branches still fan out. The atomic default is preserved byte-for-byte for existing (today test-only) callers. The branch-count cap and missing-identity reject stay fail-loud in both modes. This also simplifies the failure posture: a native branch failure is not a parallel-specific re-plan concept — it is just an error tool-result, handled identically to a single CallTool failure (superseding the pre-107c planner.go note that parallel-branch failures bypass the re-plan counters).

4. The default flips to native CallParallel; serialization survives as a single-knob opt-out. planner.parallel_tool_calls (the React planner's knob) defaults to true (D-167 §2 promised this flip); false reverts to the 107c serialization fallback. The serialization path is not merely the false-knob fallback: it also remains the same-turn-discovery-race guard (D-167 risk #4: a tool_search plus a call to the not-yet-declared tool in one response must serialise so the second call lands on the turn after the tool is declared).

5. Reserved planner-control names are standalone — co-occurrence with another tool-call is ErrInvalidDecision (carried-over 107c silent tail-drop fix; AC-21). 107c's projectResponse switched on resp.ToolCalls[0].Name and translated a reserved control name (_finish / _spawn_task / _await_task) to its Finish / SpawnTask / AwaitTask Decision before the N>1 tail-queueing block — so a control meta-tool emitted alongside other tool-calls in one response silently honoured only the first and dropped the rest (no error, no event — the §13-forbidden silent-degradation pattern). The symmetric tail case was just as wrong: a reserved name in the tail got queued to PendingToolCalls and later drained as a literal CallTool{Tool:"_spawn_task"} that hit the catalog as an unknown tool. Phase 107d adds a guard in projectResponse that runs BEFORE the head switch: any response where a reserved control name co-occurs with one or more other tool-calls is rejected with a wrapped planner.ErrInvalidDecision naming the offending control tool. Reserved control meta-tools are terminal/standalone — they are not CallParallel branches (Branches is []CallTool, catalog tools) and not serialisable tail entries. The guard fires whether the reserved name is head or tail, and on BOTH the native-parallel path and the serialization opt-out (it is independent of parallel_tool_calls). Single-reserved-call cases are unchanged; all-regular N>1 still flows to CallParallel (on) or the serialization tail (off). This is bundled here, not a separate hotfix, because it is the same projector seam 107d reshapes for the N>1 → CallParallel mapping, and the two changes must agree on what a "batchable" tool-call is. A future one-turn batch-spawn, if wanted, is a dedicated _spawn_tasks meta-tool taking an array — never reserved names as CallParallel branches.

Why. The serialization fallback was always a defensive stop-gap (D-167 §2 named the follow-up "Phase 110z or equivalent"); it makes the LLM's single N-tool-call turn replay to the provider as N separate assistant turns — tolerated by providers but unfaithful, and it serialises genuinely-independent calls that the model wanted run concurrently. Closing the carve-out delivers the concurrency the model asked for and makes the trajectory round-trip faithful to the provider wire shape. The cost is concentrated in the trajectory→prompt round-trip (one assistant message with N tool_calls answered by N RoleTool messages) and per-branch heavy-output projection — not in the dispatch engine, which already exists.
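The wire-shape invariant behind that round-trip can be checked mechanically: one assistant turn with N tool_calls must be answered by exactly N RoleTool messages. This is a hypothetical sketch with stand-in types, not the projector's code. It also shows why a join that returns fewer results than branches is unusable on the native path, because the unanswered ids surface as orphans.

```go
package main

import "fmt"

// ToolCall and ToolAnswer stand in for the assistant message's tool_calls
// and the RoleTool messages keyed by ToolCallID (hypothetical types).
type ToolCall struct{ ID, Name string }
type ToolAnswer struct{ ToolCallID, Content string }

// checkAllAnswered returns an error unless every tool_call_id is answered
// exactly once and no answer references an unknown id.
func checkAllAnswered(calls []ToolCall, answers []ToolAnswer) error {
	pending := make(map[string]bool, len(calls))
	for _, c := range calls {
		pending[c.ID] = true
	}
	for _, a := range answers {
		if !pending[a.ToolCallID] {
			return fmt.Errorf("answer for unknown or already-answered id %q", a.ToolCallID)
		}
		delete(pending, a.ToolCallID)
	}
	// Walk calls in order so the reported orphan is deterministic.
	for _, c := range calls {
		if pending[c.ID] {
			return fmt.Errorf("orphaned tool_call_id %q", c.ID)
		}
	}
	return nil
}
```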

Findings I'm departing from. Two departures from D-056, both native-path-only and both documented in the phase plan's "Findings I'm departing from": (a) atomic-setup-validation → non-atomic per-branch error disposition (call 3); (b) planner-chosen JoinKind → always-JoinAll on the native path (call 2). Neither removes the original behaviour — both re-scope it as programmatic-planner surface. brief 15 §6's "N native ToolCalls → CallParallel" mapping is implemented in full here (107c implemented the 0-and-1 cases and serialised the N case).

Protocol additions. None. CallParallel, JoinSpec, ToolCallStructured, and ChatMessage.ToolCallID all already exist; this phase wires consumers, not new wire types. The tool.invoked audit event payload is unchanged (it already carries {name, args} per branch).

Known limitation, named here. Pause mid-parallel stays fail-loud: parallel.Executor returns ErrParallelPauseUnsupported if a pause request lands mid-dispatch (the Phase 47 placeholder; Phase 50's checkpoint-atomicity was never wired because the executor had no consumer until now). Accepted for V1.1.x — dev-path branches are short tool invokes and the window is tiny. True checkpointed atomic-pause-mid-parallel is a Phase 50-extension follow-up; if an operator workload hits the limit in practice, that is the trigger to pull it forward rather than silently swallow.

Cross-references. Builds directly on D-167 (native tool-calling cutover) and D-056 (the parallel executor + JoinSpec). Phase plan: docs/plans/phase-107d-native-parallel-tool-calls.md. Informed by brief 15 §6 "Decision-sum invariance."


D-170 — Phase 107e: SpawnTask + AwaitTask dev-executor dispatch — background-task execution, synchronous-runloop join, spawn-depth cap

Date: 2026-05-28 Status: Settled (planned; shipping with Phase 107e)

Where it lives:

  • Dev executor: cmd/harbor/cmd_dev_executor.go — the case planner.SpawnTask / case planner.AwaitTask ErrDecisionShapeUnsupported rejects become real dispatch. devToolExecutor gains a tasks.TaskRegistry field + a maxSpawnDepth field (both immutable after construction — D-025). SpawnTask maps the decision into a tasks.SpawnRequest under the run's identity triple (rc.Quadruple.Identity, never a global) and calls Spawn; AwaitTask + a retain-turn SpawnTask poll Get until the task reaches a terminal status. The await/retain observation goes through the existing projectForLLM (D-026) so a heavy awaited result is artifact-stub-shaped before the LLM edge.
  • Per-task driver: cmd/harbor/cmd_dev_runloop.go — the foreground-only kind filter gains an opt-in driveBackground bool (set true by bootDevStack). With it on, the driver runs a planner sub-run for KindBackground tasks too, picked up via the same task.spawned subscription and driven through the identical MarkRunning → Run → MarkComplete/MarkFailed answer-envelope path. Recursion is bounded at the spawn site (call 4), not here.
  • Config: internal/config/config.go adds planner.absolute_max_spawn_depth int (yaml absolute_max_spawn_depth, flat under planner: — matching D-169's "no react: sub-block" call) + a SpawnDepthCap() accessor (non-positive → default 4). validate.go rejects a negative value.
  • Docs: glossary updates (SpawnTask / AwaitTask gain a dev dispatch consumer; new absolute_max_spawn_depth); skill drift on drive-the-playground.

Decision. Phase 47 (D-056) shipped the runtime machinery — the SpawnTask/AwaitTask Decision shapes, the React _spawn_task/_await_task emission (re-confirmed native by 107c/D-167), the tasks.TaskRegistry.Spawn + WatchGroup + GroupCompletion surface — but the only steering.ToolExecutor V1.1.x ships (the dev executor, 83i/D-152) rejected both shapes, and the per-task driver drove foreground tasks only (cmd_dev_runloop.go explicitly deferred background execution to "the runtime dispatch executor — a later phase"). 107e is that phase. Four settled calls.

1. SpawnTask and AwaitTask dispatch land together (§13). CLAUDE.md §13 binds the twin: "a planner that can spawn a background task but cannot join it produces orphan work the runtime cannot recover." That pinned the emission twin at Phase 47; the same logic binds the dispatch twin here. Wiring spawn without join on the dev path would orphan every spawned background task. So 107e ships both, even though the request named only SpawnTask.

2. The join is synchronous, not eager push wake-on-resolution. D-032 / the Phase 45+47 detail describe ReAct's push wake mode as the runtime re-invoking Planner.Next on GroupCompletion without an explicit AwaitTask. The steering RunLoop V1.1.x ships is synchronous — it dispatches a decision through ToolExecutor, appends one trajectory step, and re-enters the planner on the next step; there is no group-watcher re-entry path. Building eager push re-entry would be steering-runloop surgery, out of scope for a dev-executor dispatch phase. So on the dev path the realizable shapes are: a retain-turn SpawnTask blocks in-decision until the spawned task is terminal (synchronous spawn-and-join), and a non-retain-turn SpawnTask returns {task_id} immediately and is joined later by an explicit AwaitTask. Both produce concurrent background execution (background tasks run once spawned). Eager push re-entry is filed as a steering follow-up.

3. The join polls Get, not WatchGroup, because tasks.Task has no GroupID field. WatchGroup(sessionID, groupID) watches a group; AwaitTask carries a single TaskID, and the persisted Task record exposes no group id to resolve it from. The registry's own group docs (internal/tasks/groups.go) bless Get(taskID) polling as a first-class poll wake mode (a cheap in-memory map lookup on the dev path). So the dev executor polls Get until the task reaches a terminal status (Complete/Failed/Cancelled), bounded by the per-step ctx, at a spawnAwaitPollInterval cadence. This is identity-safe (Get rejects cross-session/cross-tenant reads, which closes the isolation case for free), needs no group resolution, and adds no second wake mechanism (§13). It is a §4.3 deviation from the phase plan's original WatchGroup wording, documented in the plan's "Findings I'm departing from."

4. Background-task recursion is bounded by a spawn-depth cap at the spawn site, not by refusing to drive background tasks. The driver's original foreground-only filter existed precisely because "driving a planner against a background task would create a recursive planner loop." 107e drives background tasks (the driver — not the executor — is the right home for running a task's planner: it reuses all of 83f/83i/106's per-run wiring + the FSM bridge) but closes the recursion concern at the executor: SpawnTask reads the parent ParentTaskID-chain depth (walking Get upward, bounded) and rejects loudly — an error observation, never a silent drop (§13) — any spawn whose child would exceed planner.absolute_max_spawn_depth (default 4). The cap bounds depth, not breadth; a per-run total-spawn budget is a noted follow-up if breadth becomes a problem.

Why. Phase 47's SpawnTask/AwaitTask emitters and the background-task machinery have had no shipping binary consumer since they landed — the §13 primitive-without-consumer gap, one layer in from the D-169 parallel-executor gap. 107e closes it on the dev path: a planner that emits _spawn_task now actually spawns and runs a background sub-agent, and _await_task joins it. The synchronous-runloop join + Get-poll + spawn-depth-cap are the minimal correct wiring that fits the V1.1.x runloop without new runtime mechanisms.

Findings I'm departing from. (a) D-032 eager push wake-on-resolution → synchronous retain-turn / explicit-AwaitTask join (call 2); (b) the plan's WatchGroup join → Get-poll join, forced by Task having no GroupID field (call 3); (c) the driver's foreground-only filter → opt-in background driving with a spawn-depth cap (call 4). None removes the runtime surface — WatchGroup stays the group-fan-in mechanism for programmatic planners and a future eager-push runloop.

Protocol additions. None. SpawnTask, AwaitTask, SpawnRequest, TaskHandle, the Task FSM, and the answer-envelope TaskResult shape all already exist; this phase wires consumers, not new wire types.

Known limitations, named here. (1) Eager push wake-on-resolution is deferred (call 2) — a steering-runloop follow-up. (2) A retain-turn spawn / AwaitTask holds a planner step open until the child resolves (bounded by ctx); fine for short V1.1.x dev sub-goals, surfaces as a deadline error observation if a child never terminates — never a hang. (3) Background tasks use the in-mem driver and do not survive a harbor dev restart (Phase 87 owns durability). (4) The spawn-depth cap bounds depth, not breadth.

Cross-references. Builds on D-056 (the spawn/await emission + group surface), D-032 (wake modes), D-152 (the dev ToolExecutor seam), D-097/D-098 (the per-task driver + FSM bridge), D-167/D-169 (the native + parallel cutover this rides behind). Phase plan: docs/plans/phase-107e-spawn-await-dev-executor-dispatch.md. Informed by brief 02 (planner + steering) and brief 05 (tasks).


D-171 — Per-request session model: connection token authenticates (tenant, user); session is chosen per-request via X-Harbor-Session; create-on-first-use; boot is crash-proof

Date: 2026-05-29

Context. harbor dev minted one dev JWT whose session claim was hardcoded to "dev"; control.start folded identity from that token, so EVERY conversation wrote to session "dev". Worse, boot called sessionRegistry.Open("dev", ...): once the persisted "dev" record was Closed (idle-GC'd or operator-closed), the next boot hit ErrReopenAfterClose and crashed — restarting harbor dev against an existing state dir required deleting the state DB. There was effectively one session; you could not create new conversations or reload past ones.

Decision. The connection token is a per-backend credential, like an API key: it authenticates (tenant, user, scopes) and does NOT pin a session. The session is dynamic — chosen per-request via the X-Harbor-Session header, scoped under the token's verified (tenant, user). The token's session claim becomes a back-compat default used only when the header is absent. The multi-isolation triple (tenant, user, session) stays mandatory and enforced (CLAUDE.md §6); only the SOURCE of session changed from JWT-claim to per-request-under-token-scope.

What changed (runtime-only + one contract doc).

  • internal/protocol/auth/middleware.go — after JWT verification, the middleware re-folds the ctx identity: X-Harbor-Session (when present) REPLACES the claim's session; tenant + user stay token-verified (a header can never widen the principal). A verified token with empty tenant/user is a 500 (validator bug); a request that resolves to no session is a 401 identity_required.
  • internal/sessions — new EnsureOpen(ctx, ident) create-on-first-use entry point + a persistent per-(tenant, user) session catalog (session.catalog Kind) so a fresh process re-discovers a prior process's sessions on the read path (the StateStore has no List; the typed wrapper owns enumeration). ListSnapshots / Inspect hydrate from the catalog. EnsureOpen on a CLOSED session id fails loud (ErrReopenAfterClose) — no silent revive (RFC §6.9).
  • internal/protocol: SessionEnsurer seam + WithSessionEnsurer option; dispatchStart calls it so a start on a not-yet-existing session materialises its row before spawning the task.
  • cmd/harbor/cmd_dev.go: removed the boot-time Open("dev") (the crash). The registry is constructed alongside the ControlSurface and wired with the ensurer. harbor dev now boots clean against an existing state dir regardless of session state.
  • harbortest/devstack — mirrors the production wiring (D-094): registry + ensurer + sessions.* routes.

Tests. Restart-resilience (boot twice over the same SQLite dir, second boot healthy + sessions re-discovered); multi-session isolation under one token (N concurrent sessions, no cross-talk, -race); create-on-first-use; closed-session rejection; per-request session override in the auth middleware (header overrides claim; header cannot widen tenant/user). Integration: test/integration/session_model_d171_test.go + the updated TestE2E_Phase72_BodyVsTokenIdentityMismatch (now asserts the per-request-session contract).

RFC delta flagged for coordinator audit. RFC §8 / CLAUDE.md §6/§8 describe identity as flowing "via JWT." This decision sources tenant+user+scopes from the JWT (unchanged) but sources session per-request (header), validated under the token's (tenant, user). The isolation triple and fail-closed posture are unchanged; only the session SOURCE moves. Proposed wording: "The connection credential carries (tenant, user, scopes); the session is selected per-request within the credential's (tenant, user) scope and is never client-widenable." Shipped in the dev posture now; flagged here for an RFC-text reconciliation PR.

Known limitation. sessions.list / sessions.inspect survive restart (persistent catalog + StateStore-backed records), but the task registry is in-memory and not rehydrated on boot, so tasks.list for a pre-restart session returns empty after a restart (the session row reloads; its task rows do not). Full task durability is a separate post-D-171 workstream. Documented in docs/notes/session-model-contract.md.

Cross-references. Builds on D-082 (Phase 61 auth middleware + ctx-first identity), D-122 (sessions.* Protocol surface), D-108 (SessionLister), RFC §6.9 (session lifecycle / reopen-after-close), RFC §8 (Protocol auth). Contract doc: docs/notes/session-model-contract.md.


D-172 — Deprecate Phase 85g; ship MCP Apps as the 109a–c wave under V1.1.x, scheduled right after Phase 108

Date: 2026-05-29

Context. Phase 85g ("MCP Apps host") sat in the post-V1 85-band with status "Revisit after RC-final (2026-07-28)" on the premise that MCP Apps was experimental in the 2025-11-25 spec and the RC might reshape _meta.ui.resourceUri or move Apps into a versioned extension. Two facts overturn that premise: (1) MCP Apps is already a stable, independently-versioned extension (io.modelcontextprotocol/ui, the ext-apps repo) — it is NOT gated on the July RC and the RC does not change it; (2) the extension ships an official, framework-agnostic host bridge (@modelcontextprotocol/ext-apps AppBridge), so the single largest risk the 85g plan carried — hand-rolling the postMessage JSON-RPC dialect, the ui/initialize handshake, lifecycle, and message validation — disappears (we consume it, we do not author it). A code audit also found 85g's "Apps is purely Console-side; the runtime driver is unchanged" non-goal to be factually wrong: the MCP driver does not parse _meta.ui.resourceUri (content.go has no _meta slot), tool.completed carries no result content, and the runtime's ReadResource is not exposed on the Protocol — so there is real runtime + Protocol work before any Console renderer can fetch a ui:// resource.

Decision. Deprecate Phase 85g (plan file kept as historical context, marked deprecated) and supersede it with a three-phase "MCP Apps host" wave under V1.1.x, scheduled immediately after Phase 108:

  • 109a — MCP Apps runtime + Protocol surface (internal/tools/drivers/mcp + internal/protocol + cmd/harbor): parse _meta.ui.resourceUri, recognise ui:// resources, project the app reference (resourceUri + negotiated DisplayMode + RawHTMLTrusted) onto the tool-result Protocol surface, add the mcp.servers.read_resource method (identity-scoped, D-026 heavy-content aware), negotiate DisplayModes from the server's io.modelcontextprotocol/ui capability (replacing the static registry.go placeholder), and add an app-initiated-tool-call proxy method that routes through the existing approval/OAuth/identity tool-safety path.
  • 109b — Console MCP Apps host (web/console): the sandboxed-iframe renderer in the shared chat module, the official AppBridge in manual-handler mode (see D-173), and the inline DisplayMode. Consumes 109a's surface — this is the §13 same-wave consumer for 109a's primitives.
  • 109c — MCP Apps DisplayMode layout (web/console): the Playground page-level layout state machine for fullscreen (app replaces chat + composer; multi-tab) and pip (50/50 resizable split, right rail hidden by default + toggle). inline already shipped in 109b.
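Below is a minimal Go sketch of the first 109a step: parse _meta.ui.resourceUri and recognise the ui:// scheme. It assumes the metadata arrives as a JSON object shaped {"_meta": {"ui": {"resourceUri": ...}}}. The names appRef, toolMeta and parseAppRef are hypothetical and are not the internal/tools/drivers/mcp API. DisplayMode negotiation and RawHTMLTrusted are deliberately left out.

```go
package main

import (
	"encoding/json"
	"strings"
)

// appRef is a hypothetical projection of an MCP Apps reference.
type appRef struct {
	ResourceURI string
}

// toolMeta models only the slice of the payload this sketch needs:
// {"_meta": {"ui": {"resourceUri": "..."}}}.
type toolMeta struct {
	Meta struct {
		UI struct {
			ResourceURI string `json:"resourceUri"`
		} `json:"ui"`
	} `json:"_meta"`
}

// parseAppRef reports ok=true only when the payload carries a
// _meta.ui.resourceUri with the ui:// scheme; anything else is not an app.
func parseAppRef(raw []byte) (appRef, bool, error) {
	var m toolMeta
	if err := json.Unmarshal(raw, &m); err != nil {
		return appRef{}, false, err
	}
	uri := m.Meta.UI.ResourceURI
	if !strings.HasPrefix(uri, "ui://") {
		return appRef{}, false, nil
	}
	return appRef{ResourceURI: uri}, true, nil
}
```

A payload without a _meta slot decodes cleanly to an empty URI. The sketch therefore treats it as "no app" rather than as an error.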

The wave honours the inline-first incremental cut: 109a+109b prove the bridge + proxy end-to-end with inline rendering; 109c adds the heavier fullscreen/pip layout.

Numbering note. The wave claims integers 109a/b/c as the "MCP Apps host" band, executing right after 108. The 14-round page-by-page visual-polish series that Phase 108 opens continues from the next free integer after this band (it is not yet numbered beyond 108); this band does not displace it, it precedes it in execution order.

Dependency prerequisite (binding). 109b adds @modelcontextprotocol/ext-apps + its peer @modelcontextprotocol/sdk to web/console. Per CLAUDE.md §13 / §16 this is a dependency addition requiring an RFC §10 companion update before/with the phase. These are framework-agnostic TypeScript (core + app-bridge entry points only, never the /react entry), so they are not the forbidden React/Vue surface — but the RFC sign-off is still required and is named as a prerequisite risk in the 109b plan.

Cross-references. Supersedes the 85g detail block + plan file. Builds on D-062 (DisplayMode + Live-Runtime ≠ Sessions), D-091 (shared chat module + Console deployment posture), D-093 (protocol.ts generated), D-026 (context-window safety net), D-120/D-121 (Console renderer registry + design-system conventions). Paired with D-173 (AppBridge manual-handler mode). Plans: docs/plans/phase-109a-mcp-apps-runtime-protocol.md, phase-109b-console-mcp-apps-host.md, phase-109c-mcp-apps-displaymode-layout.md. Briefs: 14 (MCP compliance), 11 (Console/playground), 12 (Console deployment).


D-173 — The MCP Apps host integrates the official AppBridge in manual-handler mode; every app→host call is Protocol-proxied, never a direct MCP connection

Date: 2026-05-29

Context. The official @modelcontextprotocol/ext-apps AppBridge offers two integration modes: (1) auto-forward, where the bridge wraps a live MCP Client and proxies app requests straight to the MCP server; (2) manual-handler, where the host registers handlers (oncalltool, onreadresource, onlistresources, onlisttools, onrequestdisplaymode, …) and wires each itself. Auto-forward is the natural fit for a host that is itself an MCP client. The Harbor Console is not an MCP client — it is a Protocol client of the Harbor Runtime (CLAUDE.md §4.5), and the Runtime owns the MCP southbound connection, the (tenant, user, session) isolation boundary, audit redaction, and the unified pause/approval/OAuth tool-safety gates.

Decision. The Harbor MCP Apps host MUST integrate the AppBridge in manual-handler mode only. Every app→host request — tool call, resource read, resource/prompt list, display-mode change — is wired to the injected Harbor ProtocolClient (the 109a methods) → Runtime → MCP southbound. The Console never opens a direct MCP transport and never lets the AppBridge wrap an MCP Client. Concretely:

  • An app-initiated tools/call is routed to 109a's app-tool-call proxy, which enters the SAME identity + approval-gate (Phase 31) + tool-side-OAuth (Phase 30) path a planner-initiated call uses. An app call to a gated tool parks on the unified pause primitive exactly as a planner call does — no bypass.
  • An app-initiated resources/read is routed to mcp.servers.read_resource, scoped to the request identity triple, D-026 heavy-content aware.
  • postMessage origin validation is mandatory: the host accepts messages only from the expected iframe; a foreign-origin or malformed message is rejected, not executed.
  • The iframe sandbox is set without allow-same-origin unless the projected RawHTMLTrusted state explicitly permits it; strict CSP; no parent-DOM / cookie / localStorage access.

Why. If the AppBridge opened its own MCP connection (auto-forward), an in-iframe app could call tools and read resources outside the runtime's identity scope, audit redaction, and approval/OAuth gates — a direct violation of CLAUDE.md §6 (multi-isolation), §7 (security), and §13 (Console reading runtime internals / bypassing the unified pause primitive). Manual-handler mode makes the Protocol the only path, so the app is structurally confined to what the operator's (tenant, user, session) may already do. The 109b test suite asserts the Console opens no direct MCP transport and that an app call to a gated tool still parks.

Cross-references. Implements the security posture of D-172's 109b. Builds on D-091 (shared chat module — injected ProtocolClient, never a singleton), D-062 (DisplayMode), the unified pause/resume primitive (Phase 50), tool-side OAuth (Phase 30 / D-083), tool-side approval (Phase 31 / D-086). CLAUDE.md §4.5, §6, §7, §13. Plan: docs/plans/phase-109b-console-mcp-apps-host.md.


D-174 — Durable memory strategies: the SQL memory drivers delegate to the shared strategy executors; Summarizer threads through memory.Open

Date: 2026-05-30

Context. Phases 24 (memory strategies) and 25 (SQLite/Postgres memory drivers) both shipped, but their intersection did not: the truncation and rolling_summary strategies were only ever implemented in the inmem driver. The SQLite + Postgres memory drivers implement strategy=none only and return ErrStrategyNotImplemented for the rest. And the registry factory memory.Open has no Summarizer in its Deps, so harbor dev special-cases rolling_summary with a direct inmem.New(...) call and rejects every non-inmem driver at boot — the "hardwiring for devs" an operator hit when asking for durable rolling-summary memory. Net effect: durable memory with real recall did not exist; only inmem had the strategies.

Key architectural finding. The strategy algorithms are NOT in the inmem driver — they live in a driver-agnostic internal/memory/strategy/ executor package that persists through an injected state.StateStore. The inmem driver is a thin shell delegating to the executor. So the fix is NOT to reimplement truncation/rolling_summary in SQL — it is to make the SQL drivers delegate to the SAME executors with their injected state.StateStore, exactly like inmem. Durability then rides on the StateStore writes (a SQL StateStore → durable across restart, proven by a reopen-rehydration test).

Decision (Phase 25a).

  • Add Summarizer memory.Summarizer to memory.Deps; memory.Open validates it (required for rolling_summary, on every driver) and routes it to the driver factory → the executor Deps.
  • The SQLite + Postgres memory drivers delegate to strategy.StrategyExecutor (using their state.StateStore dep), gaining all three strategies; the ErrStrategyNotImplemented rejections and the Rejects*Strategy guard tests are removed.
  • cmd/harbor/cmd_dev.go collapses to a single memory.Open(ctx, cfg, Deps{State, Bus, Summarizer}) call; the rolling_summary-only-inmem error is deleted. The summariser still defaults to the agent's configured LLM (llmsummarizer.New(llmClient)) — no separate summariser model, no special-case.
  • Fail-loud preserved (CLAUDE.md §13): rolling_summary without a Summarizer errors at memory.Open on all drivers; the registry default is NEVER a stub summariser.
  • The memory conformance suite runs {none, truncation, rolling_summary} × {inmem, sqlite, postgres}. Rolling-summary snapshots keep the summary because Snapshot/Restore go through the executor's own Summary-bearing record (strategy.memoryStateRecord{Strategy, Turns, Summary}) on every driver; the exported memory.Record (turns-only, no Summary) is unchanged and remains a Console-facing turns projection — it is no longer on the persistence path, so nothing drops the summary.

Cross-references. Completes Phase 24 (strategies) × Phase 25 (SQL drivers). Builds on Phases 15/16 (SQLite/Postgres StateStore — the durable backing), the §13 primitive-with-consumer rule (Deps.Summarizer + its cmd_dev consumer land together), brief 02 (fail-loud memory), brief 13 (memory injection / recall). Plan: docs/plans/phase-25a-durable-memory-strategies.md. The dev binary's prior special-case it replaces is the one referenced by docs/plans/phase-25-memory-drivers.md.


D-175 — Per-MCP-server + per-tool tool-policy config (policy: / tool_policies: in YAML); projection via a cycle-free ProjectedToolPolicy

Date: 2026-05-30

Context. Tool retry/timeout was the hardcoded tools.DefaultPolicy() (30 s per-attempt deadline, 4 total attempts) with NO operator knob — MCPServerConfig had no policy field. A slow/throttled tool (a YouTube metadata call over uvx mcp-youtube → yt-dlp) burned four 30 s attempts (4×30 s = 120 s of deadlines, ~128 s wall time including backoff) before failing, and the operator could not tune it. The MCP driver already carried an unused per-server DefaultPolicy slot (mcp.go), but it was never wired from config.

Decision. Phase 26b exposes the policy as operator YAML on each MCP server: a policy: { max_attempts, timeout_ms, retry_on, backoff_* } block (per-server default) plus a tool_policies: { <tool-name>: { … } } map (per-tool overrides). The YAML uses max_attempts = TOTAL attempts including the first (projected to tools.ToolPolicy.MaxRetries = max_attempts - 1), because operators think in total attempts, not retries. Per-field zero-value fall-through is preserved (a policy: that sets only timeout_ms keeps the default attempt count) — the projection never substitutes a default itself; tools.ToolPolicy.resolved() does at dispatch.

Two implementation subtleties (settled).

  • Import cycle → ProjectedToolPolicy. internal/config cannot import internal/tools (tools → events → config cycle). So the single config→policy interpretation seam, config.ToolPolicyConfig.ToToolPolicy(), returns a cycle-free primitive image config.ProjectedToolPolicy; the binary entry point (cmd/harbor) does the trivial primitive→tools.ToolPolicy copy. The tools.ToolPolicy struct stays the single definition (CLAUDE.md §13); there is exactly ONE interpreter of the operator fields. This is a §4.3 deviation from the plan (which placed the projection's return type as tools.ToolPolicy); justified by the hard cycle.
  • max_attempts: 1 needs an explicit-empty RetryOn. Because resolved() treats a zero MaxRetries on an otherwise-set policy as "inherit the default 3 retries", MaxRetries: 0 alone does NOT pin a single attempt. The policy shell reads an EXPLICIT empty (non-nil) RetryOn as "retry on nothing". So when the operator asks for max_attempts: 1 and names no retry_on, the projection sets a RetryOnEmpty flag and cmd/harbor materialises an empty non-nil RetryOn — making one attempt mean one attempt.
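The nil-versus-empty distinction is easier to see in code. Below is a Go sketch of a resolved()-style fall-through and an attempt counter for an always-failing call. The policy struct, the default of 3 retries (from the text) and the retry class names "timeout"/"transient" are assumptions of this sketch, not the tools.ToolPolicy definition.

```go
package main

// policy is a hypothetical trim of tools.ToolPolicy.
type policy struct {
	MaxRetries int
	RetryOn    []string // nil = inherit default; non-nil empty = retry on nothing
}

const defaultMaxRetries = 3

var defaultRetryOn = []string{"timeout", "transient"}

// resolved fills zero-valued fields from the defaults.
func (p policy) resolved() policy {
	out := p
	if out.MaxRetries == 0 {
		out.MaxRetries = defaultMaxRetries
	}
	if out.RetryOn == nil {
		out.RetryOn = defaultRetryOn
	}
	return out
}

// attempts counts the tries a call that always fails with failureClass gets.
func attempts(p policy, failureClass string) int {
	r := p.resolved()
	n := 1
	for i := 0; i < r.MaxRetries; i++ {
		retryable := false
		for _, c := range r.RetryOn {
			if c == failureClass {
				retryable = true
			}
		}
		if !retryable {
			break
		}
		n++
	}
	return n
}
```

MaxRetries 0 alone still resolves to four attempts. Only the explicit empty RetryOn stops the loop after the first try.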

Tests. Projection off-by-one + per-field fall-through + retry_on mapping + unknown-class rejection; MCP integration: a per-tool max_attempts: 1 override makes exactly one attempt while a sibling tool uses the server default of four; concurrent-reuse (100 concurrent calls across two differently-policied tools, no cross-bleed, goroutine baseline); a "no policy → DefaultPolicy" regression. go test -race ./internal/config/... ./internal/tools/drivers/mcp/... green.

Cross-references. Builds on D-024 (tools.ToolPolicy + RunWithPolicy), the §4.4 MCP seam (Phase 28). Validation in internal/config/validate.go; example in examples/harbor.yaml; mcp/tools operator skills updated (§18). Plan: docs/plans/phase-26b-per-source-tool-policy-config.md.


D-176 — Session artifact manifest: the run loop injects a read-only <session_artifacts> block each turn so the planner stays aware of artifacts across turns

Date: 2026-05-30

Context. A user upload, or a tool result materialised above the heavy-output threshold (D-026), becomes a session-scoped artifact the model can read via the artifact_fetch builtin. But the model learns the artifact's ref ONLY on the turn it is created (from the input ArtifactStub or the heavy-result summary). On the next turn the ref is gone from context, so the model cannot iterate on an uploaded file or a prior tool output even though artifact_fetch already resolves session-scoped (the artifact is still readable — the model just doesn't know it exists). A second, latent issue: artifact provenance is keyed inconsistently — uploads set Source["source"]="user_upload", but tool/flow artifacts record the producer under Source["tool"]/Source["producer"] and leave the canonical "source" blank, so artifacts.list (and the Console Artifacts page) show blank source for them.

Decision (Phase 107f). Each planner turn, the run loop lists the session's artifacts (ArtifactStore.List scoped to (tenant, user, session) — already in scope) and pre-resolves a metadata-only manifest onto a new planner.RunContext.SessionArtifacts []ArtifactManifestEntry field (the D-166 pattern — the planner does no I/O, reads rc only). The ReAct planner renders a read-only <session_artifacts> system block (one line per artifact: ref · filename (mime, size) · provenance) with the same UNTRUSTED-metadata anti-injection framing the memory blocks use, instructing the model it may artifact_fetch <ref> to read/iterate on any of them. Empty session → NO block (no fabricated rows). Capped at 20 newest-first with an explicit +K more (use artifact_fetch by ref) line — never a silent truncation. A List error fails soft: a logged Warn and NO manifest that turn — never a fabricated or partial one (CLAUDE.md §5).

Provenance canonicalisation (the latent-bug fix). The dev tool-executor stamps Source["source"]="tool" and the flow catalog Source["source"]="flow", in addition to the producer/tool name. internal/protocol/artifacts.go::projectRow resolves the source discriminator from an else-chain ("source" → "tool" → "producer"/"flow") so even artifacts created before this phase project a correct non-blank source. The closed types.ArtifactSource enum is NOT extended: "tool" is an existing member; "flow" maps to ArtifactSourceSystem (a flow run is runtime-produced); the richer "flow: <name>" string surfaces only in the manifest provenance, never on the wire enum.
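One reading of that else-chain, sketched in Go: prefer the canonical "source" key, then infer "tool" from a legacy Source["tool"], then infer "flow" from Source["producer"]/Source["flow"]. The function names are hypothetical. The string "system" is an assumed wire value for ArtifactSourceSystem, since the real enum encoding is not given here.

```go
package main

// resolveSource applies the else-chain to a Source map; legacy rows that
// lack the canonical key are inferred from the producer-style keys.
func resolveSource(src map[string]string) string {
	if s := src["source"]; s != "" {
		return s
	}
	if src["tool"] != "" {
		return "tool"
	}
	if src["producer"] != "" || src["flow"] != "" {
		return "flow"
	}
	return "" // nothing to infer from; left blank rather than invented
}

// toWireEnum keeps the enum closed: "flow" is not a member and maps to the
// system source ("system" is an assumed wire value).
func toWireEnum(source string) string {
	if source == "flow" {
		return "system"
	}
	return source
}
```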

Shared builder + parity. planner.BuildArtifactManifest + planner.ResolveProvenance are the single source the run loop AND harbortest/devstack both call, so production and the harness cannot diverge (§17.6). BuildArtifactManifest imposes a deterministic newest-first (created_at desc, ID tiebreak) order because ArtifactStore.List order is interface-unspecified — keeping the prompt prefix stable across turns.

Tests. Provenance resolver table; manifest ordering + cap + empty; the read-only render framing + the artifact_fetch instruction; the Protocol source projection (a tool artifact no longer projects blank); the run-loop build with a prior-turn artifact + a user upload; identity scoping (session A's artifacts never appear for session B); the fail-soft List-error path. go test -race ./internal/planner/... ./internal/protocol/... ./cmd/harbor/... ./harbortest/... green.

Cross-references. Builds on Phases 17–19 (Artifacts + ArtifactStore.List), 33 (multimodal upload), 107c (the artifact_fetch meta-tool + the heavy-result ArtifactStub), D-026 (heavy-content routing — the manifest is metadata-only, never inlines content), D-166 (run-loop pre-resolution of RunContext inputs), brief 13 (read-only injected prompt blocks + anti-injection framing). The reserved memory.ConversationTurn.ArtifactsShown/ArtifactsHiddenRefs fields (brief 04) are NOT wired — a future "new-since-last-turn" delta optimisation. Plan: docs/plans/phase-107f-session-artifact-manifest.md.


D-177 — Live Runtime reframed to a single-runtime capability-adaptive cockpit (supersedes the topology-first composition of D-126)

Date: 2026-06-01

Context. D-126 composed the Live Runtime page topology-first: the engine graph is the hero, everything else is trim. But the topology.snapshot surface exists only on engine-graph runtimes; the dominant V1 shape is planner/RunLoop (the harbor dev posture, most scaffolded agents), which returns unknown_method (D-164). So on the common runtime the page's hero is the honest "topology not available" banner and the remaining surface (steer-a-run + an event table) duplicates — and underperforms — the Playground. After 108d shipped, the operator (2026-06-01) judged the page low-value and Playground-overlapping, and its negative space / lack of viewport discipline confirmed the framing was wrong, not just the polish.

Decision (Phase 108e). Reframe Live Runtime as the single-runtime operations cockpit — the Overview(fleet) → runtime drill-down, one runtime selected at a time — whose composition is a pure function of the runtime's advertised runtime.info capabilities (Phase 84a). A declarative capability→panel registry (web/console/src/lib/live-runtime/panels.ts::resolvePanels) yields an always-present spine (runtime posture · activity counters · needs-attention pauses/approvals · live event stream · active sessions) plus capability-gated panels (cost/governance, health, topology, and future multi-agent / workflow / distributed shapes — additive, no page rebuild). Topology becomes ONE gated panel, not the spine; on a planner runtime it is absent or collapsed, never an empty hero. The free-floating Start/Redirect/Inject/User-message composer is REMOVED (it duplicated the Playground, D-062); run-level steering is a drill into a session → Playground.
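The capability→panel registry is a pure function of the advertised capabilities. The real resolver is TypeScript (panels.ts::resolvePanels); below is a Go sketch of the same shape. The panel IDs and capability keys are hypothetical, not the runtime.info capability names. Gated panels stay in the result with Available=false so the page can render the honest "does not advertise <X>" state instead of synthetic data.

```go
package main

// panelSpec: Capability "" means always present (the spine).
type panelSpec struct {
	ID         string
	Capability string
}

// registry order is render order; adding a panel is one more row.
var registry = []panelSpec{
	{"posture", ""}, {"activity", ""}, {"needs-attention", ""},
	{"event-stream", ""}, {"active-sessions", ""},
	{"cost-governance", "governance"}, {"health", "health"}, {"topology", "topology"},
}

type resolvedPanel struct {
	ID        string
	Available bool // false renders the "does not advertise" state
}

// resolvePanels depends only on its input, so it is trivially testable.
func resolvePanels(caps map[string]bool) []resolvedPanel {
	out := make([]resolvedPanel, 0, len(registry))
	for _, p := range registry {
		out = append(out, resolvedPanel{ID: p.ID, Available: p.Capability == "" || caps[p.Capability]})
	}
	return out
}
```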

Supersedes. The composition half of D-126 (the topology-first spine + the page-local steering composer). D-126's per-datum data-source map and the "no Console shadow store" rule (D-061) stay intact and are reused. D-164 (honest unknown_method info state) is preserved and generalised — from "hide topology" to "gate every capability-conditional panel." D-062 (Live Runtime ≠ Sessions; chat is one panel) is reinforced, not changed.

No fabrication (CLAUDE.md §13). A gated-absent panel renders an honest "this runtime does not advertise <X>" state, never synthetic data. Node run-state (the topology legend counts + failed-node styling) stays Console-derived from the live event stream — the Protocol projection carries no per-node state — so on a runtime that emits none the legend reads zeros and nothing is styled failed.

Layout bar. The reframe carries an explicit layout-fidelity acceptance gate the 108d build missed: viewport-locked (no full-page scroll; only inner regions scroll), shared baseline grid, full-bleed, deliberate negative space — validated in the EMPTY/info state (what dev runtimes render), not only the populated one.

Tests. panels.test.ts (the pure resolver across planner / engine+posture / unknown-capability inputs); the carried-forward 108d topology-adapter.test.ts (structural graph render incl. a failed node); a rebuilt live-runtime-page.spec.ts (cockpit hydration with zero console errors, the capability matrix, scope-gated intervention verbs, disconnected→/settings). scripts/smoke/phase-108e.sh (static) guards capability-driven composition, topology-gating, and the composer/chat removal.

Cross-references. Supersedes the composition of D-126; preserves D-164, D-062, D-061, D-066. Builds on Phase 84a (runtime.info capabilities) and Phase 108d (the reused components + topology adapter + capability probe). Consumes-as-available 72f (runtime health), 72g (governance/llm posture), the 73-cluster (sessions.list/sessions.inspect, tasks.get, artifacts.list), 74 (topology.snapshot) — each honest-gated when absent. RFC §7, §7.1, §6.3, §6.13. Plan: docs/plans/phase-108e-live-runtime-capability-cockpit.md.


D-178 — Console Settings reframed to a calm sub-nav + single-section layout (supersedes the Phase 73m / D-129 paginated-cards + saved-views + detail-rail composition)

Date: 2026-06-01

Context. D-129 (Phase 73m) shipped the Settings page with three navigation models running at once — a left sub-nav rail AND scroll-to-anchor AND a 6-per-page paginator over the 12 section cards — plus a top FilterBar with a section-search input, saved-view chips, a "Bookmark section" button, a right detail rail ("Active section / Runtime / LLM mode"), and a page-level runtimes DataTable duplicating the Connected Runtimes card. The result was busy and over-engineered: the page's own spec (page-settings.md §4) had always prescribed the calm "section-nav rail + one section's content at a time" model, and brief 11 §"Settings view" describes exactly that. After 108c (Overview) and 108d/108e (Live Runtime) landed the carded .panel.card vocabulary, the operator (2026-06-01) judged the Settings page the last over-built surface and approved the simplification.

Decision (Phase 108f). Rebuild the page to a two-pane flex: the <SubNavRail> on the left (lightly grouped — Console-local sections, a hairline divider + a "Runtime" sub-heading, then the read-only runtime-posture sections) and a .section-pane on the right that renders ONLY the active section (default connected-runtimes). Each active section is a carded <section class="panel card"> with an <h2 class="panel-title"> reading the section label, copying the Overview page's (108c) vocabulary — tokens only, Svelte 5 runes (D-092). All of the 73m cruft is removed: FilterBar, saved-view chips, the Bookmark-section button, the detail rail, the paginator, the duplicate runtimes DataTable, and scroll-to-anchor. A single section is always in view, so the page is calm and viewport-friendly (the rail is sticky; the right pane scrolls internally for a long section like Keybindings — the chrome never full-page-scrolls).

Supersedes. The composition half of D-129 — the paginated-cards layout, the FilterBar + saved-view chips + Bookmark-section button, the detail rail, and the page-level DataTable. The section CARD components, the state module (state.svelte.ts), and the Console DB controller (console_db.svelte.ts) are unchanged; the auth.rotate_token method and its D-066 admin gating are untouched.

Preserved. D-158 (the console-local / runtime-posture split) is kept per active section — a console-local section renders DIRECTLY inside a settings-cards-console-local wrapper (works disconnected, the operator's only path to attach a runtime), while a runtime-posture section renders inside <PageState> inside a settings-cards-runtime-posture wrapper. D-061 (Console DB local-only, never a shadow store for runtime entities) and D-066 (rotate-token admin gating) are reinforced, not changed. $lib/settings/saved_views.svelte.ts is left in place (merely no longer imported by the page) to avoid breaking other refs.

No fabrication (CLAUDE.md §13). The single-section model removes parallel navigation affordances rather than adding any; nothing synthetic is introduced. Posture sections still render each card's honest "unavailable" state from a null when a posture read fails, and the disconnected branch shows the standard <PageState> placeholder.

Tests. A rebuilt web/console/tests/settings-page.spec.ts (hydration; sub-nav → active-section heading reads "About"; the default-section add-runtime round-trip; rotate-token disabled-without-admin / enabled-with-admin after clicking the per-runtime-auth entry; the conditional mock-mode banner after clicking the llm-posture entry; the disconnected shell; the D-158 disconnected attach path — console-local renders + the add form is reachable, then a posture section shows page-state-disconnected; the 83u disconnected-boot localStorage write). scripts/smoke/phase-108f.sh (static) guards the removal of the cruft and the preservation of the D-158 split + the load-bearing testids. npm run check 0/0, npm run lint clean, npm run test green.

Cross-references. Supersedes the composition of D-129; preserves D-158, D-061, D-066. Builds on Phase 73m (the page it simplifies), Phase 83p (the D-158 split), Phase 105 (first-attach — AttachToLocalCard), and Phase 108c (the carded vocabulary). RFC §7, §7.1; brief 11 §"Settings view"; brief 12 §"auth-storage threat model"; CONVENTIONS.md (D-121) + PAGE-POLISH-PROCEDURE.md. Plan: docs/plans/phase-108f-console-settings-page.md.


D-179 — Console Sessions page rebuilt + fully wired (supersedes the Phase 73c / D-122 placeholder bottom-dock + disabled-bulk composition)

Date: 2026-06-01

Context. D-122 (Phase 73c) shipped the Sessions list + detail routes on the design-system foundation, but predated the 108b app-shell chrome and the 108c carded retheme, and it left two placeholders the PAGE-POLISH bar forbids: the detail-view BottomDockTabs rendered five static descriptive blurbs (Trajectory / Events / Cost History / Control History / Interventions) instead of real data, and the list's bulk Cancel / Pause buttons were permanently disabled with a "wired with Phase 73b" tooltip. A wire investigation (2026-06-01) confirmed the shipped Protocol surface can feed every one of those affordances for real — the bulk control methods (cancel / pause) are shipped (D-047), and each dock tab is a session-filtered projection of the shipped events.subscribe SSE (the taxonomy ships planner.decision/finish/error, tool.*, task.*, llm.cost.recorded, control.received/applied/rejected, pause.*, tool.approval_*, tool.auth_*). The operator approved rebuilding both routes to the carded vocabulary the four done pages (Overview, Live Runtime, Settings, Playground) set, with zero placeholders.

Decision (Phase 108g). Rebuild /sessions + /sessions/[id] to the carded .panel.card + .panel-title vocabulary (Overview 108c), tokens only, Svelte 5 runes (D-092), HarborClient + connection.ts only. The list keeps sessions.list (cursor-paged) under a calm carded toolbar (free-text search → the sessions.list query / search.sessions; Status facet; admin-only Tenant facet per D-079; Sort; Refresh), lean registry-owned columns (Session / Status / Agent / Identity / Started / Last activity / Duration) plus an Events count enriched per visible row via events.aggregate. Bulk Cancel / Pause are wired for real: when rows are selected and the operator holds the control scope, each iterates the shipped cancel / pause method per selected session's active run (D-066 gates the affordance; absent the scope it is disabled with a tooltip naming the claim). The detail view replaces the placeholder BottomDockTabs with five real tabs, each a session-filtered projection of events.subscribe: Trajectory (the planner/tool/task lifecycle timeline), Events (the raw filtered log, reusing the events/ lib), Cost History (llm.cost.recorded summed client-side, reusing overview/cost.ts), Control History (control.*), and Interventions (pause.* / tool.approval_* / tool.auth_* + pause.list backfill, with a real Resume action invoking resume / approve / reject). The detail action set — Continue in Live Runtime (nav), Clone (start with the cloned query), Cancel session (cancel per active task), Export events (JSONL via the shipped events/export.ts) — is wired against the live Runtime.

Supersedes. The composition half of D-122 — the placeholder blurb BottomDockTabs, the permanently-disabled bulk Cancel / Pause buttons, and the pre-chrome PageHeader-led list layout. The sessions.list / sessions.inspect wire shapes, the typed SessionsProtocol, the sessions/format.ts helpers, SessionFacetChips, and IdentityCell are unchanged; D-122's "registry projection is pure, the Console enriches" stance is reinforced, not changed.

Cost column departure. brief 11 §"Sessions view" sketched a per-row cost / token column on the list. This phase departs: the Phase 08 Session registry does not model per-session cost / tokens (D-122 — the row is a pure lifecycle projection), and no shipped aggregate wire sums llm.cost.recorded per session (events.aggregate counts events by type only). Rather than fabricate a value or ship an always-empty column, the list omits Cost / Tokens; the detail's Cost History tab computes cost from the live event stream where it can be done honestly. A dedicated cost.aggregate wire is the V1.3 evolution that would restore a per-row list cost.

No fabrication (CLAUDE.md §13, PAGE-POLISH §1). Every datum the rebuilt page renders is traced to a shipped Protocol method or event and verified against live Runtime data. The genuine V1 gaps render honest states, never invented values: the scrubbing replay player and the Markdown full-transcript export need the Phase 73 state.history / state.list_trajectories surface (still Pending) and stay deferred (the static Trajectory timeline from events still ships; "Export events (JSONL)" ships in their place); Convert-to-Evaluation stays disabled with a D-064 tooltip; a session whose events have aged out of an in-memory event bus shows an honest empty dock, not a fabricated history.

Tests. New Vitest specs for the event→step Trajectory projection, the client-side cost sum, the per-row Events enrichment, and the dock tab filtering — each against a captured real wire frame (the SSE PascalCase payload gotcha). A rebuilt web/console/tests/sessions-page.spec.ts (Playwright) covers hydration on both routes, list → detail navigation, the four PageState branches, the bulk-action scope gating, the dock tab switching, and the disconnected shell. scripts/smoke/phase-108g.sh (static) guards the removal of the placeholder blurbs + disabled-bulk and the presence of the real per-tab components + the events subscription. npm run check 0/0, npm run lint clean, npm run test green.

Cross-references. Supersedes the composition of D-122; preserves D-061 (Console DB local-only), D-066 (control-scope gating), D-079 (admin tenant facet), D-107 (impersonation triplet), D-064 (Evaluations post-V1). Builds on Phase 73c (the page it rebuilds), Phase 60 / 72 / 72a (events.subscribe + events.aggregate), Phase 54 / 72e (pause.list + approve / reject), Phase 42 / D-047 (cancel / pause / resume / start), Phase 108b (chrome) + Phase 108c (the carded vocabulary). RFC §7, §7.1; brief 11 §"Sessions view", §CC-2, §CC-4; brief 12 §"The two-surface model"; CONVENTIONS.md (D-121) + PAGE-POLISH-PROCEDURE.md. Plan: docs/plans/phase-108g-console-sessions-page.md.


D-180 — Console Events page rethemed to the carded, viewport-locked composition (refines the Phase 73g / D-125 pre-chrome layout)

Date: 2026-06-01

Context. D-125 (Phase 73g) shipped the Events page fully wired to the shipped Protocol surface — the live events.subscribe SSE table feed, the events.aggregate per-type rate sparkline, the Console-local saved views / export / pause-stream, the admin cross-tenant fan-in, and the bus.dropped strip. It is genuinely well-wired and is NOT over-engineered (Events is the power-user event-bus investigative surface and legitimately needs its filter + sparkline + table + detail-rail composition). But it predates the 108b app-shell chrome and the 108c carded retheme: it renders a per-page PageHeader (duplicating the breadcrumb chrome) and is not viewport-locked, so the page full-page-scrolls and the table grows unbounded. A wire investigation (2026-06-01) also surfaced three honest gaps: the empty-state copy mis-describes the live SSE table feed as a persistent-buffer read; events.aggregate defaults to the caller's own session when the filter elides it, so the rate sparkline renders empty on the default view; and the right rail is blank when no row is selected (page-events.md §4 calls for the live subscription status there).

Decision (Phase 108h). Retheme the page to the carded .panel.card + .panel-title vocabulary the five done pages set (Overview, Live Runtime, Settings, Playground, Sessions), drop the per-page PageHeader (the breadcrumb is chrome), and viewport-lock it (PAGE-POLISH §6 — the Playground / Sessions pattern): the faceted filter strip and the rate sparkline are fixed-height; the events table scrolls internally behind a sticky header; the right rail scrolls internally; the document never full-page-scrolls. Keep the rich query composition unchanged in behaviour — every Runtime read stays on the unified HarborClient via the EventsPageState controller; the phase ships NO new Protocol method. Fill the three audited gaps honestly (PAGE-POLISH §1): reframe the empty copy to the live-stream reality, scope the sparkline aggregate to the active facet set so it reflects what the table shows, and render the live subscription status (cursor sequence, dropped count, stream state) in the idle right rail.

Supersedes. The composition/layout half of D-125 — the PageHeader-led, non-viewport-locked layout and the misleading empty copy. The data layer (EventsPageState, the events lib — subscription / aggregate / filters / taxonomy / export / sparkline / saved-views), the events.subscribe / events.aggregate wires, and the Console-local D-061 saved-views / export / pause semantics are unchanged.

No fabrication (CLAUDE.md §13, PAGE-POLISH §1). Every datum the page renders is traced to events.subscribe / events.aggregate and verified against live Runtime data. The genuine V1 gaps render honest states: runtime-side search.events (Phase 72c / [wave-13-extends]) is absent, so the search box stays a Console-local substring match over the loaded page with honest copy; the trace deep-link (D-073 traceparent) stays disabled-with-tooltip (post-V1); a quiet window on an in-memory event driver shows an honest empty table (the SSE streams live events forward; no historical backfill), never fabricated rows.
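Because there is no runtime-side search.events at V1, the search box can only filter what is already on the page. The sketch below shows that kind of Console-local fallback as a case-insensitive substring match over the loaded rows. It is a minimal sketch, not the Console's code: the `EventRow` shape and the `filterLoadedRows` name are hypothetical.

```typescript
// Hypothetical minimal row shape; the real Events table row carries more fields.
interface EventRow {
  type: string;
  session: string;
  summary: string;
}

// Console-local search: a case-insensitive substring match over the rows already
// loaded into the page. It never queries the Runtime (no search.events at V1).
function filterLoadedRows(rows: readonly EventRow[], query: string): EventRow[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [...rows]; // blank query keeps every loaded row
  return rows.filter((r) =>
    [r.type, r.session, r.summary].some((field) => field.toLowerCase().includes(needle)),
  );
}

const loadedRows: EventRow[] = [
  { type: "planner.decision", session: "s-1", summary: "chose tool youtube_search" },
  { type: "task.completed", session: "s-1", summary: "youtube_search ok" },
  { type: "task.started", session: "s-2", summary: "spawned summary task" },
];
```

A match can only ever be a subset of the loaded page, which is why the empty-result copy has to say so rather than imply that nothing matched runtime-wide.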

Latent production bugs the live verification surfaced (§17.6). The PAGE-POLISH §3 live-wire pass found the Events table was effectively non-functional in production — it never showed live events (the always-empty behaviour previously attributed to the inmem driver). Three reactivity/wiring bugs, masked by the unit/e2e harness (whose mock EventSource factory dispatches synchronously inside open(), before the first render), were fixed: (1) the EventsPageState.subscription / .aggregator fields were plain, not $state, so the async load() assignment (in onMount, after first render) never triggered the reactive re-read that surfaces streamed events — the table stayed bound to the initial null; (2) the subscription opened with the empty default eventTypes, but the SSE transport needs a NAMED addEventListener per subscribed type (the runtime emits event: <type> frames), so an empty list registered no listeners and ingested nothing — defaulting to the full taxonomy when no type facet is set fixes it; (3) the status field was set once at load (0 events → empty) and never recomputed, hiding the table behind the empty-state while events streamed — a derived displayStatus now flips empty↔ready on the live count. Additionally the Session facet only re-scoped the aggregate, not the table feed; a backward-compatible subscribeURL session override now re-scopes the live table to the pinned session, and the rate sparkline is re-fetched (throttled) as the cursor advances so it tracks the stream. Verified live against the YouTube validation agent: the table fills with real planner.decision / tool.* / task.* / llm.cost.recorded events, the sparkline renders, the Event Details rail shows the typed payload + identity + quick actions, and the idle rail shows the live subscription status (Stream open / cursor / dropped / loaded), all with zero console errors.
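Fixes (2) and (3) can be sketched as follows. An SSE transport only delivers `event: <type>` frames to a listener registered under that exact name, so an empty type list must fall back to the full taxonomy. The display status is derived from the live row count instead of being frozen at load time. This is a minimal sketch under stated assumptions: `NamedEventSource` is a structural stand-in for the browser `EventSource`, and `TAXONOMY`, `effectiveEventTypes`, and `registerListeners` are hypothetical names, not the Console's API.

```typescript
// Structural stand-in for the browser EventSource: only named listeners matter here.
interface NamedEventSource {
  addEventListener(type: string, listener: (ev: { data: string }) => void): void;
}

// Hypothetical subset of the event taxonomy, for illustration only.
const TAXONOMY: readonly string[] = ["planner.decision", "task.started", "task.completed", "llm.cost.recorded"];

// Fix (2): no type facet selected means "subscribe to everything", never "nothing".
function effectiveEventTypes(selected: readonly string[], taxonomy: readonly string[]): string[] {
  return selected.length > 0 ? [...selected] : [...taxonomy];
}

// One named listener per subscribed type; returns how many were registered.
function registerListeners(
  source: NamedEventSource,
  types: readonly string[],
  onFrame: (type: string, data: unknown) => void,
): number {
  for (const type of types) {
    source.addEventListener(type, (ev) => onFrame(type, JSON.parse(ev.data)));
  }
  return types.length;
}

type LoadStatus = "loading" | "empty" | "ready" | "error";

// Fix (3): derive empty/ready from the live count on every read instead of
// trusting the status captured once at load.
function displayStatus(loaded: LoadStatus, liveCount: number): LoadStatus {
  if (loaded === "loading" || loaded === "error") return loaded;
  return liveCount > 0 ? "ready" : "empty";
}
```

A mock factory that dispatches synchronously inside `open()` hides both bugs, because rows exist before the first read. A fake source that only fires when a test dispatches explicitly exercises the real ordering.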

Packing pass (operator review). A follow-up review flagged that the multi-card Event Details rail overflowed into a page-level scroll, the Event Rate read as a plain stacked bar, and the page carried dead vertical space. Reworked to the mock's denser composition: the Event Details is now ONE packed .panel.card (severity header + close ✕ + Identity / Source / Payload-with-Copy-JSON / Quick Actions) that fills the right column and scrolls INTERNALLY — fixing the page-level scroll; the Event Rate is now a per-category multi-line chart (tool.* / task.* / llm.* / planner.* / …) with a Type / Rate / Total legend on the right of the same card; and the page padding / gaps are tightened so the whole surface packs into one viewport with no full-page scroll even with the detail open (verified scrollHeight == innerHeight).
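The per-category Event Rate chart with its Type / Rate / Total legend amounts to bucketing streamed events by the prefix before the first dot and by time window. The sketch below is one possible shape for that projection, assuming millisecond timestamps and fixed-width buckets. `bucketByCategory` and `legendRows` are hypothetical names, not the shipped sparkline module.

```typescript
// Assumed frame shape: event type plus a millisecond timestamp.
interface StreamedEvent {
  type: string;
  ts: number;
}

// "task.completed" -> "task"; a type with no dot is its own category.
function categoryOf(type: string): string {
  const dot = type.indexOf(".");
  return dot === -1 ? type : type.slice(0, dot);
}

// category -> per-bucket counts (oldest bucket first) over [start, start + buckets * bucketMs).
function bucketByCategory(
  events: readonly StreamedEvent[],
  start: number,
  bucketMs: number,
  buckets: number,
): Map<string, number[]> {
  const series = new Map<string, number[]>();
  for (const e of events) {
    const idx = Math.floor((e.ts - start) / bucketMs);
    if (idx < 0 || idx >= buckets) continue; // outside the charted window
    const cat = categoryOf(e.type);
    let counts = series.get(cat);
    if (!counts) {
      counts = new Array<number>(buckets).fill(0);
      series.set(cat, counts);
    }
    counts[idx] += 1;
  }
  return series;
}

interface LegendRow {
  category: string;
  ratePerSec: number;
  total: number;
}

// Legend rows for the right of the card, busiest category first.
function legendRows(series: Map<string, number[]>, bucketMs: number): LegendRow[] {
  const rows: LegendRow[] = [];
  series.forEach((counts, category) => {
    const total = counts.reduce((sum, n) => sum + n, 0);
    const windowSec = (counts.length * bucketMs) / 1000;
    rows.push({ category, ratePerSec: windowSec > 0 ? total / windowSec : 0, total });
  });
  return rows.sort((a, b) => b.total - a.total);
}
```

Each category becomes one line on the chart. Events outside the window are dropped rather than clamped, so the legend totals match what is actually drawn.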

Tests. An updated web/console/tests/events-page.spec.ts (Playwright) covers hydration, the carded regions, the four PageState branches, row-select → detail rail, the idle rail subscription status, the pause toggle, and the disconnected shell; the events lib vitest suites stay green (extended for the sparkline-facet-scoping change). scripts/smoke/phase-108h.sh (static) guards the PageHeader removal + the carded vocabulary + the load-bearing testids. npm run check 0/0, npm run lint clean, npm run test green.

Cross-references. Refines the composition of D-125; preserves D-061 (Console DB local-only), D-079 (admin fan-in gate), D-026 (heavy-payload by-reference), D-073 (traceparent post-V1), D-074 (durable event log). Builds on Phase 73g (the page it rethemes), Phase 60 / 72 / 72a (events.subscribe + events.aggregate), Phase 108b (chrome) + Phase 108c (the carded vocabulary) + Phase 108g (the DataTable alignment + viewport-lock pattern reused). RFC §7, §7.1; brief 11 §"Events view"/§LR-5/§CC-2/§CC-4; brief 12 §"two-surface model"; CONVENTIONS.md (D-121) + PAGE-POLISH-PROCEDURE.md. Plan: docs/plans/phase-108h-console-events-page.md.


D-181 — Console Tasks page rebuilt to a carded, viewport-locked single-page mode-switch + the per-task dock wired to the live run-scoped event stream (supersedes the Phase 73d / D-123 pre-chrome layout + the placeholder detail tabs)

Date: 2026-06-01

Context. D-123 (Phase 73d) shipped the Tasks page as the task-granularity counterpart to Sessions — a kanban board (Pending / Running / Paused / Complete / Failed) over tasks.list, a list-mode DataTable toggle, a per-task detail with a bottom-dock tab strip, and the bulk + per-task control verbs (Phase 54, control-scope gated per D-066). The board, the list, the filters, the saved views, and the control wiring are genuinely real. But the page predates the 108b app-shell chrome and the 108c carded retheme: it renders a per-page PageHeader (duplicating the breadcrumb chrome) and is not viewport-locked, so the board + the selected-task detail bar + the bottom dock stack into a document that full-page-scrolls. More importantly, the per-task detail tabs (TaskDetailTabs) are SHALLOW — the Events / Logs / Control History / Interventions tabs render placeholder prose ("Live task.* event deltas surface on the kanban board") instead of the live event-bus data the mock (docs/rfc/assets/console-tasks-page.png) and spec §3 / §5 / §12 call for. A live-wire investigation (2026-06-01, against the YouTube validation agent) also pinned three honest gaps: tasks.get.cost comes back all-zero (cost is only real from the llm.cost.recorded event stream); tasks.get.parent_session comes back sparse from the registry (empty agent/status/started); and tasks.list rows carry no agent_name, so the mock's per-card "Agent: Research Agent" line has no wire source.

Decision (Phase 108i). Rebuild the page to the carded .panel.card + .panel-title vocabulary the six done pages set (Overview, Live Runtime, Settings, Playground, Sessions, Events), drop the per-page PageHeader (the breadcrumb is chrome), and compose it as a single viewport-locked mode-switch (PAGE-POLISH §6 — the Playground / Sessions pattern). The mock crams board + selected-task detail bar + bottom dock + right rail simultaneously, which cannot fit one viewport without a page scroll; the operator-signed composition (STEP-0 AskUserQuestion, 2026-06-01) is therefore: board/list is the default mode (the faceted filter strip + the board columns / the list table filling the viewport + a right-rail live board summary); clicking a card swaps the SAME page's main region to detail mode (a compact task header + the real per-task action bar + the bottom-dock tab strip in one internally-scrolling card) and swaps the rail to Summary / Parent Session / Cost; a ← Board affordance returns. No route navigation — it stays one page. Wire the per-task bottom dock to the live events.subscribe SSE (the Sessions 108g BottomDockTabs pattern), RUN-scoped: the dock opens ONE subscription scoped to the task's parent session and filters to the task's run. The phase ships NO new Protocol method — the page stays a pure consumer of tasks.list / tasks.get / events.subscribe / pause.list + the shipped Phase 54 control verbs + the Console DB (CLAUDE.md §13).

Run-match is payload.TaskID, not just e.run (live-wire finding, §17.6). The PAGE-POLISH §3 live-wire pass found that the top-level run field is populated on llm.cost.recorded and planner.decision events but is NULL on the task.* lifecycle events (task.spawned / task.started / task.completed), which carry the id in the PascalCase payload.TaskID instead. A naive e.run === taskID per-task filter (the obvious first cut) would therefore silently drop every lifecycle event from the Events tab and mis-render the task's timeline. The dock's per-task predicate is e.run === taskID || payload.TaskID === taskID || payload.Identity.RunID === taskID, locked by a run-events vitest against a captured real SSE frame. This is the Tasks-page instance of the recurring SSE casing/shape gotcha (a decoder reading the wrong shape silently drops every value) that bit the Events page (D-180).
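The predicate above can be written out as the following sketch. It mirrors the three-way match described in this entry, but the `BusEvent` shape is a simplified assumption, and the real `eventBelongsToRun` in run-events.ts may differ in signature.

```typescript
// Simplified frame shape: run is absent/null on task.* lifecycle events,
// which carry the id in the PascalCase payload instead.
interface RunPayload {
  TaskID?: string;
  Identity?: { RunID?: string };
}

interface BusEvent {
  type: string;
  run?: string | null;
  payload?: RunPayload;
}

// Match on any of the three places the run/task id can appear.
function eventBelongsToRun(e: BusEvent, taskID: string): boolean {
  return (
    e.run === taskID ||
    e.payload?.TaskID === taskID ||
    e.payload?.Identity?.RunID === taskID
  );
}
```

A naive `e.run === taskID` check returns false for a `task.completed` frame with `run: null`. That one missing disjunct is what silently emptied the lifecycle timeline.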

Cost is from the event stream, by token type (operator sign-off). tasks.get.cost is all-zero on the validation runtime, so the Summary cost/tokens figures and the right-rail Cost Breakdown are aggregated CLIENT-SIDE from the run-scoped llm.cost.recorded events (payload.Cost.TotalCost, payload.Usage.TotalTokens, payload.Cost.InputTokensCost / OutputTokensCost / ReasoningTokensCost — all PascalCase, verified live), the same projection Sessions / Overview cost uses. The mock's Cost Breakdown card shows LLM / Tools / Embeddings / Overhead rows, but no such category split exists on the wire; per the operator's STEP-0 sign-off the card renders by TOKEN TYPE (Input / Output / Reasoning / Total — the real payload.Cost fields), which keeps the mock's four-row visual while staying 100% wire-real, never inventing a Tools/Embeddings/Overhead split.
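The client-side aggregation described above amounts to a sum over the run-scoped `llm.cost.recorded` frames, keyed by the PascalCase payload fields this entry names. The following is a minimal sketch. It assumes every field may be absent on a given frame; `aggregateRunCost` and `CostBreakdown` are hypothetical names.

```typescript
// PascalCase payload fields as listed above; optional because a frame may omit any of them.
interface CostPayload {
  Cost?: {
    TotalCost?: number;
    InputTokensCost?: number;
    OutputTokensCost?: number;
    ReasoningTokensCost?: number;
  };
  Usage?: { TotalTokens?: number };
}

interface CostEvent {
  type: string;
  payload?: CostPayload;
}

// The token-type rows of the Cost Breakdown card, plus the Summary token count.
interface CostBreakdown {
  input: number;
  output: number;
  reasoning: number;
  total: number;
  tokens: number;
}

function aggregateRunCost(events: readonly CostEvent[]): CostBreakdown {
  const acc: CostBreakdown = { input: 0, output: 0, reasoning: 0, total: 0, tokens: 0 };
  for (const e of events) {
    if (e.type !== "llm.cost.recorded") continue; // only cost frames carry these fields
    const cost = e.payload?.Cost;
    acc.input += cost?.InputTokensCost ?? 0;
    acc.output += cost?.OutputTokensCost ?? 0;
    acc.reasoning += cost?.ReasoningTokensCost ?? 0;
    acc.total += cost?.TotalCost ?? 0;
    acc.tokens += e.payload?.Usage?.TotalTokens ?? 0;
  }
  return acc;
}
```

Total is summed from `TotalCost` and is not recomputed from the three components, so the card shows exactly what the wire reported.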

Supersedes. The composition/layout half of D-123 — the PageHeader-led, non-viewport-locked board+detail stack — and the placeholder TaskDetailTabs (Events / Logs / Control History / Interventions prose blurbs), which is deleted in favour of the live TaskBottomDock. The data-read layer of D-123 (the tasks.list / tasks.get wire types, the cursor pagination, the kanban-column model, the Console-local D-061 saved filters) and the Phase 54 control wiring are unchanged.

No fabrication (CLAUDE.md §13, PAGE-POLISH §1). Every datum is traced to a live wire and verified against real Runtime data. The genuine V1 gaps render honest states, never fabricated values: the Logs tab needs the Phase 73 state.history surface (still Pending) and renders an honest empty state pointing at it (and the Events tab for the live log); runtime-side search.tasks (brief 11 §CC-4 / [wave-13-extends]) is absent, so the search box stays a Console-local substring match over the loaded page with honest copy; the per-card agent line is replaced by the parent session id + the query snippet (no agent_name on the row); the Parent Session card shows the real session_id + link and an honest empty placeholder for the sparse registry fields (agent / status / started); the board drag-to-transition gesture maps only where a real control verb exists (running→paused = pause, paused→running = resume, running→failed = cancel) and is otherwise a no-op toast — the board is primarily click-to-select (D-065 / spec §10 keep priority on the explicit prioritize verb, not a drag).
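The drag-to-transition rule above is a small lookup: only three column pairs map to a shipped control verb, and every other drop is a no-op. A minimal sketch follows, assuming lowercase column ids for the five kanban columns. `verbForDrag` is a hypothetical name.

```typescript
// The five kanban columns and the three Phase 54 verbs a drag may trigger (ids assumed).
type Column = "pending" | "running" | "paused" | "complete" | "failed";
type ControlVerb = "pause" | "resume" | "cancel";

// Returns the real verb for a drag, or null when no verb exists for that transition.
function verbForDrag(from: Column, to: Column): ControlVerb | null {
  if (from === "running" && to === "paused") return "pause";
  if (from === "paused" && to === "running") return "resume";
  if (from === "running" && to === "failed") return "cancel";
  return null; // caller shows a no-op toast; no state change is faked
}
```

Returning `null` instead of a best-guess verb keeps the gesture from ever implying a transition the TaskRegistry cannot perform.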

Tests. A new web/console/src/lib/tasks/run-events.test.ts (Vitest) covers the eventBelongsToRun predicate against a captured real frame (a task.completed whose run is null but payload.TaskID matches IS included; a foreign-run event is excluded) and the trajectory / control / interventions / cost projections. An updated web/console/tests/tasks-page.spec.ts (Playwright) covers hydration, the carded board, the four PageState branches, the board→detail mode-switch, the dock tab strip, the action-bar control gating, and the disconnected shell. scripts/smoke/phase-108i.sh (static) guards the PageHeader removal, the carded vocabulary, the TaskBottomDock import, the load-bearing testids, and the TaskDetailTabs deletion. npm run check 0/0, npm run lint clean, npm run test green.

Cross-references. Refines the composition of D-123; preserves D-061 (Console DB local-only), D-066 (control claim), D-065 (no session-level priority; task-level Prioritize stays), D-047 (TaskRegistry state machine the kanban columns mirror), D-026 (heavy-payload by-reference), D-079 (admin scope gate), D-171 (session header / blank-session default). Builds on Phase 73d (the page it rebuilds), Phase 54 (the control verbs), Phase 60 / 72 / 72e (events.subscribe + pause.list), Phase 108b (chrome) + Phase 108c (the carded vocabulary) + Phase 108g (the Sessions BottomDockTabs run-scoped pattern + the DataTable sticky-header / clickable-row fix) + Phase 108h (the viewport-locked carded Events pattern). RFC §7, §7.1; brief 11 §"Tasks view" / §"Per-task detail pane" / §CC-4; brief 12 §"two-surface model"; CONVENTIONS.md (D-121) + PAGE-POLISH-PROCEDURE.md. Plan: docs/plans/phase-108i-console-tasks-page.md.


D-182 — Console Background Jobs page rethemed to the carded, viewport-locked composition + the right rail deepened to full mock fidelity + the king file refactored to a controller (supersedes the Phase 73h / D-128 pre-chrome layout)

Date: 2026-06-02

Context. D-128 (Phase 73h) shipped the Background Jobs page as the focused queue projection of tasks.list with kinds: ['background'] — the queue table, the Console-side AwaitTask orphan detector, the planner-progress mini-bar, the Console-DB saved filters, and the bulk Phase-54 control verbs are genuinely real and wired. But the page predates the 108b app-shell chrome and the 108c carded retheme: it renders a per-page PageHeader (duplicating the breadcrumb chrome), is not viewport-locked, and the right rail is SHALLOW — its Details / Progress / Logs / Approvals / Artifacts / Related tabs render placeholder prose ("events for this job stream on the Events page…", "Artifacts… are listed via artifacts.list?task_id=…") instead of the live data the mock (docs/rfc/assets/console-background-jobs-page.png) and spec §12 call for. The +page.svelte had also grown to an ~801-line king file mixing the controller, the loaders, the bulk verbs, the saved-view CRUD, and the rail wiring in one monolith.

Decision (Phase 108j). Retheme the page to the carded .panel.card + .panel-title vocabulary the seven done pages set (Overview, Live Runtime, Settings, Playground, Sessions, Events, Tasks), drop the per-page PageHeader (the breadcrumb / ⌘K / footer are 108b chrome), and compose it as the viewport-locked Events-108h shape (operator STEP-0 sign-off, 2026-06-02): TABLE-primary on the left (the queue fills the viewport and scrolls internally behind a sticky <thead>) + a right-rail detail on the right (the table stays visible; the rail shows the selected job or an idle "select a job" hint) — NOT a Tasks-style mode-switch. Deepen the right rail to full mock fidelity: a header (short hash + kind + status + copy + close), the tabs Details | Progress | Events | Control History, and the sections Artifacts-for-this-Job / Parent task / Related Sessions, all packed into ONE internally-scrolling rail card. Wire Events / Control History to a RUN-scoped events.subscribe projection by REUSING the Tasks-108i lib/tasks/run-stream.svelte.ts (TaskRunStream) + run-events.ts (eventBelongsToRun / filterControlEvents / …) — never a fork (CONVENTIONS.md §3). The phase ships NO new Protocol method — the page stays a pure consumer of tasks.list / tasks.get / artifacts.list / events.subscribe + the shipped Phase 54 control verbs + the Console DB (CLAUDE.md §13).

King-file refactor. The ~801-line +page.svelte is decomposed to today's standard (the Tasks-108i pattern): the controller / async-state logic moves into a BackgroundJobsPageState class in lib/background-jobs/state.svelte.ts (mirrors EventsPageState, with $state fields for everything an async load assigns after first render — the D-180 lesson); the pure projections live in unit-testable .ts modules (derive.ts for ETA / type / state-timeline + the kept orphan-detector.ts); the right rail splits into focused components (JobDetailRail.svelte + JobProgressTab.svelte). The pre-chrome RightRail.svelte is deleted.

ETA, type, and timeline are derived honestly (operator sign-off, PAGE-POLISH §1). There is no dedicated ETA / type wire field, so each is a Console-local derivation that NEVER fabricates: the ETA is the planner's own task.progress hint projected over elapsed wall time (remaining = elapsed × (1−progress)/progress), labelled an estimate, and reads "Unknown" when no hint was emitted; the type badge is the first planner-emitted tag, else a keyword match over the spawn description (Indexer / Report / Long Poll), else the generic "Job"; the Progress state-transition timeline is the run's REAL task.* lifecycle events from the run-scoped SSE. The run-scoped SSE is live-only (no backlog on a fresh connect), so the Events / Control History tabs + the timeline render honest-empty states for a quiet or already-finished run — the same constraint the Sessions / Tasks docks carry (D-181). Artifacts-for-this-Job is artifacts.list filtered by scope.task = the job's run id (verified live: rows are {ref:{id,mime_type,size_bytes,filename}, tags, source}), honest-empty when the job produced none; spawned-by shows the parent_task_id or "—" (no fabricated agent); Related Sessions is tasks.list?group_id siblings, honest-empty when not grouped.
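The ETA and type-badge derivations above can be sketched as two pure functions. In this sketch, a missing or non-positive progress hint yields "Unknown" instead of a guess. The keyword patterns, the `~Ns (estimate)` label format, and the function names are illustrative assumptions, not the shipped derive.ts.

```typescript
// remaining = elapsed × (1 − progress) / progress; null when there is no usable hint.
function estimateRemainingMs(elapsedMs: number, progress: number | undefined): number | null {
  if (progress === undefined || !(progress > 0) || progress > 1) return null;
  return (elapsedMs * (1 - progress)) / progress;
}

// Always labelled an estimate; "Unknown" when the planner emitted no hint.
function formatEta(remainingMs: number | null): string {
  if (remainingMs === null) return "Unknown";
  return `~${Math.round(remainingMs / 1000)}s (estimate)`;
}

// Hypothetical keyword table for the spawn-description fallback.
const TYPE_KEYWORDS: readonly (readonly [RegExp, string])[] = [
  [/index/i, "Indexer"],
  [/report/i, "Report"],
  [/poll/i, "Long Poll"],
];

// First planner tag wins; else a keyword match; else the generic "Job".
function jobType(tags: readonly string[], description: string): string {
  if (tags.length > 0) return tags[0];
  for (const [pattern, label] of TYPE_KEYWORDS) {
    if (pattern.test(description)) return label;
  }
  return "Job";
}
```

For example, a job 60 s in that reports 25% progress projects 180 s remaining. The same job with no hint reads "Unknown".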

Live-wire verification (PAGE-POLISH §3). The empty / filtered-empty states were verified against the live YouTube validation harbor dev runtime (it spawns no background work, so tasks.list {kinds:['background']} is honestly empty). The populated queue + the rail were verified against a HARBOR_DEV_SEED_FIXTURES=1 runtime (two seeded background jobs — "Background index rebuild" / "Background summary job"), confirming the queue rows, the Details tab, the honest Progress (no seeded progress hint → "Unknown" + indeterminate bar), and the honest-empty Artifacts / Parent / Related / Events / Control sections. Zero browser console errors; scrollHeight == innerHeight (no full-page scroll) and zero horizontal table overflow at the supported width.

Supersedes. The composition/layout half of D-128 — the PageHeader-led, non-viewport-locked layout + the shallow placeholder right rail (RightRail.svelte, deleted). The data-read layer of D-128 (the kinds: ['background'] queue projection, the Console-side orphan detector, the planner-progress mini-bar, the Console-local D-061 saved filters) and the Phase 54 bulk-control wiring are unchanged.

Tests. A new web/console/src/lib/background-jobs/derive.test.ts (Vitest) locks the ETA / type / timeline projections against their honest states (no-progress → "Unknown"; no-signal → "Job"; newest-first events → oldest-first timeline; group events excluded). The existing orphan-detector.test.ts is unchanged. A rewritten web/console/tests/background-jobs-page.spec.ts (Playwright) covers hydration, the carded filter strip, the background-kind queue, bulk-select → toolbar, row → rail tab navigation, the orphan dialog, control-scope gating, the viewport-lock (no full-page scroll), and the disconnected redirect. scripts/smoke/phase-108j.sh (static) guards the PageHeader removal, the carded vocabulary, the BackgroundJobsPageState / derive.ts / JobDetailRail / JobProgressTab files, the RightRail.svelte deletion, the load-bearing testids, and the preserved Save-view contract (phase-83s / disconnected-state N7). npm run check 0/0, npm run lint clean, npm run test green.

Cross-references. Refines the composition of D-128; preserves D-061 (Console DB local-only), D-066 (control claim), D-065 (no session-level priority; task-level Prioritize stays), D-047 (SpawnTask / AwaitTask — the orphan detector surfaces the pairing), D-026 (heavy-payload by-reference — artifacts by ref, never inline bytes), D-079 (admin scope gate), D-171 (session header / blank-session default), D-128 (the orphan detector / progress-bar / saved-filter data layer). Builds on Phase 73h (the page it rebuilds), Phase 73d (tasks.list / tasks.get), Phase 54 (the control verbs), Phase 73l (artifacts.list), Phase 60 / 72 (events.subscribe), Phase 108b (chrome) + Phase 108c (the carded vocabulary) + Phase 108h (the viewport-locked carded Events pattern) + Phase 108i (the reused TaskRunStream / run-events data layer + the DataTable sticky-header / clickable-row fix). RFC §6.8, §7; brief 11 §"Background Jobs view" / §CC-4; brief 12 §"The two-surface model"; CONVENTIONS.md (D-121) + PAGE-POLISH-PROCEDURE.md. Plan: docs/plans/phase-108j-console-background-jobs-page.md.


D-183 — Console Tools page rethemed to the carded, viewport-locked composition + the right rail deepened to full mock fidelity + the king file refactored to a controller (supersedes the Phase 73f / D-116 pre-chrome layout)

Date: 2026-06-02

Context. D-116 (Phase 73f) shipped the Tools page as the registered-tool-catalog browser — the seven tools.* Protocol methods (tools.list / tools.get / tools.describe / tools.metrics / tools.content_stats + the admin tools.set_approval_policy / tools.revoke_oauth), the Console-DB saved filters, the faceted catalog table, the per-tool descriptor tabs, and the right-rail stat cards are genuinely real and wired. But the page predates the 108b app-shell chrome and the 108c carded retheme: it renders a per-page PageHeader (duplicating the breadcrumb chrome), is not viewport-locked (the table → detail tabs → rail cards stack vertically and full-page-scroll), and the +page.svelte had grown to an ~847-line king file mixing the controller, the loaders, the admin writes, the saved-view CRUD, the export, and the rail wiring in one monolith.

Decision (Phase 108k). Retheme the page to the carded .panel.card + .panel-title vocabulary the eight done pages set (Overview, Live Runtime, Settings, Playground, Sessions, Events, Tasks, Background Jobs), drop the per-page PageHeader (the breadcrumb / ⌘K / footer are 108b chrome), and compose it as the viewport-locked Events-108h / Background-Jobs-108j shape (operator STEP-0 sign-off, 2026-06-02): a filter card + a layout of TABLE-primary on the left (the catalog fills the viewport and scrolls internally behind a sticky <thead>) + a right-rail detail on the right (the table stays visible; the rail shows the selected tool's full detail or — with nothing selected — the catalog overview idle state) — NOT a Tasks-style mode-switch. Deepen the right rail to full mock fidelity in ONE packed internally-scrolling card: a descriptor header (name + transport/scope + side-effect / OAuth / approval badges + copy + close), a tab strip (Manifest | Inputs | Outputs | Recent invocations | Approval), then a Statistics card (tools.metrics error-rate gauge + status pill + window toggle), a Content size & display-mode card (tools.content_stats), a Source-provenance card, and a Run-history strip. The phase ships NO new Protocol method — the page stays a pure consumer of the shipped seven tools.* methods + the Console DB (CLAUDE.md §13).

King-file refactor. The ~847-line +page.svelte is decomposed to today's standard (the Background-Jobs-108j pattern): the controller / async-state logic moves into a ToolsPageState class in lib/tools/state.svelte.ts (mirrors BackgroundJobsPageState, with $state fields for everything an async load assigns after first render — the D-180 lesson); the pure projections live in a unit-testable lib/tools/derive.ts (lastUsed / oauthKind / approvalKind / statusKind / toPageError / displayStatus) with a derive.test.ts; the catalog table and the right rail split into focused components/tools/ToolCatalogTable.svelte + ToolDetailRail.svelte. The pre-chrome ToolDetailTabs.svelte is deleted (superseded by the rail). The existing body-only stat cards (StatusErrorRateCard / ContentSizeCard / RunHistoryStrip), the ToolOverviewCard (now the idle-rail content), the ToolFacetChips, and lib/tools/export.ts are reused, not rewritten.

"Try this tool" is omitted (operator sign-off, 2026-06-02). The mock gestures at a developer "Try this tool" form that depends on a tools.invoke Protocol method. A live probe of the validation runtime confirmed only the seven canonical tools.* methods exist — tools.invoke is NOT shipped at V1 (D-132, unchanged). The Phase 73f page surfaced a disabled-with-tooltip tools-try-tool affordance naming the deferral; per operator STEP-0 decision the 108k rebuild omits the affordance entirely rather than carrying a disabled stub. The tools.invoke deferral itself is unchanged; the affordance simply does not render on the rebuilt page. This supersedes the tools-try-tool half of D-132's Console treatment (the method-level deferral stands); docs/design/console/page-tools.md §3/§13 is updated to record the omission.

No fabrication (CLAUDE.md §13, PAGE-POLISH §1). Every datum is traced to a live wire and verified against real Runtime data. The genuine V1 gaps render honest states, never fabricated values: tools.metrics / tools.content_stats are all-zero / empty for a never-invoked tool (the live probe confirmed that every catalog tool in the validation agent starts in that state), and the stat cards render their own honest "no invocations / no recent invocations recorded" copy, never a fabricated rate/latency; the OAuth / Approval badges are real from the descriptor ("n/a" when neither); the last_used_at Go zero time renders an honest "never"; recent invocations stream from the tool.* Events surface (no durable read-back here — the rail points at it); free-text search is a Console-local tools.list search facet (brief 11 §CC-4 — no runtime search.tools); the Approve / Reject + bulk Revoke OAuth call the REAL admin methods and render disabled-with-tooltip without the admin claim (D-079), never a fabricated success.
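A Go `time.Time` zero value marshals to JSON as `"0001-01-01T00:00:00Z"`, so a never-used tool arrives with a syntactically valid timestamp. The sketch below treats that value, or a missing one, as "never". It is a minimal sketch only: `lastUsed` mirrors the projection named in this entry, but its signature and the ISO output format are assumptions, and an unparseable value is passed through unchanged so that nothing is invented.

```typescript
// Go's zero time.Time marshals as "0001-01-01T00:00:00Z"; treat it (or absence) as "never".
function lastUsed(iso: string | null | undefined): string {
  if (!iso) return "never";
  const ms = Date.parse(iso);
  if (Number.isNaN(ms)) return iso; // unparseable: show the raw wire value, do not guess
  const when = new Date(ms);
  if (when.getUTCFullYear() <= 1) return "never"; // Go zero time
  return when.toISOString();
}
```

Checking the year, not string equality, also catches zero times serialized with a different offset or fractional-second suffix.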

Live-wire verification (PAGE-POLISH §3). Every method was probed against the live YouTube validation harbor dev runtime, whose catalog is genuinely populated (7 real tools: artifact_fetch + the MCP youtube_* family). Pinned shapes: tools.list returns the rows + {total:7, active:0, pending_approval:0, awaiting_oauth:0} aggregates; tools.describe returns the real args_schema / out_schema JSON + side_effect:external + loading_mode:always; tools.metrics is all-zero with status:Healthy; tools.content_stats is {histogram:[], heavy_threshold_bytes:0, heavy_count:0}; tools.set_approval_policy round-trips 200 {id,policy}; an unknown id → 404 not_found; a no-identity request → 401 identity_required. The loaded / selected / admin-round-trip states were verified populated; empty / filtered-empty via a tight facet; error via the unknown-id path; disconnected via no connection (the idle overview renders placeholders, not fabricated zeros). Zero browser console errors; scrollHeight == innerHeight (no full-page scroll) and zero horizontal table overflow at the supported width.

Supersedes. The composition/layout half of D-116 — the PageHeader-led, non-viewport-locked table+tabs+rail stack + the standalone ToolDetailTabs.svelte (deleted) — and the tools-try-tool disabled-affordance half of D-132's Console treatment (the tools.invoke method deferral stands). The data-read layer of D-116 (the seven tools.* wire types + the page-based pagination + the Console-local D-061 saved filters + the admin-write wiring) is unchanged.

Tests. A new web/console/src/lib/tools/derive.test.ts (Vitest) locks the pure projections against the real wire shapes (the Go zero time → honest "never"; the StatusChip mappings exhaustive over the wire enums; toPageError keeps the Protocol code; displayStatus derives ready/empty live from the loaded-row count). A rewritten web/console/tests/tools-page.spec.ts (Playwright) covers hydration, the carded catalog table + mockup columns, a facet toggle re-render, the row → rail tab navigation, the real-admin-or-disabled Approve control, and the disconnected redirect. scripts/smoke/phase-108k.sh (static) guards the PageHeader removal, the carded vocabulary, the ToolsPageState / derive.ts / ToolCatalogTable / ToolDetailRail files, the ToolDetailTabs.svelte deletion, the load-bearing testids, the scoped DataTable override, and the preserved Save-view contract (phase-83s / disconnected-state N7). scripts/smoke/phase-83x.sh N13 grep is retargeted from +page.svelte to ToolCatalogTable.svelte where the Reliability column width token now lives (§17.6). npm run check 0/0, npm run lint clean, npm run test green.

Cross-references. Refines the composition of D-116; supersedes the tools-try-tool affordance half of D-132 (method deferral stands). Preserves D-061 (Console DB local-only — saved views + export), D-066 (control claim), D-024 (ToolPolicy reliability shell — the Manifest tab), D-026 (heavy-content threshold — the Content-size card), D-062 (MCP-Apps DisplayMode via the canonical registry), D-079 (admin scope gate for the writes), D-083 (auth.BindingScope — the OAuth badge), D-086 (tool-side approval gates), D-132 (tools.invoke deferral), D-171 (session header / blank-session default), D-180 (derive display-state live). Builds on Phase 73f (the page it rebuilds), Phase 26–31 / 64a (the tool catalog + transports + OAuth + approval the descriptors project), Phase 108b (chrome) + Phase 108c (the carded vocabulary) + Phase 108h (the viewport-locked carded Events pattern) + Phase 108j (the table-left + right-rail-detail twin + the scoped DataTable override + the controller refactor). RFC §6.4, §6.5, §7; brief 11 §"Tools view" / §CC-4 / §PG-3; brief 12 §"The two-surface model"; CONVENTIONS.md (D-121) + PAGE-POLISH-PROCEDURE.md. Plan: docs/plans/phase-108k-console-tools-page.md.


D-184 — Agents fleet-control Protocol surface landed + the Console Agents page rebuilt to the carded, viewport-locked three-column canvas with live control + activity (supersedes the D-132/F4 control-verb deferral + the Phase 73e / D-124 pre-chrome layout)

Decision (Phase 108l). Land the five agent fleet-control verbs as real Protocol methods AND retheme both Agents routes to the page-polish standard, in one PR (the primitive + its consumer in the same wave — CLAUDE.md §13). Runtime half: agents.{pause,drain,restart,force_stop,deregister} mount at POST /v1/agents/{verb} (the same one-shot shape as the eight read methods), each wrapping the shipped in-process registry.* control verb (D-066) through a Controller seam on the agents Protocol service. They are admin-gated: the handler computes controlScoped from the verified auth.ScopeAdmin claim, and the service fails closed with ErrControlScopeRequired (→ 403 identity_scope_required) when that claim is absent; only a scoped call attaches registry.WithControlScope(ctx) and invokes the registry. Request/response shapes are AgentControlRequest{identity, id, reason} / AgentControlResponse{agent_id, command, status}. Console half: the detail route's ControlButtons call the real methods (control-scope gated — disabled-with-tooltip without admin, never a fabricated success), the previously-placeholder AgentActivityFeed projects a live agent.* events.subscribe stream filtered to the agent, both routes are rebuilt as carded .panel.card viewport-locked pages (the detail as the three-column main canvas — tabbed detail / topology / activity+tools+memory — NOT a right rail, per page-agents.md §4), and both ~500-line king files are refactored into AgentsListPageState / AgentDetailPageState controllers + a pure, unit-tested lib/agents/derive.ts.
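The activity-feed projection described above keeps only agent.* frames whose payload names this agent, and stays empty when the stream is quiet. The following is a minimal sketch. It assumes a PascalCase `payload.AgentID` (the field the derive.ts tests filter on) and a hypothetical monotonic `seq` for newest-first ordering; `projectActivity` here is an illustration, not the shipped function.

```typescript
// Simplified frame shape; seq is an assumed per-stream sequence number.
interface AgentEvent {
  type: string;
  seq: number;
  payload?: { AgentID?: string };
}

// agent.* frames for this agent only, newest first, capped for the feed.
function projectActivity(events: readonly AgentEvent[], agentID: string, limit = 50): AgentEvent[] {
  return events
    .filter((e) => e.type.startsWith("agent.") && e.payload?.AgentID === agentID)
    .sort((a, b) => b.seq - a.seq) // sorts the filtered copy, not the input
    .slice(0, limit);
}
```

Filtering on the event-type prefix as well as the id keeps task or LLM frames that happen to carry an AgentID out of the fleet-control feed.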

Honest re-read status (CLAUDE.md §13 — no fabrication). The V1 registry Health enum is {unknown, healthy, degraded, draining, stopped} — there is NO "paused". So a control response re-reads the agent's ACTUAL post-command status rather than echoing the command's intent: pause and restart emit their agent.paused / agent.restart_requested event but leave the observable status active; only drain→drained, force_stop→force_stopped, and deregister→deregistered (record removed) transition it. The Console surfaces the returned status truthfully and treats the emitted event (observed live on the activity feed) as the observation of pause/restart — it never claims a "paused" state the registry did not produce.
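The honest re-read rule can be sketched as a result message built only from the status the Runtime returned. For pause and restart, the message points the operator at the emitted event and does not claim a "paused" state. The sketch assumes the `command` field carries the verb names as written in the route (for example `force_stop`). The message wording and the `controlResultMessage` signature are illustrative, not the Console's copy.

```typescript
type ControlCommand = "pause" | "drain" | "restart" | "force_stop" | "deregister";

// Mirrors AgentControlResponse{agent_id, command, status}; status is the re-read value.
interface AgentControlResponse {
  agent_id: string;
  command: ControlCommand;
  status: string;
}

// Commands observed only as an emitted event; the registry status does not change.
const EVENT_ONLY: ReadonlySet<ControlCommand> = new Set<ControlCommand>(["pause", "restart"]);

function controlResultMessage(res: AgentControlResponse): string {
  const base = `${res.command} accepted for ${res.agent_id}; status is now ${res.status}`;
  return EVENT_ONLY.has(res.command) ? `${base} (see the activity feed for the emitted event)` : base;
}
```

Because the message interpolates `res.status` verbatim, a pause against a registry with no "paused" health value surfaces as "active", which is the truthful read.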

Live-wire verification (PAGE-POLISH §3). The five verbs were probed against a live harbor dev runtime: with an admin token, each returns 404 not_found on an unknown id (the route is mounted and the admin gate passes), a no-identity request returns 401, and a non-admin token returns 403 identity_scope_required. Populated page states were verified against a HARBOR_DEV_SEED_FIXTURES=1 runtime (the plain validation runtime's registry is empty — the synthetic default agent is not a registered row): the list hero rollup + cards, the detail three-column canvas + each of the six tabs, a control round-trip (drain → status flips to drained + agent.drained on the activity feed; pause → status stays active + agent.paused event shown), and the disabled-without-admin control surface. The empty state was verified via the plain runtime, the error state via a bad token, and the disconnected state via no connection. Zero browser console errors; scrollHeight == innerHeight (no full-page scroll) and zero horizontal overflow at the supported width.

Supersedes. The D-132/F4 agent fleet-control deferral (the previous ControlButtons rendered disabled-with-tooltip "no registry.* Protocol surface exists" regardless of scope) — the surface now exists and the buttons are live, admin-gated. AND the composition/layout half of the Phase 73e / D-124 Agents page (the PageHeader-led, non-viewport-locked rollup + cards + the DetailRail-based detail). The data-read layer of D-124 (the eight agents.* read wire types + the page-based pagination + the Console-local D-061 saved filters) is unchanged.

Tests. Go: internal/runtime/registry/protocol/control_test.go (gating, honest re-read, sentinels) + internal/protocol/transports/stream/agents_handler_test.go (control gating + routing through the real registry controller, identity propagation, 404/403/401 failure modes, under -race); the single-source / conformance count assertions bump 71→76. Console: web/console/src/lib/agents/derive.test.ts (Vitest — StatusChip mappings exhaustive over the wire enums, displayStatus derives ready/empty live from the row count (D-180), projectActivity filters by the payload AgentID and stays empty when quiet, controlResultMessage reports the honest re-read status); web/console/tests/agents-page.spec.ts is updated from the D-132/F4 disabled-stub assertion to the control-scope degradation contract. scripts/smoke/phase-73e.sh adds the five control-verb live assertions; scripts/smoke/phase-108l.sh (static) guards the carded vocabulary, the controllers + derive.ts, the preserved testids (incl. the disconnected-state N7 Save-view contract), and the no-hand-rolled-fetch rule. npm run check 0/0, npm run lint clean, npm run test green.

Cross-references. Supersedes D-132/F4 (control deferral) + the layout half of D-124. Preserves D-059 (agent_id is a registration identity, NOT an isolation principal — control scopes by the tuple, never by agent_id), D-060 (Agent Registry in-process per-runtime), D-061 (Console DB local-only), D-062 (Agents ≠ chatbots), D-066 (control-scope claim), D-079 (admin scope gate), D-083 (tool-side OAuth — the binding rows deep-link, no parallel flow), D-171 (session header default), D-180 (derive display-state live). Builds on Phase 53a (Agent Registry), Phase 73e (the Agents read surface it rebuilds), Phase 108b (chrome) + Phase 108h/108j/108k (the carded viewport-locked page-polish pattern + the controller refactor). RFC §6.16, §6.4, §7.2; brief 11 §"Agents view" / §CC-4; brief 12 §"Open architectural questions … resolved here"; CONVENTIONS.md (D-121) + PAGE-POLISH-PROCEDURE.md. Plan: docs/plans/phase-108l-console-agents-page.md.


D-185 — Console MCP Connections page rethemed to the carded, viewport-locked master-detail composition + the right rail deepened to full mock fidelity + the king file refactored to controllers (supersedes the Phase 73k / D-119 pre-chrome layout)

Decision (Phase 108m). Retheme the Console MCP Connections page to the Phase 108 page-polish standard. It is a PURE Protocol consumer — the mcp.servers.* surface shipped in Phase 73k / D-119; this phase builds NO new Protocol method. The page becomes the carded (.panel.card), viewport-locked master-detail composition modelled on Tools-108k (D-183): a filter card (saved-view chips + state facets + Console-side search) + the servers TABLE on the left (scrolling internally behind a sticky <thead>) + a server-detail right rail (or the catalog-overview idle state when nothing is selected). The separate tabbed-detail route (/mcp-connections/[server]) is REMOVED — the rail is the single detail surface (CLAUDE.md §13 — no two parallel implementations of one feature). The rail is deepened to full mock fidelity: header (name + state badge + transport + endpoint + last-discovery + tool/resource/prompt/OAuth counts + Refresh discovery / Test connection / raw-HTML toggle), a five-tab strip (Tools | Resources | Prompts | OAuth bindings | Policy), a LIVE Recent-events card, and a binding-scope summary. The king file is refactored into McpListState + McpDetailState controllers + a pure, unit-tested derive.ts (folding in the former status.ts) + the focused ServersTable / McpDetailRail / McpOverviewCard / McpRecentEvents components.

Honest wiring (CLAUDE.md §13 — no fabrication). Refresh discovery calls the real mcp.servers.refresh_discovery and re-loads the detail so the rendered counts + last-connect reflect the runtime; Test connection calls the real mcp.servers.probe and surfaces the ACTUAL outcome (Reachable — round-trip N ms, or the transport error) — never a faked OK. The raw-HTML trust toggle calls the real mcp.servers.set_raw_html_trust and is admin-gated (D-079, D-066): disabled-with-tooltip without the admin claim, and a real re-read reflects the new trust (a wiring gap is closed here — the pre-chrome McpDetailState.isAdmin was never set, so the toggle was permanently disabled regardless of scope; it now derives from hasScope(connection, 'admin')). OAuth Connect/Reconnect/Revoke deep-link to the Tools binding surface + wire refresh_binding / revoke_binding directly (no parallel OAuth path, §13 / D-083). The Recent-events card is a LIVE-only events.subscribe projection (mcp.resource_updated + tool.auth_required + the transport-error subset of tool.failed), honestly empty until events stream in (no durable read-back); mcp.resource_updated / tool.auth_required are attributed to the server by the payload Source id, and tool.failed (which carries no server field) by membership in the server's owned tool-name set (tools.list owner === name) — an event that cannot be attributed is dropped, never mislabelled. The youtube validation server advertises 0 resources / 0 prompts / 0 bindings, so those tabs + the summary render their HONEST empty copy.
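The attribution rule for the Recent-events card can be sketched in Go. The real projection is Console TypeScript (projectServerEvents in derive.ts), so this is only an illustration. The Event type, the owner map, and the literal ErrorClass value "transport" used to pick the transport-error subset are assumptions. Events with a payload Source are attributed by it; tool.failed is attributed through the owned tool-name set; anything unattributable is dropped.

```go
package main

import "fmt"

// Event is a hypothetical decoded SSE event keyed by PascalCase payload fields.
type Event struct {
	Type    string
	Payload map[string]string
}

// attribute returns the owning server, or ok=false when it cannot be attributed.
// owner maps tool name → server name (as tools.list owner === name would give).
func attribute(ev Event, owner map[string]string) (string, bool) {
	switch ev.Type {
	case "mcp.resource_updated", "tool.auth_required":
		s := ev.Payload["Source"]
		return s, s != ""
	case "tool.failed":
		// Assumption: "transport" marks the transport-error subset.
		if ev.Payload["ErrorClass"] != "transport" {
			return "", false
		}
		s, ok := owner[ev.Payload["ToolName"]] // no server field: use tool ownership
		return s, ok
	}
	return "", false
}

// projectServerEvents keeps events attributable to server and drops the rest.
func projectServerEvents(events []Event, server string, owner map[string]string) []Event {
	var out []Event
	for _, ev := range events {
		if s, ok := attribute(ev, owner); ok && s == server {
			out = append(out, ev)
		}
	}
	return out
}

func main() {
	owner := map[string]string{"search_videos": "youtube"} // hypothetical tool name
	events := []Event{
		{Type: "mcp.resource_updated", Payload: map[string]string{"Source": "youtube"}},
		{Type: "tool.failed", Payload: map[string]string{"ToolName": "search_videos", "ErrorClass": "transport"}},
		{Type: "tool.failed", Payload: map[string]string{"ToolName": "unknown_tool", "ErrorClass": "transport"}},
	}
	fmt.Println(len(projectServerEvents(events, "youtube", owner))) // 2
}
```

Dropping an event when no rule matches is the design choice that keeps the card from mislabelling events from other servers.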

Live-wire verification (PAGE-POLISH §3). Verified against a live harbor dev runtime configured with a REAL MCP server (youtube via stdio, 6 tools — mcp.servers.list returns a populated catalog with no fixture seeding). Each mcp.servers.* method was curl-probed and the real payload captured (protocol_version 0.1.0, url_or_command uvx mcp-youtube, tool_policy {30000, 3, 0}, empty resources/prompts/bindings/health, probe {ok:true, latency_ms:2}). The recent-event decoders are unit-pinned against the REAL PascalCase SSE payload fields (Source, ToolName, URI, ErrorClass) — the GET /v1/events projection marshals exported Go fields UNTAGGED, so a snake_case decoder silently drops every value (§3 casing gotcha). Browser-truth: every state was screenshotted at 1512×945 — the list + idle overview, the detail rail (each tab), a Test-connection round-trip (Reachable — round-trip 6 ms), a Refresh-discovery re-read (header flips to "last connect just now"), the admin raw-HTML toggle (flips to "trusted" + a header badge, then back), the honest-empty OAuth/Resources/Prompts tabs + binding-scope summary, filtered-empty (non-matching search), error (bad token → auth_rejected: jwt rejected: token_malformed + Retry), and the disconnected → /settings redirect (Phase 105). Zero browser console errors on a clean load; scrollHeight == innerHeight (no full-page scroll) and zero horizontal overflow.

Supersedes. The Phase 73k / D-119 pre-chrome MCP Connections layout: the PageHeader-led, non-viewport-locked list + the DetailRail-summary + the separate /mcp-connections/[server] tabbed-detail route. The data-read layer of D-119 (the mcp.servers.* wire types + the MCPServersNamespace client + the Console-local D-061 saved filters) is unchanged. The six-tab detail (with a Health tab) collapses to the five-tab rail (Tools | Resources | Prompts | OAuth bindings | Policy) per the page-polish scope; mcp.servers.health stays a shipped method, unconsumed by this page.

Tests. Console only (no Go change). web/console/src/lib/mcp-connections/tests/derive.test.ts (Vitest — the mcpStatusKind / mcpStateLabel mappings exhaustive over the wire enum; relativeTime renders the Go zero time as never; serverStateCounts; displayStatus derives ready/empty live from the row count (D-180); extractEventSource / extractEventToolName read the real PascalCase payload; summarizeMcpEvent + projectServerEvents attribute by Source / owned-tool-name and drop unattributable events). The existing web/console/src/lib/mcp-connections/tests/state.svelte.spec.ts (four-state contract + control-surface routing) stays green against the extended controllers. web/console/tests/mcp-connections-page.spec.ts is rewritten for the rail-based master-detail (the row-select → rail, the five tabs paint in place, the Tools deep-link, the admin-disabled raw-HTML toggle, the disconnected redirect). scripts/smoke/phase-108m.sh (static) guards the removed [server] route + the removed PageHeader + the carded vocabulary + the controllers + derive.ts (status.ts folded in) + the four components + the five tabs + the real action wiring + the isAdmin gate + the live EventsSubscription + the Save-view N7 contract + the no-hand-rolled-fetch rule. npm run check 0/0, npm run lint clean, npm run test green.

Cross-references. Supersedes the layout half of D-119 (Phase 73k MCP Connections). Preserves D-061 (Console DB local-only — saved-view chips), D-062 (MCP-Apps renderer registry — the Resources tab inventories read-only, no bespoke renderer), D-065 (no session priority — none rendered), D-066 (control-scope claim — Refresh / Test / admin verbs), D-079 (admin scope gate — raw-HTML + OAuth admin verbs), D-083 (tool-side OAuth — the bindings deep-link, no parallel flow), D-119 (the mcp.servers.* surface), D-121 (CONVENTIONS.md foundation), D-171 (session header default), D-180 (derive display-state live). Builds on Phase 73k (the MCP read surface it rebuilds), Phase 73f (tools.list), Phase 73g (events.subscribe), Phase 105 (the disconnected redirect), Phase 108b (chrome) + Phase 108k / D-183 (the carded viewport-locked master-detail pattern + the controller refactor it mirrors). RFC §6.4, §7; brief 11 §"MCP Connections view" / §PG-3 / §"Open architectural questions" #8; brief 12 §"The two-surface model"; CONVENTIONS.md (D-121) + PAGE-POLISH-PROCEDURE.md. Plan: docs/plans/phase-108m-console-mcp-connections-page.md.


D-186 — Console Memory page rethemed to the carded, viewport-locked master-detail composition + the king file refactored to a controller, AND the real memory.strategy_trace read + admin-gated memory.put / memory.delete mutation pair landed (supersedes the Phase 73j / D-118 read-only layout + the deferred mutation/trace surfaces)

Decision (Phase 108n). Bring the Console Memory page to the Phase 108 page-polish bar AND land the Protocol surface it needed to stop deferring features — the primitive + its consumer in one wave (CLAUDE.md §13, the D-184 pattern). Three new methods route through the existing memory stream handler: memory.strategy_trace (read, no admin), memory.put (admin-gated "add a turn"), memory.delete (admin-gated "evict a turn by key"). All three compose the ALREADY-SHIPPED MemoryStore interface (Phases 23–25): strategy_trace = GetLLMContext + Health; put = AddTurn; delete = Snapshot → drop-the-keyed-turn → Restore. NO driver-seam change — all three V1 drivers (inmem / sqlite / postgres) back them with zero per-driver work, so no conformance-parity churn. Console half: a carded .panel.card viewport-locked master-detail rebuild (filter card + records table-left + a stacked right rail of health / strategy-trace / live-events / add-memory / selected-item), the ~728-line king file refactored into a MemoryPageState controller + a pure unit-tested derive.ts + the focused MemoryTable / MemoryEventsCard / StrategyTraceCard / AddMemoryComposer components, the per-page PageHeader dropped.

Honest scope — strategy_trace is a real projection, promotions stays deferred (§13 / PAGE-POLISH §1). The mock's "strategy debugger" asks for a per-step selection trace with rejected items. The rolling_summary strategy SUMMARISES (it does not select-and-reject candidates), so memory.strategy_trace ships the honest, real form: the strategy's LIVE GetLLMContext output — the rolling-summary text (the compaction OUTPUT), the verbatim-turn count, the token estimate — plus Health. It is real runtime state, never a fabricated rejection list; an empty session projects an empty trace. The memory.promotions viewer is NOT shipped: cross-session memory promotion is UNIMPLEMENTED in the runtime (only a doc comment in types/memory.go; a code audit found zero implementation), so a method would be a hollow always-empty stub — forbidden. It stays an honest finding until the promotion subsystem lands (an RFC-level addition the user explicitly deferred).

Wire-shape bug fixed (PAGE-POLISH §3) + lossless delete. memory.get's Value is a Go []byte, which encoding/json marshals as BASE64; the pre-chrome value viewer JSON.parsed the raw base64 and rendered gibberish. Fixed in derive.decodeMemoryValue (base64 → UTF-8 → pretty JSON; multibyte-safe). The memory.delete read-modify-write decodes the snapshot Record, drops the keyed turn, and re-marshals — so memory.Record gained a Summary field (additive, matching the strategy's persisted memoryStateRecord json tag) to round-trip the rolling summary LOSSLESSLY rather than silently dropping it. The mutations are admin-gated at the handler edge (auth.ScopeAdmin strictly — console:fleet is a cross-runtime OBSERVATION claim, never a write entitlement) and emit memory.item_put / memory.item_deleted audit events (SafePayload — the hashed key + operation only, NEVER the turn text). The deferred event-feed card (D-132/W5) is upgraded to a LIVE events.subscribe projection (memory.identity_rejected + memory.health_changed + memory.recovery_dropped), honest-empty when quiet.

Live-wire verification (PAGE-POLISH §3). Verified against a live harbor dev runtime with real sqlite / rolling_summary memory (no fixture seeding — the validation agent persists turns across runs). memory.strategy_trace returned the REAL rolling summary (the actual compacted conversation), 4 verbatim turns, 643 estimated tokens, health healthy. memory.put appended a turn and returned its resolvable key; memory.get round-tripped it (base64 value decoded to the JSON). memory.delete evicted by key (remaining_turns decremented); an unknown key → not_found; an empty key → 400; a no-bearer request → 401. Browser-truth at 1512×945: the carded list + the stacked rail (health + the live strategy summary + live events), a real UI evict (selected-item "Evict turn" → memory.delete → the table dropped 3→2 + an honest "Evicted — 2 turn(s) remain" result line), and the selected-item value viewer rendering DECODED pretty JSON (not base64). Zero browser console errors; scrollHeight == innerHeight; no horizontal overflow.

Supersedes. The Phase 73j / D-118 read-only Memory layout (the PageHeader-led list + the __HARBOR_PROTOCOL_CLIENT__ global-injection seam + the disabled-with-tooltip bulk bar + the deferred event-feed placeholder + the base64-rendering value viewer). The data-read layer of D-118 (the memory.{list,get,health} wire types + the Console-local D-061 saved filters) is unchanged; the closed-set IsMemoryMethod predicate + the methods count grow from three to six.

Tests. Go: internal/memory/protocol/mutate_test.go (StrategyTrace projection; Put appends + returns a resolvable key + emits the audit event; Delete evicts by key + emits the audit event + the Record.Summary lossless round-trip; not-found; identity-required) + internal/protocol/transports/stream/memory_handler_test.go (the 3 routes through the real handler over a real MemoryStore — 200 happy paths, 401 no-bearer, 403 non-admin mutation, 400 empty-key — under -race); the single-source / conformance count assertions bump 76→79. Console: web/console/src/lib/memory/tests/derive.test.ts (the base64 decodeMemoryValue incl. a multibyte UTF-8 value, the PascalCase summarizeMemoryEvent / projectMemoryEvents, the Go-zero-time honest rendering); web/console/tests/memory-page.spec.ts rewritten for the carded structure + the live event feed + the admin-gated mutation surface. scripts/smoke/phase-108n.sh (live-server) exercises the 3 new methods + the static Console guard. npm run check 0/0, npm run lint clean, npm run test green.

Cross-references. Supersedes the layout + read-only half of D-118 (Phase 73j Memory). Preserves D-001 (identity-mandatory — the new methods validate the triple, fail closed), D-026 (heavy-value-by-reference — memory.get unchanged), D-033 (memory.identity_rejected — the live feed surfaces it), D-035 (memory.recovery_dropped), D-061 (Console DB local-only — saved views), D-065 (no priority/pin dimension — no fabricated bulk buttons), D-066 / D-079 (admin scope gates the mutations), D-118 (the read surface), D-121 (CONVENTIONS.md), D-171 (session header default), D-180 (derive display-state live), D-184 (the primitive+consumer-in-one-wave pattern this follows). Builds on Phase 23–25 (MemoryStore), Phase 73j (the read surface it rebuilds), Phase 73g (events), Phase 105 (disconnected redirect), Phase 108b / 108k / 108m (the carded master-detail pattern + controller refactor). RFC §6.6, §7; brief 11 §"Memory view"; brief 12 §"The two-surface model"; CONVENTIONS.md (D-121) + PAGE-POLISH-PROCEDURE.md. Plan: docs/plans/phase-108n-console-memory-page.md.


D-187 — Console Artifacts page rethemed to the carded, viewport-locked master-detail composition + the king file refactored to a controller, AND the real admin-gated artifacts.delete mutation landed (supersedes the Phase 73l / D-120 read-only layout + the deferred delete surface)

Decision (Phase 108o). Bring the Console Artifacts page to the Phase 108 page-polish bar AND land the admin mutation it had been deferring — the primitive + its consumer in one wave (CLAUDE.md §13, the D-184/D-186 pattern). The new artifacts.delete method routes through the existing ArtifactsSurface control dispatcher and composes the ALREADY-SHIPPED ArtifactStore.Delete (Phases 17–19) — NO driver-seam change, so all V1 drivers (inmem / fs / sqlite / postgres / s3) back it with zero per-driver work and no conformance-parity churn. Console half: a carded .panel.card viewport-locked master-detail rebuild (filter card + records table-left + a stacked right rail of preview / actions / metadata / tags), the ~916-line king file refactored into an ArtifactsPageState controller + a pure unit-tested derive.ts + the focused ArtifactsTable component, the per-page PageHeader dropped, and the page's prominent Delete affordances (the row action, the bulk bar, and the rail) turned from disabled placeholders into the REAL admin mutation.

Honest gating + scope (§13 / PAGE-POLISH §1). artifacts.delete gates STRICTLY on the verified admin scope claim (D-079 / page-artifacts §9 — Delete is a mutating verb, strictly more than the read scope; unlike the cross-tenant READ gate, a mutation does NOT admit console:fleet, which is an observation claim). It is identity-mandatory (full triple), idempotent (deleting an absent id returns deleted=false with no error — matching the store contract, never a fabricated CodeNotFound), and emits an artifacts.deleted audit event ONLY on an actual eviction (SafePayload — the content-addressed artifact id only, never any bytes — D-026). The Console gates the row/bulk/rail Delete disabled-with-tooltip without the admin claim. artifacts.usages (the "Where used" cross-reference) and the Set retention bulk action stay deferred: usages needs a state-store join that is not a cheap pure-consumer read, and retention is the immutable-V1 carve-out (§10) — honest findings, not stubs. The preview pane is unchanged and still dispatches through the canonical renderer registry ($lib/chat/renderers, brief 12) — no bespoke per-mime renderer.

Heavy bytes by reference (D-026), preserved. The catalog rows stay metadata-only; preview + download route through the artifacts.get_ref presigned URL; the CSV export is metadata-only. artifacts.delete carries only the scope + the content-addressed id — no bytes cross the wire on the mutation either.

Live-wire verification (PAGE-POLISH §3). Verified against a live harbor dev runtime booted with HARBOR_DEV_SEED_FIXTURES=1 (the seeder writes three text artifacts — research-notes.txt / triage-report.txt / thread-summary.txt — under the dev triple; the inmem artifact store is otherwise empty on a fresh boot). The catalog rendered the three seeded rows; selecting one resolved a preview through the ArtifactPreview registry path (the inmem driver does not presign, so the preview legitimately resolves the presign-unsupported branch — proving the registry dispatch is wired); an artifacts.put upload appeared in the catalog; and a real admin artifacts.delete evicted a row (the table count dropped + an honest result line). artifacts.delete of an unknown id returned {deleted:false}; a no-bearer request → 401. Browser-truth at 1512×945: zero browser console errors; scrollHeight == innerHeight; no horizontal overflow. The validation sqlite state was backed up before seeding and restored afterward (the seed writes memory turns to it; the artifact store is ephemeral inmem).

Supersedes. The Phase 73l / D-120 read-only Artifacts layout (the PageHeader-led list + the globalThis-injection seam removed in W6 + the disabled-with-tooltip Delete / Set-retention bulk placeholders + the row Delete deferred stub). The data-read layer of D-120 (the artifacts.{list,put,get_ref} wire types + the surface + the Console-local D-061 saved filters) is unchanged; the closed-set IsArtifactsMethod predicate + the methods count grow from three to four.

Tests. Go: internal/protocol/artifacts_delete_test.go (admin evicts + emits the artifacts.deleted audit event + the store no longer holds the id; non-admin → scope_mismatch; idempotent on an absent id; missing-identity / empty-id failure modes; under -race); the single-source / conformance count assertions bump 79→80. Console: web/console/src/lib/artifacts/tests/derive.test.ts (fmtSize / sourceKind / displayStatus / the Go-zero-time relative label / previewFamily); web/console/tests/artifacts-page.spec.ts updated from the PageHeader + disabled-Delete assertions to the carded structure + the admin-gated Delete contract. scripts/smoke/phase-108o.sh (live-server) exercises artifacts.delete + the static Console guard. npm run check 0/0, npm run lint clean, npm run test green.

Cross-references. Supersedes the layout + read-only half of D-120 (Phase 73l Artifacts). Preserves D-001 (identity-mandatory — the delete validates the triple, fails closed), D-021 (multimodality — upload V1 input unchanged), D-022 / D-026 (ArtifactRef canonical + heavy bytes by reference — the delete carries only the id), D-061 (Console DB local-only — saved views + CSV export), D-065 (no priority dimension), D-066 / D-079 (admin scope gates the mutation), D-120 (the read surface), D-121 (CONVENTIONS.md), D-171 (session header default), D-180 (derive display-state live), D-184/D-186 (the primitive+consumer-in-one-wave pattern this follows). Builds on Phase 17–19 (ArtifactStore), Phase 73l (the read surface it rebuilds), Phase 105 (disconnected redirect), Phase 108b / 108k / 108n (the carded master-detail pattern + controller refactor). RFC §6.10, §7; brief 11 §"Artifacts view" / §PG-4; brief 12 §"The shared chat / playground library"; CONVENTIONS.md (D-121) + PAGE-POLISH-PROCEDURE.md. Plan: docs/plans/phase-108o-console-artifacts-page.md.


D-188 — Console Flows page rethemed to the carded, viewport-locked composition + the king files refactored to controllers; the detail route is deliberately retained against the rail-as-detail pattern; the empty-state copy is corrected to the flows-as-tools truth (supersedes the Phase 73i / D-117 pre-chrome layout)

Decision (Phase 108p). Bring the Console Flows page to the Phase 108 page-polish bar (PAGE-POLISH-PROCEDURE.md). This is a Console-ONLY pass — the six flows.* Protocol methods (flows.list / flows.describe / flows.runs.list / flows.runs.describe / flows.run / flows.metrics) already shipped in Phase 73i, so the page is already fully real-wired; 108p closes the foundation gap. Both routes — the /flows catalog and the /flows/[flow_id] detail — adopt the carded .panel.card, viewport-locked composition (a non-scrolling filter card / action row; only the table / graph / rail scroll internally — PAGE-POLISH §6); the per-page PageHeader is dropped on both (the breadcrumb / ⌘K / footer are app-shell chrome, 108b); and the king files are refactored into FlowsListState ($lib/flows/state.svelte.ts) + FlowDetailState ($lib/flows/detail.svelte.ts) controllers + a pure unit-tested $lib/flows/derive.ts (re-exports the shipped format.ts projections, adds toPageError / displayStatus / successKind / health) + a focused FlowsTable component — the Tools-108k / Memory-108n / MCP-108m / Artifacts-108o pattern. No new Protocol method; no Go change.

Deviation 1 — the /flows/[flow_id] detail route is RETAINED, against the 108m/n/o rail-as-detail pattern. The other Phase 108 pages collapse detail into a right rail; Flows does not. The flow detail surface is a read-only engine-graph DAG canvas (EngineGraphCanvas, the Live-Runtime-topology renderer family) plus a run-history table and a per-run summary — it needs full width and does not fit a --size-rail rail. The mock (page-flows.md §4) itself specifies a distinct Detail Mode with a full-bleed graph; the rail pattern is the wrong shape here. Both routes still adopt the carded + viewport-locked shell, so the divergence is in topology only, not in the design vocabulary. The list-mode interaction stays mock-faithful (page-flows.md §6): a row-click opens the detail route; a per-row Metrics affordance loads flows.metrics into the list rail; Run flow ▶ opens the inline runner — the artifacts-style row-click-selects model was considered and rejected because it contradicts the mock.

Deviation 2 — the empty-state copy is CORRECTED, departing from the mock (§17.6 fix-what-you-find). The pre-chrome catalog empty-state (and page-flows.md §7) read "No flows registered — flows are defined in agents whose planner is Graph / Workflow / Deterministic." That is factually wrong. A flow is a composable engine-graph DAG (internal/runtime/flow) that is registered as a tool (flow.RegisterAsTool, Transport: TransportFlow — D-023) and is invocable DIRECTLY via flows.run with no planner in the loop (the run Trigger is one of user / planner / system). PlannerFamily ("graph" / "workflow" / "deterministic") is CATALOG METADATA, never a runtime gate, and the flow engine is a general primitive (a future runtime could embed it). Rendering the planner-family claim would teach the operator a false mental model. The copy is rewritten to the flows-as-tools, future-open truth; page-flows.md §7 is reconciled in the same PR. The mock prose is the lower authority (PAGE-POLISH §0 authority chain); the runtime truth wins.

Honest scope, preserved. Run flow stays the page's ONLY mutating action, scope-gated on auth.ScopeAdmin (D-079) and degrading to disabled-with-tooltip, never vanishing (D-066) — the runtime is the authoritative gate, the UI gate is advisory. The graph canvas is view-only (D-063): Add node / Delete edge / Save graph / New flow do not render (absent, not disabled). Saved-view chips + the snapshot / compare-versions affordances are Console-local (D-061) — never a shadow source of truth for flow entities. Heavy run outputs surface by reference (D-026) — RunSummaryPanel renders an Open artifact link, never inline bytes.

Live-wire verification (PAGE-POLISH §3). Verified two ways against a live harbor dev runtime. (1) The EMPTY path against the real validation agent (no registered flows): flows.list with a bearer returned {"flows":[],...} and the catalog rendered the CORRECTED flows-as-tools empty-state copy (not the old planner-family wording); no-bearer → 401 (fail closed). (2) The LOADED path against an ISOLATED seeded runtime (HARBOR_DEV_SEED_FIXTURES=1 on a throwaway /tmp/harbor-flowseed data dir + --no-hot-reload, so the validation agent's durable sqlite was never touched — confirmed byte-identical to a pre-run backup afterward; the seeder logged flows=2). Every datum traced end-to-end: the catalog rendered the two seeded flows ← flows.list; the per-row Metrics affordance loaded the rail (2 runs, 24 sparkline buckets) ← flows.metrics; a row-click opened the detail route rendering the read-only graph (1 node), the Healthy health pill, the real Source: cmd/harbor/devseed.go, and the 2-row run history ← flows.describe + flows.runs.list; selecting a run loaded the per-node timeline + honest "This run produced no output." ← flows.runs.describe; the admin token enabled Run this flow ▶ with the correct tooltip. All four PageState branches seen: Loaded, Empty (corrected copy), Loading (skeleton), and disconnected (clearing the connection redirected /flows → /settings, Phase 105). Browser-truth at 1512×945 on BOTH routes: scrollHeight == innerHeight (945 == 945), no horizontal overflow, and zero console errors on a clean hard-reload with a valid token (a direct-URL load of the detail route re-rendered the graph + runs from the runtime — hydration confirmed).

Supersedes. The Phase 73i / D-117 pre-chrome Flows layout (the PageHeader-led list + detail, the inline page state, the format.ts-only projection split, and the inaccurate planner-family empty-state). The data layer of D-117 (the flows.* wire types + the surface + the Console-local D-061 saved filters) is unchanged.

Tests. Console: web/console/src/lib/flows/tests/derive.test.ts (displayStatus ready/empty derived live — D-180; toPageError ProtocolError vs unknown; successKind / health thresholds; the re-exported format.ts projections). web/console/tests/flows-page.spec.ts updated for the carded structure (testids preserved). scripts/smoke/phase-108p.sh carries the static Console guard (PageHeader gone, carded vocabulary, the controllers + derive.ts + FlowsTable exist, no hand-rolled fetch, the corrected empty-state copy) + an optional live flows.list route probe (404/405/501 → SKIP). npm run check 0/0, npm run lint clean, npm run test green.

Cross-references. Supersedes the layout half of D-117 (Phase 73i Flows). Preserves D-001 (identity-mandatory — the consumed flows.* surface validates the triple), D-023 (Flow-as-Tool — the corrected empty-state names it), D-026 (heavy outputs by reference), D-061 (Console DB local-only — saved views + snapshot/compare), D-063 (view-only — no authoring affordances), D-065 (no priority dimension), D-066 / D-079 (admin scope gates flows.run), D-117 (the data surface), D-121 (CONVENTIONS.md), D-171 (session header default), D-180 (derive display-state live), D-184/D-186/D-187 (the controller / derive.ts king-file refactor pattern). Builds on Phase 73i (the flows.* surface it rebuilds), Phase 75a (the flow fixtures), Phase 105 (disconnected redirect), Phase 108b / 108k / 108m / 108n / 108o (the carded composition + controller refactor). RFC §6.1, §6.2, §7; brief 11 §"Flows view"; brief 12 §"The two-surface model"; CONVENTIONS.md (D-121) + PAGE-POLISH-PROCEDURE.md. Plan: docs/plans/phase-108p-console-flows-page.md.


D-189 — Multimodal attachment handling is split into disposition-policy (84b) + provider-native mechanism (84c) + embedding client & semantic retrieval (84d); disposition is policy, not mechanism; embeddings target semantic memory / skill retrieval

Decision (planning). The superseded phase-84b-bifrost-multimodal-v13.md ("Bifrost extended multimodal") conflated three separable concerns and, in doing so, forced one behavior on every caller: it hardcoded a per-MIME disposition map in internal/planner/multimodal.go::materializeOne and auto-routed attachments (e.g. a PDF) to provider-native understanding, with no way for a developer, least of all a Protocol or third-party client, to say "don't send this to the provider; I'll process it myself with a tool / retrieval." The work is split into three phases:

  • 84b — Multimodal attachment disposition policy. The mechanism becomes declared policy. An AttachmentDisposition enum (ref / inline / provider_native / tool:<name>) is resolved per attachment with the precedence: caller hint (carried by the Protocol input-artifact disposition field, or set directly on InputArtifactView by a headless library consumer) > per-agent policy map (carried by harbor.yaml, or constructed programmatically as planner.DispositionPolicy) > runtime default. The policy core (the enum, the policy type, and the pure precedence resolver) lives in internal/planner; the dev run loop (and its devstack mirror) is a thin caller, never the home of the precedence logic. The default is ref, byte-for-byte the behaviour shipped today (the ArtifactStub + Fetch.Tool hint the planner already drives via native tool-calling, 107c). The materializer becomes a policy consumer, not the policy author. 84b ships no provider mechanism and no embeddings.
  • 84c — Provider-native multimodal mechanism. Implements the provider_native disposition (opt-in via 84b, never the default): the bifrost driver uploads an over-threshold attachment via Bifrost.FileUploadRequest (already on core@v1.5.15) and rewrites the content part to a file_id reference, performed inside Complete so LLMClient stays one method (RFC §6.5). The driver is the only seam: the run loop never pre-uploads, the part-level ProviderNative flag is settable by any CompleteRequest builder (headless consumers included), and the file_id cache + lifecycle (TTL/evict + Close-time cleanup) are driver-owned with identity read from ctx; observability is the llm.provider_file.uploaded event, not a task field. The priority order is deliberate: image / audio / video first (the perception modalities, regaining capability the stub path loses), PDF / documents last (the ref/tool + 84d path is the preferred document route). ArtifactStub stays the universal degradation. 84c also completes the streaming-with-multimodal residual that Phase 107's row forward-referenced: 107 shipped text streaming, and 84c proves multimodal inputs combine with it in 107's req.Stream + llm.completion.chunk vocabulary, NOT the non-existent CompleteStreaming the old plan named.
  • 84d — Embedding client + semantic retrieval. Adds Harbor's first embeddings capability — an Embedder §4.4 seam wired to bifrost's EmbeddingRequest — required for the "process it myself" path (a developer keeping a doc as a ref and retrieving over it needs embeddings). Per the §13 primitive-with-consumer rule, its in-wave consumers are semantic memory retrieval and semantic skill retrieval — the direction set by the project owner, NOT a standalone RAG tool. Both are opt-in modes composing with (not replacing) rolling_summary memory and token-savvy skill retrieval. The Embedder is a standalone, factory-constructible primitive usable à la carte (memory/skills are its first consumers, not its gatekeepers); injection into both consumers is via explicit Deps with fail-loud guards (mirroring memory.Deps.Summarizer), so the modes are constructible in Go with no config file, and identity is mandatory at the Embed edge. Requires a §6.5 RFC addendum (the Embedder seam) landed in the same PR.

Why policy, not mechanism (the load-bearing principle). Disposition is a choice that belongs to the developer / operator / planner, not a fact the runtime hardcodes. Harbor already has the "process it myself" seam — every ArtifactStub can carry a Fetch.Tool hint (internal/llm/llm.go::StubFetch, populated at multimodal.go:176), and with native tool-calling (107c) the planner elects the tool path turn-by-turn. The bug in the old plan was preempting that seam by auto-uploading. Making provider_native an opt-in disposition behind a policy (default ref) preserves the developer's control across Playground, Protocol, and third-party clients, and is strictly less code than the auto-routing it replaces.

RFC alignment. Adding optional ProviderFileID / DocumentType fields preserves §6.5's "exactly one of URL/DataURL/Artifact" invariant (additive, backward-compatible). The Embedder is a separate interface, not a method on the one-method chat LLMClient — bifrost itself separates Embedding / Speech / Transcription, and §6.5's "one method" rule is about not bolting tools= onto chat, not about other capabilities; the §6.5 addendum sanctions the seam. The ErrContextLeak LLM-edge guard (D-026) gains a precise exemption: a file_id-only part (no inline bytes) is legal over-threshold.

Numbering. This entry (D-189) records the split + the principles. D-190 is reserved for Phase 84c (provider-native mechanism) and D-191 for Phase 84d (Embedder + semantic retrieval); each is logged in full when its phase ships.

Cross-references. Supersedes the scope of the old phase-84b-bifrost-multimodal-v13.md (renamed to phase-84b-multimodal-disposition-policy.md; provider-native content moved to 84c). Builds on D-026 (context-window safety net), D-166 (F11 multimodal happy path), D-167 (native tool-calling, 107c). Preserves D-001 (identity-mandatory — disposition + file_id cache + embeddings are all identity-scoped). RFC §6.4, §6.5, §6.6, §6.7, §6.10, §11 Q-3; brief 03 (tools-and-llm), brief 04 (memory-and-skills), brief 08 (llm-client-validation), brief 11 (console-feature-surface). Plans: docs/plans/phase-84b-multimodal-disposition-policy.md, phase-84c-provider-native-multimodal.md, phase-84d-embedder-semantic-retrieval.md.


D-192 — RunLoop dispatches decision execution on a per-step goroutine and drains APPROVE/REJECT mid-step, closing the approval-gated CallTool deadlock

Date: 2026-06-09 Status: Settled (shipping with this PR)

Where it lives: internal/runtime/steering/dispatch.go (the new RunLoop.dispatchDecision — the per-step dispatch goroutine + mid-step inbox drain); internal/runtime/steering/apply.go::applier.routeApprovalControl (the mid-step entry into the D-097 bridge); internal/runtime/steering/runloop.go (the default: decision-execution case now calls dispatchDecision; the new carryEvents per-run local merges mid-step-deferred controls ahead of the next boundary's drain); internal/runtime/steering/dispatch_test.go (mid-step approve / reject / defer-once / cancel unit tests against the REAL gate + Coordinator); test/integration/approval_midstep_test.go (the end-to-end choreography: real deterministic planner → real catalog + Phase 64a builder WrapWithApproval → real approval.ApprovalGate → real pauseresume.New Coordinator → real in-mem bus → steering.RunLoop, with APPROVE / REJECT enqueued on the steering Inbox — the Protocol edge's path).

The bug (verified by the SDK-friction audit, every line confirmed). Approval-gated CallTool deadlocked the run loop. The chain: the run-loop goroutine called spec.ToolExecutor.ExecuteDecision SYNCHRONOUSLY (runloop.go, the default: case) → the executor invoked the descriptor inline (cmd_dev_executor.go::callTool) → the approval wrapper called gate.RunGuarded inside Invoke (catalog.go::WrapWithApproval) → RunGuarded blocked on the per-pause resolve channel until ResolveApproval or ctx cancel (gate.go) → but the steering Inbox was drained ONLY at the step boundary (runloop.go), so an APPROVE / REJECT control enqueued on the inbox could never reach the D-097 bridge (apply.go::advancePause → routeThroughGate) while the step was blocked. A planner-dispatched gated tool hung the run until ctx cancellation, in EVERY deployment shape (headless and harbor dev + Console; the Protocol edge's approve method enqueues onto the same inbox). HITL approval is one of the four canonical reasons the unified pause/resume primitive exists (RFC §6.3; pauseresume.go reason set), yet no working choreography existed anywhere in the tree for the canonical shape. The D-097 entry's "common shape (planner idle, gate's pause is independent)" assumption was only ever satisfied by test choreography that invoked the gated tool on a separate test goroutine.

Decision. RunLoop.Run's decision-execution case dispatches ExecuteDecision on a per-step goroutine (under a stepCtx derived from the run ctx) and, while the execution is in flight, keeps draining the steering inbox — routing ONLY the approval-bridge-eligible controls mid-step: an APPROVE / REJECT whose wire payload carries a gate-minted token that one of the configured gates owns (exactly what the D-097 bridge handles). The bridge implementation is NOT duplicated: the mid-step path enters through the new applier.routeApprovalControl, a thin eligibility pre-check (type + gates + wire-token extraction) in front of the SAME routeThroughGate the step-boundary advancePause uses (scope elevation, gate iteration, ResolveApproval — one implementation, two entry points).

ALL other controls (PAUSE / RESUME / CANCEL / REDIRECT / INJECT_CONTEXT / USER_MESSAGE / PRIORITIZE, plus any APPROVE / REJECT no gate owns) keep their existing step-boundary semantics: drained mid-step, they are deferred verbatim into a per-run carryEvents local and merged AHEAD of the next boundary's fresh drain (FIFO preserved), where they get the full applyEvent treatment exactly once, including their control.received / control.applied lifecycle emits and control-history record (emitted at apply time, never duplicated). Their effective timing is identical to the synchronous dispatch: the step was in flight, so the boundary was the first point at which they could ever apply. A control consumed mid-step is consumed: it records the same lifecycle + history footprint the boundary path produces and is NOT re-applied (a re-apply would fail loud with ErrNoOutstandingPause).

Invariants preserved:

  • Join-before-return. The per-step goroutine is joined on every path — happy, run-ctx-cancelled, retired-inbox, and mid-step-bridge-error (where stepCtx is cancelled first so a parked RunGuarded waiter unblocks before the join). The per-iteration WaitForEvent waiter goroutine is likewise always joined (1-buffered result channel, received on every path). No goroutine outlives the step; the goroutine-baseline tests stay green.
  • Cancellation semantics unchanged. The execution runs under a child of the run ctx; cancelling the run still aborts an in-flight gated decision (RunGuarded honours ctx) and the next step boundary surfaces ctx.Err() exactly as the pre-D-192 synchronous path did.
  • D-025. Nothing lands on the RunLoop struct: carryEvents, the done / waitRes channels, and stepCtx all live on the run's own goroutine stack. The RunLoop stays a compiled artifact; the existing N=120 concurrent-reuse test passes unchanged.
  • Fail-loud. A substantive mid-step gate error (scope mismatch, gate closed, coordinator error) records the failure in the history + lifecycle events and fails the run — the same posture as a step-boundary apply failure. No silent degradation.

Alternative considered — an exported ApprovalResolver component subscribed per-run (rejected). A standalone resolver (subscribing to the bus or owning a side-channel the Protocol edge dispatches APPROVE / REJECT into, calling ResolveApproval off-loop) would also unblock the gate. Rejected because: (a) it is D-097's already-rejected shape 3 (ApprovalDispatcher) re-litigated — it duplicates the inbox-drain and creates a SECOND consumption path for the same control vocabulary, violating §13's "two parallel implementations of the same conceptual feature"; (b) it bifurcates ordering and audit: controls would race the boundary drain, the control-history record, and the scope checks the inbox already performed at Enqueue; (c) it widens the public surface (a new exported component every embedder must wire) where the mid-step drain fixes the one goroutine that already owns the inbox, the gates map, and the history — the smallest semantic delta. No hard blocker for the mid-step drain was found.

Tests. Unit (in-package, real gate + real Coordinator + real in-mem bus via the bridge fixtures): TestRun_MidStepApprove_UnblocksGatedDecision (the canonical resume + consumed-once history assert), TestRun_MidStepReject_SurfacesRejectionObservation (tool body never runs; error-shaped observation), TestRun_MidStepDrain_DefersNonApprovalControls_AppliedOnceAtNextBoundary (deferred controls applied exactly once — no drop, no double-apply), TestRun_CancelWhileGatedMidStep_AbortsCleanly (ctx cancel aborts the parked dispatch; goroutine baseline restored). Integration (test/integration/approval_midstep_test.go, §17.3 real-drivers-on-every-seam): TestE2E_ApprovalGatedCallTool_ApproveUnblocksRun (real deterministic planner + real catalog/builder/gate/coordinator/bus; identity propagation asserted through every layer; original args round-trip), TestE2E_ApprovalGatedCallTool_RejectSurfacesRejection (the failure mode; tool.rejected on the bus), TestE2E_ApprovalGatedCallTool_CancelWhileGated_AbortsCleanly (no orphaned goroutine). The one stand-in is the test-local ToolExecutor shim: the production executor is unexported in cmd/harbor (package main) — a known audit finding (Pattern 1 / P1); the shim mirrors devToolExecutor.callTool's resolve-then-Invoke shape so the REAL approval wrapper + gate sit in the invocation path, and the test carries a top-of-file note naming the production gap per §17.6. Promoting the executor into an importable runtime package is a separate wave.

Cross-references. Fixes the deadlock latent in D-097 (the bridge itself is unchanged — this entry gives it its first reachable production trigger for the planner-dispatched shape); preserves D-090 (AppliedGates is still the one gates map), D-096 (typed resume Decision), D-098 (the per-task driver's FSM bridge is untouched), D-025 (concurrent-reuse contract), D-067 / D-070 (the pause/resume + steering primitives), brief 02 §6 (step-boundary semantics retained for all non-bridge controls). RFC §6.3 (steering + the unified pause primitive), §6.4 (approval gates). CLAUDE.md §5 (concurrency), §11 (testing), §13 (fail-loud, no parallel implementations), §17.3/§17.6.


D-193 — The SDK re-homing program: Harbor must hold as a headless Go SDK; production semantics move out of cmd/harbor; half-shipped primitives get first consumers or formal deferral

Date: 2026-06-09 Status: Settled (planning — the program structure; each phase logs its own entry when it ships)

Where it lives: docs/notes/sdk-friction-audit.md (the §17.5-pattern ad-hoc audit this program answers: 10 seam investigators + adversarial verification, 62 findings, 0 refuted); docs/plans/phase-110{a,b,c,d}-*.md (Wave B — re-homing) and docs/plans/phase-111{a..f}-*.md (Wave C — finish the primitives); docs/plans/README.md 110-band + 111-band sections.

Decision. RFC §1's "ships as a Go module" is a product property, not an aspiration: a Go consumer who embeds the Runtime headless — never running harbor dev, never serving the Protocol, never opening the Console — must reach every runtime capability through exported, constructible seams. The audit found the capability layer SDK-clean but a stratum of production semantics living only in package main (with an already-diverged D-094 devstack mirror) and a band of primitives whose only consumers are tests. The program:

  • Wave A (shipped as PR #277 / D-192 + PR #278): the correctness bugs + honesty fixes — the approval-gated CallTool deadlock, the unwired session-GC RunningProbe, devstack parity drifts, fail-loud on dead config knobs, lying godocs.
  • Wave B (Phases 110a–110d; D-194–D-197 reserved): mechanical promotion of cmd-only production semantics into reusable internal/ packages — the ToolExecutor (110a), the RunContext population + event closures (110b), the config→snapshot projections + defaults + the blank-import aggregator (110c), the config→stack assembly fan-out + MCP/OAuth attach helpers (110d). Each promotion's §13 consumer is cmd/harbor AND harbortest/devstack converting to thin callers in the same phase, collapsing the D-094 hand-maintained mirror. Staging: 110a∥110c, then 110b∥110d.
  • Wave C (Phases 111a–111f; D-198–D-203 reserved): every shipped-but-consumerless primitive gets its first production consumer or a recorded deferral — governance enforcement (111a), tool-OAuth completion (111b), durable pauses + pause lifecycle (111c), the skills canonical surface + ingestion verb (111d), trajectory compression (111e), telemetry assembly + the approval-gate authorizer seam (111f). Mutually independent; parallelize after Wave B Stage 1.
  • Wave D (not yet planned): the external-module facade — promoting the verified inventory (identity, events, tools, llm, stores, planner/tasks/steering, assembly) out of internal/ so external teams can import what the scaffold templates and recipes already pretend is public. This is an RFC-level decision (the Phase 71 harbortest/ precedent) and is deliberately gated on Wave B (you cannot facade what lives in a binary); it is NOT covered by the 110/111 plans.

The two principles the program enforces. (1) Re-homing over mirroring: logic needed by more than one assembly (cmd, devstack, a consumer's loop) lives in an exported internal/ package; D-094-style verbatim mirrors are a deprecated pattern — every Wave B phase deletes one. (2) The §13 primitive-with-consumer rule is re-read against the current tree: a primitive whose only consumer is a test is in violation TODAY regardless of what wave shipped it; Wave C is the repayment schedule, and the honest-godoc rule (a seam's docs state its real consumption status) guards against recurrence.

Direction rule (recorded for future phases). Runtime packages may import internal/protocol/types (pure-data projection vocabulary — the Protocol is the canonical contract) but never protocol auth / methods / transports (behavior). The one standing violation (internal/tools/approval importing internal/protocol/auth) is repaid by 111f.

Numbering. D-194–D-197 reserved for 110a–110d; D-198–D-203 reserved for 111a–111f; each logged in full when its phase ships.

Cross-references. Builds on D-189 (the 84-band split — the SDK-consumer lens's first application), D-192 (the deadlock fix), D-149/Phase 83f (the seam-without-wiring class the audit generalized), D-155 (config-projection drift), D-094 (the devstack mirror this program collapses), D-044 (governance latent default — preserved by 111a), D-025/D-026 (the contracts the promoted code carries with it). CLAUDE.md §13 (primitive-with-consumer, two-implementations, silent degradation), §17.5 (the audit pattern), §17.6 (fix-both-sides), §17.7 (the wave cadence the staging follows). RFC §1, §6.3, §6.4, §6.5, §6.15. Plans: docs/plans/phase-110a…110d, phase-111a…111f; findings: docs/notes/sdk-friction-audit.md.


D-194 — The production ToolExecutor is promoted to internal/runtime/dispatch; the answer envelope + terminal error codes are exported in internal/planner; the catalog→planner view is tools.NewPlannerView; the devstack degraded executor is deleted

Date: 2026-06-09 Status: Settled (shipping with Phase 110a)

Where it lives: internal/runtime/dispatch/dispatch.go (the promoted executor — dispatch.NewToolExecutor(cat, store, taskReg, opts...) steering.ToolExecutor with WithHeavyThreshold / WithMaxSpawnDepth / WithLogger functional options, plus the exported dispatch.HeavyTruncationSummary shape contract); internal/planner/answer_envelope.go (planner.AnswerEnvelope, planner.TaskErrorCodeRunLoopError / planner.TaskErrorCodeCancelled, planner.TaskErrorCodeForFinish); internal/tools/planner_view.go (tools.PlannerView + tools.NewPlannerView); thin callers at cmd/harbor/cmd_dev.go / cmd_dev_runloop.go and harbortest/devstack/devstack.go; the converted D-192 E2E at test/integration/approval_midstep_test.go.

Decision. The only production steering.ToolExecutor lived unexported in package main (cmd/harbor/cmd_dev_executor.go, ~660 lines) — the SDK friction audit's Pattern 1 / P1 finding (D-193 Wave B). Phase 110a promotes it verbatim into internal/runtime/dispatch behind an exported constructor; the int constructor parameters (heavyThreshold, maxSpawnDepth) become functional options with the SAME normalization (non-positive → 32 KiB floor / depth 4), so config-value passthrough behaviour is unchanged. Three companion exports close the adjacent audit findings in the same stroke:

  • planner.AnswerEnvelope + terminal error codes (P3). The Phase-106 {answer, finish_reason, tool_calls_seen} shape and the Finish.ReasonTaskError.Code mapping were an implicit cmd↔cmd wire contract (one cmd file marshalled what another parsed). Now one named type + constants, homed in internal/planner on import-direction grounds: the envelope is the projection of planner.Finish, and homing it in internal/tasks would force a new tasks→planner edge (tasks is planner-free). A golden test pins the encoding byte-for-byte against the Phase-106 map-literal shape. The dispatch executor's taskOutcomeObservation keeps its parse generic (json.Unmarshal into any) — a TaskResult.Value is not guaranteed to be an envelope, and a typed parse would silently drop unknown fields (§13) — with the godoc naming planner.AnswerEnvelope as the documented shape and a round-trip test consuming the typed struct.
  • tools.NewPlannerView (P5). The catalog→planner view adapter moves to internal/tools as an exported concrete satisfying planner.ToolCatalogView STRUCTURALLY — internal/tools cannot name the interface because internal/planner imports internal/tools; the compile-time assertion lives in internal/planner's tests. The per-run, never-cached construction discipline and its cross-tenant warning move from a package-main comment into the exported godoc; the constructor copies GrantedScopes so callers cannot mutate a constructed view.
  • dispatch.HeavyTruncationSummary (the "savoring" finding). internal/planner/react/prompt.go pattern-matched the heavy-truncation map shape while citing cmd_dev_executor.go::heavyTruncationSummary as its source of truth — an internal/ package documenting package main as its contract. The shape builder is now the exported identifier the prompt renderer cites; the dependency arrow points the right way (the citation is comment-level — react cannot import internal/runtime/... per the planner import-graph lint, and dispatch imports react for the reserved tool names).

§13 consumers in the same phase. cmd/harbor AND harbortest/devstack convert to thin callers of the same constructors; cmd_dev_executor.go + cmd_dev_catalog_view.go are deleted; the devstack degraded devStackToolExecutor (CallTool-only, no D-026 promotion by its own admission — the brief-03 "two parallel modes" smell) and devStackCatalogView are DELETED, giving devstack D-026 heavy-result promotion, CallParallel, and SpawnTask/AwaitTask parity in one stroke. No second executor implementation survives anywhere (§13 two-implementations rule); the D-094 mirror shrinks.

§17.6 fix bundled. The D-192 HITL E2E shipped with a test-local executor shim and a top-of-file KNOWN PRODUCTION GAP note (the production executor was unreachable from outside package main). The E2E now drives the REAL promoted dispatch.NewToolExecutor (over a real inmem ArtifactStore + real inprocess TaskRegistry); the shim and the gap note are deleted.

Behaviour preservation (the promotion bar). Dispatch semantics, D-026 thresholds and provenance stamps (source: tool, created_at, and the verbatim-preserved producer: dev-tool-executor value pre-110a consumers may key on), preview heuristics, error strings, spawn-depth defaults, poll cadence, and the envelope's JSON byte-shape are unchanged — pinned by the moved parity tests plus the new golden/degradation tests. The executor's D-025 posture carries over: immutable after construction, per-run state on ctx/RunContext, with the mandatory N≥100 concurrent-reuse tests (CallTool + CallParallel + spawn/await) and a new parallel-cancel cross-talk test.

Cross-references. Implements Wave B item 1 of D-193 (the re-homing program); builds on D-152 (Phase 83i — the original executor), D-169 (107d CallParallel), D-170 (107e SpawnTask/AwaitTask), D-026 (heavy-content safety net), D-025 (concurrent reuse), D-176 (provenance discriminator), D-156 (granted-scopes filter), D-192 (the E2E this phase converts), D-094 (the mirror this phase shrinks). CLAUDE.md §13 (primitive-with-consumer, two-implementations), §17.6 (fix-both-sides), §4.3 (deviations: none — behaviour moved verbatim). RFC §6.2, §6.4, §6.5. Plan: docs/plans/phase-110a-tool-executor-promotion.md; findings: docs/notes/sdk-friction-audit.md §2 (P1, P3, P5).


D-195 — Phase 110b: RunContext population promoted to internal/runtime/runctx; per-run event closures become events.IdentityStampingEmitter + llm.NewChunkPublisher; devstack gains Emit/OnChunk + answer-envelope parity

Date: 2026-06-09 Status: Settled (shipping with Phase 110b)

Where it lives: internal/runtime/runctx/runctx.go (the five promoted helpers + golden parity tests in runctx_test.go); internal/events/emitter.go (IdentityStampingEmitter); internal/llm/chunk_publisher.go (NewChunkPublisher + the envelope-identity regression gate in chunk_publisher_test.go); thin callers at cmd/harbor/cmd_dev_runloop.go and harbortest/devstack/devstack.go (every duplicate deleted); test/integration/phase110b_runctx_parity_test.go (the devstack-parity E2E); scripts/smoke/phase-110b.sh.

Decision. The five RunContext-population helpers — the runtime's half of the planner contract per brief 02 ("the planner never imports runtime internals; everything it sees arrives through RunContext") — lived as unexported package main functions, hand-duplicated in devstack (the D-094 mirror tax; the SDK friction audit's verifier found a THIRD drifting copy of the keyword shaper). Phase 110b promotes them verbatim into the direction-safe package internal/runtime/runctx (runtime/* may import planner/memory/skills/artifacts; internal/planner gains NO new imports — its import list stays memory-free): ProjectMemoryBlocks (D-149 llm_context projection), ProjectSkillsContext (D-149), ExtractSkillKeywords + its stopword set and 10-term cap (the D-156 FTS5/BM25 query shaping), ExtractAssistantAnswer (D-152), and ResolveInputArtifacts (the D-166 identity-scoped GetRef + image-byte-inlining + ref-only-fallback policy, now a function over its explicit dependencies). The two per-run event-emission closures become ~20-line constructors on their owning packages: events.IdentityStampingEmitter(bus, q, logger) func(Event) (stamps the run quadruple on identity-less events, Warns loudly on publish failure) and llm.NewChunkPublisher(bus, q, taskID, logger) func(delta, done, kind string) (identity on the Event ENVELOPE — encoding the trap that once produced 280+ bus-rejected chunks per task; the kind parameter is string because planner imports llm, with a one-line planner.ChunkKind adapter at the call site). Settled calls:

1. §13 consumers in the same phase. cmd/harbor AND harbortest/devstack convert to thin callers; the five cmd-local helpers, the two cmd-local closures, and ALL devstack duplicates (devStackProjectMemoryBlocks, devStackProjectSkillsContext, devStackExtractSkillKeywords + stopwords, devStackExtractAssistantAnswer, the devstack resolveInputArtifacts method) are deleted — grep-asserted in the smoke. No second copy of any projection survives (§13 two-implementations rule); the D-094 mirror shrinks again.

2. Devstack parity closure (§17.6 — the mirror gains what it was MISSING, not just what it duplicated). The kit's RunSpec wired neither Emit nor OnChunk, so planner telemetry (planner.decision/planner.finish) and token streaming (llm.completion.chunk) were silently dead on the official test surface — devstack validated weaker semantics than production ships. Both are now wired via the SAME promoted constructors production calls. Its MarkComplete carried an empty tasks.TaskResult{} (the audit's "empty result" drift); it now marshals the 110a-exported planner.AnswerEnvelope byte-identically to production. The integration test pins all three: a devstack-run task produces decision + chunk events on the bus under the run's quadruple (N=10 concurrent runs, no cross-run bleed), the completed task's result parses as a non-empty envelope, and a bus closed mid-run produces loud Warns (never silent drops). Note (2026-06-09, Wave B checkpoint audit): one parity gap was recorded rather than closed — cmd registers per-run trajectories and wires the tasks-protocol Enricher (cmd_dev.go's devEnricher over runLoopDriver.TrajectoryByTaskID), while devstack discarded the trajectory and wired no enricher, so devstack tasks.get reads carried no trajectory enrichment. RESOLVED (2026-06-10, program follow-ups chore): DevStackRunLoopDriver gains the mutex-guarded per-task trajectory map + TrajectoryByTaskID accessor and devstack's tasks projector wires the mirrored devStackEnricher (harbortest/devstack/enricher.go) when a run-loop driver exists — parity, not promotion: the driver shell stays per-caller per D-197 call 4. Pinned by test/integration/devstack_trajectory_enrichment_test.go (a devstack tasks.get wire read carries the trajectory enrichment).

3. ExtractSkillKeywords promoted WITH a deletion notice (owner decision 2026-06-09, the amended 110b scope). The 111d Directory wiring (D-201) replaces the raw-Search injection path and deletes this helper + its call sites. It is promoted anyway — the mirror collapse must not wait on 111d and landing order is not guaranteed — but its godoc carries "scheduled for deletion by Phase 111d (D-201); add no new consumers", and the deletion rides 111d regardless of which phase lands first.

4. Publish-context bridge. The constructors' returned closures publish under context.Background() with a documented rationale: planner.RunContext.Emit/OnChunk are ctx-less by contract (the CLAUDE.md §5 "documented bridge across an unmanaged async boundary" case), bus drivers detect their own closure internally (ErrBusClosed), and the failure path is the loud Warn — behaviour against a closed bus is unchanged from the pre-110b d.subCtx shape (the inmem driver reads the ctx only for redaction). Correction (2026-06-09, Wave B checkpoint audit): the recorded justification is complete only for the inmem driver — the DURABLE driver drives store.Save with the publish ctx, so under events.driver: durable the old d.subCtx cancellation-on-Close semantics changed: persistence is now bounded only by the driver's closed.Load() check + bus Close, no longer by caller-ctx cancellation. Threading a real caller ctx through the emit constructors is a recorded Wave-C-candidate follow-up. Resolved (2026-06-10, D-207): events.IdentityStampingEmitterContext + llm.NewChunkPublisherContext carry a caller-supplied base ctx; both run-loop drivers (cmd + devstack) pass their driver-lifetime d.subCtx, restoring the pre-110b cancellation-on-Close semantics on the durable bus (whose Publish now also honours ctx.Err() up front, so the bound holds even over the ctx-blind inmem StateStore). The ctx-less constructors remain as the documented Background bridge for callers with no lifetime ctx.

5. The D-196 call-4 handoff one-liner rides this phase. internal/runtime/dispatch's spawn-depth clamp now references config.DefaultSpawnDepthCap (the duplicated defaultMaxSpawnDepth = 4 literal is deleted) — the single-line cross-region unification deferred from the Stage-1 merge because 110a/110c built in parallel worktrees.

Behaviour preservation (the promotion bar). Keyword shaping (stopwords, 1-char drop, dedupe-preserving-order, 10-term cap, all-stopwords→caller-falls-back-to-raw-query), memory/skills projection shapes, assistant-answer fallbacks, and the input-artifact policy (nil-store Warn, GetRef-miss skip, image-bytes ref-only fallback, identity-scoped reads) move verbatim — pinned by golden tables (including the keyword table moved verbatim from the cmd test). D-025: the five projections are pure functions; the two constructors allocate no shared mutable state (per-run closures over one concurrent-safe bus), gated by N≥128 concurrent-run stress tests under -race asserting no cross-run identity bleed and goroutine-baseline restoration.
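The keyword-shaping rules listed above (stopwords, 1-char drop, order-preserving dedupe, 10-term cap, and the raw-query fallback) can be sketched as follows. The sketch rests on several assumptions: it lowercases input, splits on non-alphanumeric runes (the entry does not specify tokenisation), and uses a tiny illustrative stopword set. extractKeywords and searchTerms are hypothetical names and do not reproduce the promoted ExtractSkillKeywords signature. A nil result is the signal for the caller to fall back to the raw query.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stopwords is a tiny illustrative set, not the promoted helper's list.
var stopwords = map[string]bool{
	"the": true, "a": true, "an": true, "and": true, "or": true, "to": true,
	"of": true, "for": true, "in": true, "on": true, "is": true,
}

const maxTerms = 10 // the 10-term cap

// extractKeywords returns nil when nothing survives, signalling fallback.
func extractKeywords(query string) []string {
	fields := strings.FieldsFunc(strings.ToLower(query), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	seen := make(map[string]bool, len(fields))
	var out []string
	for _, f := range fields {
		if len([]rune(f)) < 2 || stopwords[f] || seen[f] {
			continue // drop 1-char tokens, stopwords, and repeats
		}
		seen[f] = true
		out = append(out, f) // first occurrence keeps its position
		if len(out) == maxTerms {
			break
		}
	}
	return out
}

// searchTerms is the caller side: all-stopword queries fall back to the raw query.
func searchTerms(query string) string {
	if kws := extractKeywords(query); len(kws) > 0 {
		return strings.Join(kws, " ")
	}
	return query
}

func main() {
	fmt.Println(searchTerms("Deploy the deploy script to a K8s cluster"))
	fmt.Println(searchTerms("to the and"))
}
```

A golden table over inputs like these is the kind of pin the entry describes: each row fixes one shaping rule, so moving the helper verbatim leaves every row green.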

Findings I'm departing from. None.

Protocol additions. None — pure Go re-homing; no new event types, no chunk-payload shape change, no wire changes.

Cross-references. Implements Wave B item 2 of D-193 (the re-homing program; Stage 2, parallel with 110d); builds on D-194 (110a — the exported planner.AnswerEnvelope devstack's parity consumes), D-196 (110c — the config.DefaultSpawnDepthCap single source this phase wires into dispatch), D-149 (83f population), D-152 (83i emit closure + answer extraction), D-156 (83m keyword shaping), D-166 (input-artifact policy), D-176 (the BuildArtifactManifest promotion precedent), D-094 (the mirror tax shrunk), D-025/D-026 (the contracts carried over), D-201 (the 111d deletion the keyword helper's godoc names). CLAUDE.md §13 (primitive-with-consumer, two-implementations), §17.6 (fix-both-sides — the parity closure), §6 (envelope identity). RFC §6.2, §6.5, §6.13. Plan: docs/plans/phase-110b-runcontext-population-promotion.md; findings: docs/notes/sdk-friction-audit.md §2 (P2, P4).


D-196 — Phase 110c: exported FromConfig projections + config.Defaults()/ValidateCore + the internal/drivers/prod aggregator close the config-duality seam

Date: 2026-06-09 Status: Settled (shipping with Phase 110c)

Where it lives: internal/llm/from_config.go (SnapshotFromConfig + the absorbed copy helpers + the reflection field-parity gates in from_config_test.go); internal/memory/from_config.go (SnapshotFromConfig); internal/skills/from_config.go (SnapshotFromConfig); internal/planner/from_config.go (ConfigFromOperator + HintsFromConfig + the B3-closing reflection parity test); internal/governance/from_config.go (ConfigFromOperator); internal/config/loader.go (defaults() → exported Defaults()); internal/config/validate.go (ValidateCore — the headless profile that skips only the validateIdentity JWT ceremony; Validate semantics unchanged); internal/config/config.go (SkillsContextMaxResolved() + exported DefaultSkillsContextMax / DefaultSpawnDepthCap — the deduped knob defaults); internal/drivers/prod/prod.go (the production blank-import aggregator); cmd/harbor/main.go + cmd/harbor/cmd_dev.go + cmd/harbor/cmd_dev_runloop.go (thin-caller conversion; every local duplicate helper deleted); harbortest/devstack/devstack.go (ALL mirror duplicates deleted; aggregator imported; cfg-driven skills open added; AssembleOpts overrides now fall back to the cfg-projected values); test/integration/phase110c_config_projections_test.go (the B3 wiring regression gate + wrapper-chain seating + fail-loud unknown driver + identity-isolation stress); scripts/smoke/phase-110c.sh; AGENTS.md/CLAUDE.md §3 + §4.4 + §13 (the aggregator named as the single sanctioned blank-import home); docs/glossary.md ("FromConfig projection", "production driver aggregator").

Decision. Five subsystems (llm, memory, skills, planner, governance) deliberately decouple from internal/config via snapshot/config types — but every config→snapshot projection was an unexported package main helper, hand-duplicated in devstack. That mechanism shipped D-155 (the snapshot dropped CustomProviders/NetworkDefaults/Corrections) and carried the audit's live B3 drift (devstack's planner projection dropped ExtraGuidance/ReasoningReplay/MaxToolExamplesPerTool/ParallelToolCalls despite its own "MUST track production field-for-field" comment). Phase 110c exports ONE projection per owning package — llm.SnapshotFromConfig(cfg, art), memory.SnapshotFromConfig(cfg), skills.SnapshotFromConfig(cfg), planner.ConfigFromOperator(cfg) + planner.HintsFromConfig(cfg), governance.ConfigFromOperator(cfg) — converts cmd + devstack to callers, deletes every duplicate, and pins each projection with a reflection field-parity test (every config field is projected or carries an explicit exclusion naming its real consumer; a new field without either fails the build). Settled calls:

1. Import direction: the subsystem imports internal/config additively; config stays a leaf. internal/config has zero internal imports (verified); many subsystems (state, artifacts, audit, tasks…) already take config.XConfig at Open. The snapshot decoupling is preserved because FromConfig is optional sugar on the side — Open(ctx, snapshot, deps) signatures are unchanged and snapshot-first construction remains the headless golden path.

2. The parity gate found a THIRD live field-drop, fixed in the same PR (§17.6). Both copyModelProfiles copies (cmd + devstack) silently dropped LLMModelProfileConfig.CostOverrides and .Corrections: an operator's per-model cost_overrides: / corrections: yaml validated cleanly and then did nothing (the Phase 34 corrections layer read a zero-valued CorrectionsProfile; the Phase 36a cost accumulator never saw the override table). llm.SnapshotFromConfig maps both; the sub-struct parity tests pin them. Three field-drop instances from one mechanism (D-155 shipped, B3 live, this one latent) is the full proof the mechanism — not the discipline — was the bug.

3. config.Defaults() exported; ValidateCore is subtractive-minimal. The loader-private defaults() meant a hand-built config got a different baseline than a YAML-loaded one. Defaults() is the one documented baseline; Load still calls it; security-relevant fields stay intentionally absent. ValidateCore() runs EVERY section validator except validateIdentity (the Protocol-server JWT ceremony a headless embedder never serves); anything ambiguous stays in core (fail-closed bias). Full Validate() order and semantics are unchanged (both profiles share one runValidators walk). The documented headless recipe: config.Defaults() → set LLM.Provider/Model/APIKey → ValidateCore() → <pkg>.SnapshotFromConfig(...) → Open(...).

4. Planner-adjacent knob defaults are single-sourced on internal/config. skills_context_max's zero→5 lived as TWO run-loop literals (cmd + devstack); now config.PlannerConfig.SkillsContextMaxResolved() + exported config.DefaultSkillsContextMax are the one source (the driver constructors keep a defensive clamp referencing the same constant). The spawn-depth default deduped to exported config.DefaultSpawnDepthCap (referenced by SpawnDepthCap()); the executor-side clamp (promoted to internal/runtime/dispatch by the parallel Phase 110a) references this constant — the coordinator wires that one-line reference at Stage 1 merge since 110a/110c built in parallel worktrees. planner.HintsFromConfig re-homes the YAML→PlanningHints projection out of the run loop.

5. internal/drivers/prod is the single sanctioned blank-import home (§4.4/§13 amended). The aggregator's doc-commented blank imports register everything main.go's ~30-line block registered (drivers + the corrections/downgrade/retry/governance LLM wrapper hooks + the notifications event-type registration). main.go and devstack both collapse to one import — closing the audit's §7 trap (devstack composed the LLM client without the full wrapper chain because its hand-curated list drifted; Wave A patched three wrappers in, 110c makes the class structurally impossible). The aggregator concentrates — not widens — the blank-import privilege; the mock LLM driver stays OUT (D-089's gated boundary). New drivers add their import to the aggregator, never to main.go.

6. The Wave A devstack parity test is superseded by a strictly stronger gate. TestPlannerConfigFromConfig_FieldParityWithProduction (and its export_test.go alias) guarded the devstack duplicate; with the duplicate deleted, parity holds by construction. The replacement gates: the reflection field-parity test at the owning package (internal/planner/from_config_test.go) + the integration-level B3 wiring gate (test/integration/phase110c_config_projections_test.go registers a capture planner driver and proves the config that reaches planner.Resolve through a devstack assembly carries every operator-set field and equals planner.ConfigFromOperator's output).

7. Devstack consumes every projection, with AssembleOpts overrides falling back to cfg. Assemble now opens skills from cfg.Skills via skills.SnapshotFromConfig (production mirror; AssembleOpts.SkillStore still wins), and the per-task driver's memory / skills-cap / planning-hints wiring falls back to the cfg-projected values when the opts overrides are unset — D-094 parity by construction instead of by comment.

Why. RFC §1's "ships as a Go module" requires a Go consumer with a *config.Config to reach subsystem snapshots without transcribing five unexported package-main helpers — and both existing transcriptions had shipped silent field-drop bugs. After 110c the config-duality seam moves from "partial" to "yes" on the audit's scorecard, and it is the substrate 110d's Assemble composes.
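Call 3's shared-walk design, in which ValidateCore differs from Validate only by skipping the identity section, might look like the sketch below. The Config fields, the validator names, and the stop-at-first-error behaviour are all assumptions for illustration. This entry does not show the real section validators or how they aggregate errors.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical two-section config.
type Config struct {
	LLMProvider string
	JWTIssuer   string
}

type sectionValidator struct {
	name string
	fn   func(*Config) error
}

// validators is the single ordered list both profiles walk.
var validators = []sectionValidator{
	{"llm", func(c *Config) error {
		if c.LLMProvider == "" {
			return errors.New("llm.provider is required")
		}
		return nil
	}},
	{"identity", func(c *Config) error {
		if c.JWTIssuer == "" {
			return errors.New("jwt issuer is required")
		}
		return nil
	}},
}

// runValidators walks validators in order, skipping named sections.
func runValidators(c *Config, skip map[string]bool) error {
	for _, v := range validators {
		if skip[v.name] {
			continue
		}
		if err := v.fn(c); err != nil {
			return fmt.Errorf("validate %s: %w", v.name, err)
		}
	}
	return nil
}

// Validate is the full profile; ValidateCore omits only the identity ceremony.
func Validate(c *Config) error { return runValidators(c, nil) }

func ValidateCore(c *Config) error { return runValidators(c, map[string]bool{"identity": true}) }

func main() {
	cfg := &Config{LLMProvider: "anthropic"}
	fmt.Println(ValidateCore(cfg) == nil, Validate(cfg) != nil)
}
```

Because both profiles walk the same slice, the full profile's order cannot drift from the core profile's. The only difference is the skip set.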

Findings I'm departing from. None.

Protocol additions. None — pure Go re-homing; zero schema or wire changes (no config field changed semantics; the examples load byte-identically).

Cross-references. D-193 (the re-homing program; Wave B Stage 1 = 110a ∥ 110c), D-155 (the recurrence class closed), D-094 (the mirror tax this deletes), D-149 (the planner-adjacent knobs), D-169/D-170 (the planner fields + spawn-depth default in scope), D-089 (the mock stays gated), D-044 (governance latent default unchanged). CLAUDE.md §4.4 + §13 (amended), §10, §17.6 (the copyModelProfiles cross-fix). RFC §6.5, §6.6, §6.7, §9, §10. Plan: docs/plans/phase-110c-config-projection-exporters.md; findings: docs/notes/sdk-friction-audit.md §1 (B3) + §6 + §7.


D-197 — Phase 110d: the assembly entry point (assemble.Assemble) — the D-094 subsystem-wiring mirror collapses to thin callers; MCP attach / OAuth assembly / deps-aware events factory promoted; the headless recipe ships acceptance-gated

Date: 2026-06-09 Status: Settled (shipping with Phase 110d)

Where it lives: internal/runtime/assemble/assemble.go (Assemble / Stack / Options, the closer chain, the partial-failure contract) + assemble_test.go (golden boot, forced-failure table, Skip knobs); internal/tools/drivers/mcp/attach.go (mcpdrv.Attach + the exported ProjectToolPolicies config→tools.ToolPolicy projection) + attach_test.go; internal/tools/auth/build_providers.go (auth.BuildProviders + the KEK resolver) + build_providers_test.go; internal/events/openwith.go (events.OpenWith / Deps / RegisterWithDeps) + openwith_test.go; internal/events/drivers/durable/durable.go (the deps-aware durable registration) + openwith_test.go; cmd/harbor/cmd_dev.go (bootDevStack thin-wrapper conversion; applyToolCatalogWiring / resolveOAuthTokenKEK / attachDevMCPServer / projectMCPToolPolicies / toolPolicyFromProjected / cloneStringMap deleted); harbortest/devstack/devstack.go (tryAssemble and its hand-mirrored fan-out deleted; assembleWith is the thin core; attachDevStackMCPServer deleted); docs/recipes/embed-harbor-headless.md (+ index entry); test/integration/phase110d_assemble_test.go (the recipe-path E2E + concurrency stress); test/integration/phase83g_mcp_dev_consumer_test.go (the devstack policy-projection regression gate, §17.6); scripts/smoke/phase-110d.sh (+ re-pointed assertions in phases 83g/83i/83l/83m/83n/107c smokes); docs/glossary.md ("assembly entry point").

Decision. The config→stack fan-out existed in exactly two places — cmd/harbor/cmd_dev.go::bootDevStack (package main) and devstack's unexported, *testing.T-gated tryAssemble — and the copies had drifted (SDK friction audit §2 P6–P8, §5). Phase 110d promotes ONE exported, error-returning assemble.Assemble(ctx, cfg, opts) (*Stack, error); both callers are thin wrappers; the official external surface stops being "a test fixture wearing the assembly-entry-point's clothes". Settled calls and reconciliations (production wins where the copies disagreed; each difference recorded here):

1. Drift reconciliations the conversion closed (§17.6 fix-both-sides). (a) Devstack's MCP attach silently DROPPED the Phase 26b ToolPolicy projection — the promoted mcpdrv.Attach carries it; the regression is pinned twice (the attach SSE-server E2E asserts the projected policy lands on the Registry; the phase83g integration test now declares a policy: block against the real stdio fixture and asserts the registry view). (b) Devstack never constructed cfg-declared OAuth providers (cfg.Tools.OAuthProviders was ignored); the assembly runs auth.BuildProviders for every caller, with Options.OAuthProviders entries overriding same-named cfg-built ones (caller owns injected lifecycles). (c) Devstack's posture Counters passed a nil SessionLister with a stale "devstack assembles no session registry" comment (false since D-171) — now stack.Sessions. (d) Devstack gains the Agent Registry (production had it; the kit did not).

2. One deliberate ordering change: State opens BEFORE the event bus. Pre-110d production opened bus→state; the durable event-log driver could therefore only share the runtime's StateStore via cmd-side direct construction (it never did — it opened a PRIVATE store from events.state_driver). The assembly opens state first and calls events.OpenWith(ctx, cfg.Events, red, Deps{State}), so events.driver: durable with no state_driver now shares the runtime's store (the store outlives the bus; closers run in reverse). Precedence in the deps-aware durable factory: explicit events.state_driver wins (dedicated, owned store — operator intent); else a non-nil Deps.State is shared (not owned); else fail loud naming both ways out (PR #91's §13 posture carried forward). events.OpenWith is a PARALLEL entry point — the registered Factory signature, Open, and deps-ignorant drivers are byte-identical; a third deps-aware driver post-V1 reopens the Factory shape as an RFC follow-up, not this seam.

3. §4.3 signature refinements from the plan. auth.BuildProviders(ctx, config.ToolsConfig, BuildDeps) (map[string]OAuthProvider, error) returns the provider map ONLY — approval gates were never built by applyToolCatalogWiring itself but by the catalog Builder's AppliedGates output, which the assembly invokes; returning gates from BuildProviders would have required auth → catalog (an import cycle: catalog already imports auth). applyToolCatalogWiring is deleted outright rather than "reduced to a thin call" — its body IS the assembly's catalog band. mcpdrv.Attach(ctx, ms, AttachDeps{Catalog, Registry, Bus, Logger, DefaultIdentity, Closers}) uses a deps struct over the plan's 7-arg positional sketch (godoc'd per-field; DefaultIdentity is the Phase 83m transport-event fallback both callers previously hardcoded).

4. Scope boundary: Assemble ends where the network surface begins. Protocol surfaces (ControlSurface, posture, search, per-page services), transports/mux/CORS, dev auth, draft store, devseed, and listeners stay in cmd/harbor; the test-kit conveniences (signer, httptest-able Handler, draft temp-dir) stay in devstack. The per-task run-loop DRIVER (the task.spawned subscriber) deliberately stays per-caller — it is Phase 110b's seam (the population helpers are shared; the subscriber shell is not), and a headless embedder drives Stack.RunLoop.Run directly (the recipe's shape). (The residual driver-shell divergence this leaves — devstack's missing trajectory registration + tasks-protocol Enricher — is recorded in D-195's 2026-06-09 dated note.) Options is the union of what the two real callers need today (Logger, LLMSnapshot, PlannerOverride, SkillStore, OAuthProviders, PreRegisterTools, MCPDefaultIdentity, MetricsOptions, the three Skip knobs) — no speculative embedder wishlist (§13 options-creep guard). One widening: a PlannerOverride now yields a RunLoop even without an LLM client (pre-110d devstack also required LLMClient != nil); the override IS the planner, so the LLM gate was incidental.

5. Lifecycle contract. On error Assemble returns the PARTIAL *Stack (devstack's tryAssemble contract, kept — the caller's deferred Close drains whatever opened); Stack.Close(ctx) error runs the closers in reverse, joins errors, and is idempotent via sync.Once. cmd seeds its rollback list with stack.Close so cmd-only legs close first, the assembled core last — the same effective order as the pre-110d flat list. The governance identity-tiers honesty warning and the notification.* subscriber moved INTO the assembly (one home; both callers previously duplicated them).

Why. Brief 01 §5: two hand-ordered copies of the fan-out are two modes of the same feature, and they had already diverged on a security-adjacent surface (tool policy). Brief 06 §5 one layer down: when the only assembly prior art is package main + a *testing.T fixture, every embedder re-implements boot. After 110d the audit's "reachable headless: no" flips to yes for in-module consumers — docs/recipes/embed-harbor-headless.md is honestly writable because test/integration/phase110d_assemble_test.go executes it (Defaults → ValidateCore → prod import → Assemble → one goal through planner/runloop/executor → AnswerEnvelope → Close), plus durable-store sharing, identity propagation, two failure modes, N=10 concurrent Assemble/Close cycles and N=100 concurrent runs against one stack under -race. Wave D (the external facade RFC) inherits a promotable entry point instead of a binary.

Findings I'm departing from. None.

Protocol additions. None — pure Go re-homing; zero schema or wire changes.

Cross-references. D-193 (the re-homing program; this is Wave B's capstone), D-194 (110a executor — Stack.Executor), D-196 (110c projections/Defaults/ValidateCore/aggregator — the substrate Assemble composes), D-094 (the mirror this collapses), D-150 (Phase 83g MCP attach), D-090/D-095 (catalog wiring + OAuth assembly), D-074 (durable-log degradation posture), D-089 (mock stays gated; validateLLMProvider remains cmd-side policy), D-171 (sessions create-on-first-use), D-025 (the Stack as compiled artifact), D-026 (heavy-content threshold threaded into the executor). CLAUDE.md §4.3 (recorded deviations), §4.4, §13 (primitive-with-consumer: both callers convert in the same phase), §17.3/§17.6/§17.7 (wave-end E2E + fix-both-sides). RFC §6.4, §6.13, §9, §10. Plan: docs/plans/phase-110d-assembly-promotion.md; findings: docs/notes/sdk-friction-audit.md §2 (P6–P8) + §5.


D-198 — Phase 111a: governance enforcement assembled from config — NewSubsystemFromConfig + SetFactory's first production caller; the SetFactory-vs-per-Open decision; Wrap documented as the multi-runtime escape

Date: 2026-06-10 Status: Settled (shipping with Phase 111a)

Where it lives: internal/governance/assembly.go (NewSubsystemFromConfig) + assembly_test.go (D-044 latent pin, nil-deps fail-loud, behavioural compose-order pin, reject-short-circuit zero-provider-calls pin, shared-store persistence, N=128 concurrent reuse); internal/runtime/assemble/assemble.go (the delimited Phase 111a band: eager build → SetFactory BEFORE llm.Open, ClearFactory on empty tiers and on Stack.Close); internal/governance/registry.go (SetFactory godoc: the process-global multi-runtime limitation + the Wrap escape; the wrapper hook's factory-error path now warns loudly instead of silently passing through); internal/governance/wrap.go (the headless composition godoc); test/integration/phase111a_governance_test.go (the three-enforcer E2E + latent golden + cross-session isolation + missing-identity + N=100 concurrent reuse); docs/recipes/embed-harbor-headless.md ("Enforce governance headless") + docs/recipes/run-harbor-dev.md (tiers-now-enforce note); internal/config/config.go / examples/harbor.yaml / docs/CONFIG.md / docs/skills/define-the-agent-yaml / docs/skills/validate-and-package (posture-only phrasing flipped to enforcement, §18 same-PR rule); scripts/smoke/phase-111a.sh; docs/glossary.md ("Governance enforcement assembly").

Decision. Governance enforcement was a fully-built primitive with zero production consumers: SetFactory's only caller was a test, and a populated governance.identity_tiers map drove ONLY the read-only posture surface — clean validation, silent no-op (SDK friction audit §1+§3; a §13 primitive-without-consumer standing violation). Phase 111a ships the assembly and settles three calls:

1. The exported assembly entry. governance.NewSubsystemFromConfig(cfg, store, bus) (Subsystem, error) composes NewCompound(NewMaxTokensEnforcer, NewRateLimiter, NewCostAccumulator) in the documented cheapest-reject-first order. Empty IdentityTiers → (nil, nil), preserving the D-044 latent default exactly (the wrapper hook treats a nil Subsystem as pass-through; the one sanctioned "no enforcement" state, visible in posture). Non-empty tiers with a nil store or nil bus → wrapped ErrInvalidConfig — enforcement without persistence or observability is a misconfiguration, not a degraded mode.

2. The production consumer (§13) + the eager-build shape. assemble.Assemble (D-197) calls NewSubsystemFromConfig EAGERLY at boot whenever cfg.Governance.IdentityTiers is non-empty — a construction failure fails the boot loud — and installs the already-built Subsystem via SetFactory BEFORE llm.Open composes the wrapper chain, so the in-wrapper factory-error fallback stays unreachable in production (it now logs a loud Warn for the only callers that can reach it: test-installed factories). Empty tiers call ClearFactory, and the stack registers ClearFactory as a closer: the factory's state always reflects the LAST Assemble's governance config, and a stale factory from a prior stack in the same process can never wrap a stack that declared no tiers (the cross-stack-bleed shape the eager closure would otherwise allow). The Wave A posture-only boot warning (PR #278) is deleted — its condition can no longer occur — and every operator-facing surface that said "enforcement not yet wired" (config godoc, example yaml, CONFIG.md, two skills, the prod aggregator comment, cmd/devstack comments) flips to enforcement phrasing in the same PR.

3. SetFactory-vs-per-Open — evaluated, decided: keep SetFactory global. The seam is process-global; two stacks with different tier maps in one process collide (second SetFactory wins). The binary assembles exactly one stack, llm.Open already consults the seam, and D-044 settled the shape — so no new per-llm.Open option is minted for a consumer that doesn't exist. The multi-runtime escape ALREADY exists and is now documented as the SDK path: governance.Wrap(client, sub) with a per-stack NewSubsystemFromConfig (governance stays outermost per D-043) — godoc'd on SetFactory + Wrap and recipe'd in embed-harbor-headless.md.

Why. Brief 03's two-parallel-modes smell one layer up: the config knob shipped two behaviours (display vs. enforce) and silently delivered only one — populated tiers now mean enforcement, full stop. Brief 08 pinned that bifrost reports real USD cost (Usage.Cost.TotalCost), so there was no missing-data excuse for latency. The E2E proves a configured tier actually gates a real assembled stack: cost ceiling (ErrBudgetExceeded + governance.budget_exceeded), a 1-call rate bucket (ErrRateLimited + governance.rate_limited), and an over-cap MaxTokens request (ErrMaxTokensExceeded + governance.maxtokens_exceeded), each with identity propagation asserted on the emitted event; a governance reject emits NO provider-side request (the wrapped inner client's counter is pinned); one session exhausting its budget never gates a sibling (§6 rule 10); the latent default is golden-tested (marker-bounded zero-governance.* assertion); and the wrapped client + Compound carry D-025 (N=100 integration + N=128 unit concurrency under -race, goroutine baseline restored after Close).

Findings I'm departing from. None. One §4.3 plan correction recorded in the plan file: the Wave A posture-only warning lived in assemble.go (one home post-110d), not internal/config/validate.go; validateGovernance never carried a warning, so the removal lands in the assembly.

Protocol additions. None — no new event types (the three governance.* rejection events shipped with 36a/36b), no wire changes; governance.posture is unchanged.

Cross-references. Implements Wave C item 1 of D-193 (the 111 band); builds on D-044 (the latent default this preserves), D-043 (governance outermost), D-196 (110c ConfigFromOperator consumed), D-197 (110d assemble.Assemble — the wiring site; one home, cmd + devstack thin callers), D-081 (the tier-only config surface), D-025 (concurrent reuse), D-089 (mock-LLM E2E driver, explicitly gated). CLAUDE.md §13 (primitive-with-consumer; no silent degradation), §6 rule 10 (cross-session isolation), §17.3 (real drivers at the seam), §18 (skill same-PR updates). RFC §6.15, §6.5, §6.11. Plan: docs/plans/phase-111a-governance-enforcement-assembly.md; findings: docs/notes/sdk-friction-audit.md §1 + §3.


D-199 — Phase 111b: the tool-OAuth completion leg — auth.CallbackHandler is CompleteFlow's production caller; the flow record is the callback's identity source; denied authorizations resume-with-rejection (DenyFlow); one steer-and-resume recipe

Date: 2026-06-10 Status: Settled (shipping with Phase 111b)

Where it lives: internal/tools/auth/callback.go (CallbackHandler / CallbackOption / WithCallbackLogger / WithSuccessPage / CallbackPath = /v1/tools/oauth/callback / CallbackRoutePattern) + callback_test.go (mappings, no-secret assertions, replay, the D-025 N=128 concurrent pin); internal/tools/auth/auth.go (PendingFlowInfo; the OAuthProvider interface gains PendingFlow + DenyFlow; the RedirectURI godoc re-pointed at the now-real handler); internal/tools/auth/provider.go (Provider.PendingFlow returns the info projection; Provider.DenyFlow; the CompleteFlow godoc honesty edit superseded by truth); internal/tools/auth/drivers/oauth2/oauth2.go (passthroughs + the ErrMissingRedirectURL message naming the mount); cmd/harbor/cmd_dev.go (the unauthenticated GET /v1/tools/oauth/callback mount over assemble.Stack.OAuthProviders); harbortest/devstack/devstack.go (the mirrored mount — thin-caller parity per D-197); test/integration/phase111b_oauth_completion_test.go (the full choreography E2E + expired-flow + replay legs); docs/recipes/steer-and-resume-a-run.md (+ index entry); scripts/smoke/phase-111b.sh; docs/glossary.md ("OAuth callback handler").

Decision. Provider.CompleteFlow — the resume half of the tool-OAuth pause — had ZERO production callers; no route anywhere exchanged (state, code), and the godocs referenced a callback handler that did not exist (SDK friction audit §3). Phase 111b ships the real thing and closes the §13 primitive-without-consumer pair (InitiateFlow/CompleteFlow are the SpawnTask/AwaitTask of OAuth). Settled calls:

1. The handler shape. auth.CallbackHandler(providers map[string]OAuthProvider, opts ...CallbackOption) http.Handler — a plain handler with no Protocol-server / dev-server / cmd dependency. State→owner lookup across the provider map via PendingFlow; CompleteFlow on the owner; sentinel→status mapping ErrFlowNotFound→404, ErrFlowExpired→410, ErrStateMismatch→400, upstream exchange/discovery/registration failure→502, ErrProviderClosed→503, missing params / upstream error→400; success → a static HTML page. No token / code material in any response or log line (§7; pinned by test). harbor dev mounts it at GET /v1/tools/oauth/callback (the documented default RedirectURI shape) BEFORE the /v1/ catch-all; devstack mirrors; both read the same assemble.Stack.OAuthProviders (D-197) so the flow records the catalog wrapper parks are the records the route completes.

2. The flow record is the callback's identity source. The provider redirect carries no Harbor JWT — it CANNOT. The handler rebuilds the completing ctx from the provider's OWN flow record (PendingFlowInfo.Identity, pinned at initiation) and, for ScopeAgent flows, restores the control scope whose admin gate already fired at InitiateFlow. The unguessable one-time 256-bit state nonce is the bearer capability (standard OAuth state semantics). The handler adds zero identity logic of its own — CompleteFlow's identity cross-check and the Coordinator's resume-scope check verify against the same record (brief 09). The route is therefore mounted WITHOUT auth middleware, deliberately and documentedly.

3. §4.3 interface refinements. Provider.PendingFlow(state) bool (zero non-test callers) becomes PendingFlow(state) (PendingFlowInfo, bool) — the bool alone could not locate an owner or rebuild identity; PendingFlowInfo exposes Source / BindingScope / Identity / ExpiresAt and deliberately NOT the PKCE verifier or the pause Token (a bare Coordinator.Resume without CompleteFlow re-parks the run immediately — the trap the plan names). Both methods land ON the OAuthProvider interface (no Supports* ceremony — §4.4: every implementation implements everything; the oauth2 driver passes through).

4. Denied authorizations resume-with-rejection. Upstream error=access_denied → the handler answers 400 with the audit-safe reason AND consumes the flow via the new Provider.DenyFlow(ctx, state, reason), which resumes the pause with the typed DecisionReject marker (D-096). The run fails loud instead of hanging to flow-TTL; the denial is observable as pause.resumed{Decision: reject} (no new event type — the coordinator's emission is the signal). Composes with 111c's sweeper (independent, mutually reinforcing).

5. Run re-entry: one automatic leg, one steered leg. The OAuth pause's resolution is fully automatic (callback → CompleteFlow → Coordinator.Resume, Decision: resume). The RUN-level re-entry — a planner that parked the run with RequestPause{ExternalEvent} — rides the EXISTING steering surface: a RESUME control on the run's inbox (the Protocol resume method / Console intervention queue / an in-process bus watcher), the same path HITL approval already uses. The E2E proves the observable contract the plan pins (run re-enters; the re-dispatched tool succeeds USING the freshly-minted token — the tool body fetches the bearer via provider.Token exactly as the HTTP/MCP drivers do) without inventing a parallel resume path (§13). An automatic completion→run-resume bridge is a recorded candidate follow-up; the recipe documents the honesty note.

6. One recipe, not a per-reason recipe. The OAuth completion choreography ships as a section of docs/recipes/steer-and-resume-a-run.md alongside the HITL-approval trigger — HITL and tool OAuth ride the SAME primitive (RFC §3.3); fragmenting per-reason would re-teach the four-parallel-implementations mistake.

Why. A flow you can start but never complete is indistinguishable from a hang (brief 09 — the reference machinery is an explicit initiate/complete PAIR). The pause producer had been live since Phase 30; every operator hit a wall at the redirect. With the handler mounted by default, harbor dev operators get working tool-OAuth with zero ceremony, and headless embedders mount one http.Handler.

Findings I'm departing from. None.

Protocol additions. None — the callback is an OAuth wire endpoint (RFC 6749 redirect target), not a Protocol method; no wire types, no method names, no error codes added.

Cross-references. D-083 (Phase 30 — the provider pair this completes), D-067 (the one Coordinator), D-096 (typed Decision markers — resume on completion, reject on denial), D-097 (the gate bridge whose direct-Resume path the steered re-entry leg preserves), D-192 (mid-step drain — re-entry reachable while a dispatch is in flight), D-193 (the re-homing program; 111b is Wave C), D-197 (Stack.OAuthProviders — the one provider assembly both mounts read), D-025 (the handler as compiled artifact). CLAUDE.md §7 (secrets), §13 (primitive-with-consumer — closed in-phase), §4.3 (the recorded signature refinements), §4.4 (no optional-capability ceremony). RFC §6.4, §3.3, §6.3. Plan: docs/plans/phase-111b-tool-oauth-completion.md; findings: docs/notes/sdk-friction-audit.md §3.


D-200 — Phase 111c: durable pauses + the pause lifecycle — trajectory threaded into the production pause path, WithCheckpointStore wired in the assembly, the max-park sweeper as DecisionTimeout's first producer, timeout-is-terminal

Date: 2026-06-10 Status: Settled (shipping with Phase 111c)

Where it lives: internal/runtime/steering/runloop.go (requestPause trajectory threading; awaitResumeSignal — the parked run's timeout wake — + pauseTimedOut / timeoutFinish and the ErrAlreadyResumed-lost-to-timeout race carve-out) + runloop_timeout_test.go; internal/runtime/pauseresume/coordinator.go (WithMaxParkDuration; derived expiresAt stamping at Request + re-stamp on rehydrate; Status.Decision); internal/runtime/pauseresume/sweeper.go (newRunSweeper / WithSweepInterval / WithSweeperLogger, sweepOnce, ErrSweeperMisconfigured) + sweeper_test.go; internal/runtime/assemble/assemble.go (the ONE Coordinator now constructed WithBus + WithCheckpointStore(stack.State) + WithMaxParkDuration; the config-gated sweeper goroutine on the closer chain) + assemble_pauseresume_test.go; internal/config (PauseResumeConfig → pauseresume.max_park_duration / pauseresume.sweep_interval — Defaults + validatePauseResume); examples/harbor.yaml + examples/dev.yaml + docs/CONFIG.md; test/integration/phase111c_durable_pause_test.go; docs/recipes/steer-and-resume-a-run.md (durability + expiry section; file created here — sibling 111b owns the steering-control half); scripts/smoke/phase-111c.sh; docs/glossary.md ("Max park duration", "Pause sweeper").

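Decision. The pause/resume primitive shipped durability machinery that nothing turned on (the production pause carried no trajectory, and neither assembly passed the Coordinator a checkpoint store) and a lifecycle with no end (SDK friction audit §3; D-193 Wave C item 3). Phase 111c closes all three gaps: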

1. Trajectory threading. steering.RunLoop.requestPause hands the run's LIVE trajectory (RunSpec.Base.Trajectory; planner.Trajectory = trajectory.Trajectory) into the PauseRequest — the Trajectory: nil + "later-phase concern" comment are gone. A non-serialisable trajectory leaf fails the run loud at Request time with trajectory.ErrUnserializable (§11 mandatory test on the production path); nothing is half-persisted.

2. Checkpoint-store wiring (§13 primitive-with-consumer). The ONE Coordinator construction in assemble.Assemble (the merged D-197 assembly site) now passes WithCheckpointStore(stack.State) — cmd + devstack inherit as thin callers, so the D-094 hand-mirror failure mode is closed by construction. Every pause checkpoints through the runtime's own StateStore (D-067 — no parallel persistence seam); the durability E2E proves the restart shape end-to-end (real RunLoop pause → byte-stable trajectory in the format_version: 1 envelope → NEW Coordinator over the SAME store → Resume → run continues) and re-asserts the destructive-Resume contract (resumed ⇒ checkpoint deleted ⇒ ErrPauseNotFound).

3. Pause lifecycle — the sweeper. WithMaxParkDuration(d) stamps a DERIVED expiry (PausedAt + d; zero = never, the default) — deliberately never persisted, so the format_version: 1 envelope is untouched and a restarted Runtime applies its OWN ceiling to rehydrated pauses. pauseresume.RunSweeper(ctx, coord, opts...) reaps expired pauses by calling the public Coordinator.Resume(token, DecisionTimeout, auditFacts) under each pause's OWN identity — DecisionTimeout's first producer (D-096's reserved value goes live; its "no producer yet" godoc is corrected). Reaping deletes the checkpoint; cancel-while-paused stops orphaning records (the sweeper-at-deadline backstop is the shipped floor; no eager cancel-time release — the cancel path stays untouched). The assembly starts the sweeper config-gated (max_park_duration > 0), cancellable + joined on Close (§5; goroutine-baseline test green). Per-record reap failures (e.g. ErrToolContextLost) log loud and do NOT halt the pass; losing the race to a legitimate Resume (ErrAlreadyResumed / ErrPauseNotFound) is the documented benign loser outcome (exactly-once pinned under -race, N=100).

4. Timeout is terminal (the plan's settled semantic). A timed-out pause finishes the waiting run with Finish{ConstraintsConflict} (metadata steering_reason: pause_timeout) — the D-071 REJECT posture applied to deadlines; never a silent unpark-and-continue (the planner is not re-entered), never a park-forever. The parked RunLoop observes the out-of-band reap through two channels: the canonical pause.resumed bus event (primary; identity-scoped subscription while parked) and a coarse Coordinator.Status re-check (the delivery-independent backstop and the only channel on a bus-less RunLoop) — Status gains an additive Decision field so the observer can distinguish timeout from a legitimate out-of-band resume without parsing payloads. A legitimate RESUME control that loses the race (its Coordinator.Resume surfaces ErrAlreadyResumed because the sweeper won) yields the honest timeout-terminal Finish, not a run error; a non-timeout ErrAlreadyResumed still fails loud (the carve-out never widens into silent swallowing). Non-timeout out-of-band resumes (e.g. OAuth completion) deliberately do NOT wake the park — those flows re-enter via steering controls exactly as before (no collision with 111b).

5. §4.3 deviation — the sweeper scan is registry-internal, not Coordinator.List. The plan sketched the sweeper "over the existing List surface"; its Risks section anticipated the conflict and it materialised: List is §6-identity-scoped by design (empty TenantIDs = caller's own tenant; cross-tenant filters must NAME tenants under AdminScoped) — there is no "all tenants" wildcard, and a maintenance actor cannot enumerate tenants it has never seen. Rather than widening §6 with a wildcard or minting an elevated List shape, the sweeper lives in the pauseresume package and snapshots the registry directly (value copies under the mutex — the same discipline List itself uses), while every MUTATION goes through the public Resume under the pause's own triple: scope check, handle re-attach, checkpoint delete, and event emit run unmodified. No storage-level identity filter is bypassed. Consequence (recorded limitation): the V1 sweeper reaps pauses live in the process registry; checkpoints orphaned by a PROCESS CRASH are rehydrated on demand (Status/Resume) but not proactively scanned — state.StateStore has no scan-by-kind surface, and adding one is a §9 RFC conversation, not a quiet widening. Filed as the known V1 boundary in the plan's deviations note. Resolved (2026-06-10, D-207): the §9 conversation happened — RFC §6.11 gained the ONE explicitly-elevated maintenance scan (StateStore.ListKind(ctx, ListScope{MaintenanceScoped: true}, kindPrefix), all three drivers + conformance suite), and every sweep pass now rescues crash-orphaned pauseresume.checkpoint: rows into the registry (rescanCrashOrphans) so the unchanged expired-scan + public-Resume path reaps them at deadline.

Why. Brief 02's whole durability premise ("the planner can pause … get serialised to a state store, and be resumed in a different process") was false on every production path — both assemblies constructed the Coordinator storeless with the store in scope, and even with a store the pause carried no trajectory. And a lifecycle with only an entry edge (Resume was the ONLY checkpoint-deletion path) leaks by construction. Closing both in one phase keeps the §13 pairing honest: the durability machinery gets its production consumer, and the reserved timeout Decision gets its producer, in the same wave.

Findings I'm departing from. None (the List-vs-registry-scan resolution follows the plan's own Risks instruction; recorded above and in the plan).

Protocol additions. None — pause.resumed with decision: timeout was already typed on the wire (D-096); it now occurs in production. The runtime-internal pauseresume.Status gains the additive Decision field (not a wire type).

Cross-references. D-193 (Wave C item 3), D-197 (the one assembly site this wires), D-067 (StateStore as the checkpoint seam), D-069 (format_version: 1 + fail-loud serialise), D-096 (the typed Decision marker; first producer delivered), D-071 (REJECT-is-terminal posture mirrored for deadlines), D-110 (the §6-scoped List the sweeper deliberately does not widen), D-025 (Coordinator/RunLoop stay compiled artifacts; store-backed concurrent-reuse extension), D-192 (the re-entry path the E2E's "run continues" leg rides). CLAUDE.md §5 (fail-loud, ErrUnserializable), §6 (identity-scoped reaping), §11 (pause-serialization + goroutine-leak + concurrent-reuse tests), §13 (primitive-with-consumer ×2). RFC §3.3, §6.3, §6.11. Plan: docs/plans/phase-111c-durable-pause-lifecycle.md; findings: docs/notes/sdk-friction-audit.md §3.


D-201 — Phase 111d: the canonical skills surface — builtin skill_* delegate to the Phase-38/41 handlers; harbor skill import/rm ship over importer.ImportAndStore; the Directory is wired as the <skills_context> producer (owner decision, 2026-06-09)

Date: 2026-06-10 Status: Settled (shipping with Phase 111d)

Where it lives: internal/tools/builtin/skill_search.go / skill_get.go / skill_list.go / skill_propose.go / skill_capability.go (the delegations + the server-computed capability envelope) + skill_delegation_test.go; internal/tools/builtin/builtin.go (RegistryContext gains Bus / Redactor / GrantedScopes; registry entries for skill_list / skill_propose); internal/skills/tools/tools.go (exported SearchHandler / GetHandler / ListHandler seam; the §17.6 nil-Skills-slice fix in GetHandler); internal/tools/visible_names.go (tools.VisibleNames — the ONE allowed-tools producer); internal/skills/importer/importandstore.go (ImportAndStore / ImportReport / ErrDuplicateSkillName / WithOverwrite) + tests; cmd/harbor/cmd_skill.go (+ root.go bind, cmd_skill_test.go); internal/config/config.go + validate.go (skills.directory.{pinned,max_entries,selection}; skill_list/skill_propose in the built-in allowlist); internal/skills/from_config.go (DirectoryFromConfig); cmd/harbor/cmd_dev.go + cmd_dev_runloop.go and the harbortest/devstack mirror (Directory construction + the <skills_context> swap); internal/runtime/runctx/runctx.go (ExtractSkillKeywords DELETED per its D-195 deprecation notice; ProjectSkillsDirectory added); internal/runtime/assemble/assemble.go (RegistryContext wiring); internal/drivers/prod/prod.go (honesty notes replaced with the truth); internal/planner/react/prompt.go (+ golden) — the discovery-section arg shapes; test/integration/phase111d_skills_surface_test.go; scripts/smoke/phase-111d.sh (+ re-pointed assertions in phases 38/83f/83m/110b smokes); docs/skills/configure-memory-and-skills/SKILL.md + define-the-agent-yaml/SKILL.md (§18 same-PR); docs/recipes/use-memory-and-skills-from-go.md; docs/CONFIG.md; docs/glossary.md.

Decision. The skills subsystem shipped deep (Phases 37–41) and production routed around it (SDK friction audit §3): the rich Phase-38 planner tools and the Phase-41 generator were registered NOWHERE while the boot path registered thinner parallel builtin bodies (the §13 two-implementations smell, live); the Phase-40 importer had no shipped invocation path; the Phase-39 Directory had only test consumers. Phase 111d converges all three onto ONE canonical surface. Settled calls:

1. The Phase-38/41 handlers are the single implementation home; the builtin registry stays the single registration carrier. internal/tools/builtin's skill_search / skill_get become thin delegations to the exported skilltools.SearchHandler / GetHandler; the duplicate query/projection bodies (including the 107c client-side tag filter) are DELETED, not toggled. skill_list (Phase-38's third tool) and skill_propose (the Phase-41 generator, D-054 semantics untouched — conflict policy + audit-mandatory emit + rollback) gain their first production registrations through the same carrier. Net effect: capability default-deny filtering, tool-name redaction, and the skill_get token budgeter run on the production path for the first time.

2. The capability envelope is SERVER-computed, never LLM-supplied. The builtin arg shapes deliberately omit the rich handlers' capability field — a model must not widen its own allowed-tool set. CapabilityContext.AllowedTools is derived per call from tools.VisibleNames(catalog, CatalogFilter{triple, GrantedScopes}) over BOTH loading modes (the run's full reachable set). AllowedNamespaces / AllowedTags stay EMPTY — Harbor has no runtime source of namespace/tag grants, so skills requiring them are default-deny filtered (the plan's "surface it rather than pass allow-all" risk resolved on the deny side); when a grants surface lands, runCapability + the run-loop call site are the two places to thread it.

3. skill_propose opt-in rides the existing tools.built_in names list — no second enablement mechanism. The plan sketched a tools.builtin.skill_propose.enabled key; implementing it would have added a parallel enablement shape next to the 107c names-list carrier (§13). Default-disabled = absent from every recommended set in examples/ and the init template; the explicit tools.built_in: [skill_propose] listing IS the yaml opt-in (§4.3 deviation, recorded in the plan).

4. Ingestion ships: importer.ImportAndStore + the harbor skill import / harbor skill rm verbs. ImportAndStore(ctx, id, store, deps, path, opts...) composes the Phase-40 pipeline (frontmatter scan, validation, path-safe attachment resolution rooted at the file's directory) with the store upsert. Conflict policy: duplicate names reject LOUD with ErrDuplicateSkillName unless WithOverwrite(); under overwrite the store's own pack-protection still gates (the verb writes Origin=pack, so pack→pack and pack-over-generated replace; the generator can still never overwrite pack). The CLI verbs are THIN callers (never a second implementation), resolve the store from harbor.yaml's skills: block (the same projection harbor dev boots), print the resolved driver + userinfo-redacted DSN, default to the dev identity triple with --tenant/--user/--session overrides, honour --json, and exit non-zero on rejection with stable codes (skill_config_invalid / skill_import_rejected / skill_rm_failed / skill_internal_error).

5. Directory disposition — RESOLVED: wire it (owner, 2026-06-09; recorded in the plan pre-implementation, logged here in full at ship). Directory.View is the producer of the run loop's <skills_context> prompt block — pinned-then-recent, identity-scoped, capability-filtered (same capfilter source as Phase 38, D-052), redacted — replacing the raw SkillStore.Search + runctx.ExtractSkillKeywords path in both cmd/harbor/cmd_dev_runloop.go and the devstack mirror. The owner adopted the plan's recommendation with the D-176 manifest-pattern + KV-cache framing: a stable pinned-then-recent browse window mirrors the session-artifact manifest, and a stable prompt prefix beats a per-turn query-churned block; per-query RELEVANCE retrieval is the LLM's job via skill_search (107c); the raw-Search path bypassed the capability filter + redaction (a real injection-hygiene gap); operator pinning (DirectoryConfig.Pinned) becomes functional for the first time. The supersede alternative (delete the Directory, keep the keyword heuristic) was presented and declined. Consequences: ExtractSkillKeywords is DELETED per its D-195 deprecation notice (golden tables removed; the 83m item-4 smoke assertions re-pointed at the supersession); runctx.ProjectSkillsDirectory projects the compact SkillView shape (name/title/trigger/task_type/pinned — full bodies stay behind skill_get); the new skills.directory.{pinned,max_entries,selection} config block feeds skills.DirectoryFromConfig, with unset max_entries falling back to the resolved planner.skills_context_max so the pre-111d injection-budget knob keeps its meaning; the Directory is constructed once per stack at the two driver sites (the run-loop driver shell is per-caller per D-197 call 4).

6. §17.6 fix surfaced by the new tests. Invoking the rich skill_get through a catalog with zero surviving skills returned "skills": null and failed the inproc output-schema validation — a latent Phase-38 bug (the handlers were never production-registered, so no catalog-path caller had hit it). Fixed in GetHandler (non-nil empty slice), in the same PR per the fix-both-sides rule.
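The null-versus-empty distinction behind the §17.6 fix is standard encoding/json behaviour: a nil slice marshals as null, a non-nil empty slice as []. The sketch below uses a hypothetical getResult shape (not the real handler's output type) to show the fix of starting from a non-nil slice.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Skill struct {
	Name string `json:"name"`
}

// getResult is a hypothetical output shape with a required array field.
type getResult struct {
	Skills []Skill `json:"skills"`
}

// buildResult keeps the candidates that pass the filter. Starting from a
// non-nil empty slice means zero survivors encode as [] rather than null.
func buildResult(candidates []Skill, allowed func(Skill) bool) getResult {
	out := make([]Skill, 0, len(candidates))
	for _, s := range candidates {
		if allowed(s) {
			out = append(out, s)
		}
	}
	return getResult{Skills: out}
}

func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	denyAll := func(Skill) bool { return false }
	fmt.Println(encode(getResult{}))                                // {"skills":null}
	fmt.Println(encode(buildResult([]Skill{{Name: "x"}}, denyAll))) // {"skills":[]}
}
```

An output schema that types skills as an array rejects the first form and accepts the second.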

Addendum (2026-06-10, Wave C checkpoint audit). ImportAndStore's duplicate-name gate is check-then-act (store.Get then store.Upsert, no store-level conditional insert on the V1 SkillStore interface): two CONCURRENT same-name imports without WithOverwrite can both observe ErrSkillNotFound and last-write-win silently. The checkpoint fix serialises the gate+upsert window behind a process-local mutex inside ImportAndStore itself — acceptable for the surface's actual callers (the one-shot CLI verb and single-process headless embedders), and recorded honestly in the godoc. A store-level conditional create (the real cross-process fix) is the noted follow-up if a multi-process ingestion path ever lands; same-name concurrent imports of IDENTICAL content converge regardless.

Why. Brief 04's designed surface (rich tools §4.5, importer §4.7, directory §4.6, persistence-capable generator §5) existed in full and was unreachable — "a Harbor-defining feature that is unreachable is not shipped." After 111d a headless consumer gets ONE answer to "how do I do skills in Go" (ImportAndStore → the exported handlers → Directory.View; docs/recipes/use-memory-and-skills-from-go.md), and the operator gets the verbs the SKILL.md had documented fictionally. The audit's "a headless consumer cannot tell which retrieval surface Harbor stands behind" is closed by there being exactly one.

Findings I'm departing from. None — the phase is the act of stopping a silent departure from brief 04.

Protocol additions. None — CLI + Go surface only; zero schema or wire changes. (The <skills_context> block's per-entry shape changes from search-ranked full bodies to compact directory views; the planner wrapper contract — section name, UNTRUSTED framing — is unchanged.)

Cross-references. D-193 (the re-homing program; this closes audit §3), D-195 (the deprecation notice executed + runctx home), D-196 (config projections precedent; SkillsContextMaxResolved fallback), D-197 (assemble carries the builtin RegistryContext wiring; driver shell stays per-caller), D-052 (Phase 39 directory + the shared capfilter), D-054 (generator semantics, untouched), D-053 (importer round-trip), D-167 (the 107c meta-tool carrier + LoadingMode shape), D-156 (granted-scopes input to the capability envelope), D-176 (the manifest-pattern framing the owner adopted), D-149 (the RunContext skills seam), D-025/D-026 (contracts re-proven through the new registration path). CLAUDE.md §13 (two-implementations closed by deletion; primitive-with-consumer: Phase-38/41 Register surfaces gain production consumers), §17.6 (the nil-slice fix), §18 (SKILL.md same-PR), §4.2 rule 8 (smoke degradation path), §6 (identity-scoped throughout). RFC §6.7, §8. Plan: docs/plans/phase-111d-skills-canonical-surface.md; findings: docs/notes/sdk-friction-audit.md §3.


D-202 — Phase 111e: trajectory compression ships — the TrajectorySummariser home, the RunLoop MaybeCompress call site, planner.token_budget wiring, and the single-compression scope fence

Date: 2026-06-10 Status: Settled (shipping with Phase 111e)

Where it lives: internal/llm/summarizer/trajectory.go (TrajectorySummariser + NewTrajectorySummariser + options; the two-interface disambiguation in the package godoc) and trajectory_test.go (unit + D-025 concurrent-reuse, N=120 shared runner + summariser under -race); internal/runtime/steering/runloop.go (RunSpec.Compression + the step-boundary MaybeCompress gate) and runloop_compression_test.go (golden no-op + fires-once + fail-loud); internal/runtime/assemble/assemble.go (Stack.Compression construction from planner.token_budget); cmd/harbor/cmd_dev_runloop.go + harbortest/devstack/devstack.go (the per-caller driver-shell Budget/Compression projection — both sides in the same PR, §17.6); internal/config (PlannerConfig.TokenBudget + validation); test/integration/phase111e_compression_test.go (the long-trajectory E2E over the 83l scripted-wire server); scripts/smoke/phase-111e.sh.

Decision. The Phase 46 trajectory-compression seam gets its production half end-to-end — the SDK friction audit's "dead on every production path" finding (§3) resolves as SHIP, not defer. Settled calls:

1. The summariser home is internal/llm/summarizer, as a DISTINCT type. NewTrajectorySummariser(client llm.LLMClient, opts ...TrajectoryOption) lands beside the Phase 64/D-089 memory Summarizer — same "LLM client + versioned compaction prompt" composition, same package precedent — but the two interfaces are never conflated: memory.Summarizer is conversation-window → summary text; planner.Summariser is trajectory → five-field TrajectorySummary. The package godoc carries the disambiguation; the import direction is clean (llm/summarizer → planner, llm; no cycle). Options: WithTrajectoryModel (route compaction to a cheaper/stronger model than the planner's), WithTrajectorySystemPrompt, WithTrajectoryMaxSummaryTokens. The prompt is versioned (TrajectoryPromptVersion); the response rides the Phase 35 structured-output path (FormatJSONSchema + the existing downgrade ladder); the parse tolerates exactly one markdown fence and otherwise fails loud — garbage is a loud parse error, a vacuous-but-valid object is planner.ErrEmptySummary, an LLM error propagates wrapped.

2. The compaction payload is the planner-facing projection, not the raw serialize (recorded §4.3 deviation). The plan's sketch said "composes a compaction prompt over Trajectory's serialized state"; the implementation renders Step.LLMObservation (the D-026 heavy-content-disciplined projection — what the planner itself saw) over the raw Step.Observation, with a per-fragment byte cap (trajectoryFragmentCap, 4 KB) and a raw-observation fallback for pre-projection-split steps. Reason: the raw observation may legally carry heavy content that MUST NOT reach the LLMClient edge (§13 / ErrContextLeak); a compaction call that trips the safety net on exactly the over-budget trajectories it exists to rescue would be self-defeating. The estimator side is untouched — DefaultTokenEstimator still measures the full Serialize bytes (the budget meters what the trajectory CARRIES; the prompt renders what the planner SEES).

3. The RunLoop is the cadence owner: one MaybeCompress per step boundary, gated spec.Compression != nil && rc.Budget.TokenBudget > 0. The call sits after the control drain/projection and before Planner.Next, so the firing step's own prompt build already renders the Summary != nil path (the consumer half Phase 46 left pre-wired in the React prompt builder). Nil runner / zero budget is byte-identical to the pre-111e loop (golden no-op tests). A MaybeCompress error fails the run LOUDLY — the runner emitted trajectory.compression_failed, the loop returns the wrapped error, the task marks Failed. Never a silent fall-through that pretends compression happened.

4. Single compression per run — the V1.1.x scope fence. The runner's existing Summary != nil idempotence IS the fence: the always-over-budget E2E observes exactly one summariser invocation across the remaining steps. No auto-cascade; a trajectory that re-exceeds budget post-compression grows until the D-026 context-window safety net backstops it. Recorded follow-up: re-compaction cadence (clear-and-recompress policy, owned by the cadence layer) is deliberately out of scope; whoever picks it up clears Trajectory.Summary before re-invoking and revisits the fence here.

5. Production wiring through the merged 110d assembly; the budget is a run option, never planner state. assemble.Assemble constructs planner.NewCompressionRunner(NewTrajectorySummariser(stack.LLM)) onto Stack.Compression when planner.token_budget > 0 — and fails loud at assembly when the budget is set with no LLM configured (§13: no silently-inert knob). The per-task run-loop driver shells (cmd + devstack, the seam D-197 deliberately left per-caller) project cfg.Planner.TokenBudget onto RunSpec.Base.Budget.TokenBudget and Stack.Compression onto RunSpec.Compression — per brief 02 §planner-knobs the budget rides the per-run RunContext, never the planner struct (D-025). Headless reachability: NewTrajectorySummariser(client) → NewCompressionRunner(s) → RunSpec.Compression + Base.Budget.TokenBudget, no config file required; the recipe section in docs/recipes/configure-a-planner.md shows the snippet.

6. Godoc honesty, reversed. The Wave A dormant-seam markers on planner.Summariser ("Production consumer pending…") and Budget.TokenBudget ("CURRENTLY INERT…") are removed — both godocs now state the true wiring, and the smoke greps assert the markers stay gone.

Why. "Durable long-running agents" is hollow while compression is dead: the consumer half (the React prompt's Summary != nil branch) had been live since Phase 46 with no producer — a standing §13 primitive-without-consumer violation two RFC sections promise away (§6.2, §6.5). The E2E pins the value: a ~2.7 KB tool observation inflates the trajectory past an 800-token budget; the summariser fires once (one extra wire round-trip, latency logged); the next prompt grows by ~320 B instead of the ≥5.4 KB raw-history counterfactual while still carrying the load-bearing fact the final answer depends on; trajectory.compressed lands on the bus under the run's full quadruple.

Findings I'm departing from. None (the payload-projection refinement in call 2 is a §4.3 deviation from the plan's sketch, recorded above; it follows brief 02's intent — the planner-visible view is what gets compacted).

Protocol additions. None — trajectory.compressed / trajectory.compression_failed were already canonical event types (Phase 46); they now occur in production.

Cross-references. D-055 (the five-field summary + estimator mirror), D-025 (compiled-artifact reuse — both new artifacts tested at N≥100), D-026 (heavy-content discipline shaping call 2), D-089 (the summarizer-package precedent; mock stays gated), D-192 (the post-fix step loop this lands in), D-195/D-196/D-197 (driver shells / config projection / assembly bands this threads through), D-094 (cmd↔devstack mirror moved in the same PR). CLAUDE.md §13 (primitive-with-consumer; no silent degradation), §4.3, §17.6. RFC §6.2, §6.5. Plan: docs/plans/phase-111e-trajectory-compression-consumer.md; findings: docs/notes/sdk-friction-audit.md §3.


D-203 — Phase 111f: telemetry assembled in production (telemetry.New + RunErrorHandler + BridgeBusToTracer); the approval gate de-protocolized via the injected resolve authorizer; the Protocol import direction rule recorded

Date: 2026-06-10 Status: Settled (shipping with Phase 111f)

Where it lives: internal/runtime/assemble/assemble.go (telemetry.New construction with the eventbus.New(bus) emitter; Stack.Telemetry / Stack.Tracer / Stack.RunErrorHandler; NewTracer + BridgeBusToTracer started alongside the metrics bridge, all on the closer chain; Options.TelemetryOptions / Options.TracerOptions / Options.ApprovalAuthorizer); internal/telemetry/tracebridge.go (BridgeBusToTracer, DefaultTraceBridgeFilter, ErrTraceBridgeMisconfigured) + tracebridge_test.go; internal/runtime/flow/flow.go (flow.WithRunErrorHandler pass-through to engine.WithRunErrorHandler); internal/tools/approval/authorizer.go (ResolveAuthorizer / PendingInfo / IdentityAuthorizer / ErrAuthorizerRequired / ErrResolveForbidden) + authorizer_test.go; internal/tools/approval/gate.go (GateDeps.Authorizer mandatory; the internal/protocol/auth import DELETED); internal/server/approval_authorizer.go (ProtocolScopeAuthorizer — the wire-side adapter) + tests; internal/tools/catalog/catalog.go (Deps.Authorizer threaded into every built gate); internal/runtime/steering/apply.go (the D-192-era protocolauth.WithScopes self-elevation DELETED; resolve_forbidden apply-error class); cmd/harbor/cmd_dev.go + harbortest/devstack/devstack.go (both inject server.NewProtocolScopeAuthorizer(approval.NewIdentityAuthorizer())); docs/recipes/observe-an-embedded-runtime.md (+ index entry); test/integration/phase111f_telemetry_test.go + phase111f_approval_seam_test.go; scripts/smoke/phase-111f.sh; docs/glossary.md (three entries).

Decision (telemetry half). RFC §6.14's load-bearing claims were false on every production path: telemetry.New (the redactor-mandatory, identity-attributed, bus-paired Logger) had ZERO production callers, engine.WithRunErrorHandler described "production wiring" that did not exist, and NewTracer was never constructed despite main.go blank-importing its span exporters (metrics got BridgeBusToMetrics in PR #91; traces got nothing — the brief 06 "no OTel in the runtime" anti-pattern, half-closed). Phase 111f wires all three ONCE, in assemble.Assemble (the D-197 single fan-out; cmd + devstack inherit as thin callers): the Logger is constructed the moment the redactor + bus exist; the tracer is constructed unconditionally (noop exporter without a collector — spans still exist for in-process propagation); the new BridgeBusToTracer starts alongside the metrics bridge and both join the closer chain. Span model: canonical lifecycle pairs open/end spans, openers nest under the quadruple's most recent open span (tool under task), failure-suffixed closers set span status Error, non-lifecycle events attach as span events (standalone instantaneous spans when no span encloses — nothing silently dropped), and stop ends still-open spans. The brief 06 cardinality split is enforced by construction: metrics keep Type/Producer/Node labels only; identity + run IDs ride on spans. DefaultTraceBridgeFilter() (Admin + lifecycle types only) is the production volume guard — chunk-grade events never become span traffic. The run-error handler is Stack.RunErrorHandler (RunError → Telemetry.Error → paired runtime.error); flow.WithRunErrorHandler is the compose-time pass-through, exercised end-to-end by the flow-as-tool failure E2E. The boot-window posture stands: Options.Logger (bare slog) remains the bootstrap/wiring logger for the pre-redactor window; nothing identity- or payload-shaped is logged there.
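The span model can be illustrated without an OpenTelemetry dependency. This sketch models spans as plain structs and assumes event types of the form kind.started / kind.completed / kind.failed for lifecycle pairs; the real canonical type names and the quadruple encoding in Key are assumptions. It keeps a per-quadruple stack of open spans so openers nest under the most recent one, marks failure-suffixed closers as errors, attaches everything else to the innermost open span (or emits a standalone instantaneous span), and ends leftovers on stop.

```go
package main

import (
	"fmt"
	"strings"
)

// Event stands in for a bus event; Key is a hypothetical quadruple encoding.
type Event struct {
	Key  string
	Type string // e.g. "task.started", "tool.failed", "llm.retry"
}

type Span struct {
	Name   string
	Parent *Span
	Error  bool
	Ended  bool
	Events []string
}

type bridge struct {
	open map[string][]*Span // per-quadruple stack of open spans
	all  []*Span
}

func newBridge() *bridge { return &bridge{open: map[string][]*Span{}} }

func (b *bridge) handle(e Event) {
	kind, phase, _ := strings.Cut(e.Type, ".")
	stack := b.open[e.Key]
	switch phase {
	case "started":
		var parent *Span
		if len(stack) > 0 {
			parent = stack[len(stack)-1] // nest under the most recent open span
		}
		s := &Span{Name: kind, Parent: parent}
		b.all = append(b.all, s)
		b.open[e.Key] = append(stack, s)
		return
	case "completed", "failed":
		for i := len(stack) - 1; i >= 0; i-- {
			if stack[i].Name == kind {
				stack[i].Ended = true
				stack[i].Error = phase == "failed"
				b.open[e.Key] = append(stack[:i], stack[i+1:]...)
				return
			}
		}
	}
	b.attach(e, stack) // non-lifecycle or unmatched closer: never dropped
}

func (b *bridge) attach(e Event, stack []*Span) {
	if len(stack) > 0 {
		top := stack[len(stack)-1]
		top.Events = append(top.Events, e.Type)
		return
	}
	b.all = append(b.all, &Span{Name: e.Type, Ended: true}) // standalone instantaneous span
}

// stop ends every span that is still open.
func (b *bridge) stop() {
	for key, stack := range b.open {
		for _, s := range stack {
			s.Ended = true
		}
		delete(b.open, key)
	}
}

func main() {
	b := newBridge()
	for _, typ := range []string{"task.started", "tool.started", "llm.retry", "tool.failed", "task.completed"} {
		b.handle(Event{Key: "t1/u1/s1/r1", Type: typ})
	}
	b.handle(Event{Key: "t1/u1/s1/r2", Type: "memory.write"})
	for _, s := range b.all {
		fmt.Println(s.Name, s.Error, s.Ended, s.Events)
	}
}
```

Identity lives only in Key, which selects a span stack; nothing here turns it into a metric label, matching the cardinality split described above.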

Decision (approval half). ApprovalGate.ResolveApproval hard-required internal/protocol/auth scopes, which forced the runtime's own steering bridge to SELF-ELEVATE with protocol scopes to call its own gate (apply.go) — wire vocabulary inside an in-process control path, the audit-§4 tell that the check sat one layer too low. The privilege decision becomes the injected GateDeps.Authorizer seam (interface form, AuthorizeResolve(ctx, PendingInfo) error — PendingInfo carries Tool/Token/Identity/Tags, never arg bytes). Settled calls:

  1. The package default speaks runtime vocabulary. IdentityAuthorizer: the resolving ctx carries the pause's ORIGINATING identity tuple, or the elevated control-scope claim. The control-scope claim REUSES internal/runtime/registry.WithControlScope (evaluated per the plan and preferred over minting a new claim shape: one trust-based-in-V1 elevation vocabulary, same audit posture as the Agent Registry's fleet-control commands). The originating-identity arm is the steering bridge's shape — the run resolving its own gate after the Phase 54 edge already vetted the wire caller's RFC §6.3 steering scope (CheckScope: APPROVE/REJECT need owner-user-or-admin) — so the bridge's self-elevation block is DELETED outright, not relocated. Deliberate, documented permissiveness delta: a DIRECT in-process caller presenting the originating identity (no scopes) could not resolve pre-seam and now can; that is the SDK-consumer story (a headless embedder resolves its own approvals in the identity vocabulary it already has) and it is not weaker on any wire-reachable path — the Protocol edge's scope checks are untouched, and the Coordinator's Resume-side identity equality check still applies after the authorizer (defence in depth, asserted by the integration matrix).
  2. The protocolauth check moves OUT, one-way. server.ProtocolScopeAuthorizer (the Runtime's network-surface package) preserves the pre-seam admin / console:fleet acceptance byte-for-byte and falls through to Next (production: the identity default); Next == nil is the strict wire-only posture. cmd/harbor and harbortest/devstack inject server.NewProtocolScopeAuthorizer(approval.NewIdentityAuthorizer()) at gate assembly (assemble.Options.ApprovalAuthorizer; nil defaults to the identity authorizer for headless embedders). internal/tools/approval no longer imports internal/protocol/auth; ErrApprovalScopeRequired is replaced by ErrResolveForbidden (same fail-closed posture, runtime vocabulary).
  3. Nil authorizer fails loud (ErrAuthorizerRequired at gate construction; catalog.ErrAuthorizerRequired at Builder validation) — an approval gate with no resolve privilege check is a misconfiguration, not a permissive mode.
  4. Ordering note: ResolveApproval now locates the pending entry BEFORE authorizing (the authorizer needs PendingInfo) and reserves it only after authorization succeeds, so a rejected resolver never mutates pause state; tokens are unguessable Coordinator-minted handles, so the lookup-first not-found answer leaks nothing actionable.

The direction rule (recorded). Runtime packages may import internal/protocol/types (pure data projection); they must never import protocol auth / methods / transports (behaviour). The gate's import was the one standing violation of the otherwise-clean direction check (audit §4); this phase repays it. Mechanical tripwire: the phase-111f smoke greps internal/tools/approval for protocol/auth and apply.go for protocolauth.WithScopes. A depguard rule encoding the full direction rule is the noted follow-up (deliberately out of this phase's scope).

Addendum (2026-06-10, Wave C checkpoint audit). The "otherwise-clean / one standing violation" sentence above was too strong: the repo-wide grep finds two further internal/protocol/{auth,methods} imports outside internal/protocol + internal/server + cmd. (a) internal/runtime/flow/protocol/catalog.go imports internal/protocol/methods — this is the <area>/protocol adapter shape the SDK friction audit (docs/notes/sdk-friction-audit.md §7) blessed as an honest one-way adapter; that carve-out is now RECORDED as part of this rule (it previously lived only in the audit notes): a dedicated <area>/protocol subpackage whose whole purpose is the protocol projection may import internal/protocol/methods. (b) internal/search/scope.go imports internal/protocol/auth (Phase 72c's AdminScopeFromAuth) — the area package itself, NOT an adapter subpackage: the exact pre-seam approval-gate shape this phase relocated. RESOLVED (2026-06-10, program follow-ups chore): the predicate relocated to internal/server/search_scope.go as server.SearchAdminScopeFromAuth (the ProtocolScopeAuthorizer precedent); internal/search/scope.go is deleted, cmd/harbor injects the server-owned checker at both Phase 72c construction sites, and the phase-111f smoke's direction-rule tripwire now greps internal/search too. No standing violation remains ahead of the depguard follow-up. The glossary entry mirrors both clarifications.
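The direction rule plus the recorded carve-out is mechanical enough to check from source alone, which is roughly what the smoke's grep tripwire and the planned depguard rule approximate. The sketch below parses one file's imports with go/parser and flags behaviour imports; the exempt roots, the "/protocol" suffix test for the adapter carve-out, and the function names are assumptions drawn from the text above, not the smoke's actual logic.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

const modulePath = "github.com/hurtener/Harbor/"

// Behaviour packages runtime code must never import; internal/protocol/types is allowed.
var behaviour = []string{"internal/protocol/auth", "internal/protocol/methods", "internal/protocol/transports"}

// under reports whether p equals root or sits beneath it.
func under(p, root string) bool { return p == root || strings.HasPrefix(p, root+"/") }

// directionViolations returns the forbidden imports of one file belonging to
// the package at pkgDir (module-relative, e.g. "internal/search").
func directionViolations(pkgDir, src string) ([]string, error) {
	for _, owner := range []string{"internal/protocol", "internal/server", "cmd"} {
		if under(pkgDir, owner) {
			return nil, nil // the protocol tree, the network surface and binaries are exempt
		}
	}
	f, err := parser.ParseFile(token.NewFileSet(), "src.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var bad []string
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		rel := strings.TrimPrefix(path, modulePath)
		// Carve-out: a dedicated <area>/protocol adapter subpackage may import methods.
		if rel == "internal/protocol/methods" && strings.HasSuffix(pkgDir, "/protocol") {
			continue
		}
		for _, b := range behaviour {
			if under(rel, b) {
				bad = append(bad, rel)
			}
		}
	}
	return bad, nil
}

func main() {
	src := "package search\n\nimport _ \"github.com/hurtener/Harbor/internal/protocol/auth\"\n"
	bad, err := directionViolations("internal/search", src)
	fmt.Println(bad, err) // [internal/protocol/auth] <nil>
}
```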

§4.3 deviations. (a) Options gains TelemetryOptions / TracerOptions (the MetricsOptions precedent: the integration tests are real consumers needing the writer / in-memory-exporter seams) and ApprovalAuthorizer — still the union of real-caller needs, no speculative surface. (b) The plan's "Protocol-side adapter injected at server-side gate assembly" lands as the assemble.Options injection point because gate assembly lives in the ONE fan-out (D-197) — the adapter is still owned by internal/server and only the serving callers inject it. (c) engine/options.go's godoc is updated (not merely "now true"): the Wave A honesty text explicitly said "no production assembly installs one today", which became false.

Why. D-193's program is "the layer below is right, the assembly never happened" — this phase is its observability + layering capstone: three shipped-but-consumerless primitives (telemetry.New, WithRunErrorHandler, NewTracer) gain their production consumers in the same phase as the new bridge (§13 primitive-with-consumer), and the one runtime→protocol-behaviour import is repaid with a seam that makes the gate cheaper to construct headless, not just cleaner. docs/recipes/observe-an-embedded-runtime.md stops being unwritable (audit §7) because phase111f_telemetry_test.go executes its path.

Findings I'm departing from. None.

Protocol additions. None — wire behaviour for APPROVE/REJECT is unchanged (the Phase 31/54 suites pass with only the construction-site mechanical updates); zero schema changes.

Cross-references. D-193 (the re-homing program; Wave C item 6), D-197 (the assembly site), D-192 (the bridge dispatch whose self-elevation this deletes), D-097 (the steering→gate bridge, option A unchanged), D-096 (the typed pause.resumed Decision the E2Es assert), D-082 (the metrics bridge — the symmetry target), D-020 (redaction fail-loud), D-025 (bridges + gate concurrent-reuse), D-090 (catalog Builder — Deps.Authorizer), D-059/D-124 (the Agent Registry whose control-scope claim is reused). CLAUDE.md §5 (logging canon), §6 (identity), §13 (primitive-with-consumer; fail-loud; two-implementations), §17.3/§17.6. RFC §6.14, §6.4, §5.1, §6.3. Plan: docs/plans/phase-111f-telemetry-assembly-approval-seam.md; findings: docs/notes/sdk-friction-audit.md §3, §4, §7.


D-204 — Wave D: the public SDK facade — a top-level sdk/ tree of alias-based re-exports makes RFC §1's "Go module" claim true for external teams; scaffold output must compile externally, gated by a standing smoke

Date: 2026-06-10 Status: Settled (planning — implemented by Phases 112a/112b; D-205/D-206 reserved)

Where it lives: RFC §3.6 (the settled design, added in this PR); docs/plans/phase-112a-sdk-facade.md + phase-112b-external-consumers.md; the SDK friction audit's external-surface findings (docs/notes/sdk-friction-audit.md §5 — scaffold-with-tools cannot compile, harbortest's vocabulary is externally unconstructible, README presents the test kit as the runtime library).

Decision. A new top-level sdk/ package tree (the harbortest/ Phase 71 precedent for escaping internal/; the pkg/ convention was already rejected there) re-exports the curated public surface via type aliases, re-exported constants/sentinels, and thin forwards. internal/ remains the implementation home — an alias IS the internal type, so no mechanism is duplicated, no types fork, and interface satisfiability crosses the boundary for free. The facade is the API-stability contract: re-exported = supported; omitted = deliberately private. The V1.2 inventory is RFC §3.6's list — exactly the audited set that templates, recipes, and devstack already assumed public. Phase 112a ships the tree + an in-module facade-integrity test (every re-export resolves; the curated surface compiles); Phase 112b converts the external consumers — scaffold templates emit sdk/ imports, harbortest's parameter vocabulary becomes externally satisfiable through the aliases, consumer-facing recipes/README flip to the public paths — and lands the standing external-module compile gate (scaffold a tool-declaring agent into a temp module, go build it) so the audit's headline external break cannot silently return.

Why aliases, not moves. Physically relocating packages out of internal/ would churn every import in the repo, break the §3 layout's contract that internal/ is production code's home, and turn every future internal refactor into a public API event. Aliases give a curation point instead: the public surface is chosen line-by-line, internal packages keep full freedom behind it, and the facade's godocs become the external documentation surface.

Numbering. D-205 reserved for 112a; D-206 for 112b.

Cross-references. Builds on D-193 (the program; Wave D was explicitly gated on Wave B's re-homing — "you cannot facade what lives in a binary"), D-197 (the assembly the facade exposes), D-085/Phase 71 (the top-level-package precedent), the §13 primitive-with-consumer rule (112a's facade ships with 112b's consumers in the same wave). RFC §1, §3.6 (new), §5.3 (deprecation posture). CLAUDE.md §3 gains sdk/ with 112a (the implementation PR carries the AGENTS/CLAUDE amendment, mirror-gated).


D-205 — Phase 112a: the public SDK facade shipped — the sdk/ alias tree per RFC §3.6, forwards-only with one documented generic-wrapper carve-out; parity-by-construction aggregator; the integrity test runs the headless recipe through the facade with zero internal/ imports

Date: 2026-06-10 Status: Settled (shipping with Phase 112a)

Where it lives: sdk/doc.go (the tree-level contract statement) + the twenty inventory packages sdk/{identity,events,config,tools,tools/inproc,tools/builtin,llm,memory,state,artifacts,skills,planner,planner/react,planner/deterministic,tasks,steering,dispatch,runctx,assemble,drivers/prod}; test/integration/phase112a_sdk_facade_test.go (the facade-integrity test); scripts/smoke/phase-112a.sh; AGENTS.md/CLAUDE.md §3 (the sdk/ layout entry, mirror-gated); docs/glossary.md ("SDK facade").

Decision. D-204's design lands exactly as settled: every sdk/<area> package is alias-based re-exports of its internal/<area> counterpart — type X = internal.X aliases, const X = internal.X re-exports, and var Open = internal.Open function/variable forwards — with package godoc naming the internal home and the curation contract (re-exported = supported; omitted = deliberately private). Calls made while shipping, recorded so 112b inherits them:

  1. Forwards-only, one carve-out. The facade declares exactly ONE func body: sdk/tools/inproc.RegisterFunc, a thin generic wrapper — Go has no generic function values, so a var forward cannot express it. The wrapper's signature uses the sdk/tools aliases (identical types), and the smoke's no-behavior guard greps that no other func exists anywhere under sdk/.
  2. Curation posture. The demand signal was the headless recipe + the consumer-facing recipes + harbortest's vocabulary: driver Open/OpenDriver/RegisteredDrivers + ctx helpers + sentinel errors + the operator-buildable config/section types are IN; driver Register* factories (except events.RegisterEventType, which an embedder publishing custom event types genuinely needs, and planner.Register/MustRegister, the external-planner-author seam the swappable-planner property promises), wire/Protocol adapters, event payload structs, wrapper-hook registration, and the conformance kits are OUT. steering.NewRunLoop and the RunLoopOption set stay internal: the constructor's signature names pauseresume.Coordinator, a type the facade deliberately does not export — the production RunLoop reaches embedders as assemble.Stack.RunLoop.
  3. sdk/drivers/prod parity is by construction. The public aggregator's ONLY content is _ "github.com/hurtener/Harbor/internal/drivers/prod" — init() transitivity seats the identical registration set, so the two aggregators cannot drift (the smoke pins the single-import shape; the integrity test additionally asserts the expected production driver names through each facade's RegisteredDrivers). The mock LLM stays out, same as the internal aggregator (D-089).
  4. The integrity test is the facade's first consumer (§13). It re-expresses the headless recipe exclusively through sdk/ imports (grep-gated: zero internal/ imports): Defaults → ValidateCore → sdk/drivers/prod → Assemble → an in-proc tool registered via the facade → RunLoop → AnswerEnvelope → Close, with two-identity isolation at the bus, the fail-closed incomplete-identity gate, the missing-LLM-config fail-loud failure mode, and a concurrent slice against the one shared stack. The LLM block uses a custom-provider entry (loopback BaseURL, env-var dummy key) so the REAL bifrost driver constructs offline, and assemble.Options.PlannerOverride injects a deterministic planner (a shipped production concrete, not a stub) — the recipe path runs in CI with no network and no mock-driver import. A compile-coverage block references every exported facade name (575 aliases/consts/forwards at ship), so a re-export that stops resolving breaks the build — the plan's ≥95% integrity-completeness bar, satisfied at 100%.

Cross-references. D-204 (the wave decision this implements), D-196 (the internal aggregator), D-197 (the assembly the facade exposes), D-103 (the planner driver registry now externally reachable), D-089 (the mock exclusion), D-085/Phase 71 (the top-level-package precedent). RFC §3.6, §1. Phase 112b consumes this surface (scaffold templates, harbortest vocabulary, recipes/README, the external-module compile gate).


D-206 — Phase 112b: external consumers on the sdk/ facade + the standing external-module compile gate — Wave D complete; the SDK friction audit's §5 external findings are closed

Date: 2026-06-10 Status: Settled (shipping with Phase 112b)

Where it lives: cmd/harbor/scaffold/templates/minimal-react/ (the sdk/-importing templates); scripts/smoke/phase-112b.sh (the standing gate); harbortest/doc.go (the external-usage contract); the five consumer-facing recipes + README.md + docs/recipes/README.md; the 112b facade additions sdk/{audit,telemetry,telemetry/eventbus,governance,tools/auth,skills/importer,skills/tools,skills/generator} + sdk/tools.ErrorClass; RFC §3.6 item 3 (inventory amended); docs/plans/phase-112b-external-consumers.md.

Decision. Everything that pretends to be external now IS external, and a standing preflight gate keeps it true:

  1. Scaffold templates emit sdk/ imports. agent.go.tmpl imports sdk/tools + sdk/tools/builtin + sdk/tools/inproc; the one behavioral substitution is builtin.Register (deprecated, not re-exported by 112a's curation) → builtin.RegisterWith(builtin.RegistryContext{Catalog: cat}, ...), semantics identical for the catalog-only shape. A tool-declaring scaffold (--from-config, ≥1 built-in + ≥1 custom tool) compiles and TESTS green as an external module — the audit's headline external break (§5: "the product's own golden path is broken for its advertised audience") is closed.
  2. The standing external compile gate (scripts/smoke/phase-112b.sh, preflight unit-tests class): scaffolds the tool-declaring shape into a temp dir, appends a replace directive, go mod tidy && go build ./... — FAIL on compile error; bounded (240s default, ~1–2s warm) and self-tested (a deliberately-broken injected file must fail the same step). Phase-67's smoke keeps the TOOLLESS build-check; the tool-declaring shape is owned here, not duplicated (§4.3 call recorded in both scripts and the plan).
  3. harbortest vocabulary is externally satisfiable through aliases — signatures unchanged, zero kit constructors. The audit's three type-poisoned surfaces (Deps.{Bus,Redactor,Identity}, AssertSequence's []events.EventType, NewFaultInjector's tools.ToolCatalog) resolve via sdk/events + sdk/audit + sdk/identity + sdk/tools; an external Agent emits events and reads identity via the same aliases (the "EventLog is structurally empty" finding is dead). The smoke's second external module proves it by RUNNING go test, not just compiling.
  4. Facade additions, flushed out by the conversions (RFC §3.6 item 2 makes additions cheap; item 3 amended): sdk/audit (events.Open and harbortest.Deps both demand a Redactor — the facade's one genuinely-unconstructible mandatory parameter), sdk/telemetry + sdk/telemetry/eventbus (the observe recipe's manual chain), sdk/governance (the D-198 multi-stack path in the headless recipe), sdk/tools/auth (the headless OAuth callback mount — the only externally-unreachable half of a shipped choreography), sdk/skills/{importer,tools,generator} (D-201's "one skills surface" read externally), and sdk/tools.ErrorClass (+ the four class constants, for SimulateFailure). All forwards-only; the phase-112a no-behavior guard still holds.
  5. What was deliberately NOT added: sdk/pauseresume. D-205 settled the Coordinator as facade-private; the steer-and-resume recipe was reworked to the config-driven shape (cfg.PauseResume.* + assemble.Assemble, which wires bus/checkpoint-store/max-park and starts the sweeper) instead of re-litigating the curation.
  6. Docs flipped truthfully. The five consumer recipes + the README's "runtime library" section import sdk/ only (grep-gated by the smoke: zero hurtener/Harbor/internal in any of the six); the in-module-only scope notes are gone where no longer true. §18 sweep: add-an-in-process-tool rewritten to the real RegisterFunc surface (its worked example predated it), scaffold-a-harbor-agent's output tree corrected to the real scaffold shape.

Program closeout. With this phase, Wave D — and the SDK re-homing program (D-193, D-204) — is complete. The friction audit's §5 external-surface findings (scaffold-with-tools cannot compile; harbortest type-poisoned externally; README presenting the test kit as the runtime library; recipes teaching internal/ imports to external readers) are all closed, and scripts/smoke/phase-112b.sh is the standing guarantee that the class of breakage cannot silently return: every preflight compiles a scaffolded external module and runs an external harbortest probe against the live tree.

Cross-references. D-204 (the wave decision), D-205 (the facade + its curation calls, inherited here), D-201 (one skills surface), D-198 (multi-stack governance), D-199 (OAuth completion), D-196/D-197 (aggregator + assembly), D-089 (the mock stays out — the headless recipe now documents the loopback-custom-provider + PlannerOverride offline shape instead). RFC §3.6 items 2–5, §8. docs/notes/sdk-friction-audit.md §5.


D-207 — Program follow-ups: the StateStore maintenance scan + crash-orphan checkpoint sweep, per-Open :memory: databases, emit-constructor base-ctx threading

Date: 2026-06-10 Status: Settled (shipped)

Where it lives: internal/state/state.go (StateStore.ListKind + ListScope + ErrMaintenanceScopeRequired + ValidateListKind) with all three drivers (drivers/{inmem,sqlite,postgres} — the SQL drivers escape LIKE metacharacters) + six new conformancetest cases; internal/runtime/pauseresume/sweeper.go (rescanCrashOrphans — the scan's first consumer) + coordinator.go (Resume skips the tool-context re-attach for DecisionTimeout only) + sweeper_test.go; the four sqlite-family drivers' uniqueMemoryDSN (internal/state/drivers/sqlite, internal/artifacts/drivers/sqlite, internal/memory/drivers/sqlite, internal/skills/drivers/localdb); internal/events/emitter.go (IdentityStampingEmitterContext) + internal/llm/chunk_publisher.go (NewChunkPublisherContext) + internal/events/drivers/durable/durable.go (Publish honours ctx.Err()) with both run-loop drivers (cmd/harbor/cmd_dev_runloop.go, harbortest/devstack) passing d.subCtx; sdk/state (ListScope + sentinel aliases); test/integration/d207_program_followups_test.go + the durable driver's cancellation tests; RFC §6.11; D-195/D-200 dated notes flipped to resolved; docs/plans/phase-111c-durable-pause-lifecycle.md deviations note; docs/glossary.md. Findings: docs/notes/sdk-friction-audit.md (the program this closes out).

Decision. The SDK re-homing program (D-193..D-206) deferred three design-shaped follow-ups rather than quietly widening settled seams mid-wave. This change closes all three:

1. The StateStore maintenance scan + the crash-orphan checkpoint sweep (D-200's recorded V1 boundary). state.StateStore gains its ONE cross-identity surface: ListKind(ctx, scope, kindPrefix) returns every record whose Kind starts with the literal prefix, across all identities. The elevation is explicit and fail-closed — ListScope{MaintenanceScoped: true} is mandatory (ErrMaintenanceScopeRequired otherwise; the §13 "no cross-session queries without an explicit elevated scope claim" rule applied to the persistence floor), an empty prefix is rejected (ErrInvalidRecord — a whole-store dump is never a valid scan), and callers MUST act on each returned record under that record's own identity. There is deliberately NO identity-scoped ListKind mode: identity-scoped reads stay on Load/LoadByEventID, and a second mode would ship without a consumer (§13). All three drivers implement it (no Supports* ceremony, §4.4/§9); the conformance suite gains six cases including the LIKE-metacharacter-literalness trap for the SQL drivers. The first consumer ships in the same change (§13 primitive-with-consumer): every pause-sweeper pass first rescues pauseresume.checkpoint: rows with no live registry entry into the registry (rescanCrashOrphans), re-stamping the expiry from the running Coordinator's own maxPark (the D-200 derived-deadline discipline) — so a crash-orphaned pause is reaped by the unchanged expired-scan + public-Resume path once its deadline passes, and a not-yet-expired orphan becomes legitimately resumable until then. Corrupt checkpoints are loud-skipped and left in the store for the operator, never silently deleted. 
One consequence on Resume: a DecisionTimeout resume skips the tool-context handle re-attach — timeout is terminal (D-200 call 4: the run finishes with Finish{ConstraintsConflict}; the planner is never re-entered), so the non-serialisable tool half is never needed, and requiring it would wedge crash-orphan reaping forever (a crashed process's handle registry is empty by definition). Every run-continuing decision keeps the fail-loud ErrToolContextLost re-attach unchanged.

2. Per-Open :memory: databases (the cross-subsystem collision named in PR #301 / the 112b workaround comment). Every sqlite-family driver translated the bare :memory: DSN to the PROCESS-WIDE file::memory:?cache=shared database, so two subsystems' parallel :memory: stores collided on one shared schema_migrations table (observed: the artifacts driver-parity test losing its artifacts_blobs migration to a skills store's migration run). Fixed driver-side in all four (state, artifacts, memory, skills/localdb): :memory: now mints a per-Open uniquely named memory URI (file:harbor_<subsystem>_mem_<crypto-entropy>?mode=memory&cache=shared) — shared across the one store's pool (every driver pins SetMaxOpenConns(1), so the pinned connection's lifetime bounds the data's), fully isolated across Opens and subsystems. Same-DSN-reopen sharing was never a documented :memory: contract (verified: no test depended on it — every consumer opens once and passes the handle); the full suite runs green. The 112b integration test's file-backed-DSN workaround is reverted to :memory: with the comment re-pointed at this fix, and a new cross-subsystem isolation test opens state + artifacts + skills :memory: stores concurrently (N=8 each).

3. Emit-constructor base-ctx threading (D-195's dated correction). events.IdentityStampingEmitterContext + llm.NewChunkPublisherContext are the ctx-first variants of the Phase-110b promoted constructors (stdlib CommandContext shape — one implementation, the ctx-less originals delegate with context.Background() and stay as the documented unmanaged-async-boundary bridge for callers with no lifetime ctx). Both run-loop drivers pass their driver-lifetime d.subCtx, restoring the pre-110b cancellation-on-Close semantics the Wave B audit found lost on the durable bus. §17.6 production-side fix found while pinning it: the durable driver's ctx-boundedness silently depended on the configured StateStore driver reading ctx (the inmem store does no I/O and ignores it) — durable.Publish now honours ctx.Err() up front (§5), so the caller-ctx bound holds deterministically across all stores. Tests pin both closures against the real durable driver: live driver ctx persists, cancelled driver ctx stops persistence and Warns loudly.

Why one entry. The three items are one program-closeout unit: each was a deferral the program's decisions recorded with a named boundary (D-200 call 5, D-195 call 4, the 112b test comment), and each lands with its boundary note flipped to resolved in the same change — no orphaned "follow-up" markers survive.

Findings I'm departing from. None. The D-200 deviation note's own instruction ("adding one is a §9 RFC conversation, not a quiet widening") is followed: RFC §6.11 is amended in this change.

Protocol additions. None — all three items are runtime-internal; no Protocol method, endpoint, or wire shape changed (hence no smoke-script delta; the §4.2 rule 2 trigger never fires).

Cross-references. D-200 (the boundary this closes; the derived-deadline + timeout-is-terminal semantics reused verbatim), D-195 (the correction this resolves; the promoted-constructor seam), D-206 (the program closeout this follows), D-027 (the generic StateStore surface ListKind extends), D-067 (StateStore as the checkpoint seam), D-096 (DecisionTimeout), D-110 (the §6-scoped List deliberately NOT widened — ListKind is a different, store-level surface with its own explicit claim), D-025 (drivers + Coordinator stay compiled artifacts; conformance concurrency suite unchanged and green), D-020 (redaction stays upstream of Save — ListKind returns opaque bytes). CLAUDE.md §4.4, §5, §6 rule 5, §9, §11, §13 (primitive-with-consumer; elevated scope claim), §17.6. RFC §6.11, §3.3, §6.3.


Date: 2026-06-10 Status: Settled (shipped)

Where it lives: docs/site/ (the VitePress project: package.json + committed lockfile, .vitepress/config.ts, the landing page, and one include-stub page per operator skill / recipe / reference doc); .github/workflows/docs.yml (build on every PR + push, deploy to GitHub Pages on main only, permissions-minimal per the official deploy-pages template); Makefile (docs / docs-install); scripts/smoke/phase-103.sh; CLAUDE.md/AGENTS.md §18 (the navigation-manifest drift rule); docs/glossary.md ("docs site"); docs/plans/phase-103-github-pages-docs-site.md.

Decision. Harbor's published docs site is the sibling Dockyard project's mechanism, ported: VitePress under docs/site/, DOCS_BASE read from the environment so CI pins the Pages path without hard-coding the repo name, make docs as the local equivalent of the CI build, and the VitePress build doubling as the link-check gate on every PR. Three calls worth recording:

  1. The site renders FROM the repo; it never forks it. Every published page is either the landing page or a one-line <!--@include: …--> stub over a canonical in-repo file (skills, recipes, CONFIG, glossary, decisions, RFC, master plan, productionization playbook, changelog). Drift between source and site is impossible by construction; the cost is a navigation manifest (config.ts + the stub tree) that must move when a skill or recipe moves — pinned by the §18 extension and by scripts/smoke/phase-103.sh, which fails preflight when a repo skill/recipe has no site page.
  2. The dead-link gate is scoped, not blanket. Canonical docs legitimately link into the repo tree (Go packages, scripts, phase plans, examples) — paths that exist on GitHub but are not site pages. ignoreDeadLinks carries one carve-out function for repo-tree pointers; everything else stays fatal, including cross-skill links (../<slug>/SKILL.md), so a renamed skill referenced by a sibling fails the build. A blanket ignoreDeadLinks: true is rejected by the phase smoke.
  3. Vue-template hazards are neutralised config-side, not by editing canonical docs. The canonical docs quote Go text/template syntax ({{ .Args.city }}) and angle-bracket placeholders (<slug>) in prose; VitePress compiles markdown as Vue SFCs and fails on both. The site config sets markdown.html: false (no included doc uses raw inline HTML) and adds v-pre to every inline code span. The canonical files are untouched.
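Call 1's "render from the repo, never fork it" comes down to two small operations: emitting an include stub whose path is relative to the stub's own directory, and detecting a canonical skill or recipe with no site page (the drift the phase-103 smoke fails on). The Go sketch below assumes a hypothetical docs/site/<section>/<slug>.md layout and slug-based matching; the real stub tree, manifest and smoke are not reproduced here. The <!--@include: path--> form is VitePress's markdown file-inclusion syntax.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// includeStub renders a site page body that only includes its canonical
// source, with the path taken relative to the page's own directory.
func includeStub(page, canonical string) (string, error) {
	rel, err := filepath.Rel(filepath.Dir(page), canonical)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("<!--@include: %s-->\n", filepath.ToSlash(rel)), nil
}

// missingPages returns repo slugs with no site page: the navigation-manifest
// drift that should fail preflight.
func missingPages(repoSlugs, siteSlugs []string) []string {
	have := make(map[string]bool, len(siteSlugs))
	for _, s := range siteSlugs {
		have[s] = true
	}
	var missing []string
	for _, s := range repoSlugs {
		if !have[s] {
			missing = append(missing, s)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	stub, _ := includeStub("docs/site/recipes/observe-an-embedded-runtime.md", "docs/recipes/observe-an-embedded-runtime.md")
	fmt.Print(stub)
	fmt.Println(missingPages([]string{"scaffold-a-harbor-agent", "add-an-in-process-tool"}, []string{"scaffold-a-harbor-agent"}))
}
```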

Findings I'm departing from. The phase plan's dependency ordering ("Land Phase 102 first so the site renders the cleaned godoc cross-links"). Phase 103 ships ahead of 102 for the v1.3.0 cut: the site's pkg.go.dev links work regardless — 102's godoc-jargon cleanup improves what pkg.go.dev RENDERS, not whether the site's links resolve — and the published site is the adoption surface the release needs now. When 102 lands, no docs-site change is required. Also narrowed from the plan: no dedicated "Releases"/"Contributing" site pages (the changelog page + GitHub serve both); no theme customisation beyond VitePress defaults (Dockyard's theme is also stock; "matching posture" is satisfied by the shared mechanism).

One manual step. The repository's Pages settings must be flipped to "Source: GitHub Actions" once (Settings → Pages → Build and deployment) before the first deploy succeeds.

Cross-references. RFC §1, §7, §12; brief 13 (operator UX — the published-site adoption signal), brief 06 (devx); D-137-equivalent posture at Dockyard (VitePress, docs/site/, DOCS_BASE); CLAUDE.md §18 (the same-PR drift rule the navigation manifest joins).


D-209 — Phase 113a: the Protocol adoption track opens — generated contract reference (cmd/harbor-gen-protocol-docs + the protocol-docs-gen-check gate), the executed quickstart, choreographies 1–3

Date: 2026-06-10 Status: Settled (shipping with Phase 113a)

Where it lives: cmd/harbor-gen-protocol-docs/ (the generator: method/route/wire-type join table, the eventPayloadIndex, the typeInstanceIndex, the error-guidance table — each pinned by a lockstep test); docs/site/protocol/ (the four generated pages methods.md / events.md / errors.md / types.md + the hand-written index.md / quickstart.md / auth-and-identity.md / streaming-semantics.md / task-control.md); Makefile (protocol-docs-gen / protocol-docs-gen-check); .github/workflows/docs.yml (the gen-check before the VitePress build; the job now carries the Go toolchain); docs/site/.vitepress/config.ts (the Protocol nav section); scripts/smoke/phase-113a.sh (site trip-wires + the executed quickstart); internal/protocol/transports/control/status.go (HTTPStatus, exported); AGENTS.md/CLAUDE.md §18 (the Protocol-docs regeneration clause, mirror-gated); README.md (the Docs-table Protocol row); docs/skills/use-the-harbor-protocol/SKILL.md (the D-093/D-132 claim correction); docs/glossary.md (three terms).

Decision. The Protocol's adopter-facing documentation is generated from the same canonical sources the Runtime compiles from, and gated so it cannot drift — the house single-source discipline (D-072/D-082) applied to published docs, using the gate SHAPE D-093 specified for the TS client generator (git diff --exit-code after regeneration) but built for a generator that actually exists. The D-093 TS generator stays deferred (D-132 / issue #179); nothing here blocks on it, and its future implementation can reuse this phase's reflection plumbing. The proposal's four open questions land per the owner's resolutions recorded in the phase plan: Q1 the event catalog is registry-read at gen time — the generator blank-imports internal/drivers/prod and reads events.EventTypes(), with payload shapes joined by a generator-side eventPayloadIndex pinned in lockstep; Q2 OpenAPI emission deferred (nothing in the generator precludes it); Q3 conformance sdk-export waits for a real third-party ask (113b documents the in-repo path); Q4 versioned docs deferred to the first breaking Protocol change.

Calls made while shipping, recorded so 113b inherits them:

  1. The lockstep mechanism is four tables, each pinned by a test. methodTable() (route derived from the transports' exported *RoutePattern constants — never hand-typed paths; the nine steering controls' scope column rendered via steering.RequiredScope so the docs publish the binding the inbox enforces), eventPayloadIndex (event type → payload reflect.Type(s) or an explicit no-typed-payload note; audit.admin_scope_used legitimately carries TWO shapes — the bus admin-filter emit and the Protocol-edge impersonation emit), typeInstanceIndex (CanonicalWireTypes name → live reflect.Type, since singlesource records names only), and errorTable (when-it-fires + retry guidance; the HTTP column reads the newly-exported control.HTTPStatus so the docs and the wire transport share one binding). A registry gaining an entry any table lacks fails go test — the TestSingleSource_CanonicalMethodsInLockstep mechanism, four times over.
  2. The registry-read blind spot is closed by a source scan. A subsystem registering an event type from a package the generator does not import would be invisible to events.EventTypes() in the generator binary. TestGen_EventConstantsAllRegistered walks internal/ for typed EventType = "..." constant declarations and asserts each is registered AND indexed in the generator binary — so the import list in cmd/harbor-gen-protocol-docs/events.go cannot silently rot. Two payload-home imports reach into driver packages (distributed/drivers/loopback, tools/drivers/mcp) to read exported payload types those drivers declare; each carries a one-line comment naming why (the §4.4 carve-out hygiene; prod already seats loopback's registration, and the import adds no second registration path).
  3. control.HTTPStatus exported (§4.3 deviation, plan did not name it). The error page's HTTP column must come from the binding the transport serves, and the previous httpStatus was unexported. Renamed-with-export + the one call site updated; no behavior change, no Protocol-surface change.
  4. The executed quickstart's steering step accepts both documented outcomes. Against the preflight mock-LLM dev server a demo run reaches a terminal state in milliseconds, so a post-hoc cancel deterministically returns the canonical 404 not_found envelope (steering targets live inboxes; the inbox is registered and torn down with the run). The page teaches exactly this — acknowledgement-vs-effect, controls-target-live-runs — and the smoke accepts 200 {"accepted":true} (run still live; the real-provider path) OR 404 {"code":"not_found"} (terminal; the mock path), each with its shape asserted. The deterministic not_found leg doubles as the §17.3 failure-mode requirement.
  5. The smoke executes the page, not a copy. scripts/smoke/phase-113a.sh extracts the quickstart's five <!-- qs-step: ... -->-tagged bash blocks and sources them in order in one shell (variables flow block-to-block exactly as for a reader), asserting status + JSON shape per step against HARBOR_BASE_URL. The tag count is load-bearing (≠ 5 fails loudly). The recipe-cannot-lie pattern from embed-harbor-headless (D-197), applied to curl.
  6. Generated markdown is markdownlint-clean and deterministic. Sorted iteration everywhere; two consecutive runs are byte-identical (pinned by test) — the git diff --exit-code gate depends on byte-stability. The pages live directly under docs/site/protocol/ (not include stubs — there is no canonical prose to mirror; the generator IS the source) and are committed.
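A minimal sketch of the both-directions lockstep check that call 1 describes. It assumes a registry that exposes its entries as a string slice and a hand-maintained table keyed by the same names. lockstepDiff, lockstepError, and the sample rows are hypothetical; the real tests (TestSingleSource_CanonicalMethodsInLockstep and its siblings) are not reproduced here. Both result slices are sorted, following the same determinism rule as call 6.

```go
package main

import (
	"fmt"
	"sort"
)

// lockstepDiff compares a registry's entries against a documentation table's
// keys in both directions (hypothetical helper, not the shipped test code).
// Both returned slices are sorted so the failure message is byte-stable.
func lockstepDiff(registry []string, table map[string]string) (missingRows, staleRows []string) {
	inRegistry := make(map[string]bool, len(registry))
	for _, name := range registry {
		inRegistry[name] = true
		if _, ok := table[name]; !ok {
			// The registry gained an entry the table lacks.
			missingRows = append(missingRows, name)
		}
	}
	for name := range table {
		if !inRegistry[name] {
			// The table documents something the registry no longer has.
			staleRows = append(staleRows, name)
		}
	}
	sort.Strings(missingRows)
	sort.Strings(staleRows)
	return missingRows, staleRows
}

// lockstepError renders one failure message, or "" when the two agree.
func lockstepError(registry []string, table map[string]string) string {
	missing, stale := lockstepDiff(registry, table)
	if len(missing) == 0 && len(stale) == 0 {
		return ""
	}
	return fmt.Sprintf("missing rows: %v; stale rows: %v", missing, stale)
}
```

Checking both directions matters: a one-way check catches a new registry entry but would let a removed method linger in the published table.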

Findings I'm departing from. None beyond call 3 (recorded as a §4.3 deviation).

Cross-references. D-093 (the gate shape), D-132 (the TS-generator deferral; this phase corrects the skill's claim about it), D-072/D-082 (the single-source registries the generator reads), D-171 (the session model that choreography 1 documents), D-105/D-106/D-079 (the streaming + scope semantics that choreography 2 documents), D-070/D-066 (the steering taxonomy that choreography 3 documents), D-196 (the prod aggregator the Q1 registry-read imports), D-197 (the recipe-cannot-lie precedent), D-208 (the docs site + §18 manifest rule this extends). RFC §5.1–§5.5, §3.6. CLAUDE.md §4.2, §13 (primitive-with-consumer: the generator's first consumers, namely the gate, the smoke, and the choreography lockstep greps, ship in this phase), §18. Phase 113b consumes this surface (choreographies 4–5, build-a-client, certification).

Addendum (2026-06-11, Protocol-track §17.5 checkpoint audit). The Auth column was the one join the four lockstep tables did NOT pin, and it had drifted: a hand-typed shared note claimed admin-or-console:fleet cross-tenant fan-in on ~10 rows whose handlers consult ScopeAdmin only (tasks.list, pause.list, topology.snapshot, flows.list / flows.runs.list) or overlay the verified identity with no fan-in at all (tools.list, the eight agents.* reads, memory.get / memory.health), while the seven posture rows omitted the note where it actually applies (internal/protocol/posture.go admits both scopes). The checkpoint fix pins the cell so it cannot drift like this again: each row now carries a machine-readable crossTenantPolicy (none / admin-only / admin-or-fleet / admin-widens — the last for the flows describe/metrics reads whose only elevation is visibility widening), the rendered note derives from that single value, and TestGen_AuthColumnMatchesHandlerGates drives baseline / admin-only / fleet-only tokens with cross-tenant request shapes against a devstack-assembled wire for every row claiming a rejecting gate, asserting the observed accept/reject matches the policy (the probe map is itself lockstep-checked both directions). Two ride-alongs: harbortest/devstack gained the production search.* + artifacts-surface mux mounts it had silently omitted (tests-track-production, §17.6 — the probe surfaced the gap), and auth-and-identity.md's scope table now states console:fleet's full real grant set with the admin-only fan-ins called out. The events.md notes for runtime.warning and task.paused / task.resumed were tightened to name their no-production-emit reality, and the 113a quickstart-smoke criterion wording was amended (§4.3, recorded in the plan) to match the shipped inline-jq shape-assertion mechanism.
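A sketch of the single-value pattern the addendum describes: one typed policy per row, with both the rendered note and the probe's expected outcome derived from it. The four policy names come from the text. The scope strings, note wording, and method names are hypothetical. The sketch also assumes that rows without a rejecting gate (none, admin-widens) accept a cross-tenant request shape and scope the result instead of refusing it.

```go
package main

// crossTenantPolicy models the per-row Auth-column value (hypothetical type).
type crossTenantPolicy int

const (
	policyNone         crossTenantPolicy = iota // identity overlaid; no fan-in
	policyAdminOnly                             // fan-in requires the admin scope
	policyAdminOrFleet                          // fan-in requires admin or console:fleet
	policyAdminWidens                           // admin only widens visibility
)

// Placeholder scope strings, not the shipped constant values.
const (
	scopeAdmin = "admin"
	scopeFleet = "console:fleet"
)

// note derives the rendered Auth-column note from the single policy value,
// so the published prose cannot drift from what the probe checks.
func (p crossTenantPolicy) note() string {
	switch p {
	case policyAdminOnly:
		return "Cross-tenant fan-in requires admin."
	case policyAdminOrFleet:
		return "Cross-tenant fan-in requires admin or console:fleet."
	case policyAdminWidens:
		return "admin widens visibility; other callers see their own tenant."
	default:
		return ""
	}
}

// rejecting reports whether the row claims a gate the probe must exercise.
func (p crossTenantPolicy) rejecting() bool {
	return p == policyAdminOnly || p == policyAdminOrFleet
}

// expectAccept predicts whether a cross-tenant request carrying the given
// token scopes should be accepted under this policy.
func (p crossTenantPolicy) expectAccept(scopes ...string) bool {
	has := func(want string) bool {
		for _, s := range scopes {
			if s == want {
				return true
			}
		}
		return false
	}
	switch p {
	case policyAdminOnly:
		return has(scopeAdmin)
	case policyAdminOrFleet:
		return has(scopeAdmin) || has(scopeFleet)
	default:
		// Assumed: no rejecting gate, so the handler scopes rather than refuses.
		return true
	}
}
```

With the baseline, admin-only, and fleet-only tokens from the addendum, a probe would compare expectAccept against the observed status for every row where rejecting() is true.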


D-210 — Phase 113b: the Protocol adoption track closes — pause + versioning choreographies (captured wire truth), the SDK-free worked event-viewer (compile-gated), the in-repo conformance-certification path (Q3 honored)

Date: 2026-06-11

Status: Settled (shipping with Phase 113b)

Where it lives: docs/site/protocol/ (the four hand-written closers: pause-model.md / versioning-and-compatibility.md / build-a-client.md / conformance-certification.md, plus the completed-track updates to index.md / quickstart.md / task-control.md); examples/protocol-clients/event-viewer/ (the worked client — package main, stdlib-only); docs/site/.vitepress/config.ts (choreographies 4–5 + the "Adopt" nav group); scripts/smoke/phase-113b.sh (page/nav trip-wires, the lockstep greps, the Q3 guard, the worked-client compile gate); docs/skills/use-the-harbor-protocol/SKILL.md (the §18 sweep — pause + versioning/handshake sections reconciled to the real wire); docs/glossary.md (two terms).

Decision. The track's second half documents shipped mechanics only — zero Protocol surface added — and holds the 113a honesty bar: prose that demonstrates the wire must be pinned to the wire. Calls made while shipping:

  1. The pause guide quotes captured traffic, not freehand prose. The HITL approve / reject / DecisionTimeout legs (SSE frames, pause.list snapshots, control request/response pairs) were captured from a runtime assembled with the production drivers (harbortest/devstack — the same assembly harbor dev boots) running a deny-all-gated tool driven through the production dispatch path (the D-192 mid-step drain). The OAuth-callback leg's shapes are transcribed from the handler + its tests (internal/tools/auth) — a live capture needs a real authorization server — and the page says so explicitly. The capture harness was a throwaway (the standing gates are the 111b/111c E2Es plus this phase's lockstep greps; a committed capture test would be a second harness for an already-gated surface).
  2. The lockstep mechanism extends to events, Decision values, and the callback route. The 113a "> Methods demonstrated:" grep convention continues on the three new method-demonstrating pages; the smoke additionally pins every pause event the guide narrates to its catalog section heading in the generated events.md, the four taught Decision branches to the literal Decision = "<value>" declarations in internal/runtime/pauseresume/decision.go, and the quoted OAuth callback route to the exported auth.CallbackPath constant. One §4.3 deviation from the plan's smoke sketch: the callback route canNOT lockstep against the generated reference — it is a provider-redirect mount, deliberately not a canonical Protocol method, so it has no methods.md row; the source-constant grep is the honest equivalent trip-wire, and the guide states the route's non-method status.
  3. A documentation-honesty fix shipped with the pause guide (§17.6 posture). task-control.md claimed a pause control's effect surfaces as task.paused / task.resumed; in the shipped runtime nothing calls MarkPaused/MarkResumed on the live pause path — a parked run's task status stays running, and the park/wake narration is pause.requested / pause.resumed. The line is corrected and the pause guide documents the real semantics (pause.list, not a task-status filter, is the "what awaits a human" read).
  4. The worked client is in-module and stdlib-only. examples/protocol-clients/event-viewer/ compiles as part of the repo module (examples/ is in-module by layout convention), so the compile gate is a direct bounded go build — the 112b external-module ceremony is unnecessary AND would be dishonest here: the client's premise is zero Harbor imports (the smoke asserts grep-absence of hurtener/Harbor in the source). The guide quotes the gated file and walks its three moves (token → runtime.info handshake with major-pin + capability check + unknown-field tolerance → generic SSE tail).
  5. Q3 honored verbatim. The certification page documents internal/protocol/conformance as run-from-a-clone (go test -race ./internal/protocol/conformance/), the Factory seam as the certify-your-own-assembly path, and the precise pass-claim (wire-level compatibility with the pinned Protocol version across both consumer profiles; nothing about behavioral quality, vendor extensions, or operational properties). It explicitly states no importable package / standalone runner exists and routes third-party demand to the issue tracker — the demand signal the sdk-export decision (proposal Q3) waits on. The smoke's Q3 guard asserts grep-absence of sdk/ phrasing on the page.
  6. The §18 sweep went one honest step past cross-links. use-the-harbor-protocol's pause section (tasks.pause / tasks.resume JSON-RPC shapes that never shipped) and handshake section (a fabricated capability-map response) are exactly the surfaces this phase's pages now document authoritatively; leaving them stale while publishing the truth one click away is the §18 failure mode verbatim. Both sections are reconciled to the real wire (POST /v1/control/{pause,resume,approve,reject}, the real RuntimeInfo shape, the 404/405/501-degrade posture). Sections tied to other surfaces (start/events/artifacts/topology recipes, which predate the track and carry their own drift) were left for a dedicated skill-reconciliation pass — noted here so the gap is on the record, not silent.
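A stdlib-only sketch of the handshake move that call 4 attributes to the worked client: decode runtime.info, pin the major version, require capabilities, and tolerate unknown fields. The JSON field names, the capability name, and pinnedMajor are hypothetical; the real RuntimeInfo shape lives in the generated reference and is not reproduced here. Unknown-field tolerance comes from encoding/json itself, which ignores unrecognized keys unless DisallowUnknownFields is set.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// runtimeInfo holds only the fields this sketch reads (hypothetical names).
// Any other keys in the response are silently ignored by encoding/json.
type runtimeInfo struct {
	ProtocolVersion string          `json:"protocol_version"`
	Capabilities    map[string]bool `json:"capabilities"`
}

const pinnedMajor = 1 // hypothetical Protocol major this client pins

// checkHandshake validates a runtime.info body before the client tails events.
func checkHandshake(body []byte, required ...string) (runtimeInfo, error) {
	var info runtimeInfo
	if err := json.Unmarshal(body, &info); err != nil {
		return info, fmt.Errorf("decode runtime.info: %w", err)
	}
	// Accept "1.4.0" or "v1.4.0"; only the major component is pinned.
	majorStr, _, _ := strings.Cut(strings.TrimPrefix(info.ProtocolVersion, "v"), ".")
	major, err := strconv.Atoi(majorStr)
	if err != nil {
		return info, fmt.Errorf("unparseable protocol version %q", info.ProtocolVersion)
	}
	if major != pinnedMajor {
		return info, fmt.Errorf("runtime speaks major %d, client pins %d", major, pinnedMajor)
	}
	for _, c := range required {
		if !info.Capabilities[c] {
			return info, fmt.Errorf("runtime lacks capability %q", c)
		}
	}
	return info, nil
}
```

Pinning only the major lets a client keep working across additive minor releases, which is the compatibility posture the versioning choreography teaches.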

Findings I'm departing from. None beyond call 2's recorded smoke-sketch deviation.

Protocol additions. None — no method, error code, event type, wire type, or capability changed; make protocol-docs-gen-check is untouched and green.

Cross-references. D-209 (the track's first half; the lockstep conventions continued), D-080 (the conformance suite documented), D-200/D-096 (durable pauses + the typed Decision the guide teaches), D-199 (the OAuth completion leg), D-192/D-097/D-071 (the approve/reject mechanics the captures exercise), D-171 (the session model the worked client rides), D-132 (the hand-maintained protocol.ts described accurately, again), D-206 (the compile-gate precedent, miniaturized). RFC §5.1–§5.5, §3.3, §6.3, §6.4. CLAUDE.md §4.2, §13 (the worked client is the build-a-client guide's same-wave consumer; the §18 clause), §17.6, §18.

Addendum (2026-06-11, Protocol-track §17.5 checkpoint audit). Three honesty tightenings landed at the checkpoint. (1) pause-model.md's provenance header overclaimed: "every request/response and SSE frame on this page is real wire traffic, captured" was false for the tool.auth_required frame's placeholder values. The header now scopes itself ("except for the OAuth intervention section…"), and the OAuth transcription disclaimer explicitly covers the whole leg, frame included (call 1 above already recorded the transcription; the page's blanket claim just hadn't been scoped to match). (2) The "~100-line" event-viewer figure was quoted on six reader-facing surfaces while the shipped file is 158 lines; every quote now says ~150 (or "under 160"), the kind of checkable number the track stakes its credibility on. (3) The §17.6 sweep promised by this phase's own common.sh fix is now complete. The shape at issue was a || echo "000" fallback after curl -w '%{http_code}': curl already prints "000" on connection failure, so the fallback produced "000000" and slipped past the dead-server SKIP arms. It survived at ~45 inline call sites across 13 sibling smokes; all are now || true. The three divergent case-arm patterns ("000)", "000|000000)", "000*)") are normalized to plain "000)", the 404|405|501) arms across the touched scripts uniformly accept 000 as the SKIP leg, and the affected set was verified server-less: SKIP-not-FAIL everywhere except phase-64.sh's two /healthz/readyz assert_status 200 checks, which stay FAIL on a dead server by the helper's documented design (that smoke is live-server-classed, and the preflight guarantees its server). The smoke-sketch line "113a's pages still assert green" is recorded as a §4.3 deviation in the plan (the regression gate lives in phase-113a.sh, which runs in the same preflight fleet).


D-190 — Phase 84c: provider-native multimodal lands as a driver-internal upload — ProviderNative part flag, file_id rewrite inside Complete, identity-scoped cache with a driver-owned lifecycle, llm.provider_file.uploaded observability

Date: 2026-06-11 Status: Settled (shipping with this PR)

Where it lives: internal/llm/llm.go (the ProviderNative bool + ProviderFileID string fields on ImagePart / AudioPart / FilePart, plus FilePart.DocumentType); internal/llm/events.go (EventTypeProviderFileUploaded + ProviderFileUploadedPayload); internal/llm/drivers/bifrost/providerfiles.go (the upload pass applyProviderNative, the providerFileCache LRU+TTL cache with per-key fill coalescing, the Close-time sweep); internal/llm/drivers/bifrost/translate.go::providerFileBlock (the wire-side file_id reference block); internal/planner/disposition.go (EffectiveDisposition now honours provider_native; the provider_native_unavailable degradation vocabulary is retired); internal/planner/multimodal.go::providerNativePart (the per-modality typed-part rendering); docs/recipes/provider-native-attachments.md (the headless recipe).

Decision. The provider_native attachment disposition (84b / D-189) gains its mechanism, and the mechanism is one seam: the LLM driver. When Complete encounters a ProviderNative-flagged part without a ProviderFileID, the bifrost driver resolves the bytes (artifact-store fetch for Artifact-backed parts — fail-loud on a missing store, a missing ref, or a cross-scope ref — or an inline sub-threshold DataURL decode), uploads them via Bifrost.FileUploadRequest, and rewrites the part (copy-on-write; the caller's request value is never mutated) to carry the returned opaque file_id, which the translator emits as bifrost's file-reference content block — the ONE bifrost chat shape that carries an uploaded file_id, so every modality (image / audio / video / document) routes through it and bifrost's per-provider converters re-shape it (e.g. Anthropic source.file_id). LLMClient stays one method (RFC §6.5); the run loop never pre-uploads; planner.InputArtifactView gains NO ProviderFileID. Because the seam is the driver, a library consumer calling llm.Open + Complete with the flag set gets provider-native handling with zero planner, run loop, config file, or Protocol — the headless reachability guarantee the SDK-lens review (C2) demanded.

Priority order (deliberate, D-189). Image first — over-threshold images regain VISION (pre-84c they degraded to a text stub whose artifact_fetch returned raw bytes the model still could not see); then audio and video (perception modalities); application/pdf + documents last, with FilePart.DocumentType disambiguating structured docs — the ref/tool:<name> + retrieval route stays the preferred document path.

The file_id lifecycle is driver-owned, end to end (SDK-lens C3). The cache is keyed (tenant, user, session, content) — the identity TRIPLE is the isolation boundary (a file_id never crosses sessions; the cross-session test pins it) and the content key is the content-addressed artifact ref or a sha256 of inline bytes. TTL expiry (default 1h) and LRU eviction (default capacity 128) best-effort delete the remote file (FileDeleteRequest); Close drains and sweeps everything left, so a headless consumer who never runs a dev loop does not leak provider-side files. Same-key fills are coalesced (a per-key lock) — the concurrent-reuse test surfaced that two simultaneous first-attaches of the same content otherwise double-upload and orphan one remote file. D-025 holds: the cache is internally synchronized and the N=128 concurrent test runs under -race asserting no bleed, bounded uploads, and goroutine-baseline restoration.

Observability is an event, not a task field (SDK-lens C1). The driver emits llm.provider_file.uploaded (identity, provider, model, artifact ref, MIME, modality, file_id, size) on the bus it already holds; cache hits do not re-emit. The Protocol/Console read it from the event stream like everything else; the generated Protocol events reference carries the row (make protocol-docs-gen).

Degradation stays loud and ArtifactStub stays universal. A provider whose file surface returns bifrost's unsupported_operation keeps the part's canonical ArtifactStub rendering and the driver logs a Warn naming provider + modality; any OTHER upload failure fails the call (no silent degradation — CLAUDE.md §13). EffectiveDisposition no longer degrades provider_native — the 84b-era DegradationProviderNativeUnavailable constant and its provider_native_unavailable event vocabulary are retired (the dependent planner / runctx / integration tests were updated in this PR per §17.6).

Edge-guard precision (D-026). A file_id-only part is legal over-threshold — it carries no inline bytes, so findContextLeak has nothing to flag (pinned by test rather than new guard code: the existing guard checks DataURL/text payloads only, which is exactly the right shape). An oversize DataURL riding the same part is still auto-materialized by the safety pass, and a raw heavy sibling text part still trips ErrContextLeak.

Streaming residual closed (RFC §11 Q-3). A provider_native multimodal request with req.Stream=true uploads first, then streams deltas through the Phase 107 path (req.Stream + llm.completion.chunk); pinned by a driver test and a live conformance row.

Conformance. The live matrix gains TestE2E_Bifrost_LiveProviderNativeMultimodal — per-modality rows (image / audio / video / pdf, each against one capable provider, gated by HARBOR_LIVE_LLM + the per-provider key) plus the streaming-with-multimodal row. The video row requires an operator-supplied fixture (HARBOR_LIVE_VIDEO_FIXTURE) — a valid container cannot be synthesized inline.

Deviation from the plan (§4.3, recorded in the plan). The plan's optional run-loop cancel hook ("the cancel path MAY trigger early cleanup through the driver") is NOT shipped: the run loop holds the wrapped LLMClient (governance→retry→downgrade→corrections→safety→driver), so reaching a driver-exported purge method would require threading a forwarding method through five wrapper layers or widening LLMClient beyond one method — exactly the ceremony the SDK-lens review (C3) said to avoid ("Close-time + TTL cleanup needs no new interface method"). The driver-owned lifecycle is the authority and is covered by tests that never touch the run loop, satisfying the acceptance criterion's substance.

Cross-references. D-189 (the split + the policy-not-mechanism principle), D-025 (concurrent reuse — the cache), D-026 (the edge-guard exemption), D-166 (F11 multimodal happy path), D-167 (native tool-calling / 107c streaming vocabulary), D-204 (the sdk facade the recipe imports). RFC §6.5, §6.10, §11 Q-3; brief 03 (provider correction / per-provider shapes), brief 08 (the conformance matrix). Plan: docs/plans/phase-84c-provider-native-multimodal.md; review record: docs/notes/phase-84bcd-sdk-lens-review.md (C1–C5 all addressed).


D-191 — Phase 84d: the Embedder seam + semantic memory & skill retrieval — embeddings land as a standalone §4.4 primitive whose first consumers are opt-in retrieval modes, never a standalone RAG tool

Date: 2026-06-11

Status: Settled (shipping with Phase 84d)

Where it lives: internal/embeddings/ (the Embedder interface + sentinel errors + Cosine + the registry/factory Open(ctx, cfg, deps) with the identity-mandatory guard wrapper + SnapshotFromConfig); internal/embeddings/drivers/bifrost/ (the production driver over the gateway's embedding surface, with the HARBOR_LIVE_LLM-gated conformance probe); internal/embeddings/embeddingstest/ (the deterministic test-grade embedder, never registered and never a default); internal/memory/ (RetrievalMode + Deps.Embedder + the registry guard + MemoryStore.SearchTurns + ErrSemanticDisabled; internal/memory/strategy/semantic.go, holding the wrapper executor + the memory.vectors StateStore record; the conformance suite's semantic cases, passed by all three drivers); internal/skills/ (RetrievalMode + Deps.Embedder + the registry guard + PathSemantic; internal/skills/drivers/localdb/search_semantic.go); internal/config/ (the embeddings block + memory.retrieval/retrieval_top_k + skills.retrieval + validateEmbeddings's cross-block rule); internal/runtime/assemble/ (Stack.Embedder + Options.Embedder + the Deps threading); internal/drivers/prod (the driver's blank import); sdk/embeddings plus the sdk/memory and sdk/skills additions; docs/recipes/embed-and-retrieve.md (+ the docs-site stub/nav); test/integration/phase84d_semantic_retrieval_test.go; RFC §6.5 (the D-191 contract sentence), §6.6, §6.7.

Decision. Harbor's first embeddings capability ships as its own seam, with both §13 consumers in the same PR. The calls that shape it:

  1. Own package, dependency-light Deps. The Embedder lives in internal/embeddings, NOT under internal/llm — an embeddings-only consumer must not inherit the chat client's Deps (artifact store + bus). Deps is empty at this revision (reserved for future governance metering); the factory signature Open(ctx, cfg ConfigSnapshot, deps Deps) mirrors llm.Open so the seam stays shape-compatible. The interface carries Embed(ctx, []string) ([][]float32, error) plus a lifecycle Close — a §4.3 addition over the plan's one-method sketch, mirroring LLMClient: the production driver owns gateway worker pools that must join on teardown (the goroutine-baseline gate).
  2. Identity is mandatory at the Embed edge, enforced by construction. Open wraps every driver in a guard that fails closed on missing-identity ctx (ErrIdentityMissing), rejects empty input before provider traffic, and checks the returned shape; the driver re-checks identity defensively for direct constructors — the chat edge's posture, replicated. The seam stays Protocol-free: identity.With/WithRun is the library path. Consumers that carry identity as an explicit argument (memory, skills) bridge it onto the embed ctx so billable embedding traffic is attributed to the identity the derived vectors are scoped under.
  3. Consumer 1 — semantic memory retrieval composes around the strategy, not inside it. retrieval: semantic wraps the strategy executor (semanticExec): AddTurn embeds the turn (embed-first, so a failure surfaces before state mutates) and appends to an identity-scoped memory.vectors StateStore record (bounded at 256 entries, drop-oldest); SearchTurns ranks by cosine; GetLLMContext delegates untouched — rolling_summary keeps its exact semantics. Persisting through the StateStore floor (the D-027 typed-wrapper pattern, a sibling Kind to memory.state) is what makes §9 conformance parity FREE: no per-driver migrations, all three memory drivers inherit vector persistence, and the suite's new semantic cases (Semantic_SearchTurns_RanksBySimilarity, cross-session/cross-tenant vector isolation, Semantic_Flush_DropsVectors, SearchTurns_DisabledFailsLoudly) run against in-mem, SQLite, and Postgres. SearchTurns joins the MemoryStore interface (every driver implements it; the non-semantic answer is the loud ErrSemanticDisabled, never an empty success — the §4.4 no-optional-capability rule, not a Supports* probe). Vectors are derived data: snapshots/restores carry strategy state only, and a dimension mismatch at search time (embedding-model drift) fails loudly with re-embedding named as the migration.
  4. Consumer 2 — semantic skill retrieval is a ranking mode of the store's Search, not a second tool. skills.retrieval: semantic makes localdb's Search rank the identity-scoped catalog (SQL-side WHERE; candidates capped at 256 newest-first) by one batched Embed call + cosine, result path "semantic", scores mapped onto the canonical 0–1 scale. skill_search, the builtin carrier, capability filtering, redaction, and the budgeter are all untouched — they sit downstream of ranking. An embed failure fails the search loudly; the store never silently degrades to the lexical ladder (§13).
  5. Fail-loud injection, three layers deep. The memory and skills registries reject a semantic config without Deps.Embedder (mirroring the Deps.Summarizer rule verbatim — "no stub fallback"); the driver constructors re-check; and validateEmbeddings enforces the cross-block rule at config time — memory.retrieval: semantic or skills.retrieval: semantic without a configured embeddings block fails validation naming the missing keys and pointing at examples/harbor.yaml. There is deliberately NO mock/stub embeddings driver: the deterministic embeddingstest embedder exists for suites only and is never registered.
  6. À la carte is first-class. embeddings.Open + Embed + the shared Cosine (the ONE ranking primitive — a second cosine implementation anywhere is a bug) work with no memory subsystem, no config file, no Protocol; docs/recipes/embed-and-retrieve.md walks it, and assemble.Assemble exposes Stack.Embedder (+ Options.Embedder for caller-owned injection). The future document.search-style tool is a consumer of this same primitive, never a parallel implementation.

§4.3 deviations from the plan. (a) The plan's "blank-import at cmd/harbor" wording predates D-196 — the driver registers via the internal/drivers/prod aggregator, which the binary/devstack/embedders import; the plan file is amended. (b) Close on the interface (call 1 above). (c) The plan's "injected at the skills directory / skill_search constructor" resolved to the store seam (skills.Deps.Embedder + the localdb Search mode): the directory is a recency-ordered browse window where similarity doesn't apply, and ranking at the store keeps one implementation under skill_search, Search, and any future caller. (d) The RFC §6.5 Embedder-seam paragraph itself pre-landed with the D-189 plans PR; this PR's RFC delta is the D-191 contract sentence in §6.5 plus the §6.6/§6.7 consumer-side settled text.

Protocol additions. None — no method, error code, event type, or wire type changed. Semantic retrieval is a runtime/SDK surface at this phase; a Protocol read over SearchTurns is future work that rides the existing memory-protocol pattern when a Console page demands it.

Cross-references. D-189 (the 84b/c/d split + the direction: embeddings serve semantic memory/skill retrieval, not a standalone RAG tool), D-174 (the registry-threaded Deps.Summarizer fail-loud pattern this mirrors), D-027 (typed-wrapper StateStore persistence the vector record rides), D-196 (the prod aggregator home), D-204 (the sdk/embeddings facade), D-025 (concurrent-reuse: the driver + semantic executors are compiled artifacts; N≥100 gates ship in-package), D-001 (identity-mandatory). RFC §6.5, §6.6, §6.7, §9. CLAUDE.md §4.4, §9, §13 (primitive-with-consumer: both retrieval modes ship in this PR, each exercised end-to-end; no stub defaults), §17.1–§17.3, §18. Briefs 04 §retrieval, 08 §driver seam.


D-211 — Phase 84e: the run loop consumes semantic memory — FetchMemoryBlocks populates the External tier; retrieval_min_score floor; D-094 mirror collapsed

Date: 2026-06-12

Status: Settled (shipping with Phase 84e)

Where it lives: internal/runtime/runctx/memory_fetch.go (FetchMemoryBlocks + capText); internal/memory/from_config.go (RecallSettings + RecallFromConfig); internal/config/config.go (the RetrievalMinScore field on MemoryConfig); internal/config/validate.go (validateMemory range check); cmd/harbor/cmd_dev_runloop.go (collapsed to a thin FetchMemoryBlocks call + a memoryRecall field); cmd/harbor/cmd_dev.go (RecallFromConfig projection into opts); harbortest/devstack/devstack.go (D-094 mirror collapsed, same pattern); internal/runtime/runctx/memory_fetch_test.go (unit + concurrent-reuse + fail-loud suite); test/integration/phase84e_semantic_recall_test.go (E2E acceptance); scripts/smoke/phase-84e.sh (real assertions); docs/CONFIG.md / examples/harbor.yaml / cmd/harbor/init/templates/default/harbor.yaml.tmpl (new field documented); docs/glossary.md (the Semantic recall term); docs/skills/configure-memory-and-skills/SKILL.md (§18 sweep).

Decision. MemoryStore.SearchTurns shipped in 84d with store/SDK consumers only; the run loop never called it, so the agent never semantically recalled earlier conversation turns. 84e closes this gap. The calls that shape it:

  1. One home: runctx.FetchMemoryBlocks. The fetch+recall step lives in exactly one function, following the 110b promoted-helper pattern (BuildArtifactManifest, ProjectMemoryBlocks, and BuildSkillsContext are the siblings). Both cmd/harbor/cmd_dev_runloop.go and harbortest/devstack/devstack.go previously held identical ~20-line inline blocks (GetLLMContext → ProjectMemoryBlocks); both collapse to a thin call. The D-094 "mirror discipline" is honoured by construction: one implementation, two callers, parity enforced structurally rather than by hand.

  2. Composition, not replacement. When recall fires, GetLLMContext is called first (unchanged) and its patch feeds ProjectMemoryBlocks for the Conversation tier; SearchTurns populates the External tier only. The Conversation tier is byte-untouched. Mode off → FetchMemoryBlocks is byte-for-byte identical to the prior inline block — the 84b golden default-parity posture applied to the recall gate. The ONLY enable switch is memory.retrieval: semantic (the 84d seam); no second knob is introduced.

  3. Three filters before injection. SearchTurns results pass through: (a) retrieval_min_score cosine-similarity floor — turns below the configured threshold (default 0.0, range [-1,1], validated at boot) are dropped silently; (b) recent-turn dedup — a turn whose UserMessage is already in GetLLMContext's RecentTurns window is skipped (injecting duplicates wastes tokens and confuses temporal ordering); (c) 2 KiB per-side text cap (capText, valid UTF-8 boundary truncation with a …[truncated] marker) — this is a D-026 first-line guard; the LLM-edge safety pass stays the authoritative backstop.

  4. RecallSettings + RecallFromConfig follow the 110c field-parity pattern. memory.RecallSettings{Enabled, TopK, MinScore} is the typed holder; memory.RecallFromConfig(cfg.MemoryConfig) is the single exporter — the same structure SnapshotFromConfig, GovernanceFromConfig, HintsFromConfig established in the 110c band. A reflection-based field-parity test gates the exporter (per D-155/B3): every MemoryConfig field is either projected by RecallFromConfig or explicitly excluded in the test with a one-line reason comment.

  5. Fail-loud is the ONLY posture. A SearchTurns error (network outage, embedder down, driver closed) is returned to the caller and propagates as MarkFailed(runtime_fetch_error) — the LLM is never called. There is no silent fall-back to rolling-summary-only. A GetLLMContext error likewise propagates. The "no silent degradation" rule (CLAUDE.md §13) is structurally enforced: FetchMemoryBlocks has no catch-and-ignore path.

  6. Deferred: memory.search Protocol method. The run-loop recall surface is a runtime-internal call to SearchTurns; there is no Protocol method for it. A memory.search method is the prerequisite for any Console memory-search page (D-062 ordering rule). It is not introduced here and is parked for a post-109 planning round.


§4.3 deviations from the plan. None — the plan's design matched the implementation.

Protocol additions. None — no method, error code, event type, or wire type changed.

Cross-references. D-191 (the Embedder seam + SearchTurns — 84e is its run-loop consumer, closing the §13 primitive-with-consumer cycle), D-094 (the mirror discipline; the collapsed duplication), D-026 (the heavy-content guard; capText is the first-line guard at the injection seam), D-025 (concurrent-reuse contract; the N=100 test in memory_fetch_test.go), D-155 / D-196 / D-197 (the 110c/110b patterns this follows: field-parity test, promoted helper, thin callers), D-062 (the deferred memory.search Protocol method gate). RFC §6.2, §6.5, §6.6. CLAUDE.md §4.2, §4.4, §13 (no second knob; no silent degradation; fail-loud), §17.1–§17.3, §18. Plan: docs/plans/phase-84e-semantic-memory-runloop.md.


D-214 — Phase 109c: the Playground DisplayMode layout is a pure page-level state machine; pip / fullscreen are mutually-exclusive regions; the host grants modes the page can apply

Date: 2026-06-13

Status: Accepted

Context. Phase 109b shipped the MCP-Apps iframe host + AppBridge (manual-handler, D-173) + the inline renderer registered on the shared chat-renderer registry. The DisplayMode contract (D-062) also defines fullscreen (the app replaces the chat + composer region, addressable via a tab strip — multiple fullscreen apps yield multiple tabs) and pip (a resizable 50/50 chat-beside-app split, right rail hidden by default with a toggle). 109c delivers the Playground page-level layout that honours those two modes, driven at runtime by the AppBridge onrequestdisplaymode request and by operator affordances, without reloading the session.

Decision. The calls that shape it:

  1. The layout is a pure, DOM-free state machine. web/console/src/lib/components/playground/layout.ts exports reduceLayout(model, action) (the reducer) and computeRegion(model) (a total (LayoutModel) → RegionLayout projection), plus clampRatio. The Playground page is the only stateful holder; the machine is unit-tested without a DOM (layout.spec.ts, 19 cases). This keeps the region routing, the split-ratio clamp, and the tab add/remove/activate logic verifiable independent of Svelte/browser.

  2. pip and fullscreen are mutually-exclusive page regions; pip is ONE app. The reducer maintains the invariant that apps holds EITHER N fullscreen apps OR exactly one pip app — never both. Requesting pip replaces the whole set with the single app and hides the rail by default; requesting fullscreen drops any pip app and adds/focuses a tab. This makes computeRegion total and pins the explicit distinction from PG-6's two-agent comparison (D-064): pip is one app beside chat, not a comparison surface. No comparison affordances are added.

  3. The renderer is REUSED, not forked. The fullscreen / pip AppPanel pulls the registered 109b renderer out of the shared registry (dispatchRenderer(MCP_APP_INLINE_MIME).component) — the public §4.5#11 seam — rather than deep-importing or duplicating mcp-app.svelte. The chat module stays self-contained (the layout components live OUTSIDE it, under components/playground/, and the module gains no page/route import). inline is unchanged — 109b behaviour, guarded by a regression test.

  4. The host grants the modes the page can apply (backward-compatible seam extension). createAppHandlers / AppBridgeHost gain an optional availableDisplayModes (default ['inline'], preserving 109b's inline-only grant + its pinned test). The Playground App panel passes ['inline','fullscreen','pip'] so a ui/request-display-mode for fullscreen/pip is granted and routed to the layout machine; an unsupported mode falls back to the always-available inline. The renderer forwards an optional onDisplayModeRequest to the host — the consumption seam the page reduces on.

  5. Split ratio + rail toggle are Console-local view state (D-061). The split ratio is clamped to [0.2, 0.8] on drag and persists across teardown; the rail toggle reopens the rail in pip WITHOUT resetting the ratio. Layout itself (which apps are open, the active region) is conversation-scoped, not persisted across sessions.

§4.3 deviations from the plan. Minor, in-scope: the plan's file list did not name mcp-app.svelte / the registry index.ts / app-bridge-host.ts, but consuming 109b's onrequestdisplaymode requires threading an optional onDisplayModeRequest + availableDisplayModes through the renderer to the host — an additive, backward-compatible seam extension (the 109b inline-only default + its test are untouched). Documented in the PR.

Known upstream gap (named, not masked — §17.6). Inline MCP-app discovery in the chat bubble (a tool result's _meta.ui.resourceUri → a chat message that mounts the inline renderer) is NOT wired in the Console today — MessageBubble does not dispatch the MCP-app MIME and ChatMessage carries no app ref. That is a 109a/109b integration gap (it needs a runtime event surface that carries the app ref), out of 109c's file list. 109c delivers the complete page-level layout subsystem (machine + components + region routing + the onrequestdisplaymode grant seam + operator affordances), which activates the moment that discovery path lands; until then the page region stays chat in production. Tracked as a follow-up.

Protocol additions. None — no method, error code, event type, or wire type changed. DisplayMode rides 109a's projection (MCPAppRef.display_mode), surfaced by 109b.

Cross-references. D-062 (DisplayMode semantics — honoured exactly, no new modes), D-064 (PG-6 two-agent comparison — explicitly NOT this), D-061 (Console-local view state — ratio/rail toggle may persist; layout is conversation-scoped), D-091 (shared chat module encapsulation — layout lives outside it; renderer reused via the registry), D-121 (Console design-system foundation), D-173 (manual-handler AppBridge — app→host stays Protocol-proxied), D-172 (the 109a–c wave). RFC §6.4, §7. CLAUDE.md §4.5, §13, §17.6, §18. Plan: docs/plans/phase-109c-mcp-apps-displaymode-layout.md.
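
D-214's layout invariants (N fullscreen tabs or exactly one pip app, a 50/50 split clamped to [0.2, 0.8] on drag, a rail toggle that leaves the ratio alone) can be illustrated with a small pure reducer. The real machine is TypeScript in layout.ts; this Go rendering only mirrors the rules listed in the D-214 decision, and the LayoutModel fields, the string-keyed Action, and the tab-focus rule on close are assumptions for illustration.

```go
package main

// Mode is the page region; Inline means chat only (no page-level app).
type Mode int

const (
	Inline Mode = iota
	Fullscreen
	Pip
)

// LayoutModel is hypothetical. Invariant: Apps holds N fullscreen tabs
// or exactly one pip app, never a mix.
type LayoutModel struct {
	Mode     Mode
	Apps     []string // app ids
	Active   string   // focused tab (fullscreen) or the pip app
	Ratio    float64  // chat share of the pip split
	RailOpen bool
}

// Action is a hypothetical tagged union; Kind selects which fields are read.
type Action struct {
	Kind  string // "request", "close", "drag", "toggle-rail"
	Mode  Mode
	App   string
	Ratio float64
}

func clampRatio(r float64) float64 {
	if r < 0.2 {
		return 0.2
	}
	if r > 0.8 {
		return 0.8
	}
	return r
}

func contains(xs []string, x string) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// reduceLayout returns a new model; it always builds fresh Apps slices, so
// the caller's model is never mutated.
func reduceLayout(m LayoutModel, a Action) LayoutModel {
	switch a.Kind {
	case "request":
		switch a.Mode {
		case Pip: // one app replaces the whole set; rail hidden by default
			m.Apps, m.Active, m.Mode, m.RailOpen = []string{a.App}, a.App, Pip, false
		case Fullscreen: // drop any pip app, then add or focus a tab
			apps := []string{}
			if m.Mode == Fullscreen {
				apps = append(apps, m.Apps...)
			}
			if !contains(apps, a.App) {
				apps = append(apps, a.App)
			}
			m.Apps, m.Active, m.Mode = apps, a.App, Fullscreen
		}
	case "close":
		apps := []string{}
		for _, id := range m.Apps {
			if id != a.App {
				apps = append(apps, id)
			}
		}
		m.Apps = apps
		if len(apps) == 0 {
			m.Mode, m.Active = Inline, "" // back to chat; Ratio persists
		} else if m.Active == a.App {
			m.Active = apps[len(apps)-1]
		}
	case "drag":
		m.Ratio = clampRatio(a.Ratio)
	case "toggle-rail":
		m.RailOpen = !m.RailOpen // leaves Ratio untouched
	}
	return m
}
```

Because the reducer takes and returns values and never writes through the input's slice, it can be tested without a DOM, which is the property item 1 of D-214 is after.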


D-215 — Phase 109d: inline MCP-app discovery — the mcp.app_available event closes the planner-path → renderer-mount seam; MCPAppRef gains server_id

Date: 2026-06-13

Status: Accepted

Context. The 109 wave shipped the MCP Apps runtime/Protocol projection (109a), the sandboxed iframe renderer + AppBridge (109b), and the fullscreen/pip DisplayMode layout (109c). The wave-end §17.5 audit pinned that the chain "a planner-initiated MCP tool result carrying _meta.ui.resourceUri → a chat message that mounts the 109b renderer → 109c's layout activates" was DEAD: the renderer and the entire layout were unreachable in production. Three breaks: (1) the MCP driver parsed the app reference (content.go::parseAppRef → MCPToolValue.AppRef, which is json:"-") but projected it onto exactly ONE surface — the mcp.apps.call_tool proxy response — and a planner-initiated call never enters the proxy, so its app reference was dropped; (2) the wire MCPAppRef carried no server id, but the renderer needs one to fetch the ui:// document via mcp.servers.read_resource(serverID, resourceUri); (3) the Console had no ChatMessage app field, no MessageBubble dispatch under MCP_APP_INLINE_MIME, and no population site. D-214 named this as a known upstream gap; 109d closes it.

Decision. The calls that shape it:

  1. A new canonical SafePayload event mcp.app_available, emitted at the MCP provider's invoke site. When a tool result declares a ui:// app, Provider.callTool publishes mcp.app_available (internal/tools/drivers/mcp/events.go, registered alongside mcp.resource_offloaded). The transport-honest choice: the Playground already consumes the SSE event stream, so the discovery rides the same wire as tool.invoked / mcp.resource_updated. The payload (AppAvailablePayload, SafePayload — no caller-controlled bytes) carries the server source id, the ui:// resource URI, the per-result display-mode hint, the default-deny raw-HTML trust posture, and the actor identity quadruple (its RunID correlates the discovery to the turn). The emit is best-effort observability — a missing identity or a publish failure logs and returns rather than failing the tool call (the tool result is the source of truth).

  2. The wire MCPAppRef gains server_id (single-sourced). internal/protocol/types/mcp_apps.go adds ServerID string json:"server_id,omitempty"; MCPAppRefRow carries it and the app-tool-call proxy projection populates it from the tool's catalog source id; web/console/src/lib/protocol/mcp.ts hand-syncs the field (the TS generator is unbuilt — CLAUDE.md §4.5 rule 5). This makes both the proxy-response app ref and the discovery event self-describing about which server hosts the document.

  3. The Console wires discovery at the page level + the message model — the chat module stays encapsulated (D-091). ChatMessage gains app?: MCPAppRefView + serverID?; MessageBubble mounts the 109b renderer via the registry dispatch (dispatchRenderer(MCP_APP_INLINE_MIME)) when an app ref is present, with the injected appHostClient + availableDisplayModes + an onAppDisplayModeRequest callback threaded through ChatPanel; the Playground page decodes mcp.app_available (wire-events.ts::decodeAppAvailable) and attaches the app to the run's agent bubble (applyAppAvailable). An inline app's onrequestdisplaymode — granted fullscreen/pip by the page's full available-mode set — opens it through 109c's already-shipped layout reducer. The chat module gains ZERO imports from Console internals; the discovery is wired outside it (the page + the model), so the future web/shared/chat extraction stays mechanical.

  4. The W3 weak synthetic-DOM Playwright test is replaced by a real-component guard. The audit flagged tests/mcp-app-displaymode.spec.ts for hand-building document.createElement fixtures and re-implementing the split-ratio clamp. The deterministic, always-on regression guard is now a Vitest component suite (web/console/src/routes/(console)/playground/[session_id]/mcp-app-discovery.spec.ts) that mount()s the REAL MessageBubbleMcpAppRenderer and drives the real reduceLayout/computeRegion into the real AppPanel — it fails if the discovery→render wiring is reverted (verified). The Playwright spec is rewritten to drive the real built Playground route under the standard CONSOLE_AVAILABLE skip.

§4.3 deviations from the plan. (a) The Playwright spec cannot trigger a runtime-emitted mcp.app_available without a real MCP app, so the literal "drives the real route" assertion is satisfied for the bundle-level surface while the deterministic real-component guard moves to the always-on Vitest suite — recorded in the plan's Risks + the PR. (b) web/console/vite.config.ts gains a VITEST-gated resolve.conditions: ['browser'] so component specs can mount() real .svelte components in jsdom; the production vite build is byte-unchanged.

Protocol additions. One new canonical event (mcp.app_available) + one new field on an existing wire type (MCPAppRef.server_id). No new method or error code. make protocol-docs-gen regenerated docs/site/protocol/{events,types}.md in the same PR.

Cross-references. D-214 (the 109c layout this activates + the known-gap it named), D-172 / D-173 (the 109a–c wave + the manual-handler AppBridge invariant — app→host stays Protocol-proxied), D-062 (DisplayMode semantics; the no-primitive-without-consumer ordering rule), D-091 (shared chat-module encapsulation), D-026 (heavy-content discipline — the renderer fails loud on a heavy ui:// document), D-209 (the generated Protocol docs regenerate in the same PR). RFC §6.4, §6.5, §7. CLAUDE.md §4.5, §6, §8, §13, §17.6, §18. Plan: docs/plans/phase-109d-inline-mcp-app-discovery.md.


D-216 — Phase 109e: MCP App discovery reads the tool-DEFINITION _meta.ui, not the tool result — discovery now fires against real ext-apps servers

Date: 2026-06-13

Status: Accepted

Context. A live test against a real io.modelcontextprotocol/ui ext-apps server (go-study-mcp) found the 109 wave's MCP App discovery inert: the renderer + the entire DisplayMode layout never activated against a real server. Root cause: the 109 wave (a–d) parsed the app reference from the WRONG place. content.go::lowerCallToolResult did AppRef: parseAppRef(res.Meta) — it parsed _meta.ui.resourceUri from the tool RESULT (CallToolResult._meta), and mcp.go::callTool fired mcp.app_available (D-215) from that. But the canonical spec — the official io.modelcontextprotocol/ui dialect (SEP-1865, rev 2026-01-26; vendored McpUiToolMetaSchema: "UI-related metadata for tools", resourceUri = "URI of the UI resource to display for this tool") — places _meta.ui.resourceUri on the tool DEFINITION. A stdio probe of go-study-mcp confirms it: every tool's tools/list entry carries _meta = {"ui":{"resourceUri":"ui://go-study-mcp/studio/index.html"}}, while a tool call returns an empty/null result _meta. So Harbor's result-parse found nothing and never fired discovery. Every 109a–d test put _meta.ui on the RESULT — a self-consistent but non-conformant fixture that matched the buggy code, so four phases of green tests hid the bug (the §17.8 failure mode, added to CLAUDE.md/AGENTS.md in this PR).

Decision. The calls that shape the fix:

  1. Capture the tool-definition _meta.ui at discovery, bound to the tool. buildToolDescriptor parses parseAppRef(t.Meta) (the MCP SDK Tool embeds Meta, populated from the server's tools/list) and captures the resulting binding by value into the descriptor's Invoke closure (immutable after discovery — no shared mutable per-run state on the Provider, D-025). Preferring closure capture over a per-Provider map[toolName]AppRef keeps the binding strictly immutable: no mutex, no re-Discover write hazard, no shared state to race. This is the spec-conformant source of the ui:// resource URI.

  2. Fire mcp.app_available on invocation of a UI-bound tool. callTool takes the captured toolApp *AppRef; after lowering the result it sets value.AppRef = reconcileAppRef(toolApp, resultHint, uiDisplayModeHint(res.Meta)) and publishes the event when the reconciled ref is non-nil. This REPLACES the old "fire when res.Meta.ui present" trigger. The reconciled value.AppRef (which is json:"-") feeds BOTH the discovery event AND the app-tool-call proxy projection (mcpconsole/apps.go::appRefFromValue), so the SAME §17.6 bug shape in the proxy path (it also read the result-only ref and also broke against a conformant server) is fixed by the same change, with no separate edit.

  3. The result _meta.ui is a SECONDARY merge, never required. reconcileAppRef takes the tool binding as the source of the resource URI; a per-result display-mode hint (preferredFrame / displayMode on CallToolResult._meta.ui) wins over the binding's mode for THAT result. A server that (non-conformantly) declares the full app only on the result still surfaces via the resultHint fallback. Conformant servers leave the result _meta empty, so the binding stands alone on the golden path.

  4. DisplayMode default is inline. go-study-mcp advertises capabilities {logging, resources, tools} — NOT io.modelcontextprotocol/ui — so 109a's negotiation yields empty modes and the tool-def _meta.ui carries only a resourceUri (no mode). A UI-bound tool with no negotiated/declared mode still surfaces as renderable: the event's DisplayMode is empty and the Console renderer (mcp-app.svelte: data-display-mode={app?.displayMode || 'inline'}) defaults to inline. No Console change was needed — the renderer already mounts on a bare {resourceUri, serverID}.
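
The reconcile rule and the closure capture can be sketched as follows. reconcileAppRef is named in this entry, but the signature and body here are an assumption built from items 2 and 3; AppRef, CallResult, and makeInvoke are hypothetical stand-ins for the driver types and for buildToolDescriptor's Invoke closure.

```go
package main

// AppRef is a hypothetical stand-in for the driver's app reference.
type AppRef struct {
	ResourceURI string
	DisplayMode string // empty means the renderer defaults to inline
}

// reconcileAppRef: the tool-definition binding is the source of the URI; a
// result-level app (non-conformant servers) is only a fallback; a per-result
// display-mode hint overrides the mode for this result only.
func reconcileAppRef(toolApp, resultHint *AppRef, resultMode string) *AppRef {
	base := toolApp
	if base == nil {
		base = resultHint
	}
	if base == nil {
		return nil // not a UI-bound tool: no mcp.app_available
	}
	out := *base // copy, so the captured binding is never written
	if resultMode != "" {
		out.DisplayMode = resultMode
	}
	return &out
}

// CallResult is a hypothetical lowered tool result.
type CallResult struct {
	AppHint  *AppRef // _meta.ui on the result; absent on conformant servers
	ModeHint string  // preferredFrame / displayMode on the result
}

// makeInvoke captures the binding by value at discovery time: no
// per-Provider map, no mutex, nothing shared to race on.
func makeInvoke(binding *AppRef, publish func(AppRef)) func(CallResult) {
	var captured *AppRef
	if binding != nil {
		b := *binding
		captured = &b
	}
	return func(res CallResult) {
		if ref := reconcileAppRef(captured, res.AppHint, res.ModeHint); ref != nil {
			publish(*ref)
		}
	}
}
```

Copying the binding once at discovery and again on every reconcile means no invocation writes state that a concurrent invocation reads, which is the D-025 property item 1 relies on.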

The fixture mandate (§17.8, why the bug shipped). The 109d Go test put _meta.ui on the RESULT. This PR corrects it: the fake MCP server declares _meta.ui.resourceUri on the tool DEFINITION and returns an EMPTY result _meta — exactly matching go-study-mcp and the canonical schema — and asserts mcp.app_available STILL fires (now from the binding). A HARBOR_LIVE_MCP-gated probe drives the real go-study-mcp binary over stdio and asserts a UI-bound tool call fires mcp.app_available from the tool-definition ui:// (CI skips it; verified green in dev — the discovery fires even when the TTS call returns IsError, because the binding is captured at discovery, not parsed from the result).

§4.3 deviations from the plan. This phase had no pre-existing plan; it was authored from the live-test finding per §16. One design call worth naming: the binding is captured in the Invoke closure rather than a per-Provider map (the scope's suggested shape) — closure capture is strictly more D-025-clean (no shared mutable state at all) and is documented in mcp.go.

Protocol additions. None — no method, error code, event type, or wire type changed. mcp.app_available + MCPAppRef are unchanged from D-215; only the SOURCE the runtime reads them from changed. make protocol-docs-gen-check is clean.

Cross-references. D-215 (the mcp.app_available event + MCPAppRef.server_id this corrects the source of), D-172 / D-173 (the 109a–c MCP Apps wave + the manual-handler AppBridge invariant), D-025 (concurrent-reuse — the binding is immutable, captured by value in the closure), D-026 (heavy-content discipline on the ui:// read — unchanged), D-062 (DisplayMode semantics; the renderer's inline default). RFC §6.4, §6.5, §7. CLAUDE.md §5 (fail-loud, D-025), §6 (identity on the emit), §13 (no second knob — single reconcile, not parallel triggers), §17.6 (fix the proxy bug-twin in the same PR), §17.8 (real-spec fixtures — new), §18. Plan: docs/plans/phase-109e-mcp-app-tool-def-discovery.md.


D-217 — Phase 109f: heavy MCP App documents render by FETCHING the offloaded artifact (the D-026 by-reference form is consumed, not refused); operator "pop to side-by-side" affordance reuses the injected display-mode seam

Date: 2026-06-13

Status: Accepted

Context. With 109e fixing discovery against real ext-apps servers, a live test drove the real go-study-mcp stdio server in the Console Playground. Two gaps surfaced. Gap A (the primary bug): go-study-mcp's ui://go-study-mcp/studio/index.html is 86.4 KB. The default artifacts.heavy_output_threshold_bytes is 32 KiB, so 109a's mcp.servers.read_resource correctly applies the D-026 heavy-content safety net — it offloads the document to the ArtifactStore by reference and returns an MCPResourceArtifactRef instead of inline content. But the 109b renderer (web/console/src/lib/chat/renderers/mcp-app.svelte) treated any artifactRef as a FATAL error: it threw "app document … exceeds the inline heavy-content threshold" and a code comment wrongly called it "a server bug." It is not — real Svelte/React App bundles are almost always larger than 32 KiB, so this refused nearly every real App; the studio App only rendered earlier behind a threshold-raising config workaround. Gap B: inline→fullscreen/pip was app-initiated only (the app calls AppBridge requestDisplayMode → onrequestdisplaymode → the page layout). The owner asked for a HOST-side operator affordance to pop the app to the 109c side-by-side (pip) without the app having to ask.

Decision. The calls that shape it:

  1. The offload is correct; only the renderer's content SOURCE changes. The heavy ui:// document stays offloaded by reference (D-026 — heavy bytes never inline through the context/LLM plane). When readResource returns an artifactRef, the renderer resolves it to a presigned URL and fetches the bytes at the iframe edge, then loads them into the SAME sandboxed srcdoc via the SAME wrapAppDocument + buildAppCSP + appIframeSandbox (no allow-same-origin) + postMessage origin guard the inline path uses. The inline (content) and heavy (artifact-fetch) paths differ ONLY in where the HTML comes from; the security envelope is byte-identical. The wrong "server bug" comment is corrected.

  2. The artifact-fetch capability is a new method on the INJECTED MCPAppHostClient, not a chat-module reach for $lib/protocol (D-091). app-bridge-host.ts adds resolveArtifact(artifactID: string): Promise<string> to the injected interface; the renderer calls it and fetches the returned presigned URL (the same fetch-from-presigned-URL pattern every other MIME renderer uses for its src). The REAL implementation lives OUTSIDE the chat module, in the Console adapter makeMCPAppHostClient (web/console/src/lib/mcp-app-host-client.ts), which delegates to client.artifacts.getRef and returns res.presigned_url. The chat module keeps ZERO $lib/ imports — the future web/shared/chat extraction stays mechanical.

  3. §17.6 bug-twin: the playground ChatProtocolClient.resolveArtifact read the absent resp.url. The Go wire field is presigned_url (internal/protocol/types/artifacts.go::ArtifactsGetRefResponse), but the page-level adapter read resp.url → undefined — silently breaking every chat-bubble artifact preview. Same bug shape as Gap A (reading the wrong get_ref field); fixed in the same PR.

  4. The operator "pop to side-by-side" affordance reuses the EXISTING display-mode dispatch path (no parallel mechanism, §13). An "expand ⤢" button (plus an optional fullscreen button) overlays the inline app frame and dispatches onDisplayModeRequest({ requested: mode, granted: mode }) — the SAME injected callback the renderer already receives, which MessageBubble forwards as onAppDisplayModeRequest(req, app, serverID) → ChatPanel → +page.svelte::onInlineAppDisplayModeRequest, reducing into the 109c request-display-mode layout action. The host grants the mode directly because it advertised it can apply it (availableDisplayModes). The affordance shows ONLY while the app is inline (the page-level fullscreen/pip panels carry their own mode bar) and ONLY for advertised non-inline modes; it never reaches into the page or imports page/route/store code. Tokens-only; accessible buttons with aria-label. Teardown reuses 109c's return-to-inline.
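
The 109f renderer and adapter are TypeScript/Svelte; to keep one language across these notes, the Go sketch below only mirrors their logic. It shows the inline and artifact paths converging on one HTML string (so the sandbox envelope can be applied identically afterwards), a decoder that reads the presigned_url wire key and fails loud when it is absent, and a non-2xx check on the fetch. ReadResourceResult, presignedURLFromGetRef, appDocumentHTML, and the injected getRef/fetch functions are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ReadResourceResult is hypothetical: exactly one of Content / ArtifactID is set.
type ReadResourceResult struct {
	Content    string
	ArtifactID string
}

// getRefResponse mirrors only the field named in D-217: the wire key is presigned_url.
type getRefResponse struct {
	PresignedURL string `json:"presigned_url"`
}

// presignedURLFromGetRef decodes a get_ref body and fails loud if the URL is
// missing, instead of passing an empty value on (the resp.url bug-twin).
func presignedURLFromGetRef(raw []byte) (string, error) {
	var r getRefResponse
	if err := json.Unmarshal(raw, &r); err != nil {
		return "", err
	}
	if r.PresignedURL == "" {
		return "", fmt.Errorf("artifacts.get_ref: presigned_url missing")
	}
	return r.PresignedURL, nil
}

// appDocumentHTML picks the content SOURCE only; the caller wraps the returned
// HTML in the same sandbox/CSP envelope either way.
func appDocumentHTML(res ReadResourceResult, getRef func(id string) ([]byte, error), fetch func(url string) (int, string, error)) (string, error) {
	if res.ArtifactID == "" {
		return res.Content, nil // inline path
	}
	raw, err := getRef(res.ArtifactID)
	if err != nil {
		return "", err
	}
	url, err := presignedURLFromGetRef(raw)
	if err != nil {
		return "", err
	}
	status, body, err := fetch(url)
	if err != nil {
		return "", err
	}
	if status < 200 || status > 299 {
		return "", fmt.Errorf("fetch app document: HTTP %d", status) // fail loud on non-2xx
	}
	return body, nil
}
```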

The fixture mandate (§17.8). The Gap-A guard mounts the real MessageBubbleMcpAppRenderer with an injected client whose readResource returns an artifactRef (NOT inline content) and an artifact-fetch stub returning a REALISTIC >32 KiB App document — modelled on a Vite single-file build (doctype + inlined CSS + a large inlined ES-module bundle), the shape go-study-mcp's 86.4 KB studio/index.html ships, asserted > 32 KiB so the offload path is genuinely exercised. The test asserts the iframe srcdoc is populated from the FETCHED bytes and the "App failed to load" error is gone; it FAILS (times out on the error path) if the artifactRef branch reverts to throwing (verified). An inline-path regression test and a Gap-B affordance→reducer test stay.

§4.3 deviations from the plan. This phase had no pre-existing plan; authored from the live-test finding per §16. The artifact-fetch seam is exposed as resolveArtifact (a presigned-URL resolver the renderer fetches) rather than a bytes-returning method, to match the established renderer pattern (every MIME renderer fetches its own presigned src) and keep the adapter thin.

Protocol additions. None — Console-only. artifacts.get_ref, mcp.servers.read_resource, and the MCPResourceArtifactRef shape all already ship; this phase consumes them. No Go/Protocol source changed, so make protocol-docs-gen-check is clean and no protocol.ts hand-sync is needed.

Cross-references. D-026 (the heavy-content safety net whose by-reference form this CONSUMES — not weakened), D-062 (DisplayMode semantics; the no-primitive-without-consumer ordering), D-091 (shared chat-module encapsulation — the injected MCPAppHostClient), D-172 / D-173 (the 109a–c MCP Apps wave + the manual-handler AppBridge invariant), D-214 (the 109c layout the affordance reuses), D-215 (109d discovery → renderer mount), D-216 (109e tool-def discovery). RFC §6.4, §6.5, §7. CLAUDE.md §4.5 (Console conventions; rule 3 tokens, rule 11 chat encapsulation), §5 (fail-loud on a non-2xx fetch), §13 (no parallel display-mode path; no chat-module reach into the page), §17.6 (the resp.url bug-twin fixed same-PR), §17.8 (real-spec >32 KiB fixture), §18 (drive-the-playground skill sweep). Plan: docs/plans/phase-109f-heavy-app-doc-render.md.


D-218 — Phase 109g: read_resource scopes the LLM-context heavy threshold OUT of ui:// MCP App documents — they render inline on every artifact driver

Date: 2026-06-13

Status: Accepted

Context. A live test against the real go-study-mcp ext-apps server found MCP App documents fail to render on every non-S3 artifact driver. The 109 MCP Apps host gated a ui:// App document on the D-026 LLM-context heavy-output threshold (32 KiB, config.DefaultHeavyOutputThresholdBytes) in internal/mcpconsole/apps.go::AppsAccessor.ReadResource. go-study-mcp's studio App HTML is ~86 KB — above that threshold — so mcp.servers.read_resource offloaded it to the ArtifactStore by reference and returned an artifactRef. The Console can only fetch a by-reference resource via artifacts.get_ref → a presigned URL, and ArtifactsSurface.handleGetRef fails loud with CodePresignUnsupported on every non-S3 driver (the D-022 fail-loud posture; the S3 driver is the only Presigner). So on the inmem / fs / sqlite / postgres stores the App never rendered — live error: "the inmem artifact-store driver does not support presigned URLs." The root cause is a category error: the heavy-output threshold exists to keep bulky bytes OUT of the LLM context window (RFC §6.5 / D-026), but a ui:// App document NEVER enters the LLM context — the tool result carries only the tiny _meta.ui.resourceUri reference string; the actual HTML is fetched ONLY by the Console (via mcp.servers.read_resource) and rendered in a sandboxed iframe. Gating an App document on the LLM-context threshold is wrong in principle and breaks rendering on non-S3 stores.

Decision. The calls that shape the fix:

  1. A ui:// App document is a Console-render payload, not heavy LLM output — it rides inline up to a dedicated App-document cap. ReadResource checks mcp.IsUIResourceURI(resourceURI) (the existing driver predicate) and, for a ui:// document, uses appDocumentInlineCap (2 MiB) as the inline ceiling instead of the LLM-context heavy threshold. Below the cap (every real app — a studio App's HTML runs 80–100 KiB) the document rides inline as Content, so 109b's inline renderer path works on EVERY driver with no artifact fetch, no presigning, no S3. The cap is a named const with godoc explaining WHY it differs from the heavy-output threshold (App docs are rendered, not context-injected).

  2. The >cap fallback is preserved, not removed. Above the 2 MiB App-document cap, the existing D-026 offload→artifactRef path stands — the loud mcp.resource_offloaded bypass event fires, the bytes route to the ArtifactStore, and a pathologically large App is never inlined unbounded and never silently truncated (§13). For such an App the Console's presigned fetch (S3-only) is the acceptable degradation; this phase merely raises the boundary at which that path is hit for ui:// documents.

  3. The change is scoped to ui:// App documents specifically. An ordinary (non-ui://) resource read keeps the LLM-context heavy threshold unchanged — the App-document cap is not a blanket widening of the heavy threshold for unrelated content. The resource URI is available at the threshold-decision point (the request carries it), so the scoping is a one-line predicate.

  4. The tests use a REAL inmem ArtifactStore on the seam — no stub (§17.8). 109f's artifact-fetch test stubbed the artifact resolver, so it never hit the real presign-unsupported driver — the same fixture-vs-reality failure mode §17.8 names. The below-cap revert-guard reads an 86 KiB ui:// document against a real artifacts/drivers/inmem store under the identity triple and asserts it returns INLINE with NO mcp.resource_offloaded event; it FAILS if the gate is reverted to the 32 KiB heavy threshold (verified — a reverted build offloads the 86 KiB doc to an artifactRef). The above-cap test asserts a >2 MiB ui:// doc still offloads + fires the event. A HARBOR_LIVE_MCP-gated probe drives the real go-study-mcp studio doc through ReadResource and asserts it returns inline (CI-skipped). The AppsAccessor stays immutable-after-construction (D-025); the N=128 concurrent-reuse test is retained.
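
The D-218 threshold decision reduces to picking an inline ceiling per URI. In this sketch isUIResourceURI is a hypothetical prefix check standing in for mcp.IsUIResourceURI, and treating a document exactly at the ceiling as inline is an assumption; the 32 KiB and 2 MiB figures come from this entry.

```go
package main

import "strings"

const (
	heavyOutputThresholdBytes = 32 * 1024       // D-026 default LLM-context threshold
	appDocumentInlineCap      = 2 * 1024 * 1024 // 2 MiB ui:// App-document ceiling
)

// isUIResourceURI is a hypothetical stand-in for mcp.IsUIResourceURI.
func isUIResourceURI(uri string) bool {
	return strings.HasPrefix(uri, "ui://")
}

// inlineCeiling: ui:// documents use the App-document cap regardless of the
// operator's heavy threshold; every other resource keeps the LLM-context threshold.
func inlineCeiling(uri string, heavyThreshold int) int {
	if isUIResourceURI(uri) {
		return appDocumentInlineCap
	}
	return heavyThreshold
}

// shouldOffload reports whether a read takes the D-026 offload → artifactRef path.
// Assumption: a document exactly at the ceiling stays inline.
func shouldOffload(uri string, size, heavyThreshold int) bool {
	return size > inlineCeiling(uri, heavyThreshold)
}
```

Keeping the predicate at the ceiling lookup means the offload path itself is untouched; only the number it compares against changes for ui:// documents.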

§4.3 deviations from the plan. This phase had no pre-existing plan; it was authored from the live-test finding per §16. One design call worth naming: the ui:// document always uses appDocumentInlineCap regardless of the operator-configured heavy threshold — the App-document cap is the App-document ceiling, independent of the LLM-context threshold (an operator who tightens the LLM threshold does not thereby cripple App rendering).

Protocol additions. None — no method, error code, event type, or wire type changed. ReadMCPResourceResponse.Content already carries inline bytes; this phase only populates it for App documents under the cap. make protocol-docs-gen-check is clean.

Cross-references. D-026 (the context-window safety net — this NARROWS its application: the heavy-output threshold never governed a render-only payload in spirit; the LLM-edge net is untouched and still governs every byte that can reach the LLMClient), D-172 / D-173 (the 109a–c MCP Apps wave + the manual-handler AppBridge invariant — app→host stays Protocol-proxied), D-214 / D-215 / D-216 (the wave; 109g is the read-side counterpart to 109e's discovery-side spec-correctness fix), D-022 (the fail-loud CodePresignUnsupported posture that surfaced the bug), D-025 (concurrent-reuse — the AppsAccessor is immutable, per-call identity rides ctx), D-062 (DisplayMode semantics; the renderer's inline default). RFC §6.5 (context-window safety net — the threshold this re-scopes), §7 (the Console as a Protocol client that renders the document). CLAUDE.md §4.4, §5 (fail-loud, D-025), §6 (identity on the read), §13 (no silent degradation — the >cap path fails loud), §17.6, §17.8 (real-spec fixtures on the seam — a real inmem store, not a stub), §18. Plan: docs/plans/phase-109g-app-doc-inline-read.md.


D-219 — Phase 114: the steering control surface derives caller authority from the VERIFIED context identity, never from the request body

Date: 2026-06-14

Status: Accepted

Context. A planning + adversarial review of the Protocol surface found a privilege escalation on the steering control plane. internal/protocol/control.go::dispatchControl discarded the verified request-context identity (_ = ctx) and built BOTH the caller's privilege tier and tenant from the request BODY: the steering scope came from cr.Identity.Scope (a caller-supplied string), and the steering event's CallerTenant was set to the body's target-run tenant (q.TenantID). Two consequences. (1) A caller could assert scope:"admin" in the request body and the per-event check (steering.CheckScope, run inside Inbox.Enqueue) would rubber-stamp it — any authenticated caller could submit any control, including admin-only PRIORITIZE, against any run they could name. (2) Because CallerTenant was always equal to the target run's tenant, CheckScope's cross-tenant-requires-admin gate (callerTenant != runIdentity.TenantID) could never fire — cross-tenant steering was undetectable. The design intent was always the opposite: steering/scope.go documents "The Protocol edge derives the Scope from the caller's JWT scope claim before calling CheckScope" — the edge simply did not do it. The bug was latent (not yet exploitable) only because the dev bootstrap mints admin-scoped tokens exclusively; it becomes live the moment a lesser-privileged token exists. Every sibling Protocol surface (artifacts artifacts.go:261/380, search search.go:73, topology control) already derives identity/scope from the verified ctx — steering was the lone exception.
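
For contrast with the body-derived tier described above, here is a minimal Go sketch of deriving the steering tier from the verified caller only, plus the cross-tenant check CheckScope performs as a second gate. The Scope ranks, the Caller and RunOwner types, the claim strings "admin" and "console:fleet", and errIdentityRequired are hypothetical stand-ins for the auth and steering packages.

```go
package main

import "errors"

// Scope ranks; the ordering (session_user < owner_user < admin) follows the
// entry, the numeric values are assumptions.
type Scope int

const (
	ScopeNone Scope = iota
	ScopeSessionUser
	ScopeOwnerUser
	ScopeAdmin
)

// Caller is a hypothetical view of the VERIFIED ctx identity plus JWT scope claims.
type Caller struct {
	TenantID, UserID string
	Claims           []string
}

// RunOwner is the target run's identity; the request body only names it.
type RunOwner struct {
	TenantID, UserID, SessionID string
}

var errIdentityRequired = errors.New("identity required") // stands in for CodeIdentityRequired

func hasClaim(c *Caller, claim string) bool {
	for _, v := range c.Claims {
		if v == claim {
			return true
		}
	}
	return false
}

// deriveSteeringScope reads only the verified caller; no body Scope field is consulted.
func deriveSteeringScope(c *Caller, run RunOwner) (Scope, error) {
	if c == nil {
		return ScopeNone, errIdentityRequired // fail closed, no body fallback
	}
	if hasClaim(c, "admin") {
		return ScopeAdmin, nil
	}
	// "console:fleet" is deliberately ignored: a read entitlement, not a write one.
	if c.TenantID == run.TenantID && c.UserID == run.UserID {
		return ScopeOwnerUser, nil
	}
	// A bare session-id match confers nothing: session ids are not globally unique.
	return ScopeNone, nil
}

// checkScope is the defence-in-depth gate: cross-tenant needs admin, then the per-event minimum.
func checkScope(got, minimum Scope, callerTenant, runTenant string) bool {
	if callerTenant != runTenant && got != ScopeAdmin {
		return false
	}
	return got >= minimum
}
```

Because callerTenant comes from the verified caller rather than the target run, the cross-tenant branch in checkScope can actually fire.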

Decision. The calls that shape the fix:

  1. Authority comes from the verified ctx; the body names only the TARGET run. dispatchControl reads the caller via identity.From(ctx) (the identity the auth middleware places on ctx) and fails closed with CodeIdentityRequired when it is absent — NO fallback to the body. The request body's IdentityScope is treated as a routing key (which run to steer), never as an authority claim. The body's Scope field is no longer read for any purpose; the wire field is retained for compatibility and documented as ignored.

  2. A pure derivation maps the verified caller onto a steering tier (deriveSteeringScope). It reads ONLY the verified ctx identity + JWT scope claims and compares them against the target run: auth.HasScope(ctx, auth.ScopeAdmin) → steering.ScopeAdmin (sufficient for every control, cross-tenant included); verified (tenant, user) == the run's → steering.ScopeOwnerUser (which by rank satisfies the session_user-minimum controls INJECT_CONTEXT / USER_MESSAGE); otherwise no authority (the control is rejected CodeScopeMismatch before it reaches the inbox). auth.ScopeConsoleFleet is deliberately NOT honoured — fleet is a read/observe entitlement, steering is a write control; only admin confers cross-tenant write.

  3. ScopeSessionUser is NOT derived from a bare session-id match. Session ids are not globally unique across users, so a same-tenant session-id collision must not confer authority over another user's run. Owner-tier covers every same-user control by rank; the session-scoped tier becomes safe to grant only once a non-admin token carries a verified session principal — owned by the follow-on lesser-privileged-token phase. Until then only the owning user and the administrator can steer.

  4. CallerTenant is the VERIFIED caller tenant, so steering.CheckScope's cross-tenant-requires-admin gate is live: a control whose target run lives in a different tenant passes only for an admin caller. CheckScope stays as the defence-in-depth second gate (per-event minimum + cross-tenant), NOT a substitute for the edge derivation.

The fixture mandate. New unit tests assert the security contract directly: TestDispatch_BodyScopeClaimIsIgnored_NoEscalation (a non-admin owner submitting PRIORITIZE with body Scope:"admin" is rejected CodeScopeMismatch and never reaches the inbox), TestDispatch_NoVerifiedIdentity_FailsClosed (a fully-populated body incl. Scope:"admin" with no ctx identity → CodeIdentityRequired), TestDispatch_CrossTenantNonAdmin_Rejected / TestDispatch_CrossTenantAdmin_Allowed. The round-trip, conformance, concurrent-reuse (N=150), and test/integration/wave9 control scenarios were migrated to authenticate via ctx (a shared authCtx / callerCtx / wave9CallerCtx helper) — the authority source the surface actually reads. The two obsolete body-scope tests (UnknownScope, the Lookup-miss-based CrossTenantNonAdmin) were removed; the cross-tenant contract is now tested through ctx, the source of truth.

§4.3 deviations from the plan. Authored from the adversarial finding per §16; no pre-existing plan. One call worth naming: this phase does NOT mint or accept a lesser-privileged token — it is the prerequisite hardening that MUST precede any phase that does (a non-admin token landing first would open the exact escalation window this closes). The §13 "primitive with a consumer" rule is satisfied by the existing steering controls, which are the derivation's consumers; the lesser-privileged-token contract is its co-requisite follow-on, not a missing consumer.

Protocol additions. None — no method, error code, event type, or wire type changed. types.IdentityScope.Scope is retained (now ignored for steering). No Go/Protocol wire-shape change, so make protocol-docs-gen-check is clean and no protocol.ts hand-sync is needed.

Cross-references. D-025 (concurrent-reuse — ControlSurface stays immutable-after-construction; per-call authority rides ctx, never the surface), D-059 (agent_id is not an isolation principal — the steering authority tuple is (tenant, user) + the run, never agent_id). RFC §5.5 (the Protocol rejects any request without an identity scope), §6.3 (the per-event steering scope mapping + cross-tenant-requires-admin this enforces at the edge — resolves brief 02 Q-3), §7 (the Console / third-party clients that drive the control plane). CLAUDE.md §6 (identity is mandatory; fail closed; no package-level identity), §7 (security — JWT-derived identity, no privilege from request bodies), §13 (no silent degradation; identity is mandatory), §17.6 (the test migration fixes every caller the change surfaced — protocol unit + conformance + transports + wave9 — in the same PR). Plan: docs/plans/phase-114-steering-verified-identity-authority.md.


D-220 — Phase 115: production JWT verification (JWKS-backed KeySet) + harbor serve

Date: 2026-06-15

Status: Accepted

Context. The Protocol's auth surface shipped a production-grade Validator (asymmetric-only allowlist enforced at the parser via jwt.WithValidMethods, the KeySet seam, the eight typed sentinels) but the ONLY KeySet wired into a running binary was the harbor dev ephemeral dev signer — an in-memory ES256 keypair minted at boot. The identity.jwks_url / identity.jwks_file config fields existed and were validated (the full-binary Validate() profile requires asymmetric jwt_algorithms + issuer + audience + one of the two JWKS sources), but had NO consumer: no code path turned a JWK Set into a KeySet, and there was no production-shaped subcommand to boot the headless Runtime behind an operator's own IdP. An operator could not run Harbor against their identity provider; the dev signer was the only path, which is the §13 "test stubs as production defaults" failure mode one layer up — the seam existed but the binary defaulted to the dev surface.

Decision. The calls that shape the fix:

  1. A JWKS-backed KeySet behind the EXISTING interface — additive, no reshape. internal/protocol/auth/jwks.go ships JWKSKeySet, which implements KeyByID(kid) (crypto.PublicKey, alg, error) exactly as the static dev KeySet does. The production Validator is the unchanged NewValidator(keys, …); the asymmetric allowlist gate (HS*/none rejected at the parser before the keyfunc runs) is inherited unchanged. The dev signer and the JWKS keyset are the two concrete KeySet implementations of one interface.

  2. The JWK Set is parsed with the standard library only — no new dependency. RSA (kty:"RSA", base64url n/e → rsa.PublicKey) and ECDSA (kty:"EC", crv P-256/384/521, base64url x/y → ecdsa.PublicKey, point validated via the non-deprecated (*ecdsa.PublicKey).ECDH() on-curve check) are supported. kty:"oct" and any symmetric / unsupported material is rejected per-key; a set that yields ZERO usable asymmetric signing keys fails closed (ErrJWKSNoUsableKeys). Each key's alg must fall in the operator's allowlist ∩ the asymmetric AllowedAlgorithms; a key outside it is dropped. Adding a JWKS library would have required an RFC change (§13 — no new heavy deps), so the stdlib parse is the deliberate choice.

  3. Cache + TTL refresh + a BOUNDED on-miss refresh. A KeyByID lookup serves from an RWMutex-guarded snapshot while the cache is within its TTL (default 5m). A kid miss OR a stale cache triggers a re-fetch, but the re-fetch is rate-limited by a minimum interval (default 1m) under a single-flight refresh mutex: a flapping/hostile IdP or kid-spray cannot drive an unbounded fetch storm — at most one fetch per window, and a miss inside the window resolves against the current cache and returns ErrUnknownKey without touching the network. The URL fetch is bounded (client timeout + a 1 MiB response-size LimitReader). No background goroutine is started — refresh is on-demand — so there is nothing to leak.

  4. Fail loud at construction. NewJWKSKeySet performs the initial fetch+parse synchronously and returns the error if it fails: a Runtime serving the Protocol edge must not boot with an unverifiable identity surface. The auth.NewJWKSValidator(ctx, cfg.Identity, deps) projection (mirrors the *.FromConfig convention; auth imports internal/config additively, config stays a leaf) wires issuer/audience/redactor/logger/bus and is the single public entry the serve command consumes.

  5. harbor serve is the production sibling of harbor dev, NOT a parallel boot stack (§13). Rather than duplicate cmd_dev's surface wiring, bootDevStack gained an authValidatorFactory injection point: when non-nil (the serve path) it builds the JWKS-backed validator from the loaded config and marks the boot production, so the dev-only surfaces stay un-mounted — NO bootstrap-token endpoint, NO dev-token mint/print, NO draft scaffolding, NO dev-signer token-rotation surface, NO Console embedding (D-091 — only harbor console serves the Console). When nil, the dev signer path is unchanged. Production binds server.bind_addr (may be non-loopback; --bind overrides) and logs JSON (§5). The mock LLM escape hatch is NOT honoured — harbor serve demands a real provider and the full Validate() profile + the existing LLM-provider gate fail the boot loud (naming the missing field) when the JWKS source, the provider, or the API key is absent.

The fixture mandate (§17.8). The JWKS parse tests exercise a REAL committed JWK Set fixture (internal/protocol/auth/testdata/jwks.json) generated independently from the committed RSA + EC test PEM public keys (a standalone generator using math/big + base64.RawURLEncoding, not the parser under test), so a self-consistent hand fixture cannot rubber-stamp a wrong-field parse. Coverage: RSA + EC resolution, kid hit, kid miss → bounded refresh → still-miss → ErrUnknownKey, oct/symmetric rejected, malformed JWK rejected, alg-outside-allowlist dropped, TTL-refresh picks up a rotated key, and a fetch-counting transport proves the on-miss refresh is rate-limited. End-to-end: a token signed by the fixture's private key verifies through NewValidator(jwksKeySet, …); a foreign kid and HS256/none are rejected. The mandatory concurrent-reuse test (D-025) runs N=150 concurrent KeyByID/Validate against one shared keyset+validator under -race with a goroutine-baseline assertion. A HARBOR_LIVE_JWKS_URL-gated probe fetches a real endpoint (CI-skipped). The integration test (test/integration/jwks_serve_test.go) drives a JWKS-verified request through auth.Middleware over a real httptest server (real keyset + real audit/drivers/patterns redactor + real inmem event bus) and asserts the verified identity + scopes reach the downstream surface on ctx, with three failure modes (foreign kid, HS256, no token) and an N=32 no-identity-bleed concurrency run.

Adversarial-review hardening (same PR). An adversarial security pass over the implementation found and this PR fixed: (F1) the runtime-fixture seeder (HARBOR_DEV_SEED_FIXTURES) was gated on the env var ALONE, not on signer != nil — so a harbor serve boot with that env var set would seed fixtures AND drive the real planner/LLM on them; now gated on signer != nil too, making point 5's "dev-only surfaces stay un-mounted" fully true (proven by cmd/harbor.TestBootDevStack_ServeProductionBoot_GatesDevSurfacesAndVerifiesJWKS, added in the 114–118 wave-end checkpoint: a production boot with HARBOR_DEV_SEED_FIXTURES=1 set verifies a JWKS token to the surface while the seeder never fires and every dev route 404s). (W1) parseRSAJWK now enforces a 2048-bit minimum modulus (minRSAModulusBits) — a weak/compromised IdP RSA key is dropped rather than accepted, defense in depth against signature forgery. (W4) validateIdentity now rejects BOTH jwks_url and jwks_file being set (not just neither), so the config validator fails early with a clear message instead of deferring to ErrJWKSSource at keyset construction. (W2/W3) the keyset godoc now documents the refresh/staleness tradeoff: concurrent miss/stale lookups block on the in-flight single-flight fetch (bounded by the HTTP timeout; hits never block), and a refresh-fetch failure retains+serves the prior snapshot (availability over hard-fail), so a REMOVED/revoked key stays accepted until the next successful fetch — no max-stale ceiling. New regression tests: TestJWKSKeySet_RejectsUndersizedRSAKey (real 1024-bit key dropped) and a config both JWKS sources set case.

§4.3 deviations from the plan. The plan named auth.NewJWKSValidator(cfg) as the public surface; the shipped signature is NewJWKSValidator(ctx, cfg.Identity, ValidatorDeps{…}, …opts) — the ctx is required for the synchronous initial fetch and the deps carry the mandatory redactor + optional logger/bus the projection must wire (a bare cfg could not supply them). bootErrorToCLIError gained a subcommand parameter so serve / console errors attribute correctly (a latent cosmetic mislabel: console errors previously printed "harbor dev:").

Protocol additions. None — no method, error code, event type, or wire type changed. The JWKS keyset reuses the existing ErrUnknownKey sentinel and the existing auth.rejected bus event. make protocol-docs-gen-check is clean; no protocol.ts hand-sync needed.

Cross-references. D-219 (Phase 114 — the verified-identity steering authority this builds toward; 115 gives the JWKS consumer, 116 the lesser-privileged token), D-091 (the Console is served only by harbor console — serve embeds no Console), D-089 (the dev-only mock LLM stays gated; serve never enables it), D-025 (concurrent-reuse — the keyset is immutable after construction except the internally-synchronized cache; no per-call state on the artifact), D-026 (untouched — JWKS is an identity surface, not an LLM-context payload). RFC §5.4 (wire transport — the surface serve exposes), §5.5 (authentication — asymmetric-only, the Protocol rejects any request without a verified identity scope). CLAUDE.md §4.4 (the KeySet seam — interface + two drivers, no Supports* ceremony), §5 (fail-loud, JSON logging in production, D-025), §7 (asymmetric-only JWT allowlist, no hardcoded secrets — generated testdata keypairs are the sanctioned fixture), §13 (no new heavy dep — stdlib JWK parse; no test stub as production default — serve fails loud at boot; fail-loud when a required external dependency is missing), §17.8 (real-spec fixture on the seam — a JWK Set generated from real key material, plus a live-gated probe), §16 (authored from the master-plan detail block). Plan: docs/plans/phase-115-production-jwt-jwks-serve.md.


D-221 — Phase 116: the non-admin session-scoped token contract — session_user collapses into the owner tier; collision safety is structural

Date: 2026-06-15

Status: Accepted

Context. Phase 114 (D-219) moved steering authority to the verified context identity and explicitly DEFERRED the session_user tier, leaving a note in deriveSteeringScope's godoc that a real session-scoped principal — a non-admin token carrying a verified session claim — was "the seam where that tier becomes safe to grant." Phase 116 is that seam. Two things were missing for the Phase 114 derivation to be load-bearing: (1) a lesser-privileged token to judge — every token a Harbor binary minted carried admin, so the escalation 114 closed had no non-admin principal to exploit; (2) a settled, airtight rule for when the session-scoped tier is safe to grant given the hazard 114 named — session ids are client-chosen and NOT unique across users, so a same-tenant session-id collision must never confer authority over another user's run.

Decision. The calls that shape the contract:

  1. The session-scoped tier collapses into the owner tier — option (a), full-triple match. A run is keyed by the full triple (tenant, user, session), and a session belongs to exactly one (tenant, user): the principal authenticated into a run's session IS that run's owning user. Granting session_user therefore requires a full-triple match, which is a SUBSET of the (tenant, user) match that already earns the strictly-higher owner_user. So deriveSteeringScope hands a verified session participant owner_user, and never mints the distinct session_user tier. The distinct tier (where session_user is strictly below owner_user) is meaningful only for multi-PARTICIPANT sessions — a capability the runtime does not have at V1. Inventing it would be a §13 "primitive without a consumer" violation, so it stays RESERVED: the constant, the rank, and the per-control minimum in steering.CheckScope keep it defined for that future and for the admin/owner total order. Option (b) (granting on a bare (tenant, session) match with a different user under a uniqueness invariant) was rejected because the prerequisite invariant — session ids unique across users — does not hold in Harbor.

  2. Why a session-id collision cannot escalate (the load-bearing property). Collision safety is STRUCTURAL, not a special case: the only non-admin grant in deriveSteeringScope compares the user component, so a verified principal whose user differs from the run owner's earns nothing — a bare (tenant, session) match never confers authority. A verified token for (tenant-a, user-B, session-x) cannot steer (tenant-a, user-A, session-x)'s run despite the shared session-id STRING. This is proven directly: TestDeriveSteeringScope_Matrix ("session-id collision earns nothing"), TestDispatch_SessionIDCollision_NonAdminCannotSteerOtherUsersRun (surface-level, every control type rejected CodeScopeMismatch, inbox never touched), and the integration test's wire-level session_id_collision_rejected (the body-vs-JWT transport gate rejects user-B naming user-A's run with 401 — defence in depth over the surface derivation). console:fleet confers no steering write authority (only admin does); cross-tenant by a non-admin stays rejected (114, kept green).

  3. Per-control scope check moves to the edge (defence-in-depth ordering refinement). dispatchControl now runs steering.CheckScope at the Protocol edge — after deriveSteeringScope, before the inbox Lookup — in addition to the existing Inbox.Enqueue call. It is the SAME function (not a second validator, CLAUDE.md §13), run one step earlier so a caller who holds SOME authority over a run but not enough for THIS control (the owning user submitting the admin-only PRIORITIZE) is refused CodeScopeMismatch rather than leaking run existence via not_found. This also makes the live negative-escalation smoke robust: PRIORITIZE → 403 fires on a ghost run with no live inbox. Enqueue's CheckScope stays as the authoritative gate for any caller reaching the inbox by another path.

  4. The dev/test non-admin mint is the consumer. The loopback-gated POST /v1/dev/bootstrap.json endpoint gains an OPTIONAL request body: a full (tenant, user, session) triple overrides the minted token's identity, and a scopes array — INCLUDING an explicit empty [] — overrides its scope set (a nil/absent scopes keeps the default admin scopes, so the existing one-click Console-attach flow and every -d '{}' caller are unchanged). An empty scopes mints a non-admin token; a partial identity triple fails closed (400 — identity is mandatory). This is dev-only convenience behind the same loopback boundary as the default mint; harbor serve never mounts the endpoint. Production non-admin tokens come from the operator's IdP (Phase 115 JWKS verifies them) — 116's mint exists only to make the contract exercisable end-to-end and to let the live smoke run.

The session-scoped tier semantics (INJECT_CONTEXT / USER_MESSAGE accepted, owner-level controls rejected) remain covered at the tier level by steering.CheckScope and internal/runtime/steering/scope_test.go::TestCheckScope_PerEventSufficientScope; because deriveSteeringScope never mints the distinct tier in V1, there is no Dispatch-level path that yields a session_user-but-not-owner_user principal — the acceptance-criterion line for that is satisfied as RESERVED-and-documented, per the §4.3 deviation below.

The fixture mandate. Tests assert the contract directly: TestDeriveSteeringScope_Matrix + TestDeriveSteeringScope_NeverMintsSessionUserTier (white-box, every tier outcome incl. collision + the reserved-tier guard), TestDispatch_SessionIDCollision_NonAdminCannotSteerOtherUsersRun (mandatory collision-safety), TestDispatch_NonAdminOwnerPrioritize_RejectedBeforeLookup (edge ordering), TestDispatch_NonAdminOwner_OwnRunControls (the positive half — a non-admin owner exercises all eight owner/session controls). Dev-mint: TestBootstrap_NonAdminScopes_MintsLesserPrivilegedToken, TestBootstrap_NonAdminScopes_DefaultIdentity, TestBootstrap_DefaultBody_MintsAdmin (regression — {} still admin), TestBootstrap_PartialIdentity_Rejected, TestBootstrap_MalformedBody_Rejected. Integration (test/integration/nonadmin_steering_test.go): the real control transport behind the production JWKS auth.Middleware over a real httptest server with real RS256 tokens — non-admin owner injects (200), is refused on prioritize (403 scope_mismatch), admin prioritizes (200), the collision token is refused (401), unauthenticated is refused (401), and an N=16 concurrency stress drives distinct non-admin owners against distinct runs with no cross-talk.

§4.3 deviations from the plan. (a) The plan's "Public API surface" said "the token-claims shape gains a verified tier"; the shipped contract carries NO new token field — the tier is the verified identity-vs-run comparison the Phase 114 invariant established (a token-carried tier would re-open the body-trust escalation 114 closed). (b) The plan's acceptance criterion "a session_user: inject_context / user_message succeed; owner-level controls are rejected" describes a DISTINCT non-owner session participant; under option (a) that principal does not exist in single-participant-session V1, so the criterion is satisfied as the reserved-tier documentation here plus the tier-level CheckScope coverage — inventing a multi-user-session consumer to satisfy it literally would violate §13. (c) The edge CheckScope ordering refinement (point 3) was not in the plan; it is a defence-in-depth improvement that also removes a run-existence oracle for unauthorised callers.

Protocol additions. None — no method, error code, event type, or wire type changed. The bootstrap request body is a dev-only endpoint shape (not a canonical Protocol wire type). make protocol-docs-gen-check is clean; no protocol.ts hand-sync needed.

Cross-references. D-219 (Phase 114 — the verified-identity steering authority this completes; the deferred session_user tier resolved here), D-220 (Phase 115 — the JWKS verifier that verifies production non-admin tokens; this phase is its sibling consumer), D-070 (the steering Scope total order + per-event mapping), D-059 (agent_id is not an isolation principal — the steering authority tuple is (tenant, user) + the run, never agent_id), D-025 (concurrent-reuse — ControlSurface + the bootstrap handler stay immutable after construction; per-call authority rides ctx, the mint reads the per-request body). RFC §5.5 (the Protocol rejects any request without an identity scope), §6.3 (the per-event steering scope mapping + cross-tenant-requires-admin — resolves brief 02 Q-3). CLAUDE.md §6 (identity is mandatory; the isolation boundary is (tenant, user, session); fail closed), §7 (security — asymmetric JWT only; authority never from a request body; no hardcoded secrets), §13 (no silent degradation; no primitive without a consumer — the reserved tier is existing, not new dead code), §17.6 (the edge-ordering refinement + the bootstrap-default regression test fix both halves in one PR), §17.8 (the integration test drives real RS256 tokens through the production JWKS validator). Plan: docs/plans/phase-116-non-admin-token-contract.md.


D-222 — Phase 117: the chat module renders self-contained (D-091) — font inheritance, host identity + theme injected through the seam, a token contract, and a mechanical encapsulation guard

Date: 2026-06-15

Status: Accepted

Context. D-091 mandates the chat module (web/console/src/lib/chat/) be a self-contained component library — encapsulated in place now, extracted to web/shared/chat/ only when a second consumer (the packed dev UI in harbor dev that D-091 / brief 12 name) lands. The import boundary already held (zero imports of non-chat $lib/ from inside the module), but two implicit inheritances meant the module did NOT render standalone, and nothing mechanically prevented the boundary from silently regressing. (1) ChatPanel.svelte's .chat-panel root set background but no font-family — it relied on inheriting font-family: var(--font-sans) from the Console global html, body rule in fonts.css; mounted without that global stylesheet the module fell back to the UA serif. (2) app-bridge-host.ts baked the host identity (HOST_INFO = { name: 'harbor-console', version: '1' }) into the ui/initialize handshake and took theme as a positional constructor default — neither was injectable through the module seam, so a second framework surface could not advertise its own identity/theme. And the encapsulation invariants (no non-chat import, tokens-only) were enforced only by human review.

Decision. The calls that shape the hardening:

  1. The chat module root self-applies its typeface from a token. .chat-panel now declares font-family: var(--font-sans) (token, never a literal) so typography is correct without the Console global stylesheet. This is the litmus test that the module is self-contained: rendered outside the app shell it must not fall back to the UA serif.

  2. Host identity + theme are injected through the typed seam, defaults preserved. AppBridgeHostOptions gains optional hostInfo?: { name; version } and theme?: 'light' | 'dark'. The constructor reads opts.hostInfo ?? DEFAULT_HOST_INFO and opts.theme ?? 'dark' — the prior baked-in values become named, overridable defaults (DEFAULT_HOST_INFO is exported), so an existing caller is byte-unchanged in behaviour while a second surface CAN parameterize. The positional theme constructor parameter was folded into the options object (the only callers were the Console call site and the spec, neither passing it positionally). The Console call site (mcp-app.svelte) passes hostInfo: DEFAULT_HOST_INFO explicitly and relies on the theme default — behaviour identical.

  3. A documented token contract enumerates the module's design-token dependency surface. tokens.contract.json lists the 47 CSS custom properties the chat module references — the exact set a second surface must supply. JSON (not .ts/.css) so BOTH the node guard and the vitest tests read it natively without a TS loader or regex parsing; the description field carries the doc. Every token resolves in the single token surface tokens.css.

  4. A mechanical encapsulation guard makes the boundary un-regressable. web/console/scripts/check-chat-encapsulation.mjs (node stdlib only, no dependency) fails CI when the chat module (a) imports a non-chat Console internal ($lib/… outside $lib/chat, $app/…, or a relative specifier resolving outside the module), (b) references a var(--…) token absent from the contract, (c) declares a contract token that no longer resolves in tokens.css, or (d) carries a raw colour literal in CSS (backstop to stylelint). It is wired into npm run lint and re-run by a vitest test (tests/encapsulation.spec.ts) so the lint gate and the test gate share one scanner and cannot disagree. Test files (*.spec.*) are exempt from the scan — the boundary governs the production module surface, mirroring the _test.go carve-outs in the Go rules.

The §17.6 fix this surfaced. Building the token contract surfaced a latent bug: ReasoningAccordion.svelte referenced var(--space-05), which was NOT defined in tokens.css (the spacing scale jumped --space-0 → --space-1), so the intended tight gap silently resolved to normal (0). Fixed in the same PR by adding --space-05: 0.125rem (the clearly-intended half-step) to the single token surface — the contract now resolves and the accordion renders its intended gap.

The fixture mandate. tests/encapsulation.spec.ts runs the guard's scanner and asserts zero violations, asserts the .chat-panel root rule declares the --font-sans token (a structural assertion — jsdom does not compute the full CSS cascade, documented as such), and asserts every contract token resolves in tokens.css. renderers/app-bridge-host-injection.spec.ts mocks the official AppBridge to capture its constructor arguments and asserts (1) the default preserves the Console identity + dark theme and (2) an injected hostInfo/theme actually flows through. The guard was proven to FAIL on planted violations (a non-chat import, an undeclared token, a raw literal, an unresolvable contract token) and pass after revert.

Adversarial-review hardening (same PR). An adversarial pass over the implementation found and this PR fixed two real self-containment gaps the guard could not see, plus a guard blind spot: (F1) the token contract was INCOMPLETE — --border-hairline (in the contract, used in 10+ chat components) is defined as 1px solid var(--color-border), but --color-border was NOT in the contract; a second surface supplying only the 47 declared tokens would get an undefined inner var and every hairline border would vanish. Fixed by adding --color-border AND by teaching the guard a new check (e): a contract token whose tokens.css value references another var(--…) not in the contract now fails — so this transitive-dependency class cannot recur (verified: removing --color-border now fails the guard with "transitively depends on … required by --border-hairline"). (F2) .chat-panel self-applied background + font-family but NOT color, the same inheritance trap the font fix closed — a standalone mount would paint the panel dark while descendant text without its own color fell back to UA near-black. Fixed by adding color: var(--color-text) symmetric with the background, and the portability test now asserts BOTH properties at the root. (W1) the guard's dynamic-import regex matched only single/double quotes, so a backtick (template-literal) dynamic import specifier could dodge the boundary; fixed to accept backticks (verified: a planted backtick-quoted $lib dynamic import now fails the guard). The inline-style= raw-literal blind spot the review also noted is left to stylelint (the documented primary enforcement, which catches it in npm run lint).

§4.3 deviations from the plan. (a) The plan's acceptance line "no Console-specific literal remains in the chat module" is satisfied as "no BAKED-IN literal" — 'harbor-console' survives as the value of the exported, overridable DEFAULT_HOST_INFO default, which is the point of the seam (the caller CAN override it); removing the string entirely would leave the Console with no default identity. (b) The token contract is JSON rather than the plan's example tokens.contract.css, justified above (dual-consumer native parsing). (c) The web/shared/chat/ move stays explicitly NOT done — there is no second consumer yet; this phase is the in-place encapsulation that makes the eventual git mv mechanical.

Protocol additions. None — no Harbor Protocol method, error code, event type, or wire type changed. This is a Console-internal frontend hardening; the injected AppBridgeHostOptions is a module seam, not a Protocol wire type. No protocol.ts hand-sync needed.

Cross-references. D-091 (the chat-module encapsulation decision — encapsulate first, extract on second consumer; this delivers the encapsulation half in place), D-092 (Svelte 5 runes only — the touched components stay runes-mode, svelte-check --fail-on-warnings clean), D-121 (Console design-system conventions — tokens referenced never literal'd, the single tokens.css surface), D-173 (the manual-handler AppBridge invariant — the new AppBridge(null, …) first argument stays null; only hostInfo/theme now flow through the seam, the no-direct-transport posture is byte-identical). RFC §7 (the Console layer — the chat/playground surface as a Protocol client). Brief 12 (§11 the future packed dev UI reuses the chat components via the shared library — the legitimate second consumer; §26 tokens from a single location, raw literals rejected; §35–37 one component library serves two surfaces). CLAUDE.md §4.5 (Console conventions — Svelte 5 runes, tokens-only, the shared-chat-module rule #11: no imports of other Console internals, a typed host interface injected never a singleton), §13 (no raw literals in .svelte, no hand-rolled fetch, no Svelte 4 syntax), §17.6 (the --space-05 latent bug fixed in the same PR), §18 (operator-skill drift — the Playground operator steps are unchanged, so drive-the-playground needs no edit). Plan: docs/plans/phase-117-chat-module-encapsulation-hardening.md.


D-223 — Phase 118: the Protocol TS lockstep gate VERIFIES the hand-maintained Console client against the Go wire manifest (D-093's "generate" half deferred; generator name reserved)

Date: 2026-06-15 Status: Settled (shipping with Phase 118)

Where it lives: cmd/harbor-protocol-ts-lockstep/ (the Go manifest generator + lockstep tests); web/console/src/lib/protocol/wire-manifest.gen.json (the committed, generated wire manifest); web/console/scripts/check-protocol-ts-lockstep.mjs (the TS-source scan, wired into npm run lint); web/console/scripts/protocol-ts-untyped-allow.json (the justified untyped-type allowlist); Makefile (protocol-ts-gen + protocol-ts-gen-check); .github/workflows/docs.yml (the CI step); CLAUDE.md / AGENTS.md §4.5 rule 5 (the reworded rule); web/console/src/lib/protocol.ts (the reworded header).

Context. D-093 mandated a cmd/harbor-gen-protocol-ts generator that would REGENERATE the Console Protocol client from internal/protocol/singlesource.CanonicalWireTypes, with a make protocol-ts-gen-check CI gate; the generator was never built (D-132 corrected the formerly-false generated header to an accurate hand-maintained notice and tracked the work). Reality diverged from the D-093 assumption of a single protocol.ts: the Console's 221 canonical wire types are hand-authored across ~18 per-page modules (web/console/src/lib/protocol/*.ts plus sessions/types.ts and flows/types.ts). That per-page split is correct modularity and stays.

Decision. Build option A — a field-level LOCKSTEP GATE that VERIFIES the hand-written TS against the Go single source, NOT a generator that replaces it. A Go tool (cmd/harbor-protocol-ts-lockstep) reflects over CanonicalWireTypes (reusing the docs generator's typeInstanceIndex + struct-field-walk mechanism) and emits a committed JSON manifest of the wire surface: per canonical type, its JSON field keys, each field's canonical TS-type token (string / number / boolean / array / object / any), a named-type ref, and optionality; plus the sorted method, error-code, and event-type name sets (events read by a textual tree scan, so the tool needs zero driver imports). make protocol-ts-gen-check runs the gate in three parts, labelled Half 1 to Half 3 below.

What the gate catches — and the honest residual it does NOT.

  • Half 1 — Go↔manifest git diff. Regenerate the manifest and assert the tree is clean. Catches: any Go-side wire-shape change (new/removed/renamed type, field, method, error, event) not followed by make protocol-ts-gen. Residual: a worktree where the manifest is still untracked sees nothing from git diff until the manifest is committed — covered by half 2 in the meantime.
  • Half 2 — Go lockstep test (go test ./cmd/harbor-protocol-ts-lockstep/...). Catches: a new canonical wire type with no typeInstanceIndex instance (fails building the manifest), a stale committed manifest (a regenerate-in-memory vs committed-file byte comparison), and any manifest method/error/event that is not canonical. Runs in the main go test ./... CI job too.
  • Half 3 — TS-source scan (check-protocol-ts-lockstep.mjs, in npm run lint). Catches: a manifest type with neither an exported TS interface nor a justified allowlist entry (new/removed/renamed TYPE), and a typed wire type whose TS interface is missing a manifest field key (new/removed/renamed FIELD). Field PRESENCE is mandatory; a best-effort field-TYPE-token comparison (resolving named string-enum aliases to string) catches most type swaps. Residual: an in-place field-type swap WITHOUT a rename, where the TS type is one the cheap parser cannot resolve, is the one drift class that presence checking cannot see; it is partially caught downstream by svelte-check at the use site. The transport-injected identity key on *Request types is the one sanctioned per-field omission (the shared client folds it in), and 58 server-only / inline-request / meta wire types the Console does not declare a named interface for are carried in the justified, hygiene-checked untyped allowlist.

Additional residuals the scan does NOT cover (honest coverage boundary). The TS-source scan is manifest ⊆ TS field-presence on NAMED typed shapes; it deliberately does not check: (1) extra/phantom TS fields (a Go field REMOVAL is caught by half 1's manifest regen, but the now-orphaned TS field is not flagged — TS ⊆ manifest is not enforced, because a per-page module may legitimately carry a Console-local field); (2) optionality drift (the manifest records optional but the scan does not compare it against the TS ?); (3) nested-ref identity (a field typed as the WRONG canonical object type passes as long as both reduce to the object token; the inner shapes are each checked independently, only the cross-reference is unenforced); (4) methods / errors / events on the TS side (the scan iterates manifest.types only — method-name and event-type STRING constants the Console hardcodes are covered Go-side by half 2's manifest presence, but are not cross-checked against the Console's usage); (5) the event list is a textual tree-scan for EventType = "..." declarations (a non-conforming declaration outside that shape would be absent from both the manifest and the re-scan with no independent catch — all current events conform). These are the gaps the deferred full generator (option "B") closes by construction; the tracking issue stays open.

Adversarial-review hardening (same PR). An adversarial pass found the allowlist's single false justification: TasksListStatusCounterStrip (the Live Runtime header strip) WAS declared in TS but as TaskListStatusCounterStrip (singular "Task"), a one-character mismatch from the canonical TasksListStatusCounterStrip — so the scan could not match it and it fell to the allowlist with an inaccurate "consumed inline" justification, leaving its five fields unguarded. Fixed by renaming the TS interface to the canonical name and removing the allowlist entry (the five strip fields are now field-guarded). Also typed the load-bearing StartResponse (task_id / reused / protocol_version) as a named interface in client.ts instead of an inline generic default and removed it from the allowlist, so a Go rename of task_id now fails the gate rather than silently passing both the gate and svelte-check. Both verified by planted-drift probes (dropping a field now fails the scan).

Pre-existing Go↔TS drift this fixed (§17.6). The first scan run surfaced genuine latent drift the hand-maintained client had accumulated, all corrected in this PR: SearchFilter declared singular tenant_id / user_id / session_id where the wire carries plural tenant_ids / user_ids / session_ids arrays plus since / until (the Console's filter never reached the runtime); GovernancePostureResponse consumed a non-wire tiers array + an invented latent flag while the runtime sends an identity_tiers map + protocol_version (the Settings governance card always fell through to "latent default" against a real runtime — fixed the interface AND the GovernancePostureCard.svelte consumer to iterate the map); RateLimitView.refill_interval (string) corrected to refill_interval_ms (number); LLMPostureResponse / GovernancePostureResponse missing protocol_version; IdentityScope missing run / scope / actor / requester / impersonating; SearchRequest missing facets (plus the SearchFacet interface); TaskDetail missing trajectory (plus the TaskTrajectoryRef / TaskTrajectoryStep interfaces).

Deviation from D-093. D-093's "generated, never hand-edited" client is superseded for the foreseeable future by "hand-maintained, mechanically lockstep-gated." The "generate" half — emitting per-domain generated TypeScript type modules that separate pure types from the hand-written client logic — is a deliberately deferred FUTURE phase ("B"); the cmd/harbor-gen-protocol-ts name is RESERVED for it and stays unused. The committed wire-manifest.gen.json IS generated and never hand-edited; the TS interfaces are hand-maintained and gated. This is a documented amendment to D-093, not a silent departure.

Protocol additions. None — no Harbor Protocol method, error code, event type, or wire type changed. The manifest is a read-only projection of the existing canonical surface; the Go tool is a build tool with no runtime surface.

Cross-references. D-093 (the original generate-the-TS-client decision this amends — "generate" to "verify lockstep", generator name reserved), D-132 (the per-page split context + the formal post-Wave-13 deferral this closes), D-209 (Phase 113a — the sibling cmd/harbor-gen-protocol-docs generator + protocol-docs-gen-check gate shape this mirrors), D-002 (the Go single source for wire types). RFC §5 (the Harbor Protocol contract), §5.3 (versioning — the manifest pins ProtocolVersion). CLAUDE.md §4.5 rule 5 (the reworded lockstep rule), §13 (no second driver blank-import list — the event tree scan avoids it), §17.6 (fix what the gate finds — the pre-existing drift above), §19 (the AGENTS.md ↔ CLAUDE.md mirror, edited identically). Plan: docs/plans/phase-118-generated-protocol-ts-client.md.


D-224 — Phase 109h: the MCP driver advertises its io.modelcontextprotocol/ui host capability on the initialize handshake — the write side of DisplayMode negotiation, preserving roots

Date: 2026-06-16

Status: Accepted

Context. The 109 MCP Apps wave shipped the READ side of UI capability negotiation: internal/tools/drivers/mcp/mcp.go::negotiateDisplayModes reads a server's io.modelcontextprotocol/ui capability (under extensions / experimental) to learn which display modes (inline / fullscreen / pip) the server's apps prefer. But the driver never advertised its OWN UI capability — mcpsdk.ClientOptions.Capabilities stayed nil, so ClientCapabilities.Extensions shipped empty (brief 14 §2 row 31: "Extension negotiation — Absent: ClientCapabilities.Extensions never populated"). A spec-conformant ext-apps server therefore could not learn that the Harbor host renders apps, and could not tailor the app references it returns to what the host can actually display. The negotiation was one-directional: Harbor read the server's modes but advertised none of its own.

Decision. The calls that shape the fix:

  1. The driver advertises the host's renderable display modes during the initialize handshake. A new hostCapabilities(displayModes) helper in mcp.go builds a *mcpsdk.ClientCapabilities that AddExtensions the io.modelcontextprotocol/ui key (the existing uiExtensionKey const) with a {"displayModes": [...]} settings object, filtered against the closed validDisplayModes set (the same set negotiateDisplayModes uses — symmetric read/write), deduplicated, with the advertised order preserved. The driver's New constructor sets ClientOptions.Capabilities to this value only when modes are configured; with no modes it leaves Capabilities nil, preserving the SDK's default advertisement for an embedder that does not opt in (backward-compatible).

  2. The roots advertisement is PRESERVED, not dropped (the regression trap). The go-sdk advertises {"roots":{"listChanged":true}} by default when ClientOptions.Capabilities is nil; setting Capabilities to a non-nil value OVERRIDES that default, and (SDK #607) the deprecated Capabilities.Roots field is IGNORED in favour of Capabilities.RootsV2. So hostCapabilities MUST set RootsV2: &mcpsdk.RootCapabilities{ListChanged: true} to replicate the current roots advertisement — otherwise opting into the UI extension would silently drop the roots capability the runtime advertises today. This phase PRESERVES current roots behaviour exactly; it does NOT fix the roots honesty defect (brief 14 §3 — Harbor advertises roots without servicing it), which is the separate 85a stopgap scope. Sampling / elicitation remain inferred from their handlers (the SDK adds them after the explicit caps assignment, only overriding when the field is set in Capabilities — which this phase does not set).

  3. The advertised modes come from a deployment-level config field, defaulting to the inline baseline. A new tools.mcp_app_host.display_modes field (config.MCPAppHostConfig + ToolsConfig.MCPAppHostDisplayModes()) resolves a nil / empty block to [inline] — the mode the Console renders out of the box. It is a single deployment-level block, NOT a per-server field: the host's rendering ability does not vary per MCP server. The boot loader (internal/runtime/assemble/assemble.go) resolves it once and threads it into every attached provider via the new mcp.AttachDeps.HostDisplayModes, which doubles as the programmatic SDK seam (an embedder sets it without YAML). Validation (internal/config/validate.go) enforces the closed set + uniqueness, with allowedMCPAppDisplayModes mirroring the driver's validDisplayModes (the config package must not import the concrete driver — §4.4 — so the set is duplicated and a drift-mirror test pins it, same pattern as allowedMCPTransportModes).

  4. The integration test derives its capability fixture from the real SDK shape (§17.8). Two providers built from ONE resolved config value are paired to real SDK in-memory transports; each server's captured serverSession.InitializeParams().Capabilities is asserted to echo the configured modes AND to still advertise roots (Roots.ListChanged — the SDK syncs RootsV2 → Roots on the wire). The fixture is the SDK's actual InitializeParams, not a hand-authored blob — a hand blob could not tell a correctly-placed extension from a misplaced one. Identity still propagates on a real tool call after the handshake; an opt-out provider (no host modes) advertises roots with NO UI extension (the failure mode). A unit test asserts hostCapabilities preserves roots directly (the revert guard).
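Items 1 and 2 can be sketched with local stand-in types, since the go-sdk's own definitions are not reproduced here: clientCaps and rootCaps are hypothetical mirrors of mcpsdk.ClientCapabilities and RootCapabilities, and a single Roots field stands in for the RootsV2/Roots pair. The sketch shows the filter, dedupe, and order-preserving rule, and why roots must be restated once the value is non-nil. D-227 later replaces the displayModes payload with the spec's mimeTypes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const uiExtensionKey = "io.modelcontextprotocol/ui"

// validDisplayModes is the closed set shared by the read and write sides.
var validDisplayModes = map[string]bool{"inline": true, "fullscreen": true, "pip": true}

// Hypothetical stand-ins for the SDK capability types.
type rootCaps struct {
	ListChanged bool `json:"listChanged,omitempty"`
}

type clientCaps struct {
	Roots      *rootCaps      `json:"roots,omitempty"`
	Extensions map[string]any `json:"extensions,omitempty"`
}

// hostCapabilities returns nil when no valid modes are configured, so the
// caller keeps the SDK default advertisement (which already includes roots).
func hostCapabilities(displayModes []string) *clientCaps {
	seen := map[string]bool{}
	var modes []string
	for _, m := range displayModes {
		if validDisplayModes[m] && !seen[m] {
			seen[m] = true
			modes = append(modes, m) // first occurrence wins: order preserved
		}
	}
	if len(modes) == 0 {
		return nil
	}
	return &clientCaps{
		// A non-nil value replaces the SDK default, so roots must be restated.
		Roots:      &rootCaps{ListChanged: true},
		Extensions: map[string]any{uiExtensionKey: map[string]any{"displayModes": modes}},
	}
}

func main() {
	caps := hostCapabilities([]string{"inline", "bogus", "pip", "inline"})
	b, err := json.Marshal(caps)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```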

§4.3 deviations from the plan. This phase was authored from the live-test finding per §16 (no pre-existing plan file). One design call worth naming: the config validator's allowedMCPAppDisplayModes set is a duplicate of the driver's validDisplayModes rather than an import, because internal/config must not depend on a concrete driver package (§4.4); a drift-mirror test (TestValidateTools_MCPAppDisplayModeAllowlistMirrors_MCPDriver) pins the two together, exactly as the transport-mode allowlist already does.
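A sketch of the duplicated allowlist, its validation, and the drift-mirror comparison, assuming plain string slices. allowedMCPAppDisplayModes follows the name in the text; driverDisplayModes, validateDisplayModes, resolveDisplayModes, and sameSet are hypothetical, and in the repository the driver's set lives in its own package with the pinning done by TestValidateTools_MCPAppDisplayModeAllowlistMirrors_MCPDriver.

```go
package main

import (
	"fmt"
	"sort"
)

// Config-side copy; internal/config may not import the driver package.
var allowedMCPAppDisplayModes = []string{"inline", "fullscreen", "pip"}

// Stand-in for the driver's validDisplayModes (only the mirror test sees both).
var driverDisplayModes = []string{"pip", "inline", "fullscreen"}

// validateDisplayModes enforces the closed set and uniqueness.
func validateDisplayModes(modes []string) error {
	allowed := map[string]bool{}
	for _, m := range allowedMCPAppDisplayModes {
		allowed[m] = true
	}
	seen := map[string]bool{}
	for i, m := range modes {
		if !allowed[m] {
			return fmt.Errorf("tools.mcp_app_host.display_modes[%d]: unknown mode %q", i, m)
		}
		if seen[m] {
			return fmt.Errorf("tools.mcp_app_host.display_modes[%d]: duplicate mode %q", i, m)
		}
		seen[m] = true
	}
	return nil
}

// resolveDisplayModes applies the documented default: nil or empty means [inline].
func resolveDisplayModes(modes []string) []string {
	if len(modes) == 0 {
		return []string{"inline"}
	}
	return modes
}

// sameSet is the core of a drift-mirror check: order-insensitive equality.
func sameSet(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(sameSet(allowedMCPAppDisplayModes, driverDisplayModes))
	fmt.Println(validateDisplayModes([]string{"inline", "inline"}))
	fmt.Println(resolveDisplayModes(nil))
}
```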

Protocol additions. None — no Harbor Protocol method, error code, event type, or wire type changed. The capability is an OUTBOUND client→server advertisement on the MCP wire (southbound), not a Harbor Protocol (northbound) surface; there is no inbound method to probe, so the smoke is static-only.

Cross-references. D-172 / D-173 (the 109a–c MCP Apps wave + the manual-handler AppBridge invariant — 109h is the capability-negotiation counterpart to the rendering surface), D-214 / D-215 / D-216 (the wave; 109e corrected the discovery-side _meta.ui placement, 109g the read-side threshold, 109h adds the host-side capability write), D-218 (the read-side render fix this complements), D-025 (concurrent-reuse — HostDisplayModes is read once at New and immutable). RFC §6.4 (Tool catalog and transports — the MCP southbound driver), §7 (the Console as the host that renders the apps). brief 14 §2 row 4 + row 31 (the capability-negotiation + extension-negotiation gaps), §3 (the roots honesty defect this PRESERVES rather than fixes — 85a's scope). CLAUDE.md §4.4 (the config↔driver allowlist duplication + mirror test), §5 (fail-loud, D-025 immutability), §6 (identity propagation on the post-handshake tool call), §10 (the new config field + example), §13 (no silent degradation — opting into the extension never silently drops roots), §17.8 (real-spec fixtures on the seam — the SDK's actual InitializeParams). Plan: docs/plans/phase-109h-mcp-apps-host-capability.md.


D-225 — Phase 109i: MCP Apps tool-context capture + mcp.apps.tool_context — the Data-Delivery backend

Date: 2026-06-16

Status: Accepted

Context. The 109 MCP Apps wave lets the Console discover (mcp.app_available, D-215/D-216), fetch (mcp.servers.read_resource, D-218), and render a ui:// MCP App in a sandboxed iframe. But a rendered app had no way to read the tool context — the input arguments + the lowered result — that produced it. The MCP Apps "Data Delivery" lifecycle (brief 14 §6) is host-pushed: the host delivers the tool call's structured data to the rendered app; the app reads its data, it does NOT re-invoke the tool (re-invoking would double a side effect). Without a runtime capture + a Protocol read, the rendered app is inert — it can render its UI shell but cannot populate it with the data of the call that summoned it.

Decision. The calls that shape the backend half:

  1. Capture at the tool-invocation site, ride the existing StateStore. internal/tools/drivers/mcp/mcp.go::callTool — the same site that emits mcp.app_available — captures {tool, input, lowered result, is_error} whenever a result declares a ui:// app, through a new optional ToolContextCapturer seam on the MCP Config. The capturer (mcpconsole.ToolContextStore) persists a StateRecord through the runtime's own StateStore — so all three persistence drivers (in-mem / SQLite / Postgres) and identity isolation come free; NO new driver, NO new migration. The record is keyed by the caller's identity triple with an EMPTY RunID (session-scoped — the read, from a rendered app, knows the session but not necessarily the producing run) under kind = "mcp.apps.tool_context/<serverID>/<toolCallID>". A cross-identity Load is not found by construction (StateStore.Load filters by the triple — brief 14 §5 security context-binding).

  2. A deterministic tool_call_id, no mutable Provider state (D-025). The id is a content hash of run | server | tool | args (length-prefixed so field boundaries cannot alias), minted in callTool with no counter and no Provider field — the Provider stays an immutable compiled artifact. It is stamped on the mcp.app_available event (alongside tool_name; the payload stays SafeSealed — ids/names are not content), on the wire MCPAppRef, and on the app-tool-call proxy projection, so a client correlates a discovered app to its captured context.

  3. The read is a new identity-scoped Protocol method, heavy-aware. mcp.apps.tool_context (ToolContextRequest → ToolContextResponse) routes through the AppsSurface dispatcher (IsMCPAppsMethod); a new protocol.AppToolContextReader seam is implemented by mcpconsole.AppsAccessor (delegating to the ToolContextStore). Each of input / result is heavy-content-aware: at WRITE a payload ≥ the heavy threshold offloads to the ArtifactStore by reference through the SAME loud-bypass path the resource read uses (refactored into a shared offloadHeavy helper, the mcp.resource_offloaded event); at READ each half projects inline OR as an artifact_ref the Console resolves through the artifacts surface — exactly the discipline read_resource and the proxy already carry. An unknown or cross-identity (server_id, tool_call_id) fails with CodeNotFound (existence is never revealed across identities).

  4. Fail loud, never silently (§13). A capture failure (store error, encode error, missing identity) is logged loudly and observable, but does NOT fail the tool call — the planner's result is the source of truth, so a capture problem must not break the agent's turn; the app's later tool-context read then returns not-found (the Console handles it as "no context"). A missing identity fails closed on both Capture and Load.

  5. Wired in assemble, mirrored in devstack + cmd/harbor (§17.6). The ToolContextStore is constructed once in internal/runtime/assemble over the runtime's StateStore + ArtifactStore + Bus, exposed on the Stack, and threaded into every MCP Provider via AttachDeps.ToolContext AND into the AppsAccessor read seam — the production path and the harbortest/devstack fixture carry the SAME wiring, so a wave-end E2E can never pass on a fixture-only fix.

§4.3 deviations from the plan. This phase was authored from the 109 live-test program per §16 (there was no pre-existing plan file). One design call worth naming: capture is co-located with discovery in the Provider rather than in a tool-dispatch wrapper, because the planner-path tool call flows through the Provider's descriptor Invoke (not through the AppsAccessor, which only handles the app-initiated proxy), and the tool_call_id + the app reference are both already in scope there — co-locating keeps the id minted once and shared by the event, the capture, and the proxy projection.

Protocol additions. One method (mcp.apps.tool_context), three wire types (ToolContextRequest / ToolContextPayload / ToolContextResponse), one new field on the existing MCPAppRef (tool_call_id), and two new fields on the existing mcp.app_available SafeSealed payload (tool_call_id + tool_name). No error code or event type added. Single-sourced in internal/protocol/{methods,types}, hand-mirrored into web/console/src/lib/protocol/mcp.ts, with make protocol-ts-gen (wire manifest) and make protocol-docs-gen (the generated Protocol reference) regenerated + committed; both generators' typeInstanceIndex + method tables extended (their lockstep tests pin the join rows).

Cross-references. D-215 / D-216 (the mcp.app_available discovery event + the tool-definition _meta.ui placement the tool_call_id rides on), D-218 (109g — the heavy-aware inline/offload pattern reused at the capture seam; the shared offloadHeavy helper extracted here), D-172 / D-173 (the 109a–c MCP Apps wave + the manual-handler AppBridge invariant — app→host stays Protocol-proxied), D-026 (the context-window safety net — the heavy-content threshold the capture honours), D-025 (concurrent-reuse — the Provider / AppsAccessor / ToolContextStore are immutable, the tool_call_id is a pure hash, per-call identity rides ctx; N=128 tests under -race), D-002 (the Go single source for wire types), D-209 (the generated Protocol docs gate this regenerates), D-223 (the TS lockstep manifest this regenerates). RFC §6.4 (Tools), §6.5 (context-window safety net), §7 (the Console as a Protocol client). CLAUDE.md §4.4 (the seam), §5 (fail-loud, D-025), §6 (identity mandatory + fail-closed + the cross-identity isolation test), §8 (Protocol single-source), §9 (persistence — the StateStore ride), §13 (no silent degradation), §17.6 (the assemble/devstack/cmd wiring parity), §17.8 (real-spec fixtures + the HARBOR_LIVE_MCP probe). Plan: docs/plans/phase-109i-mcp-apps-tool-context.md.


D-226 — Phase 109j: Console pushes tool-input/tool-result into the rendered MCP App — the Data-Delivery Console half

Date: 2026-06-17

Status: Accepted

Context. The MCP Apps "Data Delivery" lifecycle (brief 14 §6) is host-pushed: after a rendered ui:// app sends ui/notifications/initialized, the host delivers the originating tool call's INPUT arguments and RESULT into the app; the app reads its data, it does NOT re-invoke the tool (re-invoking would double a side effect). D-225 (109i) shipped the backend half — capture at the tool-invocation site plus the identity-scoped mcp.apps.tool_context read method, with the correlation tool_call_id stamped on the mcp.app_available event and the wire MCPAppRef. But the Console host never called sendToolInput / sendToolResult, so a spec-conformant app that renders from host-pushed data booted empty. This phase closes the lifecycle on the Console side, consuming the 109i surface now on main.

Decision. The Console-side calls that close the lifecycle:

  1. Delivery lives INSIDE AppBridgeHost, on the injected client only (D-173). The push is wired on the existing oninitialized callback: once the app reports ui/notifications/initialized, AppBridgeHost.#deliverToolContext() fetches mcp.apps.tool_context(serverID, toolCallID) through the injected MCPAppHostClient and pushes bridge.sendToolInput({ arguments }) THEN bridge.sendToolResult({ content, isError }) — in that ORDER (the SDK requires initialized before sendToolResult, and input-then-result is the lifecycle order). The bridge module app-bridge-host.ts issues NO raw fetch: the heavy-payload byte fetch lives in fetchArtifactText on the adapter (mcp-app-host-client.ts, outside the chat module), so the no-direct-transport invariant (D-173) covers the delivery path too — the no-direct-transport spy test is extended to assert it.

  2. Heavy-aware, fail-loud (§13, mirrors 109f/D-217). A captured input / result at or above the heavy-content threshold (D-026) rides as an artifactRef; the host resolves + fetches the bytes at the iframe edge and delivers them. A heavy INPUT is JSON-parsed into the tool arguments. When a heavy result's bytes cannot be fetched (e.g. presign unsupported on a non-S3 store), the host delivers a FAITHFUL by-reference stub text block ([artifact <id> · <n> bytes — unavailable on this store]) — never silently empty.

  3. Best-effort delivery, never a render error. The whole delivery sequence is fire-and-forget and wrapped in try/catch: a failure (the fetch rejects, the runtime errors) is logged but NEVER thrown — the app has already rendered its shell, so a delivery problem is not a render problem. A context that does not exist (toolContext → null, the adapter mapping the Runtime's CodeNotFound onto null) yields no push and no error — the app simply boots without a delivered result. Any other Protocol error is re-thrown by the adapter rather than mapped to null (fail-loud on real failures), so it reaches the host's catch and is logged.

  4. The correlation id flows event → message → renderer → host. wire-events.ts::decodeAppAvailable decodes tool_call_id (an older runtime that predates 109i capture omits it → '' → no push); +page.svelte::applyAppAvailable carries it onto the message's MCPAppRefView; mcp-app.svelte passes app.toolCallId into the AppBridgeHost options. MessageBubble.svelte already forwards the whole app object, so the id rides inside it unchanged.

§4.3 deviations. None — the implementation follows the plan. One scope note honoured: only the FINAL input + result are pushed (no sendToolInputPartial streaming, no re-push on a later tool call within the same app session) — a documented post-V1 extension; ongoing interactivity uses the app's own tools/call, already wired.

Test / Protocol notes. No Go or Protocol change — this phase is a pure consumer of the 109i surface; the toolContext client method consumes the existing ToolContextResponse wire type, so no wire-manifest regeneration. Coverage is the vitest suite: app-bridge-host.spec.ts (a fake bridge + fake injected client asserts push order, payloads, heavy resolve+fetch, the by-reference stub fallback, not-found→no-push, and the no-direct-transport spy extended to the delivery path) and mcp-app-host-client.spec.ts (the adapter's toolContext incl. not-found→null + non-not_found re-throw, and fetchArtifactText). The Playwright render path is documented-as-deferred at the top of tests/mcp-app-host.spec.ts (that spec is a deliberately bridge-free security-primitive harness; a real sendToolResult render would mean standing up a full bridge-handshake harness disproportionate to the unit coverage — the gap is named, not faked).

Cross-references. D-225 (109i — the mcp.apps.tool_context capture + read surface this consumes, and the tool_call_id correlation it rides), D-173 (the manual-handler AppBridge no-direct-transport invariant the delivery preserves — the push uses only the injected client), D-217 (109f — the heavy artifact-fetch-at-the-iframe-edge pattern reused), D-218 (109g — the inline/offload discipline the captured payloads carry), D-091 (the chat-module encapsulation — app-bridge-host.ts imports nothing from $lib/protocol; the client is injected, the raw fetch lives in the adapter), D-026 (the heavy-content threshold), D-215 / D-216 (the mcp.app_available discovery the tool_call_id rides on). RFC §6.4 (Tools), §7 (the Console as a Protocol client). brief 14 §6 (the host-pushed Data Delivery dialect), §2 rows 18–19 (the lowered structured result the app receives), §5 (identity context-binding — the delivered data is the caller's identity-scoped 109i record). CLAUDE.md §4.5 (Console/Protocol-client conventions — Svelte 5 runes, injected typed client, no hand-rolled fetch in .svelte), §5 (fail-loud / best-effort), §13 (no silent degradation — the by-reference stub, never empty), §17 (the vitest coverage + the documented Playwright gap). Plan: docs/plans/phase-109j-mcp-apps-data-delivery-push.md.


D-227 — Phase 109k: MCP Apps spec-conformance hardening — mimeTypes UI capability, server-namespaced app→host calls, and the host-obligation gaps

Date: 2026-06-17

Status: Accepted

Context. The wave-end adversarial spec-compliance review of the MCP Apps band (109a–j) found two conformance-breaking FAILs that were green against Harbor's own fixtures but inert against a real io.modelcontextprotocol/ui ext-apps server — the D-216 failure class (a self-consistent hand fixture passes while the code is wired to the wrong field). FAIL-1: 109h (D-224) advertised the UI host capability as extensions["io.modelcontextprotocol/ui"] = {"displayModes": [...]}, but displayModes is NOT a field of the spec McpUiClientCapabilities — the spec field is mimeTypes (the SDK's getUiCapability(caps).mimeTypes gate, RESOURCE_MIME_TYPE = "text/html;profile=mcp-app"). A conformant server reads mimeTypes to decide whether to register its ui:// tools; against Harbor it saw none and would not register. FAIL-2: an app-initiated tools/call passed the app's bare server-side tool name (get_weather) straight to mcp.apps.call_tool, but the Harbor catalog keys tools <source>_<tool> — so the call could not resolve, and (worse) nothing confined an app to its own server's tools. Plus a set of host-obligation gaps a conformant app relies on: the host ignored ui/notifications/size-changed, never sent ui/resource-teardown, never handled request-teardown, baked the theme to dark (no live theme / host-context-changed), and omitted host-context toolInfo / containerDimensions and the resources/templates/list handler.

Decision.

  1. FAIL-1 — the capability is the spec mimeTypes, advertised unconditionally. internal/tools/drivers/mcp/mcp.go::hostCapabilities now advertises extensions["io.modelcontextprotocol/ui"] = {"mimeTypes": ["text/html;profile=mcp-app"]} (mirroring the SDK's RESOURCE_MIME_TYPE, exported as mcp.ResourceMIMEType), and the non-spec displayModes capability payload is removed. It is advertised on EVERY initialize handshake (Harbor always hosts apps via the Console — there is no per-server opt-out), still preserving the SDK roots advertisement (RootsV2.ListChanged=true — the regression guard, since setting any capability overrides the SDK default). The negotiateDisplayModes / uiCapabilitySettings non-spec server reads are deleted; Provider.DisplayModes() now returns the deployment's configured host modes (Config.HostDisplayModes, filtered) so the Registry/Console column reports what the HOST renders, not a value scraped off the server.

  2. Config reconciliation (the key open decision) — display_modes is surfaced via runtime.info, NOT dropped. Display modes are not a capability field; the spec carries them in the ui/initialize host-context availableDisplayModes. The just-shipped 109h tools.mcp_app_host.display_modes config is therefore given a spec-correct consumer rather than silently dropped (§10 backward-compat): the configured modes are projected onto a new read-only RuntimeInfo.MCPAppDisplayModes wire field (internal/protocol/types/posture.go, set at both boot sites from cfg.Tools.MCPAppHostDisplayModes()), and the Console Playground reads it to seed the AppBridgeHost availableDisplayModes (replacing the hard-coded ['inline','fullscreen','pip']). This is the recommended option from the plan's Risks section; the field is mirrored into web/console/src/lib/protocol/settings.ts and the regenerated wire-manifest.gen.json + generated Protocol docs.

  3. FAIL-2 placement — frontend serverID prefix, NOT a Protocol server_id addition. app-bridge-host.ts::createAppHandlers.oncalltool prefixes the app-supplied bare tool name with the bridge's serverID (dispatching callTool against the qualified <serverID>_<name>) before the call. This both resolves the catalog key AND confines an app to its own server's tools for free — a cross-server or already-namespaced name is still prefixed, so it can never escape this bridge's <serverID>_ namespace. The minimal-surface choice: no Protocol method change, mcp.apps.call_tool semantics unchanged on the wire (the name is qualified host-side). A backend server_id on the method would be more defensive but a Protocol change for no added safety here.

  4. Host-obligation gaps closed (all on the injected client / the bridge — D-173 preserved). AppBridgeHost now: listens for sizechange (the SDK auto-emits size-changed from every app) and forwards it so mcp-app.svelte tracks the inline iframe height to the reported content height; sends ui/resource-teardown (bridge.teardownResource({})) BEFORE bridge.close() on unmount and handles the app's request-teardown (graceful close + an injected callback); threads the live Console theme into the ui/initialize host-context and re-pushes it via setHostContext (→ ui/notifications/host-context-changed) on a theme change (AppBridgeHost.setTheme); populates host-context toolInfo ({ id: toolCallId, tool: { name: toolName } }, with toolName threaded onto MCPAppRefView from the mcp.app_available event) and best-effort containerDimensions (the iframe box); and wires onlistresourcetemplates → a new injected MCPAppHostClient.listResourceTemplates that resolves to an empty list GRACEFULLY (no error) so the advertised serverResources capability is honestly serviceable — full resource-template support is a documented follow-up.

  5. Heavy-INPUT asymmetry (Fold A) — recorded, not silently degraded. The result-delivery path (D-226) delivers a faithful by-reference stub text block when a heavy artifact cannot be fetched; the INPUT-delivery path has no symmetric faithful stub because tool input is a Record<string,unknown> the app reads by key (there is no key-shaped stub) AND input is advisory pre-render data, not the source of truth (the result is). So on a heavy-input fetch/parse failure the host now LOGS loudly (console.warn, removing the prior silent catch { return {} }, §13) and delivers empty arguments. This is the deliberate, recorded resolution of the asymmetry — not a faithful-stub implementation.

  6. Fold A cleanups. The ToolCallID godoc is corrected (it newline-SEPARATES fields; it does not length-prefix). The harbortest/devstack AppsSurface construction is made fail-loud (parity with cmd/harbor/cmd_dev.go): it gates on the MCP/catalog band being present (the same MCPRegistry signal the sibling MCP-surface block uses — under SkipCatalog the whole band is legitimately nil), and WHEN the band is present constructs the accessor fail-loud instead of behind a per-dep non-nil guard. The catalog band always builds the catalog + artifact store + tool-context store alongside the registry, so a nil sub-dep within a present band is a real wiring regression the prior multi-nil guard would have silently masked (§17.6).
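Items 3 and 5 describe two small host-side rules. The real code lives in app-bridge-host.ts (TypeScript), so the Go sketch below only restates the rules under stated assumptions. `qualifyToolName` and `heavyInputArguments` are hypothetical names, and the injected `logf` callback stands in for `console.warn`. The first function always prefixes the app-supplied name, so the result cannot leave the bridge's `<serverID>_` namespace. The second function logs loudly and falls back to empty arguments when a heavy input cannot be parsed.

```go
package main

import "encoding/json"

// qualifyToolName restates the oncalltool rule from item 3 (hypothetical name).
// The app-supplied tool name is ALWAYS prefixed with the bridge's serverID,
// even if it already looks namespaced, so the result can never escape this
// bridge's "<serverID>_" namespace.
func qualifyToolName(serverID, appSuppliedName string) string {
	return serverID + "_" + appSuppliedName
}

// heavyInputArguments restates the item-5 decision (hypothetical name).
// Tool input is advisory pre-render data, so a parse failure is logged loudly
// through logf (a stand-in for console.warn) and empty arguments are
// delivered, instead of being silently swallowed.
func heavyInputArguments(raw []byte, logf func(format string, args ...any)) map[string]any {
	var args map[string]any
	if err := json.Unmarshal(raw, &args); err != nil {
		logf("heavy tool input could not be parsed, delivering empty arguments: %v", err)
		return map[string]any{}
	}
	if args == nil { // a JSON null decodes without error into a nil map
		logf("heavy tool input was null, delivering empty arguments")
		return map[string]any{}
	}
	return args
}
```

The sketch prefixes unconditionally and does not attempt to detect an existing prefix. As item 3 notes, that unconditional prefix is what makes cross-server confinement free.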

§4.3 deviations. (a) The FAIL-1 revert-guard is implemented as a CI-RUNNING real-SDK conformance probe (TestConformance_RealSDKServer_GatesUIToolOnMimeTypes: a go-sdk server reads the capabilities Harbor's client actually advertised during initialize, applies the real getUiCapability(caps).mimeTypes gate, registers its ui:// tool only on a pass, and the test asserts Harbor discovers it) rather than only an env-gated external probe. An in-memory real-SDK server has no API cost, so it runs in CI unconditionally — strictly stronger than the env-gated probe the plan describes, and §17.8-faithful (derived from the official package's wire behaviour, not a hand blob). The external-binary HARBOR_LIVE_MCP probe (TestLive_MCPAppAvailable_RealExtAppsServer) remains for the end-to-end discovery path. (b) The live Console theme toggle is OS-prefers-color-scheme-resolved in the Playground (the Console's profile-level theme toggle is deferred per 108b) — a genuine live signal that re-pushes host-context-changed, tested at the AppBridgeHost seam.
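The conformance probe in deviation (a) depends on a round trip. The host advertises the UI extension, and a server registers its ui:// tool only when the advertised `mimeTypes` include the app MIME type. The Go sketch below shows both halves using `encoding/json` over plain maps rather than the go-sdk types. `hostCapabilitiesJSON`, `serverWouldRegisterUITool`, and `advertisesRootsListChanged` are hypothetical. The `roots.listChanged` field follows the MCP client-capability shape and is not copied from Harbor's code.

```go
package main

import "encoding/json"

// Values quoted in item 1.
const (
	appMIMEType    = "text/html;profile=mcp-app"
	uiExtensionKey = "io.modelcontextprotocol/ui"
)

// hostCapabilitiesJSON builds the client capability payload (hypothetical helper).
// The UI extension is advertised unconditionally, and roots.listChanged=true is
// kept because setting any capability overrides the SDK default.
func hostCapabilitiesJSON() ([]byte, error) {
	caps := map[string]any{
		"roots": map[string]any{"listChanged": true},
		"extensions": map[string]any{
			uiExtensionKey: map[string]any{"mimeTypes": []string{appMIMEType}},
		},
	}
	return json.Marshal(caps)
}

// serverWouldRegisterUITool is a hypothetical Go port of the server-side gate
// getUiCapability(caps).mimeTypes: register the ui:// tool only on a match.
func serverWouldRegisterUITool(rawCaps []byte) bool {
	var caps struct {
		Extensions map[string]struct {
			MimeTypes []string `json:"mimeTypes"`
		} `json:"extensions"`
	}
	if err := json.Unmarshal(rawCaps, &caps); err != nil {
		return false
	}
	ui, ok := caps.Extensions[uiExtensionKey]
	if !ok {
		return false
	}
	for _, mt := range ui.MimeTypes {
		if mt == appMIMEType {
			return true
		}
	}
	return false
}

// advertisesRootsListChanged is the regression guard from item 1: the roots
// advertisement must survive alongside the UI extension.
func advertisesRootsListChanged(rawCaps []byte) bool {
	var caps struct {
		Roots *struct {
			ListChanged bool `json:"listChanged"`
		} `json:"roots"`
	}
	if err := json.Unmarshal(rawCaps, &caps); err != nil || caps.Roots == nil {
		return false
	}
	return caps.Roots.ListChanged
}
```

A payload that carries only the old non-spec `displayModes` key fails this gate, which is the regression the probe is meant to catch.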

Sanctioned deviations preserved. D-173 (connect-src 'none' is NOT relaxed for a server-declared CSP connectDomains — all app traffic stays bridge-proxied; the new delivery/teardown/size/theme paths use only the injected client / the bridge, asserted by the extended no-direct-transport spy), D-224's deployment-declaration intent (the config still drives the host's renderable modes, now via the spec-correct slot), D-225 (the durable tool-context store), D-218 (the app-doc inline cap).

Test / Protocol notes. Go: the capability mimeTypes-shape + roots-preserved unit test, the real-SDK two-provider handshake echo test, DisplayModes() from config, the runtime.info MCPAppDisplayModes projection test, and the CI-running conformance gate. Frontend vitest: oncalltool bare-name prefix + cross-server confinement, onlistresourcetemplates graceful empty, the AppBridgeHost host-context toolInfo/containerDimensions/theme init, setTheme push-on-change, size-changed→onSizeChanged, teardown-before-close + idempotency, request-teardown→close+callback, availableDisplayModes reaching the host-context, and the no-direct-transport spy extended to the new paths; the adapter's listResourceTemplates. The RuntimeInfo wire change regenerated wire-manifest.gen.json + the generated Protocol docs and was mirrored into settings.ts (the make protocol-ts-gen-check / make protocol-docs-gen-check gates pass). A binding pre-merge gate remains: the orchestrator live-tests the full MCP Apps surface against the test agent + Console (the surface worked pre-109 — this proves 109h–k did not regress it).

Cross-references. D-224 (109h — the capability this corrects: displayModes → spec mimeTypes; roots-preservation kept), D-216 (the fix-the-field-vs-real-server lesson this closes for the capability + tool-call paths), D-225 (109i — the tool-context backend whose tool_call_id the host-context toolInfo now also carries), D-226 (109j — the Data-Delivery push whose result-path stub the heavy-input decision mirrors-by-contrast), D-218 (109g — the inline/offload discipline), D-173 (the manual-handler no-direct-transport invariant + the connect-src 'none' divergence held), D-091 (chat-module encapsulation — app-bridge-host.ts imports nothing from $lib/protocol; the client is injected, the new listResourceTemplates lives on the injected surface), D-093 / D-223 (the hand-maintained-but-lockstep-gated Console wire client the new RuntimeInfo field is mirrored into), D-209 (the generated Protocol docs regenerated), D-002 (the Go single source for the new wire field), D-026 (the heavy-content threshold the delivery paths honour). RFC §6.4 (Tools), §7 (the Console as a Protocol client). brief 14 §2–3 (extension negotiation in the shape the spec reads + the roots-honesty bar), §6 (the AppBridge host↔view dialect: size-changed, host-context-changed, teardown, tool-input/result). CLAUDE.md §4.4 (the seam), §4.5 (Console/Protocol-client conventions — Svelte 5 runes, injected typed client, no hand-rolled fetch, the lockstep manifest), §5 (fail-loud), §8 (Protocol single-source), §10 (config backward-compat — the display_modes field re-homed, not dropped), §13 (no silent degradation — the heavy-input log, the devstack fail-loud), §17.6 (the devstack/cmd parity fix), §17.8 (real-spec fixtures + the HARBOR_LIVE_MCP probe). Plan: docs/plans/phase-109k-mcp-apps-conformance-hardening.md.


D-228 — Phase 87: durable TaskService backend — a StateStore-backed TaskRegistry driver over an extracted shared engine

Date: 2026-06-18

Status: Accepted

Context. Background- and foreground-task records (tasks, groups, patches) did not survive a Runtime restart: the in-process TaskRegistry driver wrote every lifecycle transition through the StateStore, but (a) it never reloaded those records on open, and (b) it keyed every record under a single fixed (identity, Kind) slot per type (task.lifecycle / task.group / task.patch), so each Save clobbered the previous record — the write-through was vestigial, never a recoverable log. This closes D-006 (background-task persistence deferred to post-V1). RFC §6.8 illustrates post-V1 durable backends as "Postgres-as-queue / NATS JetStream"; this phase instead ships single-instance restart-survival over the existing StateStore triad — the lower-risk, DRY path the durable events driver (Phase 57) already proved, and orthogonal to the distributed-queue concern (Phase 86). A queue-backed driver remains a valid later driver behind the unchanged seam.

Decision.

  1. Extract a shared internal/tasks/engine package; inprocess and durable are thin wrappers. The full task/group/patch lifecycle state machine (FSM, idempotency dedup, cascade-cancel, group seal/resolve/fail-fast, retain-turn waiters, WatchGroup fan-out) moves verbatim into internal/tasks/engine as *Engine, parameterized by a Backend persistence seam. Both drivers construct an *Engine over a backend and return it (*Engine satisfies tasks.TaskRegistry). Rationale: the alternative — a standalone durable driver duplicating ~2000 lines of lifecycle logic (the literal "mirror events/drivers/durable" framing of the plan) — creates a maintenance fork that drifts, and a durable-imports-inprocess shortcut would leave a stale "no I/O" assumption sitting next to shared code. A neutrally-named engine package signals "this must hold for any backend, including a slow one." The shared conformance suite (D-031) runs against the engine and BOTH drivers, so the single state machine stays contract-locked.

  2. Backend seam. SaveTask(ctx, TaskRecord) / SaveGroup / SavePatch / Hydrate(ctx) (Snapshot, error). The engine never imports the StateStore; it persists only through the backend and rebuilds its indices from Hydrate in New. TaskRecord carries the task plus the engine's idempotency content hash out-of-band — the hash is computed over PRE-redaction content and cannot be re-derived from the post-redaction fields the task stores, so it is persisted explicitly (hex) and replayed; recomputing from stored fields would falsely reject a genuine retry after a restart when the redactor had erased caller tokens. The in-process driver's ephemeralBackend writes the same fixed-Kind slots as before (byte-identical behaviour) and Hydrate returns empty; the durable driver's backend keys each record in its own slot and replays it.

  3. Per-record keying + maintenance-scan hydrate. The durable backend writes task.durable.task/<id>, task.durable.group/<id>, task.durable.patch/<id> (disjoint from the durable event log's events.durable.* and the in-process task.lifecycle Kinds), each under the record's session-scoped identity (RunID dropped from the key; the full identity is preserved in the bytes). Hydrate scans by Kind prefix via StateStore.ListKind under an explicit MaintenanceScoped claim — a boot-time maintenance read; each record's bytes carry its own identity, so cross-identity scanning never widens the isolation boundary (the engine re-keys everything by the record's own identity).

  4. Shared StateStore, NOT owned StateDriver/StateDSN config (§4.3 deviation). The plan called for adding StateDriver/StateDSN to TasksConfig mirroring EventsConfig. Dropped as unnecessary: tasks.Open ALWAYS passes the runtime's shared deps.Store (internal/runtime/assemble), so the durable driver simply uses it; the owned-store machinery exists in events only because events.OpenDriver can be called without deps. The config change reduces to adding durable to allowedTasksDrivers. Cross-process survival therefore requires a durable state.driver (sqlite/postgres); with state.driver: inmem records survive only an in-process driver reopen — documented in docs/CONFIG.md, the example configs, and the define-the-agent-yaml skill.

  5. Recovery posture: Failed{code: "runtime_restarted"}, no FSM/wire change (settled with the operator). The open-time recovery sweep (Engine.RecoverInterruptedTasks, run by the durable driver in New) transitions every task left StatusRunning by a crash to the existing StatusFailed with the reserved error code runtime_restarted, emitting one task.failed event each — rather than widening the FSM/Protocol enum with a dedicated StatusRecovered. It recovers the record, not execution (auto-re-drive is a deferred runloop/steering concern, D-097). StatusPaused (a durable HITL/OAuth wait by design) and StatusPending (never started; still discoverable) are left untouched. It reuses the normal terminal-transition path, so a recovered group member correctly drives the group resolve gate (fail-fast cascade / sealed-group resolution). After the Running sweep it ALSO re-evaluates every non-terminal group's resolve gate against current member terminality (reconcileGroupsLocked) — healing a group whose resolution was computed in a prior session but whose group-record persist failed before the crash (members durably terminal, group durably non-terminal); without this, a diverged Sealed group would never re-resolve and a post-restart WatchGroup would never fire (caught by the second adversarial pass; the plan's Risks section requires exactly this recompute). Idempotent on re-open: a recovered task is persisted as Failed and a reconciled group as terminal, so a second open is not a sweep candidate and never double-writes or double-emits.

  6. No-StateStore posture: fail loud at boot (settled with the operator). Selecting tasks.driver: durable with no StateStore wired returns a boot error naming the missing state.driver rather than silently degrading to non-durable behaviour — an operator who asked for durability must get it (§13 "no silent stub default"). The default/unset path stays inprocess and needs no store. (This intentionally diverges from the durable events driver, which degrades-loudly to a ring buffer; tasks fail closed.)

  7. tasks.ErrUnserializable (new additive sentinel). A record that cannot be marshalled raises a wrapped tasks.ErrUnserializable from both backends (never a silent drop / nil record) — the §5 fail-loud contract. Additive only; the frozen TaskRegistry interface is unchanged.

Known properties. (1) The engine persists while holding its single RWMutex, so a durable backend serializes task mutations on that engine instance behind each StateStore write. Acceptable for the single-instance restart-survival this phase targets, and consistent with the durable event log (which persists under publishMu); a finer-grained scheme is a later concern if contention is measured. (2) Hydrate eagerly loads every persisted task/group/patch (across all identities, via a maintenance-scoped prefix scan) into the engine's in-memory maps on open — bounded by the live record set, fine for single-instance deployments, and the natural consequence of the engine being an in-memory state machine; a lazy/paged hydrate or a retention/GC policy for terminal records is the scaling follow-up (the durable event log accumulates the same way). (3) Error-path atomicity (added after the adversarial review): a mutating call that fails to persist rolls its in-memory mutation back (status / result / error / priority / tool-count), and a spawn whose group persist fails compensates the already-written task record through Backend.DeleteTask — and the group itself is pre-validated (exists / same-session / open) BEFORE the task is persisted — so a restart never replays a half-applied mutation or resurrects a half-wired spawn the caller was told failed. (4) Two recorded boundaries: AcknowledgeBackground state is in-memory only (no Acknowledged field on the persisted task record), so a previously-acknowledged terminal background task reads as un-acknowledged after a restart and can be re-acked (re-emitting task.background_acknowledged) — a documented limitation; durable ack is a follow-up. And a DOUBLE fault (group persist fails AND the compensating Backend.DeleteTask also fails) surfaces the compounded error but can leave a Pending orphan task in the store that resurrects on restart (benign — recovery ignores Pending, it is only List-discoverable) — the irreducible residue of a non-transactional per-record store.

§4.3 deviations. (a) Engine extraction instead of a standalone durable driver (item 1). (b) Shared StateStore instead of StateDriver/StateDSN config fields (item 4). Both are documented simplifications that still satisfy every acceptance criterion; neither reaches into RFC territory.

Tests. The durable driver passes internal/tasks/conformancetest.Run verbatim (the D-031 gate); restart-survival (tasks + groups + patches intact after close→reopen over the same store, identity isolation preserved, idempotency-key dedup preserved across restart); recovery sweep (Running→Failed{runtime_restarted}, exactly one event, idempotent on a second reopen, Paused/Pending untouched); the D-025 concurrent-reuse gate (N=128 against one shared driver under -race) + a goroutine-leak test; fail-loud (nil StateStore at boot, ErrUnserializable on a malformed record); and the §17 integration test test/integration/durable_tasks_test.go over REAL StateStore drivers (in-memory + a file-backed SQLite true restart) with a real EventBus on the seam, identity propagation, and a forced write-error failure mode — all under -race. The engine carries its own conformance + recovery + nil-arg + persist-error tests.

Cross-references. D-006 (background-task persistence deferred — closed here), D-074 / D-064 (the durable event log precedent + the Evaluations program durability rationale this mirrors), D-031 (the shared TaskRegistry conformance gate every driver inherits), D-025 (compiled-artifact concurrent-reuse contract — the engine is the reusable artifact), D-097 (the dead-task / re-drive lineage recovery defers to), D-027 (the typed-wrapper-over-StateStore persistence pattern). RFC §6.8 (TaskRegistry), §12 (durable backends roadmap). brief 05 (unified foreground/background task namespace, at-least-once idempotency on (TaskID, Edge, EventID)). CLAUDE.md §4.4 (the driver seam), §5 (fail-loud, immutable compiled artifacts), §6 (identity isolation — scoping unchanged), §13 (no silent stub default, no parallel implementations — the engine extraction is the "pick one and deepen it" response), §17 (the real-driver integration test). Plan: docs/plans/phase-87-durable-taskservice-backend.md.


D-229 — Phase 86: durable distributed bus backend — a StateStore-backed MessageBus driver with cross-instance fan-out

Date: 2026-06-18

Status: Accepted

Context. The MessageBus (RFC §6.12) is the at-least-once cross-worker fan-out edge. V1 shipped the contract + an in-process loopback driver (Phase 22) but no durable backend — RFC §6.12 / §12 name "NATS, Redis Streams, Postgres-as-queue" as the post-V1 driver set, and D-009 deferred a durable backend to "post-V1 once the operational shape is clear." Phase 86 ships the first durable driver. The MessageBus is a publish-only contract (Publish + Close); consumption is via projection onto the typed events.EventBus (the loopback driver established this — a distributed.bus_envelope event per publish). Note the seam is still contracts-only in production: nothing in the runtime opens a MessageBus yet (no OpenBus in assemble/cmd), so like loopback, the durable driver is registered + conformance/integration-tested, ready for when a production bus consumer lands.

Decision.

  1. StateStore-backed driver, NATS / Redis deferred (the operator-steered scope). internal/distributed/drivers/durable persists every BusEnvelope as one StateStore record keyed by a fresh time-ordered ULID under distributed.bus.entry/<ulid> (disjoint from the durable event log's events.durable.* and the durable task driver's task.durable.*), and projects it onto the local events.EventBus. On a shared Postgres store this is Postgres-as-queue across instances; on SQLite it is single-instance restart-replay. NATS / Redis Streams remain future drivers in the same set — each pulls a new client dependency, which RFC §10 requires be added via an RFC PR first; this driver reuses the existing pgx / modernc.org/sqlite deps (no new dependency). Mirrors the Phase 57 durable events + Phase 87 durable tasks precedent.

  2. The bus-projection contract is promoted to the distributed package. EventTypeDistributedBusEnvelope + BusEnvelopePayload (and the events.RegisterEventType registration) moved from internal/distributed/drivers/loopback into internal/distributed/projection.go. Both drivers (and the conformance suite + the protocol-docs generator) reference the one contract, so the durable driver projects the IDENTICAL event a loopback-era subscriber already consumes — and no driver imports another driver to reach it (the same "no smell in package decisions" discipline as the Phase 87 engine extraction, D-228). The event-type STRING (distributed.bus_envelope) is unchanged, so the generated Protocol docs are byte-identical.

  3. Cross-instance + restart-replay via a background poller; self-dedup in memory. A poller goroutine (started at New, joined at Close) scans the shared store (ListKind, maintenance-scoped, over the entry prefix) on a ticker and projects every entry it has not already delivered onto the local event bus, in ULID order. The instance that publishes an envelope reserves its entry key in an in-memory projected set BEFORE persisting + projects it locally immediately, so the poller never re-projects this instance's own publish. The projected set is NOT persisted: across a restart it is empty, so a fresh instance re-projects the persisted history (restart-replay) — at-least-once, consumers dedupe on (TaskID, Edge, EventID) per the contract. Cross-instance delivery: instance B's poller projects instance A's entries (not in B's set). Poll cadence is the operator-tunable distributed.bus_poll_interval (default 1s). A Postgres LISTEN/NOTIFY push fast-path is a recorded future optimization behind this same driver (the StateStore exposes no change-notification primitive, so delivery is poll-based here).

  4. Shared store via Dependencies.State; no production wiring in this phase. distributed.Dependencies gains a State state.StateStore field (the durable driver reads it; loopback ignores it), mirroring Phase 87's shared-store decision (D-228). Because the MessageBus seam has no production consumer yet (no OpenBus call in assemble/cmd), there is nothing to wire at boot — the field + the bus_poll_interval config are ready for when a production bus consumer lands. The plan's "wire State at OpenBus in assemble.go" line is therefore dropped (no such call exists); the driver is exercised by the conformance suite + the §17 integration test, exactly as loopback is.

  5. Fail loud (§13). New returns an error when no StateStore (or no EventBus) is wired — an operator who selected durable must get durability, never a silent non-durable fallback. A non-serializable envelope fails the Publish loudly (wrapped marshal error), never a silent drop.
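Items 1, 3 and 5 can be condensed into a Go sketch of the publish / poll / self-dedup loop. Everything in it is a hypothetical stand-in. A mutex-guarded map replaces the shared StateStore. A zero-padded counter replaces the time-ordered ULID, and both sort lexically in publish order. A callback replaces the local events.EventBus. An explicit `pollOnce` replaces the ticker goroutine so the behaviour is deterministic. The key point is that `publish` reserves its key in the in-memory `projected` set BEFORE persisting, and that set is not persisted.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
	"sync"
)

const entryPrefix = "distributed.bus.entry/"

type kv struct{ key, val string }

// sharedStore stands in for one StateStore shared by several instances.
type sharedStore struct {
	mu      sync.Mutex
	seq     int
	entries map[string]string
}

func newSharedStore() *sharedStore { return &sharedStore{entries: map[string]string{}} }

// nextKey stands in for a fresh ULID; zero-padding keeps lexical order == publish order.
func (s *sharedStore) nextKey() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.seq++
	return fmt.Sprintf("%s%020d", entryPrefix, s.seq)
}

func (s *sharedStore) put(k, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries[k] = v
}

// listKind returns every entry under prefix, sorted by key.
func (s *sharedStore) listKind(prefix string) []kv {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out []kv
	for k, v := range s.entries {
		if strings.HasPrefix(k, prefix) {
			out = append(out, kv{k, v})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].key < out[j].key })
	return out
}

// bus is one instance; projected is the in-memory, NOT persisted, dedup set.
type bus struct {
	store     *sharedStore
	project   func(envelope string)
	mu        sync.Mutex
	projected map[string]bool
}

// newBus fails loud when the store or the event sink is missing (item 5).
func newBus(store *sharedStore, project func(string)) (*bus, error) {
	if store == nil || project == nil {
		return nil, errors.New("durable bus: StateStore and EventBus are required")
	}
	return &bus{store: store, project: project, projected: map[string]bool{}}, nil
}

// publish reserves the key BEFORE persisting, then projects locally at once,
// so this instance's own poller never re-projects the envelope.
func (b *bus) publish(envelope string) {
	key := b.store.nextKey()
	b.mu.Lock()
	b.projected[key] = true
	b.mu.Unlock()
	b.store.put(key, envelope)
	b.project(envelope)
}

// pollOnce projects every entry this instance has not delivered yet, in key order.
func (b *bus) pollOnce() {
	for _, e := range b.store.listKind(entryPrefix) {
		b.mu.Lock()
		seen := b.projected[e.key]
		b.projected[e.key] = true
		b.mu.Unlock()
		if !seen {
			b.project(e.val)
		}
	}
}
```

A fresh instance starts with an empty `projected` set, so its first poll re-projects the whole persisted history. That is the at-least-once restart-replay described in item 3, and consumers are expected to dedupe.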

Known properties. (1) The poller does a full prefix scan (ListKind) each interval and tracks an in-memory projected set — O(entries) per poll, growing memory; fine for single-instance + modest Postgres multi-instance, with a retention/GC policy for delivered entries as the documented scaling follow-up (the durable event log accumulates the same way; the StateStore contract has no range scan). (2) A restart re-projects the full persisted history (at-least-once; consumers dedupe) rather than resuming from a persisted cursor — the simplest correct model for V1; a persisted per-instance cursor is a future refinement. (3) No production bus consumer exists yet, so a live-Console test is structurally N/A; the real-driver §17 integration test (two instances over one file-backed SQLite store: cross-instance fan-out + a true close/reopen restart-replay) is the end-to-end gate.

§4.3 deviations. (a) StateStore-backed only / defer NATS-Redis (item 1; the operator's steer). (b) Projection-contract promotion to the distributed package (item 2; not in the plan's file list, adopted for the no-driver-imports-driver discipline). (c) No assemble.go wiring — the seam is unconsumed in production (item 4). (d) The live test is a real-process / real-driver integration test, not a Console test (the bus has no Console/Protocol/dev surface).

Tests. The durable driver passes internal/distributed/conformancetest.RunBus verbatim (the D-031 gate: at-least-once local delivery, mandatory identity, publish-after-close, the 128-worker no-race run, goroutine-leak-after-close); restart-replay + cross-instance fan-out (with a no-self-double-project assertion); the D-025 concurrent-reuse + cross-session-isolation gate (3 sessions × 40 = 120 concurrent publishers on one shared bus, each session's subscriber receives exactly its own) + a goroutine-leak test; fail-loud (nil store, nil event bus, unserializable envelope, ctx-cancelled, persist-error); poller error branches (ListKind error, corrupt entry) via an internal test; and the §17 integration test test/integration/durable_bus_test.go over REAL StateStore drivers (in-memory + a file-backed SQLite modelling two instances over one store) with real EventBus instances, identity isolation, and a forced write-error failure mode — all under -race. Coverage on internal/distributed/drivers/durable is 97%.

Cross-references. D-009 (durable distributed backend deferred — realised here), D-031 (the shared MessageBus conformance gate every driver inherits + the (TaskID, Edge, EventID) idempotency contract), D-074 (the durable event log precedent this mirrors — head/entry StateStore keying, fail-loud-on-empty-store), D-228 (the Phase 87 durable tasks sibling: shared-store decision + the no-driver-imports-driver discipline), D-025 (compiled-artifact concurrent-reuse contract — the bus is the reusable artifact), D-027 (the typed-wrapper-over-StateStore persistence pattern). RFC §6.12 (the MessageBus / BusEnvelope contract), §12 (durable backends roadmap). brief 05 (distributed contracts; at-least-once idempotency on (TaskID, Edge, EventID) — brief 05 Q-4). CLAUDE.md §4.4 (the driver seam), §5 (fail-loud, immutable compiled artifacts), §6 (identity isolation), §13 (no silent stub default, no parallel implementations — the projection-contract promotion is the "pick one and deepen it" response), §17 (the real-driver integration test). Plan: docs/plans/phase-86-durable-distributed-bus.md.

Apache-2.0 licensed — see LICENSE.