Configure memory + runtime skills
Two subsystems that look similar but solve different problems:
- Memory: multi-turn context within a session. Lets the agent remember what it said three turns ago without re-reading the whole event log every step.
- Runtime skills: token-savvy, DB-backed playbooks the planner can search by name and inject into a prompt mid-reasoning. Distinct from "operator skills" (the docs/skills/ directory you're reading): runtime skills are a mechanism inside the planner, not docs for humans.
Both subsystems share a key contract: identity-scoped by (tenant, user, session) — the same multi-isolation triple that gates everything else in Harbor. No cross-session leakage. Ever.
1. Memory — strategies + drivers
Memory has two axes you tune independently:
- Strategy (memory.strategy): how the planner uses memory each turn.
- Driver (memory.driver): where memory is stored.
Strategies
| Strategy | When to use |
|---|---|
| none (default) | Single-turn agents. No memory; each run starts cold. |
| truncation | Chat agents with short windows. Keep last N messages; drop older verbatim. |
| rolling_summary | Long-running chat agents. Summarise older turns; keep recent N verbatim. |
rolling_summary is the sweet spot for chatbots — it preserves the conversation arc without blowing the context window. The summariser is the same LLM as the planner (Bifrost reuses the configured provider).
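To make the difference between truncation and rolling_summary concrete, here is a minimal Python sketch of how each one shapes the replayed context. It is illustrative only and not Harbor's implementation: the function names, the keep_recent parameter, and the stub summariser are assumptions (a real summariser would call the configured LLM).

```python
from typing import Callable, List


def truncation(turns: List[str], keep_recent: int) -> List[str]:
    """Keep the last `keep_recent` turns verbatim; drop everything older."""
    if keep_recent <= 0:
        return []
    return turns[-keep_recent:]


def rolling_summary(
    turns: List[str],
    keep_recent: int,
    summarise: Callable[[List[str]], str],
) -> List[str]:
    """Fold older turns into one summary message; keep recent turns verbatim."""
    if keep_recent <= 0:
        older, recent = turns, []
    else:
        older, recent = turns[:-keep_recent], turns[-keep_recent:]
    if not older:
        return recent  # nothing old enough to summarise yet
    return [summarise(older)] + recent


def stub_summarise(older: List[str]) -> str:
    # Hypothetical stand-in for the LLM-backed summariser.
    return f"[summary of {len(older)} turns]"


turns = [f"turn {i}" for i in range(1, 19)]
truncated = truncation(turns, keep_recent=6)
summarised = rolling_summary(turns, keep_recent=6, summarise=stub_summarise)
```

With 18 turns and the last 6 kept verbatim, truncation simply forgets turns 1 to 12, while rolling_summary replaces them with a single summary message ahead of the same 6 recent turns.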
Drivers
| Driver | When to use |
|---|---|
| inmem | Dev. Memory dies on harbor dev restart. |
| sqlite | Single-node production. Survives restarts. Default for self-hosted agents. |
| postgres | Multi-replica production. Use behind a load balancer. |
Example: chat agent with rolling summary on SQLite
memory:
  driver: sqlite
  dsn: /tmp/harbor-validation/my-agent-memory.sqlite   # outside the project dir (WAL trap)
  strategy: rolling_summary
  budget_tokens: 8000        # max tokens the planner replays per turn (0 = unbounded)
  recovery_backlog_max: 16   # bounded queue for the summariser's recovery loop (default 16)

budget_tokens is the hard cap: once a conversation exceeds it, older turns are summarised together into one assistant-role message while recent turns stay verbatim. The planner sees: [summary of turns 1-12] [turn 13] [turn 14] ... [turn 18]. recovery_backlog_max bounds the rolling_summary recovery loop's queue; on overflow it drops the oldest and emits memory.recovery_dropped. Both knobs are ignored by the none and truncation strategies.
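The drop-oldest behaviour of the recovery backlog can be sketched as a bounded FIFO. This is a hedged illustration, not Harbor's code: the RecoveryBacklog class and its events list are hypothetical; only the memory.recovery_dropped event name and the default of 16 come from the text above.

```python
from collections import deque
from typing import Deque, List, Tuple


class RecoveryBacklog:
    """Bounded FIFO: on overflow, drop the oldest item and record an event."""

    def __init__(self, max_items: int = 16) -> None:
        if max_items < 1:
            raise ValueError("max_items must be at least 1")
        self.max_items = max_items
        self.items: Deque[str] = deque()
        # Hypothetical event sink; a real system would emit telemetry instead.
        self.events: List[Tuple[str, str]] = []

    def push(self, item: str) -> None:
        if len(self.items) >= self.max_items:
            dropped = self.items.popleft()  # the oldest entry goes first
            self.events.append(("memory.recovery_dropped", dropped))
        self.items.append(item)


backlog = RecoveryBacklog(max_items=3)
for job in ["a", "b", "c", "d", "e"]:
    backlog.push(job)
```

Pushing five jobs into a queue of three leaves the three newest in place and records one drop event for each of the two oldest.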
Opt-in semantic retrieval
memory.retrieval: semantic layers embedding-similarity search ON TOP of the strategy you picked above — it composes with rolling_summary, never replaces it. Turns are embedded as they land (AddTurn) and a SearchTurns surface ranks them by cosine; GetLLMContext keeps its normal summary + recent-turn patch. Vectors persist identity-scoped through the same state store, on all three drivers.
memory:
  driver: sqlite
  dsn: /tmp/harbor-validation/my-agent-memory.sqlite
  strategy: rolling_summary
  retrieval: semantic          # opt-in; composes with the strategy
  retrieval_top_k: 5           # optional result cap (default 5)
  retrieval_min_score: 0.0     # cosine similarity floor [-1, 1]; 0.0 is the default
embeddings:                    # REQUIRED when any retrieval is semantic
  provider: openai
  model: text-embedding-3-small
  api_key: env.OPENAI_API_KEY

The embeddings: block is the embedding model/provider pair, configured separately from the chat llm block (they routinely come from different providers). Enabling a semantic mode without it fails validation loudly, naming the missing keys; there is no silent fallback to non-semantic retrieval and no mock embeddings driver.
When memory.retrieval: semantic is set, the run loop calls SearchTurns on every task, applies the retrieval_min_score floor, deduplicates against the recent-turn window, caps each recalled turn at 2 KiB per side, and injects the result into the prompt's <read_only_external_memory> tier. A SearchTurns error fails the run loudly (runtime_fetch_error); there is no silent fallback to summary-only.
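The recall steps described above (rank by cosine, apply the floor, deduplicate against recent turns, cap each side) can be sketched in a few lines of Python. This is a rough sketch under stated assumptions, not the run loop itself: the recall function, the tuple layout of stored turns, reading "per side" as the user text and the assistant text, and treating the floor as inclusive are all assumptions.

```python
import math
from typing import List, Sequence, Set, Tuple

MAX_SIDE_BYTES = 2048  # 2 KiB cap applied to each side of a recalled turn

# Hypothetical layout: (turn_id, embedding, user_text, assistant_text)
Turn = Tuple[str, Sequence[float], str, str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cap_bytes(text: str, limit: int = MAX_SIDE_BYTES) -> str:
    """Truncate to at most `limit` UTF-8 bytes without splitting a character."""
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def recall(
    query_vec: Sequence[float],
    stored: List[Turn],
    recent_ids: Set[str],
    top_k: int = 5,
    min_score: float = 0.0,
) -> List[Tuple[str, str, str]]:
    scored = [
        (cosine(query_vec, vec), turn_id, user, assistant)
        for turn_id, vec, user, assistant in stored
    ]
    # Rank by similarity and keep the top_k candidates.
    ranked = sorted(scored, key=lambda r: r[0], reverse=True)[:top_k]
    # Apply the score floor, skip turns already in the recent window, cap sizes.
    return [
        (turn_id, cap_bytes(user), cap_bytes(assistant))
        for score, turn_id, user, assistant in ranked
        if score >= min_score and turn_id not in recent_ids
    ]


stored = [
    ("t1", [1.0, 0.0], "deploy failed", "check the logs"),
    ("t2", [0.0, 1.0], "lunch?", "sure"),
    ("t3", [0.9, 0.1], "x" * 5000, "ok"),
    ("t4", [-1.0, 0.0], "unrelated", "noted"),
]
recalled = recall([1.0, 0.0], stored, recent_ids={"t1"}, top_k=3, min_score=0.5)
```

In this toy data only t3 survives: t1 is already in the recent window, t2 falls below the 0.5 floor, and t4 never makes the top 3. Its 5000-byte user text is cut to 2048 bytes.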
Identity scoping
Every memory write/read is keyed by (tenant_id, user_id, session_id). The planner cannot read user A's memory from user B's session — the SQL WHERE clause filters before the rows reach the planner. This is enforced at the driver level, not at the planner; even a buggy planner cannot leak cross-session.
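A minimal sketch of what driver-level scoping looks like, using Python's built-in sqlite3 module. The table name, column layout, and helper functions are hypothetical and not Harbor's schema; the point is only that the identity triple sits in the WHERE clause of every read, so rows from another session never leave the driver.

```python
import sqlite3
from typing import List, Tuple

Identity = Tuple[str, str, str]  # (tenant_id, user_id, session_id)


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE turns (
            tenant_id  TEXT NOT NULL,
            user_id    TEXT NOT NULL,
            session_id TEXT NOT NULL,
            seq        INTEGER NOT NULL,
            content    TEXT NOT NULL
        )
        """
    )
    return conn


def add_turn(conn: sqlite3.Connection, identity: Identity, seq: int, content: str) -> None:
    tenant, user, session = identity
    conn.execute(
        "INSERT INTO turns VALUES (?, ?, ?, ?, ?)",
        (tenant, user, session, seq, content),
    )


def read_turns(conn: sqlite3.Connection, identity: Identity) -> List[str]:
    # The identity filter is part of the query itself, not applied by the caller.
    rows = conn.execute(
        "SELECT content FROM turns "
        "WHERE tenant_id = ? AND user_id = ? AND session_id = ? "
        "ORDER BY seq",
        identity,
    )
    return [row[0] for row in rows]


conn = open_store()
alice = ("acme", "alice", "s1")
bob = ("acme", "bob", "s2")
add_turn(conn, alice, 1, "alice's secret")
add_turn(conn, bob, 1, "bob's note")
```

Because read_turns takes the whole triple, a caller holding bob's identity has no way to phrase a query that returns alice's rows.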
2. Runtime skills — DB-backed playbooks the planner searches
Runtime skills are typed, token-savvy reusable patterns the planner can ask for by name mid-reasoning. They originate from two sources:
- Skills.md importer: you write a Skills.md file (one skill per file: YAML frontmatter + a ## Steps body) and ingest it with harbor skill import <path>.
- In-runtime generator: the planner itself can author a new skill at runtime (e.g. "this kind of question seems common, let me save the steps as a skill") and persist it via the skill_propose built-in (opt-in; see below).
Both sources land in the same SQLite-backed catalog.
Example: a Skills.md file
One skill per file. The frontmatter trigger: is the planner-visible match cue (mandatory); ## Steps needs at least one item; ## Preconditions and ## Failure modes are optional sections.
---
name: triage-incident
title: Triage an incident
trigger: when a support ticket needs classification
tags: [support, triage]
---
Classify a support ticket into {bug, feature, question} and recommend the next action.
## Steps
- Read the user's report.
- Match against known categories.
- If "bug", pull the last 5 PRs that touched the area.

Import / remove with the CLI
$ harbor skill import ./skills/triage-incident.skill.md
imported "triage-incident" (scope=project, steps=3, attachments=0)
store: driver=localdb dsn=/tmp/harbor-validation/my-agent-skills.sqlite
$ harbor skill rm triage-incident
removed "triage-incident"
store: driver=localdb dsn=/tmp/harbor-validation/my-agent-skills.sqlite

Behaviour you can rely on:
- The verbs resolve the store from harbor.yaml's skills: block (the same store harbor dev serves) and print the resolved driver=… dsn=… so you can see exactly where the skill landed. Pass --config <path> for a non-default config.
- Skills are identity-scoped by (tenant, user, session). The verbs default to the harbor dev triple (dev/dev/dev); use --tenant / --user / --session for anything else.
- A duplicate name is rejected with exit 1 ("pass --overwrite to replace, or harbor skill rm <name> first"); --overwrite replaces it. An invalid file (missing frontmatter, empty trigger:, no steps) exits 1 with the validator's reason. rm of a missing name exits 1.
- The global --json flag switches both verbs to a machine-readable result ({"result":"imported","driver":…,"dsn":…,"report":{…}}).
- Inline attachments resolve relative to the file's own directory and upload to the configured artifact store; paths escaping that directory are rejected.
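The "paths escaping that directory are rejected" rule is a standard containment check. A minimal sketch with Python's pathlib follows; the resolve_attachment helper and its error message are hypothetical, not the importer's actual code.

```python
from pathlib import Path


def resolve_attachment(skill_file: str, ref: str) -> Path:
    """Resolve `ref` relative to the skill file's directory; reject escapes."""
    base = Path(skill_file).resolve().parent
    target = (base / ref).resolve()  # collapses ".." segments
    try:
        target.relative_to(base)  # raises ValueError if target is outside base
    except ValueError:
        raise ValueError(f"attachment {ref!r} escapes {base}") from None
    return target


inside = resolve_attachment("skills/triage-incident.skill.md", "img/flow.png")
```

Resolving both paths before comparing them matters: a plain string-prefix check would be fooled by references such as img/../../secrets.txt.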
Go-level: the verbs are thin callers over importer.ImportAndStore — a headless embedder calls the same function (see docs/recipes/use-memory-and-skills-from-go.md).
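For intuition about the validator's three rejection reasons (missing frontmatter, empty trigger:, no steps), here is a deliberately simplified Python sketch. It is not the importer: validate_skill is a hypothetical helper, the frontmatter is parsed as plain key: value lines rather than real YAML, and the error strings are illustrative.

```python
from typing import Dict, List, Union

SAMPLE = """---
name: triage-incident
title: Triage an incident
trigger: when a support ticket needs classification
tags: [support, triage]
---
Classify a support ticket into {bug, feature, question} and recommend the next action.
## Steps
- Read the user's report.
- Match against known categories.
- If "bug", pull the last 5 PRs that touched the area.
"""


def validate_skill(text: str) -> Dict[str, Union[str, List[str]]]:
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter")
    closing = [i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"]
    if not closing:
        raise ValueError("missing frontmatter")
    end = closing[0]

    # Simplified frontmatter parse: split each line on the first colon.
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    if not meta.get("trigger"):
        raise ValueError("empty trigger")

    # Collect bullet items that sit under the "## Steps" heading only.
    steps: List[str] = []
    in_steps = False
    for line in lines[end + 1:]:
        stripped = line.strip()
        if stripped.startswith("## "):
            in_steps = stripped == "## Steps"
        elif in_steps and stripped.startswith("- "):
            steps.append(stripped[2:])
    if not steps:
        raise ValueError("no steps")
    return {"name": meta.get("name", ""), "trigger": meta["trigger"], "steps": steps}


skill = validate_skill(SAMPLE)
```

Run against the triage-incident example, the sketch finds three steps, which matches the steps=3 reported by the import output shown earlier.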
Once the skills are in the catalog, the planner sees them at reasoning time two ways: a bounded per-turn <skills_context> browse window (the skills directory — see below) and on-demand retrieval via the skill_search / skill_get meta-tools — token-savvy because full skill bodies only enter the prompt when the LLM actually pulls them.
YAML config
skills:
  driver: localdb
  dsn: /tmp/harbor-validation/my-agent-skills.sqlite   # WAL trap caveat applies
  # retrieval: semantic            # optional: skill_search ranks by embedding
  #                                # similarity instead of the FTS5/regex/exact
  #                                # ladder (requires the embeddings: block; the
  #                                # capability filter, redaction, and budgeter
  #                                # apply unchanged on top)
  directory:                       # optional; shapes the per-turn <skills_context> block
    pinned: [triage-incident]      # always listed first, in this order
    max_entries: 10                # 0/unset → planner.skills_context_max (default 5)
    selection: pinned_then_recent  # the one wired value (pinned_then_top is rejected: not yet wired)
tools:
  built_in:
    - skill_search      # the LLM discovers runtime skills by capability text
    - skill_get         # the LLM pulls the full bodies of named skills
    - skill_list        # the LLM enumerates the catalog (paged, summary-only)
    # - skill_propose   # OPT-IN: lets the LLM author + persist new skills

LLM-side discovery via meta-tools
The React planner runs on native provider tool-calling: the LLM doesn't ask "what skills do I have?" in prose — it calls the skill_search built-in when it needs one. The meta-tools are the rich skills handlers (capability filter + redaction + token budgeter) over the SAME store your harbor skill import populated — one source of truth, identity-scoping carries through:
- skill_search(query, limit?): ranked candidates, capability-filtered to the tools this run can actually see (a skill requiring a tool the run isn't granted never surfaces).
- skill_get(names[], max_tokens?): full bodies, budget-fit through a tiered ladder (full → drop optional sections → cap steps at 3); an impossibly small budget errs loudly rather than silently truncating.
- skill_list(scope?, task_type?, tags?, limit?, offset?): paged enumeration, summary-only.
- skill_propose({skill, persist}): the in-runtime generator. Deliberately opt-in (list it in tools.built_in only when you want the LLM persisting skills): persisted skills are stamped Origin=generated, can never overwrite an imported pack skill, and every persist emits a mandatory redacted skill.proposed audit event.
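The tiered ladder that skill_get uses to fit a budget (full, then drop optional sections, then cap steps at 3, else fail loudly) can be sketched as follows. Treat this as an assumption-laden illustration: the skill dictionary layout, the render and fit_to_budget helpers, and the whitespace-based token counter are hypothetical stand-ins for whatever the real budgeter does.

```python
from typing import Dict, List, Optional

Skill = Dict[str, object]


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: whitespace-separated words.
    return len(text.split())


def render(skill: Skill, include_optional: bool, max_steps: Optional[int]) -> str:
    steps: List[str] = list(skill["steps"])  # type: ignore[arg-type]
    if max_steps is not None:
        steps = steps[:max_steps]
    parts = [str(skill["description"]), "## Steps"]
    parts += [f"- {step}" for step in steps]
    if include_optional:
        for title in ("Preconditions", "Failure modes"):
            items = skill.get(title) or []
            if items:
                parts.append(f"## {title}")
                parts += [f"- {item}" for item in items]  # type: ignore[union-attr]
    return "\n".join(parts)


def fit_to_budget(skill: Skill, max_tokens: int) -> str:
    tiers = [
        {"include_optional": True, "max_steps": None},   # full body
        {"include_optional": False, "max_steps": None},  # drop optional sections
        {"include_optional": False, "max_steps": 3},     # cap steps at 3
    ]
    for tier in tiers:
        body = render(skill, **tier)
        if count_tokens(body) <= max_tokens:
            return body
    # No tier fits: fail loudly instead of truncating silently.
    raise ValueError(f"budget of {max_tokens} tokens is too small for this skill")


demo_skill: Skill = {
    "description": "Classify a ticket.",
    "steps": ["read report", "match category", "pull PRs", "draft reply", "close ticket"],
    "Preconditions": ["ticket exists"],
    "Failure modes": ["unknown category"],
}
full = fit_to_budget(demo_skill, 100)
no_optional = fit_to_budget(demo_skill, 25)
capped = fit_to_budget(demo_skill, 15)
```

Each tier is tried in order and the first one that fits wins, so a generous budget always gets the full body and a tight one degrades in predictable steps.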
The per-turn skills directory (<skills_context>)
Independent of the meta-tools, every planner turn carries a compact <skills_context> block produced by the skills directory: a bounded, stable, pinned-then-recent browse window over the catalog (name / title / trigger / task type / pinned flag — never full bodies). The block tells the LLM what exists; pulling content is skill_get's job. Tune it with the skills.directory yaml block above — pinned guarantees your flagship skills are always visible, max_entries caps the budget, and the stable ordering keeps the prompt prefix KV-cache-friendly. Capability filtering applies to the directory too: pinning never bypasses it.
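A sketch of how a pinned-then-recent window with capability filtering might be assembled. The build_directory function, the updated_at field used to define "recent", and the name-based tie-break are assumptions for illustration; only the behaviours (pinned first in configured order, max_entries cap, pinning never bypassing the capability filter) come from the text above.

```python
from typing import Dict, List, Sequence


def build_directory(
    catalog: List[Dict[str, object]],
    pinned: Sequence[str],
    granted_tools: Sequence[str],
    max_entries: int,
) -> List[str]:
    granted = set(granted_tools)
    # Capability filter first: a skill is visible only if every tool it
    # requires is granted to this run. Pinning cannot override this.
    visible = [s for s in catalog if set(s["required_tools"]) <= granted]  # type: ignore[arg-type]
    by_name = {s["name"]: s for s in visible}

    # Pinned skills come first, in the configured order.
    pinned_entries = [by_name[name] for name in pinned if name in by_name]
    pinned_names = {s["name"] for s in pinned_entries}

    # The rest are ordered most-recent-first; the name tie-break keeps the
    # ordering stable from turn to turn.
    rest = sorted(
        (s for s in visible if s["name"] not in pinned_names),
        key=lambda s: (-s["updated_at"], s["name"]),  # type: ignore[operator]
    )
    return [str(s["name"]) for s in (pinned_entries + rest)[:max_entries]]


catalog = [
    {"name": "triage-incident", "updated_at": 10, "required_tools": []},
    {"name": "refund-flow", "updated_at": 30, "required_tools": ["billing.refund"]},
    {"name": "summarise-thread", "updated_at": 20, "required_tools": []},
    {"name": "escalate", "updated_at": 25, "required_tools": []},
]
window = build_directory(catalog, ["triage-incident"], granted_tools=[], max_entries=2)
ungranted_pin = build_directory(catalog, ["refund-flow"], granted_tools=[], max_entries=10)
granted_pin = build_directory(
    catalog, ["refund-flow"], granted_tools=["billing.refund"], max_entries=10
)
```

In the second call, refund-flow is pinned but the run lacks billing.refund, so it stays out of the window entirely; once the tool is granted, the same pin puts it first.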
Skill vs tool — when to pick which
- Tool — there's code to run, an API to call, a typed input/output. Build a tool.
- Skill — there's a reasoning pattern the planner should follow (a recipe, a checklist, a domain heuristic). Build a skill.
- Both: many real agents do both. A triage-incident skill whose step 4 says "call the ticket.find_related_prs tool" reaches into both subsystems.
3. Operator-skill vs runtime-skill — the naming clarification
docs/skills/ (what you're reading right now) holds operator playbooks — markdown docs for humans building agents. They are NOT loaded into the planner at runtime; they're adoption material.
internal/skills/ (RFC §6.7) holds the runtime skill subsystem — the SQLite catalog, the Skills.md importer, the in-runtime generator, the planner's mid-reasoning skill lookup path.
The two are unrelated. The glossary entry pins this distinction (docs/glossary.md → "skill (operator)" vs "skill (runtime)"). Don't conflate them.
Common failure modes
- Memory blows the token budget mid-conversation. Lower budget_tokens OR switch strategy from truncation to rolling_summary. The summariser uses ~1500 tokens of LLM per turn but saves ~5000 tokens of payload.
- harbor dev reboots in a loop after enabling memory. Your memory.dsn is inside the project directory and the SQLite WAL trap fires. Move the DSN to /tmp/harbor-validation/<project>-memory.sqlite or ~/.harbor/<project>-memory.sqlite.
- harbor skill import fails with "skill name already exists". The catalog rejects duplicate names by default. Re-import with --overwrite, remove the old entry first (harbor skill rm <name>), or rename the skill in the file.
- The planner doesn't pick a skill I imported. Either the skill's trigger: doesn't pattern-match the user's input (write more concrete trigger language), the run can't see a tool the skill requires (required_tools is capability-filtered, default-deny), or planner.max_steps is too low to reach the skill-search turn. Pin it (skills.directory.pinned) to guarantee it's at least visible in every <skills_context> block.
- Cross-session memory leakage suspected. It can't happen: the SQL filter is at the driver. If you see it, file a bug with the SQL trace from telemetry.log_level: debug; a leak would be a P0 security issue.
See also
- define-the-agent-yaml: the memory: and skills: blocks in context.
- add-an-in-process-tool: when a skill becomes "actually run code".
- observe-with-the-console: the Memory tab + the Skills tab show what the planner saw on each turn.
- docs/recipes/embed-and-retrieve.md: the embedding client à la carte + both semantic-retrieval opt-ins from Go.
- RFC §6.7 — the runtime skill subsystem design.
- RFC §6.6 — the memory subsystem design.