Configure memory + runtime skills

Two subsystems that look similar but solve different problems:

Memory — multi-turn context within a session. Lets the agent remember what it said three turns ago without re-reading the whole event log every step.
Runtime skills — token-savvy, DB-backed playbooks the planner can search by name and inject into a prompt mid-reasoning. Distinct from "operator skills" (the docs/skills/ directory you're reading) — runtime skills are mechanism inside the planner, not docs for humans.

Both subsystems share a key contract: identity-scoped by (tenant, user, session) — the same multi-isolation triple that gates everything else in Harbor. No cross-session leakage. Ever.

1. Memory — strategies + drivers

Memory has two axes you tune independently:

Strategy (memory.strategy) — how the planner uses memory each turn.
Driver (memory.driver) — where memory is stored.

Strategies

Strategy	When to use
`none` (default)	Single-turn agents. No memory; each run starts cold.
`truncation`	Chat agents with short windows. Keep last N messages; drop older verbatim.
`rolling_summary`	Long-running chat agents. Summarise older turns; keep recent N verbatim.

rolling_summary is the sweet spot for chatbots — it preserves the conversation arc without blowing the context window. The summariser is the same LLM as the planner (Bifrost reuses the configured provider).

Drivers

Driver	When to use
`inmem`	Dev. Memory dies on `harbor dev` restart.
`sqlite`	Single-node production. Survives restarts. Default for self-hosted agents.
`postgres`	Multi-replica production. Use behind a load balancer.

Example: chat agent with rolling summary on SQLite

yaml

memory:
  driver: sqlite
  dsn: /tmp/harbor-validation/my-agent-memory.sqlite   # outside the project dir (WAL trap)
  strategy: rolling_summary
  budget_tokens: 8000          # max tokens the planner replays per turn (0 = unbounded)
  recovery_backlog_max: 16     # bounded queue for the summariser's recovery loop (default 16)

budget_tokens is the hard cap — once a conversation exceeds it, older turns are summarised together into one assistant-role message while recent turns stay verbatim. The planner sees: [summary of turns 1-12] [turn 13] [turn 14] ... [turn 18]. recovery_backlog_max bounds the rolling_summary recovery loop's queue; on overflow it drops the oldest and emits memory.recovery_dropped. Both knobs are ignored by the none and truncation strategies.

Opt-in semantic retrieval

memory.retrieval: semantic layers embedding-similarity search ON TOP of the strategy you picked above — it composes with rolling_summary, never replaces it. Turns are embedded as they land (AddTurn) and a SearchTurns surface ranks them by cosine; GetLLMContext keeps its normal summary + recent-turn patch. Vectors persist identity-scoped through the same state store, on all three drivers.

yaml

memory:
  driver: sqlite
  dsn: /tmp/harbor-validation/my-agent-memory.sqlite
  strategy: rolling_summary
  retrieval: semantic        # opt-in; composes with the strategy
  retrieval_top_k: 5         # optional result cap (default 5)
  retrieval_min_score: 0.0   # cosine similarity floor [-1, 1]; 0.0 is the default

embeddings:                  # REQUIRED when any retrieval is semantic
  provider: openai
  model: text-embedding-3-small
  api_key: env.OPENAI_API_KEY

The embeddings: block is the embedding model/provider pair — configured separately from the chat llm block (they routinely come from different providers). Enabling a semantic mode without it fails validation loudly, naming the missing keys; there is no silent fallback to non-semantic retrieval and no mock embeddings driver.

When memory.retrieval: semantic is set the run loop calls SearchTurns on every task, applies the retrieval_min_score floor, deduplicates against the recent-turn window, caps each recalled turn at 2 KiB per side, and injects the result into the prompt's <read_only_external_memory> tier. A SearchTurns error fails the run loudly (runtime_fetch_error) — there is no silent fall-back to summary-only.

Identity scoping

Every memory write/read is keyed by (tenant_id, user_id, session_id). The planner cannot read user A's memory from user B's session — the SQL WHERE clause filters before the rows reach the planner. This is enforced at the driver level, not at the planner; even a buggy planner cannot leak cross-session.

2. Runtime skills — DB-backed playbooks the planner searches

Runtime skills are typed, token-savvy reusable patterns the planner can ask for by name mid-reasoning. They originate from two sources:

Skills.md importer — you write a Skills.md file (one skill per file: YAML frontmatter + a ## Steps body) and ingest it with harbor skill import <path>.
In-runtime generator — the planner itself can author a new skill at runtime (e.g. "this kind of question seems common — let me save the steps as a skill") and persist it via the skill_propose built-in (opt-in; see below).

Both sources land in the same SQLite-backed catalog.

Example: a Skills.md file

One skill per file. The frontmatter trigger: is the planner-visible match cue (mandatory), ## Steps needs at least one item; ## Preconditions and ## Failure modes are optional sections.

markdown

---
name: triage-incident
title: Triage an incident
trigger: when a support ticket needs classification
tags: [support, triage]
---
Classify a support ticket into {bug, feature, question} and recommend the next action.

## Steps

- Read the user's report.
- Match against known categories.
- If "bug", pull the last 5 PRs that touched the area.

Import / remove with the CLI

console

$ harbor skill import ./skills/triage-incident.skill.md
imported "triage-incident" (scope=project, steps=3, attachments=0)
store: driver=localdb dsn=/tmp/harbor-validation/my-agent-skills.sqlite

$ harbor skill rm triage-incident
removed "triage-incident"
store: driver=localdb dsn=/tmp/harbor-validation/my-agent-skills.sqlite

Behaviour you can rely on:

The verbs resolve the store from harbor.yaml's skills: block — the same store harbor dev serves — and print the resolved driver=… dsn=… so you can see exactly where the skill landed. Pass --config <path> for a non-default config.
Skills are identity-scoped ((tenant, user, session)). The verbs default to the harbor dev triple (dev/dev/dev); use --tenant / --user / --session for anything else.
A duplicate name is rejected with exit 1 ("pass --overwrite to replace, or harbor skill rm <name> first"); --overwrite replaces it. An invalid file (missing frontmatter, empty trigger:, no steps) exits 1 with the validator's reason. rm of a missing name exits 1.
The global --json flag switches both verbs to a machine-readable result ({"result":"imported","driver":…,"dsn":…,"report":{…}}).
Inline attachments (![alt](relative/path)) resolve relative to the file's own directory and upload to the configured artifact store; paths escaping that directory are rejected.

Go-level: the verbs are thin callers over importer.ImportAndStore — a headless embedder calls the same function (see docs/recipes/use-memory-and-skills-from-go.md).

Once the skills are in the catalog, the planner sees them at reasoning time two ways: a bounded per-turn <skills_context> browse window (the skills directory — see below) and on-demand retrieval via the skill_search / skill_get meta-tools — token-savvy because full skill bodies only enter the prompt when the LLM actually pulls them.

Yaml config

yaml

skills:
  driver: localdb
  dsn: /tmp/harbor-validation/my-agent-skills.sqlite    # WAL trap caveat applies
  # retrieval: semantic            # optional — skill_search ranks by embedding
  #                                # similarity instead of the FTS5/regex/exact
  #                                # ladder (requires the embeddings: block; the
  #                                # capability filter, redaction, and budgeter
  #                                # apply unchanged on top)
  directory:                       # optional — shapes the per-turn <skills_context> block
    pinned: [triage-incident]      # always listed first, in this order
    max_entries: 10                # 0/unset → planner.skills_context_max (default 5)
    selection: pinned_then_recent  # the one wired value (pinned_then_top is rejected: not yet wired)

tools:
  built_in:
    - skill_search    # the LLM discovers runtime skills by capability text
    - skill_get       # the LLM pulls the full bodies of named skills
    - skill_list      # the LLM enumerates the catalog (paged, summary-only)
    # - skill_propose # OPT-IN: lets the LLM author + persist new skills

LLM-side discovery via meta-tools

The React planner runs on native provider tool-calling: the LLM doesn't ask "what skills do I have?" in prose — it calls the skill_search built-in when it needs one. The meta-tools are the rich skills handlers (capability filter + redaction + token budgeter) over the SAME store your harbor skill import populated — one source of truth, identity-scoping carries through:

skill_search(query, limit?) — ranked candidates, capability-filtered to the tools this run can actually see (a skill requiring a tool the run isn't granted never surfaces).
skill_get(names[], max_tokens?) — full bodies, budget-fit through a tiered ladder (full → drop optional sections → cap steps at 3); an impossibly small budget errs loudly rather than silently truncating.
skill_list(scope?, task_type?, tags?, limit?, offset?) — paged enumeration, summary-only.
skill_propose({skill, persist}) — the in-runtime generator. Deliberately opt-in (list it in tools.built_in only when you want the LLM persisting skills): persisted skills are stamped Origin=generated, can never overwrite an imported pack skill, and every persist emits a mandatory redacted skill.proposed audit event.

The per-turn skills directory (`<skills_context>`)

Independent of the meta-tools, every planner turn carries a compact <skills_context> block produced by the skills directory: a bounded, stable, pinned-then-recent browse window over the catalog (name / title / trigger / task type / pinned flag — never full bodies). The block tells the LLM what exists; pulling content is skill_get's job. Tune it with the skills.directory yaml block above — pinned guarantees your flagship skills are always visible, max_entries caps the budget, and the stable ordering keeps the prompt prefix KV-cache-friendly. Capability filtering applies to the directory too: pinning never bypasses it.

Skill vs tool — when to pick which

Tool — there's code to run, an API to call, a typed input/output. Build a tool.
Skill — there's a reasoning pattern the planner should follow (a recipe, a checklist, a domain heuristic). Build a skill.
Both — many real agents do both. A triage-incident skill whose step 4 says "call the ticket.find_related_prs tool" reaches into both subsystems.

3. Operator-skill vs runtime-skill — the naming clarification

docs/skills/ (what you're reading right now) holds operator playbooks — markdown docs for humans building agents. They are NOT loaded into the planner at runtime; they're adoption material.

internal/skills/ (RFC §6.7) holds the runtime skill subsystem — the SQLite catalog, the Skills.md importer, the in-runtime generator, the planner's mid-reasoning skill lookup path.

The two are unrelated. The glossary entry pins this distinction (docs/glossary.md → "skill (operator)" vs "skill (runtime)"). Don't conflate them.

Common failure modes

Memory blows the token budget mid-conversation. Lower budget_tokens OR switch strategy from truncation to rolling_summary. The summariser uses ~1500 tokens of LLM per turn but saves ~5000 tokens of payload.
harbor dev reboots in a loop after enabling memory. Your memory.dsn is inside the project directory and the SQLite WAL trap fires. Move the DSN to /tmp/harbor-validation/<project>-memory.sqlite or ~/.harbor/<project>-memory.sqlite.
harbor skill import fails with "skill name already exists". The catalog rejects duplicate names by default. Re-import with --overwrite, remove the old entry first (harbor skill rm <name>), or rename the skill in the file.
The planner doesn't pick a skill I imported. Either the skill's trigger: doesn't pattern-match the user's input (write more concrete trigger language), the run can't see a tool the skill requires (required_tools is capability-filtered — default-deny), or planner.max_steps is too low to reach the skill-search turn. Pin it (skills.directory.pinned) to guarantee it's at least visible in every <skills_context> block.
Cross-session memory leakage suspected. It can't happen — the SQL filter is at the driver. If you see it, file a bug with the SQL trace from telemetry.log_level: debug — a leak would be a P0 security issue.

Configure memory + runtime skills ​

1. Memory — strategies + drivers ​

Strategies ​

Drivers ​

Example: chat agent with rolling summary on SQLite ​

Opt-in semantic retrieval ​

Identity scoping ​

2. Runtime skills — DB-backed playbooks the planner searches ​

Example: a Skills.md file ​

Import / remove with the CLI ​

Yaml config ​

LLM-side discovery via meta-tools ​

The per-turn skills directory (<skills_context>) ​

Skill vs tool — when to pick which ​

3. Operator-skill vs runtime-skill — the naming clarification ​

Common failure modes ​

See also ​

Configure memory + runtime skills

1. Memory — strategies + drivers

Strategies

Drivers

Example: chat agent with rolling summary on SQLite

Opt-in semantic retrieval

Identity scoping

2. Runtime skills — DB-backed playbooks the planner searches

Example: a Skills.md file

Import / remove with the CLI

Yaml config

LLM-side discovery via meta-tools

The per-turn skills directory (`<skills_context>`)

Skill vs tool — when to pick which

3. Operator-skill vs runtime-skill — the naming clarification

Common failure modes

See also