Align claude memory spec#26
Closed
fl4p wants to merge 50 commits into
Claude Code's current memory subsystem stores each memory as a node in its memory graph, with frontmatter nested under `metadata:` (node_type/type/originSessionId) rather than the legacy flat top-level `type:`. The flat form is no longer byte-compatible with files Claude Code writes, breaking the shared-store premise.
- buildFrontmatter: emit nested `metadata` block (node_type: memory, type, optional originSessionId)
- saveMemory + memory_save tool: thread ToolContext.sessionID through as originSessionId provenance
- FRONTMATTER_EXAMPLE: show nested metadata.type (matching the spec, which lets the tool add node_type/originSessionId automatically)
- parseFrontmatter already reads both nested and legacy flat forms, so existing memories need no migration
gap.md: correct the frontmatter row (it claimed parity; it was flat).
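The nested form described above can be sketched roughly as follows. This is an illustration only: the `name`/`description` keys, the key order, and the lack of quoting are assumptions, and `buildFrontmatterSketch`/`readTypeSketch` are hypothetical names rather than the plugin's actual functions. Only the field names under `metadata` come from the commit message.

```typescript
// Hypothetical input shape; only the fields under `metadata` are taken from the commit.
interface MemoryMetaSketch {
  name: string;
  description: string;
  type: string;             // e.g. "user", "feedback", "project", "reference"
  originSessionId?: string; // provenance, threaded from ToolContext.sessionID
}

// Emits frontmatter with the nested `metadata` block (assumed YAML layout).
function buildFrontmatterSketch(meta: MemoryMetaSketch): string {
  const lines = [
    "---",
    `name: ${meta.name}`,
    `description: ${meta.description}`,
    "metadata:",
    "  node_type: memory",
    `  type: ${meta.type}`,
  ];
  if (meta.originSessionId) {
    lines.push(`  originSessionId: ${meta.originSessionId}`);
  }
  lines.push("---");
  return lines.join("\n");
}

// Reads the type from the nested form first, then from the legacy flat form.
function readTypeSketch(frontmatter: string): string | undefined {
  const nested = frontmatter.match(
    /^metadata:\n(?:[ \t]+.*\n)*?[ \t]+type:[ \t]*(.+)$/m
  );
  if (nested) return nested[1].trim();
  const flat = frontmatter.match(/^type:[ \t]*(.+)$/m);
  return flat ? flat[1].trim() : undefined;
}
```

Because the reader accepts both shapes, files written in the old flat form stay readable while new writes use the nested block.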
Removes the "avoid writing memories that could be viewed as a negative judgement or that are not relevant to the work" sentence from the user type description. The relevance filter is already covered by the when_to_save/how_to_use framing and the "What NOT to save" section.
The "## Recalled Memories" block is injected into the system prompt (OpenCode's equivalent of Claude Code's <system-reminder> memory blocks). Harden its intro so the agent does not follow directions embedded in a memory or let memory content override the user's actual request, and reiterate that memories reflect what was true when written.
…pointer opt-out, harness sidecar
- buildFrontmatter stamps a top-level created: (auto-now, caller-overridable so the dream can preserve the oldest source's date); memory_read/memory_list and the memory_save tool surface/accept it.
- AUTODREAM_PROMPT rewritten to Claude Code's §F: conservative delete+collapse, immutability (delete-then-save), one-fact-per-file, type discipline, preserve created; mandate the memory_* tools (the store is outside the worktree, so the builtin file tools error).
- EXTRACT_PROMPT: user-memory capture is opt-in (OPENCODE_MEMORY_CAPTURE_USER, default off); credential location *references* are allowed by default with secret values never written, opt-out via OPENCODE_MEMORY_REDACT_CRED_POINTERS; PROVENANCE RULE to anchor project memories in a commit hash + file:line.
- harness_feedback sidecar tool (Obj1) writes outside the memory dir.
- gitignore bun.lock.
The forked extractor replays the real tool calls (so file:line is already visible) but does not surface commit hashes on its own. get_session_commits_block recovers them: it queries the session's time window and touched files from the opencode part table, git-logs every repo the session touched (working dir plus nested checkouts) over that window, and prepends a hash/time/changed-files block to EXTRACT_PROMPT. The PROVENANCE RULE then has the model cite a matching hash. Self-contained embedded python (the hook can't import repo modules), mirroring the validated replay_transcript logic. Opt-out via OPENCODE_MEMORY_PROVENANCE=0. The auto-dream fork is untouched -- it reorganizes existing memories, not a conversation. A/B on the monitor sessions (--fork with vs without the block): with it the extractor cited real nested-repo commits (c42960d89, 2ef9666a4); without it, zero hashes -- so the block is necessary, not redundant.
extract-prompt.test.ts reads bin/opencode-memory and asserts the causal conditions behind the observed extraction behavior, with no model and no keys: reference covers config-locations; a non-obvious durable fact is not skipped; the EVIDENCE RULE is scoped to user/feedback (so a reference aside needs no quote); user-capture is opt-in OFF by default (env :-0 + body instruction + DISABLED branch); secret VALUES never written while cred-location pointers stay ENABLED by default; and the memory-about-memory carve-out is present. Guards the "btw: X is Y" capture-as-reference outcome and the user-suppression / redaction / recursion invariants against a prompt edit silently regressing them. 9 tests, runs under `bun test` (suite 130 -> 139 pass).
…ocations to reference The EXTRACT recursion rule covered a memory system's design/architecture/format but not its STORAGE LOCATION. So a session that set up "project memory under .claude/, secrets in a gitignored .env" produced a feedback memory bundling the .claude/ layout + symlink chain (memory-about-the-memory-system) with the secrets convention -- and mis-typed it feedback. Extend the rule to also exclude where the memory files themselves live (.claude/ layout, symlink chains, git-tracked), and carve out that a credential/config LOCATION the user keeps in the repo is a `reference` pointer (per the REDACTION RULE), not a feedback memory. Re-validated on the two fugu memory-meta sessions: the .claude/-layout meta-content is gone (the memory is now a clean secrets-handling working rule), and the other meta session stays at 0. Grounded in the gold .claude/memory store, which keeps cred locations as `reference` (reference_creds_env_files) and has no .claude-layout memory.
Spawns the real opencode binary against an isolated XDG_CONFIG_HOME holding the bundled plugin and asserts opencode actually loaded it: the plugin's config hook injects a hidden recall agent (opencode-memory-recall), which shows up in `opencode debug config`; `--pure` is the negative control (no plugin -> no agent). Model-free and credential-free, so it needs no auth; skips cleanly when the opencode binary is absent (OPENCODE_BIN overrides) to keep CI green. Closes the gap where every other test exercised the plugin in-process with a synthetic `as never` context, or ran the shell hook against a fake opencode stub -- none proved the real runtime discovers and invokes the plugin.
…ure opt-in
An explicit in-session memory directive ("remember I'm a Go dev", "always check
X", "never do Y") is a deliberate request, not an inferred profile. The
USER-CAPTURE opt-in (off by default) was suppressing such facts in the
post-session backstop even though the live agent honors them -- so an explicit
"remember I'm X" could be silently dropped when the live save was missed
mid-task.
Add an EXPLICIT-DIRECTIVE RULE after the EVIDENCE RULE: a direct directive is
itself the evidence and PIERCES the user-capture default-off (save as type=user
even with capture off), while still NOT piercing the redaction rule or the
secrets/junk exclusions. An incidental aside (user merely mentions their
background) stays suppressed -- only a save directive pierces.
Guarded by a deterministic prompt-invariant test (no model, no keys).
…memory_list EXTRACT now tells the model to link a related memory inline with [[file-name]], matching the always-on system prompt and the gold .claude/memory store (6 of its 7 feedback memories cross-link). Crucially the link target must appear VERBATIM in memory_list output -- not constructed, guessed, or recalled from the topic or another project -- because a first pass with illustrative example slugs caused the model to invent plausible dangling links (it pulled a gold slug it knew from the domain rather than linking the actual store contents). Seed-test validated: with two related memories pre-seeded, a new distinct-but- related memory links [[project_runtime_selectable_pwm_driver_ota_caveat]] (the real related seed, verbatim), does not link the unrelated seed, and invents no slugs. The pre-fix version invented 2 dangling slugs on the same setup.
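The verbatim-link requirement is enforced in the prompt, but the same invariant can be checked after the fact. The sketch below is not part of the PR; `extractLinkTargets` and `findDanglingLinks` are hypothetical helpers, and the list of known slugs is assumed to come from memory_list output.

```typescript
// Collects every [[slug]] target that appears in a memory body.
function extractLinkTargets(body: string): string[] {
  const targets: string[] = [];
  const pattern = /\[\[([^\[\]]+)\]\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(body)) !== null) {
    targets.push(match[1]);
  }
  return targets;
}

// Returns the targets that do not appear verbatim among the known file names.
function findDanglingLinks(body: string, knownSlugs: string[]): string[] {
  return extractLinkTargets(body).filter(
    (target) => knownSlugs.indexOf(target) === -1
  );
}
```

A check like this would flag the invented slugs from the pre-fix run while accepting a link to a slug that is actually in the store.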
The extract pass had no granularity rule (only the dream did), so it bundled separable facts into one file -- e.g. a single reference_fugu_firmware_ota_tools packing MQTT-console + creds + device-names + OTA + build + status, where the gold .claude/memory store splits them atomically. Add a ONE-FACT RULE keyed on retrieval intent: parts that would be surfaced by different questions are different memories. Carve-out against over-splitting a single coherent fact -- a fix made of coordinated commits, or a fact with its Why/How-to-apply, stays one file. Validated: a crafted 4-fact ops session (build / OTA / creds / device-names) now produces 4 atomic memories instead of 1 bundle, grouped sensibly by intent (creds + broker together), with no over-atomization.
…ext as the user Post-session --fork extraction feeds the model the agent's own reasoning, undelimited. On the tile-fade session it read an "anchored summary" reasoning block and fabricated a `feedback` memory with an invented verbatim user quote. Add an ATTRIBUTION RULE: reasoning/assistant/tool text is the agent's output — usable for Phase 1 harness feedback, never quotable as the user; user/feedback evidence must be a real verbatim user turn. Reasoning is kept, not stripped, so Obj1 (harness-feedback) signal survives. Validated: tile session 1 fabricated memory -> 0 across 3 fork runs. Adds 5 deterministic prompt-guard tests (extract-prompt.test.ts 15/15 green).
…ut a COMMITS block Local/direct extraction (no COMMITS block injected) echoed the rule's own example hash 'a1b2c3d' as if it were a real commit. Make the example a non-copyable placeholder <commit>, and add: if no COMMITS block is present, still save the memory but omit the hash (never invent/copy/reuse one). The cloud --fork path injects a COMMITS block, so it is unaffected. +2 guards.
…r shipped) PHASE-1 OBSERVATION RULE (harness_feedback must cite an event that ACTUALLY occurred), tighten the catch-all bullet to require an OBSERVED misbehavior (not a code-review opinion about code the session discusses), and a DO/DO-NOT few-shot. Eval: no GLM regression (7/7, harness-feedback fixture 2/3 -> 3/3, precision 92 -> 100); empirically ineffective on local Qwen-30B (still fires spurious harness_feedback on 7/7) — kept for the stronger-model path. +4 prompt-guard tests (extract-prompt.test.ts 21/21).
…gin options
The model for each path was env-only (OPENCODE_MEMORY_MODEL etc.). Make it
declarable persistently in the native opencode surface — the plugin `options`
block in opencode.json — with env vars still overriding.
- src/index.ts: the plugin factory now reads its `options` (2nd arg, the native
PluginOptions bag) and resolves recallModel/recallAgent from it; env wins.
recordPluginOptions always resets on construction (no cross-instance leak).
- bin/opencode-memory: a bulletproof python reader for extractModel/extractAgent/
dreamModel/dreamAgent from the layered opencode.json (global+project, jsonc with
// /* */ comments + trailing commas, all entry shapes: "pkg" | ["pkg",{opts}] |
{package,options}, both `plugin` and `plugins` keys). Precedence: env > options >
default (dream falls back to extract). EOF-safe under `set -e` when python3 is
absent. OPENCODE_MEMORY_PRINT_SETTINGS=1 prints the resolved settings and exits.
- README: documents the options block, precedence, the object-vs-tuple version note.
- test: 3 cases covering options→recall + env override.
All 146 tests pass.
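The precedence rule and the three plugin entry shapes can be sketched as below. This is a simplified illustration, not the wrapper's python reader: `normalizePluginEntry` and `resolveSetting` are hypothetical names, values are passed in explicitly instead of being read from the environment or from opencode.json, and treating an empty env value as unset is an assumption.

```typescript
// The three entry shapes named in the commit: "pkg" | ["pkg", {opts}] | {package, options}.
type PluginEntry =
  | string
  | [string, Record<string, unknown>]
  | { package: string; options?: Record<string, unknown> };

// Normalizes any entry shape to a package name plus an options bag.
function normalizePluginEntry(
  entry: PluginEntry
): { pkg: string; options: Record<string, unknown> } {
  if (typeof entry === "string") return { pkg: entry, options: {} };
  if (Array.isArray(entry)) return { pkg: entry[0], options: entry[1] };
  return { pkg: entry.package, options: entry.options ?? {} };
}

// Precedence: env > options > fallback.
function resolveSetting(
  envValue: string | undefined,
  optionValue: unknown,
  fallback: string
): string {
  if (envValue !== undefined && envValue !== "") return envValue;
  if (typeof optionValue === "string" && optionValue !== "") return optionValue;
  return fallback;
}
```

The "dream falls back to extract" rule then amounts to passing the resolved extract model as the `fallback` when resolving the dream model.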
Memory is pinned to the session's repo by default. extraMemoryRoots (an opencode.json plugin option, or OPENCODE_MEMORY_EXTRA_ROOTS env which replaces it) declares additional repos whose memory: - index is surfaced read-only in the system prompt under '## Additional memory index — <path>' (no per-root recall selection), and - can be targeted by every memory_* tool via a new optional 'root' arg (read/search/list and save/delete). A root that is neither the session repo nor a declared extra root is rejected, so the model cannot write memory to an arbitrary path; roots are matched by canonical git root. In-session only — post-session extraction/dream are unchanged. +4 tests (150 pass).
Two user-facing controls for the opencode memory plugin:
1. In-repo memory mode (OPENCODE_MEMORY_LOCAL / opencode.json localMemory):
- off → always the global ~/.claude store
- on → always <repo>/.claude/memory (created if absent)
- auto → use the in-repo folder iff it already exists (default), so
`mkdir .claude/memory` opts a repo in with zero config.
Keyed by canonical git root, so subdirs/worktrees share one store.
getMemoryDir() now resolves this; extraction, dreaming, recall, and the
MEMORY.md index all follow the active directory.
2. Soft index size limit (OPENCODE_MEMORY_INDEX_MAX_LINES / indexMaxLines,
default 200, 0/off disables). When MEMORY.md reaches the limit the agent
is asked, once per session (latched by sessionID), to warn the user and
offer compaction: cluster duplicates, drop stale, shorten entries. This
is advisory and separate from the hard MAX_ENTRYPOINT_LINES truncation.
Both settings: env var > opencode.json options > default. Wrapper docs and
PRINT_SETTINGS surface local.mode and index.max_lines. README + tests
(test/paths.test.ts, test/prompt.test.ts) updated.
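The three local-memory modes reduce to a small decision, sketched here as a pure function. `pickMemoryDir` is a hypothetical name, the global store path is passed in rather than hard-coded, and the existence check is supplied by the caller; the real getMemoryDir() also creates the folder in `on` mode and keys on the canonical git root.

```typescript
type LocalMode = "off" | "on" | "auto";

// Chooses the active memory directory for a repo.
// inRepoDirExists: whether <repo>/.claude/memory already exists.
function pickMemoryDir(
  mode: LocalMode,
  repoRoot: string,
  globalDir: string,
  inRepoDirExists: boolean
): string {
  const inRepoDir = repoRoot + "/.claude/memory";
  if (mode === "on") return inRepoDir; // the real code creates it if absent
  if (mode === "auto" && inRepoDirExists) return inRepoDir;
  return globalDir; // "off", or "auto" without an existing folder
}
```

With `auto` as the default, creating the folder is the only step needed to opt a repo in.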
…test gaps
Fixes from a 5-agent review of the previous commit:
- Index warning default now 160 (= 80% of the hard 200-line cap) instead of 200, so the user is warned with lead time BEFORE the index is truncated rather than at the exact line where truncation begins. Exposed as DEFAULT_INDEX_MAX_LINES.
- parseIndexMaxLines: negative / non-finite values are treated as UNSET (fall back to the next layer) instead of being clamped to 0, so a fat-fingered "-1" can't silently disable the warning the way explicit "0"/"off" does.
- getMemoryDir auto-mode now adopts the in-repo folder only when it exists AS A DIRECTORY (statSync isDirectory), so a stray file or symlink at <repo>/.claude/memory can't hijack the store or crash the first write.
- README: corrected PRINT_SETTINGS example to show the real placeholder output for plugin-resolved settings (recall/local/index), not idealized values; added an in-repo-memory security caveat (memories may contain sensitive content; secret protection is prompt-level only) and clarified worktree sessions write into the main checkout's store.
- Tests: exact at-limit (>=) boundary, plugin-option 0/off disable, negative fallback, fractional floor, stray-file-not-adopted, getMemoryEntrypoint follows mode, reverse env precedence.
179 pass.
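The parsing rules for the index limit can be sketched like this. The function names carry a `Sketch` suffix because they are illustrations of the behavior described above, not the plugin's code; treating an empty or whitespace-only value as unset is an assumption.

```typescript
const DEFAULT_INDEX_MAX_LINES_SKETCH = 160;

// Returns a limit, 0 for "disabled", or undefined for "unset, try the next layer".
function parseIndexMaxLinesSketch(raw: unknown): number | undefined {
  if (raw === undefined || raw === null) return undefined;
  const text = String(raw).trim().toLowerCase();
  if (text === "") return undefined;
  if (text === "off") return 0;
  const n = Number(text);
  // Negative or non-finite input is treated as unset, not clamped to 0.
  if (!isFinite(n) || n < 0) return undefined;
  return Math.floor(n);
}

// Layering: env var > opencode.json option > default.
function resolveIndexMaxLinesSketch(
  envValue: string | undefined,
  optionValue: unknown
): number {
  const fromEnv = parseIndexMaxLinesSketch(envValue);
  if (fromEnv !== undefined) return fromEnv;
  const fromOption = parseIndexMaxLinesSketch(optionValue);
  if (fromOption !== undefined) return fromOption;
  return DEFAULT_INDEX_MAX_LINES_SKETCH;
}

// The warning fires at or above the limit; a limit of 0 disables it.
function shouldWarnAboutIndex(lineCount: number, limit: number): boolean {
  return limit > 0 && lineCount >= limit;
}
```

Keeping "unset" distinct from "disabled" is what lets a mistyped "-1" fall through to the next layer instead of silencing the warning.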
Secrets are acceptable in the private global ~/.claude store but not in an in-repo .claude/memory, which may be committed/pushed (a private repo can later go public). So writes to in-repo memory now run through a deterministic credential scrub by default; the global store is untouched.
- New src/redact.ts: redactSecrets / scrubMemoryFields, ported 1:1 from the Python eval harness (local_extract.py) incl. the §18a WiFi/PSK hardening. Same 16-case regression suite ported to test/redact.test.ts.
- paths.ts: shouldRedactInRepoMemory(worktree) = isInRepoMemory && !opted-in; opt-in via OPENCODE_MEMORY_LOCAL_SECRETS / opencode.json localMemorySecrets (default off). isInRepoMemory resolves the active store WITHOUT creating dirs.
- saveMemory scrubs name/description/content for in-repo writes (covers the post-session extraction path too); the scrub flows into the MEMORY.md pointer.
- memory_save tool surfaces "🔒 Redacted N credential value(s)" when it catches something in-repo.
- Wrapper docs + PRINT_SETTINGS gain local.secrets; README updated (the earlier "prompt-only" caveat is now "scrubbed by default, opt-out available").
- Belt-and-suspenders, not a guarantee (keyword-anchored): keyword-less leaks can still slip through; the prompt rule remains the first line of defense.
204 tests pass.
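To illustrate what "keyword-anchored" means in practice, here is a deliberately small sketch. It is not the ported redact.ts: the keyword list, the `[REDACTED]` marker, and the function name are all assumptions chosen for the example, and the real rules (including the WiFi/PSK hardening) are not reproduced here.

```typescript
// Illustrative keyword list only; the real rule set is not shown in this log.
const SECRET_KEYWORDS_SKETCH = ["password", "passwd", "secret", "token", "api_key", "apikey", "psk"];

// Replaces the value after `<key containing a keyword> = value` or `: value`.
function redactSecretsSketch(text: string): { text: string; count: number } {
  let count = 0;
  const pattern = new RegExp(
    "([A-Za-z0-9_-]*(?:" + SECRET_KEYWORDS_SKETCH.join("|") + ")[A-Za-z0-9_-]*)" +
      "(\\s*[:=]\\s*)" +
      "(\"[^\"]*\"|'[^']*'|[^\\s,;]+)",
    "gi"
  );
  const scrubbed = text.replace(pattern, (_match: string, key: string, sep: string) => {
    count += 1;
    return key + sep + "[REDACTED]";
  });
  return { text: scrubbed, count };
}

// Mirrors the stated rule: scrub in-repo writes unless the user opted in to secrets.
function shouldRedactSketch(isInRepoMemory: boolean, optedInToSecrets: boolean): boolean {
  return isInRepoMemory && !optedInToSecrets;
}
```

A value with no keyword next to it passes through untouched, which is the limitation the last bullet above describes.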
- docs/: 7 reference docs (architecture, models & extraction quality, local inference, dreaming, benchmarks, secrets, index) mined from the project's own past sessions and reconciled with the design doc. Scrubbed of private session-log ids, infra hostnames, and usernames before publishing.
- README: reframed as a fork of kuitos/opencode-claude-memory: replaced the upstream npm badges with fork attribution, added a 'What this fork adds' section (tuned two-phase extraction, in-repo memory, in-repo secret scrub, index size-limit warning, cross-repo memory), linked docs/, fixed the release/license notes.
- README: rewritten from scratch, no emoji, concise and scannable.
- docs: drop the kimi-tools/SWE-bench coding bake-off + agent-benchmark fugu content (owned by the kimi-tools / agent-benchmark repos, not this one). benchmarks-and-evals.md → memory-eval.md, scoped to memory-extraction evaluation only. Trimmed the off-topic 40/48 disambiguation in the models doc and the kimi/SWE correction notes in the docs index.
…nary (dev builds); surface it in PRINT_SETTINGS
… turns on resume) Persist the user-turn count at last successful extraction under $STATE_DIR/extract-marks/<session_id>. On resume: skip when no new turns, else inject a boundary note so the extractor mines only the delta and advances the mark on success. Toggle off with OPENCODE_MEMORY_INCREMENTAL=0. Harden two sleep-based wrapper tests with a 20s timeout against parallel load.
- Do not advance the mark on the memory_written_during_session path: has_new_memories() is not project-scoped, so a write in another project could falsely advance the mark and permanently skip turns (also removes a lock-free mark-write race).
- Handle a shrunk transcript (count < mark, e.g. compaction): re-baseline with a full extraction instead of skipping forever.
- count_user_turns: report unknown (full extraction) when python3 is absent instead of an over-counting grep that could skip real turns.
- read_extract_mark: cap digit length to avoid bash arithmetic overflow aborts.
- write_extract_mark: prune marks untouched for 90 days.
- Add a compaction re-baseline test.
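The high-water-mark decision, including the shrunk-transcript and unknown-count cases, can be sketched as a pure function. The wrapper implements this in shell; `planExtraction` and the `ExtractPlan` type are hypothetical, and the 1-based `fromTurn` numbering is an assumption made for the example.

```typescript
type ExtractPlan =
  | { kind: "skip" }
  | { kind: "full" }
  | { kind: "delta"; fromTurn: number };

// mark: user-turn count at the last successful extraction (undefined if none).
// count: current user-turn count (undefined when it cannot be determined).
function planExtraction(
  mark: number | undefined,
  count: number | undefined
): ExtractPlan {
  if (mark === undefined || count === undefined) return { kind: "full" };
  if (count < mark) return { kind: "full" };    // shrunk transcript: re-baseline
  if (count === mark) return { kind: "skip" };  // no new turns since the mark
  return { kind: "delta", fromTurn: mark + 1 }; // mine only the new turns
}
```

The mark itself would be advanced only after an extraction succeeds, so a failed run is retried over the same delta.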
…omparison (local Gemma, GLM, Kimi, gpt-oss, deepseek, qwen)
…ck + incremental caveats
- getMemoryDir/isInRepoMemory now apply the local on/off mode only to the session's own repo (pinned via setSessionMemoryRoot). Repos surfaced via extraMemoryRoots resolve as 'auto', so localMemory:on no longer creates a .claude/memory inside a foreign repo just from reading its index.
- README: note that Phase-1 harness feedback always stays in the global store even in in-repo mode, the extra-roots/local-mode independence, and the incremental high-water-mark caveats (in-place edits / compaction / turn-index).
…epth) The deterministic credential scrub previously ran only on in-repo writes; the global ~/.claude path relied on the extraction prompt alone. Add an OPT-IN global scrub (OPENCODE_MEMORY_REDACT_GLOBAL / redactGlobalSecrets, default off so 'secrets OK in ~/.claude' stays the default) via a new shouldRedactMemory decision. Isolate redact-integration tests from e2e session-root leak.
…d), revise verdict Hand set saturates; the realistic set (~13 distractors) separates the field. Local Gemma is second-tier at scale (0.78, over-selects); GLM-5.2 worst (0.76); GLM-4.6 leads but with judge advantage; deepseek-v4-flash best non-judge.
…s TL;DR Remove dangling references to the unpublished parent design doc and eval scripts (auto-memory/..., agent-benchmark/..., §N citations) across all docs so they stand alone in the public repo; no parent content copied in. Update secrets-and-redaction.md to reflect the now-shipped programmatic scrub (in-repo default + global opt-in) instead of the stale prompt-only framing.
…cedence Per review: setSessionMemoryRoot is pinned by MemoryPlugin and never cleared, so reset it in the afterEach of every e2e file that runs the plugin (index, tool-titles, recall-prefetch) — neutralizing the cross-file module-state leak at its source rather than at each consumer. Add the env=0-suppresses-plugin reverse precedence test for the global scrub.
… for a dir without an interactive run (for cron / dashboard on_close)
…itles/commit-provenance/fork-cleanup read the active DB, not always opencode.db); document run_maintain caveats
…et opencode flush final turns before forking (OPENCODE_MEMORY_MAINTAIN_SETTLE, default 2s)
…-fork session for the dir by time_updated) opencode 'session list --format json' is unreliable on some builds (observed returning empty for an in-scope dir while the DB held 136 sessions), so on_close maintain silently found nothing. Read the session table directly (OPENCODE_DB-aware, skips fork children); fall back to session list. Fixes the on_close no-op.
… rejects the object form); maintain DB resolver skips archived sessions (time_archived IS NULL)
…a dir's live memory store (collapse dupes, prune stale), bypassing the auto-dream gate; no extraction
…re mutually exclusive (no file/MEMORY.md races); (b) session resolver skips sub-agent/fork sessions by title (most have no parent_id) so maintain/dream target the real interactive session
…m resolver, get_session_target_id, and the auto-dream gate count match sessions in child dirs (lib/taxes, lib/exec), not just the exact root; submodule guard excludes nested git repos (different store)
- intro: 'both tools' -> 'both agents' (the two coding agents share the store)
- 'What this fork adds': drop the 'two-phase extraction' framing; lead with post-session extraction, add a 'Dream consolidation' bullet (was missing), and demote harness feedback to a 'Harness-feedback sidecar' bullet (not a 'phase', no longer first). Also drop the surviving 'Phase 1/Phase 2' wording further down
- 'How it works': remove the 'dashboard tile' example, switch passive 'can be wired' -> active 'you can wire it', add small sub-headlines (During a session / After a session / When opencode isn't wrapped / Incremental extraction)
- clarify that the post-session fork reuses warm session context rather than re-feeding a transcript (matches bin/opencode-memory: '--fork' sees full context)
When no extractModel is configured, the post-session fork now runs on the model
the session itself used, instead of opencode's global default. Forking already
replays the session's conversation context, so matching the model means the
extraction turn lands on a warm prompt cache and only pays for the new turn,
rather than prefilling the whole conversation into a different model.
- add get_session_model_from_db: resolves the session.model column (JSON
{id,providerID}) to an opencode -m string, falling back to the most recent
assistant message's providerID/modelID for older rows lacking session.model
- run_extraction_if_needed: if EXTRACT_MODEL is empty, use the resolved session
model (opt out with OPENCODE_MEMORY_MATCH_SESSION_MODEL=0). An explicit
extractModel/OPENCODE_MEMORY_MODEL still wins (dedicated extractor = better
quality, higher token cost, no shared cache)
- README + header docs: document the warm-cache default and the new env var
maintain inherits this automatically (same extraction path).
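A rough sketch of the model resolution described above. The wrapper does this in shell against the opencode DB; here the column value and the fallback message are passed in directly, the function names are hypothetical, and the `provider/model` form of the `-m` string is an assumption about opencode's flag format.

```typescript
interface AssistantModelRef {
  providerID?: string;
  modelID?: string;
}

// Turns the session.model JSON ({id, providerID}) into a "provider/model" string,
// falling back to the most recent assistant message for older rows.
function sessionModelToFlag(
  modelColumn: string | null,
  lastAssistant?: AssistantModelRef
): string | undefined {
  if (modelColumn) {
    try {
      const parsed = JSON.parse(modelColumn) as { id?: unknown; providerID?: unknown } | null;
      if (parsed && typeof parsed.id === "string" && typeof parsed.providerID === "string") {
        return parsed.providerID + "/" + parsed.id;
      }
    } catch (_err) {
      // Malformed JSON: fall through to the assistant-message fallback.
    }
  }
  if (lastAssistant && lastAssistant.providerID && lastAssistant.modelID) {
    return lastAssistant.providerID + "/" + lastAssistant.modelID;
  }
  return undefined;
}

// An explicit extract model wins; otherwise use the session model unless opted out.
// undefined means "let opencode use its own default".
function pickExtractModel(
  explicitModel: string,
  sessionModel: string | undefined,
  matchSessionModel: boolean
): string | undefined {
  if (explicitModel !== "") return explicitModel;
  return matchSessionModel ? sessionModel : undefined;
}
```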
The 'During a session' line just said 'surfaces relevant memories via LLM recall' without defining the term. Spell it out: on each user turn a small fast model (the configurable recallModel) reads the message + each memory's one-line description and picks which to inject (up to 5) — an LLM relevance judgment, not keyword or embedding search — run as a non-blocking prefetch so it adds no turn latency.
A more elegant post-session trigger than the dashboard on_close hook or the wrapper: subscribe to opencode's session.idle bus event and debounce-spawn 'opencode-memory maintain --dir <cwd>' when the session goes quiet. Works for ANY launch (dashboard tile, bare CLI, editor) with no on_close config and no dependency on the wrapper.
- event hook: on session.idle for a session this instance served (and not a recall-selector child), arm a debounce timer.
- chat.message hook: a new turn cancels the pending run, so maintain fires once per quiet period, not every turn. Incremental high-water mark makes repeats cheap.
- detached + unref'd spawn; wrapper bin resolved next to dist (../bin) with a PATH fallback; wrapper's per-repo lock serialises overlapping runs.
- opt-in (default off) so wrapper users don't double-extract: OPENCODE_MEMORY_MAINTAIN_ON_IDLE=1 or option maintainOnIdle:true; debounce window OPENCODE_MEMORY_MAINTAIN_IDLE_SECONDS (default 30).
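The arm-on-idle, cancel-on-message behavior can be sketched with an injectable timer so it is testable without real delays. `IdleDebouncer` and `TimerApi` are hypothetical names; the actual hook wiring, session filtering, and the detached spawn are left out.

```typescript
// Minimal timer abstraction; in real use this would wrap setTimeout/clearTimeout.
interface TimerApi {
  set(fn: () => void, ms: number): number;
  clear(handle: number): void;
}

class IdleDebouncer {
  private handle: number | undefined;
  private timers: TimerApi;
  private delayMs: number;
  private run: () => void;

  constructor(timers: TimerApi, delayMs: number, run: () => void) {
    this.timers = timers;
    this.delayMs = delayMs;
    this.run = run;
  }

  // session.idle: (re)arm the timer so only one run is pending at a time.
  onSessionIdle(): void {
    this.cancel();
    this.handle = this.timers.set(() => {
      this.handle = undefined;
      this.run();
    }, this.delayMs);
  }

  // chat.message: a new turn cancels the pending run.
  onChatMessage(): void {
    this.cancel();
  }

  private cancel(): void {
    if (this.handle !== undefined) {
      this.timers.clear(this.handle);
      this.handle = undefined;
    }
  }
}
```

With a 30-second window, `run` would fire once per quiet period, and a turn that arrives inside the window resets the wait.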
Every dream (auto or forced `dream` subcommand) now appends one line to $STATE_DIR/dream-journal.log: UTC timestamp, mode, ok/fail, delete/save counts, model, host session, dir. Answers 'did a dream already run on this store and what did it change?' at a glance — the per-run dream logs live in $TMPDIR and are easy to lose. Counts come from the tool-call lines in the run's dream log (⚙ memory_delete/save), gear-anchored so the prompt's own tool-name mentions aren't miscounted. Best-effort: a journal write never fails the dream.
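The gear-anchored counting can be illustrated as follows. The exact dream-log line format is not shown in this log, so the sketch assumes only that tool-call lines start with the ⚙ character and mention the tool name; `countDreamToolCalls` is a hypothetical helper, not the wrapper's implementation.

```typescript
// Counts memory_delete / memory_save tool calls, looking only at lines that
// start with the gear marker so prose mentions of the tool names are ignored.
function countDreamToolCalls(log: string): { deletes: number; saves: number } {
  let deletes = 0;
  let saves = 0;
  const lines = log.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line.indexOf("⚙") !== 0) continue; // not a tool-call line
    if (line.indexOf("memory_delete") !== -1) {
      deletes += 1;
    } else if (line.indexOf("memory_save") !== -1) {
      saves += 1;
    }
  }
  return { deletes, saves };
}
```

Anchoring on the marker is what keeps the prompt's own references to memory_delete and memory_save from inflating the counts.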
…build npm runs 'prepare' (not 'prepack') for git/github installs; without it a 'npm i github:...' install ships no dist/ (gitignored) and is broken. Also add 'bin' to the files allowlist so the opencode-memory wrapper is included in packed/installed output.
…ssertion
- session-model matching: extraction defaults to the session's own model (warm cache), OPENCODE_MEMORY_MATCH_SESSION_MODEL=0 disables it, and an explicit OPENCODE_MEMORY_MODEL still overrides; driven through the fork-args capture harness with a seeded session.model in the db
- dream journal: a forced dream appends one journal line with del/save counts + dir; a failed fork is recorded as fail
- publish-config: update files assertion to ["dist","bin"] to match 7926c46 (bin/ ships so GitHub installs can self-build the wrapper)
…inked npm runs 'prepare' before linking devDeps' .bin symlinks, so a bare 'tsc' isn't found during a github install even though node_modules/typescript is present. Fall back to 'node ./node_modules/typescript/bin/tsc'. Local bun builds still hit the first clause via bun's .bin.
npm's global git-dep flow doesn't install dev/type deps before 'prepare', so building at install time is unreliable. Commit prebuilt dist/ and drop the 'prepare' script so 'npm i -g github:...' just copies the files (no toolchain needed). 'prepack' still rebuilds for any future npm publish. When changing src/, rebuild (bun run build) and commit dist/.
The github install does an internal 'npm pack' which fires prepack, re-running the build (without dev/type deps) and corrupting the committed dist. With dist committed, remove prepack so install just packs the prebuilt files verbatim.
…o dreamModel is set (same gate as extraction); fix test env leakage from OPENCODE_DB/OPENCODE_MEMORY_OPENCODE_BIN
No description provided.