agentcairn MCP Server

Local-first agent memory: a plain-Markdown Obsidian vault is the source of truth, with a rebuildable DuckDB index for hybrid BM25 + vector + graph recall.

🪨 agentcairn

Local-first memory for AI agents that you can actually read, edit, and own.

cairn /kɛən/ · noun: a stack of stones raised to mark a trail or a place worth remembering, left for whoever comes next.

agentcairn gives your coding agent durable, high-quality memory. Instead of locking that memory in an opaque database or a cloud service, it keeps your memories as plain Markdown in an Obsidian vault you own, with a fast, rebuildable DuckDB index on top for retrieval. Open your vault, read what the agent remembered, fix a wrong fact by hand, or drop in your own notes, and the agent picks it all up.

Why agentcairn is different

Most agent-memory systems make a database or cloud store the source of truth and treat files (if any) as a one-way export. agentcairn inverts that:

  • 📂 Your vault is the source of truth, not an export. Memory is human-readable Markdown with frontmatter and [[wikilinks]]. Edit it in Obsidian; the index honors your edits.
  • ♻️ The index is disposable. DuckDB is a rebuildable cache (cairn reindex). Your memory survives a model upgrade, a corrupted index, a schema change, or uninstalling the tool with zero data loss, because the truth is just files on disk.
  • 🧠 Non-lossy by construction. The full note is always retained. Distillation only adds derived notes that link back to the source; it never silently drops facts it didn't think to extract at write time.
  • 🔒 Redaction before every write. Secrets are scrubbed (regex + entropy + URL-credential detection) before anything (body, title, or tags) reaches the plaintext vault. We write files you can read, so we treat a leaked credential as the worst failure mode.
  • 🕸️ A free, deterministic knowledge graph. Your [[wikilinks]] and frontmatter are the graph: no LLM extraction, no hallucinated entities.
  • 🪶 Daemonless, zero external DB. One embedded DuckDB file does semantic vector search, BM25 full-text, and graph traversal. No always-on server, no Neo4j/Postgres/Qdrant, no required cloud key: just a cairn CLI and an on-demand MCP server.
  • 🔍 Honestly measured. A reproducible LongMemEval-S + LoCoMo harness ships in benchmarks/, with real numbers, ablations, and explicit caveats instead of one cherry-picked headline (see Benchmarks below).
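
The 🔒 bullet above names three detectors: regex, entropy, and URL credentials. The sketch below shows one way they could be combined. The patterns, the 20-character minimum token length, and the 4.0 bits/char threshold are all illustrative assumptions rather than agentcairn's actual rules, and redact() is a hypothetical helper, not the package's API.

```python
import math
import re
from collections import Counter

# Illustrative patterns only (assumptions, not agentcairn's actual rule set).
KNOWN_SECRET = re.compile(r"\b(?:sk-ant-[A-Za-z0-9_-]{10,}|AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{36})")
# scheme://user:password@host -> capture the scheme so it can be kept.
URL_CREDENTIALS = re.compile(r"(?P<scheme>[a-z][a-z0-9+.-]*://)[^/\s:@]+:[^/\s@]+@")
CANDIDATE_TOKEN = re.compile(r"[A-Za-z0-9+/_=-]{20,}")  # long, opaque-looking tokens
ENTROPY_THRESHOLD = 4.0  # bits per character; hypothetical cut-off


def shannon_entropy(s: str) -> float:
    """Bits per character of the string's empirical character distribution."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def redact(text: str) -> str:
    """Hypothetical helper: scrub likely secrets before text reaches the vault."""
    # 1. Known token formats.
    text = KNOWN_SECRET.sub("[REDACTED]", text)
    # 2. Credentials embedded in URLs: keep scheme and host, drop user:password.
    text = URL_CREDENTIALS.sub(lambda m: m.group("scheme") + "[REDACTED]@", text)

    # 3. Anything long and random-looking enough to be a key.
    def _maybe(m: re.Match) -> str:
        token = m.group(0)
        return "[REDACTED]" if shannon_entropy(token) >= ENTROPY_THRESHOLD else token

    return CANDIDATE_TOKEN.sub(_maybe, text)
```

Running the entropy pass last means prose and low-entropy identifiers (for example `"a" * 24`) pass through untouched, while a random 23-character key is scrubbed. A real redactor would tune these thresholds against false positives such as long file paths.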

Install

The easiest way to use agentcairn is the plugin for Claude Code or Codex. One install wires up the MCP server, ambient memory (recall at session start, capture at session end), a memory skill, and, on Claude Code, slash commands:

# Claude Code
claude plugin marketplace add ccf/agentcairn
claude plugin install agentcairn@agentcairn

# Codex (from the Codex plugin marketplace)
codex plugin marketplace add ccf/agentcairn
codex plugin add agentcairn@agentcairn

On install you pick a vault path (default ~/agentcairn). It is auto-created on the first session, so no Obsidian setup is required. From then on agentcairn surfaces relevant memory at the start of each session, distills each session into your vault, and gives you /agentcairn:recall, /remember, /memory, /savings, and /ingest. There is nothing to pip-install: the plugin runs the published package via uvx.

Not on Claude Code or Codex? agentcairn is also a standalone MCP server + CLI for any host; see Using it directly.

How it works

flowchart LR
    T["Session transcripts<br/>(out-of-band)"]
    H["You · Obsidian<br/>(hand edits)"]
    V["📂 Obsidian vault<br/>Markdown + frontmatter + wikilinks<br/><b>source of truth</b>"]
    I["♻️ DuckDB index<br/>vector + BM25 + graph<br/><b>rebuildable cache</b>"]
    M["MCP tools<br/>remember · recall · search · build_context · recent"]

    T -- "redact → judge → distill → consolidate" --> V
    H -- "edit" --> V
    V -- "parse / reconcile-on-spawn" --> I
    I -- "READ_ONLY hybrid recall" --> M
    M -. "remember (redacted write)" .-> V

    classDef truth fill:#eaf1ff,stroke:#317cff,color:#191919;
    classDef cache fill:#f5f5f3,stroke:#999999,color:#191919;
    class V truth
    class I cache
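
The "parse" step in the diagram is where the deterministic graph from the 🕸️ bullet comes from. Each note's wikilinks become edges without any LLM involved. The sketch below makes a few assumptions: frontmatter is simple `key: value` lines between `---` fences (a real parser would use a YAML library), and links follow Obsidian's `[[target]]`, `[[target|alias]]`, and `[[target#heading]]` forms. parse_note() is a hypothetical helper, not agentcairn's API.

```python
import re

FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
# [[target]], [[target|alias]], [[target#heading]]: capture only the target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def parse_note(name: str, text: str) -> tuple[dict[str, str], str, list[tuple[str, str]]]:
    """Hypothetical: split a note into (frontmatter, body, edges).

    Edges are (source note, linked note) pairs, deduplicated in first-seen order.
    Links inside frontmatter values (e.g. superseded_by) count as edges too.
    """
    meta: dict[str, str] = {}
    body = text
    m = FRONTMATTER.match(text)
    if m:
        for line in m.group(1).splitlines():
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
        body = text[m.end():]
    targets = dict.fromkeys(t.strip() for t in WIKILINK.findall(text))
    edges = [(name, target) for target in targets]
    return meta, body, edges
```

Because the edges come straight from link syntax, re-parsing the same vault always yields the same graph. That property is what lets the index stay a disposable cache.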

  • Capture reads your agent harness's session transcripts (append-only and already on disk) out-of-band. This is robust by design, with no fragile live hooks. Each session is then redacted → deduplicated → judged (semantic durability, with optional LLM distillation via CAIRN_JUDGE=anthropic) → gated → distilled into the vault, non-lossily. cairn sweep auto-detects every harness present on the machine (Claude Code, Codex, Antigravity CLI, and Cursor are all supported, behind a HarnessAdapter seam), so you get unified memory across all four without any extra configuration. On the LLM tier it also consolidates: a new memory that duplicates an existing one is skipped, and a newer version of an evolving fact sets superseded_by on the older note (kept and demoted in recall, never deleted). Consolidation is fail-safe, so a wrong call never drops a distinct memory; set CAIRN_CONSOLIDATE=0 to disable it. An agent-driven remember tool covers curated, high-value memories.
  • Retrieval fuses BM25 and semantic vectors with Reciprocal Rank Fusion (RRF) and applies an optional graph boost. It degrades gracefully to keyword-only search when no embedding model is available, so recall is never silently dead. A cross-encoder reranker (on by default; CAIRN_RERANK=0 to disable) adds precision.
  • Hybrid intelligence: offline local embeddings (FastEmbed / nomic-embed-text-v1.5 by default) work out of the box and are strong both on their own and in the hybrid fusion. With nomic, vector-only edges out BM25 on LoCoMo turn-level retrieval, though not on LongMemEval-S; see the benchmarks. Set CAIRN_EMBED_MODEL to pick another FastEmbed model, or set CAIRN_EMBEDDER=ollama to use local Ollama models (a cloud tier is on the roadmap).
  • Temporal memory: notes may carry valid_from/valid_until/superseded_by frontmatter. Recall is validity-aware: it soft-demotes superseded and expired facts so the current fact wins, without ever hiding them (non-lossy). It also annotates each result's status (current/superseded/expired/not_yet_valid) plus an as_of anchor so the agent can reason over time. Notes without validity fields are unaffected.
  • Provenance-aware recall: notes carry project/harness provenance, and recall boosts your current project's memories. This is non-lossy: cross-project hits still surface, marked [from: <project>]. Pass --project <repo> to target another repo, or --scope project to hard-filter to just the current one.
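
Reciprocal Rank Fusion, named in the retrieval bullet above, scores each document by summing 1 / (k + rank) over every ranked list it appears in. Because it only uses ranks, BM25 scores and cosine similarities never need to be normalized against each other. The sketch below is minimal. k = 60 is the commonly used default from the original RRF paper; agentcairn's actual constant and tie-breaking are not documented here and may differ.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Take bm25 = ["a", "b", "c"] and vector = ["b", "c", "d"]. The fused order is b, c, a, d: documents both arms agree on rise above a, which is BM25's top hit. Passing an empty vector list reproduces BM25's order exactly, which is the keyword-only fallback described above.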

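The temporal-memory bullet above describes annotating each hit as current/superseded/expired/not_yet_valid and soft-demoting stale facts without hiding them. The sketch below rests on several assumptions: ISO dates already parsed into `date`, an inclusive valid_until, and a hypothetical 0.5 demotion factor applied only to superseded and expired hits. The `Hit` type and function names are illustrative, not agentcairn's API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

DEMOTION = 0.5  # hypothetical multiplier for superseded/expired hits


@dataclass
class Hit:
    note_id: str
    score: float
    valid_from: Optional[date] = None
    valid_until: Optional[date] = None
    superseded_by: Optional[str] = None


def status(hit: Hit, as_of: date) -> str:
    """Classify a hit relative to the as_of anchor."""
    if hit.superseded_by:
        return "superseded"
    if hit.valid_from and as_of < hit.valid_from:
        return "not_yet_valid"
    if hit.valid_until and as_of > hit.valid_until:
        return "expired"
    return "current"


def rerank_by_validity(hits: list[Hit], as_of: date) -> list[tuple[str, str, float]]:
    """Demote (never drop) stale hits; return (note_id, status, adjusted score)."""
    out = []
    for hit in hits:
        st = status(hit, as_of)
        factor = DEMOTION if st in ("superseded", "expired") else 1.0
        out.append((hit.note_id, st, hit.score * factor))
    return sorted(out, key=lambda row: -row[2])
```

A superseded note that initially outscored its replacement (0.9 vs 0.6) drops to 0.45 and lands just below it, but it is still returned with its status. That is the non-lossy behaviour the bullet describes.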
Using it directly

The plugin is the easiest path, but agentcairn is just a package: use it without Claude Code via the on-demand MCP server (for any MCP host) or the cairn CLI:

uvx agentcairn                                       # on-demand MCP server for any MCP host
cairn ingest --vault ~/vault                         # distill recent agent sessions into the vault
cairn sweep  --vault ~/vault                          # ingest + reindex in one pass (cron-friendly)
cairn schedule install --vault ~/vault                # run sweep automatically every 30 min (launchd on macOS, crontab on Linux)
cairn recall "how did we fix the auth bug?"          # hybrid recall from the CLI
cairn savings                                        # how much context recall has saved you
cairn reindex ~/vault                                # rebuild the index from Markdown (always safe)
cairn doctor                                         # health-check the index
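
cairn sweep's auto-detection goes through the HarnessAdapter seam mentioned under How it works. The sketch below shows one shape such a seam could take. Only the Antigravity transcript location comes from this README (footnote 4). The Claude Code glob is an assumption to verify against your install, and `GlobAdapter`, `ADAPTERS`, and `detect()` are hypothetical names.

```python
from pathlib import Path
from typing import Iterable, Protocol


class HarnessAdapter(Protocol):
    """Hypothetical seam: each harness knows where its transcripts live."""

    name: str

    def transcripts(self, home: Path) -> Iterable[Path]: ...


class GlobAdapter:
    """Hypothetical adapter that finds transcripts by a glob under $HOME."""

    def __init__(self, name: str, pattern: str) -> None:
        self.name = name
        self.pattern = pattern

    def transcripts(self, home: Path) -> Iterable[Path]:
        return sorted(home.glob(self.pattern))


ADAPTERS: list[HarnessAdapter] = [
    # Location taken from footnote 4 of this README.
    GlobAdapter("antigravity", ".gemini/antigravity-cli/brain/*/.system_generated/logs/transcript.jsonl"),
    # Assumed location; verify against your Claude Code install.
    GlobAdapter("claude-code", ".claude/projects/*/*.jsonl"),
]


def detect(home: Path) -> dict[str, list[Path]]:
    """Return only the harnesses that actually have transcripts on disk."""
    found = {a.name: list(a.transcripts(home)) for a in ADAPTERS}
    return {name: paths for name, paths in found.items() if paths}
```

Adding a harness then means adding one adapter; the sweep loop itself never changes.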

Configuration

All settings live in one file, ~/.agentcairn/config.toml, with environment variables as overrides (precedence: CLI flag > env var > config file > default):

cairn config --init   # scaffold a fully-commented template (chmod 600)
cairn config          # show every setting's effective value and where it came from

For example, enabling the LLM memory judge takes two uncommented lines and no shell exports (the plugin's background sweep reads the file directly):

judge = "anthropic"
anthropic_api_key = "sk-ant-..."

Agents supported

agentcairn works at two levels. Plugin hosts (Claude Code, Codex, and Antigravity) get a first-class plugin: a bundled MCP server (recall/search/remember), a memory skill, and (on Claude Code and Codex) ambient session hooks. For these, cairn install <host> installs the plugin by calling the host's own CLI. MCP hosts (everything else) get the same recall/search/remember tools via the portable MCP server. For these, cairn install <host> writes the MCP server config non-destructively: your other servers are preserved, and the original is backed up to <config>.bak. The vault stays a single global ~/agentcairn, so memory is shared across every host.

| Host | Support | Set up with | Ambient capture |
|---|---|---|---|
| Claude Code | 🟢 Plugin | cairn install claude-code | ✅ recall-at-start + capture-at-end |
| Codex | 🟢 Plugin | cairn install codex | ◐ recall/remember live; ambient hooks bundled (verifying) [1] |
| Cursor | 🔌 MCP server + skill + ingest | cairn install cursor | ◐ cairn sweep auto-detects transcripts [2] |
| Claude Desktop | 🔌 MCP server | cairn install claude-desktop | none |
| VS Code (Copilot) | 🔌 MCP server | cairn install vscode | none |
| Gemini CLI [3] | 🔌 MCP server | cairn install gemini | none |
| Antigravity | 🟢 Plugin + ingest | cairn install antigravity | ◐ cairn sweep auto-detects transcripts [4] |
| Any other MCP host | 🔌 MCP server | uvx agentcairn (paste the cairn install … --print snippet) | none |

cairn install routes by host kind automatically:

cairn install                 # detect installed hosts + preview (writes nothing)
cairn install codex           # install the Codex plugin (shells to `codex plugin …`; strips any stale MCP block from ~/.codex/config.toml)
cairn install antigravity --source ./plugin  # install the Antigravity plugin from a local checkout (see note)
cairn install cursor          # write MCP config + install the memory skill for Cursor
cairn install --all           # configure every detected host
cairn install codex --source /path/to/agentcairn  # use a local checkout instead of the marketplace

MCP hosts take a JSON mcpServers entry (VS Code uses its servers key). Plugin hosts (Claude Code, Codex, Antigravity) install the plugin via the host CLI; the MCP server is bundled in the plugin and needs no separate config entry. If you previously used cairn install antigravity to write an MCP entry to ~/.gemini/config/mcp_config.json, re-running cairn install antigravity removes that stale entry automatically.
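
The non-destructive write described above works in three steps: back up the original to <config>.bak, merge in one entry, and leave every other server alone. The sketch below illustrates it under stated assumptions. The entry shape {"command": "uvx", "args": ["agentcairn"]} follows the common MCP-client convention and may not match exactly what cairn install emits. install_mcp_entry() is hypothetical.

```python
import json
import shutil
from pathlib import Path


def install_mcp_entry(config_path: Path, vscode: bool = False) -> dict:
    """Hypothetical: add an agentcairn server entry without touching other servers."""
    root_key = "servers" if vscode else "mcpServers"  # VS Code uses its "servers" key
    config: dict = {}
    if config_path.exists():
        # Back up the original file before changing anything.
        shutil.copy2(config_path, config_path.with_name(config_path.name + ".bak"))
        config = json.loads(config_path.read_text() or "{}")
    servers = config.setdefault(root_key, {})
    servers["agentcairn"] = {"command": "uvx", "args": ["agentcairn"]}  # assumed shape
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Merging into the parsed JSON, instead of rewriting the file from a template, is what keeps unrelated servers intact.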

Benchmarks measured

We benchmark agentcairn the way we'd want a memory system measured: reproducibly, with ablations, and without a single cherry-picked headline number. The harness (benchmarks/) runs LongMemEval-S and LoCoMo through a version-pinned downloader (datasets are never vendored). It scores retrieval deterministically (recall@k, nDCG@k, and MRR; no API key needed, and it runs in CI on a synthetic fixture), and it offers an opt-in LLM-judged QA layer.
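
The deterministic retrieval scores are standard IR metrics. The sketch below uses binary relevance. The harness's exact conventions, such as how multiple evidence turns count toward recall@k or which nDCG gain function it uses, are assumptions here; benchmarks/ holds the real definitions.

```python
import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mrr(ranked: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant item (0 if none is retrieved)."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-gain nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2) for i, d in enumerate(ranked[:k]) if d in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0


def macro_avg(scores: list[float]) -> float:
    """Average per-query scores so every question counts equally."""
    return sum(scores) / len(scores)
```

For ranked = ["t3", "t1", "t7", "t2"] with relevant turns {"t1", "t2"}, recall@2 is 0.5, recall@4 is 1.0, and MRR is 0.5. None of these calls need a model or a network, which is why this layer can run in CI.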

Retrieval β€” LoCoMo

Full LoCoMo set, turn-level, macro-avg, FastEmbed nomic-embed-text-v1.5 (the default embedder):

| arm | recall@5 | recall@10 | MRR |
|---|---|---|---|
| BM25 only | 0.527 | 0.604 | 0.459 |
| vector only | 0.536 | 0.637 | 0.433 |
| hybrid (RRF) | 0.562 | 0.648 | 0.477 |
| hybrid + graph-boost | 0.562 | 0.648 | 0.477 |
| hybrid + reranker | 0.662 | 0.735 | 0.608 |

What we read from this, and say out loud:

  • Hybrid beats either arm alone, so RRF fusion is worth it.
  • The cross-encoder reranker is the biggest lever (+0.10 recall@5 over hybrid); the worry that ms-marco domain shift might hurt didn't materialize on conversational data.
  • The default embedder now pulls its weight. With nomic, vector-only edges out BM25 (0.536 vs 0.527 recall@5); the old bge-small default trailed at 0.483, and switching closed the gap. A 5-model FastEmbed sweep settled the pick: nomic (768-d) wins on quality per dimension, and bigger 1024-d models don't beat it. Full table: benchmarks/README.md.
  • Graph-boost is inert on these corpora. LoCoMo and LongMemEval have no native [[wikilink]] graph, so the boost has nothing to fire on. It is meant for real interlinked vaults, not chat logs, and we don't pretend otherwise.

Retrieval β€” LongMemEval-S

Full 500-instance set, an easier task with well-separated evidence sessions. Session level is the granularity prior work reports; turn level is the finer, corpus-revealing slice:

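| arm | session r@5 | session MRR | turn r@5 | turn r@10 | turn MRR |
|---|---|---|---|---|---|
| BM25 only | 0.920 | 0.918 | 0.680 | 0.791 | 0.638 |
| vector only | 0.936 | 0.916 | 0.507 | 0.692 | 0.454 |
| hybrid (RRF) | 0.954 | 0.938 | 0.640 | 0.798 | 0.544 |
| hybrid + reranker | 0.969 | 0.963 | 0.788 | 0.891 | 0.716 |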

Read honestly:

  • Our 0.969 session recall@5 sits right alongside prior work's ≈0.95 on the same full 500-question set. At full scale the metric also discriminates between arms (0.920 BM25 → 0.969 reranker) rather than saturating the way it does on a small sample.
  • The reranker is again the biggest lever: turn r@5 goes from 0.640 to 0.788, and session r@5 from 0.954 to 0.969.
  • Turn level is corpus-revealing: here BM25-only (0.680) beats the RRF hybrid (0.640) because vector-only is weak on these single-turn evidence spans (0.507); the reranker is what pulls the default ahead. (Contrast LoCoMo, where vector-only edges out BM25.)

Context efficiency

How much smaller is the context agentcairn recalls than the full history you'd otherwise carry into the model? Default config (hybrid + reranker, k=10):

| dataset | queries | mean haystack | mean recalled (k=10) | context reduction |
|---|---|---|---|---|
| LoCoMo (3 convos) | 497 | 25,646 tok | 529 tok | 51.1× mean / 50.3× median |
| LongMemEval-S (full 500) | 470 | 136,552 tok | 2,207 tok | 64.7× mean / 61.6× median |

Token counts are estimates (~4 characters per token), not a billed cost. "Haystack" is the full indexed history, and "recalled" is the top-k chunks returned. The figure measures context size only, independent of retrieval quality.
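
The reduction column can be recomputed from per-query token estimates under the stated ~4 characters/token rule. Note that a mean of per-query ratios is not the same as the ratio of the mean columns. For LoCoMo, 25,646 / 529 ≈ 48.5×, while the table reports a 51.1× mean, which is consistent with averaging ratios per query. The sketch below assumes that convention; est_tokens() and context_reduction() are hypothetical names.

```python
import statistics


def est_tokens(text: str) -> float:
    """Rough token estimate: about 4 characters per token (not a real tokenizer)."""
    return len(text) / 4


def context_reduction(queries: list[tuple[str, list[str]]]) -> tuple[float, float]:
    """queries: (full haystack text, top-k recalled chunks) for each query.

    Returns (mean, median) of the per-query haystack / recalled ratios.
    """
    ratios = []
    for haystack, recalled in queries:
        recalled_tokens = sum(est_tokens(chunk) for chunk in recalled)
        if recalled_tokens:  # skip queries where nothing was recalled
            ratios.append(est_tokens(haystack) / recalled_tokens)
    return statistics.mean(ratios), statistics.median(ratios)
```

Consider three queries whose ratios are 40×, 20×, and 22.5×. They give a 27.5× mean and a 22.5× median, while mean haystack over mean recalled for the same data is only 25×. A median below the mean, as in both table rows, suggests a few queries with especially large reductions.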

QA accuracy

QA-accuracy numbers (LLM-judged) are available too, but they use an Anthropic judge rather than the papers' GPT-4o. They are therefore not comparable to published leaderboards and are valid only as a relative ablation signal. See benchmarks/README.md for how to run the QA layer and how to read the numbers.

Roadmap

  • v1: done. The core loop: transcript ingestion → redaction → Markdown → rebuildable DuckDB index → hybrid recall; MCP server + CLI; secret redaction; local embeddings; reproducible benchmark harness.
  • v1.1: next, prioritized by the benchmark above:
    • ✅ Reranker on by default: the largest measured retrieval lever; CAIRN_RERANK=0 to disable. (shipped)
    • Ollama embedding tier: ✅ local models via CAIRN_EMBEDDER=ollama (CAIRN_EMBED_MODEL/OLLAMA_HOST); cloud (OpenAI/Voyage) still pending.
    • ✅ Bi-temporal validity: frontmatter valid_from/valid_until/superseded_by. Recall soft-demotes superseded/expired facts (non-lossy, never hidden) and annotates each result's currency plus an as_of anchor, so the current fact wins and the agent can reason over time. (shipped)
    • In-memory HNSW to cut retrieval latency on large vaults.
  • v2: ◐ Obsidian plugin (agentcairn-obsidian), a vault-native Memory view (list + provenance + currency + graph) for reading and navigating your memory in Obsidian (MVP shipped; semantic recall stays in the CLI/MCP). MotherDuck cloud sync and optional LLM entity extraction are still pending.

Development

agentcairn uses uv exclusively for dependency management and tooling.

Do not use pip, poetry, or global virtual environments.

# First-time setup
uv sync                         # create .venv and install all deps (including dev)
uv run pre-commit install       # install git hooks (ruff + pytest run on every commit)

# Daily use
uv run pytest                   # run the test suite
uv run cairn --help             # run the CLI
uvx agentcairn                  # run the installed tool ephemerally (as the MCP server does)

# Formatting and linting
uv run ruff format .            # format all Python files
uv run ruff check --fix .       # lint with auto-fix
uv run pre-commit run --all-files

# Benchmarks (offline retrieval metrics need no API key)
uv run pytest benchmarks/tests/                                      # offline synthetic-fixture suite
PYTHONPATH=benchmarks uv run --group bench python -m cairn_bench.run --dataset locomo

The MCP server is launched via uvx agentcairn, so no global install is required.

License

Apache License 2.0: permissive, with an explicit patent grant. Copyright © 2026 Charles C. Figueiredo.

Footnotes

  1. The Codex plugin installs, and its bundled MCP server (recall/search/remember) is verified live in Codex. The ambient session hooks (recall-at-start, capture-at-end) ship in the plugin and use Codex's documented hooks schema, but their on-Codex behaviour isn't yet confirmed end-to-end; capture also happens out-of-band via cairn sweep regardless.

  2. Cursor has no plugin hooks, so ambient capture is out-of-band via cairn sweep, which reads the user "bubbles" from the cursorDiskKV table of Cursor's global globalStorage/state.vscdb SQLite database. For recall and remember, Cursor is wired up as an MCP host (cairn install cursor → ~/.cursor/mcp.json); there is no Cursor plugin. cairn install cursor also installs the using-agentcairn-memory skill (recall/remember guidance) to ~/.cursor/skills/using-agentcairn-memory/SKILL.md.

  3. Gemini CLI (consumer) transcript ingestion is not supported: Google is sunsetting the consumer Gemini CLI (cutoff 2026-06-18) in favour of Antigravity CLI, which agentcairn ingests instead. cairn install gemini (MCP server wiring) remains valid for any Gemini-based host that speaks MCP.

  4. The Antigravity plugin bundles the MCP server + memory skill; cairn install antigravity --source <dir> installs it via agy plugin install and removes any stale mcpServers.agentcairn entry from ~/.gemini/config/mcp_config.json. Note: agy plugin install takes a local directory or a registered marketplace (not a git repo), so point --source at a cloned checkout's plugin/ dir for now. Antigravity has no recognized plugin hooks, so ambient capture is out-of-band via cairn sweep (path: ~/.gemini/antigravity-cli/brain/<uuid>/.system_generated/logs/transcript.jsonl).