ImprintMCP

Auto-updating MCP Vector Memory

Persistent memory for AI coding tools. 100% local. Zero API cost.

Give Claude Code, Cursor, Codex CLI, Copilot, and Cline a long-term memory.
Stop re-explaining your codebase every session.

imprintmcp.alexandruleca.com →



Imprint UI

Why Imprint

  • Remembers what your AI forgets. Decisions, patterns, bug fixes, and architectural choices persist across sessions — searched semantically, not grepped.
  • −70.4% tokens, −31.7% cost. Measured across 150 runs on Claude Code (Sonnet). Your AI searches memory instead of re-reading files. See BENCHMARK.md for raw numbers.
  • Runs 100% locally by default. EmbeddingGemma-300M via ONNX, Qdrant vector DB, Chonkie chunking — all on your machine. No API credits consumed unless you opt in.
  • One command, any host. Wires into Claude Code, Cursor, Codex CLI, Copilot, or Cline via MCP. Same memory, shared across tools.

Runs 100% locally. Zero API credits consumed by default. Embeddings, chunking, tagging, vector search, and the knowledge graph all run on your machine:

  • Embeddings: EmbeddingGemma-300M via ONNX Runtime (GPU or CPU), no network calls, no per-token cost.
  • Vector store: Qdrant auto-spawned as a local daemon on 127.0.0.1:6333. Your data never leaves the box unless you sync it to another device.
  • Chunking: Chonkie hybrid (tree-sitter CodeChunker + SemanticChunker), pure Python, local.
  • Tagging: deterministic rules + zero-shot cosine similarity against pre-embedded labels. Local LLM call per chunk if you want it.
  • Imprint graph: SQLite on disk for temporal facts.

The ingestion flow: scan dir → detect project → chunk files → embed chunks → tag (lang/layer/kind/domain/topics) → upsert into Qdrant. A Stop hook auto-extracts decisions from Claude transcripts; a PreCompact hook saves context before window compression. Search goes straight to the local vector DB — no round-trip to any provider.
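The flow can be sketched end to end in a few dozen lines. This is an illustrative, self-contained toy, not the real implementation: the hash-based embedder, naive line-window chunker, and in-memory point list stand in for EmbeddingGemma, Chonkie, and Qdrant, and every name below is hypothetical.

```python
import hashlib
import math
from pathlib import Path

def embed(text: str, dim: int = 8) -> list[float]:
    """Toy deterministic 'embedding' -- stands in for EmbeddingGemma-300M."""
    digest = hashlib.sha256(text.encode()).digest()
    vec = [b / 255.0 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def chunk(source: str, max_lines: int = 20) -> list[str]:
    """Naive line-window chunker -- stands in for the Chonkie hybrid chunker."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]

def tag(path: Path) -> dict:
    """Deterministic path/filename rules, as in the lang/layer/kind stage."""
    layer = "tests" if "test" in path.name else "api" if "api" in str(path) else "src"
    kind = "test" if "test" in path.name else "source"
    return {"lang": path.suffix.lstrip("."), "layer": layer, "kind": kind}

def ingest(root: Path) -> list[dict]:
    """scan dir -> chunk files -> embed chunks -> tag -> upsert (here: a list)."""
    points = []
    for path in sorted(root.rglob("*.py")):
        for i, text in enumerate(chunk(path.read_text())):
            points.append({"id": f"{path}:{i}", "vector": embed(text),
                           "payload": {**tag(path), "text": text}})
    return points
```

In the real pipeline the final step is an upsert into the workspace's Qdrant collection rather than a list append, and search then runs against those vectors locally.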

Cloud LLM tagging is strictly opt-in (imprint config set tagger.llm true), for when you want more granular topics and don't mind spending credits. Providers: Anthropic, OpenAI, Gemini, or fully local Ollama / vLLM. Leave it off and nothing ever talks to a paid API.

graph TB
    subgraph "Your Machine"
        CC[Claude Code] -->|MCP tools| MCP[Imprint MCP Server]
        MCP -->|HTTP localhost:6333| QDB[(Qdrant Server<br/>auto-spawned daemon)]
        MCP -->|facts| KG[(SQLite<br/>Imprint Graph)]
        CLI[imprint CLI] -->|HTTP| QDB
        CC -->|Stop hook| EXT[Auto-Extract<br/>Decisions]
        CC -->|PreCompact hook| SAVE[Save Before<br/>Compression]
        EXT -->|HTTP| QDB
        EMB[EmbeddingGemma ONNX<br/>GPU/CPU] -->|768-dim vectors| QDB
        TAG[Tagger<br/>lang/layer/kind/domain/topics] -->|payload| QDB
        CHK[Chonkie Hybrid<br/>CodeChunker + SemanticChunker] -->|chunks| EMB
    end

    subgraph "Sync Relay"
        RELAY[imprint relay<br/>WebSocket forwarder]
    end

    subgraph "Other Machine"
        CC2[Claude Code] -->|MCP| MCP2[Imprint MCP]
        MCP2 --> QDB2[(Qdrant Server)]
    end

    CLI -->|sync serve| RELAY
    RELAY -->|sync pull/push| QDB2

    style QDB fill:#1a1a3a,stroke:#60a5fa,color:#fff
    style KG fill:#1a1a3a,stroke:#4ecdc4,color:#fff
    style MCP fill:#0d1117,stroke:#a78bfa,color:#fff
    style RELAY fill:#0d1117,stroke:#ff6b6b,color:#fff
    style EMB fill:#0d1117,stroke:#fbbf24,color:#fff
    style TAG fill:#0d1117,stroke:#34d399,color:#fff
    style CHK fill:#0d1117,stroke:#f472b6,color:#fff

Quick Install

Linux / macOS:

curl -fsSL https://raw.githubusercontent.com/alexandruleca/imprint-memory-layer/main/install.sh | bash

Windows (PowerShell):

irm https://raw.githubusercontent.com/alexandruleca/imprint-memory-layer/main/install.ps1 | iex

Pin a specific version, pick the dev channel, or use prebuilt Docker images — see docs/installation.md.

Updating

Once installed, use the built-in updater — no curl, no sudo. data/ (workspaces, Qdrant storage, SQLite graphs, config, gpu_state.json) and .venv/ are always preserved; only the code tree is replaced.

imprint update              # latest stable, asks for confirmation
imprint update --dev        # latest prerelease
imprint update --version v0.3.1
imprint update --check      # show current + latest release and exit
imprint update -y           # skip confirmation (CI / scripts)

Re-running install.sh also works and now prompts before overwriting an existing install. For non-interactive upgrades pass --yes or set IMPRINT_ASSUME_YES=1.

If GPU setup fails once (e.g. Blackwell + old nvcc, or CUDA runtime mismatch) the failure is remembered in data/gpu_state.json so future imprint setup runs skip the broken path silently. After you upgrade the toolchain, force a retry with:

imprint setup --retry-gpu
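The sticky-failure mechanism can be illustrated with a small sketch. The actual schema of data/gpu_state.json isn't documented in this README, so the failed/reason fields below are assumptions:

```python
import json
from pathlib import Path

def gpu_setup_allowed(state: Path) -> bool:
    """Skip the GPU path silently if a previous attempt is recorded as failed."""
    if not state.exists():
        return True
    return not json.loads(state.read_text()).get("failed", False)

def record_gpu_failure(state: Path, reason: str) -> None:
    """Remember the failure so later setup runs take the CPU path."""
    state.parent.mkdir(parents=True, exist_ok=True)
    state.write_text(json.dumps({"failed": True, "reason": reason}))

def retry_gpu(state: Path) -> None:
    """The --retry-gpu behavior: forget the sticky failure and try again."""
    state.unlink(missing_ok=True)
```

The point of the pattern is that a one-time toolchain problem doesn't turn every subsequent setup run into a slow, noisy retry.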

Supported hosts

imprint setup <target> auto-wires the MCP server into each supported AI coding tool. Run imprint setup all to configure every host that's installed on your machine; missing tools are skipped with a warning, not an error.

| Target | Wired into | Config file | Enforcement |
| --- | --- | --- | --- |
| claude-code | Claude Code CLI (MCP + hooks + global CLAUDE.md) | ~/.claude/settings.json + MCP registered via claude mcp add | Hard (PreToolUse) |
| cursor | Cursor IDE (MCP + always-on rule) | ~/.cursor/mcp.json + ~/.cursor/rules/imprint.mdc | Text-only (rule) |
| codex | OpenAI Codex CLI | ~/.codex/config.toml ([mcp_servers.imprint]) | Text-only |
| copilot | GitHub Copilot (VSCode agent mode), user-global | &lt;VSCode user&gt;/mcp.json (servers.imprint) | Text-only |
| cline | Cline — VSCode extension + standalone CLI | &lt;VSCode user&gt;/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json and/or ~/.cline/data/settings/cline_mcp_settings.json | Text-only |

imprint disable is symmetric — it tears down the MCP entry from every config file above that still exists (the venv and data are always preserved so re-enabling is fast).

Commands

imprint setup [target]     # install deps, register MCP, configure the chosen host tool
                           #   target: claude-code (default) | cursor | codex | copilot | cline | all
                           # add --retry-gpu to forget a sticky GPU failure and retry ORT / llama-cpp CUDA
imprint update [--version v0.3.1] [--dev] [-y] [--check]
                           # upgrade imprint in place; preserves data/ and .venv/
imprint uninstall [-y] [--keep-data]
                           # full removal: disable + strip CLAUDE.md + delete venv/data/install dir
imprint status             # is everything wired? show enabled/disabled, server pid, memory stats
imprint enable [target]    # re-wire MCP + hooks + start server
                           #   target: claude-code | cursor | codex | copilot | cline | all
imprint disable            # stop server, unregister MCP from every host, strip Claude hooks (data preserved)
imprint ingest <path>      # index project source files (directory or single file)
imprint learn              # index Claude Code conversations + memory files
imprint learn --desktop    # also ingest Claude Desktop / ChatGPT Desktop export zips from Downloads
imprint ingest-url <url>   # fetch URL(s), extract content, and index (html/pdf/etc)
imprint refresh <dir>      # re-index only changed files (mtime-based)
imprint refresh-urls       # re-check stored URLs via ETag/Last-Modified and re-index changed
imprint retag [--project] [--all]
                           # re-run the tagger on existing memories (--all re-tags already-tagged chunks)
# Heavy jobs (ingest/refresh/retag/ingest-url/refresh-urls) serialize via a
# shared queue lock. If another job is already running the CLI exits with an
# error — cancel it from /queue in the UI or kill the PID it reports.
imprint migrate --from WS1 --to WS2 --project NAME | --topic TAG [--dry-run]
                           # move memories between workspaces (preserves vectors)
imprint config             # show all settings with current values
imprint config set <k> <v> # persist a setting (e.g. model.name, qdrant.port)
imprint config get <key>   # show one setting with source + default
imprint config reset <key> # remove override, revert to default
imprint server <cmd>       # manage the local Qdrant daemon: start | stop | status | log
imprint workspace          # list workspaces and show active
imprint workspace switch <name>  # switch to workspace (creates if new)
imprint workspace delete <name>  # delete a workspace and its data
imprint wipe [--force]     # wipe active workspace
imprint wipe --all         # wipe everything (all workspaces)
imprint sync serve [--relay <host>]      # expose KB for peer syncing (default: imprint.alexandruleca.com)
imprint sync <id> --pin <pin>            # sync via default relay (or <host>/<id> / wss://<host>/<id>)
imprint sync export | import <dir>       # snapshot bundle, no re-embed on import
imprint relay              # run the sync relay server
imprint ui [start|stop|status|open|restart|log] [--port N]
                           # dashboard (FastAPI + Next.js); bare `imprint ui` runs foreground
imprint version            # print version
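As an example of what imprint refresh does, mtime-based change detection can be sketched like this. The index format below (a JSON path → mtime map) is an assumption for illustration, not the real on-disk layout:

```python
import json
from pathlib import Path

def changed_files(root: Path, index_path: Path) -> list[Path]:
    """Return files under root whose mtime differs from the recorded one,
    updating the index so the next run only sees fresh edits."""
    seen = json.loads(index_path.read_text()) if index_path.exists() else {}
    changed = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        mtime = path.stat().st_mtime
        if seen.get(str(path)) != mtime:
            changed.append(path)
            seen[str(path)] = mtime
    index_path.write_text(json.dumps(seen))
    return changed
```

Only the paths this returns would be re-chunked and re-embedded; everything else keeps its existing vectors.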

Documentation

| Topic | File |
| --- | --- |
| Install, versioning, channels, Docker | docs/installation.md |
| Components, data flow, Qdrant daemon, lifecycle | docs/architecture.md |
| Embedding pipeline + GPU acceleration | docs/embeddings.md |
| Chunking strategy + tunables | docs/chunking.md |
| Metadata tags, LLM providers, search filters | docs/tagging.md |
| Workspaces + project detection | docs/workspaces.md |
| MCP tools + automatic updates | docs/mcp.md |
| Peer sync, relay server, dashboard | docs/sync.md |
| Command queue + cancellation | docs/queue.md |
| All settings (imprint config) | docs/configuration.md |
| Building from source + CI/release flow | docs/building.md |
| Benchmarks & token savings | BENCHMARK.md |

Glossary

Terms used across the docs.

| Term | Definition |
| --- | --- |
| Chunk | A sub-file unit of text (a function, class, markdown section, conversation turn) that gets its own embedding vector. Produced by the chunker. |
| Embedding | Dense numeric vector (default 768-dim) representing the semantic meaning of a chunk. Similar meanings → nearby vectors. |
| Qdrant | The vector database that stores embeddings + payloads. Runs as an auto-spawned local daemon on 127.0.0.1:6333. |
| Collection | Qdrant's term for a named set of vectors. Each workspace has its own collection (e.g. memories, memories_research). |
| Workspace | Isolated memory environment — dedicated Qdrant collection + SQLite DB + WAL. Lets you separate research/staging/prod memories. |
| Imprint Graph | Temporal fact store (SQLite) for structured subject → predicate → object facts with valid_from / ended timestamps. |
| MCP | Model Context Protocol — the open protocol Claude Code uses to call external tools. Imprint ships an MCP server with 12 tools — see docs/mcp.md. |
| Project | A codebase identified by a canonical name from its manifest (package.json, go.mod, etc.). Projects get the same identity across machines even if paths differ. |
| Layer | Path-derived tag: api, ui, tests, infra, config, migrations, docs, scripts, cli. |
| Kind | Filename-derived tag: source, test, migration, readme, types, module, qa, auto-extract. |
| Domain | Content-derived tag from keyword regex: auth, db, api, math, rendering, ui, testing, infra, ml, perf, security, build, payments. |
| Topics | Free-form tags from zero-shot cosine similarity or (opt-in) LLM classification — more granular than domain. |
| Ingestion | Scanning a directory, detecting projects, chunking files, embedding chunks, tagging, and upserting into Qdrant. |
| Refresh | Incremental re-ingest — only re-chunks + re-embeds files whose mtime changed since the last run. |
| Queue | Single-slot FIFO (data/queue.sqlite3 + data/queue.lock) that serializes ingest/refresh/retag/ingest-url so parallel runs can't OOM the box. The UI at /queue lists active + queued + history; cancel propagates SIGTERM → SIGKILL to the subprocess's process group, so in-flight LLM tagger calls die with it. See docs/queue.md. |
| Auto-extract | Stop hook that parses conversation transcripts after each Claude response and stores Q+A exchanges + decision-like statements. |
| PreCompact hook | Synchronous hook that fires before Claude's context window compresses — instructs Claude to save important context via MCP tools first. |
| Relay server | Stateless WebSocket forwarder (imprint relay) that brokers peer sync between two machines. No vectors cross the wire — only raw content, re-embedded locally on the receiver. |
| WAL | Write-ahead log — append-only wal.jsonl per workspace, used for replay / recovery of memory operations. |
| Zero-shot tagging | Classifying chunks by cosine similarity against pre-embedded label prototypes — no per-chunk LLM call. |
| Dev / stable channel | Two release tracks. Dev = prerelease on every dev push (vX.Y.Z-dev.N). Stable = conventional-commit release on main merges (vX.Y.Z). |
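Zero-shot tagging as defined above is plain vector math. A minimal sketch, with hand-made 3-dim vectors standing in for real 768-dim embeddings and an assumed similarity threshold:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (na * nb)

def zero_shot_tag(chunk_vec: list[float],
                  label_vecs: dict[str, list[float]],
                  threshold: float = 0.3) -> list[str]:
    """Return labels whose pre-embedded prototype is close to the chunk vector,
    best match first -- no per-chunk LLM call, just similarity scoring."""
    scored = {label: cosine(chunk_vec, vec) for label, vec in label_vecs.items()}
    return sorted((l for l, s in scored.items() if s >= threshold),
                  key=lambda l: -scored[l])
```

Because the label prototypes are embedded once up front, tagging a new chunk costs one dot product per label.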

Benchmarks

Imprint reduces Claude Code's token consumption by serving focused semantic search results instead of requiring full file reads. Measured across 15 prompts in 6 categories, 5 runs per prompt per mode, Sonnet primary model.

| Category | Prompts | Δ Tokens | Δ Cost | Notes |
| --- | --- | --- | --- | --- |
| Debugging | 2 | −94.2% | −68.3% | Imprint answers from indexed failure-mode patterns instead of reading the codebase |
| Cross-project recall | 2 | −90.6% | −46.9% | Patterns spanning multiple indexed projects — impossible without memory |
| Architecture Q&A | 5 | −87.2% | −42.6% | Questions like "how does chunking work?" served from semantic search |
| Decision recall | 2 | −78.8% | −46.1% | Why-we-did-X questions served from stored decisions |
| Creation tasks | 3 | +9.9% | +15.1% | Near parity — code generation still needs codebase context |
| Session summary | 1 | +179.6% | +204.1% | Outlier: single prompt; the memory-ON run went on a graph-exploration spree |
| Overall | 15 | −70.4% (10.28M → 3.05M) | −31.7% ($2.84 → $1.94) | |

Numbers are median per prompt, summed across categories. See BENCHMARK.md for per-prompt tables, per-model breakdown, response-quality analysis, and the exact flags used.

Reproduce: bash benchmark/run.sh (full suite, ~$15–25) or bash benchmark/run.sh --subset (one prompt per category, ~$6–10).

Roadmap

  • Local automatic backup
  • External Qdrant instance instead of the local DB
  • Backup/sync to a remote Qdrant instance
  • Document ingestion (PDF, DOC, ODT, etc.)
  • Video / audio ingestion
  • URL ingestion

License

Imprint is licensed under the Apache License 2.0.

Third-party dependencies retain their own licenses — see THIRD_PARTY_LICENSES.md for the full table.

The default embedding model (EmbeddingGemma-300M) is governed by the Gemma Terms of Use and Prohibited Use Policy, not Apache 2.0. Imprint does not bundle weights; they're downloaded at runtime from HuggingFace, where you accept Gemma's terms. Switch to a differently licensed model (e.g. BGE-M3, MIT) via imprint config set model.name <repo>.

Contact

Questions, feedback, or bug reports? Reach out:

X / Twitter GitHub Issues
