ImprintMCP
Autoupdating MCP Vector Memory
Persistent memory for AI coding tools. 100% local. Zero API cost.
Give Claude Code, Cursor, Codex CLI, Copilot, and Cline a long-term memory.
Stop re-explaining your codebase every session.
imprintmcp.alexandruleca.com →

Why Imprint
- Remembers what your AI forgets. Decisions, patterns, bug fixes, and architectural choices persist across sessions — searched semantically, not grepped.
- −70.4% tokens, −31.7% cost. Measured across 150 runs on Claude Code (Sonnet). Your AI searches memory instead of re-reading files. See BENCHMARK.md for raw numbers.
- Runs 100% locally by default. EmbeddingGemma-300M via ONNX, Qdrant vector DB, Chonkie chunking — all on your machine. No API credits consumed unless you opt in.
- One command, any host. Wires into Claude Code, Cursor, Codex CLI, Copilot, or Cline via MCP. Same memory, shared across tools.
Runs 100% locally. Zero API credits consumed by default. Embeddings, chunking, tagging, vector search, and the knowledge graph all run on your machine:
- Embeddings: EmbeddingGemma-300M via ONNX Runtime (GPU or CPU), no network calls, no per-token cost.
- Vector store: Qdrant auto-spawned as a local daemon on 127.0.0.1:6333. Your data never leaves the box unless you sync it to another device.
- Chunking: Chonkie hybrid (tree-sitter CodeChunker + SemanticChunker), pure Python, local.
- Tagging: deterministic rules + zero-shot cosine similarity against pre-embedded labels. Local LLM call per chunk if you want it.
- Imprint graph: SQLite on disk for temporal facts.
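The zero-shot tagging step above can be sketched in a few lines. This is a toy illustration with hand-made 3-dim vectors and made-up labels; the real tagger compares 768-dim EmbeddingGemma vectors against its own pre-embedded label set:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Label prototypes, embedded once up front (toy 3-dim stand-ins).
LABELS = {
    "auth":    [0.9, 0.1, 0.0],
    "db":      [0.1, 0.9, 0.1],
    "testing": [0.0, 0.1, 0.9],
}

def zero_shot_tags(chunk_vec, threshold=0.5):
    """Assign every label whose prototype is cosine-similar enough -- no LLM call."""
    return sorted(t for t, v in LABELS.items() if cosine(chunk_vec, v) >= threshold)

print(zero_shot_tags([0.2, 0.8, 0.1]))  # → ['db']
```

Because the label prototypes are embedded once, tagging a chunk costs one similarity computation per label rather than a per-chunk model call.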
The ingestion flow: scan dir → detect project → chunk files → embed chunks → tag (lang/layer/kind/domain/topics) → upsert into Qdrant. A Stop hook auto-extracts decisions from Claude transcripts; a PreCompact hook saves context before window compression. Search goes straight to the local vector DB — no round-trip to any provider.
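The flow can be sketched end to end. Everything here is a stand-in (naive blank-line chunking, hash-derived fake embeddings, a plain dict instead of Qdrant); only the ordering of the stages mirrors the description above:

```python
import hashlib

def chunk(text):
    # Stand-in for Chonkie: split on blank lines.
    return [c.strip() for c in text.split("\n\n") if c.strip()]

def embed(text, dim=8):
    # Stand-in for EmbeddingGemma: deterministic hash-derived vector.
    return [b / 255 for b in hashlib.sha256(text.encode()).digest()[:dim]]

def tag(text):
    # Stand-in for the rule-based tagger.
    return {"kind": "test" if "def test_" in text else "source"}

def ingest(files, store):
    """scan -> chunk -> embed -> tag -> upsert (here: into a plain dict)."""
    for path, text in files.items():
        for i, c in enumerate(chunk(text)):
            store[f"{path}#{i}"] = {"vector": embed(c), "payload": {**tag(c), "text": c}}

store = {}
ingest({"app.py": "def main():\n    pass\n\ndef test_main():\n    pass"}, store)
print(sorted(store))  # → ['app.py#0', 'app.py#1']
```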
Optional cloud LLM tagging is opt-in only (imprint config set tagger.llm true) if you want more granular topics and are fine spending credits. Providers: Anthropic, OpenAI, Gemini, or fully local Ollama / vLLM. Leave it off and nothing ever talks to a paid API.
graph TB
subgraph "Your Machine"
CC[Claude Code] -->|MCP tools| MCP[Imprint MCP Server]
MCP -->|HTTP localhost:6333| QDB[(Qdrant Server<br/>auto-spawned daemon)]
MCP -->|facts| KG[(SQLite<br/>Imprint Graph)]
CLI[imprint CLI] -->|HTTP| QDB
CC -->|Stop hook| EXT[Auto-Extract<br/>Decisions]
CC -->|PreCompact hook| SAVE[Save Before<br/>Compression]
EXT -->|HTTP| QDB
EMB[EmbeddingGemma ONNX<br/>GPU/CPU] -->|768-dim vectors| QDB
TAG[Tagger<br/>lang/layer/kind/domain/topics] -->|payload| QDB
CHK[Chonkie Hybrid<br/>CodeChunker + SemanticChunker] -->|chunks| EMB
end
subgraph "Sync Relay"
RELAY[imprint relay<br/>WebSocket forwarder]
end
subgraph "Other Machine"
CC2[Claude Code] -->|MCP| MCP2[Imprint MCP]
MCP2 --> QDB2[(Qdrant Server)]
end
CLI -->|sync serve| RELAY
RELAY -->|sync pull/push| QDB2
style QDB fill:#1a1a3a,stroke:#60a5fa,color:#fff
style KG fill:#1a1a3a,stroke:#4ecdc4,color:#fff
style MCP fill:#0d1117,stroke:#a78bfa,color:#fff
style RELAY fill:#0d1117,stroke:#ff6b6b,color:#fff
style EMB fill:#0d1117,stroke:#fbbf24,color:#fff
style TAG fill:#0d1117,stroke:#34d399,color:#fff
style CHK fill:#0d1117,stroke:#f472b6,color:#fff
Quick Install
Linux / macOS:
curl -fsSL https://raw.githubusercontent.com/alexandruleca/imprint-memory-layer/main/install.sh | bash
Windows (PowerShell):
irm https://raw.githubusercontent.com/alexandruleca/imprint-memory-layer/main/install.ps1 | iex
Pin a specific version, pick the dev channel, or use prebuilt Docker images — see docs/installation.md.
Updating
Once installed, use the built-in updater — no curl, no sudo. data/ (workspaces, Qdrant storage, SQLite graphs, config, gpu_state.json) and .venv/ are always preserved; only the code tree is replaced.
imprint update # latest stable, asks for confirmation
imprint update --dev # latest prerelease
imprint update --version v0.3.1
imprint update --check # show current + latest release and exit
imprint update -y # skip confirmation (CI / scripts)
Re-running install.sh also works and now prompts before overwriting an existing install. For non-interactive upgrades pass --yes or set IMPRINT_ASSUME_YES=1.
If GPU setup fails once (e.g. Blackwell + old nvcc, or CUDA runtime mismatch) the failure is remembered in data/gpu_state.json so future imprint setup runs skip the broken path silently. After you upgrade the toolchain, force a retry with:
imprint setup --retry-gpu
Supported hosts
imprint setup <target> auto-wires the MCP server into each supported AI coding tool. Run imprint setup all to configure every host that's installed on your machine; missing tools are skipped with a warning, not an error.
| Target | Wired into | Config file | Enforcement |
|---|---|---|---|
| claude-code | Claude Code CLI (MCP + hooks + global CLAUDE.md) | ~/.claude/settings.json + MCP registered via claude mcp add | Hard (PreToolUse) |
| cursor | Cursor IDE (MCP + always-on rule) | ~/.cursor/mcp.json + ~/.cursor/rules/imprint.mdc | Text-only (rule) |
| codex | OpenAI Codex CLI | ~/.codex/config.toml ([mcp_servers.imprint]) | Text-only |
| copilot | GitHub Copilot (VSCode agent mode), user-global | <VSCode user>/mcp.json (servers.imprint) | Text-only |
| cline | Cline — VSCode extension + standalone CLI | <VSCode user>/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json and/or ~/.cline/data/settings/cline_mcp_settings.json | Text-only |
imprint disable is symmetric — it tears down the MCP entry from every config file above that still exists (the venv and data are always preserved so re-enabling is fast).
Commands
imprint setup [target] # install deps, register MCP, configure the chosen host tool
# target: claude-code (default) | cursor | codex | copilot | cline | all
# add --retry-gpu to forget a sticky GPU failure and retry ORT / llama-cpp CUDA
imprint update [--version v0.3.1] [--dev] [-y] [--check]
# upgrade imprint in place; preserves data/ and .venv/
imprint uninstall [-y] [--keep-data]
# full removal: disable + strip CLAUDE.md + delete venv/data/install dir
imprint status # is everything wired? show enabled/disabled, server pid, memory stats
imprint enable [target] # re-wire MCP + hooks + start server
# target: claude-code | cursor | codex | copilot | cline | all
imprint disable # stop server, unregister MCP from every host, strip Claude hooks (data preserved)
imprint ingest <path> # index project source files (directory or single file)
imprint learn # index Claude Code conversations + memory files
imprint learn --desktop # also ingest Claude Desktop / ChatGPT Desktop export zips from Downloads
imprint ingest-url <url> # fetch URL(s), extract content, and index (html/pdf/etc)
imprint refresh <dir> # re-index only changed files (mtime-based)
imprint refresh-urls # re-check stored URLs via ETag/Last-Modified and re-index changed
imprint retag [--project] [--all]
# re-run the tagger on existing memories (--all re-tags already-tagged chunks)
# Heavy jobs (ingest/refresh/retag/ingest-url/refresh-urls) serialize via a
# shared queue lock. If another job is already running the CLI exits with an
# error — cancel it from /queue in the UI or kill the PID it reports.
imprint migrate --from WS1 --to WS2 --project NAME | --topic TAG [--dry-run]
# move memories between workspaces (preserves vectors)
imprint config # show all settings with current values
imprint config set <k> <v> # persist a setting (e.g. model.name, qdrant.port)
imprint config get <key> # show one setting with source + default
imprint config reset <key> # remove override, revert to default
imprint server <cmd> # manage the local Qdrant daemon: start | stop | status | log
imprint workspace # list workspaces and show active
imprint workspace switch <name> # switch to workspace (creates if new)
imprint workspace delete <name> # delete a workspace and its data
imprint wipe [--force] # wipe active workspace
imprint wipe --all # wipe everything (all workspaces)
imprint sync serve [--relay <host>] # expose KB for peer syncing (default: imprint.alexandruleca.com)
imprint sync <id> --pin <pin> # sync via default relay (or <host>/<id> / wss://<host>/<id>)
imprint sync export | import <dir> # snapshot bundle, no re-embed on import
imprint relay # run the sync relay server
imprint ui [start|stop|status|open|restart|log] [--port N]
# dashboard (FastAPI + Next.js); bare `imprint ui` runs foreground
imprint version # print version
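As an aside on how refresh decides what to re-index: a minimal sketch of the mtime comparison, assuming a `last_index` mapping of path to mtime recorded at the previous ingest (that mapping and its shape are hypothetical, for illustration only):

```python
import os

def changed_files(root, last_index):
    """Return files whose mtime differs from the recorded value,
    i.e. the only files a refresh needs to re-chunk and re-embed."""
    out = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if last_index.get(path) != os.path.getmtime(path):
                out.append(path)
    return out
```

Unchanged files are skipped entirely, which is why refresh is cheap on large trees.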
Documentation
| Topic | File |
|---|---|
| Install, versioning, channels, Docker | docs/installation.md |
| Components, data flow, Qdrant daemon, lifecycle | docs/architecture.md |
| Embedding pipeline + GPU acceleration | docs/embeddings.md |
| Chunking strategy + tunables | docs/chunking.md |
| Metadata tags, LLM providers, search filters | docs/tagging.md |
| Workspaces + project detection | docs/workspaces.md |
| MCP tools + automatic updates | docs/mcp.md |
| Peer sync, relay server, dashboard | docs/sync.md |
| Command queue + cancellation | docs/queue.md |
| All settings (imprint config) | docs/configuration.md |
| Building from source + CI/release flow | docs/building.md |
| Benchmarks & token savings | BENCHMARK.md |
Glossary
Terms used across the docs.
| Term | Definition |
|---|---|
| Chunk | A sub-file unit of text (a function, class, markdown section, conversation turn) that gets its own embedding vector. Produced by the chunker. |
| Embedding | Dense numeric vector (default 768-dim) representing the semantic meaning of a chunk. Similar meanings → nearby vectors. |
| Qdrant | The vector database that stores embeddings + payloads. Runs as an auto-spawned local daemon on 127.0.0.1:6333. |
| Collection | Qdrant's term for a named set of vectors. Each workspace has its own collection (e.g. memories, memories_research). |
| Workspace | Isolated memory environment — dedicated Qdrant collection + SQLite DB + WAL. Lets you separate research/staging/prod memories. |
| Imprint Graph | Temporal fact store (SQLite) for structured subject → predicate → object facts with valid_from / ended timestamps. |
| MCP | Model Context Protocol — the open protocol Claude Code uses to call external tools. Imprint ships an MCP server with 12 tools — see docs/mcp.md. |
| Project | A codebase identified by a canonical name from its manifest (package.json, go.mod, etc.). Projects get the same identity across machines even if paths differ. |
| Layer | Path-derived tag: api, ui, tests, infra, config, migrations, docs, scripts, cli. |
| Kind | Filename-derived tag: source, test, migration, readme, types, module, qa, auto-extract. |
| Domain | Content-derived tag from keyword regex: auth, db, api, math, rendering, ui, testing, infra, ml, perf, security, build, payments. |
| Topics | Free-form tags from zero-shot cosine similarity or (opt-in) LLM classification — more granular than domain. |
| Ingestion | Scanning a directory, detecting projects, chunking files, embedding chunks, tagging, and upserting into Qdrant. |
| Refresh | Incremental re-ingest — only re-chunks + re-embeds files whose mtime changed since last run. |
| Queue | Single-slot FIFO (data/queue.sqlite3 + data/queue.lock) that serializes ingest/refresh/retag/ingest-url so parallel runs can't OOM the box. UI at /queue lists active + queued + history; cancel propagates SIGTERM→SIGKILL to the subprocess's process group, so in-flight LLM tagger calls die with it. See docs/queue.md. |
| Auto-extract | Stop hook that parses conversation transcripts after each Claude response and stores Q+A exchanges + decision-like statements. |
| PreCompact hook | Synchronous hook that fires before Claude's context window compresses — instructs Claude to save important context via MCP tools first. |
| Relay server | Stateless WebSocket forwarder (imprint relay) that brokers peer sync between two machines. No vectors cross the wire — only raw content, re-embedded locally on the receiver. |
| WAL | Write-ahead log — append-only wal.jsonl per workspace, used for replay / recovery of memory operations. |
| Zero-shot tagging | Classifying chunks by cosine similarity against pre-embedded label prototypes — no per-chunk LLM call. |
| Dev / stable channel | Two release tracks. Dev = prerelease on every dev push (vX.Y.Z-dev.N). Stable = conventional-commit release on main merges (vX.Y.Z). |
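A minimal sketch of how a temporal subject → predicate → object store with valid_from / ended timestamps can work in SQLite (the table name and columns here are illustrative, not Imprint's actual schema):

```python
import sqlite3, time

conn = sqlite3.connect(":memory:")  # the real graph lives on disk under data/
conn.execute("""
    CREATE TABLE facts (
        subject TEXT, predicate TEXT, object TEXT,
        valid_from REAL,
        ended REAL  -- NULL while the fact is still current
    )
""")

def assert_fact(s, p, o):
    """Close out any current fact for (subject, predicate), then record the new one."""
    now = time.time()
    conn.execute(
        "UPDATE facts SET ended=? WHERE subject=? AND predicate=? AND ended IS NULL",
        (now, s, p))
    conn.execute("INSERT INTO facts VALUES (?,?,?,?,NULL)", (s, p, o, now))

assert_fact("api", "uses_db", "sqlite")
assert_fact("api", "uses_db", "postgres")  # supersedes the earlier fact

current = conn.execute(
    "SELECT object FROM facts WHERE subject='api' AND predicate='uses_db' AND ended IS NULL"
).fetchone()
print(current)  # → ('postgres',)
```

Superseded facts keep their timestamps, so "what did we believe at time T" queries stay answerable.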
Benchmarks
Imprint reduces Claude Code's token consumption by serving focused semantic search results instead of requiring full file reads. Measured across 15 prompts in 6 categories, 5 runs per prompt per mode, with Sonnet as the primary model.
| Category | Prompts | Δ Tokens | Δ Cost | Notes |
|---|---|---|---|---|
| Debugging | 2 | −94.2% | −68.3% | Imprint answers from indexed failure-mode patterns instead of reading the codebase |
| Cross-project recall | 2 | −90.6% | −46.9% | Patterns spanning multiple indexed projects — impossible without memory |
| Architecture Q&A | 5 | −87.2% | −42.6% | Questions like "how does chunking work?" served from semantic search |
| Decision recall | 2 | −78.8% | −46.1% | Why-we-did-X questions served from stored decisions |
| Creation tasks | 3 | +9.9% | +15.1% | Near parity — code generation still needs codebase context |
| Session summary | 1 | +179.6% | +204.1% | Outlier: single prompt, ON went on a graph-exploration spree |
| Overall | 15 | −70.4% (10.28M → 3.05M) | −31.7% ($2.84 → $1.94) | |
Numbers are median per prompt, summed across categories. See BENCHMARK.md for per-prompt tables, per-model breakdown, response-quality analysis, and the exact flags used.
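As a quick sanity check, the headline deltas can be recomputed from the rounded totals (rounding of the totals accounts for the 0.1-point gap against −70.4% on tokens):

```python
tokens_off, tokens_on = 10.28e6, 3.05e6   # total tokens, memory off vs on
cost_off, cost_on = 2.84, 1.94            # total cost in USD

token_delta = (tokens_on - tokens_off) / tokens_off
cost_delta = (cost_on - cost_off) / cost_off
print(f"{token_delta:+.1%} tokens, {cost_delta:+.1%} cost")  # → -70.3% tokens, -31.7% cost
```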
Reproduce:
`bash benchmark/run.sh` (full suite, ~$15–25) or `bash benchmark/run.sh --subset` (one prompt per category, ~$6–10).
Roadmap
- Local auto-backup
- Support for an external Qdrant instance instead of the local daemon
- Backup/sync to a remote Qdrant instance
- Document ingestion (pdf, doc, odt, etc.)
- Video / audio ingestion
- URL ingestion
License
Imprint is licensed under the Apache License 2.0.
Third-party dependencies retain their own licenses — see THIRD_PARTY_LICENSES.md for the full table.
Default embedding model (EmbeddingGemma-300M) is governed by the Gemma Terms of Use and Prohibited Use Policy — not Apache 2.0. Imprint does not bundle weights; they're downloaded at runtime from HuggingFace, where you accept Gemma's terms. Switch to a differently-licensed model (e.g. BGE-M3, MIT) via imprint config set model.name <repo>.
Contact
Questions, feedback, or bug reports? Reach out via imprintmcp.alexandruleca.com.