CPersona MCP Server

Persistent AI memory server with 3-layer hybrid search, confidence scoring, and 27 tools. No LLM dependency.

cpersona

MCP Memory Server

Give Claude persistent memory across sessions. Single SQLite file. 27 tools. Zero LLM dependency.

License: MIT

Quick Start · Features · Architecture · All Tools · Zenn Book (JP)


Standalone repository — This is the standalone version for use with Claude Desktop, Claude Code, and any MCP client. If you are a ClotoCore user, install CPersona from the in-app marketplace (ClotoHub) instead — it distributes this same repository.

The Problem

Claude forgets everything between sessions. Every conversation starts from zero — no context about your project, your preferences, or what you discussed yesterday.

cpersona fixes this. It's an MCP server that stores memories in a local SQLite file and retrieves them through hybrid search. Claude remembers you.

Quick Start

Prerequisites: Python 3.11+ (and uv for the one-command path).

1. Install cpersona

uvx cpersona          # run directly, no install step
# or
pip install cpersona  # then the `cpersona` command is on your PATH

From source (for development):

git clone https://github.com/Cloto-dev/cpersona.git
cd cpersona
python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate
pip install .

Run it with python -m cpersona (or python server.py).

2. Set up Embedding Server (Recommended)

cpersona's hybrid search works best with an embedding server for vector similarity. cpersona is embedding-server-agnostic: point CPERSONA_EMBEDDING_URL (see step 3) at any HTTP endpoint that implements the following minimal contract.

POST /embed
Request:  { "texts": ["string", ...] }        # non-empty array, max 100 per batch
Response: { "embeddings": [[float, ...], ...], "dimensions": <int> }

The reference server is CEmbedding (MIT) — it runs jina-v5-nano on-device (CPU) and exposes exactly this endpoint:

git clone https://github.com/Cloto-dev/CEmbedding.git && cd CEmbedding
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install ".[onnx]"
python download_model.py --model jina-v5-nano
EMBEDDING_PROVIDER=onnx_jina_v5_nano python server.py   # serves http://127.0.0.1:8401/embed

cpersona was tuned and benchmarked against jina-v5-nano (33M params, 768d), so CEmbedding reproduces the numbers below. Any other server that satisfies the contract above works too.

Without an embedding server, cpersona falls back to FTS5 + keyword search only. Vector search (the strongest retrieval layer) will be disabled.
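
As a rough illustration of the /embed contract in this step, the sketch below validates a request body and builds a response of the documented shape. The names `toy_embedding` and `handle_embed` are hypothetical, and the hash-derived vectors are placeholders that carry no semantic meaning; a real server would call an embedding model here and wire the handler into an HTTP framework.

```python
import hashlib
import math


def toy_embedding(text: str, dimensions: int = 8) -> list:
    """Deterministic placeholder vector from a SHA-256 digest (NOT a real model)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    raw = [digest[i % len(digest)] / 255.0 - 0.5 for i in range(dimensions)]
    norm = math.sqrt(sum(x * x for x in raw)) or 1.0
    return [x / norm for x in raw]  # unit length, so cosine similarity is a dot product


def handle_embed(payload: dict, dimensions: int = 8, max_batch: int = 100) -> dict:
    """Validate a POST /embed request body and return the response body."""
    texts = payload.get("texts")
    if not isinstance(texts, list) or not texts:
        raise ValueError("'texts' must be a non-empty array")
    if len(texts) > max_batch:
        raise ValueError(f"at most {max_batch} texts per batch")
    if not all(isinstance(t, str) for t in texts):
        raise ValueError("every entry in 'texts' must be a string")
    return {
        "embeddings": [toy_embedding(t, dimensions) for t in texts],
        "dimensions": dimensions,
    }


response = handle_embed({"texts": ["hello", "world"]})
print(response["dimensions"], len(response["embeddings"]))
```

Keeping validation separate from transport makes it easy to check a custom server against the contract before pointing cpersona at it.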

3. Configure your MCP client

Claude Desktop — add to claude_desktop_config.json:

{
  "mcpServers": {
    "embedding": {
      "command": "/path/to/.venv/bin/python",
      "args": ["/path/to/servers/embedding/server.py"],
      "env": {
        "EMBEDDING_PROVIDER": "onnx_jina_v5_nano",
        "EMBEDDING_HTTP_PORT": "8401"
      }
    },
    "cpersona": {
      "command": "uvx",
      "args": ["cpersona"],
      "env": {
        "CPERSONA_DB_PATH": "/home/you/.claude/cpersona.db",
        "EMBEDDING_MODE": "http",
        "EMBEDDING_HTTP_URL": "http://127.0.0.1:8401/embed"
      }
    }
  }
}

Windows: use C:/Users/you/.claude/cpersona.db for the DB path. No embedding server yet? Drop the two EMBEDDING_* lines (or set EMBEDDING_MODE=none) — cpersona runs on FTS5 + keyword and tells you when it's degraded.
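
If you prefer to generate the client entry rather than edit JSON by hand, a small helper along these lines may be convenient. It is only a sketch: `cpersona_entry` is a hypothetical name, and the environment variable names are copied from the example above rather than verified against the server.

```python
import json
from typing import Optional


def cpersona_entry(db_path: str, embedding_url: Optional[str] = None) -> dict:
    """Build the "cpersona" block for the mcpServers section of the client config."""
    env = {"CPERSONA_DB_PATH": db_path}
    if embedding_url is not None:
        # Only add the embedding settings when a server is actually available.
        env["EMBEDDING_MODE"] = "http"
        env["EMBEDDING_HTTP_URL"] = embedding_url
    return {"command": "uvx", "args": ["cpersona"], "env": env}


config = {
    "mcpServers": {
        "cpersona": cpersona_entry(
            "/home/you/.claude/cpersona.db",
            embedding_url="http://127.0.0.1:8401/embed",
        )
    }
}
print(json.dumps(config, indent=2))
```

Calling `cpersona_entry` without `embedding_url` produces the degraded FTS5 + keyword configuration described above.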

Claude Code:

claude mcp add-json cpersona '{"type":"stdio","command":"uvx","args":["cpersona"],"env":{"CPERSONA_DB_PATH":"/home/you/.claude/cpersona.db","EMBEDDING_MODE":"http","EMBEDDING_HTTP_URL":"http://127.0.0.1:8401/embed"}}' -s user

(The embedding server in step 2 is registered separately; it is currently installed from source.)

That's it. Claude now has persistent memory. Ask it to store something and recall it in a later session.

Features

Hybrid Search — Three independent retrieval strategies run in parallel and merge results via Reciprocal Rank Fusion (RRF):

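| Layer   | Method                                         | Strength                   |
|---------|------------------------------------------------|----------------------------|
| Vector  | Cosine similarity (jina-v5-nano, 768d)         | Semantic meaning           |
| FTS5    | SQLite full-text search with trigram tokenizer | Exact terms, names, IDs    |
| Keyword | Fallback pattern matching                      | Edge cases, partial matches |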

Memory Types:

  • Declarative memory — Individual facts, decisions, instructions stored via store
  • Episodic memory — Conversation summaries archived via archive_episode
  • Profile memory — Accumulated user/project attributes via update_profile

Confidence Scoring — Each recalled memory gets a confidence score combining:

  • Cosine similarity (semantic relevance)
  • Dynamic time decay (adapts to corpus time range — a 1-year-old corpus and a 1-day-old corpus use different decay curves)
  • Recall boost (frequently useful memories surface more easily, with natural fade-out)
  • Completion factor (resolved topics decay faster)
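
To make the four factors above concrete, here is one way they could be combined. This is a sketch under stated assumptions, not cpersona's actual formula: the half-life rule (half the corpus time range), the squared decay for resolved topics, and the logarithmic boost are all invented for illustration, and the fade-out of the recall boost is not modeled.

```python
import math


def time_decay(age_days: float, corpus_span_days: float) -> float:
    """Exponential decay whose half-life scales with the corpus time range.

    ASSUMPTION: half-life = half of the corpus span.
    """
    half_life = max(corpus_span_days / 2.0, 1e-6)
    return 0.5 ** (age_days / half_life)


def confidence(cosine: float, age_days: float, corpus_span_days: float,
               recall_count: int = 0, resolved: bool = False) -> float:
    """Combine similarity, decay, recall boost, and completion into one score in [0, 1]."""
    decay = time_decay(age_days, corpus_span_days)
    if resolved:
        decay = decay ** 2  # ASSUMPTION: resolved topics decay twice as fast
    boost = 1.0 + 0.1 * math.log1p(recall_count)  # ASSUMPTION: gentle, saturating boost
    return max(0.0, min(1.0, cosine * decay * boost))


# The same 12-hour-old memory scores differently in a 1-day and a 1-year corpus.
print(confidence(0.8, age_days=0.5, corpus_span_days=1.0))
print(confidence(0.8, age_days=0.5, corpus_span_days=365.0))
```

The point of scaling the half-life is that "old" is relative: half a day is a long time in a corpus that spans one day and negligible in one that spans a year.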

Zero LLM Dependency — cpersona is a pure data server. It never calls an LLM internally. All summarization and extraction is performed by the calling agent. This means zero API costs from cpersona itself, deterministic behavior, and no hidden latency.

Additional capabilities:

  • Agent namespace isolation — multiple agents share one DB without interference
  • Background task queue — DB-persisted, crash-recoverable async processing
  • JSONL export/import — full memory portability between environments
  • Agent-to-agent memory merge — atomic copy/move with deduplication
  • Auto-calibration — statistical threshold tuning via null distribution z-score (no labels needed)
  • Health check — 16 automated detections with auto-repair (contamination, duplicates, FTS desync, invalid data, stale tasks, empty content, invalid sources)
  • Deep check — semantic data quality analysis (anonymous source recovery, short content, stale profiles, orphaned episodes)
  • Memory protection — lock/unlock to prevent accidental deletion or editing
  • Recent recall penalty — suppresses echo chamber effect for frequently recalled memories
  • stdio + Streamable HTTP transport
  • Single-file SQLite — no external database required
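
The auto-calibration item in the list above can be illustrated with a generic null-distribution approach: measure how similar unrelated pairs of stored vectors are, then place the threshold a chosen number of standard deviations above that mean. The function name, the sampling scheme, and the default z value below are assumptions, not cpersona's implementation.

```python
import math
import random
import statistics


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def calibrate_threshold(embeddings, z=2.0, samples=1000, seed=0):
    """Return mean + z * stdev of similarities between randomly paired vectors.

    Random pairs approximate the "no real match" (null) distribution, so no
    relevance labels are needed. Requires at least two embeddings.
    """
    rng = random.Random(seed)
    null_scores = []
    for _ in range(samples):
        a, b = rng.sample(embeddings, 2)  # two distinct stored vectors
        null_scores.append(cosine(a, b))
    return statistics.fmean(null_scores) + z * statistics.pstdev(null_scores)


demo_rng = random.Random(1)
demo_vectors = [[demo_rng.gauss(0, 1) for _ in range(16)] for _ in range(50)]
print(round(calibrate_threshold(demo_vectors), 3))
```

A higher z makes the gate stricter: fewer chance matches pass, at the cost of dropping some weak but genuine ones.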

Architecture

                         ┌─────────────────────────────────────┐
                         │            MCP Host                 │
                         │   (Claude Desktop / Claude Code)    │
                         └──────────────┬──────────────────────┘
                                        │ MCP (JSON-RPC)
                         ┌──────────────▼──────────────────────┐
                         │           cpersona                  │
                         │         (server.py)                 │
                         │                                     │
                         │  ┌─────────┐  ┌─────────┐          │
                         │  │  store   │  │ recall  │  ...     │
                         │  └────┬────┘  └────┬────┘          │
                         │       │             │               │
                         │  ┌────▼─────────────▼────────────┐  │
                         │  │         SQLite DB              │  │
                         │  │                                │  │
                         │  │  memories    (content + embed) │  │
                         │  │  episodes    (summaries)       │  │
                         │  │  profiles    (attributes)      │  │
                         │  │  memories_fts (FTS5 index)     │  │
                         │  │  episodes_fts (FTS5 index)     │  │
                         │  │  task_queue   (async jobs)     │  │
                         │  └────────────────────────────────┘  │
                         │                                      │
                         └──────────────┬───────────────────────┘
                                        │ HTTP
                         ┌──────────────▼──────────────────────┐
                         │       Embedding Server              │
                         │  (jina-v5-nano ONNX, 768d)          │
                         └─────────────────────────────────────┘

Recall flow (RRF mode):

Query → ┌── Vector search (cosine similarity)  ──┐
        ├── FTS5 search (episodes + memories)    ──┼── RRF merge → Confidence scoring → Top-K
        └── Keyword fallback                     ──┘
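
The merge step in this flow follows the standard Reciprocal Rank Fusion formula: each document receives 1 / (k + rank) from every list it appears in, and the contributions are summed. The sketch below is a generic implementation, not code from cpersona; `rrf_merge` and the memory IDs are hypothetical, and k = 60 matches the documented CPERSONA_RRF_K default.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60, top_k=10):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first.
    Returns (doc_id, score) pairs, best first; ties break on doc_id.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


vector_hits = ["m3", "m1", "m7"]
fts_hits = ["m1", "m9"]
keyword_hits = ["m1", "m3"]

for doc_id, score in rrf_merge([vector_hits, fts_hits, keyword_hits]):
    print(doc_id, round(score, 4))
```

Here m1 wins because it appears in all three lists, even though the vector layer ranked m3 first. Only ranks matter, so the layers' raw scores never need to share a scale.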

Benchmarks

Tested on LMEB (Long-term Memory Evaluation Benchmark) — 22 evaluation tasks measuring memory retrieval quality:

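| Embedding Model | Params | Dimensions | Mean NDCG@10 |
|-----------------|--------|------------|--------------|
| MiniLM-L6-v2    | 22M    | 384        | 36.88        |
| e5-small        | 33M    | 384        | 46.36        |
| jina-v5-nano    | 33M    | 768        | 54.14        |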

jina-v5-nano improves mean NDCG@10 by about 47% relative to the MiniLM-L6-v2 baseline (54.14 vs. 36.88).
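
The relative gain can be reproduced from the benchmark table with a short calculation. The helper name is hypothetical; the scores are the ones reported above.

```python
def relative_improvement(score: float, baseline: float) -> float:
    """Percentage improvement of score over baseline."""
    return (score - baseline) / baseline * 100.0


BASELINE = 36.88  # MiniLM-L6-v2 mean NDCG@10

for name, ndcg in [("e5-small", 46.36), ("jina-v5-nano", 54.14)]:
    print(f"{name}: {relative_improvement(ndcg, BASELINE):+.1f}%")
# e5-small: +25.7%
# jina-v5-nano: +46.8%
```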

All Tools

| Tool                 | Description                                                       |
|----------------------|-------------------------------------------------------------------|
| store                | Store a message in agent memory                                   |
| recall               | Recall relevant memories (vector + FTS5 + keyword, RRF merge)     |
| recall_with_context  | Recall with external conversation context (auto-dedup)            |
| get_profile          | Get current agent profile                                         |
| update_profile       | Save pre-computed agent profile                                   |
| archive_episode      | Archive conversation episode with summary and keywords            |
| list_memories        | List recent memories                                              |
| list_episodes        | List archived episodes                                            |
| update_memory        | Update memory content (rejects if locked)                         |
| lock_memory          | Lock memory to prevent deletion/editing                           |
| unlock_memory        | Unlock memory to allow deletion/editing                           |
| delete_memory        | Delete a single memory (ownership enforced)                       |
| delete_episode       | Delete a single episode (ownership enforced)                      |
| delete_agent_data    | Delete all data for an agent                                      |
| calibrate_threshold  | Auto-calibrate vector search threshold via z-score                |
| set_recall_precision | Set an agent's recall precision (knob 3) and recalibrate its gate |
| get_recall_precision | Read an agent's effective recall precision (knob 3)               |
| pause_persistence    | Turn writes into no-ops for an opt-in TTL window                  |
| resume_persistence   | Re-enable persistence immediately                                 |
| persistence_status   | Report whether persistence is paused and the TTL remaining        |
| migrate_channel_axis | Re-channel bridge-type memories to their concrete channel         |
| export_memories      | Export to JSONL (memories, episodes, profiles)                    |
| import_memories      | Import from JSONL (idempotent via msg_id dedup)                   |
| merge_memories       | Merge one agent's data into another (atomic, with dedup)          |
| get_queue_status     | Background task queue status                                      |
| check_health         | 16-point database health check with auto-repair                   |
| deep_check           | Deep semantic data quality analysis with auto-repair              |
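
MCP hosts invoke these tools with JSON-RPC `tools/call` requests. The sketch below builds such a request by hand to show the wire shape. The envelope follows the MCP specification, but the argument names (`agent_id`, `content`, `query`) are hypothetical placeholders; the real parameters come from each tool's input schema, which a client can fetch with `tools/list`.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique ids per session


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request for the given tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# HYPOTHETICAL argument names, for illustration only.
store_request = tool_call("store", {"agent_id": "demo", "content": "Prefers tabs over spaces"})
recall_request = tool_call("recall", {"agent_id": "demo", "query": "indentation preference"})

print(json.dumps(store_request, indent=2))
```

In normal use you never write these messages yourself: Claude Desktop or Claude Code generates them when the model decides to call a tool.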

Configuration

All settings via environment variables with sensible defaults:

| Variable                          | Default                     | Description                                   |
|-----------------------------------|-----------------------------|-----------------------------------------------|
| CPERSONA_DB_PATH                  | ./cpersona.db               | SQLite database path                          |
| CPERSONA_EMBEDDING_MODE           | http                        | Embedding mode (http or disabled)             |
| CPERSONA_EMBEDDING_URL            | http://127.0.0.1:8401/embed | Embedding server URL                          |
| CPERSONA_VECTOR_SEARCH_MODE       | remote                      | Vector search mode                            |
| CPERSONA_RECALL_MODE              | rrf                         | Recall fusion strategy (rrf, rsf, or cascade) |
| CPERSONA_RRF_K                    | 60                          | RRF smoothing parameter                       |
| CPERSONA_CONFIDENCE_ENABLED       | false                       | Include confidence metadata in results        |
| CPERSONA_AUTO_CALIBRATE           | false                       | Auto-calibrate on startup                     |
| CPERSONA_TASK_QUEUE_ENABLED       | false                       | Enable background task queue                  |
| CPERSONA_RECENT_RECALL_PENALTY    | 0.7                         | Penalty for recently recalled memories        |
| CPERSONA_RECENT_RECALL_WINDOW_MIN | 5                           | Window (minutes) for recent recall penalty    |
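
As an illustration of how the last two settings could interact, the sketch below scales down the score of any memory recalled within the window. Treating the penalty as a multiplier is an assumption, as is the function name; the defaults mirror the table above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def apply_recent_recall_penalty(score: float,
                                last_recalled_at: Optional[datetime],
                                now: datetime,
                                penalty: float = 0.7,
                                window_min: int = 5) -> float:
    """Dampen a memory's score if it was recalled within the last window_min minutes.

    ASSUMPTION: the penalty is applied as a multiplicative factor.
    """
    if last_recalled_at is not None and now - last_recalled_at <= timedelta(minutes=window_min):
        return score * penalty
    return score


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(apply_recent_recall_penalty(0.9, now - timedelta(minutes=2), now))   # inside the window
print(apply_recent_recall_penalty(0.9, now - timedelta(minutes=10), now))  # outside the window
```

Dampening rather than excluding recently recalled memories lets a strong match still surface while giving other candidates a chance, which is what suppresses the echo chamber effect.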

Recall fusion mode (CPERSONA_RECALL_MODE)

  • rrf (default) — Reciprocal Rank Fusion: merges the vector + FTS channels by rank only. Robust and scale-free, but discards score magnitude.
  • rsf — Relative Score Fusion: per-query min-max-normalizes each channel's raw score (cosine for vector, bm25 for keyword) and sums them, so the keyword channel's bm25 magnitude survives the merge. Recommended for topic-drift-prone or space-less language (e.g. Japanese) contexts, where that magnitude is the discriminating signal rrf flattens away (≈ Weaviate's relativeScoreFusion; see the ClotoCore RECALL_CONTAMINATION_AB_2026-06-14 report §10–12). Caveat: min-max normalization can over-cut small, closely-scored result sets when autocut is enabled — rrf remains the default until that interaction is hardened.
  • cascade — Sequential channel fill (legacy).
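
The rsf mode described in the list above can be sketched as per-channel min-max normalization followed by a sum. This is a generic illustration, not cpersona's code: the function names and sample scores are hypothetical, and both channels are assumed to be higher-is-better (SQLite's FTS5 bm25() returns lower-is-better values, so raw bm25 would need to be negated first).

```python
def min_max(scores: dict) -> dict:
    """Rescale one channel's scores to [0, 1] for the current query."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}  # all hits tied
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def rsf_merge(vector_scores: dict, keyword_scores: dict, top_k: int = 10):
    """Relative Score Fusion: sum of per-channel normalized scores."""
    fused = {}
    for channel in (vector_scores, keyword_scores):
        if not channel:
            continue  # a channel with no hits contributes nothing
        for doc_id, s in min_max(channel).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + s
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:top_k]


vector = {"m1": 0.82, "m2": 0.80, "m3": 0.40}
keyword = {"m2": 9.0, "m3": 1.0, "m4": 5.0}  # higher = better in this sketch

for doc_id, score in rsf_merge(vector, keyword):
    print(doc_id, round(score, 3))
```

Unlike rank-based fusion, the large keyword margin for m2 survives the merge and lifts it above m1. The example also shows the caveat: the lowest-scoring hit in each channel is normalized to exactly 0, which is how small, closely-scored result sets can be over-cut.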

Stats

  • ~5,600 LOC Python across focused modules
  • 146 tests across 12 test modules
  • Schema v10 (auto-migrating)
  • MIT License

Works With

cpersona is an MCP server, so it works with any MCP-compatible host, including Claude Desktop and Claude Code.

Part of ClotoCore

cpersona is the memory layer of ClotoCore, an open-source AI agent platform written in Rust. While cpersona is fully standalone (MIT license), it was designed to give AI agents persistent, searchable memory within the ClotoCore ecosystem.

License

MIT — free to use from any MCP host without restriction.