memory-v2
Brain-inspired persistent memory MCP server with hybrid BM25+vector search, ACT-R activation scoring, FadeMem decay, and knowledge graphs — 17 tools, fully local via Ollama, zero API keys.
Brain-Inspired Persistent Memory for AI Coding Assistants
Why memory-v2?
Most AI memory systems are just databases with a search API. memory-v2 is a cognitive architecture -- it models how human memory actually works.
| Feature | memory-v2 | Flat-file / CLAUDE.md | Claude-brain | Zep / Graphiti |
|---|---|---|---|---|
| Search | Hybrid BM25 + vector (fused) | None / grep | FTS5 keyword (auto) + vector (manual) | Vector + graph |
| Ranking | ACT-R activation (cognitive science) | None | FTS5 rank * recency | Recency + embedding similarity |
| Forgetting | FadeMem decay (STM/LTM promotion) | Never forgets, overflows | Never forgets | TTL-based expiry |
| Knowledge structure | Graph (Leiden communities, PPR) | Flat markdown | Flat tables | Graph (Graphiti) |
| Consolidation | CogCanvas 6-step compression | Manual | None | None |
| Governance | Protected memories (immune to decay) | None | None | None |
| Multi-agent | Authority chains, conflict detection | None | None | None |
| Privacy | 100% local (Ollama, zero API keys) | Local | Local | Cloud or self-hosted |
| MCP tools | 17 | 0 | Read-only MCP | Varies |
Get running in 60 seconds:
# Install
pip install memory-v2
# Start the MCP server
memory-v2-server --db ~/memory.db
# Or add to Claude Code settings.json
{
  "mcpServers": {
    "memory": {
      "command": "memory-v2-server",
      "args": ["--db", "~/memory.db"]
    }
  }
}
Your AI assistant now has persistent memory that survives compactions, decays gracefully, and builds a knowledge graph as you work.
Abstract
memory-v2 is a cognitively-grounded persistent memory system designed for AI coding assistants operating over long-lived, multi-session workflows. It replaces the first-generation claude-memory system with a ground-up rewrite that fuses techniques from cognitive psychology (ACT-R activation theory, power-law forgetting), information retrieval (BM25/vector hybrid search with Reciprocal Rank Fusion), and graph-based knowledge representation (Leiden community detection, Personalized PageRank retrieval) into a single SQLite-backed store.
The system serves as a Model Context Protocol (MCP) server exposing 17 tools, allowing any MCP-compatible AI client -- Claude Code, Claude Desktop, or custom agents -- to store, search, decay, compress, and share memories without external API calls. All embedding and LLM inference runs locally through Ollama, requiring zero API keys and transmitting zero data off-machine.
memory-v2 was built by a practitioner who ran 59 compaction cycles on the v1 system and decided the architecture needed to be rethought from first principles. The result is a system where memories compete for survival through activation scores, where protected content is exempt from decay, where multiple agents coordinate through authority chains, and where a knowledge graph discovers connections the flat search index cannot.
Table of Contents
- Why memory-v2?
- Abstract
- Architecture Overview
- Theoretical Foundations
- Database Schema
- MCP Server and Tools
- Subsystem Reference
- Worked Examples
- Installation
- Configuration
- Usage
- v1 to v2 Migration
- Repository Structure
- Related Work
- References
- Built With
- License
Architecture Overview
+-----------------------------+
| MCP Client Layer |
| (Claude Code / Desktop / |
| any MCP-compatible agent) |
+-------------+---------------+
|
FastMCP Protocol (stdio)
|
+-------------v---------------+
| server.py (17 tools) |
| add | search | graph_search |
| extract | compact | decay |
| agent_sync | check_integrity |
+---+-----+-----+-----+------+
| | | |
+---------------+ | | +---------------+
| | | |
+----------v---------+ +------v-----v------+ +----------v---------+
| db.py | | scoring.py | | knowledge_graph.py |
| SQLite + sqlite-vec | | ACT-R activation | | NetworkX DiGraph |
| + FTS5 | | FadeMem decay | | Leiden communities |
| | | Cosine similarity | | PPR retrieval |
| memories (rows) | | | | |
| memory_fts (BM25) | | base_level_act() | | extract_entities() |
| memory_vec (768-d) | | spreading_act() | | detect_communities |
| graveyard | | importance_score()| | ppr_search() |
| agent_offsets | | decay_value() | | visualize_graph() |
| conflicts | | retrieval_prob() | | |
| compaction_receipts | | | | |
+----------+----------+ +-------------------+ +----------+---------+
| |
| +-------------------+ |
+--------------+ embeddings.py +---------------+
| Ollama |
| nomic-embed-text |
| 768-dim vectors |
| Local file cache |
+-------------------+
|
+-----------------------+-----------------------+
| | |
+----------v---------+ +---------v----------+ +---------v----------+
| extraction.py | | compaction.py | | multi_agent.py |
| 2-pass LLM pipeline| | CogCanvas 6-step | | Authority chain |
| Fact extraction | | Protected extract | | Consumer offsets |
| Novelty checking | | Selective deletion | | Conflict detection |
| Action decision | | Summary + verify | | Kafka-style sync |
+--------------------+ +--------------------+ +--------------------+
|
+----------v---------+ +--------------------+
| vault_indexer.py | | security.py |
| Markdown chunking | | 13 credential pats |
| SHA-256 delta detect| | 6 injection pats |
| Frontmatter parsing | | SHA-256 manifests |
| Incremental indexing| | Allowlist filtering |
+--------------------+ +--------------------+
The architecture follows a layered design where the MCP server (server.py) acts as the sole entry point for AI clients. All 17 tools delegate to specialized subsystems. The database layer (db.py) owns the single SQLite file, which contains three co-located indexes: relational rows in memories, BM25 full-text search in memory_fts (FTS5), and 768-dimensional vector embeddings in memory_vec (sqlite-vec). The scoring layer applies cognitive activation formulas on top of search results. The knowledge graph lives in a separate NetworkX pickle file and provides multi-hop discovery that the flat index cannot.
Every embedding and LLM call routes through Ollama, which runs locally. The system never phones home.
Theoretical Foundations
ACT-R Activation Model
The Adaptive Control of Thought -- Rational (ACT-R) framework, developed by John Anderson and colleagues at Carnegie Mellon, provides the core theory for how memories compete for retrieval. The central claim is that human memory retrieval is a rational adaptation to the statistical structure of the environment: items that have been used recently and frequently are more likely to be needed again (Anderson & Schooler, 1991).
memory-v2 implements the ACT-R activation equation as a scoring overlay on search results. Each memory has an activation value that determines its probability of being retrieved. Activation has three components: base-level activation (how often and how recently the memory was accessed), spreading activation (contextual priming from the current query), and noise (stochastic variation that prevents deterministic behavior).
Base-Level Activation
The base-level activation approximates the full rational analysis using the optimized closed-form:
B_i = ln(n / (1 - d)) - d * ln(L)
Where:
- `n` = total access count for the memory
- `d` = decay parameter (fixed at 0.5, the canonical ACT-R value)
- `L` = lifetime of the memory in hours (time since creation)
This approximation avoids storing the full access history while preserving the key property: activation rises with frequency and falls with time, following a power law.
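The closed form translates directly to code. A minimal sketch (illustrative; the actual `base_level_act()` in `scoring.py` may differ in guards and units):

```python
import math

def base_level_activation(n: int, hours_alive: float, d: float = 0.5) -> float:
    """Optimized ACT-R base-level learning: B_i = ln(n / (1 - d)) - d * ln(L)."""
    L = max(hours_alive, 1e-9)  # guard: ln(0) is undefined for brand-new memories
    return math.log(n / (1 - d)) - d * math.log(L)
```

With the walkthrough numbers later in this document (n = 5, L = 48 h) this yields roughly 0.367.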
Spreading Activation
Context tags from the current query prime associated memories:
S_i = SUM_j [ W_j * (S_max - ln(fan_j)) ]
Where:
- `W_j = 1 / |context_tags|` (attention weight, divided equally among context sources)
- `S_max = 1.6` (maximum associative strength)
- `fan_j` = number of memories sharing tag `j` (the "fan" of the source)
The key insight: tags that appear on many memories provide less activation (they are less discriminating), while rare tags provide more. This is the ACT-R equivalent of IDF weighting.
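A sketch of the spreading term, with negative associative strengths clipped to zero as in the worked example later in this document (the helper name and signature are illustrative):

```python
import math

def spreading_activation(context_tags: list[str], memory_tags: list[str],
                         fan: dict[str, int], s_max: float = 1.6) -> float:
    """ACT-R spreading activation; common tags (high fan) contribute less."""
    if not context_tags:
        return 0.0
    w = 1.0 / len(context_tags)  # attention split equally across context sources
    total = 0.0
    for tag in context_tags:
        if tag in memory_tags:
            strength = s_max - math.log(fan.get(tag, 1))
            total += w * max(strength, 0.0)  # clip negative strengths to zero
    return total
```

A tag shared by 20 memories contributes nothing (1.6 - ln 20 < 0), while a rare tag with fan 3 contributes about 0.50.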
Noise
Activation includes a stochastic noise term drawn from the logistic distribution:
epsilon ~ Logistic(0, s) where s = 0.25
Generated as:
epsilon = s * ln(u / (1 - u)) where u ~ Uniform(0, 1)
This noise prevents the system from becoming deterministic and allows occasionally surprising retrievals -- a property that mirrors human memory.
Full Activation and Retrieval Probability
The complete activation equation:
A_i = B_i + S_i + epsilon
The probability of successful retrieval given activation:
P(retrieve) = 1 / (1 + exp(-(A_i - tau) / s))
Where:
- `tau = -0.5` (retrieval threshold)
- `s = 0.25` (noise scale, same as the noise parameter)
This is a sigmoid function centered at the threshold. Memories with activation well above the threshold are almost certainly retrieved; memories well below are almost certainly forgotten.
Protected floor: Memories marked as protected receive an activation floor of tau + 1.0 = 0.5, ensuring they are always retrievable regardless of age or access pattern.
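The noise term, sigmoid, and protected floor can be sketched as follows (illustrative helpers, not the exact `scoring.py` API):

```python
import math
import random

def logistic_noise(s: float = 0.25, rng=random) -> float:
    """epsilon = s * ln(u / (1 - u)), u ~ Uniform(0, 1)."""
    u = rng.random()
    return s * math.log(u / (1 - u))

def retrieval_probability(activation: float, tau: float = -0.5, s: float = 0.25) -> float:
    """Sigmoid centered at the retrieval threshold tau."""
    return 1.0 / (1.0 + math.exp(-(activation - tau) / s))

def effective_activation(activation: float, protected: bool, tau: float = -0.5) -> float:
    """Protected memories get an activation floor of tau + 1.0."""
    return max(activation, tau + 1.0) if protected else activation
```

At `activation == tau` the retrieval probability is exactly 0.5; a protected memory can never fall below the floor of 0.5 activation.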
Final Reranking
After hybrid search produces an initial ranked list, ACT-R activation is used to rerank:
final_score = 0.6 * hybrid_rrf_score + 0.4 * (activation / 10.0)
The activation is divided by 10 to normalize it into the same range as the RRF score (typically 0.0 to 0.03). The 60/40 split weights lexical+semantic relevance above cognitive activation, while still allowing frequently-accessed, contextually-primed memories to rise.
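As a one-line sketch of the blend (assuming the normalization described above):

```python
def final_score(rrf_score: float, activation: float) -> float:
    """60/40 blend of hybrid RRF relevance and normalized ACT-R activation."""
    return 0.6 * rrf_score + 0.4 * (activation / 10.0)
```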
FadeMem Decay System
While ACT-R handles retrieval competition at query time, FadeMem handles the background lifecycle of memories. It implements a dual-layer memory architecture (Short-Term Memory and Long-Term Memory) with power-law decay, importance-based promotion/demotion, and archival.
Importance Scoring
Each memory's importance is a weighted combination of three signals:
I(t) = 0.4 * relevance + 0.3 * frequency + 0.3 * recency
Where:
- `relevance` = contextual relevance score (0.5 during background sweeps when no query context is available)
- `frequency = log(access_count + 1) / log(max_access_count + 1)` (log-normalized access frequency)
- `recency = exp(-decay_rate * hours_since_access)` (exponential recency decay)
The weights (0.4, 0.3, 0.3) reflect a design choice: what a memory is about matters slightly more than how often or how recently it was accessed.
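A sketch of the importance formula (assumed signature; `importance_score()` in `scoring.py` may differ):

```python
import math

def importance(relevance: float, access_count: int, max_access_count: int,
               hours_since_access: float, decay_rate: float = 0.1) -> float:
    """I(t) = 0.4 * relevance + 0.3 * frequency + 0.3 * recency."""
    frequency = math.log(access_count + 1) / math.log(max_access_count + 1)
    recency = math.exp(-decay_rate * hours_since_access)
    return 0.4 * relevance + 0.3 * frequency + 0.3 * recency
```

With the decay-walkthrough numbers later in this document (relevance 0.5, 12 accesses out of a global max of 50, 2 hours since access) this gives roughly 0.641.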
Power-Law Decay Function
The core decay function:
v(t) = v(0) * exp(-lambda * t^beta)
Where:
- `v(0)` = initial memory strength
- `lambda` = per-memory decay rate (default `0.1`)
- `t` = time since last access in hours
- `beta` = layer-dependent exponent:
  - `beta_LTM = 0.8` (sub-linear -- LTM memories decay slower than exponential)
  - `beta_STM = 1.2` (super-linear -- STM memories decay faster than exponential)
The beta parameter is the key innovation over standard exponential decay. At beta < 1 the effective decay rate falls as time passes, so surviving LTM memories decay ever more slowly -- they are "hardened" by age. At beta > 1 the effective rate grows, so STM memories that fail to consolidate decay at an accelerating rate.
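A sketch of the decay function under these parameters (illustrative; `decay_value()` in `scoring.py` may differ):

```python
import math

def decay_value(v0: float, hours_since_access: float, lam: float = 0.1,
                layer: str = "STM") -> float:
    """v(t) = v(0) * exp(-lambda * t^beta); beta depends on the memory layer."""
    beta = 0.8 if layer == "LTM" else 1.2  # LTM sub-linear, STM super-linear
    return v0 * math.exp(-lam * hours_since_access ** beta)
```

After 24 hours without access, an LTM memory retains far more of its strength than an otherwise identical STM memory.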
Dual-Layer Promotion and Demotion
Memories move between layers based on importance thresholds:
Promotion: STM --> LTM when importance >= 0.7
Demotion: LTM --> STM when importance <= 0.3
Archive: STM --> graveyard when importance < 0.1 AND age > 30 days
The gap between 0.3 and 0.7 is a hysteresis zone: memories in this range stay in their current layer. This prevents oscillation at the boundary.
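The transition rules, including the hysteresis band, can be sketched as (hypothetical helper):

```python
def next_layer(layer: str, importance: float, age_days: float) -> str:
    """Hysteresis band: importance in (0.3, 0.7) keeps the current layer."""
    if layer == "STM" and importance >= 0.7:
        return "LTM"
    if layer == "LTM" and importance <= 0.3:
        return "STM"
    if layer == "STM" and importance < 0.1 and age_days > 30:
        return "graveyard"
    return layer
```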
Protected Tags
Memories tagged with any of the following are immune to decay and archival:
correction, decision, identity, emotional_anchor,
commitment, exact_value, chain_of_command, person
These represent categories of information where loss would be harmful regardless of access frequency.
Hybrid Search with Reciprocal Rank Fusion
memory-v2 runs two independent search algorithms and merges them:
- BM25 keyword search via SQLite FTS5 -- excels at exact term matching, file names, error codes
- Vector similarity search via sqlite-vec (768-dim, cosine distance) -- excels at semantic similarity
The merge uses Reciprocal Rank Fusion (Cormack et al., 2009), which has been shown to outperform individual ranking methods and Condorcet fusion:
score(d) = SUM_i [ 1 / (k + rank_i(d) + 1) ]
Where:
- `k = 60` (the RRF constant; higher values reduce the influence of top-ranked results)
- `rank_i(d)` = position of document `d` in the `i`-th ranked list (0-indexed)
- The sum runs over both BM25 and vector result lists
Documents appearing in both lists receive scores from both. Documents appearing in only one list receive a score from that list alone (the other term is 0). The result is a fused ranking that captures both lexical precision and semantic breadth.
Each search retrieves limit * 3 candidates before fusion to ensure adequate recall. The fused results are then passed to the ACT-R scoring layer for final reranking.
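A minimal RRF sketch matching the hybrid-search walkthrough later in this document (0-indexed ranks with the +1 offset inside the denominator):

```python
def rrf_fuse(bm25_ids: list[int], vec_ids: list[int], k: int = 60) -> list[tuple[int, float]]:
    """Reciprocal Rank Fusion over two ranked ID lists (0-indexed ranks)."""
    scores: dict[int, float] = {}
    for ranked in (bm25_ids, vec_ids):
        for rank, doc_id in enumerate(ranked):
            # A document absent from a list simply contributes nothing for it
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Sort fused results by descending score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```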
Knowledge Graph and Personalized PageRank
The knowledge graph is a NetworkX directed graph (DiGraph) that represents entities and relationships extracted from vault documents. It provides a complementary retrieval path: while flat search finds documents containing similar words or vectors, graph traversal discovers structurally related concepts even when they share no lexical or embedding similarity.
Schema
10 node types:
person, project, concept, decision, tool,
event, emotion, conversation, chunk, community
11 edge types:
discussed_in, decided, built, uses, part_of,
related_to, preceded_by, caused, felt, evolved_from, member_of
Entities are normalized to lowercase. Edges carry weight (incremented on repeated observation), temporal metadata (valid_from, valid_until), confidence scores, and source file provenance.
Entity Extraction
Entity and relationship extraction uses a local LLM (default: qwen2.5:3b via Ollama) with a structured prompt that produces JSON output. The model is instructed to normalize names, skip trivial relationships, and use canonical forms. The first 4000 characters of each document are processed (respecting the small model's context window).
Leiden Community Detection
The graph is partitioned using the Leiden algorithm (Traag et al., 2019), which guarantees well-connected communities -- an improvement over the earlier Louvain method that could produce arbitrarily badly connected communities. Implementation uses python-igraph and leidenalg (optional dependencies).
Community assignments are stored as node attributes and surfaced through the list_topics MCP tool, providing an automatic clustering of the knowledge base without manual taxonomy.
HippoRAG-Style Personalized PageRank Retrieval
The graph_search tool implements Personalized PageRank (PPR) retrieval inspired by HippoRAG (2024):
- Seed identification: Extract entities from the query; match them to graph nodes by word overlap. If no direct match, fall back to embedding similarity against node names.
- Personalization vector: Construct a uniform distribution over seed nodes (all others get weight 0).
- PPR computation: `nx.pagerank(G_undirected, alpha=0.85, personalization=p)`
- Result extraction: Return top-K nodes by PPR score, including community membership, mention count, and source provenance.
The damping factor alpha = 0.85 means 85% of each random-walk step follows edges and the remaining 15% teleports back to the seed nodes. This strikes the standard balance between exploration and relevance.
The key advantage over flat search: PPR discovers nodes reachable by multi-hop traversal from the query entities, even if those nodes share no embedding or lexical similarity with the query. This enables "what else is connected to this?" reasoning.
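The production code delegates to `nx.pagerank`; the computation itself can be sketched dependency-free with power iteration (illustrative):

```python
def personalized_pagerank(edges: list[tuple[str, str]], seeds: set[str],
                          alpha: float = 0.85, iters: int = 100) -> dict[str, float]:
    """PPR on an undirected graph: follow an edge with probability alpha,
    teleport to the (uniform) seed distribution with probability 1 - alpha."""
    nodes = sorted({n for edge in edges for n in edge})
    nbrs: dict[str, list[str]] = {n: [] for n in nodes}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    personalization = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(personalization)
    for _ in range(iters):
        rank = {
            n: alpha * sum(rank[m] / len(nbrs[m]) for m in nbrs[n])
               + (1 - alpha) * personalization[n]
            for n in nodes
        }
    return rank
```

Probability mass stays concentrated around the seeds, so multi-hop neighbors of the query entities score higher than structurally distant nodes.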
Database Schema
All persistent state (except the knowledge graph pickle) lives in a single SQLite database file.
Entity-Relationship Diagram
erDiagram
memories {
INTEGER id PK
TEXT content
TEXT content_type
TEXT source_file
INTEGER source_line
TEXT author
INTEGER authority_level
REAL confidence
INTEGER protected
TEXT tags
TEXT created_at
TEXT updated_at
TEXT last_accessed_at
INTEGER access_count
REAL activation_score
REAL importance_score
REAL decay_rate
INTEGER archived
INTEGER supersedes FK
TEXT content_hash
}
memory_fts {
TEXT content
TEXT tags
TEXT source_file
}
memory_vec {
INTEGER id PK
BLOB embedding
}
graveyard {
INTEGER id PK
INTEGER memory_id FK
TEXT content
TEXT metadata
TEXT archived_at
TEXT reason
REAL last_activation_score
}
agent_offsets {
TEXT agent_id PK
INTEGER last_read_line
TEXT last_read_time
}
file_hashes {
TEXT file_path PK
TEXT content_hash
TEXT indexed_at
INTEGER chunk_count
}
conflicts {
INTEGER id PK
INTEGER memory_id_a FK
INTEGER memory_id_b FK
TEXT agent_a
TEXT agent_b
TEXT description
INTEGER resolved
TEXT resolved_by
TEXT created_at
}
compaction_receipts {
INTEGER id PK
TEXT timestamp
INTEGER original_tokens
INTEGER compressed_tokens
REAL ratio
INTEGER protected_items_extracted
REAL verification_score
TEXT vault_files_updated
TEXT receipt_data
}
memories ||--o{ graveyard : "archived to"
memories ||--|| memory_fts : "FTS5 index"
memories ||--|| memory_vec : "vector index"
memories ||--o{ conflicts : "involved in"
memories ||--o| memories : "supersedes"
Table Details
| Table | Purpose | Index Strategy |
|---|---|---|
| `memories` | Core storage. One row per memory (or vault chunk). | B-tree on `content_type`, `archived`, `protected`, `source_file`, `created_at`, `activation_score` |
| `memory_fts` | FTS5 virtual table over content, tags, source_file. Auto-synced via AFTER INSERT/UPDATE/DELETE triggers. | Inverted index (BM25) |
| `memory_vec` | `vec0` virtual table holding 768-dim float32 embeddings. One row per memory, keyed by id. | HNSW-like approximate NN (sqlite-vec internal) |
| `graveyard` | Archived memories. Preserves full content and metadata for potential rehydration. | None (append-only audit log) |
| `agent_offsets` | Kafka-style consumer offsets. Each agent tracks its last-read position in the changelog. | Primary key on `agent_id` |
| `file_hashes` | SHA-256 hashes for incremental vault indexing. Skips unchanged files on re-index. | Primary key on `file_path` |
| `conflicts` | Contradiction log. Records when two agents write conflicting memories. | Sequential scan (low volume) |
| `compaction_receipts` | Audit trail for every compaction run. Stores ratios, verification scores, protected item counts. | Sequential scan |
SQLite Configuration
PRAGMA journal_mode = WAL; -- Write-Ahead Logging for concurrent reads
PRAGMA foreign_keys = ON; -- Enforce referential integrity
The sqlite-vec extension is loaded at connection time via sqlite_vec.load(conn). The database file is protected by a 10-second timeout for lock contention.
MCP Server and Tools
memory-v2 exposes 17 tools through the Model Context Protocol via FastMCP. The server runs over stdio (standard MCP transport) and can be registered with any MCP-compatible client.
Tool Reference
CRUD Operations
| Tool | Parameters | Description |
|---|---|---|
| `add_memory` | `content`, `content_type?`, `tags?`, `source?`, `protected?`, `author?`, `confidence?` | Store a new memory with embedding, novelty checking, and credential scanning. Returns `memory_id`, novelty, importance, protected status. |
| `get` | `memory_id` | Retrieve a single memory by ID. Updates access timestamp and count (retrieval strengthening). |
| `update` | `memory_id`, `content?`, `tags?`, `protected?` | Update content, tags, or protection status. Re-embeds if content changes. |
| `forget` | `memory_id`, `reason?` | Archive to graveyard. Protected memories cannot be forgotten. |
Search Operations
| Tool | Parameters | Description |
|---|---|---|
| `search` | `query`, `limit?`, `content_type?` | Hybrid BM25 + vector search with RRF fusion and ACT-R reranking. The primary search tool. |
| `keyword_search` | `keywords`, `limit?` | Pure BM25 keyword search. Preferred for exact terms, file names, error codes. |
Graph Operations
| Tool | Parameters | Description |
|---|---|---|
| `graph_search` | `query`, `top_k?` | Personalized PageRank traversal. Discovers related concepts via multi-hop graph walk. |
| `graph_stats_tool` | (none) | Node/edge counts, type distributions, top entities by mention count, graph density. |
System Operations
| Tool | Parameters | Description |
|---|---|---|
| `stats` | (none) | Total active, archived, protected, graveyard counts, files indexed, type distribution. |
| `list_recent` | `hours?`, `limit?` | Recently created memories, sorted by `created_at` descending. |
| `list_topics` | `limit?` | Topic clusters from Leiden communities (if graph built) or file structure fallback. |
| `reindex` | `force?` | Incremental vault re-indexing. Only processes files whose SHA-256 hash changed. `force=True` rebuilds all. |
Advanced Operations
| Tool | Parameters | Description |
|---|---|---|
| `extract_from_conversation` | `text`, `source?`, `author?` | Two-pass LLM pipeline: extract facts, then decide ADD/UPDATE/DELETE/NONE for each. |
| `compact_text` | `text`, `max_ratio?`, `verify?` | CogCanvas 6-step compression with protected content extraction and faithfulness verification. |
| `decay_sweep` | (none) | FadeMem background maintenance: update importance scores, promote/demote layers, archive dead memories. |
| `agent_sync` | `agent_id` | Kafka-style changelog sync. Returns unread entries and updates the agent's consumer offset. |
| `check_conflicts` | (none) | List all unresolved inter-agent memory conflicts. |
| `check_integrity` | (none) | Verify vault files against SHA-256 manifest. Reports modified, new, and missing files. |
Server Initialization
On startup, the server:
- Registers all 17 tools with FastMCP
- Spawns a background thread to pre-warm the Ollama embedding model (avoids ~55-second cold-start penalty on first query)
- Lazily initializes the SQLite connection on first tool call
- Runs over stdio with `mcp.run(show_banner=False)`
Subsystem Reference
Embedding Layer
File: src/memory_v2/embeddings.py
| Property | Value |
|---|---|
| Model | nomic-embed-text (via Ollama) |
| Dimensions | 768 |
| Backend | Ollama local inference |
| Caching | SHA-256 keyed file cache in ~/.memory-v2/cache/ |
| Batch API | embed_batch(texts: list[str]) for vault indexing |
The embedding layer is intentionally thin: four functions (embed_text, embed_batch, embed_with_cache, get_client). Embedding model selection is configurable via MEMORY_V2_EMBED_MODEL for users who want to swap in a different Ollama model.
The get_embedder() function in __init__.py provides lazy initialization with a throwaway warmup call to avoid cold-start latency on the first real query.
Vault Indexer
File: src/memory_v2/vault_indexer.py
The vault indexer converts a directory of markdown files into searchable memory chunks.
Chunking strategy:
- Target: 500 tokens per chunk (~2000 characters at 4 chars/token)
- Overlap: 50 tokens between consecutive chunks
- Split hierarchy: paragraph boundaries first, sentence boundaries for oversized paragraphs
- Frontmatter extraction: YAML metadata (`type`, `tags`, `author`) parsed and applied to chunks

Incremental indexing:
- Each file's SHA-256 hash is stored in `file_hashes`
- On re-index, unchanged files are skipped entirely
- Changed files have their old chunks deleted before re-chunking
- The `force=True` flag bypasses hash checking for full rebuilds

Content type detection:
- Derived from vault folder structure: `conversations/` -> `episode`, `decisions/` -> `decision`, `people/` -> `person`, `origins/` -> `identity`
- Overridable via the frontmatter `type:` field

Changelog handling:
- `changelog.md` is indexed separately (one memory per dated entry)
- Each entry matching `[YYYY-MM-DD]` becomes an `episode`-type memory
- Batch-embedded for efficiency
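The paragraph-first chunking strategy can be sketched as (simplified; the real indexer also splits oversized paragraphs at sentence boundaries and attaches frontmatter metadata to each chunk):

```python
def chunk_markdown(text: str, target_chars: int = 2000, overlap_chars: int = 200) -> list[str]:
    """Greedy paragraph packing: ~500 tokens per chunk at 4 chars/token,
    with a ~50-token tail carried into the next chunk as overlap."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + len(para) > target_chars:
            chunks.append(current)
            current = current[-overlap_chars:]  # carry overlap into the next chunk
        current = (current + "\n\n" + para).strip()
    if current:
        chunks.append(current)
    return chunks
```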
Auto-Extraction Pipeline
File: src/memory_v2/extraction.py
The extraction pipeline converts unstructured conversation text into discrete, typed memories through a two-pass LLM process.
Pass 1 -- Fact Extraction:
The local LLM (qwen2.5:3b) receives the conversation text and a structured prompt targeting 7 categories:
- Personal preferences
- Personal details (names, relationships, dates)
- Plans and intentions
- Project details (status, architecture, decisions)
- Technical specifics (file paths, ports, configs, errors)
- Corrections
- Emotional/relational beats
The prompt explicitly instructs the model to extract only from user messages, produce self-contained statements, and skip trivial content.
Pass 2 -- Memory Action Decision:
For each extracted fact:
- Credential scan: Block storage if credentials detected (13 patterns)
- Embedding: Generate 768-dim vector via Ollama
- Novelty check: Search for similar existing memories
  - Similarity > 0.92: `NONE` (duplicate, skip)
  - Similarity 0.75--0.92: Invoke LLM to decide `ADD`, `UPDATE`, or `DELETE`
  - Similarity < 0.75: `ADD` (sufficiently novel)
- Content type detection: Automated via keyword heuristics (`correction`, `decision`, `episode`, `fact`)
- Emotional detection: 20 emotion keywords + 3 regex patterns; emotional content is auto-protected
- Storage: Execute the decided action with appropriate importance scoring
CogCanvas Compaction Pipeline
File: src/memory_v2/compaction.py
Inspired by the CogCanvas framework for cognitive compression in agents, this pipeline reduces verbose text while preserving critical information. It follows a 6-step process:
Step 1 -- Pre-Extract Protected Content: Scan every line for protected patterns (corrections, decisions, emotional markers, exact values, commitments, configured names). Matching lines are extracted verbatim and set aside.
Step 2 -- Selective Deletion: Remove expendable content: numbered URL lists, shell command output, repeated information markers ("as I mentioned"), boilerplate ("let me know if you need help").
Step 3 -- Structured Summary:
If the remaining text exceeds 2000 characters, invoke the local LLM to summarize. The prompt explicitly preserves factual claims, names, dates, numbers, file paths, and causal relationships. Compression target is bounded by max_ratio (default 3:1).
Step 4 -- Faithfulness Verification: A second LLM pass compares the summary against the original, checking for:
- Hallucinations (claims not in the original)
- Missing facts (important information dropped)
- Overall faithfulness score (0.0 to 1.0)
If hallucinations are detected and the score falls below 0.7, the system reverts to the deletion-only output rather than trust the unfaithful summary.
Step 5 -- Receipt Generation: Every compaction produces an auditable receipt: original/compressed token counts, compression ratio, protected items count, verification score, hallucination count.
Step 6 -- Receipt Storage:
Receipts are persisted to the compaction_receipts table for historical analysis.
Step 7 (bonus) -- Rehydration:
The rehydrate() function can attempt to recover full content from the vault or graveyard if the compressed version is insufficient.
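Step 2's selective deletion can be sketched with simple line filters (the patterns below are illustrative, not the exact set in `compaction.py`):

```python
import re

# Illustrative expendable-content filters; the production pattern set differs
EXPENDABLE_PATTERNS = [
    re.compile(r"^\s*\d+\.\s+https?://\S+\s*$"),            # numbered URL lists
    re.compile(r"\bas I mentioned\b", re.IGNORECASE),        # repeated-info markers
    re.compile(r"let me know if you need", re.IGNORECASE),   # closing boilerplate
]

def selective_delete(text: str) -> str:
    """Drop lines matching any expendable-content pattern."""
    kept = [line for line in text.splitlines()
            if not any(p.search(line) for p in EXPENDABLE_PATTERNS)]
    return "\n".join(kept)
```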
Multi-Agent Coordination
File: src/memory_v2/multi_agent.py
memory-v2 supports multiple AI agents sharing a single memory database through three mechanisms:
Authority Chain
A configurable hierarchy determines who can overwrite whom:
{
"human": 1, # Manual edits -- highest authority
"primary_agent": 2, # Primary AI agent (e.g., Claude Code)
"coordinator": 3, # Coordinating agent
"worker": 4, # Subordinate worker agent
"indexer": 5, # Automated vault indexer
"extractor": 6, # Automated fact extraction
}
Lower number = higher authority. When a write conflicts with an existing memory:
- Higher authority: overwrites the existing memory
- Equal or lower authority: logs a conflict and stores both versions
The hierarchy is overridable via the MEMORY_V2_AUTHORITY_CHAIN environment variable (JSON format).
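The overwrite-or-conflict decision reduces to a comparison of authority levels (hypothetical helper; the real module also stores both versions on conflict):

```python
DEFAULT_AUTHORITY_CHAIN = {
    "human": 1, "primary_agent": 2, "coordinator": 3,
    "worker": 4, "indexer": 5, "extractor": 6,
}

def resolve_write(existing_author: str, incoming_author: str,
                  chain: dict[str, int] = DEFAULT_AUTHORITY_CHAIN) -> str:
    """Lower number = higher authority; ties and lower authority log a conflict."""
    if chain[incoming_author] < chain[existing_author]:
        return "overwrite"
    return "conflict"  # both versions are kept for later review
```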
Kafka-Style Consumer Offsets
Each agent has a consumer offset tracked in the agent_offsets table. The agent_sync tool reads unread changelog entries and advances the offset -- the same pattern used by Apache Kafka consumer groups, adapted to a file-based changelog.
This allows agents to boot up, catch up on what other agents wrote since their last session, and proceed without re-reading the entire history.
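A minimal sketch of the offset mechanics against the `agent_offsets` table (the real tool also records `last_read_time`):

```python
import sqlite3

def agent_sync(conn: sqlite3.Connection, agent_id: str, changelog: list[str]) -> list[str]:
    """Return unread changelog entries and advance the agent's consumer offset."""
    row = conn.execute(
        "SELECT last_read_line FROM agent_offsets WHERE agent_id = ?", (agent_id,)
    ).fetchone()
    offset = row[0] if row else 0          # new agents start from the beginning
    unread = changelog[offset:]
    conn.execute(
        "INSERT INTO agent_offsets (agent_id, last_read_line) VALUES (?, ?) "
        "ON CONFLICT(agent_id) DO UPDATE SET last_read_line = excluded.last_read_line",
        (agent_id, len(changelog)),
    )
    return unread
```

Calling it twice in a row returns nothing the second time; a later call picks up only entries appended in between.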
Conflict Detection and Resolution
When an incoming memory contradicts an existing one (detected by antonym keyword pairs: "not"/"is", "false"/"true", "never"/"always", etc.), the system:
- Checks authority levels
- Either overwrites (higher authority) or logs the conflict
- Conflicts surface through the `check_conflicts` tool for human review
Security Module
File: src/memory_v2/security.py
Credential Scanning (13 Patterns)
Every memory stored through add_memory or extract_from_conversation is scanned against 13 regex patterns:
| Pattern | Target |
|---|---|
| `api_key`/`secret`/`token`/`password=...` | Generic credential assignments |
| Base64 strings (40+ chars) | Encoded secrets |
| `sk-[a-zA-Z0-9]{20,}` | OpenAI API keys |
| `sk-ant-[a-zA-Z0-9-]{20,}` | Anthropic API keys |
| `ghp_[a-zA-Z0-9]{36}` | GitHub Personal Access Tokens |
| `gho_[a-zA-Z0-9]{36}` | GitHub OAuth tokens |
| `bearer [token]` | Bearer authentication tokens |
| `xoxb-...` / `xoxp-...` | Slack bot/user tokens |
| `AKIA[0-9A-Z]{16}` | AWS access keys |
| `-----BEGIN PRIVATE KEY-----` | RSA/EC private keys |
| `mongodb://...` | MongoDB connection URIs |
| `postgres://...` | PostgreSQL connection URIs |
An allowlist prevents false positives on redacted keys (sk-abc...), example placeholders, and the system's own SHA-256 content hashes.
Behavior on detection: The memory is blocked -- not stored. The tool returns an error message instructing the caller to remove sensitive data.
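A sketch of the scan, using an illustrative subset of the 13 patterns (see `security.py` for the full set and allowlist):

```python
import re

# Illustrative subset of the credential patterns from the table above
CREDENTIAL_PATTERNS = [
    re.compile(r"sk-[a-zA-Z0-9]{20,}"),          # OpenAI-style API keys
    re.compile(r"ghp_[a-zA-Z0-9]{36}"),          # GitHub Personal Access Tokens
    re.compile(r"AKIA[0-9A-Z]{16}"),             # AWS access keys
    re.compile(r"-----BEGIN PRIVATE KEY-----"),  # private key blocks
]

def contains_credentials(text: str) -> bool:
    """True means the memory must be blocked, not stored."""
    return any(p.search(text) for p in CREDENTIAL_PATTERNS)
```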
Injection Detection (6 Patterns)
Content from external sources is scanned for prompt injection attempts:
"ignore previous instructions"
"you are now a"
"system: you"
"forget everything/all/your"
"new instructions:"
"override previous/system/all"
The system warns but does not strip -- the caller (the AI agent) makes the final decision.
Integrity Manifests
The check_integrity tool generates and verifies SHA-256 manifests for all vault markdown files, detecting:
- Modified files (hash mismatch)
- New files (not in manifest)
- Missing files (in manifest but not on disk)
Worked Examples
ACT-R Activation Walkthrough
Consider a memory stored 48 hours ago with 5 accesses, tagged ["python", "debugging"]. The current query context includes the tag "python", which appears on 20 memories total.
Step 1: Base-level activation
n = 5 (access count)
L = 48 (hours since creation)
d = 0.5
B_i = ln(5 / (1 - 0.5)) - 0.5 * ln(48)
= ln(10) - 0.5 * ln(48)
= 2.3026 - 0.5 * 3.8712
= 2.3026 - 1.9356
= 0.3670
Step 2: Spreading activation
Context tags: ["python"] (1 tag, so W_j = 1.0)
Memory tags: ["python", "debugging"]
The tag "python" is shared. Its fan is 20.
S_i = 1.0 * (1.6 - ln(20))
= 1.0 * (1.6 - 2.9957)
= max(-1.3957, 0)
= 0.0
The fan of 20 is too high -- the associative strength is negative, so it clips to 0. This tag is too common to provide useful priming. If we had used a rarer tag like "asyncio" with a fan of 3:
S_i = 1.0 * (1.6 - ln(3))
= 1.0 * (1.6 - 1.0986)
= 0.5014
Step 3: Noise
Draw u = 0.73 from Uniform(0,1):
epsilon = 0.25 * ln(0.73 / 0.27)
= 0.25 * ln(2.7037)
= 0.25 * 0.9946
= 0.2487
Step 4: Full activation (using the common "python" tag)
A_i = 0.3670 + 0.0 + 0.2487 = 0.6157
Step 5: Retrieval probability
P(retrieve) = 1 / (1 + exp(-(0.6157 - (-0.5)) / 0.25))
= 1 / (1 + exp(-1.1157 / 0.25))
= 1 / (1 + exp(-4.4628))
= 1 / (1 + 0.01155)
= 0.9886
This memory has a 98.9% retrieval probability -- high activation from frequent access and favorable noise.
FadeMem Decay Walkthrough
Consider three memories during a decay sweep:
| Memory | Access Count | Max Access (global) | Hours Since Access | Current Layer |
|---|---|---|---|---|
| A | 12 | 50 | 2 | LTM |
| B | 3 | 50 | 168 (1 week) | STM |
| C | 1 | 50 | 1440 (2 months) | STM |
Memory A (active LTM memory):
frequency = log(12 + 1) / log(50 + 1) = log(13) / log(51) = 2.565 / 3.932 = 0.652
recency = exp(-0.1 * 2) = exp(-0.2) = 0.819
I(A) = 0.4 * 0.5 + 0.3 * 0.652 + 0.3 * 0.819
= 0.200 + 0.196 + 0.246
= 0.641
Importance 0.641 is in the hysteresis zone (0.3 -- 0.7). Memory A stays in LTM. No action.
Memory B (neglected STM memory):
frequency = log(3 + 1) / log(51) = log(4) / log(51) = 1.386 / 3.932 = 0.352
recency = exp(-0.1 * 168) = exp(-16.8) ~ 0.0000005
I(B) = 0.4 * 0.5 + 0.3 * 0.352 + 0.3 * 0.0000005
= 0.200 + 0.106 + 0.000
= 0.306
Importance 0.306 is just above the demotion threshold (0.3). Still in hysteresis zone -- stays in STM.
Memory C (ancient, barely-accessed):
frequency = log(1 + 1) / log(51) = log(2) / log(51) = 0.693 / 3.932 = 0.176
recency = exp(-0.1 * 1440) = exp(-144) ~ 0.0
I(C) = 0.4 * 0.5 + 0.3 * 0.176 + 0.3 * 0.0
= 0.200 + 0.053 + 0.000
= 0.253
Importance 0.253 is below the demotion threshold (0.3), but Memory C is already in STM, so no demotion occurs. Since 0.253 > the archive threshold (0.1), it is NOT archived yet. Had its importance fallen below 0.1, its age (60 days, above the 30-day minimum) would qualify it for archival to the graveyard.
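The sweep logic above fits in a few lines of Python. This is a minimal sketch of the importance formula and the hysteresis decision, not the real decay code; the relevance term (0.5 in all three examples), weights, and thresholds are taken directly from the walkthrough, and the function names are hypothetical.

```python
import math

def importance(access_count, max_access, hours_since_access, relevance=0.5,
               w_rel=0.4, w_freq=0.3, w_rec=0.3, decay_rate=0.1):
    """FadeMem importance: weighted relevance + frequency + recency (sketch)."""
    frequency = math.log(access_count + 1) / math.log(max_access + 1)
    recency = math.exp(-decay_rate * hours_since_access)
    return w_rel * relevance + w_freq * frequency + w_rec * recency

def decay_action(score, layer, age_days,
                 demote_below=0.3, promote_above=0.7, archive_below=0.1,
                 min_archive_age_days=30):
    """Hysteresis zone (0.3 -- 0.7): only act when a threshold is crossed."""
    if score < archive_below and age_days > min_archive_age_days:
        return "archive"
    if layer == "STM" and score > promote_above:
        return "promote_to_LTM"
    if layer == "LTM" and score < demote_below:
        return "demote_to_STM"
    return "no_action"
```

Running the three memories through this sketch reproduces I(A) ≈ 0.641, I(B) ≈ 0.306, and I(C) ≈ 0.253, and none of them triggers a layer move.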
Hybrid Search Walkthrough
Query: "Python asyncio event loop configuration"
BM25 results (FTS5 MATCH, ordered by BM25 rank):
| Rank | Memory ID | Content (truncated) |
|---|---|---|
| 0 | 42 | "The asyncio event loop configuration for..." |
| 1 | 88 | "Python 3.12 changed the default event loop..." |
| 2 | 15 | "Configuration files should use TOML format..." |
Vector results (sqlite-vec, ordered by embedding distance):
| Rank | Memory ID | Content (truncated) |
|---|---|---|
| 0 | 88 | "Python 3.12 changed the default event loop..." |
| 1 | 42 | "The asyncio event loop configuration for..." |
| 2 | 107 | "uvloop provides a faster event loop replacement..." |
RRF fusion (k = 60):
ID 42: BM25 contribution = 1/(60+0+1) = 0.01639
Vec contribution = 1/(60+1+1) = 0.01613
RRF score = 0.03252
ID 88: BM25 contribution = 1/(60+1+1) = 0.01613
Vec contribution = 1/(60+0+1) = 0.01639
RRF score = 0.03252
ID 15: BM25 contribution = 1/(60+2+1) = 0.01587
Vec contribution = 0 (not in vector top-K)
RRF score = 0.01587
ID 107: BM25 contribution = 0 (not in BM25 top-K)
Vec contribution = 1/(60+2+1) = 0.01587
RRF score = 0.01587
IDs 42 and 88 tie at 0.03252. Both appeared in both result sets, confirming they are relevant on both the lexical and semantic axes. IDs 15 and 107 each appeared in only one set, earning roughly half that score.
After ACT-R reranking (applying the 0.6 * hybrid + 0.4 * (activation / 10) formula), memory 88 might rise above 42 if it has been accessed more frequently, or 42 might win if it has stronger contextual tag overlap.
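The fusion step is small enough to show in full. This is a generic RRF sketch matching the arithmetic above, not the project's actual query pipeline:

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank + 1).

    ranked_lists: lists of memory IDs, best-first (rank 0 = top hit).
    Returns (id, score) pairs sorted by fused score, best first.
    """
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

bm25_hits = [42, 88, 15]     # FTS5 BM25 order from the table above
vector_hits = [88, 42, 107]  # embedding-distance order
fused = rrf_fuse([bm25_hits, vector_hits])
```

With k = 60, IDs 42 and 88 both score 1/61 + 1/62 ≈ 0.03252, while the single-list hits 15 and 107 score 1/63 ≈ 0.01587, exactly as in the walkthrough.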
Installation
Prerequisites
- Python 3.9+
- Ollama running locally with `nomic-embed-text` and (optionally) `qwen2.5:3b` pulled
- sqlite-vec (installed automatically via pip)
From PyPI
pip install memory-v2
From Source
git clone https://github.com/Haustorium12/memory-v2.git
cd memory-v2
pip install -e .
With Graph Dependencies
Leiden community detection and PyVis visualization require optional dependencies:
pip install memory-v2[graph]
With Development Dependencies
pip install memory-v2[all]
Ollama Setup
# Install Ollama (https://ollama.com/download)
ollama pull nomic-embed-text # Required: 768-dim embeddings
ollama pull qwen2.5:3b # Optional: fact extraction + compaction
The embedding model is required for all search and storage operations. The LLM model is required only for `extract_from_conversation`, `compact_text`, and `build_kg` (knowledge graph entity extraction).
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `MEMORY_V2_DB` | `~/.memory-v2/memory.db` | Path to the SQLite database file |
| `MEMORY_V2_GRAPH` | `~/.memory-v2/knowledge_graph.pickle` | Path to the NetworkX graph pickle |
| `MEMORY_V2_CACHE` | `~/.memory-v2/cache` | Directory for the embedding file cache |
| `MEMORY_V2_VAULT` | `~/.memory-v2/vault` | Root directory of the markdown vault |
| `MEMORY_V2_EMBED_MODEL` | `nomic-embed-text` | Ollama model for embeddings |
| `MEMORY_V2_LLM_MODEL` | `qwen2.5:3b` | Ollama model for extraction/compaction |
| `MEMORY_V2_PROTECTED_NAMES` | (empty) | Comma-separated names to protect from compaction |
| `MEMORY_V2_AUTHORITY_CHAIN` | (see below) | JSON authority hierarchy |
| `MEMORY_V2_KG_HASHES` | `~/.memory-v2/kg_file_hashes.json` | File hash registry for incremental KG builds |
MCP Client Configuration
Claude Code (settings.json)
{
"mcpServers": {
"memory-v2": {
"command": "memory-v2-server",
"env": {
"MEMORY_V2_DB": "C:\\Users\\you\\.memory-v2\\memory.db",
"MEMORY_V2_VAULT": "C:\\Users\\you\\vault"
}
}
}
}
Claude Desktop (claude_desktop_config.json)
{
"mcpServers": {
"memory-v2": {
"command": "memory-v2-server",
"args": [],
"env": {
"MEMORY_V2_DB": "/home/you/.memory-v2/memory.db",
"MEMORY_V2_VAULT": "/home/you/vault"
}
}
}
}
Custom Authority Chain
export MEMORY_V2_AUTHORITY_CHAIN='{"human": 1, "primary_agent": 2, "coordinator": 3, "worker": 4}'
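Reading the chain and using it for conflict resolution might look like the sketch below. This is an illustration only: the assumption that a lower rank number means higher authority (human outranks all agents), and the function names, are not taken from the source.

```python
import json
import os

# Default mirrors the example above; lower number = higher authority (assumed).
DEFAULT_CHAIN = {"human": 1, "primary_agent": 2, "coordinator": 3, "worker": 4}

def load_authority_chain():
    """Parse MEMORY_V2_AUTHORITY_CHAIN from the environment, if set."""
    raw = os.environ.get("MEMORY_V2_AUTHORITY_CHAIN")
    return json.loads(raw) if raw else DEFAULT_CHAIN

def resolve_conflict(chain, writer_a, writer_b):
    """Hypothetical resolver: the higher-authority writer's memory wins.

    Unknown writers default to rank 99 (lowest authority).
    """
    rank_a = chain.get(writer_a, 99)
    rank_b = chain.get(writer_b, 99)
    return writer_a if rank_a <= rank_b else writer_b
```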
Usage
As an MCP Server
The primary interface. Start the server:
memory-v2-server
The server communicates over stdio using the MCP protocol. Typically, you configure it in your MCP client's settings rather than running it manually.
As a CLI Tool
For scripting and non-MCP workflows:
# Database statistics
memory-v2 stats
# Store a memory
memory-v2 add '{"content": "Python 3.12 uses per-interpreter GIL", "content_type": "fact", "tags": ["python", "concurrency"]}'
# Search
memory-v2 search "Python GIL changes"
# Keyword search
memory-v2 keyword "GIL"
# Get by ID
memory-v2 get 42
# Update
memory-v2 update '{"memory_id": 42, "content": "Python 3.13 makes per-interpreter GIL stable"}'
# Archive
memory-v2 forget '{"memory_id": 42, "reason": "outdated"}'
# Recent memories
memory-v2 recent
# Reindex vault
memory-v2 reindex
Building the Knowledge Graph
The knowledge graph is built from vault files using a standalone CLI that processes each document through LLM entity extraction:
# Incremental build (skip unchanged files)
memory-v2-build-kg
# Full rebuild
memory-v2-build-kg --full
The builder:
- Reads all markdown files from the vault
- Extracts entities and relationships via Ollama
- Adds them to the NetworkX graph (merging duplicate nodes, incrementing edge weights)
- Saves progress every 25 files (checkpoint)
- Runs Leiden community detection
- Generates a PyVis interactive HTML visualization (`graph.html`)
- Prints graph statistics and top entities
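The incremental mode can be understood as a hash-registry check: a file is re-processed only when its content hash differs from the one recorded on the previous run. The sketch below shows the idea with SHA-256 and a JSON registry (matching the `MEMORY_V2_KG_HASHES` default); it is a simplified stand-in, not the builder's actual code.

```python
import hashlib
import json
from pathlib import Path

def changed_files(vault_dir, registry_path):
    """Return vault markdown files whose SHA-256 changed since the last run,
    and update the hash registry on disk."""
    registry = Path(registry_path)
    old = json.loads(registry.read_text()) if registry.exists() else {}

    changed, new = [], {}
    for md in sorted(Path(vault_dir).rglob("*.md")):
        digest = hashlib.sha256(md.read_bytes()).hexdigest()
        new[str(md)] = digest
        if old.get(str(md)) != digest:   # new file or modified content
            changed.append(md)

    registry.write_text(json.dumps(new, indent=2))
    return changed
```

On a fresh registry every file is "changed"; a second run with no edits returns an empty list, which is what makes `memory-v2-build-kg` cheap to re-run.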
v1 to v2 Migration
memory-v2 is a complete rewrite. It shares zero code with claude-memory v1. The table below summarizes all architectural changes.
| Feature | v1 (claude-memory) | v2 (memory-v2) |
|---|---|---|
| Storage | ChromaDB (embedded document store) | SQLite + sqlite-vec + FTS5 (single file, WAL mode) |
| Search | Weighted vector + BM25 (separate systems) | Reciprocal Rank Fusion combining BM25 and vector in a single query pipeline |
| Decay Model | Ebbinghaus (5 biological mechanisms: decay curve, evergreen exemptions, salience weighting, retrieval strengthening, consolidation) | FadeMem (power-law decay with layer-dependent beta exponents, STM/LTM promotion/demotion with hysteresis, importance-weighted archival) |
| Activation | None | Full ACT-R: base-level + spreading activation + logistic noise + retrieval probability |
| Knowledge Graph | None | NetworkX directed graph with Leiden community detection and Personalized PageRank retrieval |
| Interface | CLI only | MCP server (17 tools) + CLI |
| Memory Types | File chunks only | 6 structured types: fact, episode, decision, correction, identity, person |
| Multi-Agent | None | Authority chain + Kafka-style consumer offsets + conflict detection and logging |
| Security | None | 13 credential patterns + allowlist, 6 injection patterns, SHA-256 integrity manifests |
| Compaction | External watcher process | Built-in CogCanvas 6-step pipeline with faithfulness verification and auditable receipts |
| Extraction | Manual | Two-pass LLM pipeline with novelty checking, emotion detection, and automated content typing |
| Embeddings | SentenceTransformers / OpenAI | Ollama nomic-embed-text (local, zero API keys) |
| LLM | OpenAI API | Ollama qwen2.5:3b (local, zero API keys) |
| Dependencies | chromadb, openai, sentence-transformers | sqlite-vec, fastmcp, ollama, networkx, numpy |
| Data Format | ChromaDB internal (opaque) | SQLite (inspectable, portable, single file) |
| Database File | ChromaDB directory | Single .db file (typically < 50 MB for ~1000 memories) |
Why the Rewrite?
The v1 system worked for 6 months and survived 59 compaction cycles. Its limitations became clear through daily use:
- ChromaDB lock contention: Two agents trying to write simultaneously would deadlock
- No structured types: Everything was a "chunk" -- no way to distinguish decisions from facts
- No graph relationships: Could not answer "what is related to X?" without embedding similarity
- Ebbinghaus decay was too simple: No dual-layer architecture, no importance-weighted survival
- CLI-only: Required shell access; could not be used by Claude Desktop or web-based agents
- External dependencies for core functions: Compaction required a separate watcher process
- API key dependency: Embeddings required OpenAI or local SentenceTransformers (heavyweight)
Repository Structure
memory-v2/
|
+-- src/memory_v2/
| +-- __init__.py # Package init, lazy embedder warmup
| +-- db.py # SQLite + sqlite-vec + FTS5 (schema, CRUD, hybrid search)
| +-- server.py # FastMCP server (17 tools, background warmup thread)
| +-- embeddings.py # Ollama nomic-embed-text (embed, batch, cache)
| +-- scoring.py # ACT-R activation + FadeMem decay + cosine similarity
| +-- knowledge_graph.py # NetworkX graph, Leiden communities, PPR retrieval, PyVis viz
| +-- vault_indexer.py # Markdown vault indexing (chunk, embed, delta detect)
| +-- extraction.py # Two-pass LLM fact extraction pipeline
| +-- compaction.py # CogCanvas 6-step compression with verification
| +-- multi_agent.py # Authority chain, consumer offsets, conflict detection
| +-- security.py # 13 credential patterns, 6 injection patterns, manifests
| +-- build_kg.py # Standalone knowledge graph builder CLI
| +-- cli.py # CLI wrapper for non-MCP usage
|
+-- docs/ # Additional documentation
+-- examples/ # Usage examples
+-- tests/ # Test suite (pytest)
|
+-- pyproject.toml # Package metadata, dependencies, entry points
+-- CHANGELOG.md # Release history
+-- LICENSE # MIT License
+-- README.md # This file
Entry Points
Defined in pyproject.toml:
| Command | Target | Purpose |
|---|---|---|
| `memory-v2` | `memory_v2.cli:main` | CLI tool |
| `memory-v2-server` | `memory_v2.server:run` | MCP server |
| `memory-v2-build-kg` | `memory_v2.build_kg:main` | Knowledge graph builder |
Related Work
memory-v2 draws on and is informed by a growing body of work on memory systems for language model agents:
- HippoRAG (2024) -- Neurobiological RAG architecture using knowledge graphs and Personalized PageRank for retrieval. memory-v2's `graph_search` tool implements this retrieval pattern. arXiv.
- A-MEM (NeurIPS 2025 Workshop) -- Agentic memory framework exploring structured memory for LLM agents.
- CogCanvas -- Cognitive compression framework for agents. memory-v2's compaction pipeline adapts its "extract, delete, summarize, verify" methodology.
- Focus (arXiv:2601.07190) -- Active context compression for large language models.
- ReadAgent -- Reading-focused memory agent for long documents.
- AgeMem -- Age-aware memory management for language agents.
- EverMemOS -- Persistent memory operating system for agents.
- Memoria -- Memory-augmented agent framework.
- PlugMem -- Pluggable memory modules for LLM agents.
- AriGraph -- Graph-structured memory for task-oriented agents.
- Zep / Graphiti -- Production memory infrastructure for AI applications, using knowledge graphs.
- TITANS -- Training large language models with memory layers.
References
- Anderson, J.R. & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2(6), 396--408.
- Anderson, J.R., Bothell, D., Byrne, M.D., Douglass, S., Lebiere, C., & Qin, Y. (2004). An integrated theory of the mind. Psychological Review, 111(4), 1036--1060.
- Ebbinghaus, H. (1885/1913). Memory: A Contribution to Experimental Psychology. (Trans. H.A. Ruger & C.E. Bussenius). New York: Teachers College, Columbia University.
- Cormack, G.V., Clarke, C.L.A., & Buettcher, S. (2009). Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods. SIGIR '09: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval.
- HippoRAG: Neurobiological RAG Architecture (2024). arXiv preprint.
- A-MEM: Agentic Memory. NeurIPS 2025 Workshop.
- CogCanvas: Cognitive Compression for Agents.
- Focus: Active Context Compression. arXiv:2601.07190.
- Traag, V.A., Waltman, L., & van Eck, N.J. (2019). From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9, 5233.
Built With
| Component | Role | Why |
|---|---|---|
| Ollama | Local LLM inference (embedding + extraction) | Zero API keys, zero data exfiltration, runs on consumer hardware |
| SQLite | Core database | Single file, zero config, WAL mode for concurrent reads, universally deployed |
| sqlite-vec | Vector similarity search | Embeds ANN search directly in SQLite, no separate vector DB process |
| SQLite FTS5 | BM25 full-text search | Built into SQLite, auto-synced via triggers, battle-tested BM25 implementation |
| FastMCP | MCP server framework | Clean decorator-based tool registration, handles stdio transport |
| NetworkX | Knowledge graph | Mature graph library, built-in PageRank, serializable to pickle |
| python-igraph + leidenalg | Community detection | Leiden algorithm guarantees well-connected communities |
| NumPy | Vector operations | Fast cosine similarity, embedding serialization |
| PyVis | Graph visualization | Interactive HTML output for knowledge graph exploration |
License
MIT License. Copyright (c) 2026 Haustorium12.
See LICENSE for full text.
memory-v2 was built because AI assistants deserve memories that actually work like memories.