ai-memory

Persistent memory for any AI assistant. Zero token cost until recall. Stores memories in local SQLite, ranks by 6-factor scoring, returns results 79% smaller than JSON. Works with Claude, ChatGPT, Grok, Cursor, Windsurf, and any MCP client.

        _
   __ _(_)      _ __ ___   ___ _ __ ___   ___  _ __ _   _
  / _` | |___  | '_ ` _ \ / _ \ '_ ` _ \ / _ \| '__| | | |
 | (_| | |___| | | | | | |  __/ | | | | | (_) | |  | |_| |
  \__,_|_|     |_| |_| |_|\___|_| |_| |_|\___/|_|   \__, |
                universal AI memory                   |___/


ai-memory is a persistent memory system for AI assistants. It works with any AI that supports MCP -- Claude, ChatGPT, Grok, Llama, and more. It stores what your AI learns in a local SQLite database, ranks memories by relevance when recalling, and auto-promotes important knowledge to permanent storage. Install it once, and every AI assistant you use remembers your architecture, your preferences, your corrections -- forever.

Zero token cost until recall. Unlike built-in memory systems (Claude Code auto-memory, ChatGPT memory) that load your entire memory into every conversation -- burning tokens and money on every message -- ai-memory uses zero context tokens until the AI explicitly calls memory_recall. Only relevant memories come back, ranked by a 6-factor scoring algorithm. TOON format (Token-Oriented Object Notation) cuts response tokens by another 40-60% by eliminating repeated field names -- 3 memories in JSON = 1,600 bytes; in TOON = 626 bytes (61% smaller); in TOON compact = 336 bytes (79% smaller). For Claude Code users: disable auto-memory ("autoMemoryEnabled": false in settings.json) and replace it with ai-memory to stop paying for 200+ lines of memory context on every single message.
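The byte counts above map directly onto the quoted percentages; a quick check:

```python
# Sizes quoted above for 3 memories serialized in each format.
json_bytes, toon_bytes, toon_compact_bytes = 1600, 626, 336

def savings(before: int, after: int) -> int:
    """Percent size reduction, rounded to the nearest whole percent."""
    return round((before - after) / before * 100)

print(savings(json_bytes, toon_bytes))          # 61  (TOON vs JSON)
print(savings(json_bytes, toon_compact_bytes))  # 79  (TOON compact vs JSON)
```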


Compatible AI Platforms

ai-memory integrates with any AI platform that supports the Model Context Protocol (MCP). MCP is the universal standard for connecting AI assistants to external tools and data sources.

| Platform | Integration Method | Config Format | Status |
|---|---|---|---|
| Claude Code (Anthropic) | MCP stdio | JSON (~/.claude.json or .mcp.json) | Fully supported |
| Codex CLI (OpenAI) | MCP stdio | TOML (~/.codex/config.toml) | Fully supported |
| Gemini CLI (Google) | MCP stdio | JSON (~/.gemini/settings.json) | Fully supported |
| Grok CLI (xAI) | MCP stdio | JSON (~/.grok/user-settings.json) | Deep integration |
| Grok API (xAI) | MCP remote HTTPS | API-level | Fully supported |
| Cursor IDE | MCP stdio | JSON (~/.cursor/mcp.json) | Fully supported |
| Windsurf (Codeium) | MCP stdio | JSON (~/.codeium/windsurf/mcp_config.json) | Fully supported |
| Continue.dev | MCP stdio | YAML (~/.continue/config.yaml) | Fully supported |
| Llama Stack (META) | MCP remote HTTP | YAML / Python SDK | Fully supported |
| OpenClaw | MCP stdio | JSON (mcp.servers in config) | Fully supported |
| Any MCP client | MCP stdio or HTTP | Varies | Universal |

MCP is the primary integration layer. For AI platforms that do not yet support MCP natively, the HTTP API (20 endpoints on localhost) and the CLI (25 commands) provide universal access -- any AI, script, or automation that can make HTTP calls or run shell commands can use ai-memory.


Install in 60 Seconds

Pre-built binaries require no dependencies. Building from source needs Rust and a C compiler.

Fastest: Pre-built binary (no Rust required)

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/alphaonedev/ai-memory-mcp/main/install.sh | sh

# Ubuntu (PPA)
sudo add-apt-repository ppa:jbridger2021/ai-memory && sudo apt install ai-memory

# Fedora/RHEL (COPR)
sudo dnf copr enable alpha-one-ai/ai-memory && sudo dnf install ai-memory

# Windows (PowerShell)
irm https://raw.githubusercontent.com/alphaonedev/ai-memory-mcp/main/install.ps1 | iex

Step 1: Install Rust (skip if using pre-built binaries)

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Follow the prompts, then restart your terminal (or run source ~/.cargo/env).

Step 2: From source (requires Rust)

Latest release from Crates.io:

cargo install ai-memory

Latest from the git repository:

cargo install --git https://github.com/alphaonedev/ai-memory-mcp.git

This compiles the binary and installs it into Cargo's bin directory (~/.cargo/bin), which is on your PATH after Rust setup. Expect the build to take a minute or two.

Build dependencies for source builds:

  • Ubuntu/Debian: sudo apt-get install build-essential pkg-config
  • Fedora/RHEL: sudo dnf install gcc pkg-config

Step 3: Connect your AI

Configuration varies by platform. Find yours below:

Claude Code (Anthropic)

Claude Code supports three MCP configuration scopes:

| Scope | File | Applies to |
|---|---|---|
| User (global) | ~/.claude.json (add mcpServers key) | All projects on your machine |
| Project (shared) | .mcp.json in project root (checked into git) | Everyone on the project |
| Local (private) | ~/.claude.json under projects."/path".mcpServers | One project, just you |

User scope (recommended — works everywhere):

Add the mcpServers key to ~/.claude.json (macOS/Linux) or %USERPROFILE%\.claude.json (Windows):

{
  "mcpServers": {
    "memory": {
      "command": "ai-memory",
      "args": ["--db", "~/.claude/ai-memory.db", "mcp", "--tier", "semantic"]
    }
  }
}

Note: ~/.claude.json likely already exists with other settings. Merge the mcpServers key into the existing file — do not overwrite it.

Project scope (shared with team):

Create .mcp.json in your project root:

{
  "mcpServers": {
    "memory": {
      "command": "ai-memory",
      "args": ["--db", "~/.claude/ai-memory.db", "mcp", "--tier", "semantic"]
    }
  }
}

Windows paths: Use forward slashes or escaped backslashes in --db. Example: "--db", "C:/Users/YourName/.claude/ai-memory.db".

Tier flag: The --tier flag selects the feature tier: keyword, semantic (default), smart, or autonomous. Smart and autonomous tiers require Ollama running locally. The --tier flag must be passed in the args — the config.toml tier setting is not used when the MCP server is launched by an AI client.

Important: MCP servers are not configured in settings.json or settings.local.json — those files do not support mcpServers.

Make Claude proactively use ai-memory: Add a CLAUDE.md file to your project root with ai-memory directives. This ensures Claude recalls context at the start of every conversation and stores findings as it works. See the CLAUDE.md integration guide for a copy-paste template and placement options.
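A minimal CLAUDE.md directive might read like this (the wording below is illustrative, not the official template — see the integration guide for the canonical version):

```markdown
## Memory
- At the start of each conversation, call `memory_recall` with the current task context.
- Store durable findings (architecture decisions, user corrections, conventions) with `memory_store`, tier `long`.
- Before contradicting a stored memory, check it with `memory_search`.
```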

OpenAI Codex CLI

Add to ~/.codex/config.toml (global) or .codex/config.toml (project). Windows: %USERPROFILE%\.codex\config.toml. Override with CODEX_HOME env var.

[mcp_servers.memory]
command = "ai-memory"
args = ["--db", "~/.local/share/ai-memory/memories.db", "mcp", "--tier", "semantic"]
enabled = true

Or add via CLI: codex mcp add memory -- ai-memory --db ~/.local/share/ai-memory/memories.db mcp --tier semantic

Notes: Codex uses TOML format with underscored key mcp_servers (not camelCase, not hyphenated). Supports env (key/value pairs), env_vars (list to forward), enabled_tools, disabled_tools, startup_timeout_sec, tool_timeout_sec. Use /mcp in the TUI to view server status. See Codex MCP docs.

Google Gemini CLI

Add to ~/.gemini/settings.json (user) or .gemini/settings.json (project). Windows: %USERPROFILE%\.gemini\settings.json.

{
  "mcpServers": {
    "memory": {
      "command": "ai-memory",
      "args": ["--db", "~/.local/share/ai-memory/memories.db", "mcp", "--tier", "semantic"],
      "timeout": 30000
    }
  }
}

Or add via CLI: gemini mcp add memory ai-memory -- --db ~/.local/share/ai-memory/memories.db mcp --tier semantic

Notes: Avoid underscores in server names (use hyphens). Tool names are auto-prefixed as mcp_memory_<toolName>. Env vars in the env field support $VAR / ${VAR} (all platforms) and %VAR% (Windows). Gemini sanitizes sensitive patterns from inherited env unless explicitly declared. Add "trust": true to skip confirmation prompts. CLI management: gemini mcp list/remove/enable/disable. See Gemini CLI MCP docs.

Cursor IDE

Add to ~/.cursor/mcp.json (global) or .cursor/mcp.json (project). Windows: %USERPROFILE%\.cursor\mcp.json. Project config overrides global for same-named servers.

{
  "mcpServers": {
    "memory": {
      "command": "ai-memory",
      "args": ["--db", "~/.local/share/ai-memory/memories.db", "mcp", "--tier", "semantic"]
    }
  }
}

Notes: Restart Cursor after editing mcp.json. Verify server status in Settings > Tools & MCP (green dot = connected). Supports env, envFile, and ${env:VAR_NAME} interpolation (env var interpolation can be unreliable for shell profile variables — use envFile as workaround). ~40 tool limit across all MCP servers. See Cursor MCP docs.

Windsurf (Codeium)

Add to ~/.codeium/windsurf/mcp_config.json (global only — no project-level scope). Windows: %USERPROFILE%\.codeium\windsurf\mcp_config.json.

{
  "mcpServers": {
    "memory": {
      "command": "ai-memory",
      "args": ["--db", "~/.local/share/ai-memory/memories.db", "mcp", "--tier", "semantic"]
    }
  }
}

Notes: Supports ${env:VAR_NAME} interpolation in command, args, env, serverUrl, url, and headers. 100 tool limit across all MCP servers. Can also add via MCP Marketplace or Settings > Cascade > MCP Servers. See Windsurf MCP docs.

Continue.dev

Add to ~/.continue/config.yaml (user) or .continue/mcpServers/ directory in project root (per-server YAML/JSON files). Windows: %USERPROFILE%\.continue\config.yaml.

mcpServers:
  - name: memory
    command: ai-memory
    args:
      - "--db"
      - "~/.local/share/ai-memory/memories.db"
      - "mcp"
      - "--tier"
      - "semantic"

Notes: MCP tools only work in agent mode. Supports ${{ secrets.SECRET_NAME }} for secret interpolation. Project-level .continue/mcpServers/ directory auto-detects JSON configs from other tools (Claude Code, Cursor, etc.). See Continue MCP docs.

Grok CLI (AlphaOne fork — deep integration with auto-recall)

The AlphaOne fork of grok-cli has built-in ai-memory support with session-scoped MCP connections, automatic memory recall on session start, compaction summary storage, and memory-aware system prompts.

Add to ~/.grok/user-settings.json:

{
  "mcp": {
    "servers": [
      {
        "id": "ai-memory",
        "label": "AI Memory",
        "enabled": true,
        "transport": "stdio",
        "command": "ai-memory",
        "args": ["mcp", "--tier", "semantic"]
      }
    ]
  }
}

Features: Auto-recall on session start (injects relevant memories into system prompt), compaction summaries stored as mid-tier memories, MCP tools available in all modes (agent, plan, ask), session-scoped connections (no per-message cold starts). Uses --tier semantic by default (local embeddings, no Ollama required). See grok-cli docs for full setup.

xAI Grok API (API-level, remote MCP)

Grok connects to MCP servers over HTTPS (remote only, no stdio). No config file — servers are specified per API request.

ai-memory serve --host 127.0.0.1 --port 9077
# Expose via HTTPS reverse proxy (nginx, caddy, cloudflare tunnel, etc.)

Then add the MCP server to your Grok API call:

curl https://api.x.ai/v1/responses \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-3",
    "tools": [{
      "type": "mcp",
      "server_url": "https://your-server.example.com/mcp",
      "server_label": "memory",
      "server_description": "Persistent AI memory with recall and search",
      "allowed_tools": ["memory_store", "memory_recall", "memory_search"]
    }],
    "input": "What do you remember about our project?"
  }'

Requirements: HTTPS required. server_label is required. Supports Streamable HTTP and SSE transports. Optional: allowed_tools, authorization, headers. Works with xAI SDK, OpenAI-compatible Responses API, and Voice Agent API. See xAI Remote MCP docs.

META Llama (via Llama Stack)

Llama Stack registers MCP servers as toolgroups. No standardized config file path — deployment-specific.

ai-memory serve --host 127.0.0.1 --port 9077

Python SDK:

client.toolgroups.register(
    provider_id="model-context-protocol",
    toolgroup_id="mcp::memory",
    mcp_endpoint={"uri": "http://localhost:9077/sse"}
)

Or declaratively in run.yaml:

tool_groups:
  - toolgroup_id: mcp::memory
    provider_id: model-context-protocol
    mcp_endpoint:
      uri: "http://localhost:9077/sse"

Notes: Supports ${env.VAR_NAME} interpolation in run.yaml. Transport is migrating from SSE to Streamable HTTP. See Llama Stack Tools docs.

OpenClaw

Add via CLI or edit the OpenClaw config directly. Config uses mcp.servers (not mcpServers).

openclaw mcp set memory '{"command":"ai-memory","args":["--db","~/.local/share/ai-memory/memories.db","mcp","--tier","semantic"]}'

Or add to your OpenClaw config file:

{
  "mcp": {
    "servers": {
      "memory": {
        "command": "ai-memory",
        "args": ["--db", "~/.local/share/ai-memory/memories.db", "mcp", "--tier", "semantic"]
      }
    }
  }
}

Notes: OpenClaw uses mcp.servers key (not mcpServers). CLI management: openclaw mcp list, openclaw mcp show, openclaw mcp set, openclaw mcp unset. Supports stdio, remote URL, and Streamable HTTP transports. Prefer --token-file over inline secrets. See OpenClaw MCP docs.

Any other MCP client

ai-memory speaks MCP over stdio (JSON-RPC 2.0). Point your client at:

command: ai-memory
args: ["--db", "/path/to/ai-memory.db", "mcp"]

For HTTP-only clients, start the REST API:

ai-memory serve
# 20 endpoints at http://127.0.0.1:9077/api/v1/
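Under the hood, an MCP stdio client exchanges one JSON-RPC 2.0 object per message. As an illustration (the `arguments` field names here are assumptions, not the server's documented schema), a tools/call request for memory_store looks roughly like:

```python
import json

# A JSON-RPC 2.0 frame an MCP client would send to the server over stdio.
# The "arguments" values are illustrative assumptions for this sketch.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "memory_store",
        "arguments": {"title": "Favorite language", "content": "Rust", "tier": "long"},
    },
}

# MCP stdio transports serialize each message as a single JSON object.
wire = json.dumps(request)
print(wire)
```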

Step 4: Done. Test it.

Restart your AI assistant. If using MCP, it now has 13-17 memory tools depending on the configured tier. Ask it: "Store a memory that my favorite language is Rust." Then in a new conversation, ask: "What is my favorite language?" It will remember.


What Does It Do?

AI assistants forget everything between conversations. ai-memory fixes that.

It runs as an MCP (Model Context Protocol) tool server -- a background process that your AI talks to natively. When your AI learns something important, it stores it. When it needs context, it recalls relevant memories ranked by a 6-factor scoring algorithm. Memories live in three tiers:

  • Short-term (6 hours) -- throwaway context like current debugging state
  • Mid-term (7 days) -- working knowledge like sprint goals and recent decisions
  • Long-term (permanent) -- architecture, user preferences, hard-won lessons

Memories that keep getting accessed automatically promote from mid to long-term. Each recall extends the TTL. Priority increases with usage. The system is self-curating.

Beyond MCP, ai-memory also exposes a full HTTP REST API (20 endpoints on port 9077) and a complete CLI (25 commands) for direct interaction, scripting, and integration with any AI platform or tool.


Features

Core

  • MCP tool server -- 17 tools over stdio JSON-RPC, compatible with any MCP client
  • Three-tier memory -- short (6h TTL), mid (7d TTL), long (permanent)
  • Full-text search -- SQLite FTS5 with ranked retrieval
  • Hybrid recall -- FTS5 keyword + cosine similarity with fixed 0.6 semantic / 0.4 keyword (60/40) blend weights
  • 6-factor recall scoring -- FTS relevance + priority + access frequency + confidence + tier boost + recency decay
  • Auto-promotion -- memories accessed 5+ times promote from mid to long
  • TTL extension -- each recall extends expiry (short +1h, mid +1d)
  • Priority reinforcement -- +1 every 10 accesses (max 10)
  • Contradiction detection -- warns when storing memories that conflict with existing ones
  • Deduplication -- upsert on title+namespace, tier never downgrades
  • Confidence scoring -- 0.0-1.0 certainty factored into ranking
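The fixed 60/40 hybrid blend above can be sketched as follows (the score normalization is an illustrative assumption; the real implementation may normalize FTS5 and cosine scores differently):

```python
def hybrid_score(semantic_sim: float, keyword_score: float) -> float:
    """Blend a cosine similarity in [0, 1] with a normalized keyword
    score in [0, 1] using the fixed 0.6 / 0.4 weights."""
    return 0.6 * semantic_sim + 0.4 * keyword_score

# A memory that matches strongly on meaning but weakly on exact keywords
# can still outrank a pure keyword hit.
print(round(hybrid_score(0.9, 0.2), 2))  # 0.62
print(round(hybrid_score(0.3, 1.0), 2))  # 0.58
```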

Organization

  • Namespaces -- isolate memories per project (auto-detected from git remote)
  • Memory linking -- typed relations: related_to, supersedes, contradicts, derived_from
  • Consolidation -- merge multiple memories into a single long-term summary
  • Auto-consolidation -- group by namespace+tag, auto-merge groups above threshold
  • Contradiction resolution -- mark one memory as superseding another, demote the loser
  • Forget by pattern -- bulk delete by namespace + FTS pattern + tier
  • Source tracking -- tracks origin: user, claude, hook, api, cli, import, consolidation, system
  • Tagging -- comma-separated tags with filter support
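Namespace auto-detection from a git remote can be approximated like this (the exact normalization ai-memory applies is an assumption of this sketch):

```python
import re

def namespace_from_remote(remote_url: str) -> str:
    """Derive an 'owner/repo' namespace from common git remote URL shapes.
    Illustrative only -- ai-memory's real normalization may differ."""
    url = re.sub(r"\.git$", "", remote_url.strip())
    # SSH form: git@host:owner/repo
    m = re.match(r"^[\w.-]+@[\w.-]+:(.+)$", url)
    if m:
        return m.group(1)
    # HTTPS form: https://host/owner/repo
    m = re.match(r"^https?://[\w.-]+/(.+)$", url)
    if m:
        return m.group(1)
    return url

print(namespace_from_remote("git@github.com:alphaonedev/ai-memory-mcp.git"))   # alphaonedev/ai-memory-mcp
print(namespace_from_remote("https://github.com/alphaonedev/ai-memory-mcp"))  # alphaonedev/ai-memory-mcp
```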

Interfaces

  • 20 HTTP endpoints -- full REST API on 127.0.0.1:9077 (works with any AI or tool)
  • 25 CLI commands -- complete CLI with identical capabilities
  • 17 MCP tools -- native integration for any MCP-compatible AI
  • Interactive REPL shell -- recall, search, list, get, stats, namespaces, delete with color output
  • JSON output -- --json flag on all CLI commands

Operations

  • Multi-node sync -- pull, push, or bidirectional merge between database files
  • Import/Export -- full JSON roundtrip preserving memory links
  • Garbage collection -- automatic background expiry every 30 minutes
  • Graceful shutdown -- SIGTERM/SIGINT checkpoints WAL for clean exit
  • Deep health check -- verifies DB accessibility and FTS5 integrity
  • Shell completions -- bash, zsh, fish
  • Man page -- ai-memory man generates roff to stdout
  • Time filters -- --since/--until on list and search
  • Human-readable ages -- "2h ago", "3d ago" in CLI output
  • Color CLI output -- ANSI tier labels (red/yellow/green), priority bars, bold titles, cyan namespaces

Quality

  • 161 tests -- 118 unit tests across all 15 modules (db 29, mcp 12, config 9, main 9, mine 9, validate 8, reranker 7, color 6, errors 6, models 6, toon 6, embeddings 5, hnsw 4, llm 2) + 43 integration tests. 15/15 modules have unit tests — 95%+ coverage.
  • LongMemEval benchmark -- 97.8% R@5 (489/500), 99.0% R@10, 99.8% R@20 on ICLR 2025 LongMemEval-S dataset. 499/500 at R@20. Pure FTS5 keyword achieves 97.0% R@5 in 2.2 seconds (232 q/s). LLM query expansion pushes to 97.8% R@5. Zero cloud API costs. See benchmark details.
  • MCP Prompts -- recall-first and memory-workflow prompts teach AI clients to use memory proactively
  • TOON-default -- recall/list/search responses use TOON compact by default (79% smaller than JSON)
  • Criterion benchmarks -- insert, recall, search at 1K scale
  • GitHub Actions CI/CD -- fmt, clippy, test, build on Ubuntu + macOS, release on tag

ML and LLM Dependencies (semantic tier+)

  • candle-core, candle-nn, candle-transformers -- Hugging Face Candle ML framework for native Rust inference
  • hf-hub -- download models from Hugging Face Hub
  • tokenizers -- Hugging Face tokenizers for text preprocessing
  • instant-distance -- approximate nearest neighbor search
  • reqwest -- HTTP client for Ollama API communication (smart/autonomous tiers)

Architecture

ai-memory architecture diagram


Benchmark

LongMemEval benchmark results

Evaluated on the ICLR 2025 LongMemEval-S dataset (500 questions, 6 categories). Pure FTS5 keyword tier achieves 97.0% R@5 in 2.2 seconds. LLM query expansion (smart tier) pushes to 97.8% R@5. All inference runs locally — zero cloud API calls, zero cost.

| Tier | R@5 | Speed | Dependencies |
|---|---|---|---|
| keyword | 97.0% | 232 q/s | None |
| semantic | 97.4% | 45 q/s | Embedding model (~100MB) |
| smart | 97.8% | 12 q/s | Ollama + Gemma 4 E2B |

Integration Methods

MCP (Primary -- for MCP-compatible AI platforms)

MCP is the recommended integration. Your AI gets up to 17 native memory tools (depending on tier) with zero glue code. Configure the MCP server in your AI platform's config:

{
  "mcpServers": {
    "memory": {
      "command": "ai-memory",
      "args": ["--db", "~/.claude/ai-memory.db", "mcp"]
    }
  }
}

HTTP API (Universal -- for any AI or tool)

Start the HTTP server for REST API access. Any AI, script, or automation that can make HTTP calls can use this:

ai-memory serve
# 20 endpoints at http://127.0.0.1:9077/api/v1/

CLI (Universal -- for scripting and direct use)

The CLI works standalone or as a building block for AI integrations that run shell commands:

ai-memory store --tier long --title "Architecture decision" --content "We use PostgreSQL"
ai-memory recall "database choice"
ai-memory search "PostgreSQL"

Feature Tiers

ai-memory supports 4 feature tiers, selected at startup with ai-memory mcp --tier <tier>. Higher tiers add ML capabilities at the cost of disk and RAM:

| Tier | Recall Method | Extra Capabilities | Approx. Overhead |
|---|---|---|---|
| keyword | FTS5 only | Baseline 13 tools | 0 MB |
| semantic | FTS5 + cosine similarity (hybrid) | MiniLM-L6-v2 embeddings (384-dim), HNSW index, 14 tools | ~256 MB |
| smart | Hybrid + LLM query expansion | + nomic-embed-text (768-dim) + Gemma 4 E2B via Ollama: memory_expand_query, memory_auto_tag, memory_detect_contradiction, 17 tools | ~1 GB |
| autonomous | Hybrid + LLM expansion + cross-encoder reranking | + Gemma 4 E4B via Ollama, neural cross-encoder (ms-marco-MiniLM), memory reflection, 17 tools | ~4 GB |

Capability Matrix

Every capability mapped to its minimum tier. Each tier includes all capabilities from the tiers below it.

| Capability | keyword | semantic | smart | autonomous |
|---|---|---|---|---|
| Search & Recall | | | | |
| FTS5 keyword search | Yes | Yes | Yes | Yes |
| Semantic embedding (cosine similarity) | -- | Yes | Yes | Yes |
| Hybrid recall (FTS5 + cosine, 60/40 semantic/keyword blend) | -- | Yes | Yes | Yes |
| HNSW nearest-neighbor index | -- | Yes | Yes | Yes |
| LLM query expansion (memory_expand_query) | -- | -- | Yes | Yes |
| Neural cross-encoder reranking | -- | -- | -- | Yes |
| Memory Management | | | | |
| Store, update, delete, promote, link | Yes | Yes | Yes | Yes |
| Manual consolidation | Yes | Yes | Yes | Yes |
| Auto-consolidation (LLM summary) | -- | -- | Yes | Yes |
| Auto-tagging (memory_auto_tag) | -- | -- | Yes | Yes |
| Contradiction detection (memory_detect_contradiction) | -- | -- | Yes | Yes |
| Autonomous memory reflection | -- | -- | -- | Yes |
| Models | | | | |
| Embedding model | -- | MiniLM-L6-v2 (384d) | nomic-embed-text (768d) | nomic-embed-text (768d) |
| LLM | -- | -- | gemma4:e2b (~7.2GB) | gemma4:e4b (~9.6GB) |
| Resources | | | | |
| RAM | 0 MB | ~256 MB | ~1 GB | ~4 GB |
| External dependencies | None | None | Ollama | Ollama |
| MCP tools exposed | 13 | 14 | 17 | 17 |

Semantic tier (default) bundles the Candle ML framework and downloads the all-MiniLM-L6-v2 model on first run (~90 MB). Smart and autonomous tiers require Ollama running locally.

Tiers gate features, not models. The --tier flag controls which tools are exposed. The LLM model is independently configurable via llm_model in ~/.config/ai-memory/config.toml. For example, run autonomous tier (all 17 tools + reranker) with the faster e2b model:

# ~/.config/ai-memory/config.toml
tier = "autonomous"        # all features enabled
llm_model = "gemma4:e2b"   # faster model (46 tok/s vs 26 tok/s for e4b)

The --tier flag must be passed in the MCP args -- the config.toml tier setting is not used when the server is launched by an AI client.

# Keyword -- FTS5 only, no ML dependencies
ai-memory mcp --tier keyword

# Semantic -- hybrid recall with embeddings
ai-memory mcp --tier semantic

# Smart -- adds LLM-powered query expansion, auto-tagging, contradiction detection
ai-memory mcp --tier smart

# Autonomous -- adds cross-encoder reranking
ai-memory mcp --tier autonomous

The memory_capabilities tool reports the active tier, loaded models, and available capabilities at runtime.


MCP Tools

These 17 tools are available to any MCP-compatible AI when configured as an MCP server:

| Tool | Description |
|---|---|
| memory_store | Store a new memory (deduplicates by title+namespace, reports contradictions) |
| memory_recall | Recall memories relevant to a context (fuzzy OR search, ranked by 6 factors) |
| memory_search | Search memories by exact keyword match (AND semantics) |
| memory_list | List memories with optional filters (namespace, tier, tags, date range) |
| memory_get | Get a specific memory by ID with its links |
| memory_update | Update an existing memory by ID (partial update) |
| memory_delete | Delete a memory by ID |
| memory_promote | Promote a memory to long-term (permanent, clears expiry) |
| memory_forget | Bulk delete by pattern, namespace, or tier |
| memory_link | Create a typed link between two memories |
| memory_get_links | Get all links for a memory |
| memory_consolidate | Merge multiple memories into one long-term summary |
| memory_stats | Get memory store statistics |
| memory_capabilities | Report active feature tier, loaded models, and available capabilities |
| memory_expand_query | Use LLM to expand search query into related terms (smart+ tier) |
| memory_auto_tag | Use LLM to auto-generate tags for a memory (smart+ tier) |
| memory_detect_contradiction | Use LLM to check if two memories contradict (smart+ tier) |

HTTP API

20 endpoints on 127.0.0.1:9077. Start with ai-memory serve.

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/health | Health check (verifies DB + FTS5 integrity) |
| GET | /api/v1/memories | List memories (supports namespace, tier, tags, since, until, limit) |
| POST | /api/v1/memories | Create a memory |
| POST | /api/v1/memories/bulk | Bulk create memories (with limits) |
| GET | /api/v1/memories/{id} | Get a memory by ID |
| PUT | /api/v1/memories/{id} | Update a memory by ID |
| DELETE | /api/v1/memories/{id} | Delete a memory by ID |
| POST | /api/v1/memories/{id}/promote | Promote a memory to long-term |
| GET | /api/v1/search | AND keyword search |
| GET | /api/v1/recall | Recall by context (GET with query params) |
| POST | /api/v1/recall | Recall by context (POST with JSON body) |
| POST | /api/v1/forget | Bulk delete by pattern/namespace/tier |
| POST | /api/v1/consolidate | Consolidate memories into one |
| POST | /api/v1/links | Create a link between memories |
| GET | /api/v1/links/{id} | Get links for a memory |
| GET | /api/v1/namespaces | List all namespaces |
| GET | /api/v1/stats | Memory store statistics |
| POST | /api/v1/gc | Trigger garbage collection |
| GET | /api/v1/export | Export all memories + links as JSON |
| POST | /api/v1/import | Import memories + links from JSON |
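A create request body mirrors the store flags. The sketch below builds one (field names are inferred from the CLI flags --title, --content, --tier, --source, and friends; they are assumptions about the exact JSON schema, not taken from the API reference):

```python
import json

# Illustrative POST /api/v1/memories body -- field names are assumptions
# inferred from the CLI flags and may not match the server's exact schema.
payload = {
    "title": "Architecture decision",
    "content": "We use PostgreSQL",
    "tier": "long",
    "tags": "architecture,database",
    "source": "api",
    "priority": 7,
    "confidence": 0.9,
}

body = json.dumps(payload)
print(body)
```

With the daemon running (ai-memory serve), a body like this would be POSTed to http://127.0.0.1:9077/api/v1/memories.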

CLI Commands

25 commands. Run ai-memory <command> --help for details on any command.

| Command | Description |
|---|---|
| mcp | Run as MCP tool server over stdio (primary integration path) |
| serve | Start the HTTP daemon on port 9077 |
| store | Store a new memory (deduplicates by title+namespace) |
| update | Update an existing memory by ID |
| recall | Fuzzy OR search with ranked results + auto-touch (supports --tier for hybrid recall). Max 200 items per request. |
| search | AND search for precise keyword matches. Max 200 items per request. |
| get | Retrieve a single memory by ID (includes links) |
| list | Browse memories with filters (namespace, tier, tags, date range). Max 200 items per request. |
| delete | Delete a memory by ID |
| promote | Promote a memory to long-term (clears expiry) |
| forget | Bulk delete by pattern + namespace + tier |
| link | Link two memories (related_to, supersedes, contradicts, derived_from) |
| consolidate | Merge multiple memories into one long-term summary |
| resolve | Resolve a contradiction: mark winner, demote loser |
| shell | Interactive REPL with color output |
| sync | Sync memories between two database files (pull/push/merge) |
| auto-consolidate | Group memories by namespace+tag, merge groups above threshold |
| gc | Run garbage collection on expired memories |
| stats | Overview of memory state (counts, tiers, namespaces, links, DB size) |
| namespaces | List all namespaces with memory counts |
| export | Export all memories and links as JSON |
| import | Import memories and links from JSON (stdin) |
| completions | Generate shell completions (bash, zsh, fish) |
| man | Generate roff man page to stdout |
| mine | Import memories from historical conversations (Claude, ChatGPT, Slack exports) |

The top-level ai-memory binary also accepts global flags:

| Flag | Description |
|---|---|
| --db <path> | Database path (default: ai-memory.db, or $AI_MEMORY_DB) |
| --json | JSON output on all commands (machine-parseable output) |

The store subcommand accepts additional flags:

| Flag | Description |
|---|---|
| --source / -S | Who created this memory (user, claude, hook, api, cli, import, consolidation, system). Default: cli |
| --expires-at | RFC3339 expiry timestamp |
| --ttl-secs | TTL in seconds (alternative to --expires-at) |

The mcp subcommand accepts an additional flag:

FlagDescription
--tier <keyword|semantic|smart|autonomous>Feature tier (default: semantic). See Feature Tiers.

Recall Scoring

Every recall query ranks memories by 6 factors:

score = (fts_relevance * -1)
      + (priority * 0.5)
      + (MIN(access_count, 50) * 0.1)
      + (confidence * 2.0)
      + tier_boost
      + recency_decay

| Factor | Weight | Notes |
|---|---|---|
| FTS relevance | -1.0x | SQLite FTS5 rank (negative = better match) |
| Priority | 0.5x | User-assigned 1-10 scale |
| Access count | 0.1x | How often recalled (capped at 50 for scoring) |
| Confidence | 2.0x | 0.0-1.0 certainty score |
| Tier boost | +3.0 / +1.0 / +0.0 | long / mid / short |
| Recency decay | 1/(1 + days*0.1) | Recent memories rank higher |
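A direct transcription of the formula, with a worked example (a sketch; tier boosts and the decay term follow the definitions above):

```python
def recall_score(fts_relevance: float, priority: int, access_count: int,
                 confidence: float, tier: str, age_days: float) -> float:
    """6-factor recall score. fts_relevance is the raw SQLite FTS5 rank,
    which is negative for better matches, hence the * -1."""
    tier_boost = {"long": 3.0, "mid": 1.0, "short": 0.0}[tier]
    recency_decay = 1.0 / (1.0 + age_days * 0.1)
    return (fts_relevance * -1
            + priority * 0.5
            + min(access_count, 50) * 0.1
            + confidence * 2.0
            + tier_boost
            + recency_decay)

# A well-matched, frequently recalled long-term memory from 2 days ago:
# 2.0 + 2.5 + 1.2 + 1.8 + 3.0 + 0.83 = 11.33
print(round(recall_score(-2.0, 5, 12, 0.9, "long", 2.0), 2))  # 11.33
```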

Memory Tiers

| Tier | TTL | Use Case | Examples |
|---|---|---|---|
| short | 6 hours | Throwaway context | Current debugging state, temp variables, error traces |
| mid | 7 days | Working knowledge | Sprint goals, recent decisions, current branch purpose |
| long | Permanent | Hard-won knowledge | Architecture, user preferences, corrections, conventions |

Automatic Behaviors

  • TTL extension on recall: short memories get +1 hour, mid memories get +1 day
  • Auto-promotion: mid-tier memories accessed 5+ times promote to long (expiry cleared)
  • Priority reinforcement: every 10 accesses, priority increases by 1 (capped at 10)
  • Contradiction detection: warns when a new memory conflicts with an existing one in the same namespace
  • Deduplication: upsert on title+namespace; tier never downgrades on update

Security

ai-memory includes hardening across all input paths:

  • Transaction safety -- all multi-step database operations use transactions; no partial writes on failure
  • FTS injection prevention -- user input is sanitized before reaching FTS5 queries; special characters are escaped
  • Error sanitization -- internal database paths and system details are stripped from error responses; clients see structured error types (NOT_FOUND, VALIDATION_FAILED, DATABASE_ERROR, CONFLICT)
  • Body size limits -- HTTP request bodies are capped at 50 MB via Axum's DefaultBodyLimit
  • Bulk operation limits -- bulk create endpoints enforce maximum batch sizes to prevent resource exhaustion
  • CORS -- permissive CORS layer enabled for localhost development workflows
  • Input validation -- every write path validates title length, content length, namespace format, source values, priority range (1-10), confidence range (0.0-1.0), tag format, tier values, relation types, and ID format
  • Link validation in sync -- all links are validated (both IDs, relation type, no self-links) before import during sync operations
  • Thread-safe color -- terminal color detection uses AtomicBool for safe concurrent access
  • Local-only HTTP -- the HTTP server binds to 127.0.0.1 by default; not exposed to the network
  • WAL mode -- SQLite Write-Ahead Logging for safe concurrent reads during writes

Documentation

| Guide | Audience |
|---|---|
| Installation Guide | Getting it running (includes MCP setup for multiple AI platforms) |
| User Guide | AI assistant users who want persistent memory |
| Developer Guide | Building on or contributing to ai-memory |
| Admin Guide | Deploying, monitoring, and troubleshooting |
| GitHub Pages | Visual overview with animated diagrams |

License

Copyright (c) 2026 AlphaOne LLC. All rights reserved.

Licensed under the MIT License.

THIS SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
