OmniMem

A self-hosted MCP server that gives AI Agents persistent memory across sessions, projects, and machines.

Self-hosted semantic memory for Claude Code. Persistent sessions, experience scoring, and a graveyard for dead ends, backed by Valkey vector search and exposed as an MCP server. https://omnimem.org


Stop living the same session twice.

Every Claude Code session starts from zero. No memory of your project. No memory of what failed last week. No memory that you spent three hours last Tuesday discovering why onnxruntime explodes on Alpine before finding something that actually works.

So you explain the project again. Claude suggests the same broken library again. Same alarm. Same song. You are Bill Murray and Claude is Punxsutawney.

OmniMem fixes that. It is a self-hosted MCP server that gives Claude Code persistent memory across sessions, projects, and machines. It runs on your own hardware and it is free forever.


What it does

OmniMem gives Claude Code three things it currently lacks.

Episodic memory is the decisions you made, the bugs you fixed, the patterns you discovered. The things that took real effort to learn and should not have to be re-learned every morning.

Project context is your stack, goals, and current state. Claude arrives at every session already briefed rather than starting cold.

Passive knowledge comes from RSS feeds you configure. They get fetched on a schedule, summarised by Claude Haiku, embedded, and stored. When you are working on a Rust problem and a relevant article was ingested last week, it surfaces as a starting point worth reading.

All three namespaces are searched together at recall time. The top result might be a decision from six months ago on a different project, a solution from yesterday, or an article that landed in your knowledge base on Tuesday night. It does not matter where it came from as long as it is useful.


The bits no other memory system has

Memory is not binary

Most systems remember or forget. OmniMem has a lifecycle:

```
ACTIVE  ->  DEPRIORITISED  ->  ARCHIVED  ->  DELETED
 1.0x         0.2x             0.0x        gone
```

When you say "forget about X" you do not usually mean destroy it. You mean stop surfacing it. OmniMem deprioritises rather than deletes, applying a surface score multiplier at recall time. If something becomes relevant again later it can earn its way back.

You can also suppress entire topics. Calling suppress_topic("pisource.org") means nothing touching that topic surfaces in any recall, across any session, until you lift it.
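Suppression can be pictured as a post-search filter over recall results. A minimal sketch, assuming a simple substring match over content and tags — the real matching logic is OmniMem's own:

```python
# Illustrative sketch of topic suppression as a recall-time filter.
# The result shape and matching rule are assumptions, not OmniMem's API.

def filter_suppressed(results, suppressed_topics):
    """Drop any result whose content or tags mention a suppressed topic."""
    kept = []
    for r in results:
        haystack = (r["content"] + " " + " ".join(r.get("tags", []))).lower()
        if not any(topic.lower() in haystack for topic in suppressed_topics):
            kept.append(r)
    return kept

results = [
    {"content": "Migrating the pisource.org scraper", "tags": ["scraper"]},
    {"content": "Valkey connection pooling notes", "tags": ["valkey"]},
]
filter_suppressed(results, ["pisource.org"])  # only the Valkey memory survives
```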

The Graveyard

OmniMem tracks not just what worked but what did not and why.

Every abandoned approach gets logged with its name, type, and reason for failure. Before Claude suggests a library or architectural pattern the graveyard is checked first. If you tried something before and gave up on it, that warning surfaces at the top of results before anything else does.

```
WARNING: previously abandoned approaches match this query

  onnxruntime       library     SIGILL crash on Alpine musl libc        effort: 4/5
  FLAT index        approach    too slow above 10k vectors              effort: 3/5
  openai embeddings service     API cost and latency were prohibitive   effort: 2/5
```

Dead ends do not get a second chance to waste your afternoon.

Experience scoring

Not all successful memories are equal. Something that worked first time is useful. Something that took four attempts, two abandoned libraries, and a weird Alpine-specific workaround to crack is gold, and it should surface more readily.

OmniMem assigns an experience weight to every memory based on effort and outcome:

| Effort | Meaning | Recall weight |
|--------|---------|---------------|
| 1 | Worked first time | 1.0x |
| 2 | Minor friction | 1.1x |
| 3 | Multiple iterations | 1.25x |
| 4 | Significant struggle | 1.5x |
| 5 | Battle-hardened | 1.8x |

The recall score formula:

```
score = similarity x surface_score x recency x experience_weight
```

An effort-5 success is worth nearly twice as much in recall ranking as something that worked first time. Knowledge earns its rank.
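The formula and the effort table can be combined into a small illustrative function. The constants come from the tables in this README; the function itself is a sketch, not OmniMem's code:

```python
# Recall score = similarity x surface_score x recency x experience_weight.
# Constants taken from the effort table and lifecycle multipliers above.
EXPERIENCE_WEIGHTS = {1: 1.0, 2: 1.1, 3: 1.25, 4: 1.5, 5: 1.8}
SURFACE_SCORES = {"ACTIVE": 1.0, "DEPRIORITISED": 0.2, "ARCHIVED": 0.0}

def recall_score(similarity, state, recency, effort):
    return similarity * SURFACE_SCORES[state] * recency * EXPERIENCE_WEIGHTS[effort]

# At identical similarity and recency, a hard-won (effort 5) memory
# outranks a trivial (effort 1) one by 1.8x:
hard = recall_score(0.80, "ACTIVE", 1.0, 5)  # 1.44
easy = recall_score(0.80, "ACTIVE", 1.0, 1)  # 0.80
```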

Semantic deduplication

Over time memory systems accumulate near-identical entries. OmniMem catches this at two points.

At write time, remember() embeds the new content and checks for existing memories above a cosine similarity threshold (default 0.92, configurable via DEDUP_SIMILARITY_THRESHOLD). If a near-identical memory already exists it returns the duplicate instead of storing a redundant copy. Pass force=True when you genuinely want both versions.

For bulk cleanup, find_duplicates() scans an entire namespace, batch-embeds everything, computes pairwise similarity, and returns clusters of duplicates grouped by union-find. Point it at your episodic namespace once a month and archive the extras.
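The clustering step can be sketched with plain cosine similarity plus union-find, assuming embeddings are already computed. This is a sketch of the technique described, not the actual implementation:

```python
# Sketch: pairwise cosine similarity over embeddings, then union-find
# to group connected duplicates into clusters, as find_duplicates() does.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def duplicate_clusters(embeddings, threshold=0.92):
    n = len(embeddings)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Union any pair above the similarity threshold
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    # Only groups with more than one member are duplicate clusters
    return [c for c in clusters.values() if len(c) > 1]
```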

Contradiction detection

The graveyard warns you about things that failed. Contradiction detection warns you about things that disagree with each other.

When remember() stores a new memory it runs a fast heuristic check — finding semantically similar memories and scanning for negation pattern mismatches (e.g. one says "use X" while the other says "avoid X"). If a potential contradiction is detected it stores the memory but returns a warning so you can investigate.

For deeper analysis, check_contradictions() can optionally call Claude Haiku (Tier 2) to evaluate candidate pairs. Confirmed contradictions are cross-linked on both memories and flagged whenever either one surfaces in a recall().

```yaml
contradiction_warning:
  existing_key: mem:episodic:01ARZ3NDEK...
  existing_content: "Always use connection pooling for Valkey..."
  explanation: "These memories discuss the same topic but contain opposing language"
```
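The fast heuristic can be approximated as follows; the negation word list and function names here are assumptions for illustration:

```python
# Sketch of a negation-pattern mismatch check: two similar memories where
# exactly one uses negative language are flagged as candidate contradictions.
NEGATION_MARKERS = {"avoid", "never", "don't", "do not", "stop using"}

def has_negation(text):
    t = text.lower()
    return any(marker in t for marker in NEGATION_MARKERS)

def possible_contradiction(mem_a, mem_b):
    """True when one memory negates and the other does not."""
    return has_negation(mem_a) != has_negation(mem_b)

possible_contradiction(
    "Always use connection pooling for Valkey",
    "Avoid connection pooling for Valkey, it masks timeouts",
)  # -> True
```

This is deliberately cheap — no API call — which is why confirmed verdicts are left to the optional Tier 2 Haiku pass.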

Session briefing

Instead of making three separate calls at session start, a single briefing(project="myproject") returns everything Claude needs to get up to speed:

  • Project context — current state, stack, last update
  • Experience summary — effort stats, graveyard, breakthroughs
  • Stale memories — active memories not updated in 30+ days (configurable via STALE_MEMORY_DAYS)
  • New knowledge — RSS articles ingested in the last 7 days
  • Contradiction warnings — memories with unresolved contradictions
  • Reinstate candidates — deprioritised memories whose reinstate hints match current work
  • Suppressed topics — what is currently filtered out

One tool call, one response, full context.

Automatic maintenance

Memory systems accumulate duplicates and contradictions over time. OmniMem handles this automatically.

Every N briefing() calls per project (default 10, configurable via AUTO_MAINTENANCE_INTERVAL), the server runs a maintenance pass:

  1. Dedup scan — finds clusters of near-identical episodic memories and archives the oldest in each cluster, keeping the newest
  2. Contradiction scan — runs the heuristic negation-pattern check across all active project memories and flags opposing pairs

The results appear in the briefing response under auto_maintenance so you know what was cleaned up. Set AUTO_MAINTENANCE_INTERVAL=0 to disable. Manual find_duplicates() and check_contradictions() calls still work as before.
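The trigger amounts to a per-project counter checked on every briefing; a sketch with assumed names:

```python
# Illustrative trigger for auto-maintenance: maintenance fires every Nth
# briefing() per project. Counter storage and names are assumptions.
briefing_counts = {}

def should_run_maintenance(project, interval=10):
    if interval == 0:  # AUTO_MAINTENANCE_INTERVAL=0 disables it
        return False
    briefing_counts[project] = briefing_counts.get(project, 0) + 1
    return briefing_counts[project] % interval == 0
```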


Self-hosted, open source, yours

No SaaS. No vendor lock-in. No context shipped to someone else's servers.

  • Valkey is an open source Redis fork. All your data stays in a named Docker volume on your own machine.
  • Multi-arch Docker images for amd64 and arm64. It runs on a Raspberry Pi, AWS Graviton, or Apple Silicon just as well as x86.
  • sentence-transformers runs embeddings locally with no API calls.
  • MIT licensed: fork it, extend it, run it wherever you want.
  • One backup command calls dump_to_file() and exports everything to a JSON file you own.

Expose the MCP port through Traefik and every machine you work from shares the same memory. One deployment, everywhere.


Quick start

```shell
git clone https://codeberg.org/ric_harvey/omnimem.git && cd omnimem
cp .env.example .env
# Set VALKEY_PASSWORD and ANTHROPIC_API_KEY in .env
docker compose up -d
```

Edit the .env file to set at least VALKEY_PASSWORD to a secure value. You can also set ANTHROPIC_API_KEY if you want AI-powered RSS article summaries and richer contradiction detection. If you leave ANTHROPIC_API_KEY unset (or blank), OmniMem still works — the RSS worker will fall back to simple truncation for summaries, and contradiction checks will use embedding similarity only.

Four containers start: Valkey with vector search, the OmniMem MCP server, the RSS worker, and the web UI. The MCP server listens on port 8765 by default and the web UI on port 8080.

Open http://localhost:8080 in a browser to access the management dashboard — browse memories, run semantic searches, manage projects, track experience, and handle backups without needing to use MCP tool calls.

Connect your coding agent to OmniMem. The example below is for Claude Code — see the full guides for other tools:

| Agent | Guide | Transport |
|-------|-------|-----------|
| Claude Code | guides/claude-code.md | Native SSE |
| GitHub Copilot | guides/github-copilot.md | Native SSE |
| GitLab Duo | guides/gitlab-duo.md | Native SSE |
| Cursor | guides/cursor.md | SSE (known quirks) |
| AWS Kiro | guides/kiro.md | Native SSE |
| OpenCode | guides/opencode.md | Native SSE |
| OpenAI Codex CLI | guides/codex.md | Needs supergateway bridge |

Claude Code (~/.claude.json):

```json
{
  "mcpServers": {
    "omnimem": {
      "type": "sse",
      "url": "http://localhost:8765/sse"
    }
  }
}
```

If you set MCP_AUTH_TOKEN in your .env, add the token to the config:

```json
{
  "mcpServers": {
    "omnimem": {
      "type": "sse",
      "url": "http://localhost:8765/sse",
      "headers": {
        "Authorization": "Bearer your-token-here"
      }
    }
  }
}
```

To stop Claude Code asking for permission every time it calls an OmniMem tool, add a wildcard allow rule to your global settings (~/.claude/settings.json):

```json
{
  "permissions": {
    "allow": [
      "mcp__omnimem__*"
    ]
  }
}
```

This allows all OmniMem MCP tools (remember, recall, briefing, etc.) to run without prompts across every project. If you already have other entries in the allow array, just add "mcp__omnimem__*" to it.

That is it. The server automatically delivers its usage guide to any connecting agent via the MCP protocol's instructions field. Claude Code will load project context at session start, check the graveyard before suggesting approaches, and store what it learns as you go — no manual configuration file needed.

If you want to customise the instructions or use OmniMem with a setup that does not support MCP instructions, a copy of the guide lives at claude_config/CLAUDE.md for manual use.


MCP tools

Core memory

| Tool | What it does |
|------|--------------|
| `remember(content, project?, tags?, force?)` | Store a memory (auto-checks for duplicates and contradictions) |
| `recall(query, top_k?, project_filter?)` | Semantic search across all namespaces |
| `deprioritise(key_or_query, reason, reinstate_hints?)` | Soft-suppress without deleting |
| `archive(key_or_query)` | Remove from recall but keep for history |
| `reinstate(key_or_query)` | Bring a deprioritised memory back |
| `forget(key_or_query, confirm=True)` | Hard delete, requires explicit confirmation |
| `suppress_topic(topic)` | Filter a topic from all future recalls |
| `unsuppress_topic(topic)` | Remove a topic from the suppression list |
| `list_suppressions()` | Show all currently suppressed topics |
| `find_duplicates(namespace?, threshold?, project_filter?)` | Scan for clusters of near-identical memories |
| `check_contradictions(query?, namespace?, use_api?)` | Detect memories that contradict each other |
| `briefing(project?, include_knowledge?)` | Single-call session start with full context |

Project context

| Tool | What it does |
|------|--------------|
| `set_project_context(name, description, stack, goals, current_state)` | Create or update project memory |
| `get_project_context(name)` | Retrieve it, called at every session start |
| `update_project_state(name, current_state, notes?)` | Update state without re-embedding |
| `compile_project_context(name, auto_save?)` | Auto-produce or refresh a project context from its episodic memories, tags, experience data, and abandoned approaches |
| `list_projects()` | See all stored projects |

Experience

| Tool | What it does |
|------|--------------|
| `record_experience(key, effort_score, outcome, abandoned_approaches?, breakthrough?, gotchas?)` | Log how hard it was and what failed |
| `log_abandoned(key, name, type, reason)` | Add dead ends incrementally mid-session |
| `warn_if_abandoned(query)` | Check the graveyard before proceeding |
| `experience_summary(project?)` | Graveyard, breakthroughs, and effort stats |
| `get_experience(key)` | Full experience data for one memory |

Audit and backup

| Tool | What it does |
|------|--------------|
| `memory_audit(project?, namespace?)` | All memories by state with metadata |
| `explain_memory(key)` | Full history for a single memory |
| `why_did_you_mention(query)` | Debug why something surfaced |
| `dump_to_file(filename?)` | Export everything to a timestamped JSON file |
| `restore_from_file(filename, dry_run?)` | Restore from backup, merges rather than overwrites |
| `list_backups()` | See available backup files |
| `health()` | Server, Valkey, index, and model status |
| `version()` | Return the current OmniMem version |

Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `VALKEY_PASSWORD` | `changeme` | Please change this |
| `ANTHROPIC_API_KEY` | (unset) | Optional. Enables RSS summarisation and Tier 2 contradiction checks via Claude Haiku; without it, summaries fall back to truncation |
| `MCP_AUTH_TOKEN` | (unset) | Set to enable bearer token auth on the MCP SSE endpoint. When unset, no auth is required |
| `WEB_UI_AUTH_TOKEN` | (unset) | Set to enable bearer token auth on the web dashboard. `/metrics` and static assets are exempt |
| `MCP_PORT` | `8765` | Port the MCP server listens on |
| `MCP_HOST` | `127.0.0.1` | Bind address for the MCP server (set to `0.0.0.0` inside Docker) |
| `VALKEY_MAX_CONNECTIONS` | `20` | Valkey connection pool size |
| `EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Local sentence-transformers model |
| `RSS_SCHEDULE_HOURS` | `6` | How often feeds are ingested |
| `RSS_MAX_ARTICLES_PER_FEED` | `20` | Articles per feed per cycle |
| `MEMORY_RECALL_TOP_K` | `5` | Default number of recall results |
| `DEPRIORITISED_WEIGHT` | `0.2` | Surface score for deprioritised memories |
| `RECENCY_DECAY_DAYS` | `90` | Days before the age penalty kicks in |
| `DEDUP_SIMILARITY_THRESHOLD` | `0.92` | Cosine similarity threshold for duplicate detection on `remember()` |
| `CONTRADICTION_SIMILARITY_THRESHOLD` | `0.7` | Similarity threshold for contradiction candidate search |
| `STALE_MEMORY_DAYS` | `30` | Days without update before a memory is flagged as stale in `briefing()` |
| `AUTO_MAINTENANCE_INTERVAL` | `10` | Number of `briefing()` calls per project before auto-maintenance runs (`0` to disable) |
| `TELEMETRY_COLD_DAYS` | `60` | Days without recall before a memory is flagged as "gone cold" on the telemetry dashboard |
| `WEB_PORT` | `8080` | Port the web UI listens on |
| `BACKUP_DIR` | `/app/backups` | Where backup files are written (shared between MCP server and web UI) |

RSS configuration

Edit rss_worker/feeds.yml to choose which feeds get ingested:

```yaml
feeds:
  - url: https://blog.rust-lang.org/feed.xml
    name: Rust Official Blog
    topics: [rust, systems, language]

  - url: https://this-week-in-rust.org/rss.xml
    name: This Week in Rust
    topics: [rust, community, crates]

  - url: https://blog.n8n.io/rss/
    name: n8n Blog
    topics: [automation, workflow, n8n]
```

Each article gets fetched, stripped of HTML, summarised to a couple of sentences by Claude Haiku, embedded, and stored in the knowledge namespace. Duplicates are skipped by URL. The worker runs once on startup and then on whatever schedule you set in RSS_SCHEDULE_HOURS.
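The truncation fallback used when `ANTHROPIC_API_KEY` is unset can be sketched as follows; the regexes and sentence splitting are illustrative, not the worker's exact code:

```python
# Sketch of the no-API-key fallback path: strip HTML tags, then
# truncate to the first couple of sentences. The real worker uses
# feedparser for fetching and Claude Haiku for proper summaries.
import re

def fallback_summary(html, max_sentences=2):
    text = re.sub(r"<[^>]+>", " ", html)      # drop tags
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(sentences[:max_sentences])

fallback_summary("<p>First. Second. Third.</p>")  # -> "First. Second."
```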


Memory lifecycle

```
ACTIVE (1.0x)  ->  DEPRIORITISED (0.2x)  ->  ARCHIVED (0.0x)
     |                    |                        |
     +--------------------+------------------------+
                          |
                       DELETED
```

Use deprioritise when something should stop surfacing but might be needed again someday. Add reinstate_hints to describe what should bring it back. If a future query strongly matches a hint, the memory resurfaces with a note explaining why it was deprioritised in the first place.

Use archive for content that is definitely outdated but has historical value worth keeping.

Use forget only when you want something permanently gone. It requires confirm=True so nothing disappears by accident.

One thing worth knowing: if you deprioritise a memory with effort_score >= 4 the system will flag it before letting you proceed. It is not blocking you, just making sure you meant to soft-suppress something that was genuinely hard to figure out.


Using it from multiple machines

Expose MCP_PORT through your reverse proxy. Traefik example:

```yaml
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.omnimem.rule=Host(`omnimem.yourdomain.com`)"
  - "traefik.http.routers.omnimem.tls.certresolver=letsencrypt"
  - "traefik.http.services.omnimem.loadbalancer.server.port=8765"
```

Update the MCP config URL to https://omnimem.yourdomain.com/sse and every machine you work from shares the same memory, the same graveyard, and the same project context. See the connection guides for how to configure each coding agent.

You can expose the web UI the same way — add a route for WEB_PORT with basic auth middleware. See docs/reverse-proxy.md for Traefik and Caddy examples.

Security checklist: strong VALKEY_PASSWORD, set MCP_AUTH_TOKEN and WEB_UI_AUTH_TOKEN in your .env, TLS on the proxy if exposing publicly, and keep the Valkey port off the public internet.


Web UI

OmniMem includes a browser-based management interface at http://localhost:8080. It connects directly to Valkey and does not depend on the MCP server running.

| Page | What it does |
|------|--------------|
| Dashboard | Namespace counts, state breakdowns, health indicators, recent activity |
| Memories | Browse all memories with namespace, state, and project filters. Paginated, htmx-powered |
| Search | Semantic search using the full recall pipeline. Abandoned warnings highlighted |
| Detail | Full memory content, metadata, tags, experience data, contradictions. Lifecycle action buttons |
| Create | Store a new memory with duplicate detection shown inline |
| Projects | List, view, edit, and create project contexts |
| Experience | Summary dashboard with effort stats, breakthroughs, and the abandoned approach graveyard |
| Duplicates | Scan a namespace for near-identical memory clusters. Archive extras directly |
| Contradictions | Side-by-side comparison of contradicting memories with resolve actions |
| Suppressions | Add and remove suppressed topics inline |
| Telemetry | Recall counters, most recalled, gone cold, never recalled. Filter by project |
| Backups | Create backups, preview restore contents, and confirm restore |

Prometheus metrics

The web UI exposes a /metrics endpoint in Prometheus text format. Point your Grafana or Prometheus scraper at http://localhost:8080/metrics with a 15-60 second scrape interval.

Available gauges:

| Metric | Labels | Description |
|--------|--------|-------------|
| `omnimem_memories_total` | `namespace`, `state` | Total memories by namespace and lifecycle state |
| `omnimem_memories_never_recalled` | `namespace` | Active memories with zero recalls |
| `omnimem_recalls_total` | (none) | Sum of all recall counts across all memories |
| `omnimem_memories_gone_cold` | (none) | Memories recalled before but not within the cold threshold |

The endpoint scans Valkey on each scrape, which is fine for typical intervals.
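Producing the Prometheus text exposition format by hand is straightforward; a sketch for the first gauge, with made-up label values:

```python
# Render omnimem_memories_total in Prometheus text exposition format.
# Metric name matches the table above; the counts are illustrative.
def render_metrics(counts):
    lines = ["# TYPE omnimem_memories_total gauge"]
    for (namespace, state), value in sorted(counts.items()):
        lines.append(
            f'omnimem_memories_total{{namespace="{namespace}",state="{state}"}} {value}'
        )
    return "\n".join(lines) + "\n"

print(render_metrics({("episodic", "ACTIVE"): 42, ("knowledge", "ARCHIVED"): 3}))
```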

The web UI supports optional bearer token authentication via the WEB_UI_AUTH_TOKEN environment variable. The /metrics endpoint is exempt so Prometheus can scrape without credentials. For additional security options (TLS, IP allowlisting, SSO), see docs/reverse-proxy.md.


Architecture

```
  Claude Code (any machine)           Browser
         |                               |
         |  SSE / MCP                    |  HTTP :8080
         v                               v
  +-------------------------+   +-------------------------+
  |   OmniMem MCP Server    |   |    OmniMem Web UI       |
  |   Python  fastmcp       |   |    Starlette  htmx      |
  |                         |   |    Jinja2 templates     |
  |  remember  recall       |   |                         |
  |  deprioritise  archive  |   |  Dashboard  Search      |
  |  record_experience      |   |  Browse  Create         |
  |  warn_if_abandoned      |   |  Projects  Experience   |
  |  briefing  health       |   |  Duplicates  Backups    |
  +-----------+-------------+   +-----------+-------------+
              |                             |
              +-------------+---------------+
                            |
              +-------------+-------------+
              |                           |
              v                           v
      +---------------+         +------------------+
      |    Valkey     |         |   RSS Worker     |
      |  + search     |  <---   |                  |
      |               |         |  feedparser      |
      | idx:episodic  |         |  APScheduler     |
      | idx:project   |         |  Claude Haiku    |
      | idx:knowledge |         +------------------+
      +---------------+
```

Both the MCP server and the web UI connect directly to Valkey and share the mcp_server/memory/ package.

Recall pipeline:

```
query
  -> abandoned fast-path (keyword scan, no embedding needed)
  -> embed query
  -> vector search, top 20 candidates per namespace
  -> filter archived and deleted
  -> filter suppressed topics
  -> apply surface_score (lifecycle state multiplier)
  -> apply recency decay (age penalty after 90 days)
  -> apply experience_weight (effort x outcome multiplier)
  -> check reinstate eligibility
  -> surface contradiction warnings
  -> merge, re-rank, return top_k
  -> log recall event + increment per-memory recall counters
```
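The recency-decay step can be sketched as below. Only the 90-day threshold (`RECENCY_DECAY_DAYS`) comes from this README; the linear-decay shape and the 0.5 floor are assumptions for illustration:

```python
# Illustrative recency multiplier: no penalty inside the decay window,
# then a linear decay toward an assumed floor over a second window.
def recency_multiplier(age_days, decay_days=90, floor=0.5):
    if age_days <= decay_days:
        return 1.0
    overshoot = min(age_days - decay_days, decay_days)
    return 1.0 - (1.0 - floor) * overshoot / decay_days

recency_multiplier(30)   # fresh memory, no penalty -> 1.0
recency_multiplier(135)  # halfway through decay -> 0.75
```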


Contributing

Issues and PRs are welcome. OmniMem is designed to be extended and the scoring pipeline is structured so new multipliers can be added without touching the core. New MCP tools, additional namespace types, and alternative embedding backends are all reasonable directions.


Licence

MIT. Free to use, fork, and modify. No enterprise tier, no hosted version, no strings.


Built by Ric Harvey @ SquareCows Ltd, an AI and automation consultancy for people who would rather own their tools.
