OmniMem
A self-hosted MCP server that gives AI Agents persistent memory across sessions, projects, and machines.
Self-hosted semantic memory for Claude Code. Persistent sessions, experience scoring, and a graveyard for dead ends, backed by Valkey vector search and exposed as an MCP server. https://omnimem.org
Stop living the same session twice.
Every Claude Code session starts from zero. No memory of your project. No memory of what failed last week. No memory that you spent three hours last Tuesday discovering why onnxruntime explodes on Alpine before finding something that actually works.
So you explain the project again. Claude suggests the same broken library again. Same alarm. Same song. You are Bill Murray and Claude is Punxsutawney.
OmniMem fixes that. It is a self-hosted MCP server that gives Claude Code persistent memory across sessions, projects, and machines. It runs on your own hardware and it is free forever.
What it does
OmniMem gives Claude Code three things it currently lacks.
Episodic memory is the decisions you made, the bugs you fixed, the patterns you discovered. The things that took real effort to learn and should not have to be re-learned every morning.
Project context is your stack, goals, and current state. Claude arrives at every session already briefed rather than starting cold.
Passive knowledge comes from RSS feeds you configure. They get fetched on a schedule, summarised by Claude Haiku, embedded, and stored. When you are working on a Rust problem and a relevant article was ingested last week, it surfaces as a starting point worth reading.
All three namespaces are searched together at recall time. The top result might be a decision from six months ago on a different project, a solution from yesterday, or an article that landed in your knowledge base on Tuesday night. It does not matter where it came from as long as it is useful.
The bits no other memory system has
Memory is not binary
Most systems remember or forget. OmniMem has a lifecycle:
```
ACTIVE  ->  DEPRIORITISED  ->  ARCHIVED  ->  DELETED
 1.0x         0.2x              0.0x         gone
```
When you say "forget about X" you do not usually mean destroy it. You mean stop surfacing it. OmniMem deprioritises rather than deletes, applying a surface score multiplier at recall time. If something becomes relevant again later it can earn its way back.
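As a rough sketch (the state names match the lifecycle above, but the function and constant names are illustrative, not OmniMem's actual internals), the lifecycle amounts to a state-to-multiplier mapping applied at recall time:

```python
# Hypothetical sketch of the lifecycle multiplier, not OmniMem's real code.
SURFACE_SCORES = {
    "ACTIVE": 1.0,          # full weight
    "DEPRIORITISED": 0.2,   # still stored, rarely surfaces
    "ARCHIVED": 0.0,        # kept for history, never surfaces
}

def surfaced_score(similarity: float, state: str) -> float:
    """Scale a raw similarity score by the memory's lifecycle state."""
    return similarity * SURFACE_SCORES.get(state, 0.0)
```

A deprioritised memory still competes in ranking, just at a fifth of the weight, which is what lets it "earn its way back" when a query matches it strongly.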
You can also suppress entire topics. Calling suppress_topic("pisource.org") means nothing touching that topic surfaces in any recall, across any session, until you lift it.
The Graveyard
OmniMem tracks not just what worked but what did not and why.
Every abandoned approach gets logged with its name, type, and reason for failure. Before Claude suggests a library or architectural pattern the graveyard is checked first. If you tried something before and gave up on it, that warning surfaces at the top of results before anything else does.
```
WARNING: previously abandoned approaches match this query

  onnxruntime         library    SIGILL crash on Alpine musl libc        effort: 4/5
  FLAT index          approach   too slow above 10k vectors              effort: 3/5
  openai embeddings   service    API cost and latency were prohibitive   effort: 2/5
```
Dead ends do not get a second chance to waste your afternoon.
Experience scoring
Not all successful memories are equal. Something that worked first time is useful. Something that took four attempts, two abandoned libraries, and a weird Alpine-specific workaround to crack is gold, and it should surface more readily.
OmniMem assigns an experience weight to every memory based on effort and outcome:
| Effort | Meaning | Recall weight |
|---|---|---|
| 1 | Worked first time | 1.0x |
| 2 | Minor friction | 1.1x |
| 3 | Multiple iterations | 1.25x |
| 4 | Significant struggle | 1.5x |
| 5 | Battle-hardened | 1.8x |
The recall score formula:
score = similarity x surface_score x recency x experience_weight
A score-5 success is worth nearly twice as much in recall ranking as something trivial. Knowledge earns its rank.
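A minimal sketch of how the four factors combine (the table's weights are from the README; the function name and call shape are illustrative, not the server's API):

```python
# Effort-to-weight table from the README; the function is a sketch.
EXPERIENCE_WEIGHTS = {1: 1.0, 2: 1.1, 3: 1.25, 4: 1.5, 5: 1.8}

def recall_score(similarity: float, surface: float, recency: float, effort: int) -> float:
    """score = similarity x surface_score x recency x experience_weight"""
    return similarity * surface * recency * EXPERIENCE_WEIGHTS[effort]

trivial  = recall_score(0.8, 1.0, 1.0, effort=1)  # 0.8
hard_won = recall_score(0.8, 1.0, 1.0, effort=5)  # about 1.44: same similarity, far higher rank
```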
Semantic deduplication
Over time memory systems accumulate near-identical entries. OmniMem catches this at two points.
At write time, remember() embeds the new content and checks for existing memories above a cosine similarity threshold (default 0.92, configurable via DEDUP_SIMILARITY_THRESHOLD). If a near-identical memory already exists it returns the duplicate instead of storing a redundant copy. Pass force=True when you genuinely want both versions.
For bulk cleanup, find_duplicates() scans an entire namespace, batch-embeds everything, computes pairwise similarity, and returns clusters of duplicates grouped by union-find. Point it at your episodic namespace once a month and archive the extras.
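To illustrate the union-find grouping (a sketch of the idea, not the actual find_duplicates() implementation): every pair whose similarity exceeds the threshold is unioned, and the resulting connected components are the duplicate clusters.

```python
def cluster_duplicates(keys, similar_pairs):
    """Group keys into duplicate clusters via union-find.

    `similar_pairs` are (a, b) pairs whose cosine similarity exceeded
    the threshold; connected components become clusters.
    """
    parent = {k: k for k in keys}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for k in keys:
        groups.setdefault(find(k), []).append(k)
    return [g for g in groups.values() if len(g) > 1]  # singletons are not duplicates
```

Transitivity is the point: if A resembles B and B resembles C, all three land in one cluster even when A and C alone would fall under the threshold.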
Contradiction detection
The graveyard warns you about things that failed. Contradiction detection warns you about things that disagree with each other.
When remember() stores a new memory it runs a fast heuristic check — finding semantically similar memories and scanning for negation pattern mismatches (e.g. one says "use X" while the other says "avoid X"). If a potential contradiction is detected it stores the memory but returns a warning so you can investigate.
For deeper analysis, check_contradictions() can optionally call Claude Haiku (Tier 2) to evaluate candidate pairs. Confirmed contradictions are cross-linked on both memories and flagged whenever either one surfaces in a recall().
```yaml
contradiction_warning:
  existing_key: mem:episodic:01ARZ3NDEK...
  existing_content: "Always use connection pooling for Valkey..."
  explanation: "These memories discuss the same topic but contain opposing language"
```
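The heuristic is cheap by design. A sketch of a negation-mismatch check (the marker lists and function name are illustrative assumptions, not OmniMem's actual patterns):

```python
# Illustrative marker lists; the real pattern set is an assumption here.
NEGATION_MARKERS = ("avoid", "never", "don't", "do not", "stop using")
POSITIVE_MARKERS = ("use", "always", "prefer")

def looks_contradictory(new: str, existing: str) -> bool:
    """Flag a pair when one memory negates and the other affirms.

    Run only on semantically similar pairs; a Tier 2 LLM pass
    would confirm or dismiss the candidates it flags.
    """
    def polarity(text: str) -> int:
        t = text.lower()
        if any(m in t for m in NEGATION_MARKERS):
            return -1
        if any(m in t for m in POSITIVE_MARKERS):
            return 1
        return 0

    return polarity(new) * polarity(existing) == -1
```

So "Always use connection pooling" against "Avoid connection pooling" flags, while two affirming or two neutral memories do not.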
Session briefing
Instead of making three separate calls at session start, a single briefing(project="myproject") returns everything Claude needs to get up to speed:
- Project context — current state, stack, last update
- Experience summary — effort stats, graveyard, breakthroughs
- Stale memories — active memories not updated in 30+ days (configurable via STALE_MEMORY_DAYS)
- New knowledge — RSS articles ingested in the last 7 days
- Contradiction warnings — memories with unresolved contradictions
- Reinstate candidates — deprioritised memories whose reinstate hints match current work
- Suppressed topics — what is currently filtered out
One tool call, one response, full context.
Automatic maintenance
Memory systems accumulate duplicates and contradictions over time. OmniMem handles this automatically.
Every N briefing() calls per project (default 10, configurable via AUTO_MAINTENANCE_INTERVAL), the server runs a maintenance pass:
- Dedup scan — finds clusters of near-identical episodic memories and archives the oldest in each cluster, keeping the newest
- Contradiction scan — runs the heuristic negation-pattern check across all active project memories and flags opposing pairs
The results appear in the briefing response under auto_maintenance so you know what was cleaned up. Set AUTO_MAINTENANCE_INTERVAL=0 to disable. Manual find_duplicates() and check_contradictions() calls still work as before.
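The trigger amounts to a per-project counter; a sketch (names and state handling are illustrative, not the server's actual code):

```python
# Hypothetical counter store; the real server persists this per project.
briefing_counts: dict = {}

def should_run_maintenance(project: str, interval: int = 10) -> bool:
    """Return True on every `interval`-th briefing() call for a project.

    interval=0 disables auto-maintenance entirely, mirroring
    AUTO_MAINTENANCE_INTERVAL=0.
    """
    if interval == 0:
        return False
    briefing_counts[project] = briefing_counts.get(project, 0) + 1
    return briefing_counts[project] % interval == 0
```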
Self-hosted, open source, yours
No SaaS. No vendor lock-in. No context shipped to someone else's servers.
- Valkey is an open source Redis fork. All your data stays in a named Docker volume on your own machine.
- Multi-arch Docker images for amd64 and arm64. It runs on a Raspberry Pi, AWS Graviton, or Apple Silicon just as well as x86.
- sentence-transformers runs embeddings locally with no API calls.
- MIT licensed means fork it, extend it, run it wherever you want.
- One backup command calls dump_to_file() and exports everything to a JSON file you own.
Expose the MCP port through Traefik and every machine you work from shares the same memory. One deployment, everywhere.
Quick start
```sh
git clone https://codeberg.org/ric_harvey/omnimem.git && cd omnimem
cp .env.example .env
# Set VALKEY_PASSWORD and ANTHROPIC_API_KEY in .env
docker compose up -d
```
Edit the .env file to set at least VALKEY_PASSWORD to a secure value. You can also set ANTHROPIC_API_KEY if you want AI-powered RSS article summaries and richer contradiction detection. If you leave ANTHROPIC_API_KEY unset (or blank), OmniMem still works — the RSS worker will fall back to simple truncation for summaries, and contradiction checks will use embedding similarity only.
Four containers start: Valkey with vector search, the OmniMem MCP server, the RSS worker, and the web UI. The MCP server listens on port 8765 by default and the web UI on port 8080.
Open http://localhost:8080 in a browser to access the management dashboard — browse memories, run semantic searches, manage projects, track experience, and handle backups without needing to use MCP tool calls.
Connect your coding agent to OmniMem. The example below is for Claude Code — see the full guides for other tools:
| Agent | Guide | Transport |
|---|---|---|
| Claude Code | guides/claude-code.md | Native SSE |
| GitHub Copilot | guides/github-copilot.md | Native SSE |
| GitLab Duo | guides/gitlab-duo.md | Native SSE |
| Cursor | guides/cursor.md | SSE (known quirks) |
| AWS Kiro | guides/kiro.md | Native SSE |
| OpenCode | guides/opencode.md | Native SSE |
| OpenAI Codex CLI | guides/codex.md | Needs supergateway bridge |
Claude Code (`~/.claude.json`):

```json
{
  "mcpServers": {
    "omnimem": {
      "type": "sse",
      "url": "http://localhost:8765/sse"
    }
  }
}
```
If you set MCP_AUTH_TOKEN in your .env, add the token to the config:
```json
{
  "mcpServers": {
    "omnimem": {
      "type": "sse",
      "url": "http://localhost:8765/sse",
      "headers": {
        "Authorization": "Bearer your-token-here"
      }
    }
  }
}
```
To stop Claude Code asking for permission every time it calls an OmniMem tool, add a wildcard allow rule to your global settings (~/.claude/settings.json):
```json
{
  "permissions": {
    "allow": [
      "mcp__omnimem__*"
    ]
  }
}
```
This allows all OmniMem MCP tools (remember, recall, briefing, etc.) to run without prompts across every project. If you already have other entries in the allow array, just add "mcp__omnimem__*" to it.
That is it. The server automatically delivers its usage guide to any connecting agent via the MCP protocol's instructions field. Claude Code will load project context at session start, check the graveyard before suggesting approaches, and store what it learns as you go — no manual configuration file needed.
If you want to customise the instructions or use OmniMem with a setup that does not support MCP instructions, a copy of the guide lives at claude_config/CLAUDE.md for manual use.
MCP tools
Core memory
| Tool | What it does |
|---|---|
| remember(content, project?, tags?, force?) | Store a memory (auto-checks for duplicates and contradictions) |
| recall(query, top_k?, project_filter?) | Semantic search across all namespaces |
| deprioritise(key_or_query, reason, reinstate_hints?) | Soft-suppress without deleting |
| archive(key_or_query) | Remove from recall but keep for history |
| reinstate(key_or_query) | Bring a deprioritised memory back |
| forget(key_or_query, confirm=True) | Hard delete, requires explicit confirmation |
| suppress_topic(topic) | Filter a topic from all future recalls |
| unsuppress_topic(topic) | Remove a topic from the suppression list |
| list_suppressions() | Show all currently suppressed topics |
| find_duplicates(namespace?, threshold?, project_filter?) | Scan for clusters of near-identical memories |
| check_contradictions(query?, namespace?, use_api?) | Detect memories that contradict each other |
| briefing(project?, include_knowledge?) | Single-call session start with full context |
Project context
| Tool | What it does |
|---|---|
| set_project_context(name, description, stack, goals, current_state) | Create or update project memory |
| get_project_context(name) | Retrieve it, called at every session start |
| update_project_state(name, current_state, notes?) | Update state without re-embedding |
| compile_project_context(name, auto_save?) | Auto-produce or refresh a project context from its episodic memories, tags, experience data, and abandoned approaches |
| list_projects() | See all stored projects |
| Tool | What it does |
|---|---|
| record_experience(key, effort_score, outcome, abandoned_approaches?, breakthrough?, gotchas?) | Log how hard it was and what failed |
| log_abandoned(key, name, type, reason) | Add dead ends incrementally mid-session |
| warn_if_abandoned(query) | Check the graveyard before proceeding |
| experience_summary(project?) | Graveyard, breakthroughs, and effort stats |
| get_experience(key) | Full experience data for one memory |
Audit and backup
| Tool | What it does |
|---|---|
| memory_audit(project?, namespace?) | All memories by state with metadata |
| explain_memory(key) | Full history for a single memory |
| why_did_you_mention(query) | Debug why something surfaced |
| dump_to_file(filename?) | Export everything to a timestamped JSON file |
| restore_from_file(filename, dry_run?) | Restore from backup, merges rather than overwrites |
| list_backups() | See available backup files |
| health() | Server, Valkey, index, and model status |
| version() | Return the current OmniMem version |
Configuration
| Variable | Default | Description |
|---|---|---|
| VALKEY_PASSWORD | changeme | Change this before deploying |
| ANTHROPIC_API_KEY | (unset) | Optional; enables Claude Haiku RSS summaries and Tier 2 contradiction checks |
| MCP_AUTH_TOKEN | (unset) | Set to enable bearer token auth on the MCP SSE endpoint. When unset, no auth is required |
| WEB_UI_AUTH_TOKEN | (unset) | Set to enable bearer token auth on the web dashboard. /metrics and static assets are exempt |
| MCP_PORT | 8765 | Port the MCP server listens on |
| MCP_HOST | 127.0.0.1 | Bind address for the MCP server (set to 0.0.0.0 inside Docker) |
| VALKEY_MAX_CONNECTIONS | 20 | Valkey connection pool size |
| EMBEDDING_MODEL | all-MiniLM-L6-v2 | Local sentence-transformers model |
| RSS_SCHEDULE_HOURS | 6 | How often feeds are ingested |
| RSS_MAX_ARTICLES_PER_FEED | 20 | Articles per feed per cycle |
| MEMORY_RECALL_TOP_K | 5 | Default number of recall results |
| DEPRIORITISED_WEIGHT | 0.2 | Surface score for deprioritised memories |
| RECENCY_DECAY_DAYS | 90 | Days before the age penalty kicks in |
| DEDUP_SIMILARITY_THRESHOLD | 0.92 | Cosine similarity threshold for duplicate detection on remember() |
| CONTRADICTION_SIMILARITY_THRESHOLD | 0.7 | Similarity threshold for contradiction candidate search |
| STALE_MEMORY_DAYS | 30 | Days without update before a memory is flagged as stale in briefing() |
| AUTO_MAINTENANCE_INTERVAL | 10 | Number of briefing() calls per project before auto-maintenance runs (0 to disable) |
| TELEMETRY_COLD_DAYS | 60 | Days without recall before a memory is flagged as "gone cold" on the telemetry dashboard |
| WEB_PORT | 8080 | Port the web UI listens on |
| BACKUP_DIR | /app/backups | Where backup files are written (shared between MCP server and web UI) |
RSS configuration
Edit rss_worker/feeds.yml to choose which feeds get ingested:
```yaml
feeds:
  - url: https://blog.rust-lang.org/feed.xml
    name: Rust Official Blog
    topics: [rust, systems, language]
  - url: https://this-week-in-rust.org/rss.xml
    name: This Week in Rust
    topics: [rust, community, crates]
  - url: https://blog.n8n.io/rss/
    name: n8n Blog
    topics: [automation, workflow, n8n]
```
Each article gets fetched, stripped of HTML, summarised to a couple of sentences by Claude Haiku, embedded, and stored in the knowledge namespace. Duplicates are skipped by URL. The worker runs once on startup and then on whatever schedule you set in RSS_SCHEDULE_HOURS.
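The summarisation step degrades gracefully: with no API key set, summaries fall back to plain truncation. A sketch of that branch, where call_haiku() is a hypothetical placeholder for the real Anthropic API call:

```python
def summarise(text: str, api_key: str = "", max_chars: int = 300) -> str:
    """Summarise with Claude Haiku when a key is set; otherwise truncate.

    call_haiku() is a placeholder, not a real function; this sketch
    only shows the shape of the fallback path.
    """
    if api_key:
        return call_haiku(f"Summarise in two sentences: {text}")
    if len(text) <= max_chars:
        return text
    # Truncate on a word boundary so the stored summary reads cleanly.
    return text[:max_chars].rsplit(" ", 1)[0] + "…"
```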
Memory lifecycle
```
ACTIVE (1.0x)  ->  DEPRIORITISED (0.2x)  ->  ARCHIVED (0.0x)
      |                    |                       |
      +--------------------+-----------------------+
                           |
                           v
                        DELETED
```
Use deprioritise when something should stop surfacing but might be needed again someday. Add reinstate_hints to describe what should bring it back. If a future query strongly matches a hint, the memory resurfaces with a note explaining why it was deprioritised in the first place.
Use archive for content that is definitely outdated but has historical value worth keeping.
Use forget only when you want something permanently gone. It requires confirm=True so nothing disappears by accident.
One thing worth knowing: if you deprioritise a memory with effort_score >= 4 the system will flag it before letting you proceed. It is not blocking you, just making sure you meant to soft-suppress something that was genuinely hard to figure out.
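That guard might conceptually look like this (function name and message text are illustrative):

```python
def deprioritise_guard(effort_score: int, confirmed: bool = False) -> str:
    """Warn, but do not block, before soft-suppressing a hard-won memory."""
    if effort_score >= 4 and not confirmed:
        return "warn: this memory was hard-won (effort >= 4), confirm to proceed"
    return "ok"
```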
Using it from multiple machines
Expose MCP_PORT through your reverse proxy. Traefik example:
```yaml
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.omnimem.rule=Host(`omnimem.yourdomain.com`)"
  - "traefik.http.routers.omnimem.tls.certresolver=letsencrypt"
  - "traefik.http.services.omnimem.loadbalancer.server.port=8765"
```
Update the MCP config URL to https://omnimem.yourdomain.com/sse and every machine you work from shares the same memory, the same graveyard, and the same project context. See the connection guides for how to configure each coding agent.
You can expose the web UI the same way — add a route for WEB_PORT with basic auth middleware. See docs/reverse-proxy.md for Traefik and Caddy examples.
Security checklist: strong VALKEY_PASSWORD, set MCP_AUTH_TOKEN and WEB_UI_AUTH_TOKEN in your .env, TLS on the proxy if exposing publicly, and keep the Valkey port off the public internet.
Web UI
OmniMem includes a browser-based management interface at http://localhost:8080. It connects directly to Valkey and does not depend on the MCP server running.
| Page | What it does |
|---|---|
| Dashboard | Namespace counts, state breakdowns, health indicators, recent activity |
| Memories | Browse all memories with namespace, state, and project filters. Paginated, htmx-powered |
| Search | Semantic search using the full recall pipeline. Abandoned warnings highlighted |
| Detail | Full memory content, metadata, tags, experience data, contradictions. Lifecycle action buttons |
| Create | Store a new memory with duplicate detection shown inline |
| Projects | List, view, edit, and create project contexts |
| Experience | Summary dashboard with effort stats, breakthroughs, and the abandoned approach graveyard |
| Duplicates | Scan a namespace for near-identical memory clusters. Archive extras directly |
| Contradictions | Side-by-side comparison of contradicting memories with resolve actions |
| Suppressions | Add and remove suppressed topics inline |
| Telemetry | Recall counters, most recalled, gone cold, never recalled. Filter by project |
| Backups | Create backups, preview restore contents, and confirm restore |
Prometheus metrics
The web UI exposes a /metrics endpoint in Prometheus text format. Point your Grafana or Prometheus scraper at http://localhost:8080/metrics with a 15-60 second scrape interval.
Available gauges:
| Metric | Labels | Description |
|---|---|---|
| omnimem_memories_total | namespace, state | Total memories by namespace and lifecycle state |
| omnimem_memories_never_recalled | namespace | Active memories with zero recalls |
| omnimem_recalls_total | — | Sum of all recall counts across all memories |
| omnimem_memories_gone_cold | — | Memories recalled before but not within the cold threshold |
The endpoint scans Valkey on each scrape, which is fine for typical intervals.
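For reference, the gauges above render in the standard Prometheus text exposition format. A sketch of generating such lines (not the web UI's actual code):

```python
def render_gauge(name, value, labels=None):
    """Render one gauge sample in Prometheus text exposition format."""
    if labels:
        body = ",".join(f'{k}="{v}"' for k, v in labels.items())
        return f"{name}{{{body}}} {value}"
    return f"{name} {value}"

line = render_gauge("omnimem_memories_total", 42,
                    {"namespace": "episodic", "state": "active"})
# -> omnimem_memories_total{namespace="episodic",state="active"} 42
```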
The web UI supports optional bearer token authentication via the WEB_UI_AUTH_TOKEN environment variable. The /metrics endpoint is exempt so Prometheus can scrape without credentials. For additional security options (TLS, IP allowlisting, SSO), see docs/reverse-proxy.md.
Architecture
```
Claude Code (any machine)             Browser
        |                                |
        | SSE / MCP                      | HTTP :8080
        v                                v
+-------------------------+    +-------------------------+
|  OmniMem MCP Server     |    |  OmniMem Web UI         |
|  Python  fastmcp        |    |  Starlette  htmx        |
|                         |    |  Jinja2 templates       |
|  remember     recall    |    |                         |
|  deprioritise archive   |    |  Dashboard   Search     |
|  record_experience      |    |  Browse      Create     |
|  warn_if_abandoned      |    |  Projects    Experience |
|  briefing     health    |    |  Duplicates  Backups    |
+-----------+-------------+    +-----------+-------------+
            |                              |
            +--------------+---------------+
                           |
             +-------------+-------------+
             |                           |
             v                           v
     +---------------+        +------------------+
     |    Valkey     |        |    RSS Worker    |
     |   + search    |  <---  |                  |
     |               |        |   feedparser     |
     | idx:episodic  |        |   APScheduler    |
     | idx:project   |        |   Claude Haiku   |
     | idx:knowledge |        +------------------+
     +---------------+
```

Both the MCP server and web UI connect directly to Valkey and share the mcp_server/memory/ package.
Recall pipeline:

```
query
  -> abandoned fast-path (keyword scan, no embedding needed)
  -> embed query
  -> vector search, top 20 candidates per namespace
  -> filter archived and deleted
  -> filter suppressed topics
  -> apply surface_score (lifecycle state multiplier)
  -> apply recency decay (age penalty after 90 days)
  -> apply experience_weight (effort x outcome multiplier)
  -> check reinstate eligibility
  -> surface contradiction warnings
  -> merge, re-rank, return top_k
  -> log recall event + increment per-memory recall counters
```
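The recency step only penalises memories older than RECENCY_DECAY_DAYS. The exact curve is not documented, so the decay function below is an assumption; only the 90-day grace period comes from the configuration table:

```python
import math

def recency_multiplier(age_days: float, decay_days: int = 90) -> float:
    """No penalty inside the decay window; gentle decay afterwards.

    The exponential shape is an assumption for illustration; the
    server's real formula may differ.
    """
    if age_days <= decay_days:
        return 1.0
    return math.exp(-(age_days - decay_days) / 365)
```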
Contributing
Issues and PRs are welcome. OmniMem is designed to be extended and the scoring pipeline is structured so new multipliers can be added without touching the core. New MCP tools, additional namespace types, and alternative embedding backends are all reasonable directions.
Licence
MIT. Free to use, fork, and modify. No enterprise tier, no hosted version, no strings.
Built by Ric Harvey @ SquareCows Ltd, an AI and automation consultancy for people who would rather own their tools.