AHME
Asynchronous Hierarchical Memory Engine

Give your AI coding assistant a long-term memory — fully local, zero cloud, zero cost.
AHME is a local sidecar daemon that sits quietly beside your AI coding assistant. As you work, it compresses your conversation history into a dense Master Memory Block using a local Ollama model — no cloud, no tokens wasted, no context lost.
It integrates with any AI tool that supports MCP (Model Context Protocol): Antigravity, Claude Code, Kilo Code, Cursor, Windsurf, Cline/Roo, and more.
✨ How it works
```text
Your AI conversation
           │
           ▼ ingest_context
┌─────────────────────┐
│    SQLite Queue     │  ← persistent, survives restarts
└──────────┬──────────┘
           │  when CPU is idle
           ▼
┌─────────────────────┐
│  Ollama Compressor  │  ← local model (qwen2:1.5b, gemma3:1b, phi3…)
│  (structured JSON)  │
└──────────┬──────────┘
           │  recursive tree merge
           ▼
┌─────────────────────┐
│ Master Memory Block │  ← dense, token-efficient summary
└──────────┬──────────┘
           │
           ├── .ahme_memory.md     (file — for any tool that reads files)
           └── get_master_memory   (MCP tool — for integrated tools)
```
Context-window replacement pattern: calling get_master_memory returns the compressed summary, clears the old data, and re-seeds the engine with the summary — so every new conversation starts from a dense checkpoint, not a blank slate.
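The replacement pattern can be sketched with a toy in-memory engine. To be clear, `ToyEngine` below is a hypothetical stand-in for AHME's SQLite queue and Ollama compressor, not the actual implementation; it only illustrates the return-clear-reseed semantics:

```python
class ToyEngine:
    """Minimal stand-in for AHME's queue + Master Memory Block (illustration only)."""

    def __init__(self):
        self.queue = []    # stands in for the SQLite queue
        self.master = ""   # stands in for the Master Memory Block

    def ingest(self, text: str) -> None:
        self.queue.append(text)

    def compress(self) -> None:
        # Real AHME summarizes via Ollama; here we just concatenate.
        if self.queue:
            self.master = " | ".join(self.queue)

    def get_master_memory(self, reset: bool = True) -> str:
        summary = self.master
        if reset:
            self.queue.clear()          # drop the old raw data
            self.queue.append(summary)  # re-seed from the dense checkpoint
        return summary
```

After a call with `reset=True`, the engine holds only the checkpoint, so the next compression pass starts from the summary rather than the full history.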
🚀 Quick Start
Prerequisites
- Python 3.11+
- Ollama running locally
- A small model pulled (any 1–4B model works):

  ```shell
  ollama pull qwen2:1.5b
  ```
Install
```shell
git clone https://github.com/your-username/ahme
cd ahme

# Copy the example config and set your model
cp config.example.toml config.toml

# Install the package
pip install -e .
```
Configure
Open config.toml and set your Ollama model:
```toml
[ollama]
base_url = "http://localhost:11434"
model = "qwen2:1.5b"  # ← change to any model you have pulled
```
That's the only line you need to change. Everything else is pre-configured.
🔌 Connect to your AI tool
AHME exposes three MCP tools: ingest_context, get_master_memory, and clear_context.
Option A — MCP (recommended)
Add AHME to your tool's MCP config. The exact file location varies by tool:
| Tool | Config location |
|---|---|
| Claude Code | --mcp-config .mcp.json flag, or ~/.claude/mcp.json |
| Kilo Code | VS Code settings.json → "kilocode.mcp.servers" |
| Cursor | Settings → MCP → paste JSON |
| Windsurf | ~/.windsurf/mcp.json |
| Cline / Roo | MCP Servers sidebar → Edit JSON |
| Antigravity | ~/.gemini/antigravity/mcp_config.json |
Config snippet (works everywhere):
```json
{
  "mcpServers": {
    "ahme": {
      "command": "python",
      "args": ["-m", "ahme.mcp_server"],
      "env": { "PYTHONPATH": "/absolute/path/to/ahme" }
    }
  }
}
```
A ready-made .mcp.json is included in the repo root — just copy it to where your tool expects it.
Option B — File watch (zero config)
After any compression, AHME writes .ahme_memory.md in the project directory. Reference it in any prompt:
@[.ahme_memory.md] use this as your long-term context before answering
Or set up persistent injection with .agents/instructions.md (Antigravity):
Before starting any task, read @[.ahme_memory.md] and treat it as background context.
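The file bridge is also easy to script around directly. A minimal sketch (the `build_prompt` helper is hypothetical, not part of AHME):

```python
from pathlib import Path


def build_prompt(user_msg: str, memory_path: str = ".ahme_memory.md") -> str:
    """Prepend AHME's memory file to a prompt when it exists."""
    memory = Path(memory_path)
    if memory.exists():
        return (
            "Long-term context:\n"
            f"{memory.read_text()}\n\n"
            f"Task:\n{user_msg}"
        )
    return user_msg
```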
🛠 MCP Tools Reference
| Tool | Input | Behaviour |
|---|---|---|
| `ingest_context` | `text: string` | Partitions text into chunks and queues them for background compression |
| `get_master_memory` | `reset?: bool` (default `true`) | Returns the compressed summary; if `reset=true`, clears the DB and re-seeds it with the summary |
| `clear_context` | — | Wipes all queued data; returns nothing |
Typical usage pattern
1. [After each conversation turn]
→ call ingest_context with the latest messages
2. [When approaching context limit, or starting a new session]
→ call get_master_memory
→ inject the result into your system prompt
→ the engine resets and starts accumulating again from this checkpoint
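The loop above can be sketched in a few lines. Here `call_tool(name, args)` is a stand-in for whatever tool-invocation function your MCP client provides (its signature is an assumption), and the character-based `context_limit` is purely illustrative:

```python
def run_session(turns, call_tool, context_limit=8000):
    """Sketch of the usage pattern: ingest each turn, flush to a
    checkpoint when the (illustrative) context budget is exceeded."""
    system_prompt = ""
    used = 0
    for turn in turns:
        call_tool("ingest_context", {"text": turn})   # step 1: after each turn
        used += len(turn)
        if used > context_limit:                      # step 2: nearing the limit
            system_prompt = call_tool("get_master_memory", {"reset": True})
            used = len(system_prompt)                 # restart from the checkpoint
    return system_prompt
```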
⚙️ Configuration Reference
config.example.toml — copy to config.toml:
```toml
[chunking]
chunk_size_tokens = 1500   # tokens per chunk
overlap_tokens = 150       # overlap between chunks (preserves context at boundaries)

[queue]
db_path = "ahme_queue.db"  # SQLite database path (relative to config.toml)
max_retries = 3            # retry failed compressions before marking as failed

[monitor]
poll_interval_seconds = 2.0
cpu_idle_threshold_percent = 30.0  # only compress when CPU is below this %

[ollama]
base_url = "http://localhost:11434"
model = "qwen2:1.5b"       # ← set this to your local model
timeout_seconds = 120

[merger]
batch_size = 5             # summaries per merge pass (lower = more frequent master updates)

[logging]
log_file = "ahme.log"
memory_file = ".ahme_memory.md"
max_bytes = 5242880        # 5 MB log rotation
backup_count = 3
```
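The two `[chunking]` knobs interact as in this sketch, where a plain token list stands in for AHME's tiktoken BPE tokens:

```python
def chunk_with_overlap(tokens, chunk_size=1500, overlap=150):
    """Overlapping chunker sketch: consecutive chunks share `overlap` tokens."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

Each chunk begins `overlap` tokens before the previous one ended, so a sentence split at a boundary still appears whole in one of the two chunks.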
🐍 Python API
If you'd rather control AHME directly from Python:
```python
import asyncio

from ahme.api import AHME

engine = AHME("config.toml")

# Push text into the queue
engine.ingest("The user asked about Python async patterns. We discussed...")

# Run the daemon (this blocks; use asyncio.create_task for non-blocking)
asyncio.run(engine.run())

# Read the compressed memory
print(engine.master_memory)

# Stop the daemon
engine.stop()
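One note on the non-blocking variant: `asyncio.create_task` only works inside a running event loop. A self-contained sketch of that pattern, using a hypothetical `StubEngine` that merely mimics the documented `run()`/`stop()` shape:

```python
import asyncio


class StubEngine:
    """Stand-in that mimics the documented run()/stop() shape (illustration only)."""

    def __init__(self):
        self._stopped = asyncio.Event()

    async def run(self):
        await self._stopped.wait()   # real AHME runs its compression loop here

    def stop(self):
        self._stopped.set()


async def main():
    engine = StubEngine()
    daemon = asyncio.create_task(engine.run())  # non-blocking: daemon in background
    await asyncio.sleep(0)                      # ...do other work here...
    engine.stop()
    await daemon
    return daemon.done()


print(asyncio.run(main()))  # → True
```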
📁 Project Structure
```text
ahme/
├── ahme/
│   ├── __init__.py          # Package marker & version
│   ├── config.py            # Typed TOML config loader
│   ├── db.py                # SQLite queue — enqueue, dequeue, clear, retry
│   ├── partitioner.py       # Token-accurate overlapping chunker (tiktoken)
│   ├── monitor.py           # CPU + lock-file idle detector (psutil)
│   ├── compressor.py        # Ollama async caller → structured JSON summaries
│   ├── merger.py            # Recursive batch-reduce tree → Master Memory Block
│   ├── daemon.py            # Main event loop + graceful shutdown + file bridge
│   ├── api.py               # Clean public Python API
│   └── mcp_server.py        # MCP server — stdio & SSE transports
├── tests/                   # 19 tests, all passing
├── .mcp.json                # Ready-to-use MCP config
├── config.example.toml      # Template config — copy to config.toml
├── pyproject.toml           # pip-installable package
└── README.md
```
🧪 Testing
```shell
pip install -e ".[dev]"
python -m pytest tests/ -v
```
Expected output: 19 passed — all tests use mocks and never require a live Ollama instance.
🔑 Key Design Decisions
| Decision | Rationale |
|---|---|
| SQLite over Redis | Zero external dependencies, single-file persistence, survives crashes |
| tiktoken for chunking | Real BPE token counting prevents prompt overflow |
| 150-token overlap | Preserves context at chunk boundaries |
| CPU + lock-file gating | AHME never competes with your active AI session for GPU/CPU |
| Recursive tree merge | Scales compression with conversation length — O(log n) passes |
| JSON-only system prompt | Enforces structured output from Ollama for reliable parsing |
| `__file__`-relative paths | Config and DB are always found regardless of working directory |
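The recursive tree merge can be illustrated in a few lines; string concatenation stands in for the Ollama merge prompt, and `batch_size` matches the `[merger]` config:

```python
def tree_merge(summaries, batch_size=5, merge=" / ".join):
    """Batch-reduce sketch: repeatedly merge groups of `batch_size`
    summaries until a single master block remains (O(log n) passes)."""
    passes = 0
    while len(summaries) > 1:
        summaries = [merge(summaries[i:i + batch_size])
                     for i in range(0, len(summaries), batch_size)]
        passes += 1
    return (summaries[0] if summaries else ""), passes
```

Because each pass shrinks the list by a factor of `batch_size`, the number of merge passes grows logarithmically with the number of chunk summaries.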
🤝 Contributing
Contributions welcome! Please open an issue before submitting large PRs.
📄 License
MIT — do whatever you like.