ollama-handoff MCP Server

Offload cheap work from your AI agent to a local Ollama model — summaries, drafts, extractions, first-pass reviews — at zero cloud cost.

Documentation

ollama-handoff

An MCP server that offloads cheap work from your cloud LLM agent to a local Ollama model.

CI PyPI Python MCP License: MIT

Your frontier model (Claude, GPT, etc.) is brilliant and metered. A lot of the work it gets handed — summarizing a log, drafting a commit message, pulling every URL out of a file, a quick first-pass code review — doesn't need frontier reasoning at all. ollama-handoff exposes your local Ollama instance as a handful of purpose-built MCP tools, so your agent can route that work to a model on your own GPU — at zero cloud cost — and spend its (paid) reasoning budget on the things that actually need it.

This isn't a generic "wrap the Ollama API" server. Each tool ships with a baked-in system prompt and a description written for the calling agent, so the agent knows when to hand off and gets a tuned result back without re-stating instructions every call.


Why you'd want this

  • 💸 Spend less. Routine offloads run locally and bill nothing.
  • Keep the big model focused. Summaries, extractions, and drafts don't eat its context or your budget.
  • 🧠 Tuned, not raw. summarize_local, code_review_local, draft_commit_message_local, and extract_local come with reviewer/summarizer/extractor system prompts already dialed in.
  • 🔌 Drop-in. One MCP registration; works with Claude Code, Claude Desktop, Cursor, and any MCP client.
  • 🪶 Tiny & auditable. Two dependencies (mcp, httpx), fully typed, unit-tested, no telemetry.

Requirements

  • Ollama running locally (ollama serve) with at least one model pulled, e.g. ollama pull qwen2.5-coder:14b.
  • Python 3.11+ (or just uvx, which manages it for you).

Install

The fastest path is uv — no manual venv needed:

uvx ollama-handoff          # run directly
# or
pip install ollama-handoff  # then run: ollama-handoff

Claude Code

claude mcp add ollama-handoff -- uvx ollama-handoff

Claude Desktop / Cursor (mcp config block)

{
  "mcpServers": {
    "ollama-handoff": {
      "command": "uvx",
      "args": ["ollama-handoff"],
      "env": {
        "OLLAMA_DEFAULT_MODEL": "qwen2.5-coder:14b"
      }
    }
  }
}

Run with Docker

A Dockerfile is included. The server speaks MCP over stdio, so run it interactively (-i) and point it at your Ollama instance:

docker build -t ollama-handoff .
docker run --rm -i -e OLLAMA_URL=http://host.docker.internal:11434 ollama-handoff

On native Linux (no Docker Desktop), use --network=host with OLLAMA_URL=http://localhost:11434.

Tools

ToolWhat it doesWhen the agent should reach for it
ask_localOne-shot prompt to the local modelAny handoff that doesn't need frontier reasoning
chat_localMulti-turn local chatHandoffs needing more than one turn of context
summarize_localStructured summary (headline + bullets)Long files, logs, transcripts, docs
code_review_localQuick first-pass review of a diff/codeCheap pre-filter before a deep review
draft_commit_message_localConventional commit message from a diffRoutine commits
extract_localPull structured items from unstructured textURLs, function names, error codes, TODOs
list_modelsList locally available Ollama modelsDiscovery / choosing a model
server_infoReport the effective configurationDebugging setup

Configuration

All configuration is via environment variables set in your MCP registration:

VariableDefaultDescription
OLLAMA_URLhttp://localhost:11434Base URL of the Ollama server
OLLAMA_DEFAULT_MODELqwen2.5-coder:14bDefault model for handoffs
OLLAMA_NUM_CTX32768Context window in tokens
OLLAMA_KEEP_ALIVE30mHow long to keep the model resident in VRAM
OLLAMA_TIMEOUT_S600Per-request timeout, seconds

Example

Once registered, you don't call the tools yourself — your agent does. A typical exchange:

You: Summarize the errors in build.log and draft a commit for the staged fix.

Agent: (calls summarize_local(build.log, focus="errors and stack traces") and draft_commit_message_local(git diff --staged) — both run on your GPU, nothing billed) → returns the summary + commit message.

Development

git clone https://github.com/Michael-WhiteCapData/ollama-handoff
cd ollama-handoff
uv pip install -e ".[dev]"
ruff check .
pytest          # tests use httpx.MockTransport — no running Ollama required

See CONTRIBUTING.md. Contributions welcome — especially new specialized handoff tools.

License

MIT © Michael Tierney