Context Crumb MCP Server

Compresses long files, prompt inputs, and MCP catalog descriptions into denser context for LLM agents while preserving the useful signal.

ContextCrumb

Shake the crumbs out of bloated context.

Requires Python >=3.10.


LLM context gets messy fast: notes, logs, issue threads, docs, research dumps, and tool descriptions all pile up until the useful signal is buried under filler.

ContextCrumb is a token-level compressor for LLM and agent workflows. It looks at text word by word and removes low-signal tokens while keeping the surviving text in the original order.
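To make the idea concrete, here is a toy sketch of order-preserving token deletion. It is not the ContextCrumb model: the `FILLER` set and the `score` function are hypothetical stand-ins for a learned scorer, and `keep_ratio` only mimics the spirit of a keep-ratio setting.

```python
import math

# Hypothetical stand-in for a learned scorer: filler words get a low score.
FILLER = {"the", "a", "an", "of", "and", "to", "it", "is", "that", "but", "also"}


def score(word: str) -> float:
    """Return 0.0 for filler words and 1.0 for everything else."""
    return 0.0 if word.strip(".,;:!?").lower() in FILLER else 1.0


def toy_compress(text: str, keep_ratio: float = 0.72) -> str:
    """Keep roughly keep_ratio of the words, preserving their original order."""
    words = text.split()
    n_keep = math.ceil(len(words) * keep_ratio)
    # Rank positions by score (highest first); ties favor earlier words.
    ranked = sorted(range(len(words)), key=lambda i: (-score(words[i]), i))
    keep = sorted(ranked[:n_keep])  # back to source order
    return " ".join(words[i] for i in keep)


print(toy_compress("It keeps the original order and removes the filler.", 0.5))
```

The point of the sketch is the last step: whatever decides which words survive, the survivors are emitted in source order, so nothing is reworded or rearranged.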

That is the idea behind the name: the context is still there, but the loose crumbs are shaken off before they reach your model. Less bloat in the prompt. More room for the parts that matter. Less wasted usage when Codex, Claude Code, or another agent processes long files repeatedly.

Try the ContextCrumb-32M Demo
No install needed. Paste text, compare the kept context, and see what gets shaken off.

Before / After

ContextCrumb is not a summarizer. It does not rewrite your document into a new explanation. It keeps the source sequence and deletes expendable words. This example uses target_keep_ratio=0.72.

Original

Agents spend context on notes, logs, tickets, docs, and tool descriptions. Those files contain useful facts, but they also carry filler phrases and repeated wording. ContextCrumb compresses the text before it reaches the model. It keeps the original order, removes low-value tokens, and leaves a shorter version with the names, actions, constraints, and sequence still intact.

Compressed

Agents spend context notes, logs, tickets, docs tool descriptions. Those files useful facts, carry filler phrases repeated wording. ContextCrumb compresses text before reaches model. keeps original order, removes low-value tokens, leaves shorter version names, actions, constraints sequence intact.

Same order. Less padding. More room for the next file. On prose-heavy agent inputs, ContextCrumb often saves around 30-70% of the context depending on how aggressively you compress and how much filler is in the source.

| Metric | Original | Compressed | Saved |
| --- | --- | --- | --- |
| Model tokens | 72 | 52 | 20 tokens |
| Token budget | 100% | 72% | 28% fewer input tokens |
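The figures in the table above follow directly from the two token counts. A quick check, using only the numbers reported there:

```python
original_tokens = 72
compressed_tokens = 52

saved_tokens = original_tokens - compressed_tokens
kept_fraction = compressed_tokens / original_tokens
saved_fraction = saved_tokens / original_tokens

print(f"saved {saved_tokens} tokens")
print(f"kept {kept_fraction:.0%}, saved {saved_fraction:.0%}")
```

The kept fraction rounds to 72%, which matches the target_keep_ratio=0.72 used for the example.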

What that feels like over a month

Assume each note, log, ticket, research dump, or doc your agent reads before answering is about 8k tokens, and that compression saves 30-70% of those tokens over a 30-day month. This helps with API token bills, but also with subscription-based coding agents where heavy context reads can burn through usage faster.

| Workflow | Files read / day | Context saved / month | API cost avoided at $5 / 1M input tokens | Subscription usage feel |
| --- | --- | --- | --- | --- |
| Solo agent helper | 20 | ~1.4M-3.4M tokens | ~$7-$17 | Fewer bulky reads in Codex or Claude Code |
| Busy project workspace | 200 | ~14M-34M tokens | ~$72-$168 | More room for actual reasoning and edits |
| Agent-heavy team or eval loop | 2,000 | ~144M-336M tokens | ~$720-$1,680 | Less usage spent processing padded files |
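The ranges in the table can be reproduced with a few lines of arithmetic. This sketch takes the 8k tokens per file, the 30-70% savings range, and the $5 / 1M price from the text; the 30-day month is an assumption inferred from the figures, and `monthly_savings` is a hypothetical helper, not part of ContextCrumb.

```python
def monthly_savings(files_per_day, tokens_per_file=8_000, days_per_month=30,
                    saved_pct=(30, 70), usd_per_million=5.0):
    """Return ((low_tokens, high_tokens), (low_usd, high_usd)) saved per month."""
    total = files_per_day * tokens_per_file * days_per_month
    # Integer percentages keep the token counts exact.
    tokens = tuple(total * pct // 100 for pct in saved_pct)
    dollars = tuple(t / 1_000_000 * usd_per_million for t in tokens)
    return tokens, dollars


for files in (20, 200, 2_000):
    (lo_t, hi_t), (lo_usd, hi_usd) = monthly_savings(files)
    print(f"{files:>5} files/day: {lo_t / 1e6:.1f}M-{hi_t / 1e6:.1f}M tokens, "
          f"${lo_usd:.0f}-${hi_usd:.0f}")
```

Swap in your own file size, read volume, and token price to get a rough estimate for your workload.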

The bill is usually not the biggest win. The biggest win is keeping long-running agents from filling their context, turns, and subscription usage with words they did not need to carry in the first place.

Quickstart (30-second setup)

Teach your agent a small habit: compress the bloat before it enters context. ContextCrumb is meant to sit in the background as a skill, stepping in whenever a long note, doc, issue thread, research dump, or log would otherwise flood the context window and eat into your Codex or Claude Code usage.

  1. Add the skill.
     npx skills add Yuchen20/Context-Crumb
  2. Select the agent you want to install it on.
     The skill tells your agent when to compress text, how to preserve the useful sequence, and when exact raw text is required for things like code, configs, or direct quotes.
  3. Ask your agent to compress long files instead of dropping the whole thing into context, for example:
     Use ContextCrumb to compress this long project note before you work from it.
  4. Voila: every long note, log, ticket, research dump, or doc enters context already trimmed, saving tokens and preserving more of your agent subscription for the work that matters.

Why ContextCrumb?

| Use case | What changes |
| --- | --- |
| Agent file loading | Compress long notes, docs, research dumps, and logs before they hit the context window. |
| Prompt pipelines | Shrink natural-language inputs without hand-writing summarizers. |
| MCP catalogs | Compress verbose tool/resource descriptions while preserving names and schemas. |
| Local workflows | Run ONNX inference by default, with cached model files after first download. |
| Subscription-aware agents | Spend less Codex or Claude Code usage on repeatedly loading padded prose. |
| Inspection and tuning | Use diff and inspect to see what was kept, deleted, and saved. |

Best fit: docs, notes, issue threads, logs, research context, and other natural-language files. For source code where exact syntax matters, prefer raw file loading or use a conservative keep ratio.

Install

pip install contextcrumb

Optional extras:

pip install "contextcrumb[mcp]"
pip install "contextcrumb[serve]"
pip install "contextcrumb[torch]"

ContextCrumb uses the ONNX backend by default, so normal users do not need PyTorch or Transformers installed. Model files are cached locally after the first download.

CLI

The main agent-friendly command is load:

contextcrumb load notes.txt

It prints only compressed text by default, which makes it easy for agents, hooks, shell scripts, and prompt pipelines to capture stdout and move on. For subscription tools like Codex or Claude Code, that means fewer bulky file reads before the agent gets to the useful part.
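Because load writes only the compressed text to stdout, a script can capture it with a plain subprocess call. The sketch below assumes the contextcrumb CLI is installed and on PATH, and that it exits non-zero on failure; `build_load_command` and `load_compressed` are hypothetical helper names, not part of the package.

```python
import subprocess


def build_load_command(path: str) -> list[str]:
    """Return the argv for `contextcrumb load <path>`."""
    return ["contextcrumb", "load", path]


def load_compressed(path: str) -> str:
    """Run the CLI and return whatever it printed to stdout."""
    result = subprocess.run(
        build_load_command(path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Calling `load_compressed("notes.txt")` would then hand back the compressed text as a string, ready to drop into a prompt.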

Useful commands:

contextcrumb load notes.txt --json
contextcrumb load notes.txt --receipt
contextcrumb diff notes.txt
contextcrumb inspect notes.txt
contextcrumb stats

--receipt leaves compressed text on stdout and writes a compact savings receipt to stderr. ContextCrumb also refuses syntax-sensitive file types such as code, diffs, configs, lockfiles, scripts, SQL, and .env files unless you pass --force; forced output is only for exploratory reading, not exact edits or copy-paste commands.
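If you are wiring ContextCrumb into your own pipeline, a client-side pre-check can route syntax-sensitive files to a raw read before the CLI is ever called. The sketch below is an illustration only: the suffix list is a hypothetical example and is not ContextCrumb's actual refusal list, and `should_load_raw` is not part of the package.

```python
from pathlib import Path

# Hypothetical examples only; ContextCrumb's own refusal list may differ.
RAW_SUFFIXES = {
    ".py", ".js", ".ts", ".json", ".yaml", ".yml", ".toml",
    ".lock", ".sql", ".sh", ".diff", ".patch", ".env",
}
RAW_NAMES = {".env"}  # dotfiles have no suffix, so match them by name


def should_load_raw(path: str) -> bool:
    """Return True when a file looks syntax-sensitive and should be read as-is."""
    p = Path(path)
    return p.name.lower() in RAW_NAMES or p.suffix.lower() in RAW_SUFFIXES


for name in ("notes.txt", "src/app.py", ".env"):
    print(name, "->", "raw" if should_load_raw(name) else "compress")
```

A check like this keeps --force out of automated paths, so forced output stays a deliberate, exploratory choice.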

diff marks deleted tokens like this:

kept words [-deleted words-] kept words
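The marker format is easy to post-process. This sketch counts kept and deleted words in a diff string, assuming deleted spans are always wrapped exactly as `[-` ... `-]` and never nest; `summarize_diff` is a hypothetical helper, and source text that itself contains `[-` would confuse it.

```python
import re

# Non-greedy match for one deleted span, e.g. "[-deleted words-]".
DELETED = re.compile(r"\[-(.*?)-\]", re.DOTALL)


def summarize_diff(diff_text: str) -> dict:
    """Count kept and deleted words in diff-style output."""
    deleted = [w for span in DELETED.findall(diff_text) for w in span.split()]
    kept = DELETED.sub(" ", diff_text).split()
    total = len(kept) + len(deleted)
    return {
        "kept": len(kept),
        "deleted": len(deleted),
        "keep_ratio": len(kept) / total if total else 1.0,
    }


print(summarize_diff("kept words [-deleted words-] kept words"))
```

Note that this counts whitespace-separated words, not model tokens, so the ratio is only a rough guide.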

Agent + MCP

ContextCrumb includes an optional MCP stdio adapter for agent clients that can run Python tools through uvx.

pip install "contextcrumb[mcp]"

Published-package MCP config:

{
  "mcpServers": {
    "contextcrumb": {
      "command": "uvx",
      "args": [
        "--from",
        "contextcrumb[mcp]",
        "contextcrumb-mcp"
      ]
    }
  }
}
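If your client already has an MCP config with other servers, you will want to merge this entry in rather than overwrite the file. A minimal sketch, assuming the config uses the same top-level "mcpServers" key shown above; `add_contextcrumb` and the "other-server" entry are hypothetical, and where the config file lives depends on your client.

```python
import json

CONTEXTCRUMB_SERVER = {
    "command": "uvx",
    "args": ["--from", "contextcrumb[mcp]", "contextcrumb-mcp"],
}


def add_contextcrumb(config: dict) -> dict:
    """Return a copy of config with the contextcrumb server entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep existing servers
    servers["contextcrumb"] = CONTEXTCRUMB_SERVER
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
print(json.dumps(add_contextcrumb(existing), indent=2))
```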

The MCP server exposes:

compress_text
compress_file

ContextCrumb also ships contextcrumb-shrink, an MCP proxy that compresses verbose catalog descriptions before an agent sees them while forwarding tool names, schemas, calls, results, and resource contents unchanged. This is useful when an agent client repeatedly spends context and subscription usage just looking at long tool descriptions.
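The idea behind the proxy can be shown in a few lines. This is a sketch of the concept, not the contextcrumb-shrink implementation: it assumes tool entries are dicts with the name, description, and inputSchema fields used in MCP tool listings, and `drop_filler` is a hypothetical stand-in for the real compressor.

```python
from typing import Callable


def shrink_catalog(tools: list[dict], compress: Callable[[str], str]) -> list[dict]:
    """Return copies of the tool entries with only `description` compressed."""
    shrunk = []
    for tool in tools:
        entry = dict(tool)  # shallow copy; name and inputSchema pass through
        if isinstance(entry.get("description"), str):
            entry["description"] = compress(entry["description"])
        shrunk.append(entry)
    return shrunk


def drop_filler(text: str) -> str:
    """Hypothetical stand-in compressor: drop a few filler words."""
    filler = {"this", "the", "a", "an"}
    return " ".join(w for w in text.split() if w.lower() not in filler)


# Hypothetical tool entry for illustration.
catalog = [{
    "name": "search_docs",
    "description": "This tool searches the project docs.",
    "inputSchema": {"type": "object", "properties": {"query": {"type": "string"}}},
}]
print(shrink_catalog(catalog, drop_filler))
```

Only the description changes; the name and schema are the same objects the upstream server returned, which is what lets calls keep working unchanged.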

Model

Model weights and a hosted demo are public on Hugging Face:

Roadmap

Planned for later:

  • Public docs for advanced compression modes and service deployment.
  • JavaScript or TypeScript client.
  • Hosted API experiments.
  • npm publishing.

Development

uv pip install --python .\.venv\Scripts\python.exe -e ".[dev,mcp]"
.\.venv\Scripts\python.exe -m pytest
.\.venv\Scripts\python.exe -m build

Release notes are tracked in CHANGELOG.md.

License

MIT. See LICENSE.