jDocMunch-MCP
jDocMunch-MCP lets AI agents navigate documentation by section instead of reading files by brute force.
Stop Feeding Documentation Trees to Your AI
Most AI agents still explore documentation the expensive way:
open file → skim hundreds of irrelevant paragraphs → open another file → repeat
That burns tokens, floods context windows with noise, and forces models to reason through a lot of text they never needed in the first place.
It indexes a documentation set once, then retrieves exactly the section the agent actually needs, with byte-precise extraction from the original file.
| Task | Traditional approach | With jDocMunch |
|---|---|---|
| Find a configuration section | ~12,000 tokens | ~400 tokens |
| Browse documentation structure | ~40,000 tokens | ~800 tokens |
| Explore a full doc set | ~100,000 tokens | ~2,000 tokens |
Index once. Query cheaply forever.
Precision context beats brute-force context.
jDocMunch MCP
AI-native documentation navigation for serious agents
Commercial licenses
jDocMunch-MCP is free for non-commercial use.
Commercial use requires a paid license.
jDocMunch-only licenses
- Builder — $29 — 1 developer
- Studio — $99 — up to 5 developers
- Platform — $499 — org-wide internal deployment
Want both code and docs retrieval?
Stop dumping documentation files into context windows. Start navigating docs structurally.
jDocMunch indexes documentation once by heading hierarchy and section structure, then gives MCP-compatible agents precise access to the explanations they actually need instead of forcing them to brute-read files.
It is built for workflows where token efficiency, context hygiene, and agent reliability matter.
Why this exists
Large context windows do not fix bad retrieval.
Agents waste money and reasoning bandwidth when they:
- open entire documents to find one configuration block
- repeatedly re-read headings, boilerplate, and unrelated sections
- lose important explanations inside oversized context payloads
- consume documentation as flat text instead of structured knowledge
jDocMunch fixes that by changing the unit of access from file to section.
Instead of handing an agent an entire document, it can retrieve exactly:
- an installation section
- a configuration section
- an API explanation
- a troubleshooting section
- a specific subtree of related headings
That makes documentation exploration cheaper, faster, and more stable.
What makes it different
Section-first retrieval
Search and retrieve documentation by section, not just file path or keyword match.
Byte-precise extraction
Full content is pulled on demand from exact byte offsets into the original file.
Stable section IDs
Sections retain durable identities across re-indexing when the file path, heading text, heading level, and parent heading chain remain unchanged.
Local-first architecture
Indexes and raw docs are stored locally. No hosted dependency required.
MCP-native workflow
Works with Claude Desktop, Claude Code, Google Antigravity, and other MCP-compatible clients.
What gets indexed
Every section stores:
- title and heading level
- one-line summary
- extracted tags and references
- SHA-256 content hash for drift detection
- byte offsets into the original file
This allows agents to discover documentation structurally, then request only the specific section they need.
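As a rough illustration of the fields listed above, the sketch below models one index entry and shows how a stored SHA-256 hash plus byte offsets could flag drift. The class and field names (`SectionRecord`, `byte_start`, `byte_end`, `content_hash`) are assumptions made for this example, not jDocMunch's actual schema.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SectionRecord:
    """Hypothetical shape of one index entry; field names are illustrative."""
    title: str
    level: int
    summary: str
    byte_start: int      # offset of the section's first byte in the file
    byte_end: int        # offset one past the section's last byte
    content_hash: str    # SHA-256 hex digest of the section bytes
    tags: list[str] = field(default_factory=list)


def make_record(raw: bytes, title: str, level: int, start: int, end: int) -> SectionRecord:
    return SectionRecord(
        title=title,
        level=level,
        summary=title,  # stand-in: reuse the title as the one-line summary
        byte_start=start,
        byte_end=end,
        content_hash=hashlib.sha256(raw[start:end]).hexdigest(),
    )


def has_drifted(raw: bytes, record: SectionRecord) -> bool:
    """True when the bytes at the stored offsets no longer match the hash."""
    current = hashlib.sha256(raw[record.byte_start:record.byte_end]).hexdigest()
    return current != record.content_hash


doc = "# Install\nRun pip.\n# Config\nSet the path.\n".encode("utf-8")
record = make_record(doc, "Config", 1, doc.index(b"# Config"), len(doc))

edited = doc.replace(b"Set the path.", b"Set the port.")
print(has_drifted(doc, record))     # unchanged file
print(has_drifted(edited, record))  # same offsets, different bytes
```

A mismatch only says the indexed span is stale; what to do next (re-index, warn, or fall back to a full read) is a separate policy decision.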
Why agents need this
Traditional doc retrieval methods all break in different ways:
- File scanning loads far too much irrelevant text
- Keyword search finds terms but often loses context
- Chunking breaks authored hierarchy and separates explanations from examples
jDocMunch preserves the structure the human author intended:
- heading hierarchy
- parent/child relationships
- section boundaries
- coherent explanatory units
Agents do not need bigger context windows.
They need better navigation.
How it works
jDocMunch implements jMRI-Full — the open specification for structured retrieval MCP servers. jMRI-Full covers the full stack: discover, search, retrieve, and metadata operations with batch retrieval, hash-based drift detection, byte-offset addressing, and a complete _meta envelope on every call.
- Discovery: GitHub API or local directory walk
- Security filtering: traversal protection, secret exclusion, binary detection
- Parsing: format-aware section splitting, either heading-based (Markdown/MDX/HTML/RST/AsciiDoc), structure-based (OpenAPI tags, JSON keys, XML elements), or cell-based (Jupyter)
- Hierarchy wiring: parent/child relationships established
- Summarization: heading text → AI batch summaries → title fallback
- Storage: JSON index + raw files stored locally under ~/.doc-index/
- Retrieval: O(1) byte-offset seeking via stable section IDs
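The parsing and retrieval steps can be sketched in a few lines. This is a deliberately simplified illustration, not the project's parser: it handles only ATX Markdown headings, ignores setext headings and `#` lines inside code blocks, and gives each section a flat span that ends at the next heading of any level. The function names `split_sections` and `read_section` are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# ATX heading: 1-6 '#' characters, whitespace, then the title.
HEADING = re.compile(rb"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def split_sections(raw: bytes) -> list[dict]:
    """Return one entry per ATX heading with the byte span it covers."""
    matches = list(HEADING.finditer(raw))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        sections.append({
            "title": m.group(2).decode("utf-8"),
            "level": len(m.group(1)),
            "start": m.start(),
            "end": end,
        })
    return sections


def read_section(path: Path, start: int, end: int) -> str:
    """Seek straight to the stored offset and read only that span."""
    with path.open("rb") as fh:
        fh.seek(start)
        return fh.read(end - start).decode("utf-8")


text = "# Guide\nIntro.\n## Install\npip install it.\n## Usage\nRun it.\n"
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "guide.md"
    doc.write_bytes(text.encode("utf-8"))
    index = split_sections(doc.read_bytes())
    install = next(s for s in index if s["title"] == "Install")
    print(read_section(doc, install["start"], install["end"]))
```

Working on bytes rather than decoded text keeps the offsets valid for files containing multi-byte UTF-8 characters, which is the property that makes a direct seek safe.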
Stable section IDs
{repo}::{doc_path}::{ancestor-chain/slug}#{level}
The slug is prefixed with the ancestor heading chain, making IDs both readable and stable. A new heading inserted in one branch of a document never renumbers IDs in another branch.
Examples:
owner/repo::docs/install.md::installation#1
owner/repo::docs/install.md::installation/prerequisites#3
owner/repo::README.md::usage/configuration/advanced-configuration#4
local/myproject::guide.md::configuration#2
IDs remain stable across re-indexing when the file path, heading text, heading level, and parent heading chain do not change.
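The ID format can be reproduced with a small helper. The slug rule used here (lowercase, runs of non-alphanumeric characters collapsed to a hyphen) is an assumption for illustration; the real slugging rules may differ in edge cases such as punctuation or duplicate headings.

```python
import re


def slugify(heading: str) -> str:
    """Lowercase and hyphenate a heading (an assumed slug rule)."""
    return re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-")


def section_id(repo: str, doc_path: str, ancestors: list[str], heading: str, level: int) -> str:
    """Build {repo}::{doc_path}::{ancestor-chain/slug}#{level}."""
    chain = "/".join(slugify(h) for h in [*ancestors, heading])
    return f"{repo}::{doc_path}::{chain}#{level}"


print(section_id("owner/repo", "docs/install.md", [], "Installation", 1))
# owner/repo::docs/install.md::installation#1
print(section_id("owner/repo", "README.md", ["Usage", "Configuration"], "Advanced Configuration", 4))
# owner/repo::README.md::usage/configuration/advanced-configuration#4
```

Because the ID is derived only from the path, the ancestor chain, the heading text, and the level, editing the body of a section or adding a heading in a sibling branch leaves it unchanged.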
Installation
Prerequisites
- Python 3.10+
- pip
Install
pip install jdocmunch-mcp
Verify:
jdocmunch-mcp --help
Configure an MCP client
PATH note: MCP clients often run with a restricted environment where jdocmunch-mcp may not be found even if it works in your shell. Using uvx is the recommended approach because it resolves the package on demand without relying on your system PATH. If you prefer pip install, use the absolute path to the executable instead.
Common executable paths
- Linux: /home/<username>/.local/bin/jdocmunch-mcp
- macOS: /Users/<username>/.local/bin/jdocmunch-mcp
- Windows: C:\\Users\\<username>\\AppData\\Roaming\\Python\\Python3xx\\Scripts\\jdocmunch-mcp.exe
Claude Desktop / Claude Code
Config file location:
| OS | Path |
|---|---|
| macOS | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Linux | ~/.config/claude/claude_desktop_config.json |
| Windows | %APPDATA%\Claude\claude_desktop_config.json |
Minimal config
{
  "mcpServers": {
    "jdocmunch": {
      "command": "uvx",
      "args": ["jdocmunch-mcp"]
    }
  }
}
With optional AI summaries and GitHub auth
{
  "mcpServers": {
    "jdocmunch": {
      "command": "uvx",
      "args": ["jdocmunch-mcp"],
      "env": {
        "GITHUB_TOKEN": "ghp_...",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}
After saving the config, restart Claude Desktop / Claude Code.
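If you installed with pip rather than uvx, a short script can look up the executable's absolute path and print a matching config entry. This is only a convenience sketch: `build_config` is a hypothetical helper, and it assumes the script runs in a shell where `jdocmunch-mcp` is already on PATH.

```python
import json
import shutil


def build_config(command: str, args: list[str]) -> dict:
    """Return a config dict in the mcpServers shape shown above."""
    return {"mcpServers": {"jdocmunch": {"command": command, "args": args}}}


# Prefer the absolute path of a pip-installed executable when one is found;
# otherwise fall back to the uvx form.
exe = shutil.which("jdocmunch-mcp")
config = build_config(exe, []) if exe else build_config("uvx", ["jdocmunch-mcp"])
print(json.dumps(config, indent=2))
```

`json.dumps` escapes backslashes for you, which is why Windows paths appear with doubled backslashes inside a JSON config file.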
Google Antigravity
- Open the Agent pane
- Click the ⋯ menu → MCP Servers → Manage MCP Servers
- Click View raw config to open mcp_config.json
- Add the entry below, save, then restart the MCP server
{
  "mcpServers": {
    "jdocmunch": {
      "command": "uvx",
      "args": ["jdocmunch-mcp"]
    }
  }
}
Usage examples
index_local: { "path": "/path/to/docs" }
index_repo: { "url": "owner/repo" }
get_toc: { "repo": "owner/repo" }
get_toc_tree: { "repo": "owner/repo" }
get_document_outline: { "repo": "owner/repo", "doc_path": "docs/config.md" }
search_sections: { "repo": "owner/repo", "query": "authentication" }
get_section: { "repo": "owner/repo", "section_id": "owner/repo::docs/config.md::authentication#1" }
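MCP clients normally send these calls for you, but it can help to see what goes over the wire. In the Model Context Protocol, a tool invocation is a JSON-RPC 2.0 request with method `tools/call`; the sketch below wraps a few of the arguments above in that envelope. The `tool_call` helper is hypothetical, and transport details (stdio framing, the initialization handshake) are omitted.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids


def tool_call(name: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# A typical index -> search -> retrieve sequence using the arguments above.
requests = [
    tool_call("index_repo", {"url": "owner/repo"}),
    tool_call("search_sections", {"repo": "owner/repo", "query": "authentication"}),
    tool_call("get_section", {
        "repo": "owner/repo",
        "section_id": "owner/repo::docs/config.md::authentication#1",
    }),
]
for line in requests:
    print(line)
```

The search step returns summaries only, so the agent pays for full content just once, on the final `get_section` call.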
Tool surface
| Tool | Purpose |
|---|---|
| index_local | Index a local documentation folder |
| index_repo | Index a GitHub repository’s docs |
| list_repos | List indexed documentation sets |
| get_toc | Flat section list in document order |
| get_toc_tree | Nested section tree per document |
| get_document_outline | Section hierarchy for one document |
| search_sections | Weighted search returning summaries only |
| get_section | Full content of one section |
| get_sections | Batch content retrieval |
| get_section_context | Section + ancestor headings + child summaries |
| delete_index | Remove a doc index |
Search and retrieval tools include a _meta envelope with timing, token savings, and cost avoided.
Example:
"_meta": {
"latency_ms": 12,
"sections_returned": 5,
"tokens_saved": 1840,
"total_tokens_saved": 94320,
"cost_avoided": { "claude_opus": 0.0276, "gpt5_latest": 0.0184 },
"total_cost_avoided": { "claude_opus": 1.4148, "gpt5_latest": 0.9432 }
}
total_tokens_saved and total_cost_avoided accumulate across tool calls and persist to ~/.doc-index/_savings.json.
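The example figures are internally consistent, which makes the arithmetic easy to check. The per-million-token rates below are inferred from the example itself (0.0276 / 1840 and 0.0184 / 1840), not quoted from any price list, and the function names are hypothetical; writing to `_savings.json` is left out.

```python
# Rates in USD per million tokens, inferred from the example envelope above.
USD_PER_MILLION = {"claude_opus": 15, "gpt5_latest": 10}


def cost_avoided(tokens_saved: int) -> dict[str, float]:
    """Convert a token count into avoided cost per model."""
    return {
        model: round(tokens_saved * rate / 1_000_000, 4)
        for model, rate in USD_PER_MILLION.items()
    }


def update_totals(totals: dict, tokens_saved: int) -> dict:
    """Return new running totals after one call (persistence omitted)."""
    total_tokens = totals.get("total_tokens_saved", 0) + tokens_saved
    return {
        "total_tokens_saved": total_tokens,
        "total_cost_avoided": cost_avoided(total_tokens),
    }


print(cost_avoided(1840))
# {'claude_opus': 0.0276, 'gpt5_latest': 0.0184}
totals = update_totals({"total_tokens_saved": 92480}, 1840)
print(totals["total_tokens_saved"], totals["total_cost_avoided"])
# 94320 {'claude_opus': 1.4148, 'gpt5_latest': 0.9432}
```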
Supported formats
| Format | Extensions | Notes |
|---|---|---|
| Markdown | .md, .markdown | ATX (# Heading) and setext headings |
| MDX | .mdx | JSX tags, frontmatter, import/export stripped before parsing |
| Plain text | .txt | Paragraph-block section splitting |
| reStructuredText | .rst | Adornment-based heading detection |
| AsciiDoc | .adoc | = and == heading hierarchy |
| Jupyter Notebook | .ipynb | Markdown cells used as sections; code cells attached as content |
| HTML | .html | <h1>–<h6> headings; boilerplate stripped |
| OpenAPI / Swagger | .yaml, .yml, .json, .jsonc | OpenAPI 3.x and Swagger 2.x; operations grouped by tag as sections |
| JSON / JSONC | .json, .jsonc | Top-level keys as sections; JSONC comments stripped before parsing |
| XML / SVG / XHTML | .xml, .svg, .xhtml | Element hierarchy used for section structure |
See ARCHITECTURE.md for parser details.
Security
Built-in protections include:
- path traversal prevention
- symlink escape protection
- secret file exclusion (.env, *.pem, and similar)
- binary file detection
- configurable file size limits
- storage path injection prevention via _safe_content_path()
- atomic index writes
See SECURITY.md for details.
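For readers unfamiliar with the first two protections, the general technique is to resolve a requested path and confirm it still lives under the allowed root. The sketch below shows that pattern with the standard library; `resolve_inside` is a hypothetical name and this is not the project's `_safe_content_path()` implementation, whose details are in SECURITY.md.

```python
import tempfile
from pathlib import Path


def resolve_inside(root: Path, relative: str) -> Path:
    """Resolve `relative` under `root`, rejecting anything that escapes it."""
    root = root.resolve()
    candidate = (root / relative).resolve()  # follows symlinks, collapses ".."
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes the docs root: {relative}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    print(resolve_inside(root, "docs/guide.md").name)
    try:
        resolve_inside(root, "../outside.md")
    except ValueError as exc:
        print("rejected:", exc)
```

Resolving before comparing matters: a plain string-prefix check would accept `docs/../../outside.md` and would not notice a symlink that points outside the root.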
Best use cases
- agent-driven documentation exploration
- finding configuration and API reference sections
- onboarding to unfamiliar frameworks
- token-efficient multi-agent documentation workflows
- large documentation sets with dozens of files
Not intended for
- source code symbol indexing (use jCodeMunch for that)
- real-time file watching
- cross-repository global search
- semantic/vector similarity search as a standalone product (semantic search is supported as an enhancement when embeddings are enabled via use_embeddings=true, but the core workflow is structure-first)
Environment variables
| Variable | Purpose | Required |
|---|---|---|
| GITHUB_TOKEN | GitHub API auth | No |
| ANTHROPIC_API_KEY | Section summaries via Claude Haiku | No |
| GOOGLE_API_KEY | Section summaries via Gemini Flash; also Gemini embeddings | No |
| OPENAI_API_KEY | OpenAI embeddings (text-embedding-3-small) | No |
| JDOCMUNCH_EMBEDDING_PROVIDER | Force provider: gemini, openai, sentence-transformers, none | No |
| JDOCMUNCH_ST_MODEL | sentence-transformers model (default: all-MiniLM-L6-v2) | No |
| DOC_INDEX_PATH | Custom cache path | No |
| JDOCMUNCH_SHARE_SAVINGS | Set to 0 to disable anonymous community token savings reporting | No |
Community savings meter
Each tool call can contribute an anonymous delta to a live global counter at j.gravelle.us. Only two values are sent:
- tokens saved
- a random anonymous install ID
No content, file paths, repo names, or identifying material are sent.
The anonymous install ID is generated once and stored in ~/.doc-index/_savings.json.
To disable reporting, set:
JDOCMUNCH_SHARE_SAVINGS=0
Contributing
PRs welcome! All contributors must sign the Contributor License Agreement before their PR can be merged — CLA Assistant will prompt you automatically. See CONTRIBUTING.md for details.
Documentation
License (dual use)
This repository is free for non-commercial use under the terms below. Commercial use requires a paid commercial license.
Star History
<a href="https://www.star-history.com/?repos=jgravelle%2Fjdocmunch-mcp&type=date&legend=top-left"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/image?repos=jgravelle/jdocmunch-mcp&type=date&theme=dark&legend=top-left" /> <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/image?repos=jgravelle/jdocmunch-mcp&type=date&legend=top-left" /> <img alt="Star History Chart" src="https://api.star-history.com/image?repos=jgravelle/jdocmunch-mcp&type=date&legend=top-left" /> </picture> </a>
Copyright and license text
Copyright (c) 2026 J. Gravelle
1. Non-commercial license grant (free)
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to use, copy, modify, merge, publish, and distribute the Software for personal, educational, research, hobby, or other non-commercial purposes, subject to the following conditions:
- The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
- Any modifications made to the Software must clearly indicate that they are derived from the original work, and the name of the original author (J. Gravelle) must remain intact. He's kinda full of himself.
- Redistributions of the Software in source code form must include a prominent notice describing any modifications from the original version.
2. Commercial use
Commercial use of the Software requires a separate paid commercial license from the author.
“Commercial use” includes, but is not limited to:
- use of the Software in a business environment
- internal use within a for-profit organization
- incorporation into a product or service offered for sale
- use in connection with revenue generation, consulting, SaaS, hosting, or fee-based services
For commercial licensing inquiries: [email protected] https://j.gravelle.us
Until a commercial license is obtained, commercial use is not permitted.
3. Disclaimer of warranty
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHOR OR COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT, OR OTHERWISE, ARISING FROM, OUT OF, OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.