MCP for Docs
An MCP (Model Context Protocol) server that automatically downloads and converts documentation from various sources into organized markdown files.
Overview
mcp-for-docs crawls documentation websites, converts their content to Markdown, and organizes the results in a structured directory tree. It can also generate condensed cheat sheets from the downloaded documentation.
Features
- Smart Documentation Crawler: Automatically crawls documentation sites with configurable depth
- HTML to Markdown Conversion: Preserves code blocks, tables, and formatting
- Automatic Categorization: Intelligently organizes docs into tools/APIs categories
- Cheat Sheet Generator: Creates condensed reference guides from documentation
- Smart Discovery System: Automatically detects existing documentation before crawling
- Local-First: Uses existing downloaded docs when available
- Rate Limiting: Respects server limits and robots.txt
- User Confirmation: Prevents accidental regeneration of existing content
- Comprehensive Configuration: JSON-based configuration with environment variable overrides
- Test Suite: 94 tests covering core functionality
Installation
Prerequisites
- Node.js 18+
- npm or yarn
- Claude Desktop or Claude Code CLI
Setup
- Clone the repository:
git clone https://github.com/shayonpal/mcp-for-docs.git
cd mcp-for-docs
- Install dependencies:
npm install
- Build the project:
npm run build
- Add to your MCP configuration:
For Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "mcp-for-docs": {
      "command": "node",
      "args": ["/path/to/mcp-for-docs/dist/index.js"],
      "env": {}
    }
  }
}
For Claude Code CLI (~/.claude.json):
{
  "mcpServers": {
    "mcp-for-docs": {
      "command": "node",
      "args": ["/path/to/mcp-for-docs/dist/index.js"],
      "env": {}
    }
  }
}
Usage
Crawling Documentation
To download documentation from a website:
await crawl_documentation({
  url: "https://docs.n8n.io/",
  max_depth: 3,          // Optional, defaults to 3
  force_refresh: false   // Optional, set to true to regenerate existing docs
});
The tool will first check for existing documentation and show you what's already available. To regenerate existing content, use force_refresh: true.
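To illustrate what a depth limit such as max_depth means in practice, here is a minimal sketch of a depth-limited breadth-first traversal over an in-memory link graph. It is not the project's crawler: `LinkGraph` and `crawlOrder` are hypothetical names, and treating the start URL as depth 0 is an assumption about how depth is counted.

```typescript
// Hypothetical in-memory link graph standing in for real pages and their links.
type LinkGraph = Record<string, string[]>;

// Breadth-first traversal that visits pages up to and including maxDepth.
// Assumption: the start URL counts as depth 0.
function crawlOrder(graph: LinkGraph, start: string, maxDepth: number): string[] {
  const visited = new Set<string>([start]);
  const order: string[] = [];
  let frontier: string[] = [start];

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      order.push(url);
      // Queue unseen links for the next depth level.
      for (const link of graph[url] ?? []) {
        if (!visited.has(link)) {
          visited.add(link);
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return order;
}
```

Under these assumptions, a smaller depth visits only the start page and the pages closest to it, while a larger depth reaches pages further down the link tree.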
The documentation will be saved to:
- Tools: /Users/shayon/DevProjects/~meta/docs/tools/[tool-name]/
- APIs: /Users/shayon/DevProjects/~meta/docs/apis/[api-name]/
Generating Cheat Sheets
To create a cheat sheet from documentation:
await generate_cheatsheet({
  url: "https://docs.anthropic.com/",
  use_local: true,           // Use local files if available (default)
  force_regenerate: false    // Optional, set to true to regenerate existing cheatsheets
});
Cheat sheets are saved to: /Users/shayon/DevProjects/~meta/docs/cheatsheets/
The tool will check for existing cheatsheets and show you what's already available. To regenerate existing content, use force_regenerate: true.
Listing Downloaded Documentation
To see what documentation is available locally:
await list_documentation({
  category: "all",   // Options: "tools", "apis", "all"
  include_stats: true
});
Supported Documentation Sites
The server has been tested with:
- n8n documentation
- Anthropic API docs
- Obsidian Tasks plugin docs
- Apple Swift documentation
Most documentation sites following standard patterns should work automatically.
Recent Updates
- Configuration System (v0.4.0): Added comprehensive JSON-based configuration with environment variable support
- Smart Discovery: Automatically finds and reports existing documentation before crawling
- Improved Conversion: Fixed HTML to Markdown issues including table formatting and inline code preservation
- Dynamic Categorization: Intelligent detection of tools vs APIs based on URL patterns and content analysis
- Test Coverage: 94 tests passing with comprehensive unit and integration testing
For detailed changes, see CHANGELOG.md.
Configuration
Initial Setup
- Copy the example configuration:
cp config.example.json config.json
- Edit config.json and update the docsBasePath for your machine:
{
  "docsBasePath": "/Users/yourusername/path/to/docs"
}
Important: The config.json file is tracked in git. When you clone this repository on a different machine, you'll need to update the docsBasePath to match that machine's directory structure.
How Documentation Organization Works
The tool automatically organizes documentation based on content analysis:
- You provide a URL when calling the tool (e.g., https://docs.n8n.io)
- The categorizer analyzes the content and determines if it's:
  - tools/ - Software tools, applications, plugins
  - apis/ - API references, SDK documentation
- Documentation is saved to: {docsBasePath}/{category}/{tool-name}/
For example:
- https://docs.n8n.io → /Users/shayon/DevProjects/~meta/docs/tools/n8n/
- https://docs.anthropic.com → /Users/shayon/DevProjects/~meta/docs/apis/anthropic/
This happens automatically - you don't need to configure anything per-site!
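The {docsBasePath}/{category}/{tool-name}/ layout can be sketched as a small helper. This is an illustration only: `toolNameFromUrl` and `outputDir` are hypothetical names, the hostname-trimming rule is an assumption chosen to reproduce the two examples above, and the category is passed in rather than detected, since the real categorizer relies on content analysis.

```typescript
type Category = "tools" | "apis";

// Derive a folder name from the hostname, e.g. "docs.n8n.io" -> "n8n".
// Assumption: drop a leading "docs" or "www" label, then drop the final (TLD) label.
function toolNameFromUrl(url: string): string {
  const labels = new URL(url).hostname.split(".");
  if (labels.length > 2 && (labels[0] === "docs" || labels[0] === "www")) {
    labels.shift();
  }
  if (labels.length > 1) {
    labels.pop();
  }
  return labels.join("-");
}

// Build the target directory; trailing slashes on the base path are trimmed first.
function outputDir(docsBasePath: string, category: Category, url: string): string {
  const base = docsBasePath.replace(/\/+$/, "");
  return `${base}/${category}/${toolNameFromUrl(url)}/`;
}
```

With a base path of /Users/shayon/DevProjects/~meta/docs, this sketch maps https://docs.n8n.io in the tools category to the same directory shown in the example above.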
Configuration Options
| Setting | Description | Default |
|---|---|---|
| docsBasePath | Where to store all documentation | Required - no default |
| crawler.defaultMaxDepth | How many levels deep to crawl | 3 |
| crawler.defaultRateLimit | Requests per second | 2 |
| crawler.pageTimeout | Page load timeout (ms) | 30000 |
| crawler.userAgent | Browser identification | MCP-for-docs/1.0 |
| cheatsheet.maxLength | Max characters in cheatsheet | 10000 |
| cheatsheet.filenameSuffix | Append to cheatsheet names | -Cheatsheet.md |
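As a rough sketch of how these options and their defaults fit together, the following shows one way to type the configuration and fill in missing values. The names `DocsConfig`, `UserConfig`, `DEFAULTS`, and `withDefaults` are hypothetical, and the nested shape is an assumption inferred from the dotted setting names; the project's actual loader may differ.

```typescript
interface DocsConfig {
  docsBasePath: string;
  crawler: {
    defaultMaxDepth: number;
    defaultRateLimit: number; // requests per second
    pageTimeout: number;      // milliseconds
    userAgent: string;
  };
  cheatsheet: {
    maxLength: number;        // characters
    filenameSuffix: string;
  };
}

// What a user might supply in config.json: every field optional except as checked below.
type UserConfig = {
  docsBasePath?: string;
  crawler?: Partial<DocsConfig["crawler"]>;
  cheatsheet?: Partial<DocsConfig["cheatsheet"]>;
};

// Default values taken from the table above.
const DEFAULTS = {
  crawler: { defaultMaxDepth: 3, defaultRateLimit: 2, pageTimeout: 30000, userAgent: "MCP-for-docs/1.0" },
  cheatsheet: { maxLength: 10000, filenameSuffix: "-Cheatsheet.md" },
};

// Merge user values over the defaults; docsBasePath has no default and must be present.
function withDefaults(user: UserConfig): DocsConfig {
  if (!user.docsBasePath) {
    throw new Error("docsBasePath is required and has no default");
  }
  return {
    docsBasePath: user.docsBasePath,
    crawler: { ...DEFAULTS.crawler, ...user.crawler },
    cheatsheet: { ...DEFAULTS.cheatsheet, ...user.cheatsheet },
  };
}
```

In this sketch, a config.json containing only docsBasePath would still yield a complete configuration, and any crawler or cheatsheet value the user sets replaces just that one default.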
Multi-Machine Setup
Since config.json is tracked in git:
- First machine: Set your docsBasePath and commit
- Other machines: After cloning, update docsBasePath to match that machine
- Use environment variable to override without changing the file:
export DOCS_BASE_PATH="/different/path/on/this/machine"
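The override order described above can be sketched as follows. This is a minimal illustration, not the project's code: `resolveDocsBasePath` and the `Env` type are hypothetical, and ignoring an empty or whitespace-only variable is an assumption.

```typescript
// A plain map of environment variables, so the helper does not depend on any runtime.
type Env = Record<string, string | undefined>;

// DOCS_BASE_PATH wins when set; otherwise fall back to the value from config.json.
function resolveDocsBasePath(fileValue: string | undefined, env: Env): string {
  const fromEnv = env["DOCS_BASE_PATH"];
  const chosen = fromEnv !== undefined && fromEnv.trim() !== "" ? fromEnv : fileValue;
  if (!chosen) {
    throw new Error("docsBasePath is not set in config.json or DOCS_BASE_PATH");
  }
  return chosen;
}
```

In a Node.js process, one would typically pass `process.env` as the second argument, so the committed config.json can stay unchanged on every machine.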
Development
# Install dependencies
npm install
# Run in development mode
npm run dev
# Run tests
npm test
# Build for production
npm run build
# Lint code
npm run lint
Architecture
- Crawler: Uses Playwright for JavaScript-rendered pages
- Parser: Extracts content using configurable selectors
- Converter: Turndown library with custom rules for markdown
- Categorizer: Smart detection of tools vs APIs
- Storage: Organized file system structure
Known Issues
- URL Structure Preservation (#15): Currently flattens URL structure when saving docs
- Large Documentation Sites (#14): No document limit for very large sites
- GitHub Repository Docs (#9): Specialized crawler for GitHub repos not yet implemented
See all open issues for the complete roadmap.
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Update CHANGELOG.md
- Submit a pull request
License
This project is licensed under the GPL 3.0 License - see the LICENSE file for details.
Acknowledgments
- Built with the Model Context Protocol SDK
- Uses Playwright for web scraping
- Markdown conversion powered by Turndown