MCP for Docs

An MCP (Model Context Protocol) server that automatically downloads and converts documentation from various sources into organized markdown files.

Overview

mcp-for-docs is designed to crawl documentation websites, convert their content to markdown format, and organize them in a structured directory system. It can also generate condensed cheat sheets from the downloaded documentation.

Features

  • ๐Ÿ•ท๏ธ Smart Documentation Crawler: Automatically crawls documentation sites with configurable depth
  • ๐Ÿ“ HTML to Markdown Conversion: Preserves code blocks, tables, and formatting
  • ๐Ÿ“ Automatic Categorization: Intelligently organizes docs into tools/APIs categories
  • ๐Ÿ“„ Cheat Sheet Generator: Creates condensed reference guides from documentation
  • ๐Ÿ” Smart Discovery System: Automatically detects existing documentation before crawling
  • ๐Ÿš€ Local-First: Uses existing downloaded docs when available
  • โšก Rate Limiting: Respects server limits and robots.txt
  • โœ… User Confirmation: Prevents accidental regeneration of existing content
  • โš™๏ธ Comprehensive Configuration: JSON-based configuration with environment variable overrides
  • ๐Ÿงช Test Suite: 94 tests covering core functionality

Installation

Prerequisites

  • Node.js 18+
  • npm or yarn
  • Claude Desktop or Claude Code CLI

Setup

  1. Clone the repository:
git clone https://github.com/shayonpal/mcp-for-docs.git
cd mcp-for-docs
  2. Install dependencies:
npm install
  3. Build the project:
npm run build
  4. Add to your MCP configuration:

For Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "mcp-for-docs": {
      "command": "node",
      "args": ["/path/to/mcp-for-docs/dist/index.js"],
      "env": {}
    }
  }
}

For Claude Code CLI (~/.claude.json):

{
  "mcpServers": {
    "mcp-for-docs": {
      "command": "node",
      "args": ["/path/to/mcp-for-docs/dist/index.js"],
      "env": {}
    }
  }
}

Usage

Crawling Documentation

To download documentation from a website:

await crawl_documentation({
  url: "https://docs.n8n.io/",
  max_depth: 3,           // Optional, defaults to 3
  force_refresh: false    // Optional, set to true to regenerate existing docs
});

The tool will first check for existing documentation and show you what's already available. To regenerate existing content, use force_refresh: true.

The documentation will be saved to:

  • Tools: /Users/shayon/DevProjects/~meta/docs/tools/[tool-name]/
  • APIs: /Users/shayon/DevProjects/~meta/docs/apis/[api-name]/

Generating Cheat Sheets

To create a cheat sheet from documentation:

await generate_cheatsheet({
  url: "https://docs.anthropic.com/",
  use_local: true,          // Use local files if available (default)
  force_regenerate: false   // Optional, set to true to regenerate existing cheatsheets
});

Cheat sheets are saved to: /Users/shayon/DevProjects/~meta/docs/cheatsheets/

The tool will check for existing cheatsheets and show you what's already available. To regenerate existing content, use force_regenerate: true.
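The only naming detail documented here is the "-Cheatsheet.md" suffix (see cheatsheet.filenameSuffix under Configuration). As a sketch of how an output filename could be derived from a docs URL — the hostname-based derivation and the helper itself are assumptions, not the project's actual code:

```javascript
// Illustrative sketch: derive a cheat sheet filename from a docs URL.
// Only the "-Cheatsheet.md" suffix comes from the documented configuration;
// everything else here is an assumption.
function cheatsheetFilename(url, suffix = '-Cheatsheet.md') {
  const hostname = new URL(url).hostname;                  // e.g. "docs.anthropic.com"
  const name = hostname.replace(/^docs\./, '').split('.')[0]; // e.g. "anthropic"
  return `${name}${suffix}`;
}

// cheatsheetFilename('https://docs.anthropic.com/') → 'anthropic-Cheatsheet.md'
```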

Listing Downloaded Documentation

To see what documentation is available locally:

await list_documentation({
  category: "all",  // Options: "tools", "apis", "all"
  include_stats: true
});
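The shape of the returned listing is not documented here. As a purely hypothetical illustration of how include_stats-style aggregation over listing entries might look (the { category } entry shape is an assumption, not the tool's actual output format):

```javascript
// Hypothetical: aggregate per-category counts from listing entries.
// The { category } entry shape is assumed, not the tool's documented output.
function summarizeListing(entries) {
  const byCategory = {};
  for (const entry of entries) {
    byCategory[entry.category] = (byCategory[entry.category] || 0) + 1;
  }
  return { total: entries.length, byCategory };
}
```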

Supported Documentation Sites

The server has been tested with:

  • n8n documentation
  • Anthropic API docs
  • Obsidian Tasks plugin docs
  • Apple Swift documentation

Most documentation sites following standard patterns should work automatically.

Recent Updates

  • Configuration System (v0.4.0): Added comprehensive JSON-based configuration with environment variable support
  • Smart Discovery: Automatically finds and reports existing documentation before crawling
  • Improved Conversion: Fixed HTML to Markdown issues including table formatting and inline code preservation
  • Dynamic Categorization: Intelligent detection of tools vs APIs based on URL patterns and content analysis
  • Test Coverage: 94 tests passing with comprehensive unit and integration testing

For detailed changes, see CHANGELOG.md.

Configuration

Initial Setup

  1. Copy the example configuration:
cp config.example.json config.json
  2. Edit config.json and update the docsBasePath for your machine:
{
  "docsBasePath": "/Users/yourusername/path/to/docs"
}

Important: The config.json file is tracked in git. When you clone this repository on a different machine, you'll need to update the docsBasePath to match that machine's directory structure.

How Documentation Organization Works

The tool automatically organizes documentation based on content analysis:

  1. You provide a URL when calling the tool (e.g., https://docs.n8n.io)
  2. The categorizer analyzes the content and determines if it's:
    • tools/ - Software tools, applications, plugins
    • apis/ - API references, SDK documentation
  3. Documentation is saved to: {docsBasePath}/{category}/{tool-name}/

For example:

  • https://docs.n8n.io โ†’ /Users/shayon/DevProjects/~meta/docs/tools/n8n/
  • https://docs.anthropic.com โ†’ /Users/shayon/DevProjects/~meta/docs/apis/anthropic/

This happens automatically - you don't need to configure anything per-site!
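As a rough sketch of the URL side of that decision — the keyword lists below are illustrative assumptions, not the project's actual rules, and the real categorizer also analyzes page content:

```javascript
// Simplified URL-based heuristic for the tools-vs-apis decision.
// Keyword lists are illustrative; the actual categorizer also inspects content.
function categorize(url) {
  const { hostname, pathname } = new URL(url);
  const text = `${hostname}${pathname}`.toLowerCase();
  const apiHints = ['api', 'sdk', 'reference'];
  return apiHints.some((hint) => text.includes(hint)) ? 'apis' : 'tools';
}
```

By this heuristic alone, https://docs.n8n.io falls into tools/; content analysis is what would push a site like docs.anthropic.com into apis/.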

Configuration Options

| Setting | Description | Default |
|---------|-------------|---------|
| docsBasePath | Where to store all documentation | Required - no default |
| crawler.defaultMaxDepth | How many levels deep to crawl | 3 |
| crawler.defaultRateLimit | Requests per second | 2 |
| crawler.pageTimeout | Page load timeout (ms) | 30000 |
| crawler.userAgent | Browser identification | MCP-for-docs/1.0 |
| cheatsheet.maxLength | Max characters in cheatsheet | 10000 |
| cheatsheet.filenameSuffix | Append to cheatsheet names | -Cheatsheet.md |
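Putting these settings together, a complete config.json might look like the following; the nesting is inferred from the dotted setting names above, so treat the exact structure as an assumption:

```json
{
  "docsBasePath": "/Users/yourusername/path/to/docs",
  "crawler": {
    "defaultMaxDepth": 3,
    "defaultRateLimit": 2,
    "pageTimeout": 30000,
    "userAgent": "MCP-for-docs/1.0"
  },
  "cheatsheet": {
    "maxLength": 10000,
    "filenameSuffix": "-Cheatsheet.md"
  }
}
```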

Multi-Machine Setup

Since config.json is tracked in git:

  1. First machine: Set your docsBasePath and commit
  2. Other machines: After cloning, update docsBasePath to match that machine
  3. Use environment variable to override without changing the file:
    export DOCS_BASE_PATH="/different/path/on/this/machine"
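In code, that precedence is simply the environment variable winning over the file value; a sketch, with an illustrative helper name that is not the project's actual API:

```javascript
// Sketch of the override precedence: DOCS_BASE_PATH (environment) wins
// over docsBasePath from config.json. Function name is illustrative.
function resolveDocsBasePath(fileConfig, env = process.env) {
  return env.DOCS_BASE_PATH || fileConfig.docsBasePath;
}
```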
    

Development

# Install dependencies
npm install

# Run in development mode
npm run dev

# Run tests
npm test

# Build for production
npm run build

# Lint code
npm run lint

Architecture

  • Crawler: Uses Playwright for JavaScript-rendered pages
  • Parser: Extracts content using configurable selectors
  • Converter: Turndown library with custom rules for markdown
  • Categorizer: Smart detection of tools vs APIs
  • Storage: Organized file system structure

Known Issues

  • URL Structure Preservation (#15): Currently flattens URL structure when saving docs
  • Large Documentation Sites (#14): No document limit for very large sites
  • GitHub Repository Docs (#9): Specialized crawler for GitHub repos not yet implemented

See all open issues for the complete roadmap.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Update CHANGELOG.md
  5. Submit a pull request

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Acknowledgments

Related Servers