Website to Markdown MCP Server

Fetches and converts website content to Markdown with AI-powered cleanup, OpenAPI support, and stealth browsing.

A Model Context Protocol (MCP) server that fetches website content and converts it to Markdown, so that AI assistants can read and process website information more easily.

✨ Key Features

| 🌟 Enhanced Processing | 📊 OpenAPI Support | ⚙️ Smart Analysis | 🎯 Advanced Extraction |
| --- | --- | --- | --- |
| AI-powered content cleanup | OpenAPI 3.x/Swagger 2.0 | Reading time calculation | Main content detection |
| Auto ad removal | Professional validation | Word count statistics | Language detection |
| Content summarization | Structured API parsing | Smart retry mechanism | Multi-format support |

🆕 What's New in v1.2.0

🚀 Major Enhancements

| Feature | Status | Description |
| --- | --- | --- |
| 🧠 Enhanced Content Processor | ✅ | AI-powered content cleaning and extraction |
| 📊 Smart Analytics | ✅ | Word count, reading time, content summary |
| 🌍 Language Detection | ✅ | Automatic language identification |
| 🎯 Intelligent Retry | ✅ | Smart retry mechanism with exponential backoff |
| 🔐 Stealth Browser | ✅ | Anti-detection browsing capabilities |
| ⚡ Rate Limiting | ✅ | Built-in rate limiting and concurrency control |
| 🧹 Content Cleanup | ✅ | Remove ads, navigation, and irrelevant content |
| 📝 Enhanced Markdown | ✅ | Support for strikethrough, underline, highlights |

🚀 Quick Start

🎯 Method 1: NPX Installation (🌟 Recommended)

💡 Easiest way: no local installation needed.

Step 1: Create Configuration File 📄

Create a my-websites.json file:

{
  "websites": [
    {
      "name": "your_website",
      "url": "https://your-website.com",
      "description": "Your Project Website"
    },
    {
      "name": "api_docs",
      "url": "https://api.example.com/openapi.json",
      "description": "Your API Specification"
    }
  ]
}
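
Before pointing the server at a new file, it can help to sanity-check its shape. The following is a minimal sketch, not the server's own validation (the dependency list suggests that is done with zod). The function name `checkWebsitesConfig`, the `name` character rule, and the `http(s)` URL rule are assumptions made for illustration.

```typescript
// Shape of one entry in my-websites.json, as shown in the example above.
interface WebsiteEntry {
  name: string;
  url: string;
  description: string;
}

// Returns a list of problems; an empty list means the config looks usable.
function checkWebsitesConfig(raw: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    return [`Invalid JSON: ${String(err)}`];
  }
  if (typeof parsed !== "object" || parsed === null) {
    return ["Top level must be a JSON object"];
  }
  const websites = (parsed as { websites?: unknown }).websites;
  if (!Array.isArray(websites)) {
    return ['"websites" must be an array'];
  }

  const problems: string[] = [];
  const seenNames = new Set<string>();
  websites.forEach((entry: unknown, i: number) => {
    const e = (entry ?? {}) as Partial<Record<keyof WebsiteEntry, unknown>>;
    for (const key of ["name", "url", "description"] as const) {
      if (typeof e[key] !== "string" || e[key] === "") {
        problems.push(`websites[${i}].${key} must be a non-empty string`);
      }
    }
    const name = e.name;
    if (typeof name === "string" && name !== "") {
      // Assumption: names become part of a tool name, so keep them identifier-like.
      if (!/^[A-Za-z0-9_]+$/.test(name)) {
        problems.push(`websites[${i}].name should use only letters, digits, and underscores`);
      }
      if (seenNames.has(name)) {
        problems.push(`websites[${i}].name duplicates an earlier entry`);
      }
      seenNames.add(name);
    }
    const url = e.url;
    if (typeof url === "string" && url !== "" && !/^https?:\/\//.test(url)) {
      // Assumption: only http(s) URLs are meaningful here.
      problems.push(`websites[${i}].url should start with http:// or https://`);
    }
  });
  return problems;
}
```

Pass the file's text to the function (for example, read with `fs.readFileSync(path, "utf8")`) and print any returned problems before restarting Cursor.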

Step 2: Configure MCP Server ⚙️

Add to .cursor/mcp.json:

{
  "mcpServers": {
    "website-to-markdown": {
      "command": "npx",
      "args": ["-y", "website-to-markdown-mcp"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG_PATH": "./my-websites.json"
      }
    }
  }
}

Step 3: Restart and Test 🔄

  1. Restart Cursor
  2. Open Chat and switch to Agent mode
  3. Send a test prompt: Please list all configured websites

🎉 Done! No local installation required.


🎯 Method 2: Local Installation

💡 Best Practice: Use this method for development or customization.

Step 1: Clone and Build

git clone https://github.com/your-username/website-to-markdown-mcp.git
cd website-to-markdown-mcp
npm install
npm run build

Step 2: Configure MCP Server

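Add to .cursor/mcp.json. The cmd /c wrapper shown here is Windows-specific; on macOS or Linux, set "command" to "node" and pass only the script path in "args":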

{
  "mcpServers": {
    "website-to-markdown": {
      "command": "cmd",
      "args": ["/c", "node", "./website-to-markdown-mcp/dist/index.js"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG_PATH": "./my-websites.json"
      }
    }
  }
}

🔥 Enhanced Output Features

📊 Rich Content Analysis

Every fetched page now includes:

  • 📝 Content Summary: AI-generated summary of the main content
  • ⏱️ Reading Time: Estimated reading time based on content length
  • 🔢 Word Count: Word count that handles both English and Chinese text
  • 🌍 Language Detection: Automatic language identification
  • 🎯 Content Quality Score: Assessment of content relevance
📋 Enhanced Markdown Output

# 🚀 Example Website

**Source**: https://example.com
**Website**: example_site - Example Website
**📊 Reading Time**: 5 minutes
**🔢 Word Count**: 1,250 words
**🌍 Language**: English
**📝 Summary**: This article discusses the latest developments in web technology...

---

[Enhanced Markdown content with better formatting...]

🆕 Complete OpenAPI/Swagger Support

🔥 Professional API Documentation

| Feature | OpenAPI 3.x | Swagger 2.0 | Description |
| --- | --- | --- | --- |
| 🔍 Auto Detection | ✅ | ✅ | Supports JSON and YAML formats |
| ✅ Professional Validation | ✅ | ✅ | Uses @readme/openapi-parser |
| 📋 Structured Parsing | ✅ | ✅ | Endpoints, parameters, responses |
| 🔗 Reference Resolution | ✅ | ✅ | Automatically handles $ref references |
| 📊 Smart Summary | ✅ | ✅ | Generates an API overview |
| 📝 Formatted Output | ✅ | ✅ | Readable Markdown |
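
Auto detection between the two spec families can rely on their required version fields: OpenAPI 3.x documents carry an `openapi` string such as "3.0.3", while Swagger 2.0 documents carry `swagger: "2.0"`. The sketch below shows that check on an already-parsed document. The function name `detectSpecKind` is hypothetical, and the server's real detection logic may differ.

```typescript
type SpecKind = "openapi-3" | "swagger-2" | "unknown";

// Classifies a parsed JSON (or YAML-derived) document by its version field.
function detectSpecKind(doc: unknown): SpecKind {
  if (typeof doc !== "object" || doc === null) {
    return "unknown";
  }
  const { openapi, swagger } = doc as { openapi?: unknown; swagger?: unknown };

  // OpenAPI 3.x: "openapi" is a version string like "3.0.3" or "3.1.0".
  if (typeof openapi === "string" && /^3\.\d+(\.\d+)?/.test(openapi)) {
    return "openapi-3";
  }
  // Swagger 2.0: "swagger" must be exactly the string "2.0".
  if (swagger === "2.0") {
    return "swagger-2";
  }
  return "unknown";
}
```

For a YAML spec, parse it into an object first and then pass the result to the same function; anything classified as "unknown" can fall back to ordinary HTML-to-Markdown handling.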

🌟 Pre-configured Example Websites

{
  "websites": [
    {
      "name": "petstore_openapi",
      "url": "https://petstore3.swagger.io/api/v3/openapi.json",
      "description": "🐕 Swagger Petstore OpenAPI 3.0 Spec (Demo)"
    },
    {
      "name": "petstore_swagger",
      "url": "https://petstore.swagger.io/v2/swagger.json",
      "description": "🐱 Swagger Petstore Swagger 2.0 Spec (Demo)"
    },
    {
      "name": "github_api",
      "url": "https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/api.github.com/api.github.com.json",
      "description": "🐙 GitHub REST API OpenAPI Spec"
    }
  ]
}

📦 Installation & Setup

🛠️ System Requirements

  • Node.js 20.18.1+ (Recommended: v22.15.0 LTS)
  • npm 10.0.0+ or yarn
  • Cursor Editor

⚠️ Important: Some dependencies require Node.js v20.18.1 or higher. Please update your Node.js version if you encounter engine compatibility warnings.

⚡ NPM Package Installation

# Global installation
npm install -g website-to-markdown-mcp

# Or use directly with npx (recommended)
npx website-to-markdown-mcp

🔧 Development Setup

# 1. Clone repository
git clone https://github.com/your-username/website-to-markdown-mcp.git
cd website-to-markdown-mcp

# 2. Install dependencies
npm install

# 3. Build project
npm run build

🎛️ Advanced Configuration Options

Configuration Priority Order

graph TD
    A[🔍 Check Environment Variable<br/>WEBSITES_CONFIG_PATH] --> B{File exists?}
    B -->|Yes| C[✅ Load External Config File]
    B -->|No| D[🔍 Check Environment Variable<br/>WEBSITES_CONFIG]
    D --> E{Valid JSON?}
    E -->|Yes| F[✅ Load Embedded Config]
    E -->|No| G[🔍 Check config.json]
    G --> H{File exists?}
    H -->|Yes| I[✅ Load Local Config]
    H -->|No| J[🔧 Use Default Config]
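
The priority order in the flowchart can be sketched as a small pure function. This is an illustration of the decision order only, not the server's code: the names `resolveConfigSource` and `ConfigSource` are hypothetical, and the file-existence check is injected so the logic can be exercised without touching the disk.

```typescript
type ConfigSource =
  | { kind: "external-file"; path: string }
  | { kind: "embedded-json"; json: string }
  | { kind: "local-file"; path: string }
  | { kind: "default" };

function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

function resolveConfigSource(
  env: Record<string, string | undefined>,
  fileExists: (path: string) => boolean
): ConfigSource {
  // 1. WEBSITES_CONFIG_PATH wins when it points at an existing file.
  const externalPath = env["WEBSITES_CONFIG_PATH"];
  if (externalPath && fileExists(externalPath)) {
    return { kind: "external-file", path: externalPath };
  }
  // 2. Otherwise use WEBSITES_CONFIG when it holds valid JSON.
  const embedded = env["WEBSITES_CONFIG"];
  if (embedded && isValidJson(embedded)) {
    return { kind: "embedded-json", json: embedded };
  }
  // 3. Otherwise fall back to config.json when it exists.
  if (fileExists("config.json")) {
    return { kind: "local-file", path: "config.json" };
  }
  // 4. Nothing found: use the built-in defaults.
  return { kind: "default" };
}
```

In a Node.js process this would typically be called as `resolveConfigSource(process.env, fs.existsSync)`.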

🎨 Configuration Method Details

📋 Method 1: External Configuration File (🌟 Recommended)

💡 Advantages: Easy to edit, syntax highlighting, version control friendly

  1. Create Configuration File

    # Can be placed anywhere
    touch my-api-configs.json
    
  2. Edit Configuration Content

    {
      "websites": [
        {
          "name": "my_docs",
          "url": "https://docs.example.com",
          "description": "📚 My Documentation Website"
        }
      ]
    }
    
  3. Set Environment Variable (inside the server entry in .cursor/mcp.json)

    {
      "env": {
        "WEBSITES_CONFIG_PATH": "./my-api-configs.json"
      }
    }
    

📋 Method 2: Embedded JSON (Backward Compatible)

{
  "mcpServers": {
    "website-to-markdown": {
      "command": "cmd",
      "args": ["/c", "node", "./website-to-markdown-mcp/dist/index.js"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG": "{\"websites\":[{\"name\":\"example\",\"url\":\"https://example.com\",\"description\":\"Example Website\"}]}"
      }
    }
  }
}
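
Escaping the embedded JSON by hand is error-prone. One way to avoid that, sketched below, is to let `JSON.stringify` produce the escaped value: stringify the config once to get the compact string, then stringify the surrounding object so the quotes are escaped for you. The helper name `buildEmbeddedEnv` is hypothetical.

```typescript
interface WebsitesConfig {
  websites: { name: string; url: string; description: string }[];
}

// Returns an "env" fragment ready to paste into the server entry in .cursor/mcp.json.
function buildEmbeddedEnv(config: WebsitesConfig): string {
  // First pass: compact JSON text of the config itself.
  const embedded = JSON.stringify(config);
  // Second pass: serializing the wrapper escapes the inner quotes.
  return JSON.stringify({ env: { WEBSITES_CONFIG: embedded } }, null, 2);
}

const fragment = buildEmbeddedEnv({
  websites: [
    { name: "example", url: "https://example.com", description: "Example Website" },
  ],
});
console.log(fragment);
```

The printed WEBSITES_CONFIG value has the same escaped form as the one in the configuration above.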

📋 Method 3: Local config.json

Edit config.json in the project root directory directly:

{
  "websites": [
    {
      "name": "local_site",
      "url": "https://local.example.com",
      "description": "🏠 Local Test Website"
    }
  ]
}

🔧 Available Tools

🌐 General Tools

| Tool Name | Function | Parameters | Example |
| --- | --- | --- | --- |
| fetch_website | Fetch any website | url: Website URL | Fetch OpenAPI spec files |
| list_configured_websites | List configured websites | None | View all available websites |

🎯 Dedicated Tools

Each configured website automatically gets its own dedicated tool, named fetch_ followed by the website's name. For example:

  • fetch_petstore_openapi - Fetch Petstore OpenAPI 3.0 spec
  • fetch_petstore_swagger - Fetch Petstore Swagger 2.0 spec
  • fetch_github_api - Fetch GitHub API spec
  • fetch_tailwind_css - Fetch Tailwind CSS documentation
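
The naming pattern in the list above can be expressed in a few lines. This is an assumption inferred from the listed tool names (each one is `fetch_` plus the configured `name`), not a description of how the server registers tools internally; `dedicatedToolNames` is a hypothetical helper.

```typescript
interface ConfiguredWebsite {
  name: string;
  url: string;
  description: string;
}

// Derives one tool name per configured website, following the fetch_<name> pattern.
function dedicatedToolNames(websites: ConfiguredWebsite[]): string[] {
  return websites.map((site) => `fetch_${site.name}`);
}

const exampleTools = dedicatedToolNames([
  {
    name: "petstore_openapi",
    url: "https://petstore3.swagger.io/api/v3/openapi.json",
    description: "Swagger Petstore OpenAPI 3.0 Spec (Demo)",
  },
  {
    name: "petstore_swagger",
    url: "https://petstore.swagger.io/v2/swagger.json",
    description: "Swagger Petstore Swagger 2.0 Spec (Demo)",
  },
]);
console.log(exampleTools.join(", "));
```

Because the website name ends up inside a tool name, short names made of letters, digits, and underscores are the safest choice.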

📊 Enhanced Output Format Examples

🌐 General Website Content with Analytics

# Website Title

**Source**: https://example.com
**Website**: example_site - Example Website
**📊 Reading Time**: 3 minutes
**🔢 Word Count**: 650 words
**🌍 Language**: English
**📝 Summary**: This article provides a comprehensive overview of modern web development practices, covering frontend frameworks, backend technologies, and deployment strategies.

---

[Enhanced cleaned Markdown content with ads removed and main content extracted...]

📋 OpenAPI 3.x Specification File

# 🚀 Example API (v2.1.0)

**Source**: https://api.example.com/openapi.json
**OpenAPI Version**: 3.0.3
**Validation Status**: ✅ Valid
**📊 Processing Time**: 1.2 seconds
**🔢 Endpoints**: 25 endpoints
**🌍 Server Locations**: 3 servers

---

## 📋 API Basic Information

- **API Name**: Example API
- **Version**: 2.1.0
- **OpenAPI Version**: 3.0.3
- **Description**: A powerful example API for modern applications

## 🌐 Servers

1. **https://api.example.com**
   - 🏢 Production server
2. **https://staging-api.example.com**
   - 🧪 Testing server

## 🛠️ API Endpoints

Total of **25** endpoints:

### 👥 `/users`
- **GET**: Get user list
- **POST**: Create new user

### 🔍 `/users/{id}`
- **GET**: Get specific user
- **PUT**: Update user information
- **DELETE**: Delete user

## 🧩 Components

- **Schemas**: 12 data models
- **Parameters**: 8 reusable parameters  
- **Responses**: 15 reusable responses
- **Security Schemes**: 3 security mechanisms

🎯 Usage Examples

💻 Basic Usage

Please fetch the content from https://docs.example.com and convert it to Markdown

🔍 OpenAPI Specification Fetching

Please use the fetch_petstore_openapi tool to fetch the Petstore OpenAPI specification

📚 Documentation Website Fetching

Please fetch the official React documentation

🚨 Troubleshooting

📋 Complete Troubleshooting Guide: See TROUBLESHOOTING.md for detailed solutions to common issues.

❓ Quick Solutions

Error: npm WARN EBADENGINE Unsupported engine

  • Solution: Update Node.js to v20.18.1 or higher (see System Requirements)

Error: Cannot find module './db.json'

  • Solution 1: Clear npm cache: npm cache clean --force
  • Solution 2: Update Node.js version
  • Solution 3: Use local installation instead of npx

Q: Configuration changes not taking effect?

  • ✅ Confirm the JSON format is correct
  • ✅ Restart Cursor
  • ✅ Check the environment variable names

Q: JSON format errors?

  • 🛠️ Use a JSON validator
  • 🛠️ Confirm that strings and keys use double quotes
  • 🛠️ Check for trailing commas
🔍 Debug Mode

Detailed logs are output to stderr at startup:

# View debug messages
npm run dev 2> debug.log

📈 Performance & Optimization

⚡ Performance Features

  • 🚀 Smart Retry: Intelligent retry with exponential backoff
  • 💾 Rate Limiting: Built-in rate limiting to prevent overload
  • 🎯 Content Filtering: Remove irrelevant content for faster processing
  • 🧹 Ad Removal: Automatic ad and popup removal
  • 📊 Stealth Mode: Anti-detection browsing capabilities
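
To make "retry with exponential backoff" concrete, here is a minimal sketch of the technique: each failed attempt waits twice as long as the previous one, up to a cap. The names `withRetry` and `backoffDelay` and the default values are assumptions for illustration; the server's actual retry settings are not documented here.

```typescript
interface RetryOptions {
  retries: number;     // how many times to retry after the first failure
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelay(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
}

async function withRetry<T>(
  task: () => Promise<T>,
  opts: RetryOptions = { retries: 3, baseDelayMs: 500, maxDelayMs: 8000 },
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Only wait when another attempt will follow.
      if (attempt < opts.retries) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

With the defaults above, the waits between attempts are 500 ms, 1000 ms, and 2000 ms. Wrapping a page fetch as `withRetry(() => fetchPage(url))`, where `fetchPage` stands for whatever fetch function is in use, would retry transient failures without hammering the target site.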

🛡️ Security Considerations

  • 🔒 Use HTTPS websites only (recommended)
  • 🛠️ Malicious scripts are filtered automatically
  • 📏 Output content length is limited
  • 🔐 Stealth browsing helps avoid detection

📦 Dependencies

| Package | Version | Purpose |
| --- | --- | --- |
| @modelcontextprotocol/sdk | ^1.0.0 | MCP Core Framework |
| @readme/openapi-parser | ^4.1.0 | Professional OpenAPI Parsing |
| axios | ^1.6.0 | HTTP Request Handling |
| cheerio | ^1.0.0 | HTML Parsing Engine |
| turndown | ^7.1.2 | HTML to Markdown |
| yaml | ^2.8.0 | YAML Format Support |
| zod | ^3.22.0 | Data Validation Framework |
| playwright | ^1.40.0 | Browser automation |

📝 Changelog

🎉 v1.2.0 (Latest)

🚀 Major Feature Updates

  • ✨ Added: Enhanced content processing with AI-powered cleanup
  • ✨ Added: Smart analytics (word count, reading time, content summary)
  • ✨ Added: Language detection and multi-language support
  • ✨ Added: Stealth browser capabilities for anti-detection
  • ✨ Added: Built-in rate limiting and retry mechanisms
  • ✨ Added: Advanced content filtering and ad removal
  • 🔧 Enhanced: Markdown processing with more HTML element support
  • 📊 Improved: Output format with rich metadata
  • 🎯 Fixed: Various technical issues and dependencies

🎯 v1.1.0 (Previous)

🚀 Major Feature Updates

  • ✨ Added: Full OpenAPI 3.x/Swagger 2.0 support
  • ✨ Added: JSON/YAML format auto-detection
  • ✨ Added: Professional-grade spec validation and reference resolution
  • ✨ Added: Version auto-adaptation mechanism
  • ✨ Added: Structured API documentation summary
  • 🔧 Pre-configured: Multiple OpenAPI/Swagger examples
  • 📦 Added: NPM package distribution with npx support
  • 🎯 Enhanced: Installation methods for better user experience

🎯 v1.0.0 (Stable)

  • 🎉 Initial Release
  • 🌐 Basic Functions: Website content fetching
  • 📝 Core Functions: Markdown conversion
  • ⚙️ Configuration Support: Multi-website management

🤝 Contributing

💡 How to Contribute

  1. 🍴 Fork this project
  2. 🌟 Create a feature branch (git checkout -b feature/AmazingFeature)
  3. 📝 Commit your changes (git commit -m 'Add some AmazingFeature')
  4. 📤 Push to the branch (git push origin feature/AmazingFeature)
  5. 🔄 Open a Pull Request

πŸ› Issue Reporting

Report issues on the Issues page, please include:

  • πŸ” Issue Description
  • πŸ”„ Reproduction Steps
  • πŸ’» Environment Information
  • πŸ“Έ Screenshots or Logs

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🌟 If this project helps you, please give it a Star!

💬 Have questions or suggestions? Feel free to open an Issue!


Made by Sun ❤️ for the Developer Community
