Website to Markdown MCP Server
Fetches and converts website content to Markdown with AI-powered cleanup, OpenAPI support, and stealth browsing.
Language: English | 繁體中文
A Model Context Protocol (MCP) server that fetches website content and converts it to Markdown, making it easier for AI assistants to understand and process website information.
Key Features
| Enhanced Processing | OpenAPI Support | Smart Analysis | Advanced Extraction |
|---|---|---|---|
| AI-powered content cleanup | OpenAPI 3.x/Swagger 2.0 | Reading time calculation | Main content detection |
| Auto ad removal | Professional validation | Word count statistics | Language detection |
| Content summarization | Structured API parsing | Smart retry mechanism | Multi-format support |
What's New in v1.2.0
Major Enhancements
| Feature | Status | Description |
|---|---|---|
| Enhanced Content Processor | ✅ | AI-powered content cleaning and extraction |
| Smart Analytics | ✅ | Word count, reading time, content summary |
| Language Detection | ✅ | Automatic language identification |
| Intelligent Retry | ✅ | Smart retry mechanism with exponential backoff |
| Stealth Browser | ✅ | Anti-detection browsing capabilities |
| Rate Limiting | ✅ | Built-in rate limiting and concurrency control |
| Content Cleanup | ✅ | Removes ads, navigation, and irrelevant content |
| Enhanced Markdown | ✅ | Support for strikethrough, underline, and highlights |
Quick Start
Method 1: NPX Installation (Recommended)
Easiest way: No local installation needed!
Step 1: Create Configuration File
Create a my-websites.json file:
{
  "websites": [
    {
      "name": "your_website",
      "url": "https://your-website.com",
      "description": "Your Project Website"
    },
    {
      "name": "api_docs",
      "url": "https://api.example.com/openapi.json",
      "description": "Your API Specification"
    }
  ]
}
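Before pointing the server at a hand-written file, it can help to check that the file has the shape shown above. The following is a minimal sketch, not the server's own validation code: the `parseWebsitesConfig` helper and its error messages are hypothetical, and it assumes every entry needs a non-empty `name`, `url`, and `description`.

```typescript
// Shape of one entry in my-websites.json, as shown in the example above.
interface WebsiteEntry {
  name: string;
  url: string;
  description: string;
}

// Hypothetical helper: parses the file contents and throws on the first problem.
function parseWebsitesConfig(jsonText: string): WebsiteEntry[] {
  const parsed: unknown = JSON.parse(jsonText); // throws SyntaxError on bad JSON
  const websites = (parsed as { websites?: unknown } | null)?.websites;
  if (!Array.isArray(websites)) {
    throw new Error('Expected a top-level "websites" array');
  }
  return websites.map((entry: unknown, index: number) => {
    const record = (entry ?? {}) as Record<string, unknown>;
    const fields = ["name", "url", "description"].map((key) => {
      const value = record[key];
      if (typeof value !== "string" || value === "") {
        throw new Error(`websites[${index}].${key} must be a non-empty string`);
      }
      return value;
    });
    const [name, url, description] = fields;
    new URL(url); // throws TypeError if the URL is malformed
    return { name, url, description };
  });
}
```

Running this against the file contents before restarting the editor catches missing fields and malformed URLs early.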
Step 2: Configure MCP Server
Add to .cursor/mcp.json:
{
  "mcpServers": {
    "website-to-markdown": {
      "command": "npx",
      "args": ["-y", "website-to-markdown-mcp"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG_PATH": "./my-websites.json"
      }
    }
  }
}
Step 3: Restart and Test
- Restart Cursor
- Open Chat and use Agent mode
- Test command:
Please list all configured websites
Done! No installation required!
Method 2: Local Installation
Best Practice: Use this method for development or customization!
Step 1: Clone and Build
git clone https://github.com/your-username/website-to-markdown-mcp.git
cd website-to-markdown-mcp
npm install
npm run build
Step 2: Configure MCP Server
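Add to .cursor/mcp.json. This example launches Node.js through the Windows cmd shell; on macOS or Linux, set "command" to "node" and pass only the script path in "args":
{
  "mcpServers": {
    "website-to-markdown": {
      "command": "cmd",
      "args": ["/c", "node", "./website-to-markdown-mcp/dist/index.js"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG_PATH": "./my-websites.json"
      }
    }
  }
}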
Enhanced Output Features
Rich Content Analysis
Every fetched page now includes:
- Content Summary: AI-generated summary of the main content
- Reading Time: Estimated reading time based on content length
- Word Count: Word count for both English and Chinese text
- Language Detection: Automatic language identification
- Content Quality Score: Assessment of content relevance
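One way to count words in mixed English and Chinese text and turn the count into a reading time is sketched below. This is an illustration under stated assumptions, not the project's actual algorithm: the `countWords` and `estimateReadingMinutes` names are hypothetical, each CJK ideograph is counted as one word, and the default of 250 words per minute is assumed because it is consistent with the sample outputs in this README (1,250 words in 5 minutes, 650 words in 3 minutes).

```typescript
// Assumption: each CJK ideograph counts as one word; Latin words are runs of
// letters or digits, optionally joined by apostrophes or hyphens.
const CJK_IDEOGRAPH = /[\u4e00-\u9fff]/g;
const LATIN_WORD = /[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*/g;

function countWords(text: string): number {
  const cjkCount = (text.match(CJK_IDEOGRAPH) ?? []).length;
  // Replace ideographs with spaces so they do not glue Latin words together.
  const latinCount = (text.replace(CJK_IDEOGRAPH, " ").match(LATIN_WORD) ?? []).length;
  return cjkCount + latinCount;
}

// Assumption: 250 words per minute, rounded up, with a minimum of one minute.
function estimateReadingMinutes(text: string, wordsPerMinute = 250): number {
  return Math.max(1, Math.ceil(countWords(text) / wordsPerMinute));
}
```

Counting ideographs individually is a common simplification; a real implementation might use a segmenter for Chinese text instead.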
Enhanced Markdown Output
# Example Website
**Source**: https://example.com
**Website**: example_site - Example Website
**Reading Time**: 5 minutes
**Word Count**: 1,250 words
**Language**: English
**Summary**: This article discusses the latest developments in web technology...
---
[Enhanced Markdown content with better formatting...]
Complete OpenAPI/Swagger Support
Professional API Documentation
| Feature | OpenAPI 3.x | Swagger 2.0 | Description |
|---|---|---|---|
| Auto Detection | ✅ | ✅ | Supports JSON and YAML formats |
| Professional Validation | ✅ | ✅ | Uses @readme/openapi-parser |
| Structured Parsing | ✅ | ✅ | Endpoints, parameters, responses |
| Reference Resolution | ✅ | ✅ | Automatically handles $ref references |
| Smart Summary | ✅ | ✅ | Generates an API overview |
| Formatted Output | ✅ | ✅ | Readable Markdown |
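To illustrate the auto-detection row, the sketch below distinguishes the two spec families by their version fields: OpenAPI 3.x documents carry a top-level `openapi` string such as "3.0.3", while Swagger 2.0 documents carry `swagger: "2.0"`. The `detectSpecKind` function is a hypothetical helper, not the server's implementation, and it assumes the document has already been parsed from JSON or YAML into a plain object.

```typescript
type SpecKind = "openapi3" | "swagger2" | "unknown";

// Hypothetical helper: inspects an already-parsed document (JSON or YAML).
function detectSpecKind(doc: unknown): SpecKind {
  if (typeof doc !== "object" || doc === null) {
    return "unknown";
  }
  const fields = doc as Record<string, unknown>;
  // OpenAPI 3.x: top-level "openapi" version string, e.g. "3.0.3" or "3.1.0".
  if (typeof fields.openapi === "string" && fields.openapi.startsWith("3.")) {
    return "openapi3";
  }
  // Swagger 2.0: top-level "swagger" field with the fixed value "2.0".
  if (fields.swagger === "2.0") {
    return "swagger2";
  }
  return "unknown";
}
```

A page that returns "unknown" here would be treated as ordinary website content rather than an API specification.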
Pre-configured Example Websites
{
  "websites": [
    {
      "name": "petstore_openapi",
      "url": "https://petstore3.swagger.io/api/v3/openapi.json",
      "description": "Swagger Petstore OpenAPI 3.0 Spec (Demo)"
    },
    {
      "name": "petstore_swagger",
      "url": "https://petstore.swagger.io/v2/swagger.json",
      "description": "Swagger Petstore Swagger 2.0 Spec (Demo)"
    },
    {
      "name": "github_api",
      "url": "https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/api.github.com/api.github.com.json",
      "description": "GitHub REST API OpenAPI Spec"
    }
  ]
}
Installation & Setup
System Requirements
- Node.js 20.18.1+ (Recommended: v22.15.0 LTS)
- npm 10.0.0+ or yarn
- Cursor Editor
Important: Some dependencies require Node.js v20.18.1 or higher. Please update your Node.js version if you encounter engine compatibility warnings.
NPM Package Installation
# Global installation
npm install -g website-to-markdown-mcp
# Or use directly with npx (recommended)
npx website-to-markdown-mcp
Development Setup
# 1. Clone repository
git clone https://github.com/your-username/website-to-markdown-mcp.git
cd website-to-markdown-mcp
# 2. Install dependencies
npm install
# 3. Build project
npm run build
Advanced Configuration Options
Configuration Priority Order
graph TD
    A[Check Environment Variable<br/>WEBSITES_CONFIG_PATH] --> B{File exists?}
    B -->|Yes| C[✅ Load External Config File]
    B -->|No| D[Check Environment Variable<br/>WEBSITES_CONFIG]
    D --> E{Valid JSON?}
    E -->|Yes| F[✅ Load Embedded Config]
    E -->|No| G[Check config.json]
    G --> H{File exists?}
    H -->|Yes| I[✅ Load Local Config]
    H -->|No| J[Use Default Config]
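The priority order in the diagram can be read as a simple fall-through. The sketch below mirrors that flow and is only an illustration: `resolveConfigSource`, the `ConfigSource` labels, and the injected `fileExists` callback are hypothetical names chosen here so the logic can be exercised without touching the file system.

```typescript
type ConfigSource = "external-file" | "embedded-json" | "local-config" | "default";

// Hypothetical helper following the diagram: env is a map of environment
// variables, fileExists reports whether a given path is present on disk.
function resolveConfigSource(
  env: Record<string, string | undefined>,
  fileExists: (path: string) => boolean,
  localConfigPath = "config.json",
): ConfigSource {
  // 1. WEBSITES_CONFIG_PATH wins if it points at an existing file.
  const externalPath = env.WEBSITES_CONFIG_PATH;
  if (externalPath && fileExists(externalPath)) {
    return "external-file";
  }
  // 2. Otherwise WEBSITES_CONFIG is used if it holds valid JSON.
  const embedded = env.WEBSITES_CONFIG;
  if (embedded) {
    try {
      JSON.parse(embedded);
      return "embedded-json";
    } catch {
      // Invalid JSON: fall through to the next source.
    }
  }
  // 3. Then a local config.json, and finally the built-in defaults.
  return fileExists(localConfigPath) ? "local-config" : "default";
}
```

In a Node.js program the two arguments would typically be `process.env` and `existsSync` from `node:fs`.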
Configuration Method Details
Method 1: External Configuration File (Recommended)
Advantages: Easy to edit, syntax highlighting, version control friendly
Detailed Setup Steps
- Create Configuration File
  # Can be placed anywhere
  touch my-api-configs.json
- Edit Configuration Content
  {
    "websites": [
      {
        "name": "my_docs",
        "url": "https://docs.example.com",
        "description": "My Documentation Website"
      }
    ]
  }
- Set Environment Variable
  {
    "env": {
      "WEBSITES_CONFIG_PATH": "./my-api-configs.json"
    }
  }
Method 2: Embedded JSON (Backward Compatible)
Configuration Example
{
  "mcpServers": {
    "website-to-markdown": {
      "command": "cmd",
      "args": ["/c", "node", "./website-to-markdown-mcp/dist/index.js"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG": "{\"websites\":[{\"name\":\"example\",\"url\":\"https://example.com\",\"description\":\"Example Website\"}]}"
      }
    }
  }
}
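Escaping the embedded JSON by hand is error-prone, so it may be easier to generate the `WEBSITES_CONFIG` value. The sketch below is one possible approach rather than a tool shipped with the project: `buildServerEntry` is a hypothetical helper, and it reuses the command and arguments from the example above. Serializing the website list once with `JSON.stringify` and then serializing the whole configuration again produces the escaped quotes automatically.

```typescript
interface WebsiteEntry {
  name: string;
  url: string;
  description: string;
}

// Hypothetical helper: builds one mcpServers entry with the website list
// embedded as a JSON string in the WEBSITES_CONFIG environment variable.
function buildServerEntry(websites: WebsiteEntry[]) {
  return {
    command: "cmd",
    args: ["/c", "node", "./website-to-markdown-mcp/dist/index.js"],
    disabled: false,
    env: { WEBSITES_CONFIG: JSON.stringify({ websites }) },
  };
}

const mcpConfig = {
  mcpServers: {
    "website-to-markdown": buildServerEntry([
      { name: "example", url: "https://example.com", description: "Example Website" },
    ]),
  },
};

// Prints a document that can be pasted into .cursor/mcp.json.
console.log(JSON.stringify(mcpConfig, null, 2));
```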
Method 3: Local config.json
Local Configuration
Directly edit config.json in the project root directory:
{
  "websites": [
    {
      "name": "local_site",
      "url": "https://local.example.com",
      "description": "Local Test Website"
    }
  ]
}
Available Tools
General Tools
| Tool Name | Function | Parameters | Example |
|---|---|---|---|
| `fetch_website` | Fetch any website | `url`: Website URL | Fetch OpenAPI spec files |
| `list_configured_websites` | List configured websites | None | View all available websites |
Dedicated Tools
Each configured website automatically gets a corresponding dedicated tool:
- `fetch_petstore_openapi` - Fetch Petstore OpenAPI 3.0 spec
- `fetch_petstore_swagger` - Fetch Petstore Swagger 2.0 spec
- `fetch_github_api` - Fetch GitHub API spec
- `fetch_tailwind_css` - Fetch Tailwind CSS documentation
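The tool names listed above all follow the pattern `fetch_` plus the configured website name. Assuming that is the whole naming rule (the README does not spell it out), the set of tools exposed for a given configuration can be sketched as follows; `dedicatedToolName` and `listToolNames` are hypothetical helpers.

```typescript
interface WebsiteEntry {
  name: string;
  url: string;
  description: string;
}

// Assumed naming rule, inferred from the examples: "fetch_" + website name.
function dedicatedToolName(site: WebsiteEntry): string {
  return `fetch_${site.name}`;
}

// The two general tools plus one dedicated tool per configured website.
function listToolNames(websites: WebsiteEntry[]): string[] {
  return ["fetch_website", "list_configured_websites", ...websites.map(dedicatedToolName)];
}
```

For example, a website configured with the name `my_docs` would be expected to appear as `fetch_my_docs`.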
Enhanced Output Format Examples
General Website Content with Analytics
# Website Title
**Source**: https://example.com
**Website**: example_site - Example Website
**Reading Time**: 3 minutes
**Word Count**: 650 words
**Language**: English
**Summary**: This article provides a comprehensive overview of modern web development practices, covering frontend frameworks, backend technologies, and deployment strategies.
---
[Enhanced cleaned Markdown content with ads removed and main content extracted...]
OpenAPI 3.x Specification File
# Example API (v2.1.0)
**Source**: https://api.example.com/openapi.json
**OpenAPI Version**: 3.0.3
**Validation Status**: ✅ Valid
**Processing Time**: 1.2 seconds
**Endpoints**: 25 endpoints
**Server Locations**: 3 servers
---
## API Basic Information
- **API Name**: Example API
- **Version**: 2.1.0
- **OpenAPI Version**: 3.0.3
- **Description**: A powerful example API for modern applications
## Servers
1. **https://api.example.com**
   - Production server
2. **https://staging-api.example.com**
   - Testing server
## API Endpoints
Total of **25** endpoints:
### `/users`
- **GET**: Get user list
- **POST**: Create new user
### `/users/{id}`
- **GET**: Get specific user
- **PUT**: Update user information
- **DELETE**: Delete user
## Components
- **Schemas**: 12 data models
- **Parameters**: 8 reusable parameters
- **Responses**: 15 reusable responses
- **Security Schemes**: 3 security mechanisms
Usage Examples
Basic Usage
Please fetch the content from https://docs.example.com and convert it to Markdown
OpenAPI Specification Fetching
Please use the fetch_petstore_openapi tool to fetch the Petstore OpenAPI specification
Documentation Website Fetching
Please fetch the official React documentation content
Troubleshooting
Complete Troubleshooting Guide: See TROUBLESHOOTING.md for detailed solutions to common issues.
Quick Solutions
Node.js Version Issues
Error: npm WARN EBADENGINE Unsupported engine
- Solution: Update Node.js to v20.18.1 or higher
- Download: Node.js Official Website
- Verify:
node --version
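If the version check needs to be automated, for example in a setup script, the comparison against the v20.18.1 minimum can be sketched as below. `meetsMinimumNode` and `parseVersion` are hypothetical helpers, and the sketch only compares the numeric major, minor, and patch parts, ignoring any pre-release suffix.

```typescript
// Turns "v20.18.1" or "20.18.1" into [20, 18, 1]; non-numeric parts become 0.
function parseVersion(version: string): number[] {
  return version
    .replace(/^v/, "")
    .split(".")
    .map((part) => Number.parseInt(part, 10) || 0);
}

// Returns true when `current` is at least `minimum` (default: 20.18.1,
// the minimum stated in the system requirements).
function meetsMinimumNode(current: string, minimum = "20.18.1"): boolean {
  const currentParts = parseVersion(current);
  const minimumParts = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    const diff = (currentParts[i] ?? 0) - (minimumParts[i] ?? 0);
    if (diff !== 0) {
      return diff > 0;
    }
  }
  return true; // all three parts are equal
}
```

In a Node.js script the current version string is available as `process.version`, so the check becomes `meetsMinimumNode(process.version)`.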
Module Not Found Issues
Error: Cannot find module './db.json'
- Solution 1: Clear npm cache:
  npm cache clean --force
- Solution 2: Update Node.js version
- Solution 3: Use local installation instead of npx
Configuration Issues
Q: Configuration changes not taking effect?
- Confirm the JSON format is correct
- Restart Cursor
- Check environment variable names
Q: JSON format errors?
- Use a JSON validator
- Confirm that strings use double quotes
- Check for extra (trailing) commas
Debug Mode
Detailed logs are output to stderr at startup:
# View debug messages
npm run dev 2> debug.log
Performance & Optimization
Performance Features
- Smart Retry: Intelligent retry with exponential backoff
- Rate Limiting: Built-in rate limiting to prevent overload
- Content Filtering: Remove irrelevant content for faster processing
- Ad Removal: Automatic ad and popup removal
- Stealth Mode: Anti-detection browsing capabilities
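Retry with exponential backoff, as named in the list above, generally means waiting twice as long after each failed attempt. The sketch below shows the general technique only; the attempt count, the 500 ms base delay, and the `retryWithBackoff` and `backoffDelayMs` names are assumptions for illustration, not the server's actual settings.

```typescript
// Delay before the retry that follows a failed attempt (0-based):
// base, 2x base, 4x base, ...
function backoffDelayMs(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** attempt;
}

// Runs `task` up to `maxAttempts` times, sleeping between failures.
async function retryWithBackoff<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt, baseDelayMs);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // every attempt failed
}
```

A caller would wrap a single fetch in it, for instance `retryWithBackoff(() => fetch(url))`. Production implementations often add random jitter to the delay so that many clients do not retry in lockstep.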
Security Considerations
- HTTPS websites recommended
- Automatically filters malicious scripts
- Limits output content length
- Stealth browsing to avoid detection
Dependencies
| Package | Version | Purpose |
|---|---|---|
| `@modelcontextprotocol/sdk` | ^1.0.0 | MCP Core Framework |
| `@readme/openapi-parser` | ^4.1.0 | Professional OpenAPI Parsing |
| `axios` | ^1.6.0 | HTTP Request Handling |
| `cheerio` | ^1.0.0 | HTML Parsing Engine |
| `turndown` | ^7.1.2 | HTML to Markdown |
| `yaml` | ^2.8.0 | YAML Format Support |
| `zod` | ^3.22.0 | Data Validation Framework |
| `playwright` | ^1.40.0 | Browser automation |
Changelog
v1.2.0 (Latest)
Major Feature Updates
- Added enhanced content processing with AI-powered cleanup
- Added smart analytics: word count, reading time, content summary
- Added language detection and multi-language support
- Added stealth browser capabilities for anti-detection
- Added built-in rate limiting and retry mechanisms
- Added advanced content filtering and ad removal
- Enhanced Markdown processing with more HTML element support
- Improved output format with rich metadata
- Fixed various technical issues and dependencies
v1.1.0 (Previous)
Major Feature Updates
- Added full OpenAPI 3.x/Swagger 2.0 support
- Added JSON/YAML format auto-detection
- Added professional-grade spec validation and reference resolution
- Added version auto-adaptation mechanism
- Added structured API documentation summary
- Pre-configured multiple OpenAPI/Swagger examples
- Added NPM package distribution with npx support
- Enhanced installation methods for better user experience
v1.0.0 (Stable)
- Initial release
- Basic functions: website content fetching
- Core functions: Markdown conversion
- Configuration support: multi-website management
Contributing
How to Contribute
- Fork this project
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Issue Reporting
Report issues on the Issues page. Please include:
- Issue Description
- Reproduction Steps
- Environment Information
- Screenshots or Logs
License
This project is licensed under the MIT License - see the LICENSE file for details.
If this project helps you, please give it a Star!
Have questions or suggestions? Feel free to open an Issue!
Made by Sun ❤️ for the Developer Community