Mozilla Readability Parser
Extracts and transforms webpage content into clean, LLM-optimized Markdown using Mozilla's Readability algorithm.
Mozilla Readability Parser MCP Server
A Model Context Protocol (MCP) server that extracts and transforms webpage content into clean, LLM-optimized Markdown. It returns the article title, main content, excerpt, byline, and site name. It uses Mozilla's Readability algorithm to remove ads, navigation, footers, and other non-essential elements while preserving the core content structure.
Features
- Removes ads, navigation, footers and other non-essential content
- Converts the cleaned HTML into well-formatted Markdown using Turndown
- Returns article metadata (title, excerpt, byline, site name)
- Handles errors gracefully
Why Not Just Fetch?
Unlike simple fetch requests, this server:
- Extracts only relevant content using Mozilla's Readability algorithm
- Eliminates noise like ads, popups, and navigation menus
- Reduces token usage by removing unnecessary HTML/CSS
- Provides consistent Markdown formatting for better LLM processing
- Includes useful metadata about the content
Installation
Installing via Smithery
To install Mozilla Readability Parser for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install server-moz-readability --client claude
Manual Installation
npm install server-moz-readability
Tool Reference
parse
Fetches and transforms webpage content into clean Markdown.
Arguments:
{
  "url": {
    "type": "string",
    "description": "The website URL to parse",
    "required": true
  }
}
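As an illustration of how a client might invoke the tool, the sketch below builds a JSON-RPC 2.0 `tools/call` request, the method MCP defines for calling a tool. The helper name `buildParseRequest` and the http/https check are assumptions made for this example; the server itself may accept or reject URLs differently.

```typescript
// Shape of a JSON-RPC 2.0 request for MCP's "tools/call" method,
// narrowed to the single "url" argument documented above.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { url: string } };
}

// Hypothetical client-side helper (not part of this server).
function buildParseRequest(id: number, url: string): ToolCallRequest {
  const parsed = new URL(url); // throws a TypeError on malformed input
  // Assumption: only web pages are worth sending to the parser.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "parse", arguments: { url: parsed.href } },
  };
}

const parseRequest = buildParseRequest(1, "https://example.com/article");
console.log(JSON.stringify(parseRequest, null, 2));
```

Validating the URL before sending it keeps obviously bad input from costing a round trip to the server.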
Returns:
{
  "title": "Article title",
  "content": "Markdown content...",
  "metadata": {
    "excerpt": "Brief summary",
    "byline": "Author information",
    "siteName": "Source website name"
  }
}
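A client consuming this result might type it and fold the metadata into a single Markdown document before handing it to a model. The sketch below is one way to do that; `ParseResult` mirrors the documented shape, while `toPromptDocument` is a hypothetical helper. Allowing `null` for metadata fields is an assumption, since the README does not say how missing values are reported.

```typescript
// Result shape as documented in the "Returns" section.
// Assumption: metadata fields may be null or empty when a page lacks them.
interface ParseResult {
  title: string;
  content: string;
  metadata: {
    excerpt: string | null;
    byline: string | null;
    siteName: string | null;
  };
}

// Hypothetical helper: renders the result as one Markdown document,
// skipping metadata fields that are null or empty.
function toPromptDocument(result: ParseResult): string {
  const { excerpt, byline, siteName } = result.metadata;
  const header = [
    `# ${result.title}`,
    byline ? `By ${byline}` : "",
    siteName ? `Source: ${siteName}` : "",
    excerpt ? `> ${excerpt}` : "",
  ].filter((line) => line.length > 0);
  return `${header.join("\n")}\n\n${result.content}`;
}

const sampleResult: ParseResult = {
  title: "Example",
  content: "Body text.",
  metadata: { excerpt: "Short summary", byline: null, siteName: "Example Site" },
};
console.log(toPromptDocument(sampleResult));
```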
Usage with Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "readability": {
      "command": "npx",
      "args": ["-y", "server-moz-readability"]
    }
  }
}
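If the config file already lists other servers, the `readability` entry has to be merged in rather than pasted over them. The sketch below shows that merge on an in-memory object; `addReadabilityServer` and the `other` server entry are hypothetical, and reading or writing the file itself is left out.

```typescript
interface ServerEntry {
  command: string;
  args: string[];
}

// Only the "mcpServers" key is modeled; other keys pass through untouched.
interface DesktopConfig {
  mcpServers?: Record<string, ServerEntry>;
  [key: string]: unknown;
}

// Hypothetical helper: returns a copy of the config with the
// "readability" entry added, leaving the input object unchanged.
function addReadabilityServer(config: DesktopConfig): DesktopConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      readability: { command: "npx", args: ["-y", "server-moz-readability"] },
    },
  };
}

// Example with a made-up existing server entry.
const existingConfig: DesktopConfig = {
  mcpServers: { other: { command: "node", args: ["other-server.js"] } },
};
const mergedConfig = addReadabilityServer(existingConfig);
console.log(JSON.stringify(mergedConfig, null, 2));
```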
Dependencies
- @mozilla/readability - Content extraction
- turndown - HTML to Markdown conversion
- jsdom - DOM parsing
- axios - HTTP requests
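The four dependencies above suggest a pipeline of fetch, DOM parse, extract, and convert. The sketch below shows how such stages could compose, with each stage injected so the example runs without installing anything. All names here (`Stages`, `parsePage`, `stubStages`) are hypothetical, and throwing on an empty extraction is an assumption; this is not the server's actual source.

```typescript
// Fields that Readability's parse() result exposes and this README mentions.
interface Article {
  title: string;
  content: string; // cleaned HTML
  excerpt: string | null;
  byline: string | null;
  siteName: string | null;
}

// One function per pipeline stage; a real implementation could back them with
// axios.get (fetchHtml), jsdom plus Readability's parse() (extract),
// and TurndownService's turndown() (toMarkdown).
interface Stages {
  fetchHtml: (url: string) => Promise<string>;
  extract: (html: string, url: string) => Article | null;
  toMarkdown: (html: string) => string;
}

interface ParsedPage {
  title: string;
  content: string; // Markdown
  metadata: { excerpt: string | null; byline: string | null; siteName: string | null };
}

async function parsePage(url: string, stages: Stages): Promise<ParsedPage> {
  const html = await stages.fetchHtml(url);
  const article = stages.extract(html, url);
  if (article === null) {
    // Assumption: pages with no extractable article are reported as errors.
    throw new Error(`No readable content found at ${url}`);
  }
  return {
    title: article.title,
    content: stages.toMarkdown(article.content),
    metadata: {
      excerpt: article.excerpt,
      byline: article.byline,
      siteName: article.siteName,
    },
  };
}

// Stub stages for demonstration only; they do no real fetching or parsing.
const stubStages: Stages = {
  fetchHtml: async () => "<article><h1>Hello</h1><p>World</p></article>",
  extract: (html) => ({ title: "Hello", content: html, excerpt: null, byline: null, siteName: null }),
  toMarkdown: (html) => html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim(),
};

parsePage("https://example.com", stubStages).then((page) => console.log(page.content));
```

Injecting the stages keeps the control flow visible and makes the null-extraction path easy to exercise in isolation.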
License
MIT
Related Servers
Bright Data
(Sponsor) Discover, extract, and interact with the web - one interface powering automated access across the public internet.
LinkedIn
Scrape LinkedIn profiles, companies, and jobs using direct URLs. Features Claude AI integration and secure credential storage.
Postman V2
An MCP server that provides access to Postman using the V2 API version.
LinkedIn MCP
Scrape LinkedIn profiles and companies, get recommended jobs, and perform job searches.
Read URL MCP
Extracts web content from a URL and converts it to clean Markdown format.
iReader MCP
Tools for reading and extracting content from the internet.
Puppeteer
A server for browser automation using Puppeteer, enabling web scraping, screenshots, and JavaScript execution.
Wayback Machine
Access the Internet Archive's Wayback Machine to retrieve archived web pages and check for available snapshots of URLs.
Scrapezy
Turn websites into datasets with Scrapezy
Notte
Leverage Notte Web AI agents & cloud browser sessions for scalable browser automation & scraping workflows
Puppeteer Vision
Scrape webpages and convert them to markdown using Puppeteer. Features AI-driven interaction capabilities.