MewCP Firecrawl MCP Server
A hosted, stateless, multitenant Firecrawl MCP server that lets AI assistants crawl, scrape, and extract structured web data through Firecrawl.
Documentation
Turn any website into clean, AI-ready data.
A Model Context Protocol (MCP) server that exposes Firecrawl's API for scraping, crawling, mapping, searching, and extracting structured data from websites.
Overview
The Firecrawl MCP Server provides the following web data extraction capabilities:
- Scrape individual pages or crawl entire websites into markdown, HTML, JSON, and more
- Search the web and extract structured data using LLMs with async job management
- Map website URL structures and run autonomous agent-based data extraction
Perfect for:
- AI assistants that need to fetch and process live web content
- Automating structured data extraction from multiple websites
- Building research and competitive intelligence pipelines
Tools
health_check — Check server readiness
Verifies the server is operational. No authentication required.
Inputs:
None
Output:
{
"success": true,
"status": "ok",
"server": "CL Firecrawl MCP Server",
"version": "0.1.0"
}
scrape — Scrape a single URL
Scrapes a single URL and returns the content in one or more formats. Supports JavaScript rendering, mobile emulation, and HTML tag filtering.
Inputs:
- `url` (string, required) — The URL to scrape
- `formats` (string, optional) — Comma-separated output formats: markdown, html, rawHtml, json, screenshot, links, images, summary, audio, branding, changeTracking (default: markdown)
- `only_main_content` (bool, optional) — Extract only main content, excluding headers/footers/navs (default: true)
- `include_tags` (string, optional) — Comma-separated HTML tags to include in output
- `exclude_tags` (string, optional) — Comma-separated HTML tags to exclude from output
- `wait_for_selector` (string, optional) — CSS selector to wait for before scraping
- `timeout_ms` (int, optional) — Request timeout in milliseconds (1000–300000, default: 30000)
- `mobile` (bool, optional) — Emulate mobile device (default: false)
- `skip_tls_verification` (bool, optional) — Skip TLS certificate verification (default: true)
- `proxy` (string, optional) — Proxy type: basic, enhanced, or auto (default: auto)
- `block_ads` (bool, optional) — Block ads and cookie popups (default: true)
- `remove_base64_images` (bool, optional) — Remove base64 encoded images from output (default: true)
Output:
{
"success": true,
"data": {
"markdown": "# Page Title\n...",
"metadata": { "title": "...", "url": "..." }
}
}
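As a rough sketch of how the inputs above fit together, the helper below assembles a `scrape` argument dictionary. The function name `build_scrape_args` is hypothetical and not part of the server; it only joins the formats into the comma-separated string the tool expects and checks the documented `timeout_ms` range.

```python
from typing import Any, Dict, Iterable


def build_scrape_args(
    url: str,
    formats: Iterable[str] = ("markdown",),
    timeout_ms: int = 30000,
    **extra: Any,
) -> Dict[str, Any]:
    """Assemble arguments for the scrape tool (hypothetical helper)."""
    # The documented range for timeout_ms is 1000-300000.
    if not 1000 <= timeout_ms <= 300000:
        raise ValueError("timeout_ms must be between 1000 and 300000")
    args: Dict[str, Any] = {
        "url": url,
        # formats is passed as one comma-separated string, not a list.
        "formats": ",".join(formats),
        "timeout_ms": timeout_ms,
    }
    args.update(extra)  # e.g. mobile=True, only_main_content=False
    return args


args = build_scrape_args(
    "https://example.com", formats=["markdown", "links"], mobile=True
)
```

Optional inputs such as `mobile` or `wait_for_selector` pass through unchanged as keyword arguments.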
crawl — Crawl an entire website
Starts a crawl job from a base URL, following links and scraping all discovered pages. Returns a job ID for async processing.
Inputs:
- `url` (string, required) — Base URL to start crawling from
- `prompt` (string, optional) — Natural language prompt to generate crawler options
- `exclude_paths` (string, optional) — Comma-separated regex patterns for URLs to exclude
- `include_paths` (string, optional) — Comma-separated regex patterns for URLs to include
- `max_discovery_depth` (int, optional) — Maximum crawl depth from the start URL
- `sitemap` (string, optional) — Sitemap mode: skip, include, or only (default: include)
- `ignore_query_parameters` (bool, optional) — Don't re-scrape same path with different query params (default: false)
- `limit` (int, optional) — Maximum number of pages to crawl (default: 10000)
- `crawl_entire_domain` (bool, optional) — Follow sibling and parent URLs (default: false)
- `allow_external_links` (bool, optional) — Follow links to external domains (default: false)
- `allow_subdomains` (bool, optional) — Follow links to subdomains (default: false)
- `delay` (float, optional) — Delay in seconds between requests
- `max_concurrency` (int, optional) — Maximum concurrent scrapes
- `formats` (string, optional) — Output formats, comma-separated (default: markdown)
- `only_main_content` (bool, optional) — Extract main content only (default: true)
- `zero_data_retention` (bool, optional) — Enable zero data retention (default: false)
Output:
{
"success": true,
"id": "crawl-job-uuid",
"url": "https://api.firecrawl.dev/v2/crawl/crawl-job-uuid"
}
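Because `include_paths` and `exclude_paths` are comma-separated regex strings, it can help to check the patterns before starting a crawl. The sketch below uses a hypothetical helper, `join_path_patterns`, and validates with Python's `re` module. That is an assumption: the regex dialect applied on the server side may differ from Python's.

```python
import re
from typing import Iterable


def join_path_patterns(patterns: Iterable[str]) -> str:
    """Validate regex patterns and join them into one comma-separated string."""
    checked = []
    for pattern in patterns:
        if "," in pattern:
            # A comma inside a pattern would be read as a separator.
            raise ValueError(f"pattern contains a comma: {pattern!r}")
        re.compile(pattern)  # raises re.error if the pattern is invalid
        checked.append(pattern)
    return ",".join(checked)


crawl_args = {
    "url": "https://example.com",
    "include_paths": join_path_patterns([r"^/blog/.*", r"^/docs/.*"]),
    "exclude_paths": join_path_patterns([r"^/blog/tag/.*"]),
    "limit": 50,  # far below the documented default of 10000
    "max_discovery_depth": 2,
}
```

Setting a small `limit` while testing keeps a misconfigured crawl from consuming many credits.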
map — Map all URLs on a website
Discovers and lists all URLs found on a website. Useful for site auditing and understanding site structure before crawling.
Inputs:
- `url` (string, required) — Base URL to start mapping from
- `search` (string, optional) — Filter and rank results by relevance to this query
- `sitemap` (string, optional) — Sitemap mode: skip, include, or only (default: include)
- `include_subdomains` (bool, optional) — Include subdomains (default: true)
- `ignore_query_parameters` (bool, optional) — Exclude URLs with query parameters (default: true)
- `ignore_cache` (bool, optional) — Bypass sitemap cache for fresh results (default: false)
- `limit` (int, optional) — Maximum URLs to return (max: 100000, default: 5000)
- `timeout_ms` (int, optional) — Timeout in milliseconds
- `country` (string, optional) — ISO 3166-1 alpha-2 country code (e.g., US, DE)
- `languages` (string, optional) — Comma-separated preferred languages (e.g., en-US,de-DE)
Output:
{
"success": true,
"links": ["https://example.com/", "https://example.com/about", "..."]
}
search — Search the web and scrape results
Searches the web using a query and optionally scrapes the full content of result pages.
Inputs:
- `query` (string, required) — Search query (max 500 characters)
- `limit` (int, optional) — Number of results to return (1–100, default: 5)
- `sources` (string, optional) — Comma-separated sources: web, images, news (default: web)
- `categories` (string, optional) — Comma-separated filters: github, research, pdf
- `tbs` (string, optional) — Time filter: qdr:h (hour), qdr:d (day), qdr:w (week), qdr:m (month), qdr:y (year)
- `location` (string, optional) — Geographic location (e.g., San Francisco,California,United States)
- `country` (string, optional) — ISO country code for geo-targeting (default: US)
- `timeout` (int, optional) — Timeout in milliseconds (1000–300000, default: 60000)
- `ignore_invalid_urls` (bool, optional) — Exclude invalid URLs from results (default: false)
- `formats` (string, optional) — Scrape output formats, comma-separated (default: markdown)
- `mobile` (bool, optional) — Emulate mobile device when scraping (default: false)
- `proxy` (string, optional) — Proxy type: basic, enhanced, or auto (default: auto)
- `block_ads` (bool, optional) — Block ads and cookie popups (default: true)
Output:
{
"success": true,
"data": [
{ "url": "https://...", "markdown": "...", "metadata": { "title": "..." } }
]
}
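The documented limits on `query` (max 500 characters) and `limit` (1 to 100) can be checked client-side before calling `search`. This is a minimal sketch; `build_search_args` is a hypothetical name, and the server may enforce these limits differently.

```python
from typing import Any, Dict


def build_search_args(query: str, limit: int = 5, **extra: Any) -> Dict[str, Any]:
    """Assemble arguments for the search tool (hypothetical helper)."""
    if not query or len(query) > 500:
        raise ValueError("query must be 1-500 characters")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    # Remaining inputs (sources, tbs, country, ...) pass through unchanged.
    return {"query": query, "limit": limit, **extra}


search_args = build_search_args(
    "firecrawl mcp server", limit=3, sources="web,news", tbs="qdr:w"
)
```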
agent — Autonomous website navigation and data extraction
Starts an autonomous agent that navigates websites and extracts data based on a natural language prompt. Returns a job ID — poll with agent_status for results.
Inputs:
- `prompt` (string, required) — Natural language description of what data to extract (max 10000 characters)
- `urls` (string, optional) — Comma-separated URLs to constrain the agent to
- `schema` (string, optional) — JSON schema string to structure extracted data
- `max_credits` (float, optional) — Maximum credits to spend (default: 2500)
- `strict_constrain_to_urls` (bool, optional) — Only visit URLs listed in the urls param (default: false)
- `model` (string, optional) — Model to use: spark-1-mini (default, cheaper) or spark-1-pro (higher accuracy)
Output:
{
"success": true,
"jobId": "agent-job-uuid"
}
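Since `schema` is passed as a JSON string rather than a nested object, one way to prepare `agent` arguments is to build the schema as a dictionary and serialize it. The field names in the schema below (`company`, `pricing_tiers`) are illustrative assumptions, not fields the server defines.

```python
import json

# Illustrative JSON schema; the property names are made up for this example.
schema = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "pricing_tiers": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["company"],
}

agent_args = {
    "prompt": "Find the company name and list its pricing tiers.",
    # urls is a comma-separated string, like the other list-style inputs.
    "urls": ",".join(["https://example.com/pricing"]),
    # schema must be a JSON string, so serialize the dictionary.
    "schema": json.dumps(schema),
    "strict_constrain_to_urls": True,
}
```

The same pattern applies to the `schema` input of `extract`.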
agent_status — Check agent job status
Polls the status of an agent job started by agent. Poll every 15–30 seconds for up to 2–3 minutes.
Inputs:
- `job_id` (string, required) — Agent job ID returned by the agent tool (UUID format)
Output:
{
"success": true,
"status": "completed",
"data": { "extracted": "..." }
}
extract — Async structured data extraction using LLMs
Starts an async job to extract structured data from one or more URLs using LLMs and an optional schema. Returns a job ID — poll with extract_status.
Inputs:
- `urls` (string, required) — Comma-separated URLs to extract from (glob format supported)
- `prompt` (string, optional) — Custom prompt to guide the extraction
- `schema` (string, optional) — JSON schema string for structured output
- `enable_web_search` (bool, optional) — Use web search for additional context (default: false)
- `ignore_sitemap` (bool, optional) — Ignore sitemap.xml files (default: false)
- `include_subdomains` (bool, optional) — Include subdomains in scanning (default: true)
- `show_sources` (bool, optional) — Include extraction sources in response (default: false)
- `ignore_invalid_urls` (bool, optional) — Skip invalid URLs instead of failing (default: true)
- `formats` (string, optional) — Scrape output formats, comma-separated (default: markdown)
- `only_main_content` (bool, optional) — Extract main content only (default: true)
- `mobile` (bool, optional) — Emulate mobile device (default: false)
- `proxy` (string, optional) — Proxy type: basic, enhanced, or auto (default: auto)
- `block_ads` (bool, optional) — Block ads and cookie popups (default: true)
Output:
{
"success": true,
"id": "extract-job-uuid"
}
extract_status — Check extraction job status
Polls the status of an extraction job started by extract.
Inputs:
- `job_id` (string, required) — Extraction job ID returned by the extract tool (UUID format)
Output:
{
"success": true,
"status": "completed",
"data": { "field": "extracted value" }
}
API Parameters Reference
Output Formats
All scraping tools accept a comma-separated formats parameter:
- `markdown`: Clean markdown (default)
- `html`: Cleaned HTML
- `rawHtml`: Raw page HTML
- `json`: Structured JSON extraction
- `screenshot`: Page screenshot
- `links`: All links found on the page
- `images`: All image URLs
- `summary`: AI-generated page summary
Proxy Options
- `basic`: Standard proxy for general use
- `enhanced`: Advanced proxy for bot-protected sites
- `auto`: Automatically selects the best proxy (default)
Async Job Workflow
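Tools crawl, agent, and extract are asynchronous:
- Call the tool and receive a job ID (returned as `id` by crawl and extract, and as `jobId` by agent)
- Poll `agent_status` or `extract_status`, passing that ID as `job_id`
- Keep polling until `status` is `completed`, `failed`, or `cancelled`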
Recommended polling interval: every 15–30 seconds for at least 2–3 minutes before considering a job failed.
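The workflow above can be sketched as a small polling loop. Everything here is illustrative: `poll_until_done` is a hypothetical helper, `fetch_status` stands in for whatever function actually calls `agent_status` or `extract_status`, and the non-terminal status `"processing"` used in the demonstration is an assumed value.

```python
import time
from typing import Any, Callable, Dict

# Terminal statuses listed in the workflow above.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def poll_until_done(
    fetch_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval_s: float = 15.0,
    max_wait_s: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status(job_id) until the job reaches a terminal status."""
    waited = 0.0
    while True:
        result = fetch_status(job_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if waited >= max_wait_s:
            raise TimeoutError(f"job {job_id} not finished after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Demonstration with canned responses instead of a real tool call.
_responses = iter([
    {"success": True, "status": "processing"},  # assumed non-terminal status
    {"success": True, "status": "completed", "data": {"field": "extracted value"}},
])
final = poll_until_done(
    lambda _job_id: next(_responses), "extract-job-uuid", sleep=lambda _s: None
)
```

Injecting `sleep` keeps the loop testable without real delays; the defaults follow the 15-second interval and roughly 3-minute window recommended above.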
Time-Based Search Filters (tbs)
Use the tbs parameter in search to filter results by recency:
qdr:h — Past hour
qdr:d — Past day
qdr:w — Past week
qdr:m — Past month
qdr:y — Past year
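For readability in client code, the filter values above can be wrapped in a small lookup. The names `TBS_FILTERS` and `tbs_for` are hypothetical; only the `qdr:*` values come from the list above.

```python
# Maps a human-readable window to the documented tbs value.
TBS_FILTERS = {
    "hour": "qdr:h",
    "day": "qdr:d",
    "week": "qdr:w",
    "month": "qdr:m",
    "year": "qdr:y",
}


def tbs_for(window: str) -> str:
    """Return the tbs value for a window such as 'week'."""
    try:
        return TBS_FILTERS[window.lower()]
    except KeyError:
        raise ValueError(f"unknown time window: {window!r}") from None


search_args = {"query": "firecrawl release notes", "tbs": tbs_for("week")}
```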
Getting Your Firecrawl API Key
Steps
- Go to Firecrawl and sign in or create an account
- Navigate to API Keys in your dashboard
- Click Create API Key
- Copy the generated key — you will only see it once
Troubleshooting
Missing or Invalid Headers
- Cause: API key not provided in request headers or incorrect format
- Solution:
- Verify that the `Authorization: Bearer YOUR_API_KEY` and `X-Mewcp-Credential-Id: CREDENTIAL-ID` headers are present
- Check that the API key is active in your MewCP account
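A minimal sketch of assembling the two headers named above follows. `build_headers` is a hypothetical helper, and the `Content-Type` entry is an assumption based on the JSON payloads this server accepts.

```python
from typing import Dict


def build_headers(api_key: str, credential_id: str) -> Dict[str, str]:
    """Build request headers; fail early if either value is blank."""
    api_key = api_key.strip()
    credential_id = credential_id.strip()
    if not api_key or not credential_id:
        raise ValueError("both api_key and credential_id are required")
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Mewcp-Credential-Id": credential_id,
        "Content-Type": "application/json",  # assumed, not stated in the docs
    }


headers = build_headers("YOUR_API_KEY", "CREDENTIAL-ID")
```

Stripping whitespace guards against a common copy-and-paste mistake where a trailing newline ends up in the key.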
Insufficient Credits
- Cause: API usage has exceeded your available credits or request limits
- Solution:
- Check credit usage in your Curious Layer dashboard
- Upgrade to a paid plan or add credits for higher limits
- Contact support for credit adjustments
Credential Not Connected
- Cause: No Firecrawl credential linked to your account
- Solution:
- Go to Credentials in your MewCP dashboard
- Add your Firecrawl API key
- Retry the request with the correct `X-Mewcp-Credential-Id` header
Malformed Request Payload
- Cause: JSON payload is invalid or missing required fields
- Solution:
- Validate JSON syntax before sending
- Ensure all required tool parameters are included
- Check parameter types match expected values (e.g. `timeout_ms` must be 1000–300000)
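The checks above can be approximated client-side before a request is sent. In this sketch, `check_payload` and `REQUIRED_FIELDS` are hypothetical names; the required fields are taken from the tool input lists in this document, and only the `scrape` timeout range is checked.

```python
import json
from typing import Any, Dict, List

# Required inputs per tool, as listed in the Tools section.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "health_check": [],
    "scrape": ["url"],
    "crawl": ["url"],
    "map": ["url"],
    "search": ["query"],
    "agent": ["prompt"],
    "agent_status": ["job_id"],
    "extract": ["urls"],
    "extract_status": ["job_id"],
}


def check_payload(tool: str, raw_json: str) -> List[str]:
    """Return a list of problems found in a raw JSON payload (empty if none)."""
    if tool not in REQUIRED_FIELDS:
        return [f"unknown tool: {tool}"]
    try:
        payload: Any = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS[tool]
        if name not in payload
    ]
    timeout = payload.get("timeout_ms")
    if tool == "scrape" and timeout is not None:
        if not isinstance(timeout, int) or not 1000 <= timeout <= 300000:
            problems.append("timeout_ms must be an integer between 1000 and 300000")
    return problems


problems = check_payload("scrape", '{"timeout_ms": 500}')
```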
Server Not Found
- Cause: Incorrect server name in the API endpoint
- Solution:
- Verify endpoint format: `{server-name}/mcp/{tool-name}`
- Use correct server name from documentation
- Check available servers in your Curious Layer account
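To avoid typos in the endpoint path, the `{server-name}/mcp/{tool-name}` format can be built programmatically. This is only a sketch: `tool_endpoint` is a hypothetical helper, and the base URL `https://mewcp.example` and server name `firecrawl` are placeholders, not real values from this documentation.

```python
# Tool names documented in the Tools section.
KNOWN_TOOLS = {
    "health_check", "scrape", "crawl", "map", "search",
    "agent", "agent_status", "extract", "extract_status",
}


def tool_endpoint(base_url: str, server_name: str, tool_name: str) -> str:
    """Build an endpoint following {server-name}/mcp/{tool-name}."""
    if tool_name not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {tool_name!r}")
    if not server_name or "/" in server_name:
        raise ValueError("server_name must be a single path segment")
    return f"{base_url.rstrip('/')}/{server_name}/mcp/{tool_name}"


# Placeholder base URL and server name, for illustration only.
endpoint = tool_endpoint("https://mewcp.example/", "firecrawl", "scrape")
```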
Firecrawl API Error
- Cause: Upstream Firecrawl API returned an error
- Solution:
- Check Firecrawl service status at Firecrawl Status
- Verify your API key has sufficient credits for the operation
- Review the error message returned in the response for specific details
Resources
- Firecrawl Documentation — Official API reference
- Firecrawl API Reference — Complete endpoint reference
- FastMCP Docs — FastMCP specification
- FastMCP Credentials — FastMCP Credentials package for credential handling