Fetch, extract, and process web and API content. Supports resource blocking, authentication, and Google Custom Search.
The Google Custom Search API has a free tier with usage limits (for example, 100 queries per day, with additional queries requiring payment). For full details on quotas, pricing, and restrictions, see the official documentation.
Developed by Rayss
🚀 Open Source Project
🛠️ Built with Node.js & TypeScript (Node.js v18+ required)
Web-curl is a powerful tool for fetching and extracting text content from web pages and APIs. Use it as a standalone CLI or as an MCP (Model Context Protocol) server. Web-curl leverages Puppeteer for robust web scraping and supports advanced features such as resource blocking, custom headers, authentication, and Google Custom Search.
src/index.ts: defines the tools fetch_webpage, fetch_api, google_search, and smart_command.
src/rest-client.ts
To integrate web-curl as an MCP server, add the following configuration to your mcp_settings.json:
{
  "mcpServers": {
    "web-curl": {
      "command": "node",
      "args": [
        "build/index.js"
      ],
      "disabled": false,
      "alwaysAllow": [
        "fetch_webpage",
        "fetch_api",
        "google_search",
        "smart_command"
      ],
      "env": {
        "APIKEY_GOOGLE_SEARCH": "YOUR_GOOGLE_API_KEY",
        "CX_GOOGLE_SEARCH": "YOUR_CX_ID"
      }
    }
  }
}
Get a Google API Key:
Get a Custom Search Engine (CX) ID:
Enable Custom Search API:
Replace YOUR_GOOGLE_API_KEY and YOUR_CX_ID in the config above.
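Before starting the server, it can help to confirm that both variables are actually set and no longer hold the placeholder values. The following is a minimal sketch under that assumption; `missingGoogleSearchVars` is a hypothetical helper for illustration and is not part of web-curl.

```typescript
// Variable names taken from the MCP config in this README.
const REQUIRED_VARS = ["APIKEY_GOOGLE_SEARCH", "CX_GOOGLE_SEARCH"] as const;

// Placeholder values from the sample config; treated as "not configured".
const PLACEHOLDERS = new Set(["YOUR_GOOGLE_API_KEY", "YOUR_CX_ID"]);

type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the names of variables that are missing,
// empty, or still set to a placeholder value.
function missingGoogleSearchVars(env: Env): string[] {
  return REQUIRED_VARS.filter((varName) => {
    const value = env[varName]?.trim();
    return !value || PLACEHOLDERS.has(value);
  });
}
```

In a Node.js process this could be called as `missingGoogleSearchVars(process.env)`, reporting any returned names before the google_search tool is used.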
# Clone the repository
git clone https://github.com/rayss868/MCP-Web-Curl
cd MCP-Web-Curl
# Install dependencies
npm install
# Build the project
npm run build
Windows: Just run npm install.
Linux: You must install extra dependencies for Chromium. Run:
sudo apt-get install -y \
ca-certificates fonts-liberation libappindicator3-1 libasound2 libatk-bridge2.0-0 \
libatk1.0-0 libcups2 libdbus-1-3 libdrm2 libgbm1 libnspr4 libnss3 \
libx11-xcb1 libxcomposite1 libxdamage1 libxrandr2 xdg-utils
For more details, see the Puppeteer troubleshooting guide.
The CLI supports fetching and extracting text content from web pages.
# Basic usage
node build/index.js https://example.com
# With options
node build/index.js --timeout 30000 --no-block-resources https://example.com
# Save output to a file
node build/index.js -o result.json https://example.com
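The flags used in these commands could be interpreted roughly as follows. This is a minimal sketch, not the project's actual parser: `parseCliArgs` and `CliOptions` are hypothetical names, and the 60000 ms fallback mirrors the default this README documents for --timeout.

```typescript
interface CliOptions {
  url?: string;
  timeout: number;         // navigation timeout in milliseconds
  blockResources: boolean; // false when --no-block-resources is passed
  output?: string;         // file path given with -o
}

// Hypothetical parser for the three documented flags plus a positional URL.
function parseCliArgs(argv: string[]): CliOptions {
  const options: CliOptions = { timeout: 60000, blockResources: true };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--timeout") {
      const value = Number(argv[++i]);
      if (!Number.isFinite(value) || value <= 0) {
        throw new Error("--timeout expects a positive number of milliseconds");
      }
      options.timeout = value;
    } else if (arg === "--no-block-resources") {
      options.blockResources = false;
    } else if (arg === "-o") {
      const file = argv[++i];
      if (!file) throw new Error("-o expects a file path");
      options.output = file;
    } else {
      // Anything else is treated as the target URL.
      options.url = arg;
    }
  }
  return options;
}
```

A script would typically pass `process.argv.slice(2)` to such a function.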
--timeout <ms>: Set navigation timeout (default: 60000)
--no-block-resources: Disable blocking of images, stylesheets, and fonts
-o <file>: Output result to specified file
Web-curl can be run as an MCP server for integration with Roo Context or other MCP-compatible environments.
npm run start
The server will communicate via stdin/stdout and expose the tools as defined in src/index.ts.
{
  "name": "fetch_webpage",
  "arguments": {
    "url": "https://example.com",
    "blockResources": true,
    "timeout": 60000,
    "maxLength": 10000
  }
}
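MCP messages are JSON-RPC 2.0, and tool invocations use the `tools/call` method, so on the wire an object like the one above travels inside a request envelope. The sketch below shows one way a client might build and frame such a request for a stdio server, assuming newline-delimited JSON; `buildToolCall` and `frameMessage` are hypothetical helpers, not part of web-curl.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Wrap a tool name and its arguments in a JSON-RPC 2.0 request.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Serialize as a single line terminated by a newline for the stdio transport.
function frameMessage(request: ToolCallRequest): string {
  return JSON.stringify(request) + "\n";
}

const exampleRequest = buildToolCall(1, "fetch_webpage", {
  url: "https://example.com",
  blockResources: true,
  timeout: 60000,
  maxLength: 10000,
});
```

A real client would complete the MCP initialization handshake first and then write `frameMessage(exampleRequest)` to the server's stdin; an MCP client SDK normally handles both steps.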
Set the following environment variables for Google Custom Search:
APIKEY_GOOGLE_SEARCH: Your Google API key
CX_GOOGLE_SEARCH: Your Custom Search Engine ID
{
  "name": "fetch_webpage",
  "arguments": {
    "url": "https://en.wikipedia.org/wiki/Web_scraping",
    "blockResources": true,
    "maxLength": 5000
  }
}
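For pages longer than maxLength, this README's tips mention pairing maxLength with startIndex to page through the extracted text. A minimal sketch of that windowing logic on a plain string follows; `paginate` and `Page` are hypothetical names, and whether the server itself reports a next index is an assumption.

```typescript
interface Page {
  content: string;
  nextStartIndex: number | null; // null once the text is exhausted
}

// Return the slice [startIndex, startIndex + maxLength) of the text,
// clamped to the text bounds, plus the index where the next page begins.
function paginate(text: string, startIndex: number, maxLength: number): Page {
  if (maxLength <= 0) throw new Error("maxLength must be positive");
  const start = Math.min(Math.max(startIndex, 0), text.length);
  const end = Math.min(start + maxLength, text.length);
  return {
    content: text.slice(start, end),
    nextStartIndex: end < text.length ? end : null,
  };
}

// Collect every page by feeding nextStartIndex back in as startIndex.
function allPages(text: string, maxLength: number): string[] {
  const pages: string[] = [];
  let index: number | null = 0;
  while (index !== null) {
    const page = paginate(text, index, maxLength);
    pages.push(page.content);
    index = page.nextStartIndex;
  }
  return pages;
}
```

Against the server, the same loop would issue repeated fetch_webpage calls with an increasing startIndex instead of slicing a local string.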
{
  "name": "fetch_api",
  "arguments": {
    "url": "https://api.github.com/repos/nodejs/node",
    "method": "GET",
    "headers": {
      "Accept": "application/vnd.github.v3+json"
    }
  }
}
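These arguments map closely onto the standard `fetch` function that ships with Node.js 18+. The sketch below shows one way such arguments could be validated and normalized before a request is made; `toFetchParams`, `FetchApiArgs`, and the list of allowed methods are assumptions for illustration, not web-curl's actual implementation.

```typescript
interface FetchApiArgs {
  url: string;
  method?: string;
  headers?: Record<string, string>;
}

interface FetchParams {
  url: string;
  init: { method: string; headers: Record<string, string> };
}

// Assumed set of methods; the real tool may accept a different list.
const ALLOWED_METHODS = ["GET", "POST", "PUT", "PATCH", "DELETE", "HEAD"];

function toFetchParams(args: FetchApiArgs): FetchParams {
  const parsed = new URL(args.url); // throws on malformed URLs
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  const method = (args.method ?? "GET").toUpperCase();
  if (!ALLOWED_METHODS.includes(method)) {
    throw new Error(`Unsupported method: ${method}`);
  }
  // Copy the headers so the caller's object is never mutated.
  return { url: parsed.toString(), init: { method, headers: { ...(args.headers ?? {}) } } };
}
```

The result can be passed straight to fetch, for example `const { url, init } = toFetchParams(args); const response = await fetch(url, init);`.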
{
  "name": "google_search",
  "arguments": {
    "query": "web scraping best practices",
    "num": 5
  }
}
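A call like this ultimately has to reach the Google Custom Search JSON API, which takes the API key, the search engine ID, the query, and the result count as query parameters. The sketch below builds such a request URL; how web-curl constructs it internally is not shown in this README, so `buildSearchUrl` is a hypothetical helper. The 1 to 10 range for num follows Google's documented limit per request.

```typescript
const SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1";

// Build a Custom Search request URL from the two credentials and the
// google_search tool arguments (query, num).
function buildSearchUrl(apiKey: string, cx: string, query: string, num = 10): string {
  if (!Number.isInteger(num) || num < 1 || num > 10) {
    throw new Error("num must be an integer between 1 and 10");
  }
  const params = new URLSearchParams({
    key: apiKey, // value of APIKEY_GOOGLE_SEARCH
    cx,          // value of CX_GOOGLE_SEARCH
    q: query,
    num: String(num),
  });
  return `${SEARCH_ENDPOINT}?${params.toString()}`;
}
```

URLSearchParams takes care of percent-encoding, so queries containing spaces or punctuation can be passed through unchanged.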
Increase the timeout parameter if requests are timing out.
Adjust which resource types are blocked with resourceTypesToBlock.
Ensure APIKEY_GOOGLE_SEARCH and CX_GOOGLE_SEARCH are set in your environment.
Check the logs/error-log.txt file for detailed error messages.
Use maxLength and startIndex to paginate content extraction.
See src/index.ts for all available options.
Contributions are welcome! If you want to contribute, fork this repository and submit a pull request.
If you find any issues or have suggestions, please open an issue on the repository page.
This project was developed by Rayss.
For questions, improvements, or contributions, please contact the author or open an issue in the repository.
Note: Google Search API is free with usage limits. For details, see: Google Custom Search API Overview