MCP Go Colly Crawler
A web crawling framework that integrates the Model Context Protocol (MCP) with the Colly web scraping library.
Overview
MCP Go Colly is a web crawling framework that exposes the Colly web scraping library through a Model Context Protocol (MCP) server. It aims to provide a flexible, extensible way to extract web content for large language model (LLM) applications.
Features
- Concurrent web crawling with configurable depth and domain restrictions
- MCP server integration for tool-based crawling
- Graceful shutdown handling
- Robust error handling and result formatting
- Support for both single URL and batch URL crawling
Building from Source
Prerequisites
- Go 1.21 or later
- Make (for using Makefile commands)
Installation
- Clone the repository:
git clone https://github.com/yourusername/mcp-go-colly.git
cd mcp-go-colly
- Install dependencies:
make deps
Building
The project includes a Makefile with several useful commands:
# Build the binary (outputs to bin/mcp-go-colly)
make build
# Build for all platforms (Linux, Windows, macOS)
make build-all
# Run tests
make test
# Clean build artifacts
make clean
# Format code
make fmt
# Run linter
make lint
All binaries will be generated in the bin/ directory.
After building, register the server with Claude Desktop by adding the following entry to claude_desktop_config.json, replacing the placeholder with the path to your clone:
{
"mcpServers": {
"web-scraper": {
"command": "<add path here>/mcp-go-colly/bin/mcp-go-colly"
}
}
}
Usage
As an MCP Tool
The crawler is exposed as an MCP tool that accepts the following parameters (the // comments are annotations only and are not valid JSON):
{
"urls": ["https://example.com"], // Single URL or array of URLs
"max_depth": 2 // Optional: Maximum crawl depth (default: 2)
}
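Because urls may be either a single string or an array, a handler has to normalize the arguments before crawling. The following is a minimal sketch of that step, not the project's actual code: the crawlArgs type and the parseCrawlArgs function are hypothetical names, and the sketch assumes the arguments arrive as a decoded JSON map, where numbers appear as float64.

```go
package main

import (
	"errors"
	"fmt"
)

// crawlArgs holds the normalized tool parameters (hypothetical type for this sketch).
type crawlArgs struct {
	URLs     []string
	MaxDepth int
}

// defaultMaxDepth mirrors the documented default of 2.
const defaultMaxDepth = 2

// parseCrawlArgs accepts "urls" as a string or an array of strings and
// falls back to the default depth when "max_depth" is absent.
func parseCrawlArgs(raw map[string]interface{}) (crawlArgs, error) {
	args := crawlArgs{MaxDepth: defaultMaxDepth}

	switch v := raw["urls"].(type) {
	case string:
		args.URLs = []string{v}
	case []string:
		args.URLs = append(args.URLs, v...)
	case []interface{}: // shape produced by encoding/json for arrays
		for _, item := range v {
			s, ok := item.(string)
			if !ok {
				return crawlArgs{}, fmt.Errorf("urls: expected string, got %T", item)
			}
			args.URLs = append(args.URLs, s)
		}
	default:
		return crawlArgs{}, errors.New("urls: expected a string or an array of strings")
	}
	if len(args.URLs) == 0 {
		return crawlArgs{}, errors.New("urls: at least one URL is required")
	}

	switch d := raw["max_depth"].(type) {
	case nil:
		// Parameter omitted: keep the default.
	case float64: // JSON numbers decode to float64
		args.MaxDepth = int(d)
	case int:
		args.MaxDepth = d
	default:
		return crawlArgs{}, fmt.Errorf("max_depth: expected a number, got %T", d)
	}
	return args, nil
}

func main() {
	args, err := parseCrawlArgs(map[string]interface{}{
		"urls": "https://example.com",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(args.URLs, args.MaxDepth) // prints: [https://example.com] 2
}
```

Normalizing to a single slice up front means the rest of the crawl logic does not need to distinguish between single-URL and batch calls.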
Example MCP Tool Call
// Build the request, then fill in the tool arguments.
var req mcp.CallToolRequest
req.Params.Arguments = map[string]interface{}{
	"urls":      []string{"https://example.com"},
	"max_depth": 2,
}
result, err := crawlerTool.Call(ctx, req)
Configuration Options
- max_depth: Set maximum crawl depth (default: 2)
- urls: Single URL string or array of URLs to crawl
- Domain restrictions are automatically applied based on the provided URLs
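One way domain restrictions could be derived from the provided URLs is to collect the unique hostnames of the seed URLs. The sketch below uses only the standard library; allowedDomains is a hypothetical helper, and whether the project computes its restrictions this way is an assumption. A list like this could then be passed to Colly's colly.AllowedDomains collector option.

```go
package main

import (
	"fmt"
	"net/url"
)

// allowedDomains returns the unique hostnames of the seed URLs,
// in the order they first appear.
func allowedDomains(seeds []string) ([]string, error) {
	seen := make(map[string]bool)
	var domains []string
	for _, raw := range seeds {
		u, err := url.Parse(raw)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", raw, err)
		}
		host := u.Hostname() // strips any port
		if host == "" {
			// Typically means the scheme is missing, e.g. "example.com".
			return nil, fmt.Errorf("no host in %q", raw)
		}
		if !seen[host] {
			seen[host] = true
			domains = append(domains, host)
		}
	}
	return domains, nil
}

func main() {
	domains, err := allowedDomains([]string{
		"https://example.com/docs",
		"https://example.com/blog",
		"https://example.org",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(domains) // prints: [example.com example.org]
}
```

Note that this sketch treats example.com and www.example.com as different domains; a real crawler may want to decide explicitly whether subdomains are in scope.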
Contributing
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
License
MIT
Acknowledgments
- Colly Web Scraping Framework
- Mark3 Labs MCP Project