A web crawling framework that integrates the Model Context Protocol (MCP) with the Colly web scraping library.
MCP Go Colly exposes Colly-based crawling as an MCP tool. The project aims to provide a flexible and extensible way to extract web content for large language model (LLM) applications.
git clone https://github.com/yourusername/mcp-go-colly.git
cd mcp-go-colly
make deps
The project includes a Makefile with several useful commands:
# Build the binary (outputs to bin/mcp-go-colly)
make build
# Build for all platforms (Linux, Windows, macOS)
make build-all
# Run tests
make test
# Clean build artifacts
make clean
# Format code
make fmt
# Run linter
make lint
All binaries are generated in the bin/ directory.
To register the server with Claude Desktop, add the following configuration to the claude_desktop_config.json file:
{
  "mcpServers": {
    "web-scraper": {
      "command": "<add path here>/mcp-go-colly/bin/mcp-go-colly"
    }
  }
}
The crawler is implemented as an MCP tool that can be called with the following parameters:
{
  "urls": ["https://example.com"],  // Single URL or array of URLs
  "max_depth": 2                    // Optional: Maximum crawl depth (default: 2)
}
result, err := crawlerTool.Call(ctx, mcp.CallToolRequest{
    Params: struct{ Arguments map[string]interface{} }{
        Arguments: map[string]interface{}{
            "urls":      []string{"https://example.com"},
            "max_depth": 2,
        },
    },
})
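Because urls may arrive as a single string or as an array, and max_depth is optional with a default of 2, a handler has to normalize the arguments before it starts crawling. The following is a minimal sketch of that step using only the standard library. The names parseCrawlArgs, crawlArgs, and defaultMaxDepth are hypothetical and are not taken from the project; the actual tool may validate its input differently.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultMaxDepth mirrors the documented default for max_depth.
const defaultMaxDepth = 2

// crawlArgs is a hypothetical holder for the normalized tool arguments.
type crawlArgs struct {
	URLs     []string
	MaxDepth int
}

// parseCrawlArgs (hypothetical helper) accepts "urls" as a single string or
// as an array, and falls back to defaultMaxDepth when "max_depth" is absent.
func parseCrawlArgs(args map[string]interface{}) (crawlArgs, error) {
	out := crawlArgs{MaxDepth: defaultMaxDepth}

	switch v := args["urls"].(type) {
	case string:
		out.URLs = []string{v}
	case []string:
		out.URLs = append(out.URLs, v...)
	case []interface{}: // the shape encoding/json produces for JSON arrays
		for _, item := range v {
			s, ok := item.(string)
			if !ok {
				return crawlArgs{}, fmt.Errorf("urls: expected string, got %T", item)
			}
			out.URLs = append(out.URLs, s)
		}
	default:
		return crawlArgs{}, errors.New("urls: expected a string or an array of strings")
	}
	if len(out.URLs) == 0 {
		return crawlArgs{}, errors.New("urls: at least one URL is required")
	}

	switch d := args["max_depth"].(type) {
	case nil:
		// Key missing: keep the default.
	case int:
		out.MaxDepth = d
	case float64: // JSON numbers decode to float64
		out.MaxDepth = int(d)
	default:
		return crawlArgs{}, fmt.Errorf("max_depth: expected a number, got %T", d)
	}
	return out, nil
}

func main() {
	args, err := parseCrawlArgs(map[string]interface{}{"urls": "https://example.com"})
	fmt.Println(args.URLs, args.MaxDepth, err) // prints "[https://example.com] 2 <nil>"
}
```

Handling both []string and []interface{} lets the same helper serve direct Go callers and arguments decoded from JSON.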
max_depth: Set maximum crawl depth (default: 2)
urls: Single URL string or array of URLs to crawl
License: MIT