WebSearch
A web search and content extraction tool using the Firecrawl API for advanced web scraping, searching, and content analysis.
WebSearch - Advanced Web Search and Content Extraction Tool
A powerful web search and content extraction tool built with Python, leveraging the Firecrawl API for advanced web scraping, searching, and content analysis capabilities.
🚀 Features
- Advanced Web Search: Perform intelligent web searches with customizable parameters
- Content Extraction: Extract specific information from web pages using natural language prompts
- Web Crawling: Crawl websites with configurable depth and limits
- Web Scraping: Scrape web pages with support for various output formats
- MCP Integration: Built as a Model Context Protocol (MCP) server for seamless integration
📋 Prerequisites
- Python 3.8 or higher
- uv package manager
- Firecrawl API key
- OpenAI API key (optional, for enhanced features)
- Tavily API key (optional, for additional search capabilities)
🛠️ Installation
- Install uv:
# On Windows (using pip)
pip install uv
# On Unix/MacOS
curl -LsSf https://astral.sh/uv/install.sh | sh
# Add uv to PATH (Unix/MacOS)
export PATH="$HOME/.local/bin:$PATH"
# Add uv to PATH (Windows - add to Environment Variables)
# Add: %USERPROFILE%\.local\bin
- Clone the repository:
git clone https://github.com/yourusername/websearch.git
cd websearch
- Create and activate a virtual environment with uv:
# Create virtual environment
uv venv
# Activate on Windows
.\.venv\Scripts\activate.ps1
# Activate on Unix/MacOS
source .venv/bin/activate
- Install dependencies with uv:
# Install the dependencies declared in pyproject.toml (uv sync does not read requirements.txt)
uv sync
- Set up environment variables:
# Create .env file
touch .env
# Add your API keys
FIRECRAWL_API_KEY=your_firecrawl_api_key
OPENAI_API_KEY=your_openai_api_key
🎯 Usage
Setting Up With Claude for Desktop
Instead of running the server directly, you can configure Claude for Desktop to access the WebSearch tools:
- Locate or create your Claude for Desktop configuration file:
  - Windows: %APPDATA%\Claude\claude_desktop_config.json
  - macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Add the WebSearch server configuration to the mcpServers section:
{
  "mcpServers": {
    "websearch": {
      "command": "uv",
      "args": [
        "--directory",
        "D:\\ABSOLUTE\\PATH\\TO\\WebSearch",
        "run",
        "main.py"
      ]
    }
  }
}
- Make sure to replace the directory path with the absolute path to your WebSearch project folder.
- Save the configuration file and restart Claude for Desktop.
- Once configured, the WebSearch tools will appear in the tools menu (hammer icon) in Claude for Desktop.
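The configuration steps above can also be scripted. The following is a minimal sketch, not part of the project: the helper name add_websearch_server is hypothetical, and it assumes the configuration file is plain JSON with an optional mcpServers object. It keeps any servers that are already configured and writes the absolute project path for you.

```python
import json
from pathlib import Path


def add_websearch_server(config_path: Path, project_dir: Path) -> dict:
    """Merge a "websearch" entry into the mcpServers section of config_path.

    Hypothetical helper for illustration; it mirrors the JSON shown above.
    """
    # Start from the existing configuration, if any, so other servers are kept.
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        config = json.loads(text) if text.strip() else {}

    servers = config.setdefault("mcpServers", {})
    servers["websearch"] = {
        "command": "uv",
        # resolve() turns a relative project path into the absolute path
        # that the configuration requires.
        "args": ["--directory", str(project_dir.resolve()), "run", "main.py"],
    }

    # Create the Claude configuration folder if it does not exist yet.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

On macOS, for example, you could call it from the project folder as add_websearch_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json", Path(".")), then restart Claude for Desktop.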
Available Tools
- Search
- Extract Information
- Crawl Websites
- Scrape Content
📚 API Reference
Search
- query (str): The search query
- Returns: Search results in JSON format
Extract
- urls (List[str]): List of URLs to extract information from
- prompt (str): Instructions for extraction
- enableWebSearch (bool): Enable supplementary web search
- showSources (bool): Include source references
- Returns: Extracted information in specified format
Crawl
- url (str): Starting URL
- maxDepth (int): Maximum crawl depth
- limit (int): Maximum pages to crawl
- Returns: Crawled content in markdown/HTML format
Scrape
- url (str): Target URL
- Returns: Scraped content with optional screenshots
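To make the parameter lists above concrete, here is a small sketch that checks a set of tool arguments against the documented names and types before they are sent. The lowercase tool keys, the TOOL_PARAMS table, and the check_arguments helper are assumptions made for illustration and are not part of the server's code. The reference does not say which parameters are required, so the sketch only flags unknown names and wrong types.

```python
from typing import Any, Dict, List

# Parameter names and types as listed in the API reference above.
# The lowercase tool keys are an assumption for this sketch.
TOOL_PARAMS: Dict[str, Dict[str, type]] = {
    "search": {"query": str},
    "extract": {
        "urls": list,
        "prompt": str,
        "enableWebSearch": bool,
        "showSources": bool,
    },
    "crawl": {"url": str, "maxDepth": int, "limit": int},
    "scrape": {"url": str},
}


def check_arguments(tool: str, arguments: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in arguments (empty if none)."""
    expected = TOOL_PARAMS[tool]
    problems = []
    for name, value in arguments.items():
        if name not in expected:
            problems.append(f"unknown parameter: {name}")
            continue
        wanted = expected[name]
        # bool is a subclass of int in Python, so reject it explicitly
        # where an integer such as maxDepth or limit is expected.
        wrong_bool = wanted is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, wanted):
            problems.append(f"{name} should be {wanted.__name__}")
    return problems
```

For instance, check_arguments("crawl", {"url": "https://example.com", "maxDepth": 2, "limit": 10}) returns an empty list, while passing maxDepth as the string "2" reports a type problem.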
🔧 Configuration
Environment Variables
The tool requires certain API keys to function. We provide a .env.example file that you can use as a template:
- Copy the example file:
# On Unix/MacOS
cp .env.example .env
# On Windows
copy .env.example .env
- Edit the .env file with your API keys:
# OpenAI API key - Required for AI-powered features
OPENAI_API_KEY=your_openai_api_key_here
# Firecrawl API key - Required for web scraping and searching
FIRECRAWL_API_KEY=your_firecrawl_api_key_here
Getting the API Keys
- OpenAI API Key:
  - Visit OpenAI's platform
  - Sign up or log in
  - Navigate to the API keys section
  - Create a new secret key
- Firecrawl API Key:
  - Visit Firecrawl's website
  - Create an account
  - Navigate to your dashboard
  - Generate a new API key
If everything is configured correctly, you should receive a JSON response with search results.
Troubleshooting
If you encounter errors:
- Ensure all required API keys are set in your .env file
- Verify the API keys are valid and have not expired
- Check that the .env file is in the root directory of the project
- Make sure the environment variables are being loaded correctly
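The file-related checks in this list can be automated. The sketch below is a hypothetical diagnostic, not part of the project: the names read_env_file and diagnose are made up, and REQUIRED_KEYS assumes that only the Firecrawl key is mandatory, as the prerequisites suggest. It confirms that a .env file exists in the project root and that each required key is present and no longer a placeholder. It cannot tell whether a key is valid or expired, since that requires a call to the provider.

```python
from pathlib import Path
from typing import Dict, List

# Assumption: only the Firecrawl key is mandatory (the prerequisites list
# the OpenAI key as optional). Adjust this list to your setup.
REQUIRED_KEYS = ["FIRECRAWL_API_KEY"]


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def diagnose(project_root: Path) -> List[str]:
    """Return human-readable problems with the .env file (empty if none)."""
    env_path = project_root / ".env"
    if not env_path.is_file():
        return [f"no .env file found in {project_root}"]

    values = read_env_file(env_path)
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty in .env")
        elif value.startswith("your_"):
            # Matches the placeholders used in this README's examples.
            problems.append(f"{key} still holds the placeholder value")
    return problems
```

Running diagnose(Path(".")) from the project root returns an empty list when the file checks pass, or one message per problem otherwise.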
🤝 Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Firecrawl for their powerful web scraping API
- OpenAI for AI capabilities
- The MCP community for the protocol specification
📬 Contact
José Martín Rodriguez Mortaloni - @m4s1t425
Made with ❤️ using Python and Firecrawl