Crawl4AI
Web scraping skill for Claude AI. Crawl websites, extract structured data with CSS/LLM strategies, handle dynamic JavaScript content. Built on crawl4ai with complete SDK reference, example scripts, and tests.
Crawl4AI Claude Skill
A comprehensive Claude skill for web crawling and data extraction using Crawl4AI. This skill enables Claude to scrape websites, extract structured data, handle JavaScript-heavy pages, crawl multiple URLs, and build automated web data pipelines.
Features
- Web Crawling: Extract content from websites, including pages that render their content with JavaScript
- Data Extraction: Schema-based CSS extraction (LLM-free) and LLM-based extraction
- Markdown Generation: Clean, well-formatted markdown output optimized for LLM consumption
- Content Filtering: Relevance-based filtering using BM25 and quality-based pruning
- Session Management: Persistent sessions for authenticated crawling
- Batch Processing: Concurrent multi-URL crawling
- CLI & SDK: Both command-line interface and Python SDK support
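Relevance-based filtering with BM25 scores chunks of page content against a query and keeps only the relevant ones. The sketch below is not Crawl4AI's implementation. It is a self-contained illustration of the technique: split content into chunks, score each chunk with Okapi BM25, and drop chunks below a threshold. The tokenizer, the `k1`/`b` defaults, the threshold, and the function names are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a production filter would also handle stopwords/stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, chunks: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each chunk against the query with Okapi BM25."""
    docs = [tokenize(c) for c in chunks]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: number of chunks that contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # IDF with +1 inside the log so common terms never score negatively.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            denom = freq + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores


def filter_chunks(query: str, chunks: list[str], threshold: float = 1.0) -> list[str]:
    """Keep only the chunks whose BM25 score reaches the threshold."""
    return [c for c, s in zip(chunks, bm25_scores(query, chunks)) if s >= threshold]
```

For example, with the query "markdown extraction web pages", a chunk such as "Crawl4AI extracts markdown from web pages." is kept, while navigation or cookie-banner text that shares no query terms scores 0 and is dropped. In Crawl4AI itself this kind of filtering is configured through its content filters rather than hand-written code.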
Installation
Method 1: Import as ZIP (Recommended for Claude Desktop)
- Download or clone this repository
- Create a ZIP file of the crawl4ai directory:
  cd crawl4ai-skill
  zip -r crawl4ai.zip crawl4ai/
- In Claude Desktop, go to Settings → Developer → Import Skill
- Select the crawl4ai.zip file
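The `zip` command is not available by default on every platform (Windows, for example). As an alternative, Python's standard library can build an equivalent archive with `crawl4ai/` as the top-level folder. This is a minimal sketch; `package_skill` is a hypothetical helper, not a script shipped with this repository.

```python
import shutil
from pathlib import Path


def package_skill(repo_root: str = ".", skill_dir: str = "crawl4ai") -> Path:
    """Create <repo_root>/<skill_dir>.zip containing <skill_dir>/ as the top-level folder,
    mirroring `zip -r crawl4ai.zip crawl4ai/` run from the repository root."""
    root = Path(repo_root).resolve()
    archive = shutil.make_archive(
        base_name=str(root / skill_dir),  # output path without the .zip suffix
        format="zip",
        root_dir=str(root),               # archive entry paths are relative to this directory
        base_dir=skill_dir,               # only this folder (recursively) is included
    )
    return Path(archive)
```

Run from the `crawl4ai-skill` directory, `package_skill()` writes `crawl4ai.zip` next to the `crawl4ai/` folder, ready to import in Claude Desktop.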
Method 2: Git Clone
git clone https://github.com/brettdavies/crawl4ai-skill.git
cd crawl4ai-skill
Then add the skill directory to Claude's skills folder, or import it via Claude Desktop.
Prerequisites
This skill requires the Crawl4AI Python library:
pip install crawl4ai
crawl4ai-setup
# Verify installation
crawl4ai-doctor
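`crawl4ai-doctor` is the authoritative check. For a lightweight preflight inside a script, the sketch below only confirms that the `crawl4ai` package is importable and that the CLI entry points are on `PATH`. It does not verify whether `crawl4ai-setup` completed successfully. The function name is a hypothetical helper.

```python
import importlib.util
import shutil


def check_crawl4ai_install() -> dict[str, bool]:
    """Report whether the crawl4ai package and its CLI commands are visible."""
    return {
        # True if `import crawl4ai` would find the package in this interpreter.
        "python_package": importlib.util.find_spec("crawl4ai") is not None,
        # True if each command can be found on PATH.
        "crwl": shutil.which("crwl") is not None,
        "crawl4ai-setup": shutil.which("crawl4ai-setup") is not None,
        "crawl4ai-doctor": shutil.which("crawl4ai-doctor") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_crawl4ai_install().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```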
Quick Start
CLI Usage (Recommended for Quick Tasks)
# Basic crawling - returns markdown
crwl https://example.com
# Get markdown output
crwl https://example.com -o markdown
# JSON output with cache bypass
crwl https://example.com -o json -v --bypass-cache
Python SDK Usage
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://example.com")
        print(result.markdown[:500])

asyncio.run(main())
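Because the SDK is async, several URLs can be crawled concurrently. Crawl4AI also provides `crawler.arun_many()` for batch crawling; the sketch below only illustrates the underlying pattern with plain `asyncio`, capping how many crawls run at once with a semaphore. `fetch_markdown` is a hypothetical stand-in for `crawler.arun(...)` so the sketch runs without a browser or network access.

```python
import asyncio


async def fetch_markdown(url: str) -> str:
    # Hypothetical stand-in for `(await crawler.arun(url)).markdown`.
    await asyncio.sleep(0.01)
    return f"# {url}"


async def crawl_many(urls: list[str], max_concurrency: int = 3) -> dict[str, str]:
    """Crawl URLs concurrently with at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def crawl_one(url: str) -> tuple[str, str]:
        async with semaphore:  # wait for a free slot before starting this URL
            return url, await fetch_markdown(url)

    # gather returns results in the same order as the input URLs.
    results = await asyncio.gather(*(crawl_one(u) for u in urls))
    return dict(results)


if __name__ == "__main__":
    pages = asyncio.run(crawl_many(["https://example.com", "https://example.org"]))
    for url, markdown in pages.items():
        print(url, len(markdown))
```

In real use, open a single `AsyncWebCrawler` and call `crawler.arun(url)` inside the semaphore block, so one browser instance is shared across all URLs.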
Documentation
- SKILL.md - Complete skill documentation with examples
- CLI Guide - Command-line interface reference
- SDK Guide - Python SDK quick reference
- Complete SDK Reference - Full API documentation (5900+ lines)
Common Use Cases
Documentation to Markdown
crwl https://docs.example.com -o markdown > docs.md
E-commerce Product Monitoring
# Generate schema once (uses LLM)
python crawl4ai/scripts/extraction_pipeline.py --generate-schema https://shop.com "extract products"
# Use schema for extraction (no LLM costs)
crwl https://shop.com -e extract_css.yml -s product_schema.json -o json
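The schema file passed with `-s` follows Crawl4AI's JSON-CSS schema shape: a `baseSelector` that matches each repeated item, plus a list of `fields`, each with a CSS `selector` and a `type`. A minimal hand-written sketch, assuming an imaginary product listing page; the selectors are hypothetical, and a schema generated by `extraction_pipeline.py` may contain more fields.

```python
import json
from pathlib import Path

# Hypothetical selectors for an imaginary shop page; replace them with the
# selectors from your generated schema or from the target site's HTML.
product_schema = {
    "name": "Products",
    "baseSelector": "div.product-card",  # matches one element per product
    "fields": [
        {"name": "title", "selector": "h2.product-title", "type": "text"},
        {"name": "price", "selector": "span.price", "type": "text"},
        # Attribute fields read an HTML attribute instead of the element text.
        {"name": "url", "selector": "a", "type": "attribute", "attribute": "href"},
    ],
}


def write_schema(schema: dict, path: str = "product_schema.json") -> Path:
    """Save the schema as JSON so it can be passed to `crwl -s`."""
    out = Path(path)
    out.write_text(json.dumps(schema, indent=2), encoding="utf-8")
    return out
```

Because the schema is plain CSS selectors, repeated runs need no LLM calls; only the one-time schema generation does.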
News Aggregation
# Multiple sources with filtering
for url in news1.com news2.com news3.com; do
crwl "https://$url" -f filter_bm25.yml -o markdown-fit
done
Scripts
The skill includes helper scripts in crawl4ai/scripts/:
- basic_crawler.py - Simple markdown extraction
- batch_crawler.py - Multi-URL processing
- extraction_pipeline.py - Schema generation and extraction
Testing
Run the test suite to verify the skill works correctly:
cd crawl4ai/tests
python run_all_tests.py
Marketplace
This skill is available on Claude Skills marketplaces:
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
For issues, questions, or feature requests, please open an issue on the GitHub repository.
Changelog
See CHANGELOG.md for version history and updates.
Related Servers
Bright Data
Discover, extract, and interact with the web - one interface powering automated access across the public internet.
Markdown Downloader
Download webpages as markdown files using the r.jina.ai service, with configurable directories and persistent settings.
Leporello
Remote MCP for Opera & Classical Music Event Schedules
Puppeteer Vision
Scrape webpages and convert them to markdown using Puppeteer. Features AI-driven interaction capabilities.
Yahoo Finance
Interact with Yahoo Finance to get stock data, market news, and financial information using the yfinance Python library.
CrawlForge MCP
CrawlForge MCP is a production-ready MCP server with 18 web scraping tools for AI agents. It gives Claude, Cursor, and any MCP-compatible client the ability to fetch URLs, extract structured data with CSS/XPath selectors, run deep multi-step research, bypass anti-bot detection with TLS fingerprint randomization, process documents, monitor page changes, and more. Credit-based pricing with a free tier (1,000 credits/month, no credit card required).
MCP Server Collector
Discovers and collects MCP servers from the internet.
HDW MCP Server
Access and manage LinkedIn data and user accounts using the HorizonDataWave API.
Opengraph.io
Opengraph data, web scraping, screenshot features in a handy MCP tool
CodingBaby Browser
A Node.js server that enables AI assistants to control the Chrome browser via WebSocket. Requires the CodingBaby Chrome Extension.
Intelligence Aeternum (Fluora MCP)
AI training dataset marketplace — 2M+ museum artworks across 7 world-class institutions with on-demand 111-field Golden Codex AI enrichment. x402 USDC micropayments on Base L2. First monetized art/provenance MCP server. Research-backed: dense metadata improves VLM capability by +25.5% (DOI: 10.5281/zenodo.18667735)