firecrawl-scrape
作者: firecrawl
抓取一个或多个URL。返回干净、针对大语言模型优化的Markdown格式内容。多个URL会并发抓取。
npx skills add https://github.com/firecrawl/firecrawl-claude-plugin --skill firecrawl-scrapefirecrawl scrape
Scrape one or more URLs. Returns clean, LLM-optimized markdown. Multiple URLs are scraped concurrently.
When to use
- You have a specific URL and want its content
- The page is static or JS-rendered (SPA)
- Step 2 in the workflow escalation pattern: search → scrape → map → crawl → interact
Quick start
# Basic markdown extraction
firecrawl scrape "<url>" -o .firecrawl/page.md
# Main content only, no nav/footer
firecrawl scrape "<url>" --only-main-content -o .firecrawl/page.md
# Wait for JS to render, then scrape
firecrawl scrape "<url>" --wait-for 3000 -o .firecrawl/page.md
# Multiple URLs (each saved to .firecrawl/)
firecrawl scrape https://example.com https://example.com/blog https://example.com/docs
# Get markdown and links together
firecrawl scrape "<url>" --format markdown,links -o .firecrawl/page.json
# Ask a question about the page
firecrawl scrape "https://example.com/pricing" --query "What is the enterprise plan price?"
Options
| Option | Description |
|---|---|
-f, --format <formats> | Output formats: markdown, html, rawHtml, links, screenshot, json |
-Q, --query <prompt> | Ask a question about the page content (5 credits) |
-H | Include HTTP headers in output |
--only-main-content | Strip nav, footer, sidebar — main content only |
--wait-for <ms> | Wait for JS rendering before scraping |
--include-tags <tags> | Only include these HTML tags |
--exclude-tags <tags> | Exclude these HTML tags |
--redact-pii | Redact personally identifiable information from output |
-o, --output <path> | Output file path |
Tips
- Prefer plain scrape over
--query. Scrape to a file, then usegrep,head, or read the markdown directly — you can search and reason over the full content yourself. Use--queryonly when you want a single targeted answer without saving the page (costs 5 extra credits). - Try scrape before interact. Scrape handles static pages and JS-rendered SPAs. Only escalate to
interactwhen you need interaction (clicks, form fills, pagination). - Multiple URLs are scraped concurrently — check
firecrawl --statusfor your concurrency limit. - Single format outputs raw content. Multiple formats (e.g.,
--format markdown,links) output JSON. - Always quote URLs — shell interprets
?and&as special characters. - Naming convention:
.firecrawl/{site}-{path}.md
See also
- firecrawl-search — find pages when you don't have a URL
- firecrawl-interact — when scrape can't get the content, use
interactto click, fill forms, etc. - firecrawl-download — bulk download an entire site to local files
来自 firecrawl 的更多技能
oracle
firecrawl
使用oracle CLI的最佳实践(提示词与文件打包、引擎、会话及文件附件模式)。
official
firecrawl-monitor
firecrawl
检测网站内容变化,并通过webhook或邮件接收通知——无需cron任务、爬虫或差异脚本。当用户想要追踪页面变化、监控竞争对手定价、在新职位或博客发布时接收提醒、监测文档/更新日志/状态页面,或说出“监控”、“关注”、“追踪”、“当……时提醒我”、“当X变化时通知我”、“如果……请通知我”、“当……时发邮件给我”或“当……时发送webhook”时,使用此技能。内置AI判断器会过滤掉格式、时间戳和……
officialweb-scrapingresearch
firecrawl-deep-research
firecrawl
使用 Firecrawl 进行多源深度研究。当用户要求研究某个主题、比较不同观点、生成带来源的简报、调查技术或市场问题,或综合多个来源的网络证据时使用。
officialresearchweb-scraping
firecrawl-research-papers
firecrawl
使用Firecrawl查找并综合研究论文、白皮书、PDF文件、技术报告及学术来源。适用于用户需要文献综述、论文摘要、研究现状分析,或从PDF及学术/行业出版物中获取有来源的综合内容时。
officialresearchweb-scraping
firecrawl-market-research
firecrawl
使用Firecrawl提取市场、财务、收益、行业和公司指标。当用户询问市场研究、行业趋势、上市公司数据、财务比较、收益研究或结构化市场报告时使用。
officialresearchweb-scraping
firecrawl-website-design-clone
firecrawl
使用 Firecrawl 抓取证据,将任意网站的设计系统提取为可供智能体使用的 DESIGN.md 文件。当用户需要从网站获取颜色、字体、间距、组件、布局模式或品牌/UI 指导,以便 AI 智能体创建新网站、克隆外观或受该设计启发构建页面时使用。
officialdesignweb-scraping
firecrawl-knowledge-base
firecrawl
使用Firecrawl从网页内容构建知识库。适用于本地参考文档、RAG就绪文本块、微调数据集、文档镜像、主题语料库,或从网页来源整理的LLM就绪Markdown。
officialweb-scrapingresearch
firecrawl-lead-research
firecrawl
使用Firecrawl生成会前潜在客户情报简报。适用于用户在销售通话、合作会议、投资者对话或客户访谈前需要公司调研、人物调研、最新动态、谈话要点、痛点分析或外联准备时。
officialresearchweb-scraping