firecrawl-scrape

bởi firecrawl

Trích xuất markdown sạch từ bất kỳ URL nào, bao gồm cả các ứng dụng trang đơn được render bằng JavaScript. Xử lý cả trang tĩnh và SPA render bằng JS với thời gian chờ có thể cấu hình để render. Hỗ trợ trích xuất nhiều URL đồng thời với các tùy chọn định dạng đầu ra bao gồm markdown, HTML, liên kết và ảnh chụp màn hình. Bao gồm các tùy chọn lọc nội dung như chế độ chỉ lấy nội dung chính để loại bỏ thanh điều hướng và chân trang, cùng với tùy chọn bao gồm/loại trừ thẻ. Tùy chọn trả lời câu hỏi nội tuyến qua cờ --query cho các mục tiêu...

npx skills add https://github.com/firecrawl/cli --skill firecrawl-scrape

Tải ZIP GitHub

489

firecrawl scrape

Scrape one or more URLs. Returns clean, LLM-optimized markdown. Multiple URLs are scraped concurrently.

When to use

You have a specific URL and want its content
The page is static or JS-rendered (SPA)
Step 2 in the workflow escalation pattern: search → scrape → map → crawl → interact

Quick start

# Basic markdown extraction
firecrawl scrape "<url>" -o .firecrawl/page.md

# Main content only, no nav/footer
firecrawl scrape "<url>" --only-main-content -o .firecrawl/page.md

# Wait for JS to render, then scrape
firecrawl scrape "<url>" --wait-for 3000 -o .firecrawl/page.md

# Multiple URLs (each saved to .firecrawl/)
firecrawl scrape https://example.com https://example.com/blog https://example.com/docs

# Get markdown and links together
firecrawl scrape "<url>" --format markdown,links -o .firecrawl/page.json

# Ask a question about the page
firecrawl scrape "https://example.com/pricing" --query "What is the enterprise plan price?"

Options

Option	Description
`-f, --format <formats>`	Output formats: markdown, html, rawHtml, links, screenshot, json
`-Q, --query <prompt>`	Ask a question about the page content (5 credits)
`-H`	Include HTTP headers in output
`--only-main-content`	Strip nav, footer, sidebar — main content only
`--wait-for <ms>`	Wait for JS rendering before scraping
`--include-tags <tags>`	Only include these HTML tags
`--exclude-tags <tags>`	Exclude these HTML tags
`--redact-pii`	Redact personally identifiable information from output
`-o, --output <path>`	Output file path

Tips

Prefer plain scrape over --query. Scrape to a file, then use grep, head, or read the markdown directly — you can search and reason over the full content yourself. Use --query only when you want a single targeted answer without saving the page (costs 5 extra credits).
Try scrape before interact. Scrape handles static pages and JS-rendered SPAs. Only escalate to interact when you need interaction (clicks, form fills, pagination).
Multiple URLs are scraped concurrently — check firecrawl --status for your concurrency limit.
Single format outputs raw content. Multiple formats (e.g., --format markdown,links) output JSON.
Always quote URLs — shell interprets ? and & as special characters.
Naming convention: .firecrawl/{site}-{path}.md